You are here

Statistical physics inspired methods to assign statistical significance in bioinformatics and proteomics: From sequence comparison to mass spectrometry based peptide sequencing

Download pdf | Full Screen View

Date Issued:
2006
Summary:
After the sequencing of many complete genomes, we are in a post-genomic era in which the most important task has changed from gathering genetic information to organizing the mass of data as well as under standing how components interact with each other. The former is usually undertaking using bioinformatics methods, while the latter task is generally termed proteomics. Success in both parts demands correct statistical significance assignments for results found. In my dissertation. I study two concrete examples: global sequence alignment statistics and peptide sequencing/identification using mass spectrometry. High-performance liquid chromatography coupled to a mass spectrometer (HPLC/MS/MS), enabling peptide identifications and thus protein identifications, has become the tool of choice in large-scale proteomics experiments. Peptide identification is usually done by database searches methods. The lack of robust statistical significance assignment among current methods motivated the development of a novel de novo algorithm, RAId, whose score statistics then provide statistical significance for high scoring peptides found in our custom, enzyme-digested peptide library. The ease of incorporating post-translation modifications is another important feature of RAId. To organize the massive protein/DNA data accumulated, biologists often cluster proteins according to their similarity via tools such as sequence alignment. Homologous proteins share similar domains. To assess the similarity of two domains usually requires alignment from head to toe, ie. a global alignment. A good alignment score statistics with an appropriate null model enable us to distinguish the biologically meaningful similarity from chance similarity. There has been much progress in local alignment statistics, which characterize score statistics when alignments tend to appear as a short segment of the whole sequence. For global alignment, which is useful in domain alignment, there is still much room for exploration/improvement. Here we present a variant of the direct polymer problem in random media (DPRM) to study the score distribution of global alignment. We have demonstrate that upon proper transformation the score statistics can be characterized by Tracy-Widom distributions, which correspond to the distributions for the largest eigenvalue of various ensembles of random matrices.
Title: Statistical physics inspired methods to assign statistical significance in bioinformatics and proteomics: From sequence comparison to mass spectrometry based peptide sequencing.
117 views
31 downloads
Name(s): Alves, Gelio
Florida Atlantic University, Degree Grantor
Yu, Yi-Kuo, Thesis Advisor
Type of Resource: text
Genre: Electronic Thesis Or Dissertation
Issuance: monographic
Date Issued: 2006
Publisher: Florida Atlantic University
Place of Publication: Boca Raton, Fla.
Physical Form: application/pdf
Extent: 149 p.
Language(s): English
Summary: After the sequencing of many complete genomes, we are in a post-genomic era in which the most important task has changed from gathering genetic information to organizing the mass of data as well as under standing how components interact with each other. The former is usually undertaking using bioinformatics methods, while the latter task is generally termed proteomics. Success in both parts demands correct statistical significance assignments for results found. In my dissertation. I study two concrete examples: global sequence alignment statistics and peptide sequencing/identification using mass spectrometry. High-performance liquid chromatography coupled to a mass spectrometer (HPLC/MS/MS), enabling peptide identifications and thus protein identifications, has become the tool of choice in large-scale proteomics experiments. Peptide identification is usually done by database searches methods. The lack of robust statistical significance assignment among current methods motivated the development of a novel de novo algorithm, RAId, whose score statistics then provide statistical significance for high scoring peptides found in our custom, enzyme-digested peptide library. The ease of incorporating post-translation modifications is another important feature of RAId. To organize the massive protein/DNA data accumulated, biologists often cluster proteins according to their similarity via tools such as sequence alignment. Homologous proteins share similar domains. To assess the similarity of two domains usually requires alignment from head to toe, ie. a global alignment. A good alignment score statistics with an appropriate null model enable us to distinguish the biologically meaningful similarity from chance similarity. There has been much progress in local alignment statistics, which characterize score statistics when alignments tend to appear as a short segment of the whole sequence. For global alignment, which is useful in domain alignment, there is still much room for exploration/improvement. Here we present a variant of the direct polymer problem in random media (DPRM) to study the score distribution of global alignment. We have demonstrate that upon proper transformation the score statistics can be characterized by Tracy-Widom distributions, which correspond to the distributions for the largest eigenvalue of various ensembles of random matrices.
Identifier: 9780542565601 (isbn), 12194 (digitool), FADT12194 (IID), fau:9101 (fedora)
Note(s): Thesis (Ph.D.)--Florida Atlantic University, 2006.
Subject(s): Molecular biology--Data processing
Bioinformatics
Proteomics
Genomics
Held by: Florida Atlantic University Libraries
Persistent Link to This Record: http://purl.flvc.org/fcla/dt/12194
Sublocation: Digital Library
Use and Reproduction: Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Use and Reproduction: http://rightsstatements.org/vocab/InC/1.0/
Host Institution: FAU
Is Part of Series: Florida Atlantic University Digital Library Collections.