- Title
- Gapless alignment revisited.
- Creator
- Raley, Elizabeth Anne, Florida Atlantic University, Yu, Yi-Kuo
- Abstract/Description
- The purpose of sequence alignment is to detect mutual similarity, characterized by the so-called "alignment score", between the sequences compared. Quantitatively assessing the confidence level of an alignment result requires knowledge of the alignment score statistics under a given null model, and this is the central issue in sequence alignment. In this thesis, the score statistics under a Markov null model were revisited and the score statistics under a non-Markov null model were investigated for two state-of-the-art algorithms, namely the gapless Smith-Waterman and Hybrid algorithms. These two algorithms were further used to find highly related signals in unrelated sequences and in weakly related sequences, corresponding respectively to the Markov null model and the non-Markov null model. The confidence levels of these models were also studied. Since the sequence similarity we are interested in comes from evolutionary history, we also investigated the relationship between sequence alignment, the tool used to find similarity, and evolution. The average evolutionary distance between the daughter sequences was found and compared with its expected value, for individual trees and as an average over many trees.
- Date Issued
- 2001
- PURL
- http://purl.flvc.org/fcla/dt/12856
- Subject Headings
- Bioinformatics, Amino acid sequence--Databases, Markov processes
- Format
- Document (PDF)
- Title
- Statistical physics inspired methods to assign statistical significance in bioinformatics and proteomics: From sequence comparison to mass spectrometry based peptide sequencing.
- Creator
- Alves, Gelio, Florida Atlantic University, Yu, Yi-Kuo
- Abstract/Description
- After the sequencing of many complete genomes, we are in a post-genomic era in which the most important task has changed from gathering genetic information to organizing the mass of data as well as understanding how components interact with each other. The former is usually undertaken using bioinformatics methods, while the latter task is generally termed proteomics. Success in both parts demands correct statistical significance assignments for the results found. In my dissertation, I study two concrete examples: global sequence alignment statistics and peptide sequencing/identification using mass spectrometry. High-performance liquid chromatography coupled to a mass spectrometer (HPLC/MS/MS), enabling peptide identifications and thus protein identifications, has become the tool of choice in large-scale proteomics experiments. Peptide identification is usually done by database search methods. The lack of robust statistical significance assignment among current methods motivated the development of a novel de novo algorithm, RAId, whose score statistics then provide statistical significance for high-scoring peptides found in our custom, enzyme-digested peptide library. The ease of incorporating post-translational modifications is another important feature of RAId. To organize the massive protein/DNA data accumulated, biologists often cluster proteins according to their similarity via tools such as sequence alignment. Homologous proteins share similar domains. Assessing the similarity of two domains usually requires alignment from head to toe, i.e., a global alignment. Good alignment score statistics with an appropriate null model enable us to distinguish biologically meaningful similarity from chance similarity. There has been much progress in local alignment statistics, which characterize score statistics when alignments tend to appear as a short segment of the whole sequence. For global alignment, which is useful in domain alignment, there is still much room for exploration and improvement. Here we present a variant of the directed polymer problem in random media (DPRM) to study the score distribution of global alignment. We have demonstrated that, upon proper transformation, the score statistics can be characterized by Tracy-Widom distributions, which correspond to the distributions of the largest eigenvalue of various ensembles of random matrices.
- Date Issued
- 2006
- PURL
- http://purl.flvc.org/fcla/dt/12194
- Subject Headings
- Molecular biology--Data processing, Bioinformatics, Proteomics, Genomics
- Format
- Document (PDF)