You are here
Studies on information-theoretics based data-sequence pattern-discriminant algorithms: Applications in bioinformatic data mining
- Date Issued:
- 2003
- Summary:
- This research refers to studies on information-theoretic (IT) aspects of data-sequence patterns and developing thereof discriminant algorithms that enable distinguishing the features of underlying sequence patterns having characteristic, inherent stochastical attributes. The application potentials of such algorithms include bioinformatic data mining efforts. Consistent with the scope of the study as above, considered in this research are specific details on information-theoretics and entropy considerations vis-a-vis sequence patterns (having stochastical attributes) such as DNA sequences of molecular biology. Applying information-theoretic concepts (essentially in Shannon's sense), the following distinct sets of metrics are developed and applied in the algorithms developed for data-sequence pattern-discrimination applications: (i) Divergence or cross-entropy algorithms of Kullback-Leibler type and of general Czizar class; (ii) statistical distance measures; (iii) ratio-metrics; (iv) Fisher type linear-discriminant measure and (v) complexity metric based on information redundancy. These measures are judiciously adopted in ascertaining codon-noncodon delineations in DNA sequences that consist of crisp and/or fuzzy nucleotide domains across their chains. The Fisher measure is also used in codon-noncodon delineation and in motif detection. Relevant algorithms are used to test DNA sequences of human and some bacterial organisms. The relative efficacy of the metrics and the algorithms is determined and discussed. The potentials of such algorithms in supplementing the prevailing methods are indicated. Scope for future studies is identified in terms of persisting open questions.
Title: | Studies on information-theoretics based data-sequence pattern-discriminant algorithms: Applications in bioinformatic data mining. |
5071 views
20 downloads |
---|---|---|
Name(s): |
Arredondo, Tomas Vidal. Florida Atlantic University, Degree grantor Neelakanta, Perambur S., Thesis advisor College of Engineering and Computer Science Department of Computer and Electrical Engineering and Computer Science |
|
Type of Resource: | text | |
Genre: | Electronic Thesis Or Dissertation | |
Issuance: | monographic | |
Date Issued: | 2003 | |
Publisher: | Florida Atlantic University | |
Place of Publication: | Boca Raton, Fla. | |
Physical Form: | application/pdf | |
Extent: | 395 p. | |
Language(s): | English | |
Summary: | This research refers to studies on information-theoretic (IT) aspects of data-sequence patterns and developing thereof discriminant algorithms that enable distinguishing the features of underlying sequence patterns having characteristic, inherent stochastical attributes. The application potentials of such algorithms include bioinformatic data mining efforts. Consistent with the scope of the study as above, considered in this research are specific details on information-theoretics and entropy considerations vis-a-vis sequence patterns (having stochastical attributes) such as DNA sequences of molecular biology. Applying information-theoretic concepts (essentially in Shannon's sense), the following distinct sets of metrics are developed and applied in the algorithms developed for data-sequence pattern-discrimination applications: (i) Divergence or cross-entropy algorithms of Kullback-Leibler type and of general Czizar class; (ii) statistical distance measures; (iii) ratio-metrics; (iv) Fisher type linear-discriminant measure and (v) complexity metric based on information redundancy. These measures are judiciously adopted in ascertaining codon-noncodon delineations in DNA sequences that consist of crisp and/or fuzzy nucleotide domains across their chains. The Fisher measure is also used in codon-noncodon delineation and in motif detection. Relevant algorithms are used to test DNA sequences of human and some bacterial organisms. The relative efficacy of the metrics and the algorithms is determined and discussed. The potentials of such algorithms in supplementing the prevailing methods are indicated. Scope for future studies is identified in terms of persisting open questions. | |
Identifier: | 9780496568734 (isbn), 12057 (digitool), FADT12057 (IID), fau:8970 (fedora) | |
Collection: | FAU Electronic Theses and Dissertations Collection | |
Note(s): |
College of Engineering and Computer Science Thesis (Ph.D.)--Florida Atlantic University, 2003. |
|
Subject(s): |
Data mining Bioinformatics Discriminant analysis Information theory in biology |
|
Held by: | Florida Atlantic University Libraries | |
Sublocation: | Digital Library | |
Persistent Link to This Record: | http://purl.flvc.org/fau/fd/FADT12057 | |
Use and Reproduction: | Copyright © is held by the author, with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. | |
Use and Reproduction: | http://rightsstatements.org/vocab/InC/1.0/ | |
Host Institution: | FAU | |
Is Part of Series: | Florida Atlantic University Digital Library Collections. |