You are here
Ensemble Learning Algorithms for the Analysis of Bioinformatics Data
- Date Issued:
- 2016
- Summary:
- Developments in advanced technologies, such as DNA microarrays, have generated tremendous amounts of data available to researchers in the field of bioinformatics. These state-of-the-art technologies present not only unprecedented opportunities to study biological phenomena of interest, but significant challenges in terms of processing the data. Furthermore, these datasets inherently exhibit a number of challenging characteristics, such as class imbalance, high dimensionality, small dataset size, noisy data, and complexity of data in terms of hard to distinguish decision boundaries between classes within the data. In recognition of the aforementioned challenges, this dissertation utilizes a variety of machine-learning and data-mining techniques, such as ensemble classification algorithms in conjunction with data sampling and feature selection techniques to alleviate these problems, while improving the classification results of models built on these datasets. However, in building classification models researchers and practitioners encounter the challenge that there is not a single classifier that performs relatively well in all cases. Thus, numerous classification approaches, such as ensemble learning methods, have been developed to address this problem successfully in a majority of circumstances. Ensemble learning is a promising technique that generates multiple classification models and then combines their decisions into a single final result. Ensemble learning often performs better than single-base classifiers in performing classification tasks. This dissertation conducts thorough empirical research by implementing a series of case studies to evaluate how ensemble learning techniques can be utilized to enhance overall classification performance, as well as improve the generalization ability of ensemble models. This dissertation investigates ensemble learning techniques of the boosting, bagging, and random forest algorithms, and proposes a number of modifications to the existing ensemble techniques in order to improve further the classification results. This dissertation examines the effectiveness of ensemble learning techniques on accounting for challenging characteristics of class imbalance and difficult-to-learn class decision boundaries. Next, it looks into ensemble methods that are relatively tolerant to class noise, and not only can account for the problem of class noise, but improves classification performance. This dissertation also examines the joint effects of data sampling along with ensemble techniques on whether sampling techniques can further improve classification performance of built ensemble models.
Title: | Ensemble Learning Algorithms for the Analysis of Bioinformatics Data. |
283 views
151 downloads |
---|---|---|
Name(s): |
Fazelpour, Alireza, author Khoshgoftaar, Taghi M., Thesis advisor Florida Atlantic University, Degree grantor College of Engineering and Computer Science Department of Computer and Electrical Engineering and Computer Science |
|
Type of Resource: | text | |
Genre: | Electronic Thesis Or Dissertation | |
Date Created: | 2016 | |
Date Issued: | 2016 | |
Publisher: | Florida Atlantic University | |
Place of Publication: | Boca Raton, Fla. | |
Physical Form: | application/pdf | |
Extent: | 177 p. | |
Language(s): | English | |
Summary: | Developments in advanced technologies, such as DNA microarrays, have generated tremendous amounts of data available to researchers in the field of bioinformatics. These state-of-the-art technologies present not only unprecedented opportunities to study biological phenomena of interest, but significant challenges in terms of processing the data. Furthermore, these datasets inherently exhibit a number of challenging characteristics, such as class imbalance, high dimensionality, small dataset size, noisy data, and complexity of data in terms of hard to distinguish decision boundaries between classes within the data. In recognition of the aforementioned challenges, this dissertation utilizes a variety of machine-learning and data-mining techniques, such as ensemble classification algorithms in conjunction with data sampling and feature selection techniques to alleviate these problems, while improving the classification results of models built on these datasets. However, in building classification models researchers and practitioners encounter the challenge that there is not a single classifier that performs relatively well in all cases. Thus, numerous classification approaches, such as ensemble learning methods, have been developed to address this problem successfully in a majority of circumstances. Ensemble learning is a promising technique that generates multiple classification models and then combines their decisions into a single final result. Ensemble learning often performs better than single-base classifiers in performing classification tasks. This dissertation conducts thorough empirical research by implementing a series of case studies to evaluate how ensemble learning techniques can be utilized to enhance overall classification performance, as well as improve the generalization ability of ensemble models. This dissertation investigates ensemble learning techniques of the boosting, bagging, and random forest algorithms, and proposes a number of modifications to the existing ensemble techniques in order to improve further the classification results. This dissertation examines the effectiveness of ensemble learning techniques on accounting for challenging characteristics of class imbalance and difficult-to-learn class decision boundaries. Next, it looks into ensemble methods that are relatively tolerant to class noise, and not only can account for the problem of class noise, but improves classification performance. This dissertation also examines the joint effects of data sampling along with ensemble techniques on whether sampling techniques can further improve classification performance of built ensemble models. | |
Identifier: | FA00004588 (IID) | |
Degree granted: | Dissertation (Ph.D.)--Florida Atlantic University, 2016. | |
Collection: | FAU Electronic Theses and Dissertations Collection | |
Note(s): | Includes bibliography. | |
Subject(s): |
Bioinformatics. Data mining -- Technological innovations. Machine learning. |
|
Held by: | Florida Atlantic University Libraries | |
Sublocation: | Digital Library | |
Persistent Link to This Record: | http://purl.flvc.org/fau/fd/FA00004588 | |
Use and Reproduction: | Copyright © is held by the author, with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. | |
Use and Reproduction: | http://rightsstatements.org/vocab/InC/1.0/ | |
Host Institution: | FAU | |
Is Part of Series: | Florida Atlantic University Digital Library Collections. |