Title
-
Analysis of machine learning algorithms on bioinformatics data of varying quality.
-
Creator
-
Shanab, Ahmad Abu; Khoshgoftaar, Taghi M.; Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
-
Abstract/Description
-
One of the main applications of machine learning in bioinformatics is the construction of classification models which can accurately classify new instances using information gained from previous instances. With the help of machine learning algorithms (such as supervised classification and gene selection), new meaningful knowledge can be extracted from bioinformatics datasets that can help in disease diagnosis and prognosis as well as in prescribing the right treatment for a disease. One particular challenge encountered when analyzing bioinformatics datasets is data noise, which refers to incorrect or missing values in datasets. Noise can be introduced as a result of experimental errors (e.g. faulty microarray chips, insufficient resolution, image corruption, and incorrect laboratory procedures), as well as other errors (errors during data processing, transfer, and/or mining). A special type of data noise is class noise, which occurs when an instance/example is mislabeled. Previous research showed that class noise has a detrimental impact on machine learning algorithms (e.g. worsened classification performance and unstable feature selection). In addition to data noise, gene expression datasets can suffer from the problems of high dimensionality (a very large feature space) and class imbalance (unequal distribution of instances between classes). As a result of these inherent problems, constructing accurate classification models becomes more challenging.
-
Date Issued
-
2015
-
PURL
-
http://purl.flvc.org/fau/fd/FA00004425
-
Subject Headings
-
Artificial intelligence, Bioinformatics, Machine learning, System design, Theory of computation
-
Format
-
Document (PDF)