Current Search: Computational learning theory
- Title
- An empirical study of combining techniques in software quality classification.
- Creator
- Eroglu, Cemal., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
-
In the literature, there has been limited research that systematically investigates the possibility of exercising a hybrid approach by simply learning from the output of numerous base-level learners. We analyze a hybrid learning approach on systems that had previously been analyzed with twenty-four different classifiers. Instead of relying on only one classifier's judgment, we expect that taking into account the opinions of several learners is a wise decision. Moreover, clustering techniques were used to eliminate some base-level classifiers from the hybrid learner's input. We ran three experiments, each with a different number of base-level classifiers. We empirically show that the hybrid learning approach generally yields better performance than the best selected base-level learners and majority voting under some conditions.
- Date Issued
- 2004
- PURL
- http://purl.flvc.org/fcla/dt/13162
- Subject Headings
- Computer software--Testing, Computer software--Quality control, Computational learning theory, Machine learning, Digital computer simulation
- Format
- Document (PDF)
- Title
- Analysis of machine learning algorithms on bioinformatics data of varying quality.
- Creator
- Shanab, Ahmad Abu, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
One of the main applications of machine learning in bioinformatics is the construction of classification models which can accurately classify new instances using information gained from previous instances. With the help of machine learning algorithms (such as supervised classification and gene selection), new meaningful knowledge can be extracted from bioinformatics datasets that can help in disease diagnosis and prognosis as well as in prescribing the right treatment for a disease. One particular challenge encountered when analyzing bioinformatics datasets is data noise, which refers to incorrect or missing values in datasets. Noise can be introduced as a result of experimental errors (e.g. faulty microarray chips, insufficient resolution, image corruption, and incorrect laboratory procedures), as well as other errors (errors during data processing, transfer, and/or mining). A special type of data noise is class noise, which occurs when an instance/example is mislabeled. Previous research showed that class noise has a detrimental impact on machine learning algorithms (e.g. worsened classification performance and unstable feature selection). In addition to data noise, gene expression datasets can suffer from the problems of high dimensionality (a very large feature space) and class imbalance (unequal distribution of instances between classes). As a result of these inherent problems, constructing accurate classification models becomes more challenging.
- Date Issued
- 2015
- PURL
- http://purl.flvc.org/fau/fd/FA00004425
- Subject Headings
- Artificial intelligence, Bioinformatics, Machine learning, System design, Theory of computation
- Format
- Document (PDF)
- Title
- Machine learning techniques for alleviating inherent difficulties in bioinformatics data.
- Creator
- Dittman, David, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
In response to the massive amounts of data that make up a large number of bioinformatics datasets, it has become increasingly necessary for researchers to use computers to aid them in their endeavors. With difficulties such as high dimensionality, class imbalance, noisy data, and difficult-to-learn class boundaries present within the data, bioinformatics datasets are a challenge to work with. One potential source of assistance is the domain of data mining and machine learning, a field which focuses on working with these large amounts of data and develops techniques to discover new trends and patterns that are hidden within the data and to increase the capability of researchers and practitioners to work with this data. Within this domain there are techniques designed to eliminate irrelevant or redundant features, balance the membership of the classes, handle errors found in the data, and build predictive models for future data.
- Date Issued
- 2015
- PURL
- http://purl.flvc.org/fau/fd/FA00004362
- Subject Headings
- Artificial intelligence, Bioinformatics, Machine learning, System design, Theory of computation
- Format
- Document (PDF)
- Title
- Video and Image Analysis using Statistical and Machine Learning Techniques.
- Creator
- Luo, Qiming, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Digital videos and images are effective media for capturing spatial and temporal information in the real world. The rapid growth of digital videos has motivated research aimed at developing effective algorithms, with the objective of obtaining useful information for a variety of application areas, such as security, commerce, medicine, geography, etc. This dissertation presents innovative and practical techniques, based on statistics and machine learning, that address some key research problems in video and image analysis, including video stabilization, object classification, image segmentation, and video indexing. A novel unsupervised multi-scale color image segmentation algorithm is proposed. The basic idea is to apply mean shift clustering to obtain an over-segmentation, and then merge regions at multiple scales to minimize the MDL criterion. The performance on the Berkeley segmentation benchmark compares favorably with some existing approaches. This algorithm can also operate on one-dimensional feature vectors representing each frame in ocean survey videos, which results in a novel framework for building a hierarchical video index. The advantage is to provide the user with the flexibility of browsing the videos at arbitrary levels of detail, which makes it more efficient for users to browse a long video in order to find interesting information based on the hierarchical index. Also, an empirical study on classification of ships in surveillance videos is presented. A comparative performance study on three classification algorithms is conducted. Based on this study, an effective feature extraction and classification algorithm for classifying ships in coastline surveillance videos is proposed. Finally, an empirical study on video stabilization is presented, which includes a comparative performance study on four motion estimation methods and three motion correction methods. Based on this study, an effective real-time video stabilization algorithm for coastline surveillance is proposed, which involves a novel approach to reduce error accumulation.
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012574
- Subject Headings
- Image processing--Digital techniques, Electronic surveillance, Computational learning theory
- Format
- Document (PDF)
- Title
- Evolutionary Methods for Mining Data with Class Imbalance.
- Creator
- Drown, Dennis J., Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
-
Class imbalance tends to cause inferior performance in data mining learners, particularly with regard to predicting the minority class, which generally imposes a higher misclassification cost. This work explores the benefits of using genetic algorithms (GA) to develop classification models which are better able to deal with the problems encountered when mining datasets which suffer from class imbalance. Using GA we evolve configuration parameters suited for skewed datasets for three different learners: artificial neural networks, C4.5 decision trees, and RIPPER. We also propose a novel technique called evolutionary sampling which works to remove noisy and unnecessary duplicate instances so that the sampled training data will produce a superior classifier for the imbalanced dataset. Our GA fitness function uses metrics appropriate for dealing with class imbalance, in particular the area under the ROC curve. We perform extensive empirical testing on these techniques and compare the results with seven existing sampling methods.
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012515
- Subject Headings
- Combinatorial group theory, Data mining, Machine learning, Data structure (Computer science)
- Format
- Document (PDF)
- Title
- An Empirical Study of Random Forests for Mining Imbalanced Data.
- Creator
- Golawala, Moiz M., Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
-
Skewed or imbalanced data presents a significant problem for many standard learners which focus on optimizing the overall classification accuracy. When the class distribution is skewed, priority is given to classifying examples from the majority class, at the expense of the often more important minority class. The random forest (RF) classification algorithm, which is a relatively new learner with appealing theoretical properties, has received almost no attention in the context of skewed datasets. This work presents a comprehensive suite of experimentation evaluating the effectiveness of random forests for learning from imbalanced data. Reasonable parameter settings (for the Weka implementation) for ensemble size and number of random features selected are determined through experimentation on 10 datasets. Further, the application of seven different data sampling techniques that are common methods for handling imbalanced data, in conjunction with RF, is also assessed. Finally, RF is benchmarked against 10 other commonly used machine learning algorithms, and is shown to provide very strong performance. A total of 35 imbalanced datasets are used, and over one million classifiers are constructed in this work.
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012520
- Subject Headings
- Data mining--Case studies, Machine learning--Case studies, Data structure (Computer science), Trees (Graph theory)--Case studies
- Format
- Document (PDF)