- Title
- Ensemble Learning Algorithms for the Analysis of Bioinformatics Data.
- Creator
- Fazelpour, Alireza, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Developments in advanced technologies, such as DNA microarrays, have generated tremendous amounts of data available to researchers in the field of bioinformatics. These state-of-the-art technologies present not only unprecedented opportunities to study biological phenomena of interest, but also significant challenges in terms of processing the data. Furthermore, these datasets inherently exhibit a number of challenging characteristics, such as class imbalance, high dimensionality, small dataset size, noisy data, and complexity of data in terms of hard-to-distinguish decision boundaries between classes within the data. In recognition of the aforementioned challenges, this dissertation utilizes a variety of machine-learning and data-mining techniques, such as ensemble classification algorithms in conjunction with data sampling and feature selection techniques, to alleviate these problems while improving the classification results of models built on these datasets. However, in building classification models, researchers and practitioners encounter the challenge that no single classifier performs well in all cases. Thus, numerous classification approaches, such as ensemble learning methods, have been developed to address this problem successfully in a majority of circumstances. Ensemble learning is a promising technique that generates multiple classification models and then combines their decisions into a single final result. Ensemble learning often performs better than single base classifiers in classification tasks. This dissertation conducts thorough empirical research by implementing a series of case studies to evaluate how ensemble learning techniques can be utilized to enhance overall classification performance, as well as improve the generalization ability of ensemble models.
This dissertation investigates the boosting, bagging, and random forest ensemble learning algorithms, and proposes a number of modifications to the existing ensemble techniques in order to further improve the classification results. It examines the effectiveness of ensemble learning techniques in accounting for the challenging characteristics of class imbalance and difficult-to-learn class decision boundaries. Next, it looks into ensemble methods that are relatively tolerant to class noise, and that can not only account for the problem of class noise but also improve classification performance. This dissertation also examines the joint effects of data sampling and ensemble techniques, asking whether sampling techniques can further improve the classification performance of the resulting ensemble models.
- Date Issued
- 2016
- PURL
- http://purl.flvc.org/fau/fd/FA00004588
- Subject Headings
- Bioinformatics., Data mining -- Technological innovations., Machine learning.
- Format
- Document (PDF)
- Title
- Real-Time Data Analytics and Optimization for Computational Advertising.
- Creator
- Liu, Hui, Zhu, Xingquan, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Online advertising has built a market of hundreds of billions of dollars and still continues to grow. With well-developed techniques in big data storage, data mining and analytics, online advertising is able to reach targeted audiences effectively. Real-time bidding refers to the buying and selling of online ad impressions through ad inventory auctions which occur in the time it takes a webpage to load. How to determine the bidding price and how to allocate the budget of advertising is the key to successful ad campaigns. Both of these aspects are fundamental to most campaign optimizations and we will introduce both of them in this thesis. For bidding price determination, we improved the estimation of CTR (Click Through Rate) (one of the most important factors of determining the bidding price) by using a refined hierarchical tree structure for the estimation. The result of the experiment and the A/B test showed our proposal can provide stable improvement. For budget allocation, we introduce SCO (Single Campaign Optimization) and CCO (Cross Campaign Optimization). SCO has been applied by our commercial partner while CCO needs more research. We will first introduce the methods of SCO and then give our proposal about CCO. We modeled CCO as a LP (Linear Programming) problem as well as designed an effective procedure to implement optimal impressions distribution. Our simulation showed our proposal can significantly increase global Gross Profit (GP).
- Date Issued
- 2017
- PURL
- http://purl.flvc.org/fau/fd/FA00004940
- Subject Headings
- Internet marketing--Technological innovations., Internet advertising--Technological innovations., Data mining., Web usage mining., Business--Data processing.
- Format
- Document (PDF)
- Title
- Asset identification using image descriptors.
- Creator
- Friedel, Reena Ursula., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Asset management is a time-consuming and error-prone process. Information Technology (IT) personnel typically perform this task manually, visually inspecting assets to identify misplaced ones. Automating this process would help IT personnel keep track of assets in a server rack. A mobile-based solution is proposed to automate this process. The asset management application on the tablet captures images of assets and searches an annotated database to identify the asset. We evaluate the matching performance and speed of asset matching using three different image feature descriptors. Methods to reduce feature extraction and matching complexity were developed. Performance and accuracy tradeoffs were studied, domain-specific problems were identified, and optimizations for mobile platforms were made. The results show that the proposed methods reduce complexity of asset matching by 67% when compared to the matching process using unmodified image feature descriptors.
- Date Issued
- 2012
- PURL
- http://purl.flvc.org/FAU/3342051
- Subject Headings
- Data mining, Technological innovations, Mobile computing, User-centered system design, Application software, Development
- Format
- Document (PDF)
- Title
- Collaborative filtering using machine learning and statistical techniques.
- Creator
- Su, Xiaoyuan., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Collaborative filtering (CF), a very successful recommender system, is one of the applications of data mining for incomplete data. The main objective of CF is to make accurate recommendations from highly sparse user rating data. My contributions to this research topic include proposing the frameworks of imputation-boosted collaborative filtering (IBCF) and imputed neighborhood based collaborative filtering (INCF). We also proposed a model-based CF technique, TAN-ELR CF, and two hybrid CF algorithms, sequential mixture CF and joint mixture CF. Empirical results show that our proposed CF algorithms have very good predictive performance. In the investigation of applying imputation techniques in mining incomplete data, we proposed imputation-helped classifiers and VCI predictors (voting on classifications from imputed learning sets), both of which resulted in significant improvement in classification performance for incomplete data over conventional machine-learned classifiers, including kNN, neural network, one rule, decision table, SVM, logistic regression, decision tree (C4.5), random forest, and decision list (PART), as well as the well-known Bagging predictors. The main imputation techniques involved in these algorithms include EM (expectation maximization) and BMI (Bayesian multiple imputation).
- Date Issued
- 2008
- PURL
- http://purl.flvc.org/FAU/186301
- Subject Headings
- Filters (Mathematics), Machine learning, Data mining, Technological innovations, Database management, Combinatorial group theory
- Format
- Document (PDF)
- Title
- Classification techniques for noisy and imbalanced data.
- Creator
- Napolitano, Amri E., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Machine learning techniques allow useful insight to be distilled from the increasingly massive repositories of data being stored. As these data mining techniques can only learn patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real-life classification applications are class noise and class imbalance. Class noise, where dependent attribute values are recorded erroneously, misleads a classifier and reduces predictive performance. Class imbalance occurs when one class represents only a small portion of the examples in a dataset, and, in such cases, classifiers often display poor accuracy on the minority class. The reduction in classification performance becomes even worse when the two issues occur simultaneously. To address the magnified difficulty caused by this interaction, this dissertation performs thorough empirical investigations of several techniques for dealing with class noise and imbalanced data. Comprehensive experiments are performed to assess the effects of the classification techniques on classifier performance, as well as how the level of class imbalance, level of class noise, and distribution of class noise among the classes affect results. An empirical analysis of classifier-based noise detection efficiency appears first. Subsequently, an intelligent data sampling technique, based on noise detection, is proposed and tested. Several hybrid classifier ensemble techniques for addressing class noise and imbalance are introduced. Finally, a detailed empirical investigation of classification filtering is performed to determine best practices.
- Date Issued
- 2009
- PURL
- http://purl.flvc.org/FAU/369201
- Subject Headings
- Combinatorial group theory, Data mining, Technological innovations, Decision trees, Machine learning, Filters (Mathematics)
- Format
- Document (PDF)
- Title
- Feature selection techniques and applications in bioinformatics.
- Creator
- Dittman, David, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Possibly the largest problem when working in bioinformatics is the large amount of data to sift through to find useful information. This thesis shows that feature selection (a method of removing irrelevant and redundant information from the dataset) is a useful and even necessary technique for these large datasets. This thesis also presents a new method for comparing classes to each other through the use of their features. It also provides a thorough analysis of the use of various feature selection techniques and classifiers in different scenarios from bioinformatics. Overall, this thesis shows the importance of feature selection in bioinformatics.
- Date Issued
- 2011
- PURL
- http://purl.flvc.org/FAU/3175016
- Subject Headings
- Bioinformatics, Data mining, Technological innovations, Computational biology, Combinatorial group theory, Filters (Mathematics), Ranking and selection (Statistics)
- Format
- Document (PDF)
- Title
- Stability analysis of feature selection approaches with low quality data.
- Creator
- Altidor, Wilker., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
One of the greatest challenges to data mining is erroneous or noisy data. Several studies have noted the weak performance of classification models trained from low quality data. This dissertation shows that low quality data can also impact the effectiveness of feature selection, and considers the effect of class noise on various feature ranking techniques. It presents a novel approach to feature ranking based on ensemble learning and assesses these ensemble feature selection techniques in terms of their robustness to class noise. It presents a noise-based stability analysis that measures the degree of agreement between a feature ranking technique's output on a clean dataset and its outputs on the same dataset corrupted with different combinations of noise level and noise distribution. It then considers the classification performance of models built with a subset of the original features obtained after applying feature ranking techniques on noisy data. It proposes the focused ensemble feature ranking as a noise-tolerant approach to feature selection and compares focused ensembles with general ensembles in terms of the ability of the selected features to withstand the impact of class noise when used to build classification models. Finally, it explores three approaches for addressing the combined problem of high dimensionality and class imbalance. Collectively, this research shows the importance of considering class noise when performing feature selection.
- Date Issued
- 2011
- PURL
- http://purl.flvc.org/FAU/3174501
- Subject Headings
- Data mining, Technological innovations, Combinatorial group theory, Filters (Mathematics), Ranking and selection (Statistics)
- Format
- Document (PDF)