- Title
- Alleviating class imbalance using data sampling: Examining the effects on classification algorithms.
- Creator
- Napolitano, Amri E., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Imbalanced class distributions typically cause poor classifier performance on the minority class, which also tends to be the class with the highest cost of misclassification. Data sampling is a common solution to this problem, and numerous sampling techniques have been proposed to address it. Prior research examining the performance of these techniques has been narrow and limited. This work uses thorough empirical experimentation to compare the performance of seven existing data sampling techniques using five different classifiers and four different datasets. The work addresses which sampling techniques produce the best performance in the presence of class imbalance, which classifiers are most robust to the problem, and which sampling techniques perform better or worse with each classifier. Extensive statistical analysis of these results is provided, in addition to an examination of the qualitative effects of the sampling techniques on the types of predictions made by the C4.5 classifier.
- Date Issued
- 2006
- PURL
- http://purl.flvc.org/fcla/dt/13413
- Subject Headings
- Combinatorial group theory, Data mining, Decision trees, Machine learning
- Format
- Document (PDF)
- Title
- A comprehensive comparative study of multiple classification techniques for software quality estimation.
- Creator
- Puppala, Kishore, Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Reliability and quality are desired features in industrial software applications. In some cases, they are absolutely essential. When faced with limited resources, software project managers need to allocate those resources to the most fault-prone areas. The ability to accurately classify a software module as fault-prone or not fault-prone enables the manager to make an informed resource allocation decision. An accurate quality classification avoids wasting resources on modules that are not fault-prone. It also avoids missing the opportunity to correct faults relatively early in the development cycle, when they are less costly. This thesis introduces the classification algorithms (classifiers) that are implemented in the WEKA software tool. WEKA (Waikato Environment for Knowledge Analysis) was developed at the University of Waikato in New Zealand. An empirical investigation is performed using a case study of a real-world system.
- Date Issued
- 2003
- PURL
- http://purl.flvc.org/fcla/dt/13039
- Subject Headings
- Software engineering, Computer software--Quality control, Decision trees
- Format
- Document (PDF)
- Title
- A comparative study of classification algorithms for network intrusion detection.
- Creator
- Wang, Yunling, Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- As network-based computer systems play increasingly vital roles in modern society, they have become the targets of criminals. Network security has never been a more important subject than in today's extensively interconnected computer world. Intrusion Detection Systems (IDS) have been used along with data mining techniques to detect intrusions. In this thesis, we present a comparative study of intrusion detection using a decision-tree learner (C4.5), two rule-based learners (RIPPER and Ridor), a learner that combines decision trees and rules (PART), and two instance-based learners (IBk and NNge). We investigate and compare the performance of IDSs based on the six techniques, with respect to a case study of the DARPA KDD-1999 network intrusion detection project. The results demonstrate that data mining techniques are very useful in the area of intrusion detection.
- Date Issued
- 2004
- PURL
- http://purl.flvc.org/fcla/dt/13102
- Subject Headings
- Computer networks--Security measures, Data mining, Decision trees
- Format
- Document (PDF)
- Title
- Three-group software quality classification modeling with TREEDISC algorithm.
- Creator
- Liu, Yongbin, Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Maintaining superior quality and reliability of software systems is important. Software quality modeling detects fault-prone modules and, given limited resources and budget, enables us to achieve high quality in software systems by focusing on fewer modules. Tree-based modeling is a simple and effective method for predicting fault proneness in software systems. In this thesis, we introduce the TREEDISC modeling technique with a three-group classification rule to predict the quality of software modules. A general classification rule is applied and validated. The three impact parameters (group number, minimum leaf size, and significance level) are thoroughly evaluated. An optimization procedure is conducted and empirical results are presented. Conclusions are drawn about the impact factors as well as the robustness of our research. The TREEDISC modeling technique with three-group classification has proved to be an efficient and convincing method in software quality control.
- Date Issued
- 2003
- PURL
- http://purl.flvc.org/fcla/dt/13008
- Subject Headings
- Computer software--Quality control, Software measurement, Decision trees
- Format
- Document (PDF)
- Title
- Classification techniques for noisy and imbalanced data.
- Creator
- Napolitano, Amri E., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Machine learning techniques allow useful insight to be distilled from the increasingly massive repositories of data being stored. As these data mining techniques can only learn patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real-life classification applications are class noise and class imbalance. Class noise, where dependent attribute values are recorded erroneously, misleads a classifier and reduces predictive performance. Class imbalance occurs when one class represents only a small portion of the examples in a dataset, and, in such cases, classifiers often display poor accuracy on the minority class. The reduction in classification performance becomes even worse when the two issues occur simultaneously. To address the magnified difficulty caused by this interaction, this dissertation performs thorough empirical investigations of several techniques for dealing with class noise and imbalanced data. Comprehensive experiments are performed to assess the effects of the classification techniques on classifier performance, as well as how the level of class imbalance, level of class noise, and distribution of class noise among the classes affect results. An empirical analysis of classifier-based noise detection efficiency appears first. Subsequently, an intelligent data sampling technique, based on noise detection, is proposed and tested. Several hybrid classifier ensemble techniques for addressing class noise and imbalance are introduced. Finally, a detailed empirical investigation of classification filtering is performed to determine best practices.
- Date Issued
- 2009
- PURL
- http://purl.flvc.org/FAU/369201
- Subject Headings
- Combinatorial group theory, Data mining, Technological innovations, Decision trees, Machine learning, Filters (Mathematics)
- Format
- Document (PDF)
- Title
- Partitioning filter approach to noise elimination: An empirical study in software quality classification.
- Creator
- Rebours, Pierre, Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- This thesis presents two new noise filtering techniques which improve the quality of training datasets by removing noisy data. The training dataset is first split into subsets, and base learners are induced on each of these splits. The predictions are combined in such a way that an instance is identified as noisy if it is misclassified by a certain number of base learners. The Multiple-Partitioning Filter combines several classifiers on each split. The Iterative-Partitioning Filter uses only one base learner, but goes through multiple iterations. The amount of noise removed is varied by tuning the filtering level or the number of iterations. Empirical studies on a high-assurance software project compare the effectiveness of our noise removal approaches with two other filters, the Cross-Validation Filter and the Ensemble Filter. Our studies suggest that using several base classifiers as well as performing several iterations with a conservative scheme may improve the efficiency of the filter.
- Date Issued
- 2004
- PURL
- http://purl.flvc.org/fcla/dt/13110
- Subject Headings
- Software measurement, Computer software--Quality control, Decision trees, Recursive partitioning
- Format
- Document (PDF)
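Illustrative note on the last record: the partitioning idea in that abstract (split the training data, induce a base learner on each split, flag an instance as noisy when enough learners misclassify it) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the thesis's Multiple-Partitioning or Iterative-Partitioning Filter. The nearest-centroid base learner, the function names, and the `n_partitions` and `min_votes` parameters are all hypothetical, and the partitions are taken by simple index striding rather than by any scheme described in the source.

```python
from collections import Counter


def fit_centroids(rows, labels):
    """Toy base learner (an assumption): one mean vector per class."""
    counts = Counter(labels)
    sums = {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: dist2(model[y]))


def partition_filter(rows, labels, n_partitions=3, min_votes=2):
    """Return indices of instances misclassified by at least min_votes base learners."""
    indices = list(range(len(rows)))
    # Hypothetical partitioning: every n-th instance goes to the same split.
    parts = [indices[i::n_partitions] for i in range(n_partitions)]
    # Induce one base learner per split.
    models = [
        fit_centroids([rows[i] for i in part], [labels[i] for i in part])
        for part in parts
    ]
    noisy = []
    for i, (x, y) in enumerate(zip(rows, labels)):
        wrong = sum(predict(m, x) != y for m in models)
        if wrong >= min_votes:
            noisy.append(i)
    return noisy
```

Raising `min_votes` toward `n_partitions` makes the sketch more conservative (fewer instances flagged), which loosely mirrors the filtering-level tuning the abstract mentions.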