Current Search: Khoshgoftaar, Taghi M.
- Title
- Comparison of Data Sampling Approaches for Imbalanced Bioinformatics Data.
- Creator
- Dittman, David, Wald, Randall, Napolitano, Amri E., Graduate College, Khoshgoftaar, Taghi M.
- Abstract/Description
- Class imbalance is a frequent problem found in bioinformatics datasets. Unfortunately, the minority class is usually also the class of interest. One of the methods to improve this situation is data sampling. There are a number of different data sampling methods, each with their own strengths and weaknesses, which makes choosing one a difficult prospect. In our work we compare three data sampling techniques (Random Undersampling, Random Oversampling, and SMOTE) on six bioinformatics datasets with varying levels of class imbalance. Additionally, we apply two different classifiers to the problem, 5-NN and SVM, and use feature selection to reduce our datasets to 25 features prior to applying sampling. Our results show that there is very little difference between the data sampling techniques, although Random Undersampling is the most frequent top-performing data sampling technique for both of our classifiers. We also performed statistical analysis which confirms that there is no statistical difference between the techniques. Therefore, our recommendation is to use Random Undersampling when choosing a data sampling technique, because it is less computationally expensive to implement than SMOTE and it also reduces the size of the dataset, which will improve subsequent computational costs without sacrificing classification performance.
- Date Issued
- 2014
- PURL
- http://purl.flvc.org/fau/fd/FA00005811
- Format
- Document (PDF)
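As an illustration of the sampling comparison described in the record above, here is a minimal sketch assuming Python with scikit-learn and imbalanced-learn, a synthetic imbalanced dataset in place of the six bioinformatics sets, and AUC as the evaluation measure; it is not the authors' experimental setup.

# Compare Random Undersampling, Random Oversampling, and SMOTE, each paired
# with 5-NN and SVM after selecting 25 features (synthetic data only).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline as ImbPipeline

X, y = make_classification(n_samples=1000, n_features=200, n_informative=20,
                           weights=[0.9, 0.1], random_state=0)

samplers = {"RUS": RandomUnderSampler(random_state=0),
            "ROS": RandomOverSampler(random_state=0),
            "SMOTE": SMOTE(random_state=0)}
learners = {"5-NN": KNeighborsClassifier(n_neighbors=5),
            "SVM": SVC(kernel="linear")}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for s_name, sampler in samplers.items():
    for l_name, learner in learners.items():
        pipe = ImbPipeline([("select", SelectKBest(f_classif, k=25)),
                            ("sample", sampler),
                            ("clf", learner)])
        auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean()
        print(f"{s_name:6s} + {l_name:4s}: AUC = {auc:.3f}")

Placing the sampler inside the imbalanced-learn pipeline keeps resampling confined to the training folds, which mirrors how such comparisons are usually run.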
- Title
- An Empirical Study of Random Forests for Mining Imbalanced Data.
- Creator
- Golawala, Moiz M., Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
- Skewed or imbalanced data presents a significant problem for many standard learners which focus on optimizing the overall classification accuracy. When the class distribution is skewed, priority is given to classifying examples from the majority class, at the expense of the often more important minority class. The random forest (RF) classification algorithm, which is a relatively new learner with appealing theoretical properties, has received almost no attention in the context of skewed datasets. This work presents a comprehensive suite of experimentation evaluating the effectiveness of random forests for learning from imbalanced data. Reasonable parameter settings (for the Weka implementation) for ensemble size and number of random features selected are determined through experimentation on 10 datasets. Further, the application of seven different data sampling techniques that are common methods for handling imbalanced data, in conjunction with RF, is also assessed. Finally, RF is benchmarked against 10 other commonly-used machine learning algorithms, and is shown to provide very strong performance. A total of 35 imbalanced datasets are used, and over one million classifiers are constructed in this work.
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012520
- Subject Headings
- Data mining--Case studies, Machine learning--Case studies, Data structures (Computer science), Trees (Graph theory)--Case studies
- Format
- Document (PDF)
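The study above tunes random forest ensemble size and the number of random features per split, and pairs RF with data sampling. A rough, hypothetical analogue using scikit-learn follows; the original work used Weka, 35 datasets, and seven sampling techniques, whereas this sketch uses synthetic data and random undersampling only.

# Vary ensemble size and max_features for a random forest on imbalanced
# data, with and without random undersampling in front of the forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=2000, n_features=30, weights=[0.95, 0.05],
                           random_state=1)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

for n_trees in (10, 100):
    for max_feats in ("sqrt", 0.5):
        rf = RandomForestClassifier(n_estimators=n_trees,
                                    max_features=max_feats, random_state=1)
        plain = cross_val_score(rf, X, y, cv=cv, scoring="roc_auc").mean()
        sampled = ImbPipeline([("rus", RandomUnderSampler(random_state=1)),
                               ("rf", rf)])
        with_rus = cross_val_score(sampled, X, y, cv=cv, scoring="roc_auc").mean()
        print(f"trees={n_trees:3d} max_features={max_feats}: "
              f"AUC plain={plain:.3f}, with RUS={with_rus:.3f}")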
- Title
- A review of the stability of feature selection techniques for bioinformatics data.
- Creator
- Awada, Wael, Khoshgoftaar, Taghi M., Dittman, David, Wald, Randall, Napolitano, Amri E., Graduate College
- Date Issued
- 2013-04-12
- PURL
- http://purl.flvc.org/fcla/dt/3361293
- Subject Headings
- Bioinformatics, DNA microarrays, Data mining
- Format
- Document (PDF)
- Title
- An Empirical Study of Performance Metrics for Classifier Evaluation in Machine Learning.
- Creator
- Bruhns, Stefan, Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
- A variety of classifiers for solving classification problems is available from the domain of machine learning. Commonly used classifiers include support vector machines, decision trees and neural networks. These classifiers can be configured by modifying internal parameters. The large number of available classifiers and the different configuration possibilities result in a large number of combinations of classifier and configuration settings, leaving the practitioner with the problem of evaluating the performance of different classifiers. This problem can be solved by using performance metrics. However, the large number of available metrics causes difficulty in deciding which metrics to use and when comparing classifiers on the basis of multiple metrics. This paper uses the statistical method of factor analysis in order to investigate the relationships between several performance metrics and introduces the concept of relative performance, which has the potential to ease the process of comparing several classifiers. The relative performance metric is also used to evaluate different support vector machine classifiers and to determine if the default settings in the Weka data mining tool are reasonable.
- Date Issued
- 2008
- PURL
- http://purl.flvc.org/fau/fd/FA00012508
- Subject Headings
- Machine learning, Computer algorithms, Pattern recognition systems, Data structures (Computer science), Kernel functions, Pattern perception--Data processing
- Format
- Document (PDF)
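The thesis above compares classifiers across many performance metrics. A small sketch of computing such a metric suite for two classifiers, assuming scikit-learn and synthetic data; the factor analysis and the proposed relative-performance measure are not reproduced here.

# Evaluate two classifiers on several common performance metrics.
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, weights=[0.8, 0.2], random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)

for name, clf in {"SVM": SVC(probability=True, random_state=2),
                  "Tree": DecisionTreeClassifier(random_state=2)}.items():
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    score = clf.predict_proba(X_te)[:, 1]
    print(name,
          f"acc={accuracy_score(y_te, pred):.3f}",
          f"prec={precision_score(y_te, pred):.3f}",
          f"rec={recall_score(y_te, pred):.3f}",
          f"f1={f1_score(y_te, pred):.3f}",
          f"auc={roc_auc_score(y_te, score):.3f}")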
- Title
- An Empirical Study of Ordinal and Non-ordinal Classification Algorithms for Intrusion Detection in WLANs.
- Creator
- Gopalakrishnan, Leelakrishnan, Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
- Ordinal classification refers to an important category of real world problems, in which the attributes of the instances to be classified and the classes are linearly ordered. Many applications of machine learning frequently involve situations exhibiting an order among the different categories represented by the class attribute. In ordinal classification the class value is converted into a numeric quantity and regression algorithms are applied to the transformed data. The data is later translated back into a discrete class value in a postprocessing step. This thesis is devoted to an empirical study of ordinal and non-ordinal classification algorithms for intrusion detection in WLANs. We used ordinal classification in conjunction with nine classifiers for the experiments in this thesis. All classifiers are part of the WEKA machine learning workbench. The results indicate that most of the classifiers give similar or better results with ordinal classification compared to non-ordinal classification.
- Date Issued
- 2006
- PURL
- http://purl.flvc.org/fau/fd/FA00012521
- Subject Headings
- Wireless LANs--Security measures, Computer networks--Security measures, Data structures (Computer science), Multivariate analysis
- Format
- Document (PDF)
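The ordinal-classification procedure described above (map ordered class labels to numbers, fit a regressor, post-process predictions back to discrete classes) can be sketched as follows. Scikit-learn, a gradient-boosted regressor, and synthetic three-class data are assumptions here, not the thesis's WEKA setup.

# Ordinal classification via regression: label -> number -> regression ->
# round back to the nearest ordered class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Three ordered classes, e.g. low < medium < high severity.
X, y = make_classification(n_samples=1200, n_informative=6, n_classes=3,
                           random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

reg = GradientBoostingRegressor(random_state=3).fit(X_tr, y_tr.astype(float))
raw = reg.predict(X_te)
pred = np.clip(np.rint(raw), 0, 2).astype(int)   # post-processing step
print("ordinal-via-regression accuracy:", round(accuracy_score(y_te, pred), 3))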
- Title
- Classification of software quality using Bayesian belief networks.
- Creator
- Dong, Yuhong., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- In today's competitive environment for software products, quality has become an increasingly important asset to software development organizations. Software quality models are tools for focusing efforts to find faults early in the development; delaying corrections can lead to higher costs. In this research, the classification Bayesian Networks modelling technique was used to predict software quality by classifying program modules as either fault-prone or not fault-prone. A general classification rule was applied to yield classification Bayesian Belief Network models. Six classification Bayesian Belief Network models were developed based on quality metrics data records of two very large window application systems. The fit data set was used to build the model and the test data set was used to evaluate the model. The first two models used a median-based data clustering technique, the second two models used the median as the critical value to cluster metrics using the Generalized Boolean Discriminant Function, and the last two models used the Kolmogorov-Smirnov test to select the critical value to cluster metrics using the Generalized Boolean Discriminant Function. All six models used the product metrics (FAULT or CDCHURN) as predictors.
- Date Issued
- 2002
- PURL
- http://purl.flvc.org/fcla/dt/12918
- Subject Headings
- Computer software--Quality control, Software measurement, Bayesian statistical decision theory
- Format
- Document (PDF)
- Title
- Combining decision trees for software quality classification: An empirical study.
- Creator
- Geleyn, Erik., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- The increased reliance on computer systems in the modern world has created a need for engineering reliability control of computer systems to the highest standards. Software quality classification models are one of the important tools to achieve high reliability. They can be used to calibrate software metrics-based models to predict whether software modules are fault-prone or not. Timely use of such models can aid in detecting faults early in the life cycle. Individual classifiers may be improved by using the combined decision from multiple classifiers. Several algorithms implement this concept and are investigated in this thesis. These combined learners provide the software quality modeling community with accurate, robust, and goal oriented models. This study presents a comprehensive comparative evaluation of meta learners using a strong and a weak learner, C4.5 and Decision Stump, respectively. Two case studies of industrial software systems are used in our empirical investigations.
- Date Issued
- 2002
- PURL
- http://purl.flvc.org/fcla/dt/12898
- Subject Headings
- Computer software--Quality control, Software measurement
- Format
- Document (PDF)
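The meta-learning comparison above (a strong learner and a weak learner combined by ensemble methods) might look roughly like this in scikit-learn, with an unpruned tree standing in for C4.5 and a depth-one tree for Decision Stump; bagging and boosting are the assumed combiners, not necessarily the algorithms evaluated in the thesis.

# Compare single learners against bagged and boosted versions of a strong
# and a weak base learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, random_state=4)
strong = DecisionTreeClassifier(random_state=4)             # C4.5-like tree
weak = DecisionTreeClassifier(max_depth=1, random_state=4)  # decision stump

for base_name, base in {"tree": strong, "stump": weak}.items():
    single = cross_val_score(base, X, y, cv=5).mean()
    print(f"single {base_name}: accuracy = {single:.3f}")
    metas = {
        "bagging": BaggingClassifier(estimator=base, n_estimators=50,
                                     random_state=4),
        "boosting": AdaBoostClassifier(estimator=base, n_estimators=50,
                                       random_state=4),
    }
    for meta_name, meta in metas.items():
        acc = cross_val_score(meta, X, y, cv=5).mean()
        print(f"{meta_name} of {base_name}: accuracy = {acc:.3f}")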
- Title
- Classification of software quality using tree modeling with the S-Plus algorithm.
- Creator
- Deng, Jianyu., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- In today's competitive environment for software products, quality has become an increasingly important asset to software development organizations. Software quality models are tools for focusing efforts to find faults early in the development. Delaying corrections can lead to higher costs. In this research, the classification tree modeling technique was used to predict the software quality by classifying program modules either as fault-prone or not fault-prone. The S-Plus regression tree algorithm and a general classification rule were applied to yield classification tree models. Two classification tree models were developed based on four consecutive releases of a very large legacy telecommunications system. The first release was used as the training data set and the subsequent three releases were used as evaluation data sets. The first model used twenty-four product metrics and four execution metrics as candidate predictors. The second model added fourteen process metrics as candidate predictors.
- Date Issued
- 1999
- PURL
- http://purl.flvc.org/fcla/dt/15707
- Subject Headings
- Computer software--Quality control, Software measurement, Computer software--Evaluation
- Format
- Document (PDF)
- Title
- Choosing software reliability models.
- Creator
- Woodcock, Timothy G., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- One of the important problems which software engineers face is how to determine which software reliability model should be used for a particular system. Some recent attempts to compare different models used complementary graphical and analytical techniques. These techniques require an excessive amount of time for plotting the data and running the analyses, and they are still rather subjective as to which model is best. So another technique needs to be found that is simpler and yet yields a less subjective measure of goodness of fit. The Akaike Information Criterion (AIC) is proposed as a new approach for selecting the best model. The performance of AIC is measured by Monte-Carlo simulation and by comparison to published data sets. The AIC chooses the correct model 95% of the time.
- Date Issued
- 1989
- PURL
- http://purl.flvc.org/fcla/dt/14561
- Subject Headings
- Computer software--Testing, Computer software--Reliability
- Format
- Document (PDF)
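The selection rule described above prefers the candidate model with the lowest AIC = 2k - 2 ln(L), where k is the number of free parameters and L the maximized likelihood. A toy illustration fitting two inter-failure-time distributions by maximum likelihood; the exponential/Weibull pairing, scipy, and the synthetic failure data are assumptions, not the reliability models evaluated in the thesis.

# Fit two candidate models to inter-failure times and compare their AIC.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
times = rng.weibull(1.5, size=200) * 10.0   # synthetic inter-failure times

def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood

# Exponential model (1 free parameter: scale).
loc, scale = stats.expon.fit(times, floc=0)
ll_exp = stats.expon.logpdf(times, loc=loc, scale=scale).sum()

# Weibull model (2 free parameters: shape and scale).
c, loc, scale = stats.weibull_min.fit(times, floc=0)
ll_wei = stats.weibull_min.logpdf(times, c, loc=loc, scale=scale).sum()

print("AIC exponential:", round(aic(ll_exp, 1), 1))
print("AIC Weibull:    ", round(aic(ll_wei, 2), 1))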
- Title
- Evaluating indirect and direct classification techniques for network intrusion detection.
- Creator
- Ibrahim, Nawal H., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Increasing aggressions through cyber terrorism pose a constant threat to information security in our day to day life. Implementing effective intrusion detection systems (IDSs) is an essential task due to the great dependence on networked computers for the operational control of various infrastructures. Building effective IDSs, unfortunately, has remained an elusive goal owing to the great technical challenges involved, and applied data mining techniques are increasingly being utilized in attempts to overcome the difficulties. This thesis presents a comparative study of the traditional "direct" approaches with the recently explored "indirect" approaches of classification which use class binarization and combiner techniques for intrusion detection. We evaluate and compare the performance of IDSs based on various data mining algorithms, in the context of a well known network intrusion evaluation data set. It is empirically shown that data mining algorithms when applied using the indirect classification approach yield better intrusion detection models.
- Date Issued
- 2004
- PURL
- http://purl.flvc.org/fcla/dt/13128
- Subject Headings
- Computer networks--Security measures, Computer security, Software measurement, Data mining
- Format
- Document (PDF)
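A compact sketch of the direct-versus-indirect contrast described above, using one-vs-rest and one-vs-one class binarization as the combiner schemes. Scikit-learn, logistic regression, and a synthetic multi-class set (rather than the network intrusion data) are assumptions.

# Direct multi-class classification versus indirect class binarization.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = make_classification(n_samples=2000, n_informative=8, n_classes=4,
                           random_state=6)
direct = LogisticRegression(max_iter=1000)

for name, clf in {"direct (multinomial)": direct,
                  "indirect one-vs-rest": OneVsRestClassifier(direct),
                  "indirect one-vs-one": OneVsOneClassifier(direct)}.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: accuracy = {acc:.3f}")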
- Title
- Evolutionary Methods for Mining Data with Class Imbalance.
- Creator
- Drown, Dennis J., Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
- Class imbalance tends to cause inferior performance in data mining learners, particularly with regard to predicting the minority class, which generally imposes a higher misclassification cost. This work explores the benefits of using genetic algorithms (GA) to develop classification models which are better able to deal with the problems encountered when mining datasets which suffer from class imbalance. Using GA we evolve configuration parameters suited for skewed datasets for three different learners: artificial neural networks, C4.5 decision trees, and RIPPER. We also propose a novel technique called evolutionary sampling which works to remove noisy and unnecessary duplicate instances so that the sampled training data will produce a superior classifier for the imbalanced dataset. Our GA fitness function uses metrics appropriate for dealing with class imbalance, in particular the area under the ROC curve. We perform extensive empirical testing on these techniques and compare the results with seven existing sampling methods.
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012515
- Subject Headings
- Combinatorial group theory, Data mining, Machine learning, Data structures (Computer science)
- Format
- Document (PDF)
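A toy genetic algorithm in the spirit of the work above: evolve two decision-tree parameters with cross-validated AUC as the fitness on an imbalanced dataset. The encoding, population size, and truncation selection used here are illustrative assumptions, and the proposed evolutionary sampling technique itself is not implemented.

# Evolve (max_depth, min_samples_leaf) for a decision tree, AUC fitness.
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

random.seed(7)
X, y = make_classification(n_samples=1500, weights=[0.93, 0.07], random_state=7)

def fitness(genes):
    depth, leaf = genes
    clf = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=leaf,
                                 random_state=7)
    return cross_val_score(clf, X, y, cv=3, scoring="roc_auc").mean()

def random_individual():
    return (random.randint(1, 20), random.randint(1, 50))

def mutate(genes):
    depth, leaf = genes
    return (max(1, depth + random.randint(-2, 2)),
            max(1, leaf + random.randint(-5, 5)))

population = [random_individual() for _ in range(10)]
for generation in range(5):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:4]                              # truncation selection
    children = [mutate(random.choice(parents)) for _ in range(6)]
    population = parents + children

best = max(population, key=fitness)
print("best (max_depth, min_samples_leaf):", best,
      "AUC:", round(fitness(best), 3))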
- Title
- Predicting decay in program modules of legacy software systems.
- Creator
- Joshi, Dhaval Kunvarabhai., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Legacy software systems may go through many releases. It is important to ensure that the reliability of a system improves with subsequent releases. Methods are needed to identify decaying software modules, i.e., modules for which quality decreases with each system release. Early identification of such modules during the software life cycle allows us to focus quality improvement efforts in a more productive manner, by reducing resources wasted for testing and improving the entire system. We present a scheme to classify modules in three groups---Decayed, Improved, and Unchanged---based on a three-group software quality classification method. This scheme is applied to three different case studies, using a case-based reasoning three-group classification model. The model identifies decayed modules, and is validated over different releases. The main goal of this work is to focus on the evolution of program modules of a legacy software system to identify modules that are difficult to maintain and may need to be reengineered.
- Date Issued
- 2002
- PURL
- http://purl.flvc.org/fcla/dt/12899
- Subject Headings
- Software reengineering, Computer software--Quality control, Software measurement, Software maintenance
- Format
- Document (PDF)
- Title
- Resource-sensitive intrusion detection models for network traffic.
- Creator
- Abushadi, Mohamed E., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Network security is an important subject in today's extensively interconnected computer world. The industry, academic institutions, small and large businesses and even residences are now greatly at risk from the increasing onslaught of computer attacks. Such malicious efforts cause damage ranging from mere violation of confidentiality and issues of privacy up to actual financial loss if business operations are compromised, or even further, loss of human lives in the case of mission-critical networked computer applications. Intrusion Detection Systems (IDS) have been used along with the help of data mining modeling efforts to detect intruders, yet with the limitation of organizational resources it is unreasonable to inspect every network alarm raised by the IDS. Modified Expected Cost of Misclassification (MECM) is a model selection measure that is resource-aware and cost-sensitive at the same time, and has proven to be effective for the identification of the best resource-based intrusion detection model.
- Date Issued
- 2003
- PURL
- http://purl.flvc.org/fcla/dt/13054
- Subject Headings
- Computer networks--Security measures--Automation, Computers--Access control, Data mining, Computer security
- Format
- Document (PDF)
- Title
- An empirical study of a three-group classification model using case-based reasoning.
- Creator
- Bhupathiraju, Sajan S., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Reliability is becoming a very important and competitive factor for software-based products. Software metrics-based quality estimation models provide a systematic and scientific approach to detect software faults early in the life cycle, improving software reliability. Classification models for software quality estimation usually classify observations into two groups. This thesis presents an empirical study of an algorithm for software quality classification using three groups: Three-Group Classification Model using Case-Based Reasoning (CBR). The basic idea behind the algorithm is that it uses the commonly used two-group classification technique three times. It can also be implemented with other quality estimation methods, such as Logistic Regression, Regression Trees, etc. This work evaluates the obtained quality with that from the Discriminant Analysis method. Empirical studies were conducted using an inspection data set, collected from a telecommunications system. It was observed that CBR performs better than Discriminant Analysis.
- Date Issued
- 2002
- PURL
- http://purl.flvc.org/fcla/dt/12903
- Subject Headings
- Software measurement, Computer software--Quality control
- Format
- Document (PDF)
- Title
- An empirical study of a three-group software quality classification model.
- Creator
- Cherukuri, Reena., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Maintaining superior quality and reliability of software systems is an important issue in software reliability engineering. Software quality estimation models based on software metrics provide a systematic and scientific way to detect fault-prone modules and enable us to achieve high quality in software systems by focusing on high-risk modules within limited resources and budget. In previous works, classification models for software quality usually classified modules into two groups, fault-prone or not fault-prone. This thesis presents a new technique for classifying modules into three groups, i.e., high-risk, medium-risk, and low-risk groups. This new technique calibrates three-group models according to the resources available, which makes it different from other classification techniques. The proposed three-group classification method proved to be efficient and useful for resource utilization in software quality control.
- Date Issued
- 2003
- PURL
- http://purl.flvc.org/fcla/dt/13004
- Subject Headings
- Software measurement, Computer software--Quality control
- Format
- Document (PDF)
- Title
- An empirical study of combining techniques in software quality classification.
- Creator
- Eroglu, Cemal., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- In the literature, there has been limited research that systematically investigates the possibility of exercising a hybrid approach by simply learning from the output of numerous base-level learners. We analyze a hybrid learning approach upon the systems that had previously been worked with twenty-four different classifiers. Instead of relying on only one classifier's judgment, it is expected that taking into account the opinions of several learners is a wise decision. Moreover, by using clustering techniques some base-level classifiers were eliminated from the hybrid learner input. We had three different experiments each with a different number of base-level classifiers. We empirically show that the hybrid learning approach generally yields better performance than the best selected base-level learners and majority voting under some conditions.
- Date Issued
- 2004
- PURL
- http://purl.flvc.org/fcla/dt/13162
- Subject Headings
- Computer software--Testing, Computer software--Quality control, Computational learning theory, Machine learning, Digital computer simulation
- Format
- Document (PDF)
- Title
- An empirical study of source code complexity and source code modifications during testing and maintenance.
- Creator
- De Gramont, Anne H., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Since maintenance is the most expensive phase of the software life cycle, detecting most of the errors as early as possible in the software development effort can provide substantial savings. This study investigates the behavior of complexity metrics during testing and maintenance, and their relationship to modifications made to the software. Interface complexity causes most of the change activities during integration testing and maintenance, while size causes most of the changes during unit testing. Principal component analysis groups 16 complexity metrics into four domains. Changes in domain pattern are observed throughout the software life cycle. Using those domains as input, regression analysis shows that software complexity measures collected as early as the unit testing phase can identify and predict change prone modules. With a low rate of misclassification, discriminant analysis further confirms that complexity metrics provide a strong indication of the changes made to a module during testing and maintenance.
- Date Issued
- 1994
- PURL
- http://purl.flvc.org/fcla/dt/15089
- Subject Headings
- Computer software--Development, Software maintenance, Source code (Computer science)
- Format
- Document (PDF)
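The study above groups 16 complexity metrics into four domains with principal component analysis and then predicts change-prone modules from those domains. A hypothetical sketch with synthetic metric data and scikit-learn; linear regression stands in for the regression and discriminant analyses in the original work.

# Reduce correlated complexity metrics to a few principal components
# ("domains") and use them to predict a change measure.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
n_modules = 300
base = rng.normal(size=(n_modules, 4))             # 4 latent "domains"
loadings = rng.normal(size=(4, 16))
metrics = base @ loadings + 0.1 * rng.normal(size=(n_modules, 16))
changes = base[:, 0] * 3 + base[:, 1] + rng.normal(scale=0.5, size=n_modules)

pca = PCA(n_components=4).fit(StandardScaler().fit_transform(metrics))
print("variance explained by 4 domains:",
      pca.explained_variance_ratio_.sum().round(3))

model = make_pipeline(StandardScaler(), PCA(n_components=4), LinearRegression())
model.fit(metrics, changes)
print("R^2 of change prediction from domains:",
      round(model.score(metrics, changes), 3))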
- Title
- An improved neural net-based approach for predicting software quality.
- Creator
- Guasti, Peter John., Florida Atlantic University, Khoshgoftaar, Taghi M., Pandya, Abhijit S.
- Abstract/Description
- Accurately predicting the quality of software is a major problem in any software development project. Software engineers develop models that provide early estimates of quality metrics which allow them to take action against emerging quality problems. Most often the predictive models are based upon multiple regression analysis which become unstable when certain data assumptions are not met. Since neural networks require no data assumptions, they are more appropriate for predicting software quality. This study proposes an improved neural network architecture that significantly outperforms multiple regression and other neural network attempts at modeling software quality. This is demonstrated by applying this approach to several large commercial software systems. After developing neural network models, we develop regression models on the same data. We find that the neural network models surpass the regression models in terms of predictive quality on the data sets considered.
- Date Issued
- 1995
- PURL
- http://purl.flvc.org/fcla/dt/15134
- Subject Headings
- Neural networks (Computer science), Computer software--Development, Computer software--Quality control, Software engineering
- Format
- Document (PDF)
- Title
- An empirical study of resource-based selection of rule-based software quality classification models.
- Creator
- Herzberg, Angela., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Software managers are under pressure to deliver reliable and high quality software, within a limited time and budget. To achieve this goal, they can be aided by different modeling techniques that allow them to predict the quality of software, so that the improvement efforts can be directed to software modules that are more likely to be fault-prone. Also, different projects have different resource availability constraints, and being able to select a model that is suitable for a specific resource constraint allows software managers to direct enhancement techniques more effectively and efficiently. In our study, we use Rule-Based Modeling (RBM) to predict the likelihood of a module being fault-prone and the Modified Expected Cost of Misclassification (MECM) measure to select the models that are suitable, in the context of the given resource constraints. This empirical study validates MECM as a measure to select an appropriate RBM model.
- Date Issued
- 2002
- PURL
- http://purl.flvc.org/fcla/dt/12968
- Subject Headings
- Software measurement, Computer software--Quality control
- Format
- Document (PDF)
- Title
- An empirical study of module order models.
- Creator
- Adipat, Boonlit., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Most software reliability approaches classify modules as fault-prone or not fault-prone by way of a predetermined threshold. However, it may not be practical to predefine a threshold because the amount of resources for reliability enhancement may be unknown. Therefore, a module-order model (MOM) predicting the rank order of modules can be used to solve this problem. The objective of this research is to make an empirical study of MOMs based on five different underlying quantitative software quality models. We examine the benefits of principal components analysis with MOM and demonstrate that better accuracy of underlying techniques does not always yield better performance with MOM. Three case studies of large industrial software systems were conducted. The results confirm that MOM can create efficient models using different underlying techniques that provide various accuracy when predicting a quantitative software quality factor over the data sets.
- Date Issued
- 2001
- PURL
- http://purl.flvc.org/fcla/dt/12783
- Subject Headings
- Computer software--Quality control, Software measurement
- Format
- Document (PDF)
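A module-order model, as described in the record above, ranks modules by a predicted quantitative quality factor instead of applying a fixed threshold. A minimal sketch assuming synthetic metric data, linear regression as the underlying quantitative model, and "faults captured in the top 20% of the ranking" as the evaluation; none of this reproduces the five underlying models or case studies from the thesis.

# Predict fault counts, rank modules by the prediction, and measure how
# much of the total fault content falls in the top of the ranking.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
X = rng.normal(size=(500, 8))                       # software metrics
faults = np.maximum(0, X[:, 0] * 4 + X[:, 1] * 2 +
                    rng.normal(size=500)).round()

X_tr, X_te, f_tr, f_te = train_test_split(X, faults, random_state=9)
pred = LinearRegression().fit(X_tr, f_tr).predict(X_te)

order = np.argsort(pred)[::-1]                      # predicted rank order
top = order[: int(0.2 * len(order))]                # top 20% of modules
captured = f_te[top].sum() / f_te.sum()
print(f"faults captured by enhancing the top 20% of modules: {captured:.1%}")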