Current Search: Khoshgoftaar, Taghi M.
- Title
- Modeling software quality with classification trees using principal components analysis.
- Creator
- Shan, Ruqun., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Software quality models often have raw software metrics as the input data for predicting quality. Raw metrics are usually highly correlated with one another and thus may result in unstable models. Principal components analysis is a statistical method to improve model stability. This thesis presents a series of studies on a very large legacy telecommunication system. The system has significantly more than ten million lines of code written in a high-level language similar to Pascal. Software quality models were developed to predict the class of each module as either fault-prone or not fault-prone. We found that the models based on principal components analysis were more robust than those based on raw metrics. We also found that software process metrics can significantly improve the predictive accuracy of software quality models.
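A minimal sketch (in Python with scikit-learn, not the thesis's actual pipeline) of the idea: project highly correlated raw metrics onto principal components before fitting a classification tree. The synthetic metrics and fault labels are illustrative.

```python
# Hedged sketch: decorrelate raw software metrics with PCA, then fit a tree.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))                   # hypothetical shared "size" factor
X = np.hstack([factor + 0.1 * rng.normal(size=(200, 1)) for _ in range(8)])  # 8 correlated metrics
y = (factor.ravel() + 0.5 * rng.normal(size=200) > 0).astype(int)            # fault-prone or not

model = make_pipeline(
    StandardScaler(),          # PCA expects centered (here also scaled) inputs
    PCA(n_components=0.95),    # keep components explaining 95% of the variance
    DecisionTreeClassifier(max_depth=3, random_state=0),
)
model.fit(X, y)
print(model.named_steps["pca"].n_components_, model.score(X, y))
```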
- Date Issued
- 1999
- PURL
- http://purl.flvc.org/fcla/dt/15714
- Subject Headings
- Principal components analysis, Computer software--Quality control, Software engineering
- Format
- Document (PDF)
- Title
- Modeling software quality with TREEDISC algorithm.
- Creator
- Yuan, Xiaojing, Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Software quality is crucial both to software makers and customers. However, in reality, improvement of quality and reduction of costs are often at odds. Software modeling can help us detect fault-prone software modules based on software metrics, so that we can focus our limited resources on fewer modules and lower the cost while still achieving high quality. In the present study, a tree classification modeling technique, TREEDISC, was applied to three case studies. Several major contributions were made. First, preprocessing of raw data was adopted to solve the computer memory problem and improve the models. Second, TREEDISC was thoroughly explored by examining the roles of important parameters in modeling. Third, a generalized classification rule was introduced to balance misclassification rates and decrease type II error, which is considered more costly than type I error. Fourth, certainty of classification was addressed. Fifth, TREEDISC modeling was validated over multiple releases of a software product.
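To make the generalized classification rule concrete, here is a hedged sketch: rather than a fixed 0.5 cutoff, a module is flagged fault-prone when its predicted odds of being fault-prone exceed a tunable constant c, trading more type I error for fewer costly type II errors. The threshold value is illustrative, not the thesis's fitted value.

```python
# Hedged sketch of a generalized classification rule with a tunable odds cutoff.
import numpy as np

def classify(p_fault_prone: np.ndarray, c: float = 0.3) -> np.ndarray:
    """Label 1 (fault-prone) when P(fp)/P(nfp) > c; a smaller c catches more faults."""
    odds = p_fault_prone / (1.0 - p_fault_prone)
    return (odds > c).astype(int)

p = np.array([0.10, 0.25, 0.40, 0.80])
print(classify(p, c=0.3))   # -> [0 1 1 1]: modules above the relaxed cutoff are flagged
```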
- Date Issued
- 1999
- PURL
- http://purl.flvc.org/fcla/dt/15718
- Subject Headings
- Computer software--Quality control, Computer simulation, Software engineering
- Format
- Document (PDF)
- Title
- Implementation of a three-group classification model using case-based reasoning.
- Creator
- Song, Huiming., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Reliability is becoming a very important and competitive factor for software products. Software quality models based on software metrics provide a systematic and scientific way to detect software faults early and to improve software reliability. Classification models for software quality usually classify observations into two groups. This thesis presents a new algorithm for classification using three groups, i.e., a Three-Group Classification Model using Case-Based Reasoning. The basic idea behind the algorithm is that it uses the commonly used two-group classification method three times. This algorithm can be implemented with other techniques such as logistic regression, classification tree models, etc. This work compares its quality with the Discriminant Analysis method. We find that our new method performs much better than Discriminant Analysis. We also show that the addition of object-oriented software measures yielded a model that a practitioner may actually prefer over the simpler procedural-measures model.
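The abstract does not spell out the exact decomposition, so the following is only one plausible reading of "two-group classification three times": train a binary classifier for each pair of the three groups and let the three learners vote. The data and learner are illustrative.

```python
# Hedged sketch: three groups via three pairwise two-group classifiers that vote.
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def fit_three_group(X, y, groups=(0, 1, 2)):
    models = {}
    for a, b in combinations(groups, 2):          # (0,1), (0,2), (1,2)
        mask = np.isin(y, [a, b])
        models[(a, b)] = LogisticRegression().fit(X[mask], y[mask])
    return models

def predict_three_group(models, X):
    votes = np.zeros((len(X), 3), dtype=int)
    for m in models.values():
        for i, p in enumerate(m.predict(X)):
            votes[i, p] += 1
    return votes.argmax(axis=1)                   # group with the most pairwise wins

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4)) + np.repeat(np.eye(3, 4) * 3, 100, axis=0)
y = np.repeat([0, 1, 2], 100)
models = fit_three_group(X, y)
print((predict_three_group(models, X) == y).mean())
```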
- Date Issued
- 2001
- PURL
- http://purl.flvc.org/fcla/dt/12816
- Subject Headings
- Software measurement, Computer software--Quality control
- Format
- Document (PDF)
- Title
- Partitioning filter approach to noise elimination: An empirical study in software quality classification.
- Creator
- Rebours, Pierre., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- This thesis presents two new noise filtering techniques which improve the quality of training datasets by removing noisy data. The training dataset is first split into subsets, and base learners are induced on each of these splits. The predictions are combined in such a way that an instance is identified as noisy if it is misclassified by a certain number of base learners. The Multiple-Partitioning Filter combines several classifiers on each split. The Iterative-Partitioning Filter only uses one base learner, but goes through multiple iterations. The amount of noise removed is varied by tuning the filtering level or the number of iterations. Empirical studies on a high assurance software project compare the effectiveness of our noise removal approaches with two other filters, the Cross-Validation Filter and the Ensemble Filter. Our studies suggest that using several base classifiers as well as performing several iterations with a conservative scheme may improve the efficiency of the filter.
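A hedged sketch of the partitioning-filter idea described above: partition the training data, induce one base learner per partition, and flag an instance as noisy when at least a chosen number of learners (the filtering level) misclassify it. The learner, fold count, and injected noise are illustrative.

```python
# Hedged sketch of a partitioning filter for noise elimination.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def partitioning_filter(X, y, n_splits=5, filtering_level=4):
    misclassified = np.zeros(len(y), dtype=int)
    for _, split_idx in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        learner = DecisionTreeClassifier(random_state=0).fit(X[split_idx], y[split_idx])
        misclassified += (learner.predict(X) != y).astype(int)   # each learner votes on ALL instances
    noisy = misclassified >= filtering_level       # conservative scheme: most learners must agree
    return X[~noisy], y[~noisy], noisy

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = (X[:, 0] > 0).astype(int)
y[:25] = 1 - y[:25]                                # inject 5% label noise
Xc, yc, noisy = partitioning_filter(X, y)
print(noisy.sum(), "instances removed")
```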
- Date Issued
- 2004
- PURL
- http://purl.flvc.org/fcla/dt/13110
- Subject Headings
- Software measurement, Computer software--Quality control, Decision trees, Recursive partitioning
- Format
- Document (PDF)
- Title
- Techniques for combining binary classifiers: A comparative study in network intrusion detection systems.
- Creator
- Lin, Hua., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- We discuss a set of indirect combining techniques for addressing multi-category classification problems; these techniques have been used in many domains, but not for intrusion detection systems. In contrast to the indirect combining techniques, direct techniques generally extend associated binary classifiers to handle multi-category classification problems. An indirect combining technique decomposes the original multi-category problem, based on some criteria, into multiple binary-category problems. We investigated two different approaches for building the binary classifiers. The results of the binary classifiers are then merged using a combining technique; three different combining techniques were studied. We implement some of the indirect combining techniques proposed in recent literature, and apply them to a case study of the DARPA KDD-1999 network intrusion detection project. The results demonstrate the usefulness of indirect combining techniques for the multi-category classification problem of intrusion detection systems.
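As an illustration of the indirect approach (a one-vs-rest decomposition with a most-confident-wins combiner, which is one of several possible schemes, not necessarily the thesis's), with synthetic data standing in for network traffic:

```python
# Hedged sketch: decompose a multi-category problem into one-vs-rest binary
# problems, then combine the binary outputs by taking the most confident class.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_one_vs_rest(X, y):
    classes = np.unique(y)
    return classes, [LogisticRegression().fit(X, (y == c).astype(int)) for c in classes]

def predict_one_vs_rest(classes, models, X):
    scores = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    return classes[scores.argmax(axis=1)]          # most confident binary classifier wins

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 6)) + np.repeat(np.eye(4, 6) * 2.5, 100, axis=0)
y = np.repeat(["normal", "dos", "probe", "r2l"], 100)   # illustrative traffic categories
classes, models = fit_one_vs_rest(X, y)
print((predict_one_vs_rest(classes, models, X) == y).mean())
```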
- Date Issued
- 2004
- PURL
- http://purl.flvc.org/fcla/dt/13111
- Subject Headings
- Computer networks--Security measures, Computer security, Computers--Access control, Electronic countermeasures, Fuzzy systems
- Format
- Document (PDF)
- Title
- An empirical study of analogy-based software fault prediction.
- Creator
- Sundaresh, Nandini., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Ensuring quality and reliability in software is important given its growing use in day-to-day life. Having an estimate of the number of faults in software modules early in their life cycles enables software project managers to direct testing efforts toward the modules considered risky and to reduce the waste of resources in testing the entire software system. Case-based reasoning, abbreviated CBR, is one method for predicting the number of faults in software. The scope of this thesis is two-fold. First, it empirically investigates the effects of different factors on the predictive accuracy of CBR. Experiments were done to compare different similarity functions, solution processes, and the maximum number of nearest neighbors. Second, it compares the predictive accuracy of CBR models with multiple linear regression and artificial neural network models. The average absolute error and average relative error are used to determine the model with the highest accuracy of prediction.
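A hedged sketch of the CBR ingredients named above: Euclidean distance as one possible similarity function, an unweighted average of the k nearest neighbors' fault counts as the solution process, and AAE/ARE as the accuracy measures. All data are synthetic.

```python
# Hedged sketch of analogy-based (CBR) fault prediction with AAE/ARE scoring.
import numpy as np

def cbr_predict(X_train, faults_train, X_new, k=3):
    preds = []
    for x in X_new:
        d = np.linalg.norm(X_train - x, axis=1)          # one choice of similarity function
        nearest = np.argsort(d)[:k]                      # k most similar past modules
        preds.append(faults_train[nearest].mean())       # unweighted-average solution process
    return np.array(preds)

def aae(actual, pred):
    return np.abs(actual - pred).mean()

def are(actual, pred):
    return (np.abs(actual - pred) / (actual + 1)).mean() # +1 guards against zero-fault modules

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
faults = np.maximum(0, 3 * X[:, 0] + rng.normal(size=100)).round()
pred = cbr_predict(X[:80], faults[:80], X[80:], k=3)
print(aae(faults[80:], pred), are(faults[80:], pred))
```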
- Date Issued
- 2001
- PURL
- http://purl.flvc.org/fcla/dt/12749
- Subject Headings
- Computer software--Quality control, Software measurement
- Format
- Document (PDF)
- Title
- Video and Image Analysis using Statistical and Machine Learning Techniques.
- Creator
- Luo, Qiming, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Digital videos and images are effective media for capturing spatial and temporal information in the real world. The rapid growth of digital videos has motivated research aimed at developing effective algorithms, with the objective of obtaining useful information for a variety of application areas, such as security, commerce, medicine, and geography. This dissertation presents innovative and practical techniques, based on statistics and machine learning, that address some key research problems in video and image analysis, including video stabilization, object classification, image segmentation, and video indexing. A novel unsupervised multi-scale color image segmentation algorithm is proposed. The basic idea is to apply mean shift clustering to obtain an over-segmentation, and then merge regions at multiple scales to minimize the MDL criterion. The performance on the Berkeley segmentation benchmark compares favorably with some existing approaches. This algorithm can also operate on one-dimensional feature vectors representing each frame in ocean survey videos, which results in a novel framework for building a hierarchical video index. The advantage is that the user gains the flexibility of browsing the videos at arbitrary levels of detail, making it more efficient to find interesting information in a long video. Also, an empirical study on classification of ships in surveillance videos is presented. A comparative performance study of three classification algorithms is conducted. Based on this study, an effective feature extraction and classification algorithm for classifying ships in coastline surveillance videos is proposed. Finally, an empirical study on video stabilization is presented, which includes a comparative performance study of four motion estimation methods and three motion correction methods. Based on this study, an effective real-time video stabilization algorithm for coastline surveillance is proposed, which involves a novel approach to reduce error accumulation.
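As a toy illustration of the first stage only (mean-shift clustering of per-pixel color and position features to over-segment an image; the MDL-driven multi-scale merging is omitted), with an invented two-region image and an illustrative bandwidth:

```python
# Hedged sketch: mean-shift over-segmentation of a tiny synthetic image.
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(5)
h, w = 40, 40
img = np.zeros((h, w, 3))
img[:, : w // 2] = [0.9, 0.1, 0.1]                  # toy two-region image
img[:, w // 2 :] = [0.1, 0.1, 0.9]
img += 0.05 * rng.normal(size=img.shape)

ys, xs = np.mgrid[0:h, 0:w]
feats = np.column_stack([img.reshape(-1, 3), xs.ravel() / w, ys.ravel() / h])
labels = MeanShift(bandwidth=0.4).fit(feats).labels_  # typically more segments than true regions
print("segments found:", len(np.unique(labels)))
```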
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012574
- Subject Headings
- Image processing--Digital techniques, Electronic surveillance, Computational learning theory
- Format
- Document (PDF)
- Title
- A comprehensive comparative study of multiple classification techniques for software quality estimation.
- Creator
- Puppala, Kishore., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Reliability and quality are desired features in industrial software applications. In some cases, they are absolutely essential. When faced with limited resources, software project managers need to allocate such resources to the most fault-prone areas. The ability to accurately classify a software module as fault-prone or not fault-prone enables the manager to make an informed resource allocation decision. An accurate quality classification avoids wasting resources on modules that are not fault-prone. It also avoids missing the opportunity to correct faults relatively early in the development cycle, when they are less costly. This thesis introduces the classification algorithms (classifiers) implemented in the WEKA software tool. WEKA (Waikato Environment for Knowledge Analysis) was developed at the University of Waikato in New Zealand. An empirical investigation is performed using a case study of a real-world system.
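A hedged sketch of the comparative setup, using scikit-learn analogues of common WEKA classifiers (a J48-like tree, an IBk-like kNN, NaiveBayes) under 10-fold cross-validation; the dataset is synthetic and the classifier list is illustrative:

```python
# Hedged sketch: compare several classifiers with 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=0)
classifiers = {
    "J48-like tree": DecisionTreeClassifier(random_state=0),
    "IBk-like kNN": KNeighborsClassifier(n_neighbors=5),
    "NaiveBayes": GaussianNB(),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10)        # WEKA's default 10-fold CV
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```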
- Date Issued
- 2003
- PURL
- http://purl.flvc.org/fcla/dt/13039
- Subject Headings
- Software engineering, Computer software--Quality control, Decision trees
- Format
- Document (PDF)
- Title
- A comparative study of classification algorithms for network intrusion detection.
- Creator
- Wang, Yunling., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- As network-based computer systems play increasingly vital roles in modern society, they have become targets for criminals. Network security has never been a more important subject than in today's extensively interconnected computer world. Intrusion Detection Systems (IDS) have been used along with data mining techniques to detect intrusions. In this thesis, we present a comparative study of intrusion detection using a decision-tree learner (C4.5), two rule-based learners (Ripper and Ridor), a learner that combines decision trees and rules (PART), and two instance-based learners (IBk and NNge). We investigate and compare the performance of IDSs based on the six techniques, with respect to a case study of the DARPA KDD-1999 network intrusion detection project. The results demonstrate that data mining techniques are very useful in the area of intrusion detection.
- Date Issued
- 2004
- PURL
- http://purl.flvc.org/fcla/dt/13102
- Subject Headings
- Computer networks--Security measures, Data mining, Decision trees
- Format
- Document (PDF)
- Title
- Fuzzy logic techniques for software reliability engineering.
- Creator
- Xu, Zhiwei., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Modern people are becoming more and more dependent on computers in their daily lives. Most industries, from automobile, avionics, oil, and telecommunications to banking, stocks, and pharmaceuticals, require computers to function. As the tasks required become more complex, the complexity of computer software and hardware has increased dramatically, and as a consequence the possibility of failure increases. As the requirements for and dependence on computers increase, so does the possibility of crises caused by computer failures. High reliability is an important attribute for almost any software system. Consequently, software developers are seeking ways to forecast and improve quality before release. Since many quality factors cannot be measured until after the software becomes operational, software quality models are developed to predict quality factors based on measurements collected earlier in the life cycle. Due to incomplete information in the early life cycle of software development, software quality models with fuzzy characteristics usually perform better, because fuzzy concepts deal with phenomena that are vague in nature. This study focuses on the use of fuzzy logic in software reliability engineering. The discussion includes fuzzy expert systems and their application to early risk assessment; interval prediction using fuzzy regression modeling; fuzzy rule extraction for fuzzy classification and its use in software quality models; and fuzzy identification, including the extraction of both rules and membership functions from fuzzy data, applied to software project cost estimation. The following methodologies were considered: nonparametric discriminant analysis, Z-test and paired t-test, neural networks, fuzzy linear regression, fuzzy nonlinear regression, fuzzy classification with the maximum matched method, fuzzy identification with fuzzy clustering, and fuzzy projection. Commercial software systems and the COCOMO database are used throughout this dissertation to demonstrate the usefulness of the concepts and to validate new ideas.
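As a toy illustration of the fuzzy building blocks mentioned above: triangular membership functions over invented software measures and a single fuzzy rule evaluated with the min (AND) operator. The shapes and cutoffs are illustrative, not the dissertation's fitted functions.

```python
# Hedged sketch of fuzzy membership functions and one fuzzy risk rule.
def triangular(x, a, b, c):
    """Membership rising from a to a peak at b, then falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def risk_assessment(complexity, churn):
    high_complexity = triangular(complexity, 0.4, 0.8, 1.2)
    high_churn = triangular(churn, 0.3, 0.7, 1.1)
    # Rule: IF complexity is high AND churn is high THEN module is high-risk
    return min(high_complexity, high_churn)           # min implements fuzzy AND

print(risk_assessment(complexity=0.75, churn=0.65))   # degree of "high risk" in [0, 1]
```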
- Date Issued
- 2001
- PURL
- http://purl.flvc.org/fcla/dt/11948
- Subject Headings
- Software engineering, Fuzzy logic, Computer software--Quality control, Fuzzy systems
- Format
- Document (PDF)
- Title
- Count models for software quality estimation.
- Creator
- Gao, Kehan, Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- The primary aim of software engineering is to produce quality software that is delivered on time, within budget, and fulfils all its requirements. A timely estimation of software quality can serve as a prerequisite in achieving high reliability of software-based systems. More specifically, software quality assurance efforts can be prioritized for targeting program modules that are most likely to have a high number of faults. Software quality estimation models are generally of two types: a classification model that predicts the class membership of modules into two or more quality-based classes, and a quantitative prediction model that estimates the number of faults (or some other software quality factor) that are likely to occur in software modules. In the literature, a variety of techniques have been developed for software quality estimation, most of which are suited for either prediction or classification but not for both, e.g., multiple linear regression (only for prediction) and logistic regression (only for classification).
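A hedged sketch of a count model serving both roles at once: a Poisson regression predicts the number of faults per module, and thresholding the predicted count yields a fault-prone / not fault-prone classification. The data and the count threshold are illustrative.

```python
# Hedged sketch: one count model for both fault prediction and classification.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 4))
faults = rng.poisson(np.exp(0.8 * X[:, 0] + 0.3 * X[:, 1]))   # synthetic fault counts

model = PoissonRegressor(alpha=1e-3).fit(X, faults)
predicted_counts = model.predict(X)                 # quantitative prediction
fault_prone = predicted_counts >= 2                 # classification via a count threshold
print(predicted_counts[:5].round(2), fault_prone[:5])
```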
- Date Issued
- 2003
- PURL
- http://purl.flvc.org/fcla/dt/12042
- Subject Headings
- Computer software--Quality control, Software engineering, Econometrics, Regression analysis
- Format
- Document (PDF)
- Title
- MACHINE LEARNING ALGORITHMS FOR THE DETECTION AND ANALYSIS OF WEB ATTACKS.
- Creator
- Zuech, Richard, Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
- The Internet has provided humanity with many great benefits, but it has also introduced new risks and dangers. E-commerce and other web portals have become large industries with big data. Criminals and other bad actors constantly seek to exploit these web properties through web attacks. Being able to properly detect these web attacks is a crucial component in the overall cybersecurity landscape. Machine learning is one tool that can assist in detecting web attacks. However, properly using machine learning to detect web attacks does not come without its challenges. Classification algorithms can have difficulty with severe levels of class imbalance. Class imbalance occurs when one class label disproportionately outnumbers another class label. For example, in cybersecurity, it is common for the negative (normal) label to severely outnumber the positive (attack) label. Another difficulty encountered in machine learning is that models can be complex, thus making it difficult for even subject matter experts to truly understand a model's detection process. Moreover, it is important for practitioners to determine which input features to include or exclude in their models for optimal detection performance. This dissertation studies machine learning algorithms in detecting web attacks with big data. Severe class imbalance is a common problem in cybersecurity, and mainstream machine learning research does not sufficiently consider this with web attacks. Our research first investigates the problems associated with severe class imbalance and rarity. Rarity is an extreme form of class imbalance where the positive class suffers from an extremely low instance count, thus making it difficult for the classifiers to discriminate. In reducing imbalance, we demonstrate random undersampling can effectively mitigate the class imbalance and rarity problems associated with web attacks. Furthermore, our research introduces a novel feature popularity technique which produces easier to understand models by only including the fewer, most popular features. Feature popularity granted us new insights into the web attack detection process, even though we had already intensely studied it. Even so, we proceed cautiously in selecting the best input features, as we determined that the "most important" Destination Port feature might be contaminated by lopsided traffic distributions.
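A minimal sketch of the random undersampling step described above: randomly discard majority-class (normal) instances until a target ratio with the minority (attack) class is reached. The 1% positive rate and 1:1 target ratio are illustrative, not the dissertation's settings.

```python
# Hedged sketch of random undersampling for severe class imbalance.
import numpy as np

def random_undersample(X, y, majority=0, ratio=1.0, seed=0):
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority)
    min_idx = np.flatnonzero(y != majority)
    keep_maj = rng.choice(maj_idx, size=int(ratio * len(min_idx)), replace=False)
    keep = np.concatenate([keep_maj, min_idx])        # all minority, sampled majority
    return X[keep], y[keep]

rng = np.random.default_rng(7)
X = rng.normal(size=(10_000, 3))
y = (rng.random(10_000) < 0.01).astype(int)           # ~1% positive (attack) class
Xb, yb = random_undersample(X, y, ratio=1.0)
print("before:", np.bincount(y), "after:", np.bincount(yb))
```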
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013823
- Subject Headings
- Machine learning, Computer security, Algorithms, Cybersecurity
- Format
- Document (PDF)
- Title
- COLLECTION AND ANALYSIS OF SLOW DENIAL OF SERVICE ATTACKS USING MACHINE LEARNING ALGORITHMS.
- Creator
- Kemp, Clifford, Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
- The application layer is becoming a more desirable target for hackers attacking computer networks. From complex rootkits to Denial of Service (DoS) attacks, hackers look to compromise computer networks. Web and application servers can be shut down by various application-layer DoS attacks, which exhaust CPU or memory resources. The HTTP protocol has become a popular target for launching application-layer DoS attacks. These exploits consume less bandwidth than traditional DoS attacks. Furthermore, this type of DoS attack is hard to detect because its network traffic resembles legitimate network requests. Being able to detect these DoS attacks effectively is a critical component of any robust cybersecurity system. Machine learning can help detect DoS attacks by identifying patterns in network traffic. With machine learning methods, predictive models can automatically detect network threats. This dissertation offers a novel framework for collecting several attack datasets on a live production network, where producing quality representative data is a requirement. Our approach builds datasets from collected Netflow and Full Packet Capture (FPC) data. We evaluate a wide range of machine learning classifiers, which allows us to analyze slow DoS detection models more thoroughly. To identify attacks, we look at each dataset's unique traffic patterns and distinguishing properties. This research evaluates and investigates appropriate feature selection evaluators and search strategies. Features are assessed for their predictive value and degree of redundancy to build a subset of features. Feature subsets with high class correlation but low intercorrelation are favored. Experimental results indicate Netflow and FPC features are discriminating enough to detect DoS attacks accurately. We conduct a comparative examination of performance metrics to determine the capability of several machine learning classifiers. Additionally, we improve upon our performance scores by investigating a variety of feature selection optimization strategies. Overall, this dissertation proposes a novel machine learning approach for detecting slow DoS attacks. Our machine learning results demonstrate that a single subset of features trained on Netflow data can effectively detect slow application-layer DoS attacks.
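A hedged sketch of the selection heuristic described above, in the style of a CFS merit (favor subsets whose features correlate with the class but not with each other); the synthetic features and the exact merit formula are an illustration, not necessarily the dissertation's evaluator.

```python
# Hedged sketch of a CFS-style subset merit: k*r_cf / sqrt(k + k*(k-1)*r_ff).
import numpy as np

def subset_merit(X, y, subset):
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])   # feature-class
    if len(subset) == 1:
        r_ff = 0.0
    else:
        pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    k = len(subset)
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)  # redundancy penalizes the merit

rng = np.random.default_rng(8)
y = rng.integers(0, 2, 500).astype(float)
X = np.column_stack([y + rng.normal(scale=s, size=500) for s in (0.5, 0.6, 3.0)])
print(subset_merit(X, y, [0]), subset_merit(X, y, [0, 1]), subset_merit(X, y, [0, 2]))
```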
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013848
- Subject Headings
- Machine learning, Algorithms, Denial of service attacks
- Format
- Document (PDF)
- Title
- Software reliability engineering with genetic programming.
- Creator
- Liu, Yi., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Software reliability engineering plays a vital role in managing and controlling software quality. As an important method of software reliability engineering, software quality estimation modeling is useful in defining a cost-effective strategy to achieve a reliable software system. By predicting the faults in a software system, software quality models can identify high-risk modules, and thus these high-risk modules can be targeted for reliability enhancements. Strictly speaking, software quality modeling not only aims at lowering the misclassification rate, but also takes into account the costs of different misclassifications and the available resources of a project. As a new search-based algorithm, Genetic Programming (GP) can build a model without assuming the size, shape, or structure of a model. It can flexibly tailor the fitness functions to the objectives chosen by the customers. Moreover, it can optimize several objectives simultaneously in the modeling process, and thus a set of multi-objective optimization solutions can be obtained. This research focuses on building software quality estimation models using GP. Several GP-based models for predicting the class membership of each software module and ranking the modules by a quality factor were proposed. The first model, categorizing the modules into fault-prone or not fault-prone, was proposed by considering the distinguished features of the software quality classification task and GP. The second model provided quality-based ranking information for fault-prone modules. A decision tree-based software classification model was also proposed by considering accuracy and simplicity simultaneously. This new technique provides a new multi-objective optimization algorithm to build decision trees for real-world engineering problems, in which several trade-off objectives usually have to be taken into account at the same time. The fourth model was built to find multi-objective optimization solutions by considering both the expected cost of misclassification and available resources. Also, a new goal-oriented technique of building module-order models was proposed by directly optimizing several goals chosen by project analysts. The GP issues of bloat and overfitting were also addressed in our research. Data were collected from three industrial projects and applied to validate the performance of the models. Results indicate that our proposed methods achieve useful performance. Moreover, some proposed methods can simultaneously optimize several different objectives of a software project management team.
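As a small illustration of the kind of tailored, multi-objective fitness the abstract describes (expected misclassification cost with type II errors costlier than type I, plus a resource constraint on how many modules can be flagged); the cost values and resource cap are invented for the example:

```python
# Hedged sketch: a two-objective fitness for a candidate GP quality model.
import numpy as np

def fitness(pred, actual, c_type1=1.0, c_type2=10.0, resource_cap=0.2):
    type1 = np.mean((pred == 1) & (actual == 0))         # false alarms
    type2 = np.mean((pred == 0) & (actual == 1))         # missed fault-prone modules
    cost = c_type1 * type1 + c_type2 * type2             # expected misclassification cost
    over_budget = max(0.0, pred.mean() - resource_cap)   # flagged modules beyond capacity
    return cost, over_budget                             # two objectives to minimize jointly

actual = np.array([0, 0, 0, 0, 1, 1])
pred = np.array([0, 1, 0, 0, 1, 0])
print(fitness(pred, actual))
```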
- Date Issued
- 2003
- PURL
- http://purl.flvc.org/fau/fd/FADT12047
- Subject Headings
- Computer software--Quality control, Genetic programming (Computer science), Software engineering
- Format
- Document (PDF)
- Title
- Software quality classification using rule-based modeling.
- Creator
- Mao, Meihui., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Software-based products are part of our daily life. They can be encountered in most of the systems we interact with. This reliance on software products generates a strong need for better software reliability, reducing the cost associated with potential failures. Reliability in software systems may be achieved by using additional testing. However, extensive software testing is expensive and time consuming. Software quality classification models provide an early prediction of a module's quality. Boolean Discriminant Function (BDF), Generalized Boolean Discriminant Function (GBDF), and Rule-Based Modeling (RBM) can be used as classification models. This thesis demonstrates the ability of GBDF and RBM to correctly classify modules. The introduction of the AND operator in the GBDF model and the customizable outcomes for the rules in RBM enhanced the discriminating quality of GBDF and RBM as compared to BDF. Furthermore, they also yielded a better balance between the misclassification rates.
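A toy sketch of the two discriminant forms: a BDF flags a module fault-prone when any single metric crosses its threshold (pure OR), while a GBDF-style function can also AND conditions together, which tends to cut false alarms. The metrics and thresholds are illustrative, not the thesis's fitted values.

```python
# Hedged sketch of BDF (OR-only) versus GBDF (OR plus an AND term).
import numpy as np

def bdf(loc, churn, complexity):
    return (loc > 500) | (churn > 50) | (complexity > 20)          # OR of threshold conditions

def gbdf(loc, churn, complexity):
    return (loc > 500) | ((churn > 50) & (complexity > 20))        # AND term tightens the rule

loc = np.array([120, 800, 300])
churn = np.array([60, 10, 55])
cx = np.array([5, 5, 30])
print(bdf(loc, churn, cx))    # -> [ True  True  True]
print(gbdf(loc, churn, cx))   # -> [False  True  True]: first module no longer flagged
```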
- Date Issued
- 2002
- PURL
- http://purl.flvc.org/fcla/dt/12886
- Subject Headings
- Computer software--Quality control, Software measurement
- Format
- Document (PDF)
- Title
- Three-group software quality classification modeling with TREEDISC algorithm.
- Creator
- Liu, Yongbin., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Maintaining superior quality and reliability of software systems is important nowadays. Software quality modeling detects fault-prone modules and enables us to achieve high quality in a software system by focusing on fewer modules, given limited resources and budget. Tree-based modeling is a simple and effective method that predicts fault proneness in software systems. In this thesis, we introduce the TREEDISC modeling technique with a three-group classification rule to predict the quality of software modules. A general classification rule is applied and validated. The three impact parameters, group number, minimum leaf size, and significance level, are thoroughly evaluated. An optimization procedure is conducted and empirical results are presented. Conclusions about the impact factors as well as the robustness of our research are drawn. The TREEDISC modeling technique with three-group classification has proved to be an efficient and convincing method for software quality control.
- Date Issued
- 2003
- PURL
- http://purl.flvc.org/fcla/dt/13008
- Subject Headings
- Computer software--Quality control, Software measurement, Decision trees
- Format
- Document (PDF)
- Title
- MACHINE LEARNING ALGORITHMS FOR PREDICTING BOTNET ATTACKS IN IOT NETWORKS.
- Creator
- Leevy, Joffrey, Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
- The proliferation of Internet of Things (IoT) devices in various networks is being matched by an increase in related cybersecurity risks. To help counter these risks, big datasets such as Bot-IoT were designed to train machine learning algorithms on network-based intrusion detection for IoT devices. From a binary classification perspective, there is a high class imbalance in Bot-IoT between each of the attack categories and the normal category, and also between the combined attack categories and the normal category. Within the scope of predicting botnet attacks in IoT networks, this dissertation demonstrates the usefulness and efficiency of novel machine learning methods, such as an easy-to-classify method and a unique set of ensemble feature selection techniques. The focus of this work is on the full Bot-IoT dataset, as well as each of the four attack categories of Bot-IoT, namely Denial-of-Service (DoS), Distributed Denial-of-Service (DDoS), Reconnaissance, and Information Theft. Since resources and services become inaccessible during DoS and DDoS attacks, this interruption is costly to an organization in terms of both time and money. Reconnaissance attacks often signify the first stage of a cyberattack, and preventing them from occurring usually means the end of the intended cyberattack. Information Theft attacks not only erode consumer confidence but may also compromise intellectual property and national security. For the DoS experiment, the ensemble feature selection approach led to the best performance, while for the DDoS experiment, the full set of Bot-IoT features resulted in the best performance. Regarding the Reconnaissance experiment, the ensemble feature selection approach yielded the best performance. In relation to the Information Theft experiment, the ensemble feature selection techniques did not affect performance, positively or negatively. However, the ensemble feature selection approach is recommended for this experiment because feature reduction eases the computational burden and may provide clarity through improved data visualization. For the full Bot-IoT big dataset, an explainable machine learning approach was taken using the Decision Tree classifier. An easy-to-learn Decision Tree model for predicting attacks was obtained with only three features, which is a significant result for big data.
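A hedged sketch of one ensemble feature selection technique: aggregate rankings from several different scorers and keep the features that rank highly across the ensemble. The scorers and the cutoff are illustrative, not necessarily those used in the dissertation.

```python
# Hedged sketch: ensemble feature selection via rank aggregation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

X, y = make_classification(n_samples=600, n_features=12, n_informative=4, random_state=0)
Xpos = X - X.min(axis=0)                              # chi2 requires non-negative inputs

scores = [
    f_classif(X, y)[0],
    mutual_info_classif(X, y, random_state=0),
    chi2(Xpos, y)[0],
    RandomForestClassifier(random_state=0).fit(X, y).feature_importances_,
]
ranks = np.mean([np.argsort(np.argsort(-s)) for s in scores], axis=0)  # average rank, 0 = best
selected = np.argsort(ranks)[:4]                      # keep the 4 most popular features
print("selected features:", sorted(selected.tolist()))
```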
- Date Issued
- 2022
- PURL
- http://purl.flvc.org/fau/fd/FA00013933
- Subject Headings
- Machine learning, Internet of things--Security measures, Big data, Intrusion detection systems (Computer security)
- Format
- Document (PDF)
- Title
- A COMPARATIVE STUDY OF STRUCTURED VERSUS UNSTRUCTURED TEXT DATA.
- Creator
- Cardenas, Erika, Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
- In today’s world, data is generated at an unprecedented rate, and a significant portion of it is unstructured text data. The recent advancements in Natural Language Processing have enabled computers to understand and interpret human language. Data mining techniques were once unable to use text data due to the high dimensionality of text processing models. This limitation was overcome with the ability to represent data as text. This thesis aims to compare the predictive performance of structured versus unstructured text data in two different applications. The first application is in the field of real estate. We compare the performance of tabular real-estate data and unstructured text descriptions of homes to predict the house price. The second application is in translating Electronic Health Records (EHR) tabular data to text data for survival classification of COVID-19 patients. Lastly, we present a range of strategies and perspectives for future research.
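A minimal sketch of the tabular-to-text translation step: serialize one EHR-like row into a sentence a language model can consume. The field names and template are invented for illustration, not the thesis's schema.

```python
# Hedged sketch: translating a tabular record into unstructured text.
record = {"age": 67, "sex": "female", "spo2": 91, "comorbidities": "diabetes, hypertension"}

def row_to_text(r: dict) -> str:
    # One illustrative template; a real pipeline would cover every column.
    return (f"The patient is a {r['age']}-year-old {r['sex']} with an oxygen "
            f"saturation of {r['spo2']}% and a history of {r['comorbidities']}.")

print(row_to_text(record))
# -> The patient is a 67-year-old female with an oxygen saturation of 91%
#    and a history of diabetes, hypertension.
```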
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014220
- Subject Headings
- Natural language processing (Computer science), Text data mining
- Format
- Document (PDF)
- Title
- DATA AUGMENTATION IN DEEP LEARNING.
- Creator
- Shorten, Connor, Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
- Recent successes of Deep Learning-powered AI are largely due to the trio of: algorithms, GPU computing, and big data. Data could take the shape of hospital records, satellite images, or the text in this paragraph. Deep Learning algorithms typically need massive collections of data before they can make reliable predictions. This limitation inspired investigation into a class of techniques referred to as Data Augmentation. Data Augmentation was originally developed as a set of label-preserving transformations used in order to simulate large datasets from small ones. For example, imagine developing a classifier that categorizes images as either a “cat” or a “dog”. After initial collection and labeling, there may only be 500 of these images, which are not enough data points to train a Deep Learning model. By transforming these images with Data Augmentations such as rotations and brightness modifications, more labeled images are available for model training and classification! In addition to applications for learning from limited labeled data, Data Augmentation can also be used for generalization testing. For example, we can augment the test set to set the visual style of images to “winter” and see how that impacts the performance of a stop sign detector. The dissertation begins with an overview of Deep Learning methods such as neural network architectures, gradient descent optimization, and generalization testing. Following an initial description of this technology, the dissertation explains overfitting. Overfitting is the crux of Deep Learning methods in which improvements to the training set do not lead to improvements on the testing set. To the rescue are Data Augmentation techniques; the dissertation presents an overview of the augmentations used for both image and text data, as well as the promising potential of generative data augmentation with models such as ChatGPT. The dissertation then describes three major experimental works revolving around CIFAR-10 image classification, language modeling a novel dataset of Keras information, and patient survival classification from COVID-19 Electronic Health Records. The dissertation concludes with a reflection on the evolution of limitations of Deep Learning and directions for future work.
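A minimal sketch of label-preserving augmentation on an image stored as a NumPy array (horizontal flip, small rotation, brightness jitter); the parameter ranges are illustrative:

```python
# Hedged sketch: label-preserving image augmentations generating variants
# from a single original, as in the "cat"/"dog" example above.
import numpy as np
from scipy.ndimage import rotate

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]                                  # horizontal flip
    angle = rng.uniform(-15, 15)
    out = rotate(out, angle, reshape=False, mode="nearest") # small rotation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)    # brightness jitter
    return out

rng = np.random.default_rng(9)
image = rng.random((32, 32, 3))                             # stand-in for one labeled image
batch = np.stack([augment(image, rng) for _ in range(8)])   # 8 variants from 1 original
print(batch.shape)
```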
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014228
- Subject Headings
- Deep learning (Machine learning), Artificial intelligence, Data augmentation
- Format
- Document (PDF)
- Title
- An Exploration into Synthetic Data and Generative Adversarial Networks.
- Creator
- Shorten, Connor M., Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- This thesis surveys the landscape of Data Augmentation for image datasets. Completing this survey inspired further study into a method of generative modeling known as Generative Adversarial Networks (GANs). A survey on GANs was conducted to understand recent developments and the problems related to training them. Following this survey, four experiments were proposed to test the application of GANs for data augmentation and to contribute to quality improvement in GAN-generated data. Experimental results demonstrate the effectiveness of GAN-generated data as a pre-training metric. The other experiments discuss important characteristics of GAN models such as the refining of prior information, transferring generative models from large datasets to small data, and automating the design of Deep Neural Networks within the context of the GAN framework. This thesis will provide readers with a complete introduction to Data Augmentation and Generative Adversarial Networks, as well as insights into the future of these techniques.
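A compact, hedged GAN sketch in PyTorch: a generator learns to mimic a 1-D Gaussian while a discriminator learns to separate real from generated samples. The architecture and hyperparameters are illustrative and far simpler than the image GANs surveyed in the thesis.

```python
# Hedged sketch of the adversarial training loop at the heart of a GAN.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # noise -> sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0            # target distribution N(3, 0.5)
    fake = G(torch.randn(64, 8))

    # Discriminator step: push real toward 1, fake toward 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: fool D into labeling fakes as real
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

samples = G(torch.randn(1000, 8)).detach()
print(samples.mean().item(), samples.std().item())   # should drift toward 3.0 and 0.5
```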
- Date Issued
- 2019
- PURL
- http://purl.flvc.org/fau/fd/FA00013263
- Subject Headings
- Neural networks (Computer science), Computer vision, Images, Generative adversarial networks, Data sets
- Format
- Document (PDF)