Current Search: Khoshgoftaar, Taghi M.
- Title
- Ensemble-classifier approach to noise elimination: A case study in software quality classification.
- Creator
- Joshi, Vedang H., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
This thesis presents a noise handling technique that attempts to improve the quality of training data for classification purposes by eliminating instances that are likely to be noise. Our approach uses twenty-five different classification techniques to create an ensemble of classifiers that acts as a noise filter on real-world software measurement datasets. Using a relatively large number of base-level classifiers for the ensemble-classifier filter helps achieve the desired level of noise removal conservativeness, with several possible levels of filtering. It also provides a higher degree of confidence in the noise elimination procedure, because the results are less likely to be influenced by the (possibly) inappropriate learning bias of a few algorithms when twenty-five base-level classifiers are used rather than a relatively small number. Empirical case studies of two different high-assurance software projects demonstrate the effectiveness of our noise elimination approach through the significant improvement achieved in classification accuracies at various levels of filtering. (An illustrative sketch of the consensus-filtering idea follows this record.)
- Date Issued
- 2004
- PURL
- http://purl.flvc.org/fcla/dt/13144
- Subject Headings
- Computer interfaces--Software--Quality control, Acoustical engineering, Noise control--Case studies, Expert systems (Computer science), Software documentation
- Format
- Document (PDF)
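The abstract above describes consensus filtering with twenty-five base-level classifiers. Below is a minimal sketch of the same idea with only four scikit-learn learners; the function name, the choice of learners, the use of out-of-fold predictions, and the voting threshold are illustrative assumptions, not the configuration used in the thesis.

```python
# Minimal sketch of ensemble-based noise filtering (not the thesis's 25-classifier setup):
# an instance is flagged as noisy when at least `threshold` of the base classifiers,
# each evaluated out-of-fold, disagree with its recorded label.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def ensemble_noise_filter(X, y, threshold=3):
    # X and y are assumed to be numpy arrays (modules x metrics, and class labels).
    base_learners = [
        DecisionTreeClassifier(random_state=0),
        GaussianNB(),
        KNeighborsClassifier(n_neighbors=5),
        LogisticRegression(max_iter=1000),
    ]
    votes = np.zeros(len(y), dtype=int)
    for clf in base_learners:
        # Out-of-fold predictions so no classifier votes on instances it was trained on.
        preds = cross_val_predict(clf, X, y, cv=5)
        votes += (preds != y).astype(int)
    noisy = votes >= threshold          # conservative: most learners must disagree
    return X[~noisy], y[~noisy], noisy
```

Raising the threshold toward the number of learners corresponds to the more conservative (consensus) filtering level mentioned in the abstract; lowering it toward a simple majority filters more aggressively.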
- Title
- Improved models of software quality.
- Creator
- Szabo, Robert Michael., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Though software development has been evolving for over 50 years, the development of computer software systems has largely remained an art. Through the application of measurable and repeatable processes, efforts have been made to slowly transform the software development art into a rigorous engineering discipline. The potential gains are tremendous. Computer software pervades modern society in many forms. For example, the automobile, radio, television, telephone, refrigerator, and still camera have all been transformed by the introduction of computer-based controls. The quality of these everyday products is in part determined by the quality of the computer software running inside them. Therefore, the timely delivery of low-cost and high-quality software to enable these mass-market products becomes very important to the long-term success of the companies building them. It is not surprising that managing the number of faults in computer software to competitive levels is a prime focus of the software engineering activity. In support of this activity, many models of software quality have been developed to help control the software development process and ensure that our goals of cost and quality are met on time. In this study, we focus on the software quality modeling activity. We improve existing static and dynamic methodologies and demonstrate new ones in a coordinated attempt to provide engineering methods applicable to the development of computer software. We will show how the power of separate predictive and classification models of software quality may be combined into one model; introduce a three-group fault classification model in the object-oriented paradigm; demonstrate a dynamic modeling methodology of the testing process and show how software product measures and software process measures may be incorporated as input to such a model; and demonstrate a relationship between software product measures and the testability of software. The following methodologies were considered: principal components analysis, multiple regression analysis, Poisson regression analysis, discriminant analysis, time series analysis, and neural networks. Commercial-grade software systems are used throughout this dissertation to demonstrate concepts and validate new ideas. As a result, we hope to incrementally advance the state of the software engineering "art". (A brief sketch combining principal components with Poisson regression follows this record.)
- Date Issued
- 1995
- PURL
- http://purl.flvc.org/fcla/dt/12409
- Subject Headings
- Software engineering--Standards, Software engineering--Management, Computer software--Development, Computer software--Quality control
- Format
- Document (PDF)
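As a rough illustration of two of the methodologies listed in the abstract above (principal components analysis and Poisson regression), the sketch below regresses module fault counts on principal components of the raw metrics. The helper name, the number of components, and the use of scikit-learn and statsmodels are assumptions made for illustration only.

```python
# Sketch: principal components of software product metrics as predictors in a
# Poisson regression model of module fault counts.
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def fit_fault_count_model(metrics, faults, n_components=3):
    # Standardize the raw metrics, then reduce collinearity with PCA.
    Z = StandardScaler().fit_transform(metrics)
    components = PCA(n_components=n_components).fit_transform(Z)
    X = sm.add_constant(components)
    # Poisson GLM: appropriate when the response is a count of faults per module.
    return sm.GLM(faults, X, family=sm.families.Poisson()).fit()
```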
- Title
- Performance analysis of a new object-based I/O architecture for PCs and workstations.
- Creator
- Huynh, Khoa Dang., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
In this dissertation, an object-based I/O architecture for personal computers (PCs) and workstations is proposed. The proposed architecture allows the flexibility of having I/O processing performed as much as possible by intelligent I/O adapters, or by the host processor, or by any processor in the system, depending on application requirements and underlying hardware capabilities. It keeps many good features of current I/O architectures, while providing more flexibility to take advantage of new hardware technologies, promote architectural openness, and provide better performance and higher reliability. The proposed architecture introduces a new definition of I/O subsystems and makes use of concurrent object-oriented technology. It combines the notions of object and thread into something called an active object. All concurrency abstractions required by the proposed architecture are provided through external libraries on top of existing sequential object-oriented languages, without any changes to the syntax and semantics of these languages. We also evaluate the performance of optimal implementations of the proposed I/O architecture against other I/O architectures in three popular PC-based distributed environments: network file server, video server, and video conferencing. Using the RESearch Queueing Modeling Environment (RESQME), we have developed detailed simulation models for various implementations of the proposed I/O architecture and two other existing I/O architectures: a conventional, interrupt-based I/O architecture and a peer-to-peer I/O architecture. Our simulation results indicate that, on several different hardware platforms, the proposed I/O architecture outperforms both existing architectures in all three distributed environments considered.
- Date Issued
- 1994
- PURL
- http://purl.flvc.org/fcla/dt/12386
- Subject Headings
- Local area networks (Computer networks), Computer input-output equipment, Computer networks, Videoconferencing, Client/server computing, Object-oriented programming (Computer science)
- Format
- Document (PDF)
- Title
- Prediction of software quality using classification tree modeling.
- Creator
- Naik, Archana B., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Reliability of software systems is one of the major concerns in today's world, as computers have become an integral part of our lives. Society has become so dependent on reliable software systems that failures can be dangerous in terms of worsening a company's business, damaging human relationships, or affecting human lives. Software quality models are tools for focusing efforts to find faults early in development. In this experiment, we used classification tree modeling techniques to predict software quality by classifying program modules as either fault-prone or not fault-prone. We introduced the Classification And Regression Trees (CART) algorithm as a tool to generate classification trees. We focused our experiments on a very large telecommunications system, building quality models with a set of product and process metrics as independent variables. (An illustrative classification tree sketch follows this record.)
- Date Issued
- 1998
- PURL
- http://purl.flvc.org/fcla/dt/15600
- Subject Headings
- Computer software--Quality control, Computer software--Evaluation, Software measurement
- Format
- Document (PDF)
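A minimal sketch of building a CART-style classification tree that labels modules as fault-prone or not, using scikit-learn's decision tree in place of the CART implementation used in the thesis; the depth limit, class weighting, and train/test split are illustrative assumptions.

```python
# Sketch of a CART-style classification tree labeling modules fault-prone / not fault-prone.
# `metrics` is a module-by-metric matrix; `fault_prone` is a 0/1 vector (assumed inputs).
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import confusion_matrix

def build_quality_tree(metrics, fault_prone, feature_names):
    X_train, X_test, y_train, y_test = train_test_split(
        metrics, fault_prone, test_size=0.3, stratify=fault_prone, random_state=0)
    # scikit-learn trees split on the Gini criterion by default, as CART does;
    # class_weight="balanced" stands in for the cost weighting a real study would tune.
    tree = DecisionTreeClassifier(criterion="gini", max_depth=4,
                                  class_weight="balanced", random_state=0)
    tree.fit(X_train, y_train)
    print(export_text(tree, feature_names=list(feature_names)))  # human-readable rules
    return tree, confusion_matrix(y_test, tree.predict(X_test))
```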
- Title
- Multivariate modeling of software engineering measures.
- Creator
- Lanning, David Lee., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
One goal of software engineers is to produce software products. An additional goal, that software production must lead to profit, releases the power of the software product market. This market demands high-quality products and tight cycles in the delivery of new and enhanced products. These market conditions motivate the search for engineering methods that help software producers ship products quicker, at lower cost, and with fewer defects. The control of software defects is key to meeting these market conditions. Thus, many software engineering tasks are concerned with software defects. This study considers two sources of variation in the distribution of software defects: software complexity and enhancement activity. Multivariate techniques treat defect activity, software complexity, and enhancement activity as related multivariate concepts. Applied techniques include principal components analysis, canonical correlation analysis, discriminant analysis, and multiple regression analysis. The objective of this study is to improve our understanding of software complexity and software enhancement activity as sources of variation in defect activity, and to apply this understanding to produce predictive and discriminant models useful during testing and maintenance tasks. These models serve to support critical software engineering decisions. (A brief canonical correlation sketch follows this record.)
- Date Issued
- 1994
- PURL
- http://purl.flvc.org/fcla/dt/12383
- Subject Headings
- Software engineering, Computer software--Testing, Computer software--Quality control
- Format
- Document (PDF)
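A small sketch of one of the techniques listed above, canonical correlation analysis, relating two of the multivariate concepts from the abstract (complexity measures and defect/enhancement measures); the variable names and the number of canonical pairs are assumptions.

```python
# Sketch: canonical correlations between two blocks of measures,
# e.g. software complexity metrics vs. defect/enhancement activity measures.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cross_decomposition import CCA

def canonical_correlations(complexity, defect_activity, n_components=2):
    Xc = StandardScaler().fit_transform(complexity)
    Yc = StandardScaler().fit_transform(defect_activity)
    cca = CCA(n_components=n_components).fit(Xc, Yc)
    U, V = cca.transform(Xc, Yc)
    # Correlation of each pair of canonical variates (one value per component).
    return [np.corrcoef(U[:, i], V[:, i])[0, 1] for i in range(n_components)]
```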
- Title
- An empirical study of analogy-based software quality classification models.
- Creator
- Ross, Fletcher Douglas., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Time and cost are among the most important elements in a software project. By efficiently using time and resources we can reduce costs. Any program can potentially contain faults. If we can identify those program modules that have better quality and are less likely to be fault-prone, then we can reduce the effort and cost required in testing these modules. This thesis presents a series of studies evaluating the use of Case-Based Reasoning (CBR) as an effective method for classifying program modules based upon their quality. We believe this is the first time that two techniques have been used in conjunction with CBR: the Mahalanobis distance, a distance measure that utilizes the covariance matrix of the independent variables and thereby accounts for the multicollinearity of the data without the need for preprocessing; and data clustering, wherein the data are separated into groups based on a dependent variable. (An illustrative sketch of Mahalanobis-distance case retrieval follows this record.)
- Date Issued
- 2001
- PURL
- http://purl.flvc.org/fcla/dt/12817
- Subject Headings
- Modular programming, Computer software--Quality control, Software measurement
- Format
- Document (PDF)
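A minimal sketch of case-based classification with the Mahalanobis distance described above, assuming the case library's metrics and integer class labels are numpy arrays; the value of k and the majority vote are illustrative choices, not necessarily those used in the thesis.

```python
# Sketch of case-based reasoning retrieval with the Mahalanobis distance:
# correlated (multicollinear) metrics are not double-counted because the metric
# is driven by the inverse covariance of the stored cases.
import numpy as np
from scipy.spatial.distance import cdist

def cbr_classify(case_library_X, case_library_y, query_X, k=5):
    # case_library_y: small non-negative integer labels, e.g. 0 = not fault-prone, 1 = fault-prone.
    VI = np.linalg.pinv(np.cov(case_library_X, rowvar=False))
    D = cdist(query_X, case_library_X, metric="mahalanobis", VI=VI)
    labels = []
    for row in D:
        nearest = case_library_y[np.argsort(row)[:k]]
        labels.append(np.bincount(nearest).argmax())   # majority vote over the k nearest cases
    return np.array(labels)
```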
- Title
- A comparative study of attribute selection techniques for CBR-based software quality classification models.
- Creator
- Nguyen, Laurent Quoc Viet., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
To achieve high reliability in software-based systems, software metrics-based quality classification models have been explored in the literature. However, the collection of software metrics may be a hard and long process, and some metrics may not be helpful, or may even be harmful, to the classification models, deteriorating the models' accuracies. Hence, methodologies have been developed to select the most significant metrics in order to build accurate and efficient classification models. Case-Based Reasoning is the classification technique used in this thesis. Since it does not provide any metric selection mechanisms, some metric selection techniques were studied. In the context of CBR, this thesis presents a comparative evaluation of metric selection methodologies for both raw and discretized data. Three attribute selection techniques were studied: the Kolmogorov-Smirnov Two-Sample Test, the Kruskal-Wallis Test, and Information Gain. These techniques resulted in classification models that are useful for software quality improvement. (A sketch ranking metrics with these three filters follows this record.)
- Date Issued
- 2002
- PURL
- http://purl.flvc.org/fcla/dt/12944
- Subject Headings
- Case-based reasoning, Software engineering, Computer software--Quality control
- Format
- Document (PDF)
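A small sketch ranking candidate metrics with the three attribute selection techniques named above; information gain is approximated here with scikit-learn's mutual information estimator, and the subsequent CBR model building is omitted.

```python
# Sketch: score each software metric with the Kolmogorov-Smirnov two-sample test,
# the Kruskal-Wallis test, and mutual information (as a stand-in for information gain).
# Assumes X (modules x metrics, numpy array) and a binary label vector y.
from scipy.stats import ks_2samp, kruskal
from sklearn.feature_selection import mutual_info_classif

def rank_metrics(X, y, feature_names):
    fault, ok = X[y == 1], X[y == 0]
    ks = [ks_2samp(fault[:, j], ok[:, j]).statistic for j in range(X.shape[1])]
    kw = [kruskal(fault[:, j], ok[:, j]).statistic for j in range(X.shape[1])]
    ig = mutual_info_classif(X, y, random_state=0)
    # Larger statistic = the metric separates fault-prone and not-fault-prone modules better.
    return sorted(zip(feature_names, ks, kw, ig), key=lambda r: -r[3])
```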
- Title
- Information theory and software measurement.
- Creator
- Allen, Edward B., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Development of reliable, high-quality software requires study and understanding at each step of the development process. A basic assumption in the field of software measurement is that metrics of internal software attributes somehow relate to the intrinsic difficulty in understanding a program. Measuring the information content of a program attempts to indirectly quantify the comprehension task. Information theory based software metrics are attractive because they quantify the amount of information in a well-defined framework. However, most information theory based metrics have been proposed with little reference to measurement theory fundamentals, and empirical validation of predictive quality models has been lacking. This dissertation proves that representative information theory based software metrics can be "meaningful" components of software quality models in the context of measurement theory. To this end, members of a major class of metrics are shown to be regular representations of Minimum Description Length or Variety of software attributes, and are interval scale. An empirical validation case study is presented that predicted faults in modules based on Operator Information. This metric is closely related to Harrison's Average Information Content Classification, which is the entropy of the operators. New general methods for calculating synthetic complexity at the system level and module level are presented, quantifying the joint information of an arbitrary set of primitive software measures. Since all kinds of information are not equally relevant to software quality factors, components of synthetic module complexity are also defined. Empirical case studies illustrate the potential usefulness of the proposed synthetic metrics. A metrics database is often the key to a successful ongoing software metrics program. The contribution of any proposed metric is defined in terms of measured variation using information theory, irrespective of the metric's usefulness in quality models. This is of interest when full validation is not practical. Case studies illustrate the method. (A small operator-entropy example follows this record.)
- Date Issued
- 1995
- PURL
- http://purl.flvc.org/fcla/dt/12412
- Subject Headings
- Software engineering, Computer software--Quality control, Information theory
- Format
- Document (PDF)
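As a concrete illustration of the entropy-of-operators idea mentioned above (related to, but not identical with, the dissertation's Operator Information metric), the following computes the empirical entropy of a module's operator tokens.

```python
# Entropy (in bits) of the distribution of operator tokens in a module:
# a module whose operators are spread evenly carries more information per operator
# than one dominated by a single operator.
import math
from collections import Counter

def operator_entropy(operator_tokens):
    counts = Counter(operator_tokens)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy example with a hypothetical operator stream extracted from a module.
print(operator_entropy(["+", "=", "+", "*", "=", "+", "if", "="]))
```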
- Title
- Correcting noisy data and expert analysis of the correction process.
- Creator
- Seiffert, Christopher N., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
This thesis expands upon an existing noise cleansing technique, polishing, enabling it to be used in the Software Quality Prediction domain, as well as in any other domain where the data contain continuous values, as opposed to the categorical data for which the technique was originally designed. The procedure is applied to a real-world dataset with real (as opposed to injected) noise, as determined by an expert in the domain. This, in combination with expert assessment of the changes made to the data, provides not only a more realistic dataset than one in which the noise (or even the entire dataset) is artificial, but also a better understanding of whether the procedure is successful in cleansing the data. Lastly, this thesis provides a more in-depth view of the process than previously available, in that it gives results for different parameters and classifier building techniques. This allows the reader to gain a better understanding of the significance of both model generation and parameter selection. (A rough sketch of polishing for continuous attributes follows this record.)
- Date Issued
- 2005
- PURL
- http://purl.flvc.org/fcla/dt/13223
- Subject Headings
- Computer interfaces--Software--Quality control, Acoustical engineering, Noise control--Computer programs, Expert systems (Computer science), Software documentation
- Format
- Document (PDF)
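A rough sketch of how polishing might be adapted to continuous attributes, in the spirit of the abstract above: each attribute is predicted from the others, and values that strongly disagree with their prediction are replaced. The learner, the z-score cutoff, and the replace-in-place policy are assumptions; the thesis's actual procedure and its expert validation step are not reproduced here.

```python
# Illustrative "polishing" for continuous data: flag and replace values whose residual
# against a model trained on the remaining attributes is unusually large.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def polish(X, z_threshold=3.0):
    X = X.astype(float)                     # work on a float copy of the data matrix
    corrected = np.zeros_like(X, dtype=bool)
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        model = RandomForestRegressor(n_estimators=50, random_state=0).fit(others, X[:, j])
        pred = model.predict(others)
        residual = X[:, j] - pred
        suspicious = np.abs(residual) > z_threshold * residual.std()
        X[suspicious, j] = pred[suspicious]  # replace flagged values with their predictions
        corrected[suspicious, j] = True
    return X, corrected
```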
- Title
- A metrics-based software quality modeling tool.
- Creator
- Rajeevalochanam, Jayanth Munikote., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
In today's world, high reliability has become an essential component of almost every software system. However, since the reliability-enhancement activities entail enormous costs, software quality models, based on the metrics collected early in the development life cycle, serve as handy tools for cost-effectively guiding such activities to the software modules that are likely to be faulty. Case-Based Reasoning (CBR) is an attractive technique for software quality modeling. Software Measurement Analysis and Reliability Toolkit (SMART) is a CBR tool customized for metrics-based software quality modeling. Developed for the NASA IV&V Facility, SMART supports three types of software quality models: quantitative quality prediction, classification, and module-order models. It also supports a goal-oriented selection of classification models. An empirical case study of a military command, control, and communication system demonstrates the accuracy and usefulness of SMART, and also serves as a user-guide for the tool.
- Date Issued
- 2002
- PURL
- http://purl.flvc.org/fcla/dt/12967
- Subject Headings
- Software measurement, Computer software--Quality control, Case-based reasoning
- Format
- Document (PDF)
- Title
- Intrusion detection in wireless networks: A data mining approach.
- Creator
- Nath, Shyam Varan., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
The security of wireless networks has gained considerable importance due to the rapid proliferation of wireless communications. While computer network heuristics and rules are being used to control and monitor the security of Wireless Local Area Networks (WLANs), mining and learning the behaviors of network users can provide a deeper level of security analysis. The objective and contribution of this thesis are threefold: exploring the security vulnerabilities of the IEEE 802.11 standard for wireless networks; extracting features, or metrics, from a security point of view for modeling network traffic in a WLAN; and proposing a data mining-based approach to intrusion detection in WLANs. A clustering- and expert-based approach to intrusion detection in a wireless network is presented in this thesis. The case study data is obtained from a real-world WLAN and contains over one million records. Given the clusters of network traffic records, a distance-based heuristic measure is proposed for labeling clusters as either normal or intrusive. The empirical results demonstrate the promise of the proposed approach, laying the groundwork for a clustering-based framework for intrusion detection in computer networks. (A sketch of the cluster-then-label idea follows this record.)
- Date Issued
- 2005
- PURL
- http://purl.flvc.org/fcla/dt/13246
- Subject Headings
- Wireless communication systems, Data warehousing, Data mining, Telecommunication--Security measures, Computer networks--Security measures, Computer security
- Format
- Document (PDF)
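A minimal sketch of the cluster-then-label idea described above, using k-means and a simple distance-based rule that marks outlying clusters as intrusive; the number of clusters, the quantile cutoff, and the use of the global centroid are illustrative assumptions rather than the thesis's heuristic.

```python
# Sketch: cluster WLAN traffic records, then flag clusters whose centroids lie far
# from the bulk of the data as intrusive.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def label_clusters(traffic, n_clusters=20, distance_quantile=0.90):
    Z = StandardScaler().fit_transform(traffic)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(Z)
    overall_center = Z.mean(axis=0)
    dists = np.linalg.norm(km.cluster_centers_ - overall_center, axis=1)
    cutoff = np.quantile(dists, distance_quantile)
    intrusive_clusters = np.where(dists > cutoff)[0]
    record_labels = np.isin(km.labels_, intrusive_clusters)   # True = flagged intrusive
    return record_labels, intrusive_clusters
```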
- Title
- Software quality modeling and analysis with limited or without defect data.
- Creator
- Seliya, Naeem A., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
The key to developing high-quality software is the measurement and modeling of software quality. In practice, software measurements are often used as a resource to model and comprehend the quality of software. The use of software measurements to understand quality is accomplished by a software quality model that is trained using software metrics and defect data of similar, previously developed, systems. The model is then applied to estimate the quality of the target software project. Such an approach assumes that defect data is available for all program modules in the training data. Various practical issues can cause an unavailability or limited availability of defect data from the previously developed systems. This dissertation presents innovative and practical techniques for addressing the problem of software quality analysis when there is limited or completely absent defect data. The proposed techniques for software quality analysis without defect data include an expert-based approach with unsupervised clustering and an expert-based approach with semi-supervised clustering. The proposed techniques for software quality analysis with limited defect data include a semi-supervised classification approach with the Expectation-Maximization algorithm and an expert-based approach with semi-supervised clustering. Empirical case studies of software measurement datasets obtained from multiple NASA software projects are used to present and evaluate the different techniques. The empirical results demonstrate the attractiveness, benefit, and definite promise of the proposed techniques. The newly developed techniques presented in this dissertation are invaluable to the software quality practitioner challenged by the absence or limited availability of defect data from previous software development experiences.
- Date Issued
- 2005
- PURL
- http://purl.flvc.org/fcla/dt/12151
- Subject Headings
- Software measurement, Computer software--Quality control, Computer software--Reliability--Mathematical models, Software engineering--Quality control
- Format
- Document (PDF)
- Title
- Software fault prediction using tree-based models.
- Creator
- Seliya, Naeem A., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Maintaining superior quality and reliability in software systems is of utmost importance in today's world. Early fault prediction is a proven method for achieving this. Tree-based modelling is a simple and effective method that can be used to predict the number of faults in a software system. In this thesis, we use regression tree-based modelling to predict the number of faults in a software module. The goal of this study is four-fold. First, a comparative study of the tree-based modelling tools CART and S-PLUS: CART yielded simpler regression trees than those built by S-PLUS. Second, a comparative study of the least squares and the least absolute deviation methods of CART, which showed that the latter yielded better results than the former. Third, a study of the possible benefits of using principal components analysis when performing regression tree modelling. The fourth and final study is a comparison of tree-based modelling with other prediction techniques, namely Case-Based Reasoning, Artificial Neural Networks, and Multiple Linear Regression. (A sketch of the least squares vs. least absolute deviation comparison follows this record.)
- Date Issued
- 2001
- PURL
- http://purl.flvc.org/fcla/dt/12782
- Subject Headings
- Software measurement, Computer software--Quality control
- Format
- Document (PDF)
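A small sketch of the least squares vs. least absolute deviation comparison using scikit-learn regression trees in place of the CART and S-PLUS tools used in the thesis; the depth limit, cross-validation setup, and error metric are assumptions.

```python
# Sketch: compare squared-error (least squares) and absolute-error (least absolute
# deviation) split criteria for regression trees predicting module fault counts.
# Assumes X (module metrics) and y (fault counts).
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def compare_criteria(X, y):
    results = {}
    for name, criterion in [("least squares", "squared_error"),
                            ("least absolute deviation", "absolute_error")]:
        tree = DecisionTreeRegressor(criterion=criterion, max_depth=5, random_state=0)
        scores = cross_val_score(tree, X, y, cv=5, scoring="neg_mean_absolute_error")
        results[name] = -scores.mean()    # mean absolute error, lower is better
    return results
```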
- Title
- Rough Set-Based Software Quality Models and Quality of Data.
- Creator
- Bullard, Lofton A., Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
In this dissertation we address two significant issues of concern: software quality modeling and data quality assessment. Software quality can be measured by software reliability. Reliability is often measured in terms of the time between system failures. A failure is caused by a fault, which is a defect in the executable software product. The time between system failures depends both on the presence and the usage pattern of the software. Finding faulty components in the development cycle of a software system can lead to a more reliable final system and will reduce development and maintenance costs. The issue of software quality is investigated by proposing a new approach, the rule-based classification model (RBCM), that uses rough set theory to generate decision rules to predict software quality. The new model minimizes over-fitting by balancing the Type I and Type II misclassification error rates. We also propose a model selection technique for rule-based models called rule-based model selection (RBMS). The proposed rule-based model selection technique utilizes the complete and partial matching rule sets of candidate RBCMs to determine the model with the least amount of over-fitting. In the experiments that were performed, the RBCMs were effective at identifying faulty software modules, and the RBMS technique was able to identify RBCMs that minimized over-fitting. Good data quality is a critical component for building effective software quality models. We address the significance of the quality of data on the classification performance of learners by conducting a comprehensive comparative study. Several trends were observed in the experiments. Class and attribute noise had the greatest impact on the performance of learners when they occurred simultaneously in the data. Class noise had a significant impact on the performance of learners, while attribute noise had no impact when it occurred in less than 40% of the most significant independent attributes. Random Forest (RF100), a group of 100 decision trees, was the most accurate and robust learner in all the experiments with noisy data. (A small sketch of the Type I/Type II balancing idea follows this record.)
- Date Issued
- 2008
- PURL
- http://purl.flvc.org/fau/fd/FA00012567
- Subject Headings
- Computer software--Quality control, Computer software--Reliability, Software engineering, Computer arithmetic
- Format
- Document (PDF)
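The rough set rule induction itself does not compress into a few lines, but the Type I/Type II balancing idea mentioned above can be sketched as a simple threshold sweep over any classifier's predicted fault probabilities; this is an illustrative simplification, not the RBCM procedure.

```python
# Sketch: pick the decision threshold at which the Type I rate (false positives on
# fault-free modules) and the Type II rate (missed faulty modules) are closest.
import numpy as np

def balance_threshold(y_true, fault_prob):
    best_t, best_gap = 0.5, float("inf")
    for t in np.linspace(0.05, 0.95, 91):
        pred = (fault_prob >= t).astype(int)
        type_i = np.mean(pred[y_true == 0] == 1)   # fault-free modules flagged as faulty
        type_ii = np.mean(pred[y_true == 1] == 0)  # faulty modules missed
        gap = abs(type_i - type_ii)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t
```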
- Title
- OCR2SEQ: A NOVEL MULTI-MODAL DATA AUGMENTATION PIPELINE FOR WEAK SUPERVISION.
- Creator
- Lowe, Michael A., Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
With the recent large-scale adoption of Large Language Models in multidisciplinary research and the commercial space, the need for large amounts of labeled data has become more crucial than ever to evaluate potential use cases for opportunities in applied intelligence. Most domain-specific fields require a substantial shift that involves extremely large amounts of heterogeneous data to have a meaningful impact on the pre-computed weights of most large language models. We explore extending the capabilities of a state-of-the-art unsupervised pre-training method, the Transformer-based Sequential Denoising Auto-Encoder (TSDAE). In this study we show various opportunities for using OCR2Seq, a multi-modal generative augmentation strategy, to further enhance and measure the quality of noise samples used when TSDAE is employed as a pre-training task. This study is a first-of-its-kind work that leverages converting both generalized and sparse domains of relational data into multi-modal sources. Our primary objective is measuring the quality of augmentation in relation to the current implementation of the sentence transformers library. Further work includes the effect on ranking, language understanding, and corrective quality. (A minimal TSDAE pre-training sketch follows this record.)
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014367
- Subject Headings
- Natural language processing (Computer science), Deep learning (Machine learning)
- Format
- Document (PDF)
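A minimal TSDAE pre-training sketch following the recipe documented for the sentence-transformers library, which the abstract measures against; the base model name and the two-sentence corpus are placeholders, and the OCR2Seq augmentation step that would supply the real training sentences is not shown.

```python
# Minimal TSDAE pre-training sketch with sentence-transformers; the OCR2Seq-generated
# corpus is assumed to replace the placeholder sentences below.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, datasets, losses

train_sentences = ["example sentence one", "example sentence two"]   # placeholder corpus

model_name = "bert-base-uncased"
embedding = models.Transformer(model_name)
pooling = models.Pooling(embedding.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[embedding, pooling])

# The dataset pairs each sentence with a corrupted (noised) copy used for denoising.
train_data = datasets.DenoisingAutoEncoderDataset(train_sentences)
loader = DataLoader(train_data, batch_size=8, shuffle=True)
loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=model_name,
                                       tie_encoder_decoder=True)

model.fit(train_objectives=[(loader, loss)], epochs=1, show_progress_bar=True)
```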
- Title
- ADVANCING ONE-CLASS CLASSIFICATION: A COMPREHENSIVE ANALYSIS FROM THEORY TO NOVEL APPLICATIONS.
- Creator
- Abdollah, Zadeh Azadeh, Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
This dissertation explores one-class classification (OCC) in the context of big data and fraud detection, addressing challenges posed by imbalanced datasets. A detailed survey of OCC-related literature forms a core part of the study, categorizing works into outlier detection, novelty detection, and deep learning applications. This survey reveals a gap in the application of OCC to the inherent problems of big data, such as class rarity and noisy data. Building upon the foundational insights gained from the comprehensive literature review on OCC, the dissertation progresses to a detailed comparative analysis between OCC and binary classification methods. This comparison is pivotal in understanding their respective strengths and limitations across various applications, emphasizing their roles in addressing imbalanced datasets. The research then specifically evaluates binary classification and OCC using credit card fraud data. This practical application highlights the nuances and effectiveness of these classification methods in real-world scenarios, offering insights into their performance in detecting fraudulent activities. After the evaluation of binary classification and OCC using credit card fraud data, the dissertation extends this inquiry with a detailed investigation into the effectiveness of both methodologies in fraud detection. This extended analysis involves utilizing not only the Credit Card Fraud Detection Dataset but also the Medicare Part D dataset. The findings show the comparative performance and suitability of these classification methods in practical fraud detection scenarios. Finally, the dissertation examines the impact of training OCC algorithms on majority versus minority classes, using the two previously mentioned datasets in addition to the Medicare Part B and Durable Medical Equipment, Prosthetics, Orthotics and Supplies (DMEPOS) datasets. This exploration offers critical insights into model training strategies and their implications, suggesting that training on the majority class can often lead to more robust classification results. In summary, this dissertation provides a deep understanding of OCC, effectively bridging theoretical concepts with novel applications in big data and fraud detection. It contributes to the field by offering a comprehensive analysis of OCC methodologies, their practical implications, and their effectiveness in addressing class imbalance in big data. (A one-class classification sketch trained on the majority class follows this record.)
- Date Issued
- 2024
- PURL
- http://purl.flvc.org/fau/fd/FA00014387
- Subject Headings
- Classification, Big data, Deep learning (Machine learning), Computer engineering
- Format
- Document (PDF)
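A small sketch of the majority-class training strategy examined above: a one-class/outlier learner (here Isolation Forest, as one common choice) is fit only on non-fraud records and scored on a labeled test set; the learner, its settings, and the AUC evaluation are illustrative assumptions.

```python
# Sketch: one-class style fraud scoring trained only on the majority (non-fraud) class.
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

def occ_fraud_auc(X_train_majority, X_test, y_test):
    # Fit on legitimate records only; instances scored as outliers are treated as fraud.
    occ = IsolationForest(n_estimators=200, random_state=0).fit(X_train_majority)
    scores = -occ.score_samples(X_test)        # higher score = more anomalous
    return roc_auc_score(y_test, scores)       # AUC with fraud (y = 1) as the positive class
```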
- Title
- FRAUD DETECTION IN HIGHLY IMBALANCED BIG DATA WITH NOVEL AND EFFICIENT DATA REDUCTION TECHNIQUES.
- Creator
- Hancock III, John T., Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
The rapid growth of digital transactions and the increasing sophistication of fraudulent activities have necessitated the development of robust and efficient fraud detection techniques, particularly in the financial and healthcare sectors. This dissertation focuses on the use of novel data reduction techniques for addressing the unique challenges associated with detecting fraud in highly imbalanced Big Data, with a specific emphasis on credit card transactions and Medicare claims. The highly imbalanced nature of these datasets, where fraudulent instances constitute less than one percent of the data, poses significant challenges for traditional machine learning algorithms. This dissertation explores novel data reduction techniques tailored for fraud detection in highly imbalanced Big Data. The primary objectives include developing efficient data preprocessing and feature selection methods to reduce data dimensionality while preserving the most informative features, investigating various machine learning algorithms for their effectiveness in handling imbalanced data, and evaluating the proposed techniques on real-world credit card and Medicare fraud datasets. This dissertation covers a comprehensive examination of datasets, learners, experimental methodology, sampling techniques, feature selection techniques, and hybrid techniques. Key contributions include the analysis of performance metrics in the context of newly available Big Medicare Data, experiments using Big Medicare data, application of a novel ensemble supervised feature selection technique, and the combined application of data sampling and feature selection. The research demonstrates that, across both domains, the combined application of random undersampling and ensemble feature selection significantly improves classification performance. (A sketch combining ensemble feature selection with random undersampling follows this record.)
- Date Issued
- 2024
- PURL
- http://purl.flvc.org/fau/fd/FA00014424
- Subject Headings
- Fraud, Big data, Data reduction, Credit card fraud, Medicare fraud
- Format
- Document (PDF)
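A minimal sketch of the combination highlighted in the abstract, ensemble feature selection followed by random undersampling, using imbalanced-learn for the undersampling step; the three rankers, the mean-rank aggregation, and the feature count are illustrative assumptions rather than the dissertation's exact setup.

```python
# Sketch: aggregate feature ranks from several rankers (ensemble feature selection),
# keep the top features, then randomly undersample the majority class to a 1:1 ratio.
import numpy as np
from scipy.stats import rankdata
from sklearn.feature_selection import mutual_info_classif, f_classif
from sklearn.ensemble import RandomForestClassifier
from imblearn.under_sampling import RandomUnderSampler

def select_and_balance(X, y, n_features=10):
    scores = [mutual_info_classif(X, y, random_state=0),
              f_classif(X, y)[0],
              RandomForestClassifier(n_estimators=100, random_state=0)
                  .fit(X, y).feature_importances_]
    mean_rank = np.mean([rankdata(-s) for s in scores], axis=0)   # rank 1 = best feature
    keep = np.argsort(mean_rank)[:n_features]
    X_bal, y_bal = RandomUnderSampler(random_state=0).fit_resample(X[:, keep], y)
    return X_bal, y_bal, keep
```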