Current Search: Khoshgoftaar, Taghi M.
- Title
- Measurement of coupling and cohesion of software.
- Creator
- Chen, Ye., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Graphs are often used to depict an abstraction of software. A graph may be an abstraction of a software system and a subgraph may represent a software module. Coupling and cohesion are attributes that summarize the degree of interdependence or connectivity among subsystems or within subsystems, respectively. When used in conjunction with measures of other attributes, coupling and cohesion can contribute to an assessment or prediction of software quality. Information theory is attractive to us because the design decisions embodied by the graph are information. Using information theory, we propose measures of the cohesion and coupling of a modular system and of the cohesion and coupling of each constituent module. These measures conform to the properties of cohesion and coupling defined by Briand, Morasca, and Basili, applied to undirected graphs, and are therefore in the families of measures called cohesion and coupling.
- Date Issued
- 2000
- PURL
- http://purl.flvc.org/fcla/dt/15760
- Subject Headings
- Information theory, Computer software--Evaluation, Software measurement
- Format
- Document (PDF)
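The abstract above names the information-theoretic approach but not the measures themselves. As a rough illustration only (not the thesis's actual definitions), the sketch below computes a Shannon-entropy value over the adjacency patterns of an undirected graph, the kind of quantity such coupling measures are built from; the module layout and edges are invented.

```python
# Hypothetical sketch: an entropy-based connectivity measure for an
# undirected graph, loosely in the spirit of information-theoretic
# coupling/cohesion measures. Not the thesis's exact formulation.
from collections import Counter
from math import log2

def pattern_entropy(nodes, edges):
    """Shannon entropy of the distribution of adjacency patterns.

    Each node's 'pattern' is the frozen set of its neighbors; a graph whose
    nodes share few patterns embodies more design information.
    """
    adj = {n: frozenset() for n in nodes}
    for a, b in edges:
        adj[a] = adj[a] | {b}
        adj[b] = adj[b] | {a}
    counts = Counter(adj.values())
    n = len(nodes)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Coupling-like quantities can be taken on the intermodule graph, and
# cohesion-like quantities on each module's intramodule subgraph.
modules = {"m1": ["a", "b"], "m2": ["c", "d"]}   # invented system layout
edges = [("a", "b"), ("b", "c"), ("c", "d")]
inter = [(x, y) for x, y in edges
         if not any(x in ns and y in ns for ns in modules.values())]
all_nodes = [n for ns in modules.values() for n in ns]
print("coupling-like entropy:", pattern_entropy(all_nodes, inter))
```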
- Title
- Cost of misclassification in software quality models.
- Creator
- Guan, Xin., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Reliability has become a very important and competitive factor for software products. Using software quality models based on software measurements provides a systematic and scientific way to detect software faults early and to improve software reliability. This thesis considers several classification techniques, including the Generalized Classification Rule, the MetaCost algorithm, the Cost-Boosting algorithm, and the AdaCost algorithm. We also introduce the weighted logistic regression algorithm and a new method to evaluate the performance of classification models: ROC analysis. We focus our experiments on a very large legacy telecommunications system (LLTS) to build software quality models with principal components analysis. Two other data sets, CCCS and LTS, are also used in our experiments.
- Date Issued
- 2000
- PURL
- http://purl.flvc.org/fcla/dt/15762
- Subject Headings
- Computer software--Quality control, Software measurement, Computer software--Testing
- Format
- Document (PDF)
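As a hedged sketch of two ideas named in the abstract, cost-sensitive classification and ROC analysis, the snippet below fits a class-weighted logistic regression (standing in for the thesis's weighted algorithm) and computes ROC quantities with scikit-learn. The synthetic data and the 10:1 cost ratio are illustrative assumptions, not the LLTS setup.

```python
# Sketch of cost-sensitive (weighted) logistic regression with ROC analysis
# on synthetic data; MetaCost, Cost-Boosting, and AdaCost are not reproduced.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight encodes the higher cost of missing a fault-prone module
# (a false negative); the 10:1 ratio is an arbitrary illustrative choice.
clf = LogisticRegression(class_weight={0: 1, 1: 10}, max_iter=1000)
clf.fit(X_tr, y_tr)

scores = clf.predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, scores)  # points tracing the ROC curve
print("AUC:", roc_auc_score(y_te, scores))
```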
- Title
- Software reliability engineering: An evolutionary neural network approach.
- Creator
- Hochman, Robert., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- This thesis presents the results of an empirical investigation of the applicability of genetic algorithms to a real-world problem in software reliability: the fault-prone module identification problem. The solution developed is an effective hybrid of genetic algorithms and neural networks. This approach (ENNs) was found to be superior, in terms of time, effort, and confidence in the optimality of results, to the common practice of searching manually for the best-performing net. Comparisons were made to discriminant analysis. On fault-prone, not-fault-prone, and overall classification, the lower error proportions for ENNs were found to be statistically significant. The robustness of ENNs follows from their superior performance over many data configurations. Given these encouraging results, it is suggested that ENNs have potential value in other software reliability problem domains, where genetic algorithms have been largely ignored. For future research, several plans are outlined for enhancing ENNs with respect to accuracy and applicability.
- Date Issued
- 1997
- PURL
- http://purl.flvc.org/fcla/dt/15474
- Subject Headings
- Neural networks (Computer science), Software engineering, Genetic algorithms
- Format
- Document (PDF)
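A toy sketch of the evolutionary idea in the abstract: a genetic algorithm searches neural network hyperparameters instead of a manual search. The genome encoding, operators, and fitness here are illustrative assumptions, not the thesis's ENN design.

```python
# GA over (hidden units, learning rate) for a small neural net; a sketch of
# the evolutionary-neural-network idea, not the thesis's actual algorithm.
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

random.seed(0)
X, y = make_classification(n_samples=300, random_state=0)

def fitness(genome):
    """Cross-validated accuracy of the net encoded by (hidden units, learning rate)."""
    hidden, lr = genome
    net = MLPClassifier(hidden_layer_sizes=(hidden,), learning_rate_init=lr,
                        max_iter=200, random_state=0)
    return cross_val_score(net, X, y, cv=3).mean()

population = [(random.randint(2, 32), random.uniform(1e-4, 1e-1)) for _ in range(6)]
for generation in range(3):
    parents = sorted(population, key=fitness, reverse=True)[:3]
    # Recombine genes from two parents and mutate the learning-rate gene.
    children = [(random.choice(parents)[0],
                 random.choice(parents)[1] * random.uniform(0.5, 2.0))
                for _ in range(3)]
    population = parents + children

best = max(population, key=fitness)
print("best genome (hidden units, learning rate):", best)
```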
- Title
- Software metrics collection: Two new research tools.
- Creator
- Jordan, Sylviane G., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Collecting software metrics manually can be a tedious, inaccurate, and subjective task. Two new tools were developed to automate this process in a rapid, accurate, and objective way. The first tool, the Metrics Analyzer, evaluates 19 metrics at the function level from complete or partial systems written in C. The second tool, the Call Graph Generator, does not assess a metric directly, but generates a call graph based on a complete or partial system written in C. The call graph is used as an input to another tool (not considered here) that measures the coupling of a module, such as a function or a file. A case study analyzed the relationships among the metrics, including the coupling metric, using principal component analysis, which transformed the 19 metrics into eight principal components.
- Date Issued
- 1997
- PURL
- http://purl.flvc.org/fcla/dt/15483
- Subject Headings
- Software measurement, Computer software--Development, Computer software--Evaluation
- Format
- Document (PDF)
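A small sketch of the case study's final step: principal component analysis reducing correlated module metrics to a few components. The 19 metrics themselves are not listed in the abstract, so random data stands in here.

```python
# PCA over a synthetic 100-functions x 19-metrics matrix, reduced to eight
# components as in the case study; the metric values are invented.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
metrics = rng.normal(size=(100, 19))
metrics[:, 1] = metrics[:, 0] * 2 + rng.normal(scale=0.1, size=100)  # a correlated pair

z = StandardScaler().fit_transform(metrics)   # PCA expects comparable scales
pca = PCA(n_components=8).fit(z)
print("variance explained:", pca.explained_variance_ratio_.round(3))
components = pca.transform(z)                 # raw metrics -> 8 features
```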
- Title
- Software quality prediction using case-based reasoning.
- Creator
- Berkovich, Yevgeniy., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- The ability to efficiently prevent faults in large software systems is a very important concern of software project managers. Successful testing allows us to build quality software systems. Unfortunately, it is not always possible to effectively test a system due to time, resource, or other constraints. A critical bug may cause catastrophic consequences, such as loss of life or of very expensive equipment. We can facilitate testing by finding where faults are more likely to be hidden. Case-Based Reasoning (CBR) is one of many methodologies that make this process faster and cheaper by discovering faults early in the software life cycle; it predicts the software quality of the system by discovering fault-prone modules. We employ the SMART tool to facilitate CBR, using product and process metrics as independent variables. The study found that CBR is a robust tool capable of carrying out software quality prediction on its own with acceptable results. We also show that CBR's weaknesses do not hinder its effectiveness in finding misclassified modules.
- Date Issued
- 2000
- PURL
- http://purl.flvc.org/fcla/dt/12671
- Subject Headings
- Computer software--Quality control, Computer software--Evaluation, Software measurement
- Format
- Document (PDF)
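A minimal case-based reasoning sketch for the setting described above: classify a module by the labels of its most similar past cases in metric space. The SMART tool's actual similarity function and solution process are not given in the abstract, so plain Euclidean k-nearest-neighbors stands in, and the metric values are invented.

```python
# CBR as k-NN over a case library of (metrics, label) pairs; a sketch only.
import numpy as np

def cbr_classify(case_library, labels, new_case, k=3):
    """Return the majority label among the k most similar past cases."""
    distances = np.linalg.norm(case_library - new_case, axis=1)
    nearest = np.argsort(distances)[:k]
    return int(round(labels[nearest].mean()))  # 1 = fault-prone, 0 = not

library = np.array([[10, 2.0], [12, 2.5], [120, 9.0], [90, 7.5]])  # invented metrics
labels = np.array([0, 0, 1, 1])
print(cbr_classify(library, labels, np.array([100, 8.0])))  # -> 1
```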
- Title
- Simulation analysis of the IBM Subsystem Control Block architecture in a network file server environment.
- Creator
- Anumulapally, Ranga R., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Advanced system bus architectures such as the Micro Channel and the EISA bus support what is called bus-mastering, which allows the I/O subsystems attached to the bus to arbitrate for and take control of the bus to perform data transfers independent of the system processor. I/O subsystems that can control/master the system bus are called bus-masters. The IBM Subsystem Control Block (SCB) architecture defines interrupt-driven as well as peer-to-peer I/O protocols for performing data transfers to and from the bus-masters. In previous studies, the performance of the SCB protocols was evaluated in network server environments using simulation models. The main drawback of these studies is that the server system is modeled in considerable detail but the network and the clients are not considered. In this study, we developed models to simulate a complete network file server environment where a single file server based on the SCB architecture provides file service to a variable number of clients on a token-ring network. We then evaluate the performance of the SCB protocols using the results obtained from the simulations.
- Date Issued
- 1994
- PURL
- http://purl.flvc.org/fcla/dt/15057
- Subject Headings
- Distance education, Virtual reality
- Format
- Document (PDF)
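To make the modeled scenario concrete, here is a bare-bones discrete-event sketch: N clients on a shared network issue file requests to a single server. All timing constants are arbitrary assumptions, and the SCB protocol details the thesis actually modeled are not represented.

```python
# Minimal discrete-event simulation: one server, N clients, exponential
# think times, fixed service time. Illustrative only.
import heapq
import random

random.seed(0)
N_CLIENTS, SIM_TIME, SERVICE = 8, 1000.0, 2.0
events = [(random.expovariate(1 / 20.0), c) for c in range(N_CLIENTS)]
heapq.heapify(events)

now, server_free_at, completed, waited = 0.0, 0.0, 0, 0.0
while events and now < SIM_TIME:
    now, client = heapq.heappop(events)
    start = max(now, server_free_at)      # request queues if the server is busy
    server_free_at = start + SERVICE
    waited += start - now
    completed += 1
    # The client "thinks", then issues its next request.
    heapq.heappush(events, (server_free_at + random.expovariate(1 / 20.0), client))

print(f"requests served: {completed}, mean wait: {waited / completed:.2f}")
```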
- Title
- Tree-based classification models for analyzing a very large software system.
- Creator
- Bullard, Lofton A., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
- Software systems that control military radar systems must be highly reliable. A fault can compromise safety and security, and even cause the death of military personnel. In this experiment we identify fault-prone software modules in a subsystem of a military radar system called the Joint Surveillance Target Attack Radar System (JSTARS). An earlier version was used in Operation Desert Storm to monitor ground movement. Product metrics were collected for different iterations of an operational prototype of the subsystem over a period of approximately three years. We used these metrics to train a decision tree model and to fit a discriminant model to classify each module as fault-prone or not fault-prone. The algorithm used to generate the decision tree model was TREEDISC, developed by the SAS Institute. The decision tree model is compared to the discriminant model.
- Date Issued
- 1996
- PURL
- http://purl.flvc.org/fcla/dt/15315
- Subject Headings
- Computer software--Quality control, Computer software--Reliability, Software engineering
- Format
- Document (PDF)
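TREEDISC is a SAS procedure and is not available in Python; purely to illustrate the modeling setup described above, the sketch below compares a CART-style tree against a linear discriminant model on synthetic, imbalanced metrics data.

```python
# Decision tree vs. discriminant analysis on synthetic fault-proneness data;
# a stand-in for the TREEDISC-vs-discriminant comparison, not a reproduction.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

tree = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_tr, y_tr)
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)   # the comparison model
print("tree accuracy:", tree.score(X_te, y_te))
print("discriminant accuracy:", lda.score(X_te, y_te))
```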
- Title
- DATA COLLECTION FRAMEWORK AND MACHINE LEARNING ALGORITHMS FOR THE ANALYSIS OF CYBER SECURITY ATTACKS.
- Creator
- Calvert, Chad, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- The integrity of network communications is constantly being challenged by more sophisticated intrusion techniques. Attackers are shifting to stealthier and more complex forms of attacks in an attempt to bypass known mitigation strategies. Also, many detection methods for popular network attacks have been developed using outdated or non-representative attack data. To effectively develop modern detection methodologies, there exists a need to acquire data that can fully encompass the behaviors of persistent and emerging threats. When collecting modern-day network traffic for intrusion detection, substantial amounts of traffic can be collected, much of which consists of relatively few attack instances as compared to normal traffic. This skewed distribution between normal and attack data can lead to high levels of class imbalance. Machine learning techniques can be used to aid in attack detection, but large levels of imbalance between normal (majority) and attack (minority) instances can lead to inaccurate detection results.
- Date Issued
- 2019
- PURL
- http://purl.flvc.org/fau/fd/FA00013289
- Subject Headings
- Machine learning, Algorithms, Anomaly detection (Computer security), Intrusion detection systems (Computer security), Big data
- Format
- Document (PDF)
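One common response to the class imbalance described above is random undersampling of the majority (normal-traffic) class before training. A minimal sketch on synthetic data standing in for the collected traffic; the dissertation's specific techniques may differ.

```python
# Random undersampling: keep all attack instances, sample an equal number of
# normal instances. Synthetic data; the ~2% attack rate is illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = (rng.random(10_000) < 0.02).astype(int)   # 1 = attack, 0 = normal

attack_idx = np.flatnonzero(y == 1)
normal_idx = np.flatnonzero(y == 0)
keep_normal = rng.choice(normal_idx, size=len(attack_idx), replace=False)
balanced = np.concatenate([attack_idx, keep_normal])
rng.shuffle(balanced)

X_bal, y_bal = X[balanced], y[balanced]
print("class counts after sampling:", np.bincount(y_bal))
```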
- Title
- DEEP MAXOUT NETWORKS FOR CLASSIFICATION PROBLEMS ACROSS MULTIPLE DOMAINS.
- Creator
- Castaneda, Gabriel, Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
- Machine learning techniques such as deep neural networks have become an indispensable tool for a wide range of applications such as image classification, speech recognition, and sentiment analysis in text. An activation function is a mathematical equation that determines the output of each neuron in the neural network. In deep learning architectures, the choice of activation functions is very important to the network’s performance. Activation functions determine the output of the model, its computational efficiency, and its ability to train and converge after multiple iterations of training epochs. The selection of an activation function is critical to building and training an effective and efficient neural network. In real-world applications of deep neural networks, the activation function is a hyperparameter. We have observed a lack of consensus on how to select a good activation function for a deep neural network, and that a specific function may not be suitable for all domain-specific applications.
- Date Issued
- 2019
- PURL
- http://purl.flvc.org/fau/fd/FA00013362
- Subject Headings
- Classification, Machine learning--Technique, Neural networks (Computer science)
- Format
- Document (PDF)
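The title's maxout activation has a simple definition: each unit outputs the maximum of k affine transforms of its input. A minimal NumPy version follows; the dissertation's deep architectures and training procedure are not shown, and the shapes here are invented.

```python
# Maxout activation (Goodfellow et al.): max over k linear pieces per unit.
import numpy as np

def maxout(x, W, b):
    """x: (n, d_in); W: (k, d_in, d_out); b: (k, d_out) -> (n, d_out)."""
    # Affine pieces have shape (k, n, d_out); each unit takes the max across pieces.
    pieces = np.einsum("nd,kdo->kno", x, W) + b[:, None, :]
    return pieces.max(axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W = rng.normal(size=(2, 3, 5))   # k=2 linear pieces per output unit
b = rng.normal(size=(2, 5))
print(maxout(x, W, b).shape)     # (4, 5)
```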
- Title
- Ensemble Learning Algorithms for the Analysis of Bioinformatics Data.
- Creator
- Fazelpour, Alireza, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Developments in advanced technologies, such as DNA microarrays, have generated tremendous amounts of data available to researchers in the field of bioinformatics. These state-of-the-art technologies present not only unprecedented opportunities to study biological phenomena of interest, but also significant challenges in terms of processing the data. Furthermore, these datasets inherently exhibit a number of challenging characteristics, such as class imbalance, high dimensionality, small dataset size, noisy data, and complexity in terms of hard-to-distinguish decision boundaries between classes within the data. In recognition of the aforementioned challenges, this dissertation utilizes a variety of machine-learning and data-mining techniques, such as ensemble classification algorithms in conjunction with data sampling and feature selection techniques, to alleviate these problems while improving the classification results of models built on these datasets. However, in building classification models, researchers and practitioners encounter the challenge that there is not a single classifier that performs relatively well in all cases. Thus, numerous classification approaches, such as ensemble learning methods, have been developed to address this problem successfully in a majority of circumstances. Ensemble learning is a promising technique that generates multiple classification models and then combines their decisions into a single final result. Ensemble learning often performs better than single-base classifiers in classification tasks. This dissertation conducts thorough empirical research by implementing a series of case studies to evaluate how ensemble learning techniques can be utilized to enhance overall classification performance, as well as to improve the generalization ability of ensemble models. It investigates ensemble learning techniques of the boosting, bagging, and random forest algorithms, and proposes a number of modifications to the existing ensemble techniques in order to further improve the classification results. It examines the effectiveness of ensemble learning techniques in accounting for the challenging characteristics of class imbalance and difficult-to-learn class decision boundaries. Next, it looks into ensemble methods that are relatively tolerant to class noise and that not only account for the problem of class noise but improve classification performance. Finally, it examines the joint effects of data sampling and ensemble techniques, asking whether sampling can further improve the classification performance of the built ensemble models.
- Date Issued
- 2016
- PURL
- http://purl.flvc.org/fau/fd/FA00004588
- Subject Headings
- Bioinformatics, Data mining--Technological innovations, Machine learning
- Format
- Document (PDF)
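As a hedged sketch of the three ensemble families the dissertation investigates (bagging, boosting, and random forest), the snippet below compares them on a synthetic imbalanced dataset standing in for microarray data; the dissertation's modified ensembles are not reproduced.

```python
# Bagging vs. boosting vs. random forest, scored by cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=40, weights=[0.85, 0.15],
                           random_state=0)
for model in (BaggingClassifier(random_state=0),
              AdaBoostClassifier(random_state=0),
              RandomForestClassifier(random_state=0)):
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(type(model).__name__, round(auc, 3))
```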
- Title
- Effects of gene selection and data sampling on prediction of breast cancer treatments.
- Creator
- Heredia, Brian, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- In recent years, more and more researchers have begun to use data mining and machine learning tools to analyze gene microarray data. In this thesis we have collected a selection of datasets revolving around prediction of patient response in the specific area of breast cancer treatment. The datasets collected here are all obtained from gene chips, which have become the industry standard for measurement of gene expression. We discuss the methods and procedures used in the studies to analyze the datasets and their effects on treatment prediction, with a particular interest in the selection of genes for predicting patient response. We also analyze the datasets on our own in a uniform manner to determine the validity of these datasets in terms of learning potential, and provide strategies for future work that explore how to best identify gene signatures.
- Date Issued
- 2014
- PURL
- http://purl.flvc.org/fau/fd/FA00004292
- Subject Headings
- Antineoplastic agents -- Development, Breast -- Cancer -- Treatment, Cancer -- Genetic aspects, DNA mircroarrays, Estimation theory, Gene expression
- Format
- Document (PDF)
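A small sketch of gene selection as typically applied to microarray data: rank genes by a univariate statistic and keep the top k before modeling. Random expression values stand in for the thesis's gene-chip datasets, and the statistic and k are illustrative choices.

```python
# Univariate gene selection with an ANOVA F-test; a generic stand-in for the
# selection procedures the thesis analyzes.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 60 patients x 2000 "genes": far more features than samples, as in microarrays.
X, y = make_classification(n_samples=60, n_features=2000, n_informative=20,
                           random_state=0)
selector = SelectKBest(f_classif, k=50).fit(X, y)
X_signature = selector.transform(X)            # candidate 50-gene signature
print("selected gene indices:", selector.get_support(indices=True)[:10])
```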
- Title
- INVESTIGATING MACHINE LEARNING ALGORITHMS WITH IMBALANCED BIG DATA.
- Creator
- Hasanin, Tawfiq, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Recent technological developments have engendered an expeditious production of big data and also enabled machine learning algorithms to produce high-performance models from such data. Nonetheless, class imbalance (in binary classification) between the majority and minority classes in big data can skew the predictive performance of classification algorithms toward the majority (negative) class, whereas the minority (positive) class usually holds greater value for the decision makers. Such bias may lead to adverse consequences, some of them even life-threatening, when the existence of false negatives is generally costlier than false positives. The size of the minority class can vary from fair to extraordinarily small, which can lead to different performance scores for machine learning algorithms. Class imbalance is a well-studied area for traditional data, i.e., not big data. However, there is limited research focusing on both rarity and severe class imbalance in big data.
- Date Issued
- 2019
- PURL
- http://purl.flvc.org/fau/fd/FA00013316
- Subject Headings
- Algorithms, Machine learning, Big data--Data processing, Big data
- Format
- Document (PDF)
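Where the sampling sketch shown earlier alters the data itself, an algorithm-level alternative weights the rare positive class during training. A sketch with scikit-learn on a severely imbalanced synthetic set; the distributed big-data frameworks the dissertation targets are out of scope here.

```python
# Class weighting as an alternative to resampling under severe imbalance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50_000, weights=[0.995, 0.005],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(class_weight="balanced", random_state=0)
model.fit(X_tr, y_tr)
print(confusion_matrix(y_te, model.predict(X_te)))  # watch the false negatives
```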
- Title
- Predicting failure of remote battery backup systems.
- Creator
- Aranguren, Pachano Liz Jeannette, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Uninterruptible Power Supply (UPS) systems have become essential to modern industries that require continuous power supply to manage critical operations. Since the failure of a single battery will affect the entire backup system, UPS system providers must replace any battery before it goes dead. In this regard, automated monitoring tools are required to determine when a battery needs replacement. Nowadays, a primitive method for monitoring the battery backup system is being used for this task. This thesis presents a classification model that uses data cleansing and processing techniques from data mining to remove useless information from the data obtained from the sensors installed in the batteries, in order to improve the quality of the data and determine at a given moment in time whether a battery should be replaced. This prediction model will help UPS system providers increase the efficiency of battery monitoring procedures.
- Date Issued
- 2013
- PURL
- http://purl.flvc.org/fau/fd/FA0004002
- Subject Headings
- Electric power systems -- Equipment and supplies, Energy storing -- Testing, Lead acid batteries, Power electronics, Protective relays
- Format
- Document (PDF)
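A sketch of the cleansing step described above: drop sensor readings that carry no information (constant or mostly-missing columns) before modeling. The column names, values, and thresholds are invented for illustration.

```python
# Data cleansing: remove mostly-missing and constant columns from sensor data.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "voltage": [12.1, 12.0, 11.2, 10.9],
    "temp_c": [30.0, np.nan, np.nan, np.nan],   # sensor mostly offline
    "site_id": [7, 7, 7, 7],                    # constant: useless for learning
    "replace": [0, 0, 1, 1],                    # target label
})

df = df.loc[:, df.isna().mean() < 0.5]          # drop mostly-missing columns
df = df.loc[:, df.nunique() > 1]                # drop constant columns
print(df.columns.tolist())                      # ['voltage', 'replace']
```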
- Title
- A Comparison of Model Checking Tools for Service Oriented Architectures.
- Creator
- Venkat, Raghava, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Recently, most of the research pertaining to Service-Oriented Architecture (SOA) is based on web services and how secure they are in terms of efficiency and effectiveness. This requires validation, verification, and evaluation of web services. Verification and validation should be collaborative when web services from different vendors are integrated together to carry out a coherent task. For this purpose, novel model checking technologies have been devised and applied to web services. Model checking is a promising technique for verification and validation of software systems. WS-BPEL (Business Process Execution Language for Web Services) is an emerging standard language to describe web service composition behavior. The advanced features of BPEL, such as concurrency and hierarchy, make it challenging to verify BPEL models. Based on all these factors, this thesis surveys a few important model checking technologies (tools) and compares them on their "functional" and "non-functional" properties. The comparison is based on three case studies (the first small, the second medium, and the third large), for which we construct synthetic web service compositions, as there are not many publicly available compositions [1]. The first case study, the "Enhanced LoanApproval Process," is the small case. The second, the "Enhanced Purchase Order Process," is of medium size. The third and largest, the "Service Oriented Architecture Implementing BOINC Workflow," is based on a scientific workflow pattern and the BOINC (Berkeley Open Infrastructure for Network Computing) architecture.
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012565
- Subject Headings
- Computer network architectures, Expert systems (Computer science), Software engineering, Web servers--Management
- Format
- Document (PDF)
- Title
- Machine learning techniques for alleviating inherent difficulties in bioinformatics data.
- Creator
- Dittman, David, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- In response to the massive amounts of data that make up a large number of bioinformatics datasets, it has become increasingly necessary for researchers to use computers to aid them in their endeavors. With difficulties such as high dimensionality, class imbalance, noisy data, and difficult-to-learn class boundaries present within the data, bioinformatics datasets are a challenge to work with. One potential source of assistance is the domain of data mining and machine learning, a field which focuses on working with these large amounts of data and develops techniques to discover new trends and patterns hidden within the data and to increase the capability of researchers and practitioners to work with it. Within this domain there are techniques designed to eliminate irrelevant or redundant features, balance the membership of the classes, handle errors found in the data, and build predictive models for future data.
- Date Issued
- 2015
- PURL
- http://purl.flvc.org/fau/fd/FA00004362
- Subject Headings
- Artificial intelligence, Bioinformatics, Machine learning, System design, Theory of computation
- Format
- Document (PDF)
- Title
- Machine Learning Algorithms for the Analysis of Social Media and Detection of Malicious User Generated Content.
- Creator
- Heredia, Brian, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- One of the defining characteristics of the modern Internet is its massive connectedness, with information and human connection simply a few clicks away. Social media and online retailers have revolutionized how we communicate and purchase goods or services. User generated content on the web, through social media, plays a large role in modern society; Twitter has been at the forefront of political discourse, with politicians choosing it as their platform for disseminating information, while websites like Amazon and Yelp allow users to share their opinions on products via online reviews. The information available through these platforms can provide insight into a host of relevant topics through the process of machine learning. Specifically, this process involves text mining for sentiment analysis, an application domain of machine learning involving the extraction of emotion from text. Unfortunately, there are still those with malicious intent, and with the changes to how we communicate and conduct business come changes to their malicious practices. Social bots and fake reviews plague the web, providing incorrect information and swaying the opinion of unaware readers. The detection of these false users or posts from reading the text is difficult, if not impossible, for humans. Fortunately, text mining provides us with methods for the detection of harmful user generated content. This dissertation expands the current research in sentiment analysis, fake online review detection, and election prediction. We examine cross-domain sentiment analysis using tweets and reviews. Novel techniques combining ensemble and feature selection methods are proposed for the domain of online spam review detection. We investigate the ability of the Twitter platform to predict the United States 2016 presidential election. In addition, we determine how social bots influence this prediction.
- Date Issued
- 2018
- PURL
- http://purl.flvc.org/fau/fd/FA00013067
- Subject Headings
- Machine learning, Text mining, User-generated content, Social media
- Format
- Document (PDF)
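A compact sentiment-classification sketch in the text-mining style the dissertation builds on: bag-of-words features feeding a linear classifier. The four toy reviews are invented stand-ins for the Twitter, Amazon, and Yelp corpora.

```python
# TF-IDF features + logistic regression for sentiment; a sketch, not the
# dissertation's ensemble/feature-selection pipelines.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = ["great product, works perfectly",
           "terrible, broke after one day",
           "love it, highly recommend",
           "waste of money, very disappointed"]
labels = [1, 0, 1, 0]  # 1 = positive sentiment

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reviews, labels)
print(model.predict(["terrible waste, very disappointed"]))  # likely [0]
```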
- Title
- CBR-based software quality models and quality of data.
- Creator
- Xiao, Yudong., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- The performance accuracy of software quality estimation models is influenced by several factors, including the following two important factors: performance of the prediction algorithm and the quality of data. This dissertation addresses these two factors, and consists of two components: (1) a proposed genetic algorithm (GA) based optimization of software quality models for accuracy enhancement, and (2) a proposed partitioning- and rule-based filter (PRBF) for noise detection toward improvement of data quality. We construct a generalized framework of our embedded GA-optimizer, and instantiate the GA-optimizer for three optimization problems in software quality engineering: parameter optimization for case-based reasoning (CBR) models; module rank optimization for module-order modeling (MOM); and structural optimization for our multi-strategy classification modeling approach, denoted RB2CBL. Empirical case studies using software measurement data from real-world software systems were performed for the optimization problems. The GA-optimization approaches improved software quality prediction accuracy, highlighting the practical benefits of using GA for solving optimization problems in software engineering. The proposed noise detection approach, PRBF, was empirically evaluated using data categorized into two classes. Empirical studies on artificially corrupted datasets and datasets with known (natural) noise demonstrated that PRBF can effectively detect both artificial and natural noise. The proposed filter is a stable and robust technique, and always provided optimal or near-optimal noise detection results. In addition, it is applicable on datasets with nominal and numerical attributes, as well as those with missing values. The PRBF technique supports two methods of noise detection: class noise detection and cost-sensitive noise detection. The former is an easy-to-use method and does not need parameter settings, while the latter is suited for applications where each class has a specific misclassification cost. PRBF can also be used iteratively to investigate the two general types of data noise: attribute and class noise.
- Date Issued
- 2005
- PURL
- http://purl.flvc.org/fcla/dt/12141
- Subject Headings
- Computer software--Quality control, Genetic programming (Computer science), Software engineering, Case-based reasoning, Combinatorial optimization, Computer network architecture
- Format
- Document (PDF)
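PRBF's internals are not given in the abstract; the sketch below shows the classic cross-validation noise-filter idea it belongs to, flagging instances whose label disagrees with a model trained on the rest of the data. The injected noise and the choice of a tree as the filter model are illustrative assumptions.

```python
# Class-noise detection via cross-validated prediction disagreement.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
noisy = y.copy()
flipped = np.random.default_rng(0).choice(len(y), size=15, replace=False)
noisy[flipped] ^= 1                                   # inject class noise

preds = cross_val_predict(DecisionTreeClassifier(random_state=0), X, noisy, cv=10)
suspects = np.flatnonzero(preds != noisy)             # candidate noisy labels
recovered = np.intersect1d(suspects, flipped)
print(f"flagged {len(suspects)} instances, {len(recovered)} truly noisy")
```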
- Title
- Classification of software quality using tree modeling with the SPRINT/SLIQ algorithm.
- Creator
- Mao, Wenlei., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Providing high-quality software products is the common goal of all software engineers. Finding faults early can produce large savings over the software life cycle. Therefore, software quality has become the main subject in our research field. This thesis presents a series of studies on a very large legacy telecommunication system. The system has significantly more than ten million lines of code written in a high-level language similar to Pascal. Software quality models were developed to predict the class of each module as either fault-prone or not fault-prone. We used the SPRINT/SLIQ algorithm to build the classification tree models. We found that SPRINT/SLIQ, as an improved CART algorithm, can give us tree models with more accuracy, more balance, and less overfitting. We also found that software process metrics can significantly improve the predictive accuracy of software quality models.
- Date Issued
- 2000
- PURL
- http://purl.flvc.org/fcla/dt/15767
- Subject Headings
- Computer software--Quality control, Software engineering, Software measurement
- Format
- Document (PDF)
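SPRINT/SLIQ implementations are not packaged for Python; at the heart of both is a single sorted scan per attribute to find the best Gini split, sketched below for one numeric attribute. The example metric values are invented, and the scalability machinery (presorted attribute lists) is omitted.

```python
# Best Gini split for one numeric attribute via a single sorted scan,
# the core operation of SPRINT/SLIQ-style tree building.
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts) if total else 0.0

def best_split(values, labels):
    """Scan candidate thresholds; return (threshold, weighted Gini impurity)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left, right = [0, 0], [0, 0]
    for _, lab in pairs:
        right[lab] += 1
    best = (None, float("inf"))
    for i, (v, lab) in enumerate(pairs[:-1]):
        left[lab] += 1
        right[lab] -= 1
        score = (i + 1) / n * gini(left) + (n - i - 1) / n * gini(right)
        if score < best[1]:
            best = ((v + pairs[i + 1][0]) / 2, score)
    return best

loc = [120, 45, 300, 80, 500, 60]          # e.g. lines of code per module
fault_prone = [0, 0, 1, 0, 1, 0]
print(best_split(loc, fault_prone))        # -> (210.0, 0.0): a perfect split
```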
- Title
- Developing accurate software quality models using a faster, easier, and cheaper method.
- Creator
- Lim, Linda., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Managers of software development need to know which components of a system are fault-prone. If this can be determined early in the development cycle, then resources can be more effectively allocated and significant costs can be reduced. Case-Based Reasoning (CBR) is a simple and efficient methodology for building software quality models that can provide early information to managers. Our research focuses on two case studies. The first study analyzes source files and classifies them as fault-prone or not fault-prone; it also predicts the number of faults in each file. The second study analyzes the fault removal process and creates models that predict the outcome of software inspections.
- Date Issued
- 2001
- PURL
- http://purl.flvc.org/fcla/dt/12746
- Subject Headings
- Computer software--Development, Computer software--Quality control, Software engineering
- Format
- Document (PDF)
- Title
- Modeling software quality at system and subsystem level with TREEDISC classification algorithm.
- Creator
- Liu, Jinxia., Florida Atlantic University, Khoshgoftaar, Taghi M., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
- Software quality models are tools for detecting faults early in the software development process. In this research, the TREEDISC algorithm and a general classification rule were used to create classification tree models and predict software quality by classifying software modules as fault-prone or not fault-prone. Software metrics were collected from four consecutive releases of a very large legacy telecommunications system with six subsystems. Using release 1, four classification tree models were built using raw metrics, and another four tree models were built using PCA metrics. Models were then selected based on release 2. Releases 3 and 4 were used to validate the selected model. Models that used PCA metrics were as good as or better than models that used raw metrics. This study also investigated the performance of classification tree models when the subsystem identifier was included as a predictor.
- Date Issued
- 2001
- PURL
- http://purl.flvc.org/fcla/dt/12747
- Subject Headings
- Computer software--Quality control, Software measurement
- Format
- Document (PDF)