Current Search: Machine Learning
- Title
- SCHEMATIC: AN EXPERIMENT IN MACHINE LEARNING USING CONCEPTUAL GRAPHS.
- Creator
- HALTERMAN, RICHARD L., Florida Atlantic University
- Abstract/Description
-
Conceptual graphs form the basis of a powerful representation language for artificial intelligence research. SCHEMATIC is a system that uses a subset of conceptual graph theory in acquiring knowledge about a given domain. SCHEMATIC exhibits two types of learning. It will passively absorb information as imparted by the teacher, and it also has an active learning mode that, based on its current picture of the domain, aggressively queries the teacher for more information. The knowledge base, including the concept type hierarchy, the relation list, canonical forms, and the current domain, is dynamically maintained. Teacher interaction is handled exclusively with conceptual graphs. Action concepts are treated differently by SCHEMATIC, in that, once defined, they execute procedures that alter the domain.
- Date Issued
- 1987
- PURL
- http://purl.flvc.org/fcla/dt/14421
- Subject Headings
- Machine learning, Artificial intelligence
- Format
- Document (PDF)
- Title
- An evaluation of Unsupervised Machine Learning Algorithms for Detecting Fraud and Abuse in the U.S. Medicare Insurance Program.
- Creator
- Da Rosa, Raquel C., Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
The population of people aged 65 and older has increased since the 1960s, and current estimates indicate it will double by 2060. Medicare is a federal health insurance program for people 65 or older in the United States. Medicare claims fraud and abuse is an ongoing issue that wastes a large amount of money every year, resulting in higher health care costs and taxes for everyone. In this study, an empirical evaluation of several unsupervised machine learning approaches is performed, which indicates reasonable fraud detection results. We employ two unsupervised machine learning algorithms, Isolation Forest and Unsupervised Random Forest, which have not been previously used for the detection of fraud and abuse on Medicare data. Additionally, we implement three other machine learning methods previously applied to Medicare data: Local Outlier Factor, Autoencoder, and k-Nearest Neighbor. For our dataset, we combine the 2012 to 2015 Medicare provider utilization and payment data and add fraud labels from the List of Excluded Individuals/Entities (LEIE) database. Results show that Local Outlier Factor is the best model to use for Medicare fraud detection.
- Date Issued
- 2018
- PURL
- http://purl.flvc.org/fau/fd/FA00013042
- Subject Headings
- Machine learning, Medicare fraud, Algorithms
- Format
- Document (PDF)
- Title
- ASSESSING METHODS AND TOOLS TO IMPROVE REPORTING, INCREASE TRANSPARENCY, AND REDUCE FAILURES IN MACHINE LEARNING APPLICATIONS IN HEALTHCARE.
- Creator
- Garbin, Christian, Marques, Oge, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Artificial intelligence (AI) had a few false starts – the AI winters of the 1970s and 1980s. We are now in what looks like an AI summer. There are many useful applications of AI in the field. But there are still unfulfilled promises and outright failures. From self-driving cars that work only in constrained cases, to medical image analysis products that would replace radiologists but never did, we still struggle to translate successful research into successful real-world applications. The software engineering community has accumulated a large body of knowledge over the decades on how to develop, release, and maintain products. AI products, being software products, benefit from some of that accumulated knowledge, but not all of it. AI products diverge from traditional software products in fundamental ways: their main component is not a specific piece of code, written for a specific purpose, but a generic piece of code, a model, customized by a training process driven by hyperparameters and a dataset. Datasets are usually large and models are opaque. We cannot directly inspect them as we can inspect the code of traditional software products. We need other methods to detect failures in AI products.
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013580
- Subject Headings
- Machine learning, Artificial intelligence, Healthcare
- Format
- Document (PDF)
- Title
- GENERATIVE ADVERSARIAL NETWORK DATA GENERATION FOR THE USE OF REAL TIME IMAGE DETECTION IN SIDE-SCAN SONAR IMAGERY.
- Creator
- McGinley, James Patrick, Dhanak, Manhar, Florida Atlantic University, Department of Ocean and Mechanical Engineering, College of Engineering and Computer Science
- Abstract/Description
-
Automatic target recognition of unexploded ordnance in side-scan sonar imagery has been a difficult task, due to the lack of publicly available side-scan sonar data. Real-time image detection and classification algorithms have been implemented to address this task; however, machine learning algorithms require a substantial amount of training data to properly detect specific targets. Transfer learning methods are used to reduce the need for large datasets, by using a pre-trained network on the side-scan sonar images. In the present study, a generative adversarial network is used to generate meaningful sonar imagery from a small dataset. The generated images are then added to the existing dataset to train an image detection and classification algorithm. The study looks to demonstrate that generated images can be used to aid in detecting objects of interest in side-scan sonar imagery.
- Date Issued
- 2019
- PURL
- http://purl.flvc.org/fau/fd/FA00013394
- Subject Headings
- Sidescan sonar, Algorithms, Machine learning
- Format
- Document (PDF)
- Title
- MEASUREMENT, ANALYSIS, CLASSIFICATION AND DETECTION OF GUNSHOT AND GUNSHOT-LIKE SOUNDS.
- Creator
- Baliram, Rajesh Singh, Zhuang, Hanqi, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
The recent uptick in senseless shootings in otherwise quiet and relatively safe environments is powerful evidence of the need, now more than ever, to reduce these occurrences. Artificial intelligence (AI) can play a significant role in deterring individuals from attempting these acts of violence. The installation of audio sensors can assist in the proper surveillance of surroundings linked to public safety, which is the first step toward AI-driven surveillance. With the increasing popularity of machine learning (ML) processes, systems are being developed and optimized to assist personnel in highly dangerous situations. In addition to saving innocent lives, supporting the capture of the responsible criminals is part of the AI algorithm that can be hosted in acoustic gunshot detection systems (AGDSs). Although there has been some speculation that these AGDSs produce a higher false positive rate (FPR) than reported in their specifications, optimizing the dataset used for the model’s training and testing will enhance its performance. This dissertation proposes a new gunshot-like sound database that can be incorporated into a dataset for improved training and testing of an ML gunshot detection model. Reduction of the sample bias (that is, a bias in ML caused by an incomplete database) is achievable. The Mel frequency cepstral coefficient (MFCC) feature extraction process was utilized in this research. The uniform manifold approximation and projection (UMAP) algorithm revealed that the MFCCs of this newly created database were the closest sounds to a gunshot sound, as compared to other gunshot-like sounds reported in the literature. The UMAP algorithm reinforced the outcome derived from the calculation of the distances of the centroids of various gunshot-like sounds in MFCCs’ clusters. Further research was conducted into the feature reduction aspect of the gunshot detection ML model.
Reducing a feature set to a minimum, while also maintaining a high accuracy rate, is a key parameter of a highly efficient model. Therefore, it is necessary for field-deployed ML applications to be computationally lightweight and highly efficient. Building on the discoveries of this research can lead to the development of highly efficient gunshot detection models.
- Date Issued
- 2022
- PURL
- http://purl.flvc.org/fau/fd/FA00014110
- Subject Headings
- Firearms, Sound, Detectors, Machine learning
- Format
- Document (PDF)
- Title
- ADVANCED DATA SCIENCE AND PHYSICS-BASED MODELING FOR DYNAMIC SYSTEMS.
- Creator
- Hashemi, Ali, Jang, Jinwoo, Florida Atlantic University, Department of Civil, Environmental and Geomatics Engineering, College of Engineering and Computer Science
- Abstract/Description
-
This dissertation focuses on the development of data-driven and physics-based modeling for two distinct significant structural engineering applications: time-varying response variables estimation and unwanted lateral vibration control. In the first part, I propose a machine learning (ML)-based surrogate model to directly predict dynamic responses over an entire mechanical system during operations. Any mechanical system design, as well as any structural health monitoring system, requires transient vibration analysis. However, traditional methods and modeling calculations are time- and resource-consuming. The use of ML approaches is particularly promising in scientific and engineering challenges containing processes that are not completely understood, or where it is computationally infeasible to run numerical or analytical models at desired resolutions in space and time. In this research, an ML-based surrogate for the finite element analysis (FEA) approach is developed to forecast the time-varying response, i.e., displacement of a two-dimensional truss structure. Various ML regression algorithms, including decision trees and deep neural networks, are developed to predict movement over a truss structure, and their efficiencies are investigated. ML algorithms have been combined with FEA in preliminary attempts to address issues in static mechanical systems.
- Date Issued
- 2022
- PURL
- http://purl.flvc.org/fau/fd/FA00014048
- Subject Headings
- Dynamics, Data Science, Machine learning
- Format
- Document (PDF)
- Title
- TACKLING BIAS, PRIVACY, AND SCARCITY CHALLENGES IN HEALTH DATA ANALYTICS.
- Creator
- Wang, Shuwen, Zhu, Xingquan, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Health data analysis has emerged as a critical domain with immense potential to revolutionize healthcare delivery, disease management, and medical research. However, it is confronted by formidable challenges, including sample bias, data privacy concerns, and the cost and scarcity of labeled data. These challenges collectively impede the development of accurate and robust machine learning models for various healthcare applications, from disease diagnosis to treatment recommendations. Sample bias and specificity refer to the inherent challenges in working with health datasets that may not be representative of the broader population or may exhibit disparities in their distributions. These biases can significantly impact the generalizability and effectiveness of machine learning models in healthcare, potentially leading to suboptimal outcomes for certain patient groups. Data privacy and locality are paramount concerns in the era of digital health records and wearable devices. The need to protect sensitive patient information while still extracting valuable insights from these data sources poses a delicate balancing act. Moreover, the geographic and jurisdictional differences in data regulations further complicate the use of health data in a global context. Label cost and scarcity pertain to the often labor-intensive and expensive process of obtaining ground-truth labels for supervised learning tasks in healthcare. The limited availability of labeled data can hinder the development and deployment of machine learning models, particularly in specialized medical domains.
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014336
- Subject Headings
- Data analytics, Data mining, Ensemble learning (Machine learning), Machine learning, Health
- Format
- Document (PDF)
- Title
- CONNECTED MULTI-DOMAIN AUTONOMY AND ARTIFICIAL INTELLIGENCE: AUTONOMOUS LOCALIZATION, NETWORKING, AND DATA CONFORMITY EVALUATION.
- Creator
- Tountas, Konstantinos, Pados, Dimitris, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
The objective of this dissertation work is the development of a solid theoretical and algorithmic framework for three of the most important aspects of autonomous/artificial intelligence (AI) systems, namely data quality assurance, localization, and communications. In the era of AI and machine learning (ML), data reign supreme. During learning tasks, we need to ensure that the training data set is correct and complete. During operation, faulty data need to be discovered and dealt with to protect against potentially catastrophic system failures. With our research in data quality assurance, we develop new mathematical theory and algorithms for outlier-resistant decomposition of high-dimensional matrices (tensors) based on L1-norm principal-component analysis (PCA). L1-norm PCA has been proven to be resistant to irregular data points and will drive critical real-world AI learning and autonomous systems operations in the future. At the same time, one of the most important tasks of autonomous systems is self-localization. In GPS-deprived environments, localization becomes a fundamental technical problem. State-of-the-art solutions frequently utilize power-hungry or expensive architectures, making them difficult to deploy. In this dissertation work, we develop and implement a robust, variable-precision localization technique for autonomous systems based on the direction-of-arrival (DoA) estimation theory, which is cost- and power-efficient. Finally, communication between autonomous systems is paramount for mission success in many applications. In the era of 5G and beyond, smart spectrum utilization is key. In this work, we develop physical (PHY) and medium-access-control (MAC) layer techniques that autonomously optimize spectrum usage and minimize intra- and inter-network interference.
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013617
- Subject Headings
- Artificial intelligence, Machine learning, Tensor algebra
- Format
- Document (PDF)
- Title
- Ensemble Learning Algorithms for the Analysis of Bioinformatics Data.
- Creator
- Fazelpour, Alireza, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Developments in advanced technologies, such as DNA microarrays, have generated tremendous amounts of data available to researchers in the field of bioinformatics. These state-of-the-art technologies present not only unprecedented opportunities to study biological phenomena of interest, but also significant challenges in terms of processing the data. Furthermore, these datasets inherently exhibit a number of challenging characteristics, such as class imbalance, high dimensionality, small dataset size, noisy data, and complexity of data in terms of hard-to-distinguish decision boundaries between classes within the data. In recognition of the aforementioned challenges, this dissertation utilizes a variety of machine-learning and data-mining techniques, such as ensemble classification algorithms in conjunction with data sampling and feature selection techniques, to alleviate these problems while improving the classification results of models built on these datasets. However, in building classification models, researchers and practitioners encounter the challenge that there is not a single classifier that performs relatively well in all cases. Thus, numerous classification approaches, such as ensemble learning methods, have been developed to address this problem successfully in a majority of circumstances. Ensemble learning is a promising technique that generates multiple classification models and then combines their decisions into a single final result. Ensemble learning often performs better than single base classifiers in performing classification tasks. This dissertation conducts thorough empirical research by implementing a series of case studies to evaluate how ensemble learning techniques can be utilized to enhance overall classification performance, as well as improve the generalization ability of ensemble models.
This dissertation investigates ensemble learning techniques of the boosting, bagging, and random forest algorithms, and proposes a number of modifications to the existing ensemble techniques in order to further improve the classification results. This dissertation examines the effectiveness of ensemble learning techniques in accounting for the challenging characteristics of class imbalance and difficult-to-learn class decision boundaries. Next, it looks into ensemble methods that are relatively tolerant to class noise, and that not only can account for the problem of class noise but also improve classification performance. This dissertation also examines the joint effects of data sampling along with ensemble techniques to determine whether sampling techniques can further improve the classification performance of built ensemble models.
- Date Issued
- 2016
- PURL
- http://purl.flvc.org/fau/fd/FA00004588
- Subject Headings
- Bioinformatics, Data mining -- Technological innovations, Machine learning
- Format
- Document (PDF)
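The ensemble learning record above describes generating multiple classification models and combining their decisions into a single final result. A minimal sketch of one such combination rule, plurality (majority) voting, might look like the following. The function name `majority_vote` and the toy predictions are assumptions made for illustration; they are not taken from the dissertation, which studies boosting, bagging, and random forest variants.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by plurality.

    `predictions` is a list of equal-length label lists, one list per model
    (a hypothetical input format chosen for this sketch).
    """
    combined = []
    # zip(*predictions) groups the votes that all models cast for one sample.
    for votes in zip(*predictions):
        # most_common(1) yields the (label, count) pair with the highest count.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three base classifiers on four samples.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

With an odd number of binary voters, as here, no ties can occur; a real ensemble would also need a tie-breaking rule and, often, weighted votes.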
- Title
- STREAMLINING CLINICAL DETECTION OF ALZHEIMER’S DISEASE USING ELECTRONIC HEALTH RECORDS AND MACHINE LEARNING TECHNIQUES.
- Creator
- Kleiman, Michael J., Barenholtz, Elan, Florida Atlantic University, Charles E. Schmidt College of Science, Department of Psychology
- Abstract/Description
-
Alzheimer’s disease is typically detected using a combination of cognitive-behavioral assessment exams and interviews of both the patient and a family member or caregiver, both administered and interpreted by a trained physician. This procedure, while standard in medical practice, can be time-consuming and expensive for both the patient and the diagnostician, especially because proper training is required to interpret the collected information and determine an appropriate diagnosis. The use of machine learning techniques to augment diagnostic procedures has been previously examined in limited capacity, but to date no research examines real-world medical applications of predictive analytics for health records and cognitive exam scores. This dissertation seeks to examine the efficacy of detecting cognitive impairment due to Alzheimer’s disease using machine learning, including multi-modal neural network architectures, with a real-world clinical dataset used to determine the accuracy and applicability of the generated models. An in-depth analysis of each type of data (e.g. cognitive exams, questionnaires, demographics) as well as the cognitive domains examined (e.g. memory, attention, language) is performed to identify the most useful targets, with cognitive exams and questionnaires being found to be the most useful features and short-term memory, attention, and language found to be the most important cognitive domains. In an effort to reduce medical costs and streamline procedures, optimally predictive and efficient groups of features were identified and selected, with the best-performing and most economical group containing only three questions and one cognitive exam component, producing an accuracy of 85%. The most effective diagnostic scoring procedure was examined, with simple threshold counting based on medical documentation being identified as the most useful.
Overall predictive analysis found that Alzheimer’s disease can be detected most accurately using a bimodal multi-input neural network model using separated cognitive domains and questionnaires, with a detection accuracy of 88% using the real-world testing set, and that the technique of analyzing domains separately serves to significantly improve model efficacy compared to models that combine them.
- Date Issued
- 2019
- PURL
- http://purl.flvc.org/fau/fd/FA00013326
- Subject Headings
- Alzheimer's disease, Electronic Health Records, Machine learning
- Format
- Document (PDF)
- Title
- SEAWALL DETECTION IN FLORIDA COASTAL AREA FROM HIGH RESOLUTION IMAGERY USING MACHINE LEARNING AND OBIA.
- Creator
- Paudel, Sanjaya, Su, Hongbo, Florida Atlantic University, Department of Civil, Environmental and Geomatics Engineering, College of Engineering and Computer Science
- Abstract/Description
-
In this thesis, a methodology and framework were created to detect seawalls accurately and efficiently in low coastal areas and were evaluated in the study area of Hallandale Beach City, Broward County, Florida. Aerial images collected from the Florida Department of Transportation (FDOT) were processed using eCognition Developer software for Multi-Resolution Segmentation and Classification of objects. Two classification approaches, pixel-based image analysis and object-based image analysis (OBIA), were applied for image classification. However, pixel-based classification was discarded because of its lower accuracy. Three techniques within object-based classification (a machine learning technique, a knowledge-based technique, and machine learning followed by a knowledge-based technique) were compared to find the most efficient method of classification. Within the machine learning technique, three algorithms (random forest, support vector machine, and decision tree) were tested to identify the best one. Of all the approaches used, the combination of machine learning and a knowledge-based method was able to map the seawall effectively.
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013802
- Subject Headings
- Image analysis, Coasts--Florida, Machine learning
- Format
- Document (PDF)
- Title
- Machine Learning Algorithms with Big Medicare Fraud Data.
- Creator
- Bauder, Richard Andrew, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Healthcare is an integral component in people's lives, especially for the rising elderly population, and must be affordable. The United States Medicare program is vital in serving the needs of the elderly. The growing number of people enrolled in the Medicare program, along with the enormous volume of money involved, increases the appeal for, and risk of, fraudulent activities. For many real-world applications, including Medicare fraud, the interesting observations tend to be less frequent than the normative observations. This difference between the normal observations and those observations of interest can create highly imbalanced datasets. The problem of class imbalance, including the classification of rare cases that indicate extreme class imbalance, is an important and well-studied area in machine learning. Research on the effects of class imbalance with big data in the real-world Medicare fraud application domain, however, is limited. In particular, detecting fraud in Medicare claims is critical in lessening the financial and personal impacts of these transgressions. Fortunately, the healthcare domain is one such area where the successful detection of fraud can garner meaningful positive results. The application of machine learning techniques, plus methods to mitigate the adverse effects of class imbalance and rarity, can be used to detect fraud and lessen the impacts for all Medicare beneficiaries. This dissertation presents the application of machine learning approaches to detect Medicare provider claims fraud in the United States. We discuss novel techniques to process three big Medicare datasets and create a new, combined dataset, which includes mapping fraud labels associated with known excluded providers. We investigate the ability of machine learning techniques, unsupervised and supervised, to detect Medicare claims fraud and leverage data sampling methods to lessen the impact of class imbalance and increase fraud detection performance.
Additionally, we extend the study of class imbalance to assess the impacts of rare cases in big data for Medicare fraud detection.
- Date Issued
- 2018
- PURL
- http://purl.flvc.org/fau/fd/FA00013108
- Subject Headings
- Medicare fraud, Big data, Machine learning, Algorithms
- Format
- Document (PDF)
- Title
- MACHINE LEARNING ALGORITHMS FOR THE DETECTION AND ANALYSIS OF WEB ATTACKS.
- Creator
- Zuech, Richard, Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
The Internet has provided humanity with many great benefits, but it has also introduced new risks and dangers. E-commerce and other web portals have become large industries with big data. Criminals and other bad actors constantly seek to exploit these web properties through web attacks. Being able to properly detect these web attacks is a crucial component in the overall cybersecurity landscape. Machine learning is one tool that can assist in detecting web attacks. However, properly using machine learning to detect web attacks does not come without its challenges. Classification algorithms can have difficulty with severe levels of class imbalance. Class imbalance occurs when one class label disproportionately outnumbers another class label. For example, in cybersecurity, it is common for the negative (normal) label to severely outnumber the positive (attack) label. Another difficulty encountered in machine learning is that models can be complex, thus making it difficult for even subject matter experts to truly understand a model’s detection process. Moreover, it is important for practitioners to determine which input features to include or exclude in their models for optimal detection performance. This dissertation studies machine learning algorithms in detecting web attacks with big data. Severe class imbalance is a common problem in cybersecurity, and mainstream machine learning research does not sufficiently consider this with web attacks. Our research first investigates the problems associated with severe class imbalance and rarity. Rarity is an extreme form of class imbalance where the positive class has an extremely low instance count, thus making it difficult for the classifiers to discriminate. In reducing imbalance, we demonstrate that random undersampling can effectively mitigate the class imbalance and rarity problems associated with web attacks. Furthermore, our research introduces a novel feature popularity technique which produces easier-to-understand models by including only the few most popular features. Feature popularity granted us new insights into the web attack detection process, even though we had already intensely studied it. Even so, we proceed cautiously in selecting the best input features, as we determined that the “most important” Destination Port feature might be contaminated by lopsided traffic distributions.
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013823
- Subject Headings
- Machine learning, Computer security, Algorithms, Cybersecurity
- Format
- Document (PDF)
- Title
- COLLECTION AND ANALYSIS OF SLOW DENIAL OF SERVICE ATTACKS USING MACHINE LEARNING ALGORITHMS.
- Creator
- Kemp, Clifford, Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Application-layer based attacks are becoming a more desirable target in computer networks for hackers. From complex rootkits to Denial of Service (DoS) attacks, hackers look to compromise computer networks. Web and application servers can get shut down by various application-layer DoS attacks, which exhaust CPU or memory resources. The HTTP protocol has become a popular target to launch application-layer DoS attacks. These exploits consume less bandwidth than traditional DoS attacks. Furthermore, this type of DoS attack is hard to detect because its network traffic resembles legitimate network requests. Being able to detect these DoS attacks effectively is a critical component of any robust cybersecurity system. Machine learning can help detect DoS attacks by identifying patterns in network traffic. With machine learning methods, predictive models can automatically detect network threats. This dissertation offers a novel framework for collecting several attack datasets on a live production network, where producing quality representative data is a requirement. Our approach builds datasets from collected Netflow and Full Packet Capture (FPC) data. We evaluate a wide range of machine learning classifiers, which allows us to analyze slow DoS detection models more thoroughly. To identify attacks, we look at each dataset's unique traffic patterns and distinguishing properties. This research evaluates and investigates appropriate feature selection evaluators and search strategies. Features are assessed for their predictive value and degree of redundancy to build a subset of features. Feature subsets with high class correlation but low intercorrelation are favored. Experimental results indicate Netflow and FPC features are discriminating enough to detect DoS attacks accurately. We conduct a comparative examination of performance metrics to determine the capability of several machine learning classifiers. Additionally, we improve upon our performance scores by investigating a variety of feature selection optimization strategies. Overall, this dissertation proposes a novel machine learning approach for detecting slow DoS attacks. Our machine learning results demonstrate that a single subset of features trained on Netflow data can effectively detect slow application-layer DoS attacks.
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013848
- Subject Headings
- Machine learning, Algorithms, Denial of service attacks
- Format
- Document (PDF)
- Title
- A REVIEW AND ANALYSIS OF BOT-IOT SECURITY DATA FOR MACHINE LEARNING.
- Creator
- Peterson, Jared M., Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Machine learning is having an increased impact on the Cyber Security landscape. The ability of predictive models to accurately identify attack patterns in security data is set to overtake more traditional detection methods. Industry demand has led to an uptick in research in the application of machine learning for Cyber Security. To facilitate this research, many datasets have been created and made public. This thesis provides an in-depth analysis of one of the newest datasets, Bot-IoT. The full dataset contains about 73 million instances (big data), 3 dependent features, and 43 independent features. The purpose of this thesis is to provide researchers with a foundational understanding of Bot-IoT, its development, its features, its composition, and its pitfalls. It will also summarize many of the published works that utilize Bot-IoT and will propose new areas of research based on the issues identified in the current research and in the dataset.
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013838
- Subject Headings
- Machine learning, Cyber security, Big data
- Format
- Document (PDF)
- Title
- A MACHINE LEARNING APPROACH FOR OCEAN EVENT MODELING AND PREDICTION.
- Creator
- Muhamed, Ali Ali Abdullateef, Zhuang, Hanqi, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
In the last decade, deep learning models have been successfully applied to a variety of applications and solved many tasks. The ultimate goal of this study is to produce deep learning models to improve the skills of forecasting ocean dynamic events in general and those of the Loop Current (LC) system in particular. A specific forecast target is to predict the geographic location of the LC extension and duration, LC eddy shedding events for a long lead time with high accuracy. Also, this study aims to improve the predictability of velocity fields (or more precisely, velocity volumes) of subsurface currents. In this dissertation, several deep learning based prediction models have been proposed. The core of these models is the Long Short-Term Memory (LSTM) network. This type of recurrent neural network is trained with Sea Surface Height (SSH) and LC velocity datasets. The hyperparameters of these models are tuned according to each model's characteristics and data complexity. Prior to training, SSH and velocity data are decomposed into their temporal and spatial counterparts. A model that uses Robust Principal Component Analysis is proposed first, which produces a six-week lead time in forecasting SSH evolution. Next, the Wavelet+EOF+LSTM (WELL) model is proposed to improve the forecasting capability of a prediction model. This model is tested on the prediction of two LC eddies, namely eddies Cameron and Darwin. It is shown that the WELL model can predict the separation of both eddies 10 and 14 weeks ahead, respectively, which is two more weeks than the DAC model. Furthermore, the WELL model overcomes the problem due to the partitioning step involved in the DAC model. For subsurface currents forecasting, a layer partitioning method is proposed to predict the subsurface field of the LC system. A weighted average fusion is used to improve the consistency of the predicted layers of the 3D subsurface velocity field. The main challenge in forecasting the LC and its eddies is the small number of events that have occurred over time, only one or two a year, which makes the training task difficult. Forecasting the velocity of subsurface currents is equally challenging because of the limited in situ measurements.
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013727
- Subject Headings
- Machine learning, Loop Current, Oceanography--Forecasting
- Format
- Document (PDF)
- Title
- MACHINE LEARNING METHODS FOR IMAGE ENHANCEMENT IN DEGRADED VISUAL ENVIRONMENTS.
- Creator
- Estrada, Dennis, Tang, Yufei, Ouyang, Bing, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Significant reduction in space, weight, power, and cost (SWAP-C) of imaging hardware has induced a paradigm shift in remote sensing where unmanned platforms have become the mainstay. However, mitigating the degraded visual environment (DVE) remains an issue. DVEs can cause a loss of contrast and image detail due to particle scattering and distortion due to turbulence-induced effects. The problem is especially challenging when imaging from unmanned platforms such as autonomous underwater vehicles (AUVs) and unmanned aerial vehicles (UAVs). While single-frame image restoration techniques have been studied extensively in recent years, single image capture is not adequate to address the effects of DVEs due to under-sampling, low dynamic range, and chromatic aberration. Significant progress has been made in employing multi-frame image fusion techniques that take advantage of spatial and temporal information to aid in the recovery of corrupted image detail and high-frequency content and to increase dynamic range.
- Date Issued
- 2022
- PURL
- http://purl.flvc.org/fau/fd/FA00013987
- Subject Headings
- Image Enhancement, Machine learning, Remote sensing
- Format
- Document (PDF)
- Title
- ADDRESSING HIGHLY IMBALANCED BIG DATA CHALLENGES FOR MEDICARE FRAUD DETECTION.
- Creator
- Johnson, Justin M., Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Access to affordable healthcare is a nationwide concern that impacts most of the United States population. Medicare is a federal government healthcare program that aims to provide affordable health insurance to the elderly population and individuals with select disabilities. Unfortunately, there is a significant amount of fraud, waste, and abuse within the Medicare system that inevitably raises premiums and costs taxpayers billions of dollars each year. Dedicated task forces investigate the most severe fraudulent cases, but with millions of healthcare providers and more than 60 million active Medicare beneficiaries, manual fraud detection efforts are not able to make widespread, meaningful impact. Through the proliferation of electronic health records and continuous breakthroughs in data mining and machine learning, there is a great opportunity to develop and leverage advanced machine learning systems for automating healthcare fraud detection. This dissertation identifies key challenges associated with predictive modeling for large-scale Medicare fraud detection and presents innovative solutions to address these challenges in order to provide state-of-the-art results on multiple real-world Medicare fraud data sets. Our methodology for curating nine distinct Medicare fraud classification data sets is presented with comprehensive details describing data accumulation, data pre-processing, data aggregation techniques, data enrichment strategies, and improved fraud labeling. Data-level and algorithm-level methods for treating severe class imbalance, including a flexible output thresholding method and a cost-sensitive framework, are evaluated using deep neural network and ensemble learners. Novel encoding techniques and representation learning methods for high-dimensional categorical features are proposed to create expressive representations of provider attributes and billing procedure codes.
- Date Issued
- 2022
- PURL
- http://purl.flvc.org/fau/fd/FA00014057
- Subject Headings
- Medicare fraud, Big data, Machine learning
- Format
- Document (PDF)
- Title
- INVESTIGATING AND IMPROVING FAIRNESS AND BIAS IN MACHINE LEARNING MODELS FOR DERMATOLOGY.
- Creator
- Corbin, Adam, Marques, Oge, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Advancements in Artificial Intelligence (AI) and Machine Learning (ML) have significantly improved their application in dermatology. However, bias issues in AI systems can result in missed diagnoses and disparities in healthcare, especially for individuals with different skin types. This dissertation aims to investigate and improve the fairness and bias in machine learning models for dermatology by evaluating and enhancing their performance across different Fitzpatrick skin types. The technical contributions of the dissertation include generating metadata for Fitzpatrick Skin Type using Individual Typology Angle; outlining best practices for Explainable AI (XAI) and the use of colormaps; developing and enhancing ML models through skin color transformation and extending the models to include XAI methods for better interpretation and improvement of fairness and bias; and providing a list of steps for successful application of deep learning in medical image analysis. The research findings of this dissertation have the potential to contribute to the development of fair and unbiased AI/ML models in dermatology. This can ultimately lead to better health outcomes and reduced healthcare costs, particularly for individuals with different skin types.
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014131
- Subject Headings
- Diagnostic Imaging, Machine learning, Dermatology, Artificial intelligence
- Format
- Document (PDF)
- Title
- QUANTIFICATION OF PERMAFROST THAW DEPTH AND SNOW DEPTH IN INTERIOR ALASKA AT MULTIPLE SCALES USING FIELD, AIRBORNE, AND SPACEBORNE DATA.
- Creator
- Brodylo, David, Zhang, Caiyun, Florida Atlantic University, Department of Geosciences, Charles E. Schmidt College of Science
- Abstract/Description
-
Much of Interior Alaska contains permafrost, which is a permanently frozen layer found within or at the surface of the Earth. Historically, this permafrost has experienced relative stability, with limited thaw during warmer summer months and fire events. However, largely due to the impact of a warming climate, among other factors, permafrost that would typically experience limited thawing during the summer season has recently been thawing at an unprecedented rate. Trapped by this layer of permafrost is a large quantity of carbon (C), which could be released into the atmosphere as greenhouse gases such as carbon dioxide (CO2) and methane (CH4). Due to the remoteness of the Arctic, there is a lack of yearly recorded permafrost thaw depth and snow depth values across much of the region. As such, the focus of this research was to establish a framework to identify how permafrost thaw depth and snow depth can be predicted across both a 1 km² local scale and a 100 km² regional scale in Interior Alaska by a combination of 1 m² field data, airborne and spaceborne remote sensing products, and object-based machine learning techniques from 2014 to 2022. The machine learning techniques Random Forest, Support Vector Machine, k-Nearest Neighbor, Multiple Linear Regression, and Ensemble Analysis were applied to predict the permafrost thaw depth and snow depth. Results indicated that this methodology was able to successfully upscale both the 1 m² field permafrost thaw depth and snow depth data to a 1 km² local scale before successfully further upscaling the estimated results to a 100 km² regional scale, while also linking the estimated values with ecotypes. The best results were produced by Ensemble Analysis, which tended to have the highest Pearson’s Correlation Coefficient, alongside the lowest Mean Absolute Error and Root Mean Square Error. Both Random Forest and k-Nearest Neighbor also provided encouraging results. The presence or absence of a thick canopy cover was strongly connected with thaw depth and snow depth estimates. Image resolution was an important factor when upscaling field data to the local scale; however, it was overall less critical for further upscaling to the regional scale.
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014229
- Subject Headings
- Permafrost--Alaska, Remote sensing, Machine learning
- Format
- Document (PDF)