Current Search: Florida Atlantic University » Machine learning
- Title
- SCHEMATIC: AN EXPERIMENT IN MACHINE LEARNING USING CONCEPTUAL GRAPHS.
- Creator
- HALTERMAN, RICHARD L., Florida Atlantic University
- Abstract/Description
-
Conceptual graphs form the basis of a powerful representation language for artificial intelligence research. SCHEMATIC is a system that uses a subset of conceptual graph theory in acquiring knowledge about a given domain. SCHEMATIC exhibits two types of learning. It will passively absorb information as imparted by the teacher, and it also has an active learning mode that, based on its current picture of the domain, aggressively queries the teacher for more information. The knowledge base, including the concept type hierarchy, the relation list, canonical forms, and the current domain, is dynamically maintained. Teacher interaction is handled exclusively with conceptual graphs. Action concepts are treated differently by SCHEMATIC, in that, once defined, they execute procedures that alter the domain.
- Date Issued
- 1987
- PURL
- http://purl.flvc.org/fcla/dt/14421
- Subject Headings
- Machine learning, Artificial intelligence
- Format
- Document (PDF)
- Title
- An Empirical Study of Performance Metrics for Classifier Evaluation in Machine Learning.
- Creator
- Bruhns, Stefan, Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
-
A variety of classifiers for solving classification problems is available from the domain of machine learning. Commonly used classifiers include support vector machines, decision trees and neural networks. These classifiers can be configured by modifying internal parameters. The large number of available classifiers and the different configuration possibilities result in a large number of combinations of classifier and configuration settings, leaving the practitioner with the problem of evaluating the performance of different classifiers. This problem can be solved by using performance metrics. However, the large number of available metrics causes difficulty in deciding which metrics to use and when comparing classifiers on the basis of multiple metrics. This paper uses the statistical method of factor analysis in order to investigate the relationships between several performance metrics and introduces the concept of relative performance, which has the potential to ease the process of comparing several classifiers. The relative performance metric is also used to evaluate different support vector machine classifiers and to determine if the default settings in the Weka data mining tool are reasonable.
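The raw material for the metric comparison this abstract describes can be sketched in a few lines. This is not code from the paper; the two classifiers and their confusion-matrix counts below are invented for illustration:

```python
# Minimal sketch: derive several common performance metrics from one
# confusion matrix (tp, fp, fn, tn). Counts are hypothetical.

def classification_metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # a.k.a. true positive rate
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

# Two hypothetical classifiers evaluated on the same 100-instance test set.
clf_a = classification_metrics(tp=50, fp=10, fn=5, tn=35)
clf_b = classification_metrics(tp=40, fp=5, fn=15, tn=40)
```

A factor analysis like the one in the paper would then be run over many such metric vectors, one per classifier/configuration, to find groups of metrics that vary together.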
- Date Issued
- 2008
- PURL
- http://purl.flvc.org/fau/fd/FA00012508
- Subject Headings
- Machine learning, Computer algorithms, Pattern recognition systems, Data structures (Computer science), Kernel functions, Pattern perception--Data processing
- Format
- Document (PDF)
- Title
- Evolutionary Methods for Mining Data with Class Imbalance.
- Creator
- Drown, Dennis J., Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
-
Class imbalance tends to cause inferior performance in data mining learners, particularly with regard to predicting the minority class, which generally imposes a higher misclassification cost. This work explores the benefits of using genetic algorithms (GA) to develop classification models which are better able to deal with the problems encountered when mining datasets which suffer from class imbalance. Using GA we evolve configuration parameters suited for skewed datasets for three different learners: artificial neural networks, C4.5 decision trees, and RIPPER. We also propose a novel technique called evolutionary sampling which works to remove noisy and unnecessary duplicate instances so that the sampled training data will produce a superior classifier for the imbalanced dataset. Our GA fitness function uses metrics appropriate for dealing with class imbalance, in particular the area under the ROC curve. We perform extensive empirical testing on these techniques and compare the results with seven existing sampling methods.
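The core pairing in this abstract, a genetic algorithm whose fitness function is the area under the ROC curve, can be illustrated with a toy sketch. This is not the dissertation's implementation (its learners, datasets, and evolutionary sampling are not reproduced); here a GA evolves the weights of a simple linear scorer on an invented imbalanced dataset:

```python
import random

def roc_auc(scores, labels):
    # Mann-Whitney formulation: probability a positive outranks a negative.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def evolve_weights(X, y, pop_size=20, generations=30, seed=0):
    # GA over linear-scorer weights; fitness = AUC on the training data.
    rng = random.Random(seed)
    dim = len(X[0])
    score = lambda w: [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
    fitness = lambda w: roc_auc(score(w), y)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)       # midpoint crossover + mutation
            children.append([(ai + bi) / 2 + rng.gauss(0, 0.1)
                             for ai, bi in zip(a, b)])
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Tiny imbalanced toy set: 4 positives vs 12 negatives, separable on feature 0.
X = [[1.5, 0.1], [1.2, -0.3], [1.8, 0.4], [1.4, 0.0]] + \
    [[0.1 * i, 0.3] for i in range(12)]
y = [1, 1, 1, 1] + [0] * 12
weights, auc = evolve_weights(X, y)
```

Because AUC is insensitive to class priors, a 3:1 skew in the toy data does not distort the fitness signal the way plain accuracy would.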
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012515
- Subject Headings
- Combinatorial group theory, Data mining, Machine learning, Data structures (Computer science)
- Format
- Document (PDF)
- Title
- An empirical study of combining techniques in software quality classification.
- Creator
- Eroglu, Cemal., Florida Atlantic University, Khoshgoftaar, Taghi M.
- Abstract/Description
-
In the literature, there has been limited research that systematically investigates the possibility of exercising a hybrid approach by simply learning from the output of numerous base-level learners. We analyze a hybrid learning approach on systems that had previously been modeled with twenty-four different classifiers. Instead of relying on only one classifier's judgment, it is expected that taking into account the opinions of several learners is a wise decision. Moreover, by using clustering techniques some base-level classifiers were eliminated from the hybrid learner input. We had three different experiments, each with a different number of base-level classifiers. We empirically show that the hybrid learning approach generally yields better performance than the best selected base-level learners and majority voting under some conditions.
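The contrast the abstract draws, combining base-level outputs with a learned combiner versus plain majority voting, can be made concrete with a minimal sketch. None of this is the thesis's code (its twenty-four classifiers and clustering step are not shown); the three base learners below are invented, and the combiner is the simplest possible one, accuracy-based weights:

```python
def majority_vote(preds):
    # preds: one 0/1 prediction per base classifier for a single instance.
    return 1 if sum(preds) * 2 >= len(preds) else 0

def fit_accuracy_weights(base_preds, y):
    # base_preds: per-classifier lists of 0/1 predictions on validation data.
    # Weight each base learner by its validation accuracy.
    n = len(y)
    return [sum(int(p[i] == y[i]) for i in range(n)) / n for p in base_preds]

def weighted_vote(preds, weights):
    one = sum(w for p, w in zip(preds, weights) if p == 1)
    zero = sum(w for p, w in zip(preds, weights) if p == 0)
    return 1 if one >= zero else 0

# Validation labels and three hypothetical base classifiers' predictions.
y_val = [1, 1, 1, 0, 0]
base = [[1, 1, 1, 0, 0],   # strong learner
        [0, 0, 1, 0, 0],   # mediocre learner
        [0, 0, 0, 1, 1]]   # learner that is always wrong here
weights = fit_accuracy_weights(base, y_val)   # [1.0, 0.6, 0.0]
```

On a new instance where the base outputs are [1, 0, 0], majority voting says 0, but the accuracy-weighted vote follows the strong learner and says 1, which is the kind of gain over majority voting the study reports under some conditions.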
- Date Issued
- 2004
- PURL
- http://purl.flvc.org/fcla/dt/13162
- Subject Headings
- Computer software--Testing, Computer software--Quality control, Computational learning theory, Machine learning, Digital computer simulation
- Format
- Document (PDF)
- Title
- Efficient Machine Learning Algorithms for Identifying Risk Factors of Prostate and Breast Cancers among Males and Females.
- Creator
- Rikhtehgaran, Samaneh, Muhammad, Wazir, Florida Atlantic University, Department of Physics, Charles E. Schmidt College of Science
- Abstract/Description
-
One of the most common types of cancer among women is breast cancer. It represents one of the diseases leading to a high number of mortalities among women. On the other hand, prostate cancer is the second most frequent malignancy in men worldwide. The early detection of prostate cancer is fundamental to reduce mortality and increase the survival rate. A comparison between six types of machine learning models, namely Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, k-Nearest Neighbors, and Naïve Bayes, has been performed. This research aims to identify the most efficient machine learning algorithms for identifying the most significant risk factors of prostate and breast cancers. For this reason, National Health Interview Survey (NHIS) and Prostate, Lung, Colorectal, and Ovarian (PLCO) datasets are used. A comprehensive comparison of risk factors leading to these two crucial cancers can significantly impact early detection and progressive improvement in survival.
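Comparing several models on the same data, as this study does, rests on a cross-validation harness. The sketch below is not the thesis's code (the NHIS/PLCO data and the six named models are not reproduced); it shows the harness itself, with a toy nearest-centroid classifier and an invented two-cluster dataset standing in:

```python
import random

def k_fold_splits(n, k, seed=0):
    # Yield (train_idx, test_idx) pairs for k-fold cross-validation.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def nearest_centroid_fit(X, y):
    # Stand-in model: per-class feature means.
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], x))

def cross_val_accuracy(fit, predict, X, y, k=5):
    accs = []
    for train, test in k_fold_splits(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        accs.append(hits / len(test))
    return sum(accs) / len(accs)

# Two well-separated synthetic clusters, ten points each.
X = [[0.1 * i, 0.1 * i] for i in range(10)] + \
    [[3 + 0.1 * i, 3 + 0.1 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
```

Running `cross_val_accuracy(nearest_centroid_fit, nearest_centroid_predict, X, y)` for each candidate model, as a comparison like this study's implicitly does, yields one score per model on identical folds, so the comparison is fair.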
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013755
- Subject Headings
- Machine learning, Algorithms, Cancer--Risk factors, Breast--Cancer, Prostate--Cancer
- Format
- Document (PDF)
- Title
- STREAMLINING CLINICAL DETECTION OF ALZHEIMER’S DISEASE USING ELECTRONIC HEALTH RECORDS AND MACHINE LEARNING TECHNIQUES.
- Creator
- Kleiman, Michael J., Barenholtz, Elan, Florida Atlantic University, Charles E. Schmidt College of Science, Department of Psychology
- Abstract/Description
-
Alzheimer’s disease is typically detected using a combination of cognitive-behavioral assessment exams and interviews of both the patient and a family member or caregiver, both administered and interpreted by a trained physician. This procedure, while standard in medical practice, can be time consuming and expensive for both the patient and the diagnostician, especially because proper training is required to interpret the collected information and determine an appropriate diagnosis. The use of machine learning techniques to augment diagnostic procedures has been previously examined in limited capacity, but to date no research examines real-world medical applications of predictive analytics for health records and cognitive exam scores. This dissertation seeks to examine the efficacy of detecting cognitive impairment due to Alzheimer’s disease using machine learning, including multi-modal neural network architectures, with a real-world clinical dataset used to determine the accuracy and applicability of the generated models. An in-depth analysis of each type of data (e.g. cognitive exams, questionnaires, demographics) as well as the cognitive domains examined (e.g. memory, attention, language) is performed to identify the most useful targets, with cognitive exams and questionnaires being found to be the most useful features and short-term memory, attention, and language found to be the most important cognitive domains. In an effort to reduce medical costs and streamline procedures, optimally predictive and efficient groups of features were identified and selected, with the best performing and economical group containing only three questions and one cognitive exam component, producing an accuracy of 85%. The most effective diagnostic scoring procedure was examined, with simple threshold counting based on medical documentation being identified as the most useful.
Overall predictive analysis found that Alzheimer’s disease can be detected most accurately using a bimodal multi-input neural network model using separated cognitive domains and questionnaires, with a detection accuracy of 88% using the real-world testing set, and that the technique of analyzing domains separately serves to significantly improve model efficacy compared to models that combine them.
- Date Issued
- 2019
- PURL
- http://purl.flvc.org/fau/fd/FA00013326
- Subject Headings
- Alzheimer's disease, Electronic Health Records, Machine learning
- Format
- Document (PDF)
- Title
- MODELING GROUND ELEVATION OF LOUISIANA COASTAL WETLANDS AND ANALYZING RELATIVE SEA LEVEL RISE INUNDATION USING RSET-MH AND LIDAR MEASUREMENTS.
- Creator
- Liu, Jing, Zhang, Caiyun, Florida Atlantic University, Department of Geosciences, Charles E. Schmidt College of Science
- Abstract/Description
-
The Louisiana coastal ecosystem is experiencing increasing threats from human flood control construction, sea-level rise (SLR), and subsidence. Louisiana lost about 4,833 km2 of coastal wetlands from 1932 to 2016, and concern exists whether remaining wetlands will persist while facing the highest rate of relative sea-level rise (RSLR) in the world. Restoration aimed at rehabilitating the ongoing and future disturbances is currently underway through the implementation of the Coastal Wetlands Planning Protection and Restoration Act of 1990 (CWPPRA). To effectively monitor the progress of projects in CWPPRA, the Coastwide Reference Monitoring System (CRMS) was established in 2006. To date, more than a decade of valuable coastal, environmental, and ground elevation data have been collected and archived. This dataset offers a unique opportunity to evaluate the wetland ground elevation dynamics by linking the Rod Surface Elevation Table (RSET) measurements with environmental variables like water salinity and biophysical variables like canopy coverage. This dissertation research examined the effects of the environmental and biophysical variables on wetland terrain elevation by developing innovative machine learning based models to quantify the contribution of each factor using the CRMS collected dataset. Three modern machine learning algorithms, including Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN), were assessed and cross-compared with the commonly used Multiple Linear Regression (MLR). The results showed that RF had the best performance in modeling ground elevation with Root Mean Square Error (RMSE) of 10.8 cm and correlation coefficient (r) = 0.74. The top four factors contributing to ground elevation are the distance from monitoring station to closest water source, water salinity, water elevation, and dominant vegetation height.
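The two figures of merit quoted above, RMSE and the correlation coefficient r, are straightforward to compute. The sketch below is illustrative only (the elevation values are invented, not CRMS data):

```python
import math

def rmse(predicted, observed):
    # Root Mean Square Error between paired prediction/observation lists.
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def pearson_r(a, b):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# Hypothetical ground elevations in cm: model predictions vs. RSET readings.
pred = [32.0, 28.5, 41.0, 36.5]
obs = [30.0, 30.0, 40.0, 38.0]
error = rmse(pred, obs)
corr = pearson_r(pred, obs)
```

A model comparison like the dissertation's would report these two numbers per algorithm (RF, SVM, ANN, MLR) over held-out CRMS stations.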
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013568
- Subject Headings
- Coastal zone management--Louisiana, Sea level rise, Inundations, Wetland restoration--Louisiana, Machine learning, Computer simulation, Algorithms.
- Format
- Document (PDF)
- Title
- QUANTIFICATION OF PERMAFROST THAW DEPTH AND SNOW DEPTH IN INTERIOR ALASKA AT MULTIPLE SCALES USING FIELD, AIRBORNE, AND SPACEBORNE DATA.
- Creator
- Brodylo, David, Zhang, Caiyun, Florida Atlantic University, Department of Geosciences, Charles E. Schmidt College of Science
- Abstract/Description
-
Much of Interior Alaska contains permafrost, which is a permanently frozen layer found within or at the surface of the Earth. Historically, this permafrost has experienced relative stability, with limited thaw during warmer summer months and fire events. However, largely due to the impact of a warming climate, among other factors, permafrost that would typically experience limited thawing during the summer season has recently been thawing at an unprecedented rate. Trapped by this layer of permafrost is a large quantity of carbon (C), which could be released into the atmosphere as greenhouse gases such as carbon dioxide (CO2) and methane (CH4). Due to the remoteness of the Arctic, there is a lack of yearly recorded permafrost thaw depth and snow depth values across much of the region. As such, the focus of this research was to establish a framework to identify how permafrost thaw depth and snow depth can be predicted across both a 1 km2 local scale and a 100 km2 regional scale in Interior Alaska by a combination of 1 m2 field data, airborne and spaceborne remote sensing products, and object-based machine learning techniques from 2014 – 2022. Machine learning techniques Random Forest, Support Vector Machine, k-Nearest Neighbor, Multiple Linear Regression, and Ensemble Analysis were applied to predict the permafrost thaw depth and snow depth. Results indicated that this methodology was able to successfully upscale both the 1 m2 field permafrost thaw depth and snow depth data to a 1 km2 local scale before successfully further upscaling the estimated results to a 100 km2 regional scale, while also linking the estimated values with ecotypes. The best results were produced by Ensemble Analysis, which tended to have the highest Pearson’s Correlation Coefficient, alongside the lowest Mean Absolute Error and Root Mean Square Error. Both Random Forest and k-Nearest Neighbor also provided encouraging results.
The presence or absence of a thick canopy cover was strongly connected with thaw depth and snow depth estimates. Image resolution was an important factor when upscaling field data to the local scale, however it was overall less critical for further upscaling to the regional scale.
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014229
- Subject Headings
- Permafrost--Alaska, Remote sensing, Machine learning
- Format
- Document (PDF)
- Title
- FACIAL EXPRESSION PROCESSING IN AUTISM SPECTRUM DISORDER AS A FUNCTION OF ALEXITHYMIA: AN EYE MOVEMENT STUDY.
- Creator
- Escobar, Brian, Hong, Sang Wook, Florida Atlantic University, Department of Psychology, Charles E. Schmidt College of Science
- Abstract/Description
-
The perception and interpretation of faces provides individuals with a wealth of knowledge that enables them to navigate their social environments more successfully. Prior research has hypothesized that the decreased facial expression recognition (FER) abilities observed in autism spectrum disorder (ASD) may be better explained by comorbid alexithymia, the alexithymia hypothesis. The present study sought to further examine the alexithymia hypothesis by collecting data from 59 participants and examining FER performance and eye movement patterns for ASD and neurotypical (NT) individuals while controlling for alexithymia severity. Eye movement-related differences and similarities were examined via eye tracking in conjunction with statistical and machine-learning-based pattern classification analysis. In multiple different classifying conditions, where the classifier was fed 1,718 scanpath images (either at spatial, spatial-temporal, or spatial-temporal-ordinal levels) for high-alexithymic ASD, high-alexithymic NT, low-alexithymic ASD, and low-alexithymic NT, we could accurately decode significantly above chance level. Additionally, in the cross-decoding analysis where the classifier was fed 1,718 scanpath images for high- and low-alexithymic ASD individuals and tested on high- and low-alexithymic NT individuals, results showed that classification accuracy was significantly above chance level when using spatial images of eye movement patterns. Regarding FER performance results, we found that ASD and NT groups performed similarly, but at lower intensities of expressions, ASD individuals performed significantly worse than NT individuals. Together, these findings suggest that there may be eye-movement-related differences between ASD and NT individuals, which may interact with alexithymia traits.
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014358
- Subject Headings
- Autism Spectrum Disorder, Machine learning, Facial expression, Alexithymia, Eye tracking
- Format
- Document (PDF)
- Title
- CONNECTED MULTI-DOMAIN AUTONOMY AND ARTIFICIAL INTELLIGENCE: AUTONOMOUS LOCALIZATION, NETWORKING, AND DATA CONFORMITY EVALUATION.
- Creator
- Tountas, Konstantinos, Pados, Dimitris, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
The objective of this dissertation work is the development of a solid theoretical and algorithmic framework for three of the most important aspects of autonomous/artificial-intelligence (AI) systems, namely data quality assurance, localization, and communications. In the era of AI and machine learning (ML), data reign supreme. During learning tasks, we need to ensure that the training data set is correct and complete. During operation, faulty data need to be discovered and dealt with to protect from (potentially catastrophic) system failures. With our research in data quality assurance, we develop new mathematical theory and algorithms for outlier-resistant decomposition of high-dimensional matrices (tensors) based on L1-norm principal-component analysis (PCA). L1-norm PCA has been proven to be resistant to irregular data points and will drive critical real-world AI learning and autonomous systems operations in the future. At the same time, one of the most important tasks of autonomous systems is self-localization. In GPS-deprived environments, localization becomes a fundamental technical problem. State-of-the-art solutions frequently utilize power-hungry or expensive architectures, making them difficult to deploy. In this dissertation work, we develop and implement a robust, variable-precision localization technique for autonomous systems based on direction-of-arrival (DoA) estimation theory, which is cost- and power-efficient. Finally, communication between autonomous systems is paramount for mission success in many applications. In the era of 5G and beyond, smart spectrum utilization is key. In this work, we develop physical (PHY) and medium-access-control (MAC) layer techniques that autonomously optimize spectrum usage and minimize intra- and inter-network interference.
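The outlier resistance of L1-norm PCA mentioned above can be illustrated with a small sketch. This is not the dissertation's tensor algorithms; it is one well-known fixed-point heuristic for the first L1-norm principal component (maximizing the L1 norm of the projections over unit-norm directions), applied to invented data with one gross outlier:

```python
def l1_principal_component(X, iterations=50):
    # Fixed-point heuristic for  argmax_w  sum_i |<x_i, w>|  s.t. ||w||_2 = 1:
    # repeat  w <- normalize( sum_i sign(<x_i, w>) * x_i ).
    d = len(X[0])
    w = [1.0] + [0.0] * (d - 1)          # arbitrary unit-norm starting point
    for _ in range(iterations):
        s = [0.0] * d
        for x in X:
            sign = 1.0 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1.0
            for j in range(d):
                s[j] += sign * x[j]
        norm = sum(v * v for v in s) ** 0.5
        w = [v / norm for v in s]
    return w

# Points spread along the first axis, plus one gross outlier on the second.
X = [[5.0, 0.0], [-5.0, 0.0], [4.0, 0.1], [-4.0, -0.1], [0.0, 3.0]]
w = l1_principal_component(X)
```

On this data the recovered direction stays closely aligned with the first axis despite the outlier, which is the behavior the abstract attributes to L1-norm PCA; classical L2 PCA is pulled harder toward such irregular points.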
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013617
- Subject Headings
- Artificial intelligence, Machine learning, Tensor algebra
- Format
- Document (PDF)
- Title
- COMBINING TRADITIONAL AND IMAGE ANALYSIS TECHNIQUES FOR UNCONSOLIDATED EXPOSED TERRIGENOUS BEACH SAND CHARACTERIZATION.
- Creator
- Smith, Molly Elizabeth, Zhang, Caiyun, Oleinik, Anton, Florida Atlantic University, Department of Geosciences, Charles E. Schmidt College of Science
- Abstract/Description
-
Traditional sand analysis is labor and cost-intensive, entailing specialized equipment and operators trained in geological analysis. Even a small step to automate part of the traditional geological methods could substantially improve the speed of such research while removing chances of human error. Digital image analysis techniques and computer vision have been well developed and applied in various fields but rarely explored for sand analysis. This research explores capabilities of remote sensing digital image analysis techniques, such as object-based image analysis (OBIA), machine learning, digital image analysis, and photogrammetry to automate or semi-automate the traditional sand analysis procedure. Here presented is a framework combining OBIA and machine learning classification of microscope imagery for use with unconsolidated terrigenous beach sand samples. Five machine learning classifiers (RF, DT, SVM, k-NN, and ANN) are used to model mineral composition from images of ten terrigenous beach sand samples. Digital image analysis and photogrammetric techniques are applied and evaluated for use to characterize sand grain size and grain circularity (given as a digital proxy for traditional grain sphericity). A new segmentation process is also introduced, where pixel-level SLICO superpixel segmentation is followed by spectral difference segmentation and further levels of superpixel segmentation at the object-level. Previous methods of multi-resolution and superpixel segmentation at the object level do not provide the level of detail necessary to yield optimal sand grain-sized segments. In this proposed framework, the DT and RF classifiers provide the best estimations of mineral content of all classifiers tested compared to traditional compositional analysis. Average grain size approximated from photogrammetric procedures is comparable to traditional sieving methods, having an RMSE below 0.05%.
The framework proposed here reduces the number of trained personnel needed to perform sand-related research. It requires minimal sand sample preparation and minimizes user-error that is typically introduced during traditional sand analysis.
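The grain-circularity proxy mentioned in the abstract is commonly defined as 4πA/P² (area over squared perimeter, scaled so a perfect circle scores 1.0). The sketch below uses that common definition with hypothetical shapes; the thesis's own segmentation-derived measurements are not reproduced here:

```python
import math

def circularity(area, perimeter):
    # 4*pi*A / P^2: 1.0 for a perfect circle, lower for less compact shapes.
    return 4 * math.pi * area / perimeter ** 2

r = 2.0                                                   # circle of radius 2
circle = circularity(math.pi * r ** 2, 2 * math.pi * r)   # -> 1.0

a = 2.0                                                   # square of side 2
square = circularity(a ** 2, 4 * a)                       # -> pi/4, about 0.785
```

In an OBIA pipeline like the one described, `area` and `perimeter` would come from each segmented grain object, giving one circularity value per grain.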
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013517
- Subject Headings
- Sand, Image analysis, Remote sensing, Photogrammetry--Digital techniques, Machine learning
- Format
- Document (PDF)
- Title
- DATA COLLECTION FRAMEWORK AND MACHINE LEARNING ALGORITHMS FOR THE ANALYSIS OF CYBER SECURITY ATTACKS.
- Creator
- Calvert, Chad, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
The integrity of network communications is constantly being challenged by more sophisticated intrusion techniques. Attackers are shifting to stealthier and more complex forms of attacks in an attempt to bypass known mitigation strategies. Also, many detection methods for popular network attacks have been developed using outdated or non-representative attack data. To effectively develop modern detection methodologies, there exists a need to acquire data that can fully encompass the behaviors of persistent and emerging threats. When collecting modern day network traffic for intrusion detection, substantial amounts of traffic can be collected, much of which consists of relatively few attack instances as compared to normal traffic. This skewed distribution between normal and attack data can lead to high levels of class imbalance. Machine learning techniques can be used to aid in attack detection, but large levels of imbalance between normal (majority) and attack (minority) instances can lead to inaccurate detection results.
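One standard remedy for the majority/minority imbalance described above is random undersampling of the majority class before training. The sketch below is a generic illustration, not the dissertation's method; the flow records and features are invented:

```python
import random

def random_undersample(X, y, seed=0):
    # Drop majority-class instances at random until the classes are balanced.
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    majority, minority = (neg, pos) if len(neg) >= len(pos) else (pos, neg)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]

# 2 attack (minority, label 1) vs 8 normal (majority, label 0) flow records.
X = [[i, i % 3] for i in range(10)]       # invented flow features
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = random_undersample(X, y)   # 2 instances of each class remain
```

Undersampling trades training-set size for balance; oversampling the minority class or cost-sensitive learning are the usual alternatives when discarding majority traffic is too wasteful.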
- Date Issued
- 2019
- PURL
- http://purl.flvc.org/fau/fd/FA00013289
- Subject Headings
- Machine learning, Algorithms, Anomaly detection (Computer security), Intrusion detection systems (Computer security), Big data
- Format
- Document (PDF)
- Title
- ASSESSING METHODS AND TOOLS TO IMPROVE REPORTING, INCREASE TRANSPARENCY, AND REDUCE FAILURES IN MACHINE LEARNING APPLICATIONS IN HEALTHCARE.
- Creator
- Garbin, Christian, Marques, Oge, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Artificial intelligence (AI) had a few false starts – the AI winters of the 1970s and 1980s. We are now in what looks like an AI summer. There are many useful applications of AI in the field. But there are still unfulfilled promises and outright failures. From self-driving cars that work only in constrained cases, to medical image analysis products that would replace radiologists but never did, we still struggle to translate successful research into successful real-world applications. The software engineering community has accumulated a large body of knowledge over the decades on how to develop, release, and maintain products. AI products, being software products, benefit from some of that accumulated knowledge, but not all of it. AI products diverge from traditional software products in fundamental ways: their main component is not a specific piece of code, written for a specific purpose, but a generic piece of code, a model, customized by a training process driven by hyperparameters and a dataset. Datasets are usually large and models are opaque. We cannot directly inspect them as we can inspect the code of traditional software products. We need other methods to detect failures in AI products.
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013580
- Subject Headings
- Machine learning, Artificial intelligence, Healthcare
- Format
- Document (PDF)
- Title
- GENERATIVE ADVERSARIAL NETWORK DATA GENERATION FOR THE USE OF REAL TIME IMAGE DETECTION IN SIDE-SCAN SONAR IMAGERY.
- Creator
- McGinley, James Patrick, Dhanak, Manhar, Florida Atlantic University, Department of Ocean and Mechanical Engineering, College of Engineering and Computer Science
- Abstract/Description
-
Automatic target recognition of unexploded ordnances in side-scan sonar imagery has been a challenging task due to the lack of publicly available side-scan sonar data. Real-time image detection and classification algorithms have been implemented to address this task; however, machine learning algorithms require a substantial amount of training data to properly detect specific targets. Transfer learning methods reduce the need for large datasets by applying a pre-trained network to the side-scan sonar images. In the present study, a generative adversarial network is used to generate meaningful sonar imagery from a small dataset. The generated images are then added to the existing dataset to train an image detection and classification algorithm. The study looks to demonstrate that generated images can be used to aid in detecting objects of interest in side-scan sonar imagery.
- Date Issued
- 2019
- PURL
- http://purl.flvc.org/fau/fd/FA00013394
- Subject Headings
- Sidescan sonar, Algorithms, Machine learning
- Format
- Document (PDF)
- Title
- INVESTIGATING MACHINE LEARNING ALGORITHMS WITH IMBALANCED BIG DATA.
- Creator
- Hasanin, Tawfiq, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Recent technological developments have engendered an expeditious production of big data and also enabled machine learning algorithms to produce high-performance models from such data. Nonetheless, class imbalance (in binary classifications) between the majority and minority classes in big data can skew the predictive performance of the classification algorithms toward the majority (negative) class, whereas the minority (positive) class usually holds greater value for the decision makers. Such bias may lead to adverse consequences, some of them even life-threatening, when the existence of false negatives is generally costlier than false positives. The size of the minority class can vary from fair to extraordinarily small, which can lead to different performance scores for machine learning algorithms. Class imbalance is a well-studied area for traditional data, i.e., not big data. However, there is limited research focusing on both rarity and severe class imbalance in big data.
- Date Issued
- 2019
- PURL
- http://purl.flvc.org/fau/fd/FA00013316
- Subject Headings
- Algorithms, Machine learning, Big data--Data processing, Big data
- Format
- Document (PDF)
- Title
- STATISTICAL MODELING OF SHIP AIRWAKES INCLUDING THE FEASIBILITY OF APPLYING MACHINE LEARNING.
- Creator
- Krishnan, Vaishakh, Gaonkar, Gopal, Florida Atlantic University, Department of Ocean and Mechanical Engineering, College of Engineering and Computer Science
- Abstract/Description
-
Airwakes are shed behind the ship’s superstructure and represent a highly turbulent and rapidly distorting flow field. This flow field severely affects the pilot’s workload during helicopter shipboard operations. It requires both the one-point statistics of the autospectrum and the two-point statistics of coherence (normalized cross-spectrum) for a relatively complete description. Recent advances primarily refer to generating databases of flow velocity points through experimental and computational fluid dynamics (CFD) investigations, numerically computing autospectra along with a few cases of cross-spectra and coherences, and developing a framework for extracting interpretive models of autospectra in closed form from a database along with an application of this framework to study the downwash effects. By comparison, relatively little is known about coherences. In fact, even the basic expressions of cross-spectra and coherences for the three components of homogeneous isotropic turbulence (HIT) vary from one study to the other, and the related literature is scattered and piecemeal. Accordingly, this dissertation begins with a unified account of all the cross-spectra and coherences of HIT from first principles. Then, it presents a framework for constructing interpretive coherence models of airwake from a database on the basis of perturbation theory. For each velocity component, the coherence is represented by a separate perturbation series in which the basis function, the first term on the right-hand side of the series, is represented by the corresponding coherence for HIT. The perturbation series coefficients are evaluated by satisfying the theoretical constraints and fitting a curve in a least squares sense to a set of numerically generated coherence points from a database. Although not tested against a specific database, the framework has a mathematical basis. Moreover, for assumed values of the perturbation series constants, coherence results are presented to demonstrate how coherences of airwakes and such flow fields compare to those of HIT.
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013629
- Subject Headings
- Ships--Aerodynamics, Turbulence--Statistical methods, Machine learning
- Format
- Document (PDF)
- Title
- SEAWALL DETECTION IN FLORIDA COASTAL AREA FROM HIGH RESOLUTION IMAGERY USING MACHINE LEARNING AND OBIA.
- Creator
- Paudel, Sanjaya, Su, Hongbo, Florida Atlantic University, Department of Civil, Environmental and Geomatics Engineering, College of Engineering and Computer Science
- Abstract/Description
-
In this thesis, a methodology and framework were created to detect seawalls accurately and efficiently in low coastal areas and were evaluated in the study area of Hallandale Beach, Broward County, Florida. Aerial images collected from the Florida Department of Transportation (FDOT) were processed using eCognition Developer software for multi-resolution segmentation and classification of objects. Two classification approaches, pixel-based image analysis and object-based image analysis (OBIA), were applied for image classification; however, pixel-based classification was discarded due to its lower accuracy. Three object-based classification techniques were compared to identify the most efficient method: a machine learning technique, a knowledge-based technique, and machine learning followed by a knowledge-based technique. For the machine learning technique, three algorithms (random forest, support vector machine, and decision tree) were evaluated to determine the best performer. Of all the approaches used, the combination of machine learning and a knowledge-based method mapped the seawalls most effectively.
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013802
- Subject Headings
- Image analysis, Coasts--Florida, Machine learning
- Format
- Document (PDF)
- Title
- MACHINE LEARNING APPROACH FOR VEGETATION CLASSIFICATION USING UAS MULTISPECTRAL IMAGERY.
- Creator
- Kesavan, Pandiyan, Sudhagar Nagarajan, Florida Atlantic University, Department of Civil, Environmental and Geomatics Engineering, College of Engineering and Computer Science
- Abstract/Description
-
Vegetation monitoring plays a significant role in improving the quality of life on the earth's surface. However, managing vegetation resources is challenging due to climate change, global warming, and urban development. This research aims to identify and extract vegetation communities in the Jupiter Inlet Lighthouse Outstanding Natural Area (JILONA) using an Unmanned Aerial System (UAS) equipped with a five-band MicaSense RedEdge multispectral sensor. UAS have considerable potential for various applications, as they provide high-resolution imagery from low altitudes. In this study, spectral reflectance values for each vegetation species were collected using a spectroradiometer. Those values were correlated with the five UAS image bands to assess the sensor's performance, along with similarities and divergences in reflectance among vegetation species. Pixel-based and object-based classification methods were performed on the 0.15 ft multispectral imagery to identify the vegetation classes. Supervised machine learning algorithms, support vector machine (SVM) and random forest (RF), combined with topographical information, were used to produce thematic vegetation maps. The pixel-based procedure using the SVM algorithm achieved an overall accuracy and kappa coefficient above 90 percent. Both classification approaches produced aesthetic vegetation thematic maps. According to statistical cross-validation findings and visual interpretation of vegetation communities, the pixel-based classification method outperformed object-based classification.
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013768
- Subject Headings
- Vegetation classification, Machine learning, Multispectral imaging, Unmanned aerial vehicles
- Format
- Document (PDF)
- Title
- MULTIFACETED EMBEDDING LEARNING FOR NETWORKED DATA AND SYSTEMS.
- Creator
- Shi, Min, Tang, Yufei, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Network embedding, or representation learning, is important for analyzing many real-world applications and systems, e.g., social networks, citation networks, and communication networks. It aims to learn low-dimensional vector representations of nodes that preserve graph structure (e.g., link relations) and content (e.g., texts) information. The derived node representations can be directly applied in many downstream applications, including node classification, clustering, and visualization. In addition to the complex network structure, nodes may have rich non-structural information such as labels and contents. Therefore, structure, labels, and content constitute different aspects of the entire network system that reflect node similarities from multiple complementary facets. This thesis focuses on multifaceted network embedding learning, which aims to efficiently incorporate distinct aspects of information, such as node labels and node contents, for cooperative low-dimensional representation learning together with node topology.
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013516
- Subject Headings
- Embedded computer systems, Neural networks (Computer science), Network embedding, Machine learning
- Format
- Document (PDF)
- Title
- HPCC based Platform for COPD Readmission Risk Analysis with implementation of Dimensionality reduction and balancing techniques.
- Creator
- Jain, Piyush, Agarwal, Ankur, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Hospital readmission rates are considered an important indicator of quality of care because readmissions may be a consequence of actions of commission or omission made during the initial hospitalization of the patient, or of a poorly managed transition of the patient back into the community. The negative impact on patient quality of life and the huge burden on the healthcare system have made reducing hospital readmissions a central goal of healthcare delivery and payment reform efforts. In this study, we propose a framework for how readmission analysis and other healthcare models could be deployed in the real world, along with a machine learning solution that uses patients' discharge summaries as the dataset to train and test the model. Current systems do not take into consideration a very important aspect of the readmission problem: big data. This study therefore also considers the big data aspects of solutions that can be deployed for real-world use. We used the HPCC compute platform, which provides a distributed parallel programming environment to create, run, and manage applications involving large amounts of data. We also propose feature engineering and data balancing techniques that have been shown to greatly enhance machine learning model performance; this was achieved by reducing the dimensionality of the data and fixing the imbalance in the dataset. The system presented in this study provides real-world machine-learning-based predictive modeling for reducing readmissions, which could be templatized for other diseases.
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013560
- Subject Headings
- Machine learning, Big data, Patient Readmission, Hospitals--Admission and discharge--Data processing, High performance computing
- Format
- Document (PDF)