Current Search: Machine learning
- Title
- An evaluation of machine learning algorithms for tweet sentiment analysis.
- Creator
- Prusa, Joseph D., Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Sentiment analysis of tweets is an application of mining Twitter, and is growing in popularity as a means of determining public opinion. Machine learning algorithms are used to perform sentiment analysis; however, data quality issues such as high dimensionality, class imbalance or noise may negatively impact classifier performance. Machine learning techniques exist for targeting these problems, but have not been applied to this domain, or have not been studied in detail. In this thesis we discuss research that has been conducted on tweet sentiment classification, its accompanying data concerns, and methods of addressing these concerns. We test the impact of feature selection, data sampling and ensemble techniques in an effort to improve classifier performance. We also evaluate the combination of feature selection and ensemble techniques and examine the effects of high dimensionality when combining multiple types of features. Additionally, we provide strategies and insights for potential avenues of future work.
- Date Issued
- 2015
- PURL
- http://purl.flvc.org/fau/fd/FA00004460
- Subject Headings
- Social media., Natural language processing (Computer science), Machine learning., Algorithms., Fuzzy expert systems., Artificial intelligence.
- Format
- Document (PDF)
- Title
- COMBINING TRADITIONAL AND IMAGE ANALYSIS TECHNIQUES FOR UNCONSOLIDATED EXPOSED TERRIGENOUS BEACH SAND CHARACTERIZATION.
- Creator
- Smith, Molly Elizabeth, Zhang, Caiyun, Oleinik, Anton, Florida Atlantic University, Department of Geosciences, Charles E. Schmidt College of Science
- Abstract/Description
-
Traditional sand analysis is labor- and cost-intensive, entailing specialized equipment and operators trained in geological analysis. Even a small step to automate part of the traditional geological methods could substantially improve the speed of such research while removing chances of human error. Digital image analysis techniques and computer vision have been well developed and applied in various fields but rarely explored for sand analysis. This research explores capabilities of remote sensing digital image analysis techniques, such as object-based image analysis (OBIA), machine learning, digital image analysis, and photogrammetry to automate or semi-automate the traditional sand analysis procedure. Presented here is a framework combining OBIA and machine learning classification of microscope imagery for use with unconsolidated terrigenous beach sand samples. Five machine learning classifiers (RF, DT, SVM, k-NN, and ANN) are used to model mineral composition from images of ten terrigenous beach sand samples. Digital image analysis and photogrammetric techniques are applied and evaluated for use to characterize sand grain size and grain circularity (given as a digital proxy for traditional grain sphericity). A new segmentation process is also introduced, where pixel-level SLICO superpixel segmentation is followed by spectral difference segmentation and further levels of superpixel segmentation at the object level. Previous methods of multi-resolution and superpixel segmentation at the object level do not provide the level of detail necessary to yield optimal sand grain-sized segments. In this proposed framework, the DT and RF classifiers provide the best estimations of mineral content of all classifiers tested compared to traditional compositional analysis. Average grain size approximated from photogrammetric procedures is comparable to traditional sieving methods, having an RMSE below 0.05%.
The framework proposed here reduces the number of trained personnel needed to perform sand-related research. It requires minimal sand sample preparation and minimizes user error that is typically introduced during traditional sand analysis.
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013517
- Subject Headings
- Sand, Image analysis, Remote sensing, Photogrammetry--Digital techniques, Machine learning
- Format
- Document (PDF)
- Title
- Collaborative filtering using machine learning and statistical techniques.
- Creator
- Su, Xiaoyuan, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Collaborative filtering (CF), a very successful recommender system, is one of the applications of data mining for incomplete data. The main objective of CF is to make accurate recommendations from highly sparse user rating data. My contributions to this research topic include proposing the frameworks of imputation-boosted collaborative filtering (IBCF) and imputed neighborhood based collaborative filtering (INCF). We also proposed a model-based CF technique, TAN-ELR CF, and two hybrid CF algorithms, sequential mixture CF and joint mixture CF. Empirical results show that our proposed CF algorithms have very good predictive performances. In the investigation of applying imputation techniques in mining incomplete data, we proposed imputation-helped classifiers, and VCI predictors (voting on classifications from imputed learning sets), both of which resulted in significant improvement in classification performance for incomplete data over conventional machine learned classifiers, including kNN, neural network, one rule, decision table, SVM, logistic regression, decision tree (C4.5), random forest, and decision list (PART), and the well known Bagging predictors. The main imputation techniques involved in these algorithms include EM (expectation maximization) and BMI (Bayesian multiple imputation).
- Date Issued
- 2008
- PURL
- http://purl.flvc.org/FAU/186301
- Subject Headings
- Filters (Mathematics), Machine learning, Data mining, Technological innovations, Database management, Combinatorial group theory
- Format
- Document (PDF)
- Title
- Classification techniques for noisy and imbalanced data.
- Creator
- Napolitano, Amri E., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Machine learning techniques allow useful insight to be distilled from the increasingly massive repositories of data being stored. As these data mining techniques can only learn patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real life classification applications are class noise and class imbalance. Class noise, where dependent attribute values are recorded erroneously, misleads a classifier and reduces predictive performance. Class imbalance occurs when one class represents only a small portion of the examples in a dataset, and, in such cases, classifiers often display poor accuracy on the minority class. The reduction in classification performance becomes even worse when the two issues occur simultaneously. To address the magnified difficulty caused by this interaction, this dissertation performs thorough empirical investigations of several techniques for dealing with class noise and imbalanced data. Comprehensive experiments are performed to assess the effects of the classification techniques on classifier performance, as well as how the level of class imbalance, level of class noise, and distribution of class noise among the classes affect results. An empirical analysis of classifier-based noise detection efficiency appears first. Subsequently, an intelligent data sampling technique, based on noise detection, is proposed and tested. Several hybrid classifier ensemble techniques for addressing class noise and imbalance are introduced. Finally, a detailed empirical investigation of classification filtering is performed to determine best practices.
- Date Issued
- 2009
- PURL
- http://purl.flvc.org/FAU/369201
- Subject Headings
- Combinatorial group theory, Data mining, Technological innovations, Decision trees, Machine learning, Filters (Mathematics)
- Format
- Document (PDF)
- Title
- DATA COLLECTION FRAMEWORK AND MACHINE LEARNING ALGORITHMS FOR THE ANALYSIS OF CYBER SECURITY ATTACKS.
- Creator
- Calvert, Chad, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
The integrity of network communications is constantly being challenged by more sophisticated intrusion techniques. Attackers are shifting to stealthier and more complex forms of attacks in an attempt to bypass known mitigation strategies. Also, many detection methods for popular network attacks have been developed using outdated or non-representative attack data. To effectively develop modern detection methodologies, there exists a need to acquire data that can fully encompass the behaviors of persistent and emerging threats. When collecting modern day network traffic for intrusion detection, substantial amounts of traffic can be collected, much of which consists of relatively few attack instances as compared to normal traffic. This skewed distribution between normal and attack data can lead to high levels of class imbalance. Machine learning techniques can be used to aid in attack detection, but large levels of imbalance between normal (majority) and attack (minority) instances can lead to inaccurate detection results.
- Date Issued
- 2019
- PURL
- http://purl.flvc.org/fau/fd/FA00013289
- Subject Headings
- Machine learning, Algorithms, Anomaly detection (Computer security), Intrusion detection systems (Computer security), Big data
- Format
- Document (PDF)
- Title
- Efficient Machine Learning Algorithms for Identifying Risk Factors of Prostate and Breast Cancers among Males and Females.
- Creator
- Rikhtehgaran, Samaneh, Muhammad, Wazir, Florida Atlantic University, Department of Physics, Charles E. Schmidt College of Science
- Abstract/Description
-
One of the most common types of cancer among women is breast cancer. It represents one of the diseases leading to a high number of mortalities among women. On the other hand, prostate cancer is the second most frequent malignancy in men worldwide. The early detection of prostate cancer is fundamental to reduce mortality and increase the survival rate. A comparison of six types of machine learning models, namely Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, k Nearest Neighbors, and Naïve Bayes, has been performed. This research aims to identify the most efficient machine learning algorithms for identifying the most significant risk factors of prostate and breast cancers. For this reason, National Health Interview Survey (NHIS) and Prostate, Lung, Colorectal, and Ovarian (PLCO) datasets are used. A comprehensive comparison of risk factors leading to these two crucial cancers can significantly impact early detection and progressive improvement in survival.
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013755
- Subject Headings
- Machine learning, Algorithms, Cancer--Risk factors, Breast--Cancer, Prostate--Cancer
- Format
- Document (PDF)
- Title
- MULTIFACETED EMBEDDING LEARNING FOR NETWORKED DATA AND SYSTEMS.
- Creator
- Shi, Min, Tang, Yufei, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Network embedding or representation learning is important for analyzing many real-world applications and systems, e.g., social networks, citation networks and communication networks. It aims to learn low-dimensional vector representations of nodes that preserve graph structure (e.g., link relations) and content (e.g., texts) information. The derived node representations can be directly applied in many downstream applications, including node classification, clustering and visualization. In addition to the complex network structures, nodes may have rich non-structural information such as labels and contents. Therefore, structure, label and content constitute different aspects of the entire network system that reflect node similarities from multiple complementary facets. This thesis focuses on multifaceted network embedding learning, which aims to efficiently incorporate distinct aspects of information such as node labels and node contents for cooperative low-dimensional representation learning together with node topology.
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013516
- Subject Headings
- Embedded computer systems, Neural networks (Computer science), Network embedding, Machine learning
- Format
- Document (PDF)
- Title
- Machine learning algorithms for the analysis and detection of network attacks.
- Creator
- Najafabadi, Maryam Mousaarab, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
The Internet and computer networks have become an important part of our organizations and everyday life. With the increase in our dependence on computers and communication networks, malicious activities have become increasingly prevalent. Network attacks are an important problem in today’s communication environments. The network traffic must be monitored and analyzed to detect malicious activities and attacks to ensure reliable functionality of the networks and security of users’ information. Recently, machine learning techniques have been applied toward the detection of network attacks. Machine learning models are able to extract similarities and patterns in the network traffic. Unlike signature based methods, there is no need for manual analyses to extract attack patterns. Applying machine learning algorithms can automatically build predictive models for the detection of network attacks. This dissertation reports an empirical analysis of the usage of machine learning methods for the detection of network attacks. For this purpose, we study the detection of three common attacks in computer networks: SSH brute force, Man In The Middle (MITM) and application layer Distributed Denial of Service (DDoS) attacks. Using outdated and non-representative benchmark data, such as the DARPA dataset, in the intrusion detection domain, has caused a practical gap between building detection models and their actual deployment in a real computer network. To alleviate this limitation, we collect representative network data from a real production network for each attack type. Our analysis of each attack includes a detailed study of the usage of machine learning methods for its detection. This includes the motivation behind the proposed machine learning based detection approach, the data collection process, feature engineering, building predictive models and evaluating their performance.
We also investigate the application of feature selection in building detection models for network attacks. Overall, this dissertation presents a thorough analysis on how machine learning techniques can be used to detect network attacks. We not only study a broad range of network attacks, but also study the application of different machine learning methods including classification, anomaly detection and feature selection for their detection at the host level and the network level.
- Date Issued
- 2017
- PURL
- http://purl.flvc.org/fau/fd/FA00004882
- Subject Headings
- Machine learning., Computer security., Data protection., Computer networks--Security measures.
- Format
- Document (PDF)
- Title
- Mining and fusing data for ocean turbine condition monitoring.
- Creator
- Duhaney, Janell A., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
An ocean turbine extracts the kinetic energy from ocean currents to generate electricity. Machine Condition Monitoring (MCM) / Prognostic Health Monitoring (PHM) systems allow for self-checking and automated fault detection, and are integral in the construction of a highly reliable ocean turbine. MCM/PHM systems enable real time health assessment, prognostics and advisory generation by interpreting data from sensors installed on the machine being monitored. To effectively utilize sensor readings for determining the health of individual components, macro-components and the overall system, these measurements must somehow be combined or integrated to form a holistic picture. The process used to perform this combination is called data fusion. Data mining and machine learning techniques allow for the analysis of these sensor signals, any maintenance history and other available information (like expert knowledge) to automate decision making and other such processes within MCM/PHM systems. ... This dissertation proposes an MCM/PHM software architecture employing those techniques which were determined from the experiments to be ideal for this application. Our work also offers a data fusion framework applicable to ocean machinery MCM/PHM. Finally, it presents a software tool for monitoring ocean turbines and other submerged vessels, implemented according to industry standards.
- Date Issued
- 2012
- PURL
- http://purl.flvc.org/FAU/3358556
- Subject Headings
- Marine turbines, Mathematical models, Fluid dynamics, Data mining, Machine learning, Multisensor data fusion
- Format
- Document (PDF)
- Title
- Data Quality in Data Mining and Machine Learning.
- Creator
- Van Hulse, Jason, Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
With advances in data storage and data transmission technologies, and given the increasing use of computers by both individuals and corporations, organizations are accumulating an ever-increasing amount of information in data warehouses and databases. The huge surge in data, however, has made the process of extracting useful, actionable, and interesting knowledge from the data extremely difficult. In response to the challenges posed by operating in a data-intensive environment, the fields of data mining and machine learning (DM/ML) have successfully provided solutions to help uncover knowledge buried within data. DM/ML techniques use automated (or semi-automated) procedures to process vast quantities of data in search of interesting patterns. DM/ML techniques do not create knowledge; instead, the implicit assumption is that knowledge is present within the data, and these procedures are needed to uncover interesting, important, and previously unknown relationships. Therefore, the quality of the data is absolutely critical in ensuring successful analysis. Having high quality data, i.e., data which is (relatively) free from errors and suitable for use in data mining tasks, is a necessary precondition for extracting useful knowledge. In response to the important role played by data quality, this dissertation investigates data quality and its impact on DM/ML. First, we propose several innovative procedures for coping with low quality data. Another aspect of data quality, the occurrence of missing values, is also explored. Finally, a detailed experimental evaluation on learning from noisy and imbalanced datasets is provided, supplying valuable insight into how class noise in skewed datasets affects learning algorithms.
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00000858
- Subject Headings
- Data mining--Quality control, Machine learning, Electronic data processing--Quality control
- Format
- Document (PDF)
- Title
- Evolutionary Methods for Mining Data with Class Imbalance.
- Creator
- Drown, Dennis J., Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
-
Class imbalance tends to cause inferior performance in data mining learners, particularly with regard to predicting the minority class, which generally imposes a higher misclassification cost. This work explores the benefits of using genetic algorithms (GA) to develop classification models which are better able to deal with the problems encountered when mining datasets which suffer from class imbalance. Using GA we evolve configuration parameters suited for skewed datasets for three different learners: artificial neural networks, C4.5 decision trees, and RIPPER. We also propose a novel technique called evolutionary sampling which works to remove noisy and unnecessary duplicate instances so that the sampled training data will produce a superior classifier for the imbalanced dataset. Our GA fitness function uses metrics appropriate for dealing with class imbalance, in particular the area under the ROC curve. We perform extensive empirical testing on these techniques and compare the results with seven existing sampling methods.
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012515
- Subject Headings
- Combinatorial group theory, Data mining, Machine learning, Data structure (Computer science)
- Format
- Document (PDF)
- Title
- MACHINE LEARNING ALGORITHMS FOR PREDICTING BOTNET ATTACKS IN IOT NETWORKS.
- Creator
- Leevy, Joffrey, Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
The proliferation of Internet of Things (IoT) devices in various networks is being matched by an increase in related cybersecurity risks. To help counter these risks, big datasets such as Bot-IoT were designed to train machine learning algorithms on network-based intrusion detection for IoT devices. From a binary classification perspective, there is a high class imbalance in Bot-IoT between each of the attack categories and the normal category, and also between the combined attack categories and the normal category. Within the scope of predicting botnet attacks in IoT networks, this dissertation demonstrates the usefulness and efficiency of novel machine learning methods, such as an easy-to-classify method and a unique set of ensemble feature selection techniques. The focus of this work is on the full Bot-IoT dataset, as well as each of the four attack categories of Bot-IoT, namely, Denial-of-Service (DoS), Distributed Denial-of-Service (DDoS), Reconnaissance, and Information Theft. Since resources and services become inaccessible during DoS and DDoS attacks, this interruption is costly to an organization in terms of both time and money. Reconnaissance attacks often signify the first stage of a cyberattack and preventing them from occurring usually means the end of the intended cyberattack. Information Theft attacks not only erode consumer confidence but may also compromise intellectual property and national security. For the DoS experiment, the ensemble feature selection approach led to the best performance, while for the DDoS experiment, the full set of Bot-IoT features resulted in the best performance. Regarding the Reconnaissance experiment, the ensemble feature selection approach effected the best performance. In relation to the Information Theft experiment, the ensemble feature selection techniques did not affect performance, positively or negatively.
However, the ensemble feature selection approach is recommended for this experiment because feature reduction eases computational burden and may provide clarity through improved data visualization. For the full Bot-IoT big dataset, an explainable machine learning approach was taken using the Decision Tree classifier. An easy-to-learn Decision Tree model for predicting attacks was obtained with only three features, which is a significant result for big data.
- Date Issued
- 2022
- PURL
- http://purl.flvc.org/fau/fd/FA00013933
- Subject Headings
- Machine learning, Internet of things--Security measures, Big data, Intrusion detection systems (Computer security)
- Format
- Document (PDF)
- Title
- PRIVACY-PRESERVING TOPOLOGICAL DATA ANALYSIS USING HOMOMORPHIC ENCRYPTION.
- Creator
- Gold, Dominic, Motta, Francis, Florida Atlantic University, Department of Mathematical Sciences, Charles E. Schmidt College of Science
- Abstract/Description
-
Computational tools grounded in algebraic topology, known collectively as topological data analysis (TDA), have been used for dimensionality-reduction to preserve salient and discriminating features in data. This faithful but compressed representation of data through TDA’s flagship method, persistent homology (PH), motivates its use to address the complexity, depth, and inefficiency issues present in privacy-preserving, homomorphic encryption (HE)-based machine learning (ML) models, which permit a data provider (often referred to as the Client) to outsource computational tasks on their encrypted data to a computationally-superior but semi-honest party (the Server). This work introduces efforts to adapt the well-established TDA-ML pipeline on encrypted data to realize the benefits TDA can provide to HE’s computational limitations as well as provide HE’s provable security on the sensitive data domains in which TDA has found success (e.g., sequence, gene expression, imaging). The privacy-protecting technologies which could emerge from this foundational work will lead to direct improvements to the accessibility and equitability of health care systems. ML promises to reduce biases and improve accuracies of diagnoses, and enabling such models to act on sensitive biomedical data without exposing it will improve trustworthiness of these systems.
- Date Issued
- 2024
- PURL
- http://purl.flvc.org/fau/fd/FA00014440
- Subject Headings
- Data encryption (Computer science), Homomorphisms (Mathematics), Privacy-preserving techniques (Computer science), Machine learning
- Format
- Document (PDF)
- Title
- MODELING GROUND ELEVATION OF LOUISIANA COASTAL WETLANDS AND ANALYZING RELATIVE SEA LEVEL RISE INUNDATION USING RSET-MH AND LIDAR MEASUREMENTS.
- Creator
- Liu, Jing, Zhang, Caiyun, Florida Atlantic University, Department of Geosciences, Charles E. Schmidt College of Science
- Abstract/Description
-
The Louisiana coastal ecosystem is experiencing increasing threats from human flood control construction, sea-level rise (SLR), and subsidence. Louisiana lost about 4,833 km² of coastal wetlands from 1932 to 2016, and concern exists over whether remaining wetlands will persist while facing the highest rate of relative sea-level rise (RSLR) in the world. Restoration aimed at rehabilitating the ongoing and future disturbances is currently underway through the implementation of the Coastal Wetlands Planning Protection and Restoration Act of 1990 (CWPPRA). To effectively monitor the progress of projects in CWPPRA, the Coastwide Reference Monitoring System (CRMS) was established in 2006. To date, more than a decade of valuable coastal, environmental, and ground elevation data have been collected and archived. This dataset offers a unique opportunity to evaluate the wetland ground elevation dynamics by linking the Rod Surface Elevation Table (RSET) measurements with environmental variables like water salinity and biophysical variables like canopy coverage. This dissertation research examined the effects of the environmental and biophysical variables on wetland terrain elevation by developing innovative machine learning based models to quantify the contribution of each factor using the CRMS collected dataset. Three modern machine learning algorithms, including Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN), were assessed and cross-compared with the commonly used Multiple Linear Regression (MLR). The results showed that RF had the best performance in modeling ground elevation, with a Root Mean Square Error (RMSE) of 10.8 cm and a correlation coefficient (r) of 0.74. The top four factors contributing to ground elevation are the distance from monitoring station to closest water source, water salinity, water elevation, and dominant vegetation height.
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013568
- Subject Headings
- Coastal zone management--Louisiana, Sea level rise, Inundations, Wetland restoration--Louisiana, Machine learning, Computer simulation, Algorithms.
- Format
- Document (PDF)
- Title
- HPCC based Platform for COPD Readmission Risk Analysis with implementation of Dimensionality reduction and balancing techniques.
- Creator
- Jain, Piyush, Agarwal, Ankur, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Hospital readmission rates are considered to be an important indicator of quality of care because they may be a consequence of actions of commission or omission made during the initial hospitalization of the patient, or a consequence of a poorly managed transition of the patient back into the community. The negative impact on patient quality of life and the huge burden on the healthcare system have made reducing hospital readmissions a central goal of healthcare delivery and payment reform efforts. In this study, we propose a framework for how readmission analysis and other healthcare models could be deployed in the real world, and a machine learning based solution which uses patients' discharge summaries as a dataset to train and test the machine learning model created. Current systems do not take into account one very important aspect of solving the readmission problem: Big data. This study also takes into consideration the Big data aspect of solutions which can be deployed in the field for real-world use. We have used the HPCC compute platform, which provides a distributed parallel programming platform to create, run and manage applications that involve large amounts of data. We have also proposed some feature engineering and data balancing techniques which have been shown to greatly enhance machine learning model performance. This was achieved by reducing the dimensionality of the data and fixing the imbalance in the dataset. The system presented in this study provides real-world, machine learning based predictive modeling for reducing readmissions which could be templatized for other diseases.
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013560
- Subject Headings
- Machine learning, Big data, Patient Readmission, Hospitals--Admission and discharge--Data processing, High performance computing
- Format
- Document (PDF)
- Title
- An Empirical Study of Performance Metrics for Classifier Evaluation in Machine Learning.
- Creator
- Bruhns, Stefan, Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
-
A variety of classifiers for solving classification problems is available from the domain of machine learning. Commonly used classifiers include support vector machines, decision trees and neural networks. These classifiers can be configured by modifying internal parameters. The large number of available classifiers and the different configuration possibilities result in a large number of combinations of classifier and configuration settings, leaving the practitioner with the problem of evaluating the performance of different classifiers. This problem can be solved by using performance metrics. However, the large number of available metrics causes difficulty in deciding which metrics to use and when comparing classifiers on the basis of multiple metrics. This paper uses the statistical method of factor analysis in order to investigate the relationships between several performance metrics and introduces the concept of relative performance, which has the potential to ease the process of comparing several classifiers. The relative performance metric is also used to evaluate different support vector machine classifiers and to determine if the default settings in the Weka data mining tool are reasonable.
- Date Issued
- 2008
- PURL
- http://purl.flvc.org/fau/fd/FA00012508
- Subject Headings
- Machine learning, Computer algorithms, Pattern recognition systems, Data structures (Computer science), Kernel functions, Pattern perception--Data processing
- Format
- Document (PDF)
- Title
- An Empirical Study of Random Forests for Mining Imbalanced Data.
- Creator
- Golawala, Moiz M., Khoshgoftaar, Taghi M., Florida Atlantic University
- Abstract/Description
-
Skewed or imbalanced data presents a significant problem for many standard learners which focus on optimizing the overall classification accuracy. When the class distribution is skewed, priority is given to classifying examples from the majority class, at the expense of the often more important minority class. The random forest (RF) classification algorithm, which is a relatively new learner with appealing theoretical properties, has received almost no attention in the context of skewed datasets. This work presents a comprehensive suite of experimentation evaluating the effectiveness of random forests for learning from imbalanced data. Reasonable parameter settings (for the Weka implementation) for ensemble size and number of random features selected are determined through experimentation on 10 datasets. Further, the application of seven different data sampling techniques that are common methods for handling imbalanced data, in conjunction with RF, is also assessed. Finally, RF is benchmarked against 10 other commonly-used machine learning algorithms, and is shown to provide very strong performance. A total of 35 imbalanced datasets are used, and over one million classifiers are constructed in this work.
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012520
- Subject Headings
- Data mining--Case studies, Machine learning--Case studies, Data structure (Computer science), Trees (Graph theory)--Case studies
- Format
- Document (PDF)
- Title
- Text Mining and Topic Modeling for Social and Medical Decision Support.
- Creator
- Hurtado, Jose Luis, Zhu, Xingquan, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Effective decision support plays a vital role in people's daily lives, as well as for professional practitioners such as health care providers. Without correct information and timely derived knowledge, a decision is often suboptimal and may result in significant financial loss or compromised performance. In this dissertation, we study text mining and topic modeling and propose to use text mining methods, in combination with topic models, to discover knowledge from texts widely available from a variety of sources, such as research publications, news, and medical diagnosis notes, and further employ the discovered knowledge to assist social and medical decision support. Examples of such decisions include hospital patient readmission prediction, which is a national initiative for health care cost reduction, academic research topic discovery and trend modeling, and social preference modeling for friend recommendation in social networks. To carry out text mining, our research, in Chapter 3, first focuses on single-document analysis to investigate textual stylometric features for user profiling and recognition. Our research confirms that by using properly designed features, it is possible to identify the author who wrote an article, using a number of sample articles written by the author as the training data. This study serves as the basis for asserting that text mining is a powerful tool for capturing knowledge in texts for better decision making. In Chapter 4, we advance our research from single documents to documents with interdependency relationships, and propose to model and predict citation relationships between documents. Given a collection of documents with known linkage relationships, our research will discover effective features to train prediction models, and predict the likelihood of two documents being involved in a citation relationship.
This study will help accurately model social network linkage relationships, and can be used to assist effective decision making for friend recommendation in social networking and reference recommendation in scientific writing. In Chapter 5, we advance a topic discovery and trend prediction principle to discover meaningful topics from a data collection, and further model the evolution trend of each topic. By proposing techniques to discover topics from text, and using temporal correlation between trends for prediction, our techniques can be used to summarize a large collection of documents as meaningful topics, and further forecast the popularity of a topic in the near future. This study can help design systems to discover popular topics in social media, and further assist resource planning and scheduling based on the discovered topics and their evolution trends. In Chapter 6, we apply both text mining and topic modeling to the medical domain for effective decision making. The goal is to discover knowledge from medical notes to predict the risk of a patient being re-admitted in the near future. Our research emphasizes the challenge that re-admitted patients are only a small portion of the patient population, although they bring significant financial loss. As a result, the datasets are highly imbalanced, which often results in poor accuracy for decision making. Our research proposes to use latent topic modeling to carry out localized sampling, and to combine models trained from multiple copies of sampled data for accurate prediction. This study can be directly used to assist hospital re-admission assessment for early warning and decision support. The text mining and topic modeling techniques investigated in the dissertation can be applied to many other domains involving texts and social relationships, towards pattern and knowledge based effective decision making.
- Date Issued
- 2016
- PURL
- http://purl.flvc.org/fau/fd/FA00004782
- Subject Headings
- Social sciences--Research--Methodology., Data mining., Machine learning., Database searching., Discourse analysis--Data processing., Communication--Network analysis., Medical care--Quality control.
- Format
- Document (PDF)
- Title
- A VLSI implementable learning algorithm.
- Creator
- Ruiz, Laura V., Florida Atlantic University, Pandya, Abhijit S., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
A top-down design methodology using hardware description languages (HDLs) and powerful design, analysis, synthesis and layout software tools for electronic circuit design is described and applied to the design of a single layer artificial neural network that incorporates on-chip learning. Using the perceptron learning algorithm, these simple neurons learn a classification problem in 10.55 microseconds in one application. The objective is to describe a methodology by following the design of a simple network. This methodology is later applied in the design of a novel architecture, a stochastic neural network. All issues related to algorithmic design for VLSI implementability are discussed and results of layout and timing analysis given over software simulations. A top-down design methodology is presented, including a brief introduction to HDLs and an overview of the software tools used throughout the design process. These tools make it possible now for a designer to complete a design in a relatively short period of time. In-depth knowledge of computer architecture, VLSI fabrication, electronic circuits and integrated circuit design is not fundamental to accomplish a task that a few years ago would have required a large team of specialized experts in many fields. This may appeal to researchers from a wide background of knowledge, including computer scientists, mathematicians, and psychologists experimenting with learning algorithms. It is only in a hardware implementation of artificial neural network learning algorithms that the true parallel nature of these architectures could be fully tested. Most of the applications of neural networks are basically software simulations of the algorithms run on a single CPU executing sequential simulations of a parallel, richly interconnected architecture.
This dissertation describes a methodology whereby a researcher experimenting with a known or new learning algorithm will be able to test it as it was designed to run: on a parallel hardware architecture.
- Date Issued
- 1996
- PURL
- http://purl.flvc.org/fcla/dt/12453
- Subject Headings
- Integrated circuits--Very large scale integration--Design and construction, Neural networks (Computer science)--Design and construction, Computer algorithms, Machine learning
- Format
- Document (PDF)
- Title
- Predictive Models for Ebola using Machine Learning Algorithms.
- Creator
- Jain, Abhishek, Agarwal, Ankur, Furht, Borko, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Identifying and tracking individuals affected by the Ebola virus in densely populated areas is a unique and urgent challenge in the public health sector. Currently, mapping the spread of the Ebola virus is done manually; however, with the help of social contact networks we can model dynamic graphs and predictive diffusion models of the Ebola virus based on the impact on either a specific person or a specific community. With the help of this model, we can make more precise forward predictions of disease propagation and identify possibly infected individuals, which will help perform trace-back analysis to locate the possible source of infection for a social group. This model will visualize and identify the families and tightly connected social groups who have had contact with an Ebola patient, and is a proactive approach to reducing the risk of exposure to Ebola within a community or geographic location.
- Date Issued
- 2017
- PURL
- http://purl.flvc.org/fau/fd/FA00004919
- Subject Headings
- Communicable diseases--Epidemiology., Public health surveillance., Ebola virus disease--Transmission., Machine learning., Computer algorithms., Virtual reality., Interactive multimedia., Computer graphics., History--Graphic methods., Historiography--Technological innovations.
- Format
- Document (PDF)