Current Search: Zhu, Xingquan
- Title
- Counting manatee aggregations using deep neural networks and Anisotropic Gaussian Kernel.
- Creator
- Zhiqiang Wang, Yiran Pang, Cihan Ulus, Xingquan Zhu
- Abstract/Description
-
Manatees are aquatic mammals with voracious appetites. They rely on sea grass as their main food source and often spend up to eight hours a day grazing. They move slowly and frequently stay in groups (i.e., aggregations) in shallow water to search for food, making them vulnerable to environmental change and other risks. Accurately counting manatee aggregations within a region is not only biologically meaningful for observing their habits, but also crucial for designing safety rules for boaters, divers, etc., as well as for scheduling nursing, intervention, and other plans. In this paper, we propose a deep learning based crowd counting approach to automatically count the number of manatees within a region, using low quality images as input. Because manatees have a unique shape and often stay in shallow water in groups, water surface reflection, occlusion, camouflage, etc. make it difficult to accurately count manatee numbers. To address these challenges, we propose to use an Anisotropic Gaussian Kernel (AGK), with tunable rotation and variances, to ensure that density functions can maximally capture the shapes of individual manatees in different aggregations. We then apply the AGK to different types of deep neural networks primarily designed for crowd counting, including VGG, SANet, Congested Scene Recognition network (CSRNet), MARUNet, etc., to learn manatee densities and calculate the number of manatees in the scene. Using generic low quality images extracted from surveillance videos, our experimental results and comparisons show that AGK based manatee counting achieves the minimum Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The proposed method works particularly well for counting manatee aggregations in environments with complex backgrounds.
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FAUIR000535
- Format
- Document (PDF)
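Illustrative note on the manatee counting abstract above: density-map counting places a normalized kernel at each annotated animal, so that the map sums to the number of animals. The sketch below shows one possible way to build a rotated, elongated (anisotropic) Gaussian kernel and accumulate a density map. It is a minimal sketch, not the paper's implementation: the function names, the parameter names (`sigma_x`, `sigma_y`, `theta`), the kernel size, and the annotation format are all assumptions, since the abstract does not give the exact parameterization.

```python
import math

def anisotropic_gaussian_kernel(size, sigma_x, sigma_y, theta):
    """Return a size x size kernel (list of lists) normalized to sum to 1.

    sigma_x, sigma_y: standard deviations along the kernel's own axes (assumed names).
    theta: rotation angle in radians (assumed name).
    """
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for row in range(size):
        y = row - half
        values = []
        for col in range(size):
            x = col - half
            # Rotate the pixel offset into the kernel's principal axes.
            u = x * cos_t + y * sin_t
            v = -x * sin_t + y * cos_t
            values.append(math.exp(-0.5 * ((u / sigma_x) ** 2 + (v / sigma_y) ** 2)))
        kernel.append(values)
    total = sum(sum(r) for r in kernel)
    return [[v / total for v in r] for r in kernel]

def density_map(height, width, annotations, size=15):
    """Accumulate one kernel per annotation.

    annotations: list of (row, col, sigma_x, sigma_y, theta), a hypothetical format.
    """
    density = [[0.0] * width for _ in range(height)]
    half = size // 2
    for (r0, c0, sx, sy, theta) in annotations:
        kernel = anisotropic_gaussian_kernel(size, sx, sy, theta)
        for i in range(size):
            for j in range(size):
                r, c = r0 + i - half, c0 + j - half
                # Kernel mass falling outside the image is dropped.
                if 0 <= r < height and 0 <= c < width:
                    density[r][c] += kernel[i][j]
    return density

def estimated_count(density):
    """The count is the sum over the density map."""
    return sum(sum(row) for row in density)

if __name__ == "__main__":
    # Two hypothetical annotations, both well inside a 64 x 64 image.
    demo = density_map(64, 64, [(20, 20, 4.0, 1.5, 0.3), (40, 45, 3.0, 2.0, 1.2)])
    print(estimated_count(demo))
```

Because each kernel is normalized, every annotation contributes a total mass of one as long as it lies far enough from the image border; annotations near the border lose the clipped part of their mass in this simplified version.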
- Title
- MACHINE LEARNING FOR PREDICTION OF FACULTY SUCCESS IN WINNING GRANT AWARDS.
- Creator
- Delgado, Jose, Zhu, Xingquan, Harriet L. Wilkes Honors College, Florida Atlantic University
- Abstract/Description
-
In order for innovation and breakthroughs to occur, principal investigators must constantly apply for grants and other funding sources. Through previous research, it has been shown that peer-review panels responsible for selecting grant award recipients don’t base their decisions on the applicant’s academic or research history and affiliations. Instead, they can identify quality research proposals that achieve high citation counts later on. Therefore, it can be deduced that the recipients are chosen solely due to their research quality and topic with little to no bias involved. This produces two important questions: Can machine learning help predict the success of faculty seeking external awards? What are the important factors related to such predictive models? Using the Academic Analytics Research Center’s rich faculty dataset, I will leverage machine learning models to identify important factors associated with winning grant awards.
- Date Issued
- 2022
- PURL
- http://purl.flvc.org/fau/fd/FAUHT00192
- Format
- Document (PDF)
- Title
- Deep Learning for Android Application Ransomware Detection.
- Creator
- Wongsupa, Panupong, Zhu, Xingquan, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Smartphones and mobile tablets are growing rapidly in use and are very important nowadays. The most popular mobile operating system since 2012 has been Android. Android is an open source platform that allows developers to take full advantage of both the operating system and the applications themselves. However, because the Android platform is open source, some Android developers have taken advantage of this and created countless malicious applications such as Trojans, malware, and ransomware, all of which are currently hidden among a large number of benign apps in official Android markets, such as Google Play Store and Amazon. Ransomware is malware that, once it has infected the victim’s device, encrypts files, locks the device system, and displays a popup message asking the victim to pay a ransom in order to unlock their device or system, which may include medical devices connected through the internet. In this research, we propose to combine permissions and API calls, then use deep learning techniques to detect ransomware apps from the Android market. Permission settings and API calls are extracted from each app file using a Python library called AndroGuard. We use permission and API call features to characterize each application, which can identify whether an application has the potential to be ransomware or is benign. We implement our Android ransomware detection framework based on Keras, using an MLP trained with back-propagation as a supervised algorithm. We evaluated our method in experiments on real-world applications, with over 2000 benign applications and 1000 ransomware applications. The dataset came from ARGUS’s lab [1]. We validated algorithm performance and selected the best architecture for the multi-layer perceptron (MLP) by training six different MLP structures on our dataset. Our experiments and validations show that MLPs with more than 3 hidden layers and a medium number of neurons achieved good results, with both accuracy and AUC score at 98%.
The worst scores, approximately 45% to 60%, come from MLPs that have 2 hidden layers with a large number of neurons.
- Date Issued
- 2018
- PURL
- http://purl.flvc.org/fau/fd/FA00013151
- Subject Headings
- Deep learning, Android (Electronic resource)--Security measures, Malware (Computer software)--Prevention
- Format
- Document (PDF)
- Title
- Data mining heuristic-based malware detection for android applications.
- Creator
- Peiravian, Naser, Zhu, Xingquan, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
The Google Android mobile phone platform is one of the dominant smartphone operating systems on the market. The open source Android platform allows developers to take full advantage of the mobile operating system, but also raises significant issues related to malicious applications (Apps). The popularity of the Android platform draws the attention of many developers, and it also attracts cybercriminals who develop different kinds of malware to be inserted into the Google Android Market or other third party markets disguised as safe applications. In this thesis, we propose to combine permissions, API (Application Program Interface) calls, and function calls to build a heuristic-based framework for the detection of malicious Android Apps. In our design, the permissions are extracted from each App’s profile information and the APIs are extracted from the packed App file by using packages and classes to represent API calls. By using permissions, API calls, and function calls as features to characterize each App, we can develop a classifier with data mining techniques to identify whether an App is potentially malicious or not. An inherent advantage of our method is that it does not need any dynamic tracking of system calls but only uses simple static analysis to find system functions from each App. In addition, our method can be generalized to all mobile applications because APIs and function calls are always present in mobile Apps. Experiments on real-world Apps with more than 1200 malware samples and 1200 benign samples validate the algorithm performance. Research paper published based on the work reported in this thesis: Naser Peiravian, Xingquan Zhu, Machine Learning for Android Malware Detection Using Permission and API Calls, in Proc. of the 25th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Washington D.C., November 4-6, 2013.
- Date Issued
- 2013
- PURL
- http://purl.flvc.org/fau/fd/FA0004045
- Subject Headings
- Computer networks -- Security measures, Data encryption (Computer science), Data structures (Computer science), Internet -- Security measures
- Format
- Document (PDF)
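Illustrative note on the malware detection abstract above: the thesis describes characterizing each App by its permissions and API calls and then training a classifier on those features. One common way to do this is a binary presence vector over a fixed vocabulary, sketched below. This is an assumption about the encoding, not the thesis's confirmed method; the helper names, the `perm:`/`api:` prefixes, and the two example Apps are hypothetical, and the extraction step (static analysis of the App file) is not shown.

```python
def app_tokens(app):
    """Prefix tokens so a permission and an API name can never collide."""
    perms = {"perm:" + p for p in app["permissions"]}
    apis = {"api:" + a for a in app["api_calls"]}
    return perms | apis

def build_vocabulary(apps):
    """Fix a sorted feature order from every token observed across all Apps."""
    return sorted(set().union(*(app_tokens(a) for a in apps)))

def to_feature_vector(app, vocabulary):
    """1 if the App uses the permission or API at that position, else 0."""
    tokens = app_tokens(app)
    return [1 if t in tokens else 0 for t in vocabulary]

if __name__ == "__main__":
    # Two hypothetical Apps, already reduced to sets of names.
    apps = [
        {"permissions": {"android.permission.INTERNET"},
         "api_calls": {"java.net.URL"}},
        {"permissions": {"android.permission.INTERNET",
                         "android.permission.SEND_SMS"},
         "api_calls": {"android.telephony.SmsManager"}},
    ]
    vocabulary = build_vocabulary(apps)
    for app in apps:
        print(to_feature_vector(app, vocabulary))
```

The resulting vectors can be passed to any standard classifier; the vocabulary must be built from the training Apps only and reused unchanged for test Apps so that feature positions stay aligned.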
- Title
- Generalized Feature Embedding Learning for Clustering and Classification.
- Creator
- Golinko, Eric David, Zhu, Xingquan, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Data comes in many different shapes and sizes. In real-life applications, it is common that the data we are studying has features of varied data types. These may include numerical, categorical, and text data. In order to model this data with machine learning algorithms, the data is typically required to be in numeric form. Therefore, data that is not originally numerical must be transformed before it can be used as input to these algorithms. Along with this transformation, it is common that the data we study has many features relative to the number of samples. It is often desirable to reduce the number of features that are being trained in a model to eliminate noise and reduce training time. This problem of high dimensionality can be approached through feature selection, feature extraction, or feature embedding. Feature selection seeks to identify the most essential variables in a dataset that will lead to a parsimonious model and high performing results, while feature extraction and embedding are techniques that utilize a mathematical transformation of the data into a represented space. As a byproduct of using a new representation, we are able to reduce the dimension greatly without sacrificing performance. Oftentimes, by using embedded features we observe a gain in performance. Though extraction and embedding methods may be powerful for isolated machine learning problems, they do not always generalize well. Therefore, we are motivated to illustrate a methodology that can be applied to any data type with little pre-processing. The methods we develop can be applied in unsupervised, supervised, incremental, and deep learning contexts. Using 28 benchmark datasets as examples which include different data types, we construct a framework that can be applied for general machine learning tasks. The techniques we develop contribute to the field of dimension reduction and feature embedding.
Using this framework, we make additional contributions to eigendecomposition by creating an objective matrix that includes three main components. The first is a class partitioned row and feature product representation of one-hot encoded data. The second is the derivation of a weighted adjacency matrix based on class label relationships. Finally, by the inner product of these aforementioned values, we are able to condition the one-hot encoded data generated from the original data prior to eigenvector decomposition. The use of class partitioning and adjacency enables subsequent projections of the data to be trained more effectively when compared side by side with baseline algorithm performance. Along with this improved performance, we can adjust the dimension of the subsequent data arbitrarily. In addition, we also show how these dense vectors may be used in applications to order the features of generic data for deep learning. In this dissertation, we examine a general approach to dimension reduction and feature embedding that utilizes a class partitioned row and feature representation, a weighted approach to instance similarity, and an adjacency representation. This general approach has application to unsupervised, supervised, online, and deep learning. In our experiments on 28 benchmark datasets, we show significant performance gains in clustering, classification, and training time.
- Date Issued
- 2018
- PURL
- http://purl.flvc.org/fau/fd/FA00013063
- Subject Headings
- Eigenvectors--Data processing., Algorithms., Cluster analysis.
- Format
- Document (PDF)
- Title
- Big data driven co-occurring evidence discovery in chronic obstructive pulmonary disease patients.
- Creator
- Baechle, Christopher, Agarwal, Ankur, Zhu, Xingquan
- Date Issued
- 2017-12-04
- PURL
- http://purl.flvc.org/fau/flvc_fau_islandoraimporter_10.1186_s40537-017-0067-6_1629211082
- Format
- Citation
- Title
- Text Mining and Topic Modeling for Social and Medical Decision Support.
- Creator
- Hurtado, Jose Luis, Zhu, Xingquan, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Effective decision support plays a vital role in people's daily life, as well as for professional practitioners such as health care providers. Without correct information and timely derived knowledge, a decision is often suboptimal and may result in significant financial loss or compromised performance. In this dissertation, we study text mining and topic modeling and propose to use text mining methods, in combination with topic models, to discover knowledge from texts widely available from a variety of sources, such as research publications, news, and medical diagnosis notes, and further employ the discovered knowledge to assist social and medical decision support. Examples of such decisions include hospital patient readmission prediction, which is a national initiative for health care cost reduction, academic research topic discovery and trend modeling, and social preference modeling for friend recommendation in social networks. To carry out text mining, our research, in Chapter 3, first focuses on single document analysis to investigate textual stylometric features for user profiling and recognition. Our research confirms that by using properly designed features, it is possible to identify the author who wrote an article, using a number of sample articles written by the author as the training data. This study serves as the base to assert that text mining is a powerful tool for capturing knowledge in texts for better decision making. In Chapter 4, we advance our research from single documents to documents with interdependency relationships, and propose to model and predict citation relationships between documents. Given a collection of documents with known linkage relationships, our research discovers effective features to train prediction models and predicts the likelihood of two documents being involved in a citation relationship.
This study will help accurately model social network linkage relationships, and can be used to assist effective decision making for friend recommendation in social networking, reference recommendation in scientific writing, etc. In Chapter 5, we advance a topic discovery and trend prediction principle to discover meaningful topics from a data collection, and further model the evolution trend of each topic. By proposing techniques to discover topics from text, and using temporal correlation between trends for prediction, our techniques can be used to summarize a large collection of documents as meaningful topics, and further forecast the popularity of a topic in the near future. This study can help design systems to discover popular topics in social media, and further assist resource planning and scheduling based on the discovered topics and their evolution trends. In Chapter 6, we apply both text mining and topic modeling to the medical domain for effective decision making. The goal is to discover knowledge from medical notes to predict the risk of a patient being re-admitted in the near future. Our research emphasizes the challenge that re-admitted patients are only a small portion of the patient population, although they bring significant financial loss. As a result, the datasets are highly imbalanced, which often results in poor accuracy for decision making. Our research proposes to use latent topic modeling to carry out localized sampling, and to combine models trained from multiple copies of sampled data for accurate prediction. This study can be directly used to assist hospital re-admission assessment for early warning and decision support. The text mining and topic modeling techniques investigated in the dissertation can be applied to many other domains involving texts and social relationships, towards pattern and knowledge based effective decision making.
- Date Issued
- 2016
- PURL
- http://purl.flvc.org/fau/fd/FA00004782
- Subject Headings
- Social sciences--Research--Methodology., Data mining., Machine learning., Database searching., Discourse analysis--Data processing., Communication--Network analysis., Medical care--Quality control.
- Format
- Document (PDF)
- Title
- User Behavior Modeling in Online Display Advertising.
- Creator
- Van Nice, Kara, Zhu, Xingquan, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Online display advertising intends to find the best match between advertisement (ad) campaigns and online users, conditioned by user specific contexts such as geographic locations and hobbies. During this matching process, user behavior plays a crucial role in determining whether and when a user who has been served the ad will produce a conversion event. Advertisers seek to understand how users behave if they are continuously served impressions from the same campaign, as well as any noticeable patterns between campaign categorization and user behavior. This thesis carries out data analytics to investigate the correlation between user behavior and campaign conversion rates (CVR), including click-through conversion rates and view-through conversion rates. We investigate campaign categorization based on both IAB categories and campaign difficulty level defined by effective CPA (eCPA). We carry out large scale analytics over billions of impressions from over 1000 campaigns, observing consistent patterns and significant findings.
- Date Issued
- 2018
- PURL
- http://purl.flvc.org/fau/fd/FA00013018
- Subject Headings
- Consumer behavior, Advertising campaigns--Data processing, Internet advertising
- Format
- Document (PDF)
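Illustrative note on the display advertising abstract above: the metrics it names can be computed from simple campaign totals. The sketch below assumes that each conversion rate is conversions divided by impressions served and that eCPA is total spend divided by total conversions. These definitions are common in the industry but are assumptions here, since the abstract does not state the exact formulas used in the thesis; the function names and the sample numbers are hypothetical.

```python
def conversion_rates(impressions, click_conversions, view_conversions):
    """Return click-through, view-through, and total CVR per impression (assumed definition)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return {
        "click_through_cvr": click_conversions / impressions,
        "view_through_cvr": view_conversions / impressions,
        "total_cvr": (click_conversions + view_conversions) / impressions,
    }

def effective_cpa(spend, conversions):
    """Effective cost per action: spend divided by conversions (infinite if none)."""
    return spend / conversions if conversions else float("inf")

if __name__ == "__main__":
    # Hypothetical campaign totals.
    rates = conversion_rates(impressions=1_000_000,
                             click_conversions=150,
                             view_conversions=350)
    print(rates)
    print(effective_cpa(spend=2500.0, conversions=500))
```

Under these assumptions, a campaign with a higher eCPA needs more spend per conversion, which is one way a campaign "difficulty level" could be ranked.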
- Title
- Real-Time Data Analytics and Optimization for Computational Advertising.
- Creator
- Liu, Hui, Zhu, Xingquan, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Online advertising has built a market of hundreds of billions of dollars and continues to grow. With well developed techniques in big data storage, data mining, and analytics, online advertising is able to reach targeted audiences effectively. Real-time bidding refers to the buying and selling of online ad impressions through ad inventory auctions which occur in the time it takes a webpage to load. How to determine the bidding price and how to allocate the advertising budget are the keys to successful ad campaigns. Both of these aspects are fundamental to most campaign optimizations, and we introduce both of them in this thesis. For bidding price determination, we improved the estimation of CTR (Click Through Rate), one of the most important factors in determining the bidding price, by using a refined hierarchical tree structure for the estimation. The results of the experiment and the A/B test showed that our proposal can provide stable improvement. For budget allocation, we introduce SCO (Single Campaign Optimization) and CCO (Cross Campaign Optimization). SCO has been applied by our commercial partner, while CCO needs more research. We first introduce the methods of SCO and then give our proposal for CCO. We modeled CCO as an LP (Linear Programming) problem and designed an effective procedure to implement optimal impression distribution. Our simulation showed that our proposal can significantly increase global Gross Profit (GP).
- Date Issued
- 2017
- PURL
- http://purl.flvc.org/fau/fd/FA00004940
- Subject Headings
- Internet marketing--Technological innovations., Internet advertising--Technological innovations., Data mining., Web usage mining., Business--Data processing.
- Format
- Document (PDF)
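Illustrative note on the computational advertising abstract above: the thesis mentions estimating CTR with a hierarchical tree structure but the abstract does not describe the tree or the estimator. One simple scheme that fits that description is backing off from a sparse leaf to its ancestors until a node has enough impressions, sketched below. The backoff rule, the hierarchy levels (advertiser, then campaign), the threshold, and all names are assumptions for illustration only.

```python
def estimate_ctr(stats, path, min_impressions=1000):
    """Estimate CTR for `path` by backing off toward the root of the hierarchy.

    stats: maps a tuple prefix of the hierarchy to (clicks, impressions).
           The empty tuple () is the root (all traffic).
    path:  the most specific node, e.g. ("adv1", "camp1") (hypothetical levels).
    Returns (ctr, node_used); node_used is None if no node has enough data.
    """
    for depth in range(len(path), -1, -1):
        node = tuple(path[:depth])
        clicks, impressions = stats.get(node, (0, 0))
        # Use the first (most specific) node with enough observations.
        if impressions >= min_impressions:
            return clicks / impressions, node
    return 0.0, None

if __name__ == "__main__":
    # Hypothetical aggregated counts.
    stats = {
        (): (5000, 1_000_000),
        ("adv1",): (300, 50_000),
        ("adv1", "camp1"): (2, 200),
    }
    print(estimate_ctr(stats, ("adv1", "camp1")))
    print(estimate_ctr(stats, ("adv2", "campX")))
```

A sparse campaign inherits its advertiser's rate, and an unseen advertiser inherits the global rate, which keeps estimates stable when leaf counts are too small to trust.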
- Title
- Sensitivity analysis of predictive data analytic models to attributes.
- Creator
- Chiou, James, Zhu, Xingquan, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Classification algorithms represent a rich set of tools, which train a classification model from a given training and test set, to classify previously unseen test instances. Although existing methods have studied classification algorithm performance with respect to feature selection, noise condition, and sample distributions, existing studies have not addressed an important issue: classification algorithm performance in relation to feature deletion and addition. In this thesis, we carry out a sensitivity study of classification algorithms by using feature deletion and addition. Three types of classifiers, (1) weak classifiers, (2) generic and strong classifiers, and (3) ensemble classifiers, are validated on three types of data: (1) feature dimension data, (2) gene expression data, and (3) biomedical document data. In the experiments, we continuously add redundant features to the training and test set in order to observe the classification algorithm performance, and also continuously remove features to find the performance of the underlying classifiers. Our studies draw a number of important findings, which will help the data mining and machine learning community understand the genuine performance of common classification algorithms on real-world data.
- Date Issued
- 2014
- PURL
- http://purl.flvc.org/fau/fd/FA00004274
- Subject Headings
- Data mining, Forecasting -- Mathematical models, Social sciences -- Statistical methods, Ubiquitous computing
- Format
- Document (PDF)
- Title
- Pattern mining and visualization for molecular dynamics simulation.
- Creator
- Kong, Xue, Zhu, Xingquan, Florida Atlantic University, College of Computer Science and Engineering, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Molecular dynamics is a computer simulation technique for expressing the ultimate details of individual particle motions and can be used in many fields, such as chemical physics, materials science, and the modeling of biomolecules. In this thesis, we study visualization and pattern mining in molecular dynamics simulation. The molecular data set has a large number of atoms in each frame and a large range of frames. The features of the data set include atom ID; frame number; position in the x, y, and z planes; charge; and mass. The three main challenges of this thesis are to display a large number of atoms and range of frames, to visualize this large data set in three dimensions, and to cluster the abnormally shifting atoms that move with the same pace and direction in different frames. Focusing on these three challenges, this thesis makes three contributions. First, we design an abnormal pattern mining and visualization framework for molecular dynamics simulation. The proposed framework can visualize the clusters of abnormally shifting atom groups in a three-dimensional space and show their temporal relationships. Second, we propose a pattern mining method to detect abnormal atom groups which share similar movement and have large variance compared to the majority of atoms. Third, we propose a general molecular dynamics simulation tool, which can visualize a large number of atoms, including their movement and temporal relationships, to help domain experts study molecular dynamics simulation results. The main functions of this visualization and pattern mining tool include atom number, cluster visualization, search across different frames, multiple frame range search, frame range switch, and line demonstration for atom motions in different frames. Therefore, this visualization and pattern mining tool can be used in the fields of chemical physics, materials science, and the modeling of biomolecules for molecular dynamics simulation outcomes.
- Date Issued
- 2014
- PURL
- http://purl.flvc.org/fau/fd/FA00004212
- Subject Headings
- Data mining, Information visualization, Molecular dynamics -- Computer simulation, Molecules -- Mathematical models, Pattern perception
- Format
- Document (PDF)
- Title
- DEEP LEARNING FOR CRIME PREDICTION.
- Creator
- Gacharich, Nicholas, Zhu, Xingquan, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
In this research, we propose to use deep learning to predict crimes in small neighborhoods (regions) of a city, using historical crime data. The motivation for crime prediction is that if we can predict the number of crimes that will occur in a certain week, then city officials and law enforcement can prepare resources and manpower more effectively. Due to inherent connections between geographic regions and crime activities, the crime numbers in different regions (with respect to different time periods) are often correlated. Such correlation brings challenges and opportunities to employ deep learning to learn features from historical data for accurate prediction of the future crime numbers for each neighborhood. To leverage crime correlations between different regions, we convert crime data into a heat map to show the intensity of crime numbers and their geographical distribution. After that, we design a deep learning framework to learn from such heat maps for prediction. In our study, we look at the crime reported in twenty different neighbourhoods in Vancouver, Canada over a twenty week period and predict the total crime count that will occur in the future. We look at the number of crimes per week that have occurred over a span of ten weeks and predict the crime count for the following weeks. The locations where the crimes occur are extracted from a database and plotted onto a heat map. The model we use to predict the crime count consists of a CNN (Convolutional Neural Network) and an LSTM (Long Short-Term Memory) network attached to the CNN. The purpose of the CNN is to train the model spatially and understand where crimes occur in the images. The LSTM is used to train the model temporally and help us understand in which week the crimes occur.
By feeding heat map images of crime hot spots into the CNN and LSTM network, we will be able to predict the crime count and the most likely locations of the crimes for future weeks.
- Date Issued
- 2021
- PURL
- http://purl.flvc.org/fau/fd/FA00013723
- Subject Headings
- Deep learning, Crime forecasting
- Format
- Document (PDF)
- Title
- EMBEDDING LEARNING FOR COMPLEX DYNAMIC INFORMATION NETWORKS.
- Creator
- Wu, Man, Zhu, Xingquan, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
With the rapid development of networking platforms and data intensive applications, networks (or graphs) are becoming convenient and fundamental tools to model the complex inter-dependence among large-scale data. As a result, networks (or graphs) are being widely used in many applications, including citation networks [40], social media networks [71], and so on. However, the high complexity (containing much important information) as well as the dynamic nature of the network makes the graph learning task more difficult. To obtain better graph representations (capturing both node content and graph structure), many research efforts have been made to develop reliable and efficient algorithms. Good graph representation learning is therefore a key factor in performing well on downstream tasks. The dissertation mainly focuses on graph representation learning, which aims to embed both structure and node content information of graphs into a compact and low dimensional space for a new representation. More specifically, in order to achieve an efficient and robust graph representation, the following four problems are studied from different perspectives: 1) We study the problem of positive unlabeled graph learning for network node classification, and present a new deep learning model as a solution; 2) We formulate a new open-world learning problem for graph data, and propose an uncertain node representation learning approach and sampling strategy to solve the problem; 3) For cross-domain graph learning, we present a novel unsupervised graph domain adaptation problem, and propose an effective graph convolutional network algorithm to solve it; 4) We consider a dynamic graph as a network with changing nodes and edges in temporal order and propose a temporal adaptive aggregation network (TAAN) for dynamic graph learning. Finally, the proposed models are verified and evaluated on various real-world datasets.
- Date Issued
- 2022
- PURL
- http://purl.flvc.org/fau/fd/FA00014066
- Subject Headings
- Neural networks (Computer science), Machine learning, Graphs, Embeddings (Mathematics)
- Format
- Document (PDF)
- Title
- FEDERATED LEARNING FOR MEDICAL IMAGE CLASSIFICATION.
- Creator
- Blazanovic, Danica, Zhu, Xingquan, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Machine learning (ML) has traditionally been used to make predictive models by training on local data. However, due to concerns regarding privacy, it is not always possible to collect and combine data from different sources. On the other hand, if there are insufficient data available, it might not be possible to construct accurate models to produce meaningful outcomes. This is where Federated Learning comes to the rescue. Federated Learning (FL) represents a sophisticated distributed machine learning strategy that enables multiple devices hosted at different institutions, such as hospitals, to collaboratively train a global model while ensuring that their respective data remains securely stored on-premises. It addresses privacy concerns and data protection regulations, because raw data does not need to be shared or centralized during the training process. This thesis research studies how two different FL architectures, centralized and decentralized FL, affect medical image classification. To study and validate the findings, a skin cancer image dataset is used in a federated learning setting with five sites/clients, and a center for centralized FL. Experimental results show that using either the centralized or the decentralized (peer to peer) version of FL for classification of skin cancer images outperforms using traditional ML. In addition, two different FL settings, centralized federated learning (CFL) and decentralized federated learning (DFL), are compared using different data distributions across sites/clients. Our study shows that the best accuracy (95.14%) was achieved with the DFL model when tested on the original dataset (without adding bias to the class distributions). This indicates that class distribution imbalance between sites has a significant impact on federated learning.
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014205
- Subject Headings
- Medical imaging, Diagnostic Imaging--classification, Machine learning
- Format
- Document (PDF)
- Title
- NETWORK FEATURE ENGINEERING AND DATA SCIENCE ANALYTICS FOR CYBER THREAT INTELLIGENCE.
- Creator
- Wheelus, Charles, Zhu, Xingquan, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
While it is evident that network services continue to play an ever-increasing role in our daily lives, it is less evident that our information infrastructure requires a concerted, well-conceived, and fastidiously executed strategy to remain viable. Government agencies, Non-Governmental Organizations ("NGOs"), and private organizations are all targets for malicious online activity. Security has deservedly become a serious focus for organizations that seek to assume a more proactive posture in order to deal with the many facets of securing their infrastructure. At the same time, the discipline of data science has rapidly grown into a prominent role, as once purely theoretical machine learning algorithms have become practical for implementation. This is especially noteworthy, as principles that now fall neatly into the field of data science have been contemplated for quite some time, in some cases for more than two hundred years. Visionaries like Thomas Bayes [18], Andrey Andreyevich Markov [65], Frank Rosenblatt [88], and so many others made incredible contributions to the field long before the impact of Moore's law [92] would make such theoretical work commonplace for practical use, giving rise to what has come to be known as "Data Science".
- Date Issued
- 2020
- PURL
- http://purl.flvc.org/fau/fd/FA00013620
- Subject Headings
- Cyber security, Computer security, Information infrastructure, Predictive analytics
- Format
- Document (PDF)
- Title
- FEATURE REPRESENTATION LEARNING FOR ONLINE ADVERTISING AND RECOMMENDATIONS.
- Creator
- Gharibshah, Zhabiz, Zhu, Xingquan, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Online advertising [100], as a multi-billion dollar business, provides a common marketing experience when people access online services using electronic devices, such as desktop computers, tablets, smartphones, and so on. Using the Internet as a means of advertising, different stakeholders take actions in the background to provide and deliver advertisements to users through numerous platforms, such as search engines, news sites, and social networks, where dedicated spots or areas are used to display advertisements (ads) along with search results, posts, or page content. Online advertising is mainly based on dynamically selecting ads through a real-time bidding (or auction) mechanism. Predicting user responses like clicking ads in e-commerce platforms and internet-based advertising systems, as the first measurable user response, is an essential step for many digital advertising and recommendation systems to capture the user’s propensity to follow up actions, such as purchasing a product or subscribing to a service. To maximize revenue and user satisfaction, online advertising platforms must predict the expected user behavior of each displayed advertisement and maximize the user’s expectations of clicking [28]. Based on this observed feedback, these systems are tailored to user preferences to decide the order in which ads or any promoted content should be served to them. This objective provides an incentive to develop new research by using ideas derived from different domains like machine learning and data mining combined with models for information retrieval and mathematical optimization. Such research introduces different machine learning and data mining methods that employ deep learning-based predictive models to learn the representation of input features with the aim of user response prediction. Feature representation learning is a fundamental task that determines how input information is represented in machine learning models.
A good feature representation learning method that seeks to learn low-dimensional embedding vectors is a key factor for the success of many downstream analytics tasks, such as click-through prediction and conversion prediction in recommendation systems and online advertising platforms.
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014269
- Subject Headings
- Internet advertising, Deep learning (Machine learning), Internet marketing
- Format
- Document (PDF)
- Title
- TACKLING BIAS, PRIVACY, AND SCARCITY CHALLENGES IN HEALTH DATA ANALYTICS.
- Creator
- Wang, Shuwen, Zhu, Xingquan, Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
Health data analysis has emerged as a critical domain with immense potential to revolutionize healthcare delivery, disease management, and medical research. However, it is confronted by formidable challenges, including sample bias, data privacy concerns, and the cost and scarcity of labeled data. These challenges collectively impede the development of accurate and robust machine learning models for various healthcare applications, from disease diagnosis to treatment recommendations. Sample bias and specificity refer to the inherent challenges in working with health datasets that may not be representative of the broader population or may exhibit disparities in their distributions. These biases can significantly impact the generalizability and effectiveness of machine learning models in healthcare, potentially leading to suboptimal outcomes for certain patient groups. Data privacy and locality are paramount concerns in the era of digital health records and wearable devices. The need to protect sensitive patient information while still extracting valuable insights from these data sources poses a delicate balancing act. Moreover, the geographic and jurisdictional differences in data regulations further complicate the use of health data in a global context. Label cost and scarcity pertain to the often labor-intensive and expensive process of obtaining ground-truth labels for supervised learning tasks in healthcare. The limited availability of labeled data can hinder the development and deployment of machine learning models, particularly in specialized medical domains.
- Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014336
- Subject Headings
- Data analytics, Data mining, Ensemble learning (Machine learning), Machine learning, Health
- Format
- Document (PDF)
- Title
- iVESTA: Interactive Data Visualization and Analysis for Drive Test Data Evaluation.
- Creator
- Lee, Yongsuk, Zhu, Xingquan, Pandya, Abhijit S., Hsu, Sam, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
In this thesis, a practical solution for drive test data evaluation and a real application are studied. We propose a system framework to project high dimensional Drive Test Data (DTD) to well-organized web pages, such that users can visually review phone performance with respect to different factors. The proposed application, iVESTA (interactive Visualization and Evaluation System for driven Test dAta), employs a web-based architecture which enables users to upload DTD, immediately visualize the test results, and observe phone and network performance with respect to different factors such as dropped call rate, signal quality, vehicle speed, handover and network delays. iVESTA provides practical solutions for mobile phone manufacturers and network service providers to perform comprehensive studies of their products using real-world DTD.
- Date Issued
- 2007
- PURL
- http://purl.flvc.org/fau/fd/FA00012532
- Subject Headings
- Information visualization--Data processing, Object-oriented programming (Computer science), Information technology--Management, Application software--Development
- Format
- Document (PDF)