- Title
- Comparison of Data Sampling Approaches for Imbalanced Bioinformatics Data.
- Creator
- Dittman, David, Wald, Randall, Napolitano, Amri E., Graduate College, Khoshgoftaar, Taghi M.
- Abstract/Description
-
Class imbalance is a frequent problem found in bioinformatics datasets. Unfortunately, the minority class is usually also the class of interest. One of the methods to improve this situation is data sampling. There are a number of different data sampling methods, each with its own strengths and weaknesses, which makes choosing one a difficult prospect. In our work we compare three data sampling techniques (Random Undersampling, Random Oversampling, and SMOTE) on six bioinformatics datasets with varying levels of class imbalance. Additionally, we apply two different classifiers to the problem (5-NN and SVM) and use feature selection to reduce our datasets to 25 features prior to applying sampling. Our results show that there is very little difference between the data sampling techniques, although Random Undersampling is most frequently the top-performing technique for both of our classifiers. We also performed statistical analysis, which confirms that there is no statistically significant difference between the techniques. Therefore, our recommendation is to use Random Undersampling when choosing a data sampling technique, because it is less computationally expensive to implement than SMOTE and it also reduces the size of the dataset, which lowers subsequent computational costs without sacrificing classification performance.
- Date Issued
- 2014
- PURL
- http://purl.flvc.org/fau/fd/FA00005811
- Format
- Document (PDF)
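The two random sampling techniques named in the abstract above can be illustrated with a minimal sketch. It assumes the classes are balanced to equal size, a target ratio the abstract does not state, and the function names `random_undersample` and `random_oversample` are hypothetical. SMOTE, feature selection, and the classifiers are not shown.

```python
import random
from collections import Counter


def _indices_by_class(y):
    """Group row indices by class label."""
    groups = {}
    for idx, label in enumerate(y):
        groups.setdefault(label, []).append(idx)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly discard rows until every class matches the smallest class.

    Assumption: a fully balanced result; the source does not give a ratio.
    """
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    target = min(len(ids) for ids in groups.values())
    keep = []
    for ids in groups.values():
        keep.extend(rng.sample(ids, target))  # sample without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows until every class matches the largest class.

    Assumption: a fully balanced result; the source does not give a ratio.
    """
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    target = max(len(ids) for ids in groups.values())
    keep = []
    for ids in groups.values():
        keep.extend(ids)
        # draw the missing rows with replacement from the same class
        keep.extend(rng.choices(ids, k=target - len(ids)))
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2  # 8 majority rows, 2 minority rows
    _, y_under = random_undersample(X, y)
    _, y_over = random_oversample(X, y)
    print(Counter(y_under), Counter(y_over))
```

Undersampling returns a smaller dataset and oversampling a larger one, which is the size difference behind the abstract's point about subsequent computational cost.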
- Title
- A review of the stability of feature selection techniques for bioinformatics data.
- Creator
- Awada, Wael, Khoshgoftaar, Taghi M., Dittman, David, Wald, Randall, Napolitano, Amri E., Graduate College
- Date Issued
- 2013-04-12
- PURL
- http://purl.flvc.org/fcla/dt/3361293
- Subject Headings
- Bioinformatics, DNA microarrays, Data mining
- Format
- Document (PDF)
- Title
- Vibration analysis for ocean turbine reliability models.
- Creator
- Wald, Randall David., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Submerged turbines which harvest energy from ocean currents are an important potential energy resource, but their harsh and remote environment demands an automated system for machine condition monitoring and prognostic health monitoring (MCM/PHM). For building MCM/PHM models, vibration sensor data is among the most useful (because it can show abnormal behavior which has yet to cause damage) and the most challenging (because, owing to its waveform nature, frequency bands must be extracted from the signal). To perform the necessary analysis of the vibration signals, which may arrive rapidly in the form of data streams, we develop three new wavelet-based transforms (the Streaming Wavelet Transform, Short-Time Wavelet Packet Decomposition, and Streaming Wavelet Packet Decomposition) and propose modifications to the existing Short-Time Wavelet Transform. ... The proposed algorithms also create and select frequency-band features which focus on the areas of the signal most important to MCM/PHM, producing only the information necessary for building models (or removing all unnecessary information) so models can run on less powerful hardware. Finally, we demonstrate models which can work in multiple environmental conditions. ... Our results show that many of the transforms give similar results in terms of performance, but their different properties as to time complexity, ability to operate in a fully streaming fashion, and number of generated features may make some more appropriate than others in particular applications, such as when streaming data or hardware limitations are extremely important (e.g., ocean turbine MCM/PHM).
- Date Issued
- 2012
- PURL
- http://purl.flvc.org/FAU/3359158
- Subject Headings
- Marine turbines, Mathematical models, Fluid dynamics, Structural dynamics, Vibration, Measurement, Stochastic processes
- Format
- Document (PDF)
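The general idea of extracting frequency-band features from a vibration signal, mentioned in the abstract above, can be sketched with an ordinary Haar wavelet decomposition. This is not any of the transforms the thesis develops (their definitions are not given here); it is a textbook transform chosen for illustration, and the names `haar_step` and `band_energies` are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approx, detail): scaled pairwise sums (low-frequency band)
    and scaled pairwise differences (high-frequency band). A trailing
    unpaired sample is dropped if the length is odd.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def band_energies(signal, levels):
    """Energy per band: detail bands from highest frequency downward,
    followed by the energy of the final approximation band."""
    energies = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


if __name__ == "__main__":
    fast = [1.0, -1.0] * 4  # alternates every sample: high-frequency content
    slow = [1.0] * 8        # constant: low-frequency content only
    print(band_energies(fast, 2))
    print(band_energies(slow, 2))
```

For the alternating signal the energy lands in the first (highest-frequency) band, and for the constant signal it lands in the final approximation band, so each band energy can serve as one feature for a downstream model.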