You are here
An Empirical Study of Random Forests for Mining Imbalanced Data
- Date Issued:
- 2007
- Abstract/Description:
- Skewed or imbalanced data presents a significant problem for many standard learners which focus on optimizing the overall classification accuracy. When the class distribution is skewed, priority is given to classifying examples from the majority class, at the expense of the often more important minority class. The random forest (RF) classification algorithm, which is a relatively new learner with appealing theoretical properties, has received almost no attention in the context of skewed datasets. This work presents a comprehensive suite of experimentation evaluating the effectiveness of random forests for learning from imbalanced data. Reasonable parameter settings (for the Weka implementation) for ensemble size and number of random features selected are determined through experimentation oil 10 datasets. Further, the application of seven different data sampling techniques that are common methods for handling imbalanced data, in conjunction with RF, is also assessed. Finally, RF is benchmarked against 10 other commonly-used machine learning algorithms, and is shown to provide very strong performance. A total of 35 imbalanced datasets are used, and over one million classifiers are constructed in this work.
Title: | An Empirical Study of Random Forests for Mining Imbalanced Data. |
122 views
38 downloads |
---|---|---|
Name(s): |
Golawala, Moiz M. Khoshgoftaar, Taghi M., Thesis advisor Florida Atlantic University, Degree grantor |
|
Type of Resource: | text | |
Genre: | Electronic Thesis Or Dissertation | |
Date Created: | 2007 | |
Date Issued: | 2007 | |
Publisher: | Florida Atlantic University | |
Place of Publication: | Boca Raton, Fla. | |
Physical Form: | application/pdf | |
Extent: | 83 p. | |
Language(s): | English | |
Abstract/Description: | Skewed or imbalanced data presents a significant problem for many standard learners which focus on optimizing the overall classification accuracy. When the class distribution is skewed, priority is given to classifying examples from the majority class, at the expense of the often more important minority class. The random forest (RF) classification algorithm, which is a relatively new learner with appealing theoretical properties, has received almost no attention in the context of skewed datasets. This work presents a comprehensive suite of experimentation evaluating the effectiveness of random forests for learning from imbalanced data. Reasonable parameter settings (for the Weka implementation) for ensemble size and number of random features selected are determined through experimentation oil 10 datasets. Further, the application of seven different data sampling techniques that are common methods for handling imbalanced data, in conjunction with RF, is also assessed. Finally, RF is benchmarked against 10 other commonly-used machine learning algorithms, and is shown to provide very strong performance. A total of 35 imbalanced datasets are used, and over one million classifiers are constructed in this work. | |
Identifier: | FA00012520 (IID) | |
Degree granted: | Thesis (M.S.)--Florida Atlantic University, 2007. | |
Collection: | FAU Electronic Theses and Dissertations Collection | |
Note(s): | College of Engineering and Computer Science | |
Subject(s): |
Data mining--Case studies Machine learning--Case studies Data structure (Computer science) Trees (Graph theory)--Case studies |
|
Held by: | Florida Atlantic University Libraries | |
Sublocation: | Digital Library | |
Persistent Link to This Record: | http://purl.flvc.org/fau/fd/FA00012520 | |
Use and Reproduction: | Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. | |
Use and Reproduction: | http://rightsstatements.org/vocab/InC/1.0/ | |
Host Institution: | FAU | |
Is Part of Series: | Florida Atlantic University Digital Library Collections. |