Evolutionary Methods for Mining Data with Class Imbalance

Date Issued:
2007
Abstract/Description:
Class imbalance tends to cause inferior performance in data mining learners, particularly with regard to predicting the minority class, which generally imposes a higher misclassification cost. This work explores the benefits of using genetic algorithms (GA) to develop classification models which are better able to deal with the problems encountered when mining datasets which suffer from class imbalance. Using GA, we evolve configuration parameters suited for skewed datasets for three different learners: artificial neural networks, C4.5 decision trees, and RIPPER. We also propose a novel technique called evolutionary sampling which works to remove noisy and unnecessary duplicate instances so that the sampled training data will produce a superior classifier for the imbalanced dataset. Our GA fitness function uses metrics appropriate for dealing with class imbalance, in particular the area under the ROC curve. We perform extensive empirical testing on these techniques and compare the results with seven existing sampling methods.
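
The abstract describes evolutionary sampling only at a high level, so the following is a minimal illustrative sketch and not the thesis's actual procedure. It assumes each chromosome is a bit mask over the training instances, that fitness is the area under the ROC curve on a held-out validation set, and that a simple nearest-centroid scorer stands in for the real learners (neural networks, C4.5, RIPPER). All function names and GA settings (`auc`, `fitness`, `evolve_mask`, truncation selection, one-point crossover) are hypothetical choices made for illustration.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).
    labels: 1 = minority (positive) class, 0 = majority class."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # AUC is undefined without both classes; use chance level
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid(rows):
    """Component-wise mean of a non-empty list of feature tuples."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train_and_score(train_X, train_y, mask, eval_X):
    """Stand-in learner (an assumption): nearest-centroid scorer trained only
    on the instances the mask keeps. Higher score = more minority-like."""
    pos = [x for x, y, m in zip(train_X, train_y, mask) if m and y == 1]
    neg = [x for x, y, m in zip(train_X, train_y, mask) if m and y == 0]
    if not pos or not neg:
        return None  # the mask removed an entire class
    cp, cn = centroid(pos), centroid(neg)
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return [dist(x, cn) - dist(x, cp) for x in eval_X]

def fitness(mask, train_X, train_y, val_X, val_y):
    """Fitness of a sampling mask = validation AUC of the resulting model."""
    scores = train_and_score(train_X, train_y, mask, val_X)
    return 0.0 if scores is None else auc(scores, val_y)

def evolve_mask(train_X, train_y, val_X, val_y,
                pop_size=20, generations=30, mutation_rate=0.02, seed=0):
    """Evolve a keep/drop mask over training instances with a basic GA."""
    rng = random.Random(seed)
    n = len(train_X)
    fit = lambda m: fitness(m, train_X, train_y, val_X, val_y)
    population = [[rng.random() < 0.5 for _ in range(n)]
                  for _ in range(pop_size)]
    population[0] = [True] * n  # seed with the unsampled baseline
    for _ in range(generations):
        ranked = sorted(population, key=fit, reverse=True)
        elite = ranked[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]  # bit-flip mutation
            children.append(child)
        population = elite + children
    return max(population, key=fit)
```

Because the unsampled baseline is seeded into the first generation and the elite half always survives, the returned mask scores at least as well on the validation set as training on all instances. A real study would guard against overfitting the mask to that validation set, for example with cross-validation.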
Title: Evolutionary Methods for Mining Data with Class Imbalance.
Name(s): Drown, Dennis J.
Khoshgoftaar, Taghi M., Thesis advisor
Florida Atlantic University, Degree grantor
Type of Resource: text
Genre: Electronic Thesis Or Dissertation
Date Created: 2007
Publisher: Florida Atlantic University
Place of Publication: Boca Raton, Fla.
Physical Form: application/pdf
Extent: 172 p.
Language(s): English
Identifier: FA00012515 (IID)
Degree granted: Thesis (M.S.)--Florida Atlantic University, 2007.
Collection: FAU Electronic Theses and Dissertations Collection
Note(s): College of Engineering and Computer Science
Subject(s): Combinatorial group theory
Data mining
Machine learning
Data structure (Computer science)
Held by: Florida Atlantic University Libraries
Sublocation: Digital Library
Persistent Link to This Record: http://purl.flvc.org/fau/fd/FA00012515
Use and Reproduction: Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Use and Reproduction: http://rightsstatements.org/vocab/InC/1.0/
Host Institution: FAU
Is Part of Series: Florida Atlantic University Digital Library Collections.