FRAUD DETECTION IN HIGHLY IMBALANCED BIG DATA WITH NOVEL AND EFFICIENT DATA REDUCTION TECHNIQUES

Date Issued: 2024
Abstract/Description: The rapid growth of digital transactions and the increasing sophistication of fraudulent activities have necessitated the development of robust and efficient fraud detection techniques, particularly in the financial and healthcare sectors. This dissertation focuses on the use of novel data reduction techniques for addressing the unique challenges associated with detecting fraud in highly imbalanced Big Data, with a specific emphasis on credit card transactions and Medicare claims. The highly imbalanced nature of these datasets, where fraudulent instances constitute less than one percent of the data, poses significant challenges for traditional machine learning algorithms. This dissertation explores novel data reduction techniques tailored for fraud detection in highly imbalanced Big Data. The primary objectives include developing efficient data preprocessing and feature selection methods to reduce data dimensionality while preserving the most informative features, investigating various machine learning algorithms for their effectiveness in handling imbalanced data, and evaluating the proposed techniques on real-world credit card and Medicare fraud datasets. This dissertation covers a comprehensive examination of datasets, learners, experimental methodology, sampling techniques, feature selection techniques, and hybrid techniques. Key contributions include the analysis of performance metrics in the context of newly available Big Medicare Data, experiments using Big Medicare Data, application of a novel ensemble supervised feature selection technique, and the combined application of data sampling and feature selection. The research demonstrates that, across both domains, the combined application of random undersampling and ensemble feature selection significantly improves classification performance.
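Illustrative note: the abstract names two data reduction steps, random undersampling and ensemble feature selection, without giving their details in this record. The sketch below is a minimal, generic illustration of both ideas on toy data and is not the dissertation's actual method. The function names, the `ratio`, `k`, and `min_votes` parameters, the feature names, and the majority-vote aggregation rule are all assumptions made for the example.

```python
import random


def random_undersample(rows, labels, ratio=1.0, seed=0):
    """Keep every minority (label 1) row and a random subset of majority rows.

    `ratio` is the assumed majority:minority ratio wanted after sampling.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def ensemble_select(rankings, k, min_votes):
    """Keep features that appear in the top-k of at least `min_votes` rankings.

    This vote-counting rule is one simple way to combine rankers; it is an
    assumption here, not the technique described in the dissertation.
    """
    votes = {}
    for ranking in rankings:
        for feature in ranking[:k]:
            votes[feature] = votes.get(feature, 0) + 1
    return sorted(f for f, v in votes.items() if v >= min_votes)


# Toy data: 1,000 transactions, 5 of them fraudulent (0.5 percent).
labels = [1] * 5 + [0] * 995
rows = [[i, i % 7] for i in range(1000)]
sampled_rows, sampled_labels = random_undersample(rows, labels, ratio=1.0, seed=42)

# Hypothetical feature rankings, as if produced by three different rankers.
rankings = [
    ["amount", "hour", "merchant", "age"],
    ["hour", "amount", "age", "merchant"],
    ["amount", "merchant", "hour", "age"],
]
selected = ensemble_select(rankings, k=2, min_votes=2)
```

In this toy setup the undersampled set keeps all five fraud cases plus an equal number of randomly chosen non-fraud cases, and only features that at least two of the three rankers place in their top two are retained.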
Title: FRAUD DETECTION IN HIGHLY IMBALANCED BIG DATA WITH NOVEL AND EFFICIENT DATA REDUCTION TECHNIQUES.
Name(s): Hancock III, John T., author
Taghi M. Khoshgoftaar, Thesis advisor
Florida Atlantic University, Degree grantor
Department of Computer and Electrical Engineering and Computer Science
College of Engineering and Computer Science
Type of Resource: text
Genre: Electronic Thesis Or Dissertation
Date Created: 2024
Date Issued: 2024
Publisher: Florida Atlantic University
Place of Publication: Boca Raton, Fla.
Physical Form: application/pdf
Extent: 240 p.
Language(s): English
Identifier: FA00014424 (IID)
Degree granted: Dissertation (PhD)--Florida Atlantic University, 2024.
Collection: FAU Electronic Theses and Dissertations Collection
Note(s): Includes bibliography.
Subject(s): Fraud
Big data
Data reduction
Credit card fraud
Medicare fraud
Persistent Link to This Record: http://purl.flvc.org/fau/fd/FA00014424
Use and Reproduction: Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Use and Reproduction: http://rightsstatements.org/vocab/InC/1.0/
Host Institution: FAU