You are here
Big Data Analytics and Engineering for Medicare Fraud Detection
- Date Issued:
- 2019
- Abstract/Description:
- The United States (U.S.) healthcare system produces an enormous volume of data with a vast number of financial transactions generated by physicians administering healthcare services. This makes healthcare fraud difficult to detect, especially when there are considerably less fraudulent transactions than non-fraudulent. Fraud is an extremely important issue for healthcare, as fraudulent activities within the U.S. healthcare system contribute to significant financial losses. In the U.S., the elderly population continues to rise, increasing the need for programs, such as Medicare, to help with associated medical expenses. Unfortunately, due to healthcare fraud, these programs are being adversely affected, draining resources and reducing the quality and accessibility of necessary healthcare services. In response, advanced data analytics have recently been explored to detect possible fraudulent activities. The Centers for Medicare and Medicaid Services (CMS) released several ‘Big Data’ Medicare claims datasets for different parts of their Medicare program to help facilitate this effort. In this dissertation, we employ three CMS Medicare Big Data datasets to evaluate the fraud detection performance available using advanced data analytics techniques, specifically machine learning. We use two distinct approaches, designated as anomaly detection and traditional fraud detection, where each have very distinct data processing and feature engineering. Anomaly detection experiments classify by provider specialty, determining whether outlier physicians within the same specialty signal fraudulent behavior. Traditional fraud detection refers to the experiments directly classifying physicians as fraudulent or non-fraudulent, leveraging machine learning algorithms to discriminate between classes. We present our novel data engineering approaches for both anomaly detection and traditional fraud detection including data processing, fraud mapping, and the creation of a combined dataset consisting of all three Medicare parts. We incorporate the List of Excluded Individuals and Entities database to identify real world fraudulent physicians for model evaluation. Regarding features, the final datasets for anomaly detection contain only claim counts for every procedure a physician submits while traditional fraud detection incorporates aggregated counts and payment information, specialty, and gender. Additionally, we compare cross-validation to the real world application of building a model on a training dataset and evaluating on a separate test dataset for severe class imbalance and rarity.
Title: | Big Data Analytics and Engineering for Medicare Fraud Detection. |
![]() ![]() |
---|---|---|
Name(s): |
Herland, Matthew Andrew, author Khoshgoftaar, Taghi M., Thesis advisor Florida Atlantic University, Degree grantor College of Engineering and Computer Science Department of Computer and Electrical Engineering and Computer Science |
|
Type of Resource: | text | |
Genre: | Electronic Thesis Or Dissertation | |
Date Created: | 2019 | |
Date Issued: | 2019 | |
Publisher: | Florida Atlantic University | |
Place of Publication: | Boca Raton, Fla. | |
Physical Form: | application/pdf | |
Extent: | 204 p. | |
Language(s): | English | |
Abstract/Description: | The United States (U.S.) healthcare system produces an enormous volume of data with a vast number of financial transactions generated by physicians administering healthcare services. This makes healthcare fraud difficult to detect, especially when there are considerably less fraudulent transactions than non-fraudulent. Fraud is an extremely important issue for healthcare, as fraudulent activities within the U.S. healthcare system contribute to significant financial losses. In the U.S., the elderly population continues to rise, increasing the need for programs, such as Medicare, to help with associated medical expenses. Unfortunately, due to healthcare fraud, these programs are being adversely affected, draining resources and reducing the quality and accessibility of necessary healthcare services. In response, advanced data analytics have recently been explored to detect possible fraudulent activities. The Centers for Medicare and Medicaid Services (CMS) released several ‘Big Data’ Medicare claims datasets for different parts of their Medicare program to help facilitate this effort. In this dissertation, we employ three CMS Medicare Big Data datasets to evaluate the fraud detection performance available using advanced data analytics techniques, specifically machine learning. We use two distinct approaches, designated as anomaly detection and traditional fraud detection, where each have very distinct data processing and feature engineering. Anomaly detection experiments classify by provider specialty, determining whether outlier physicians within the same specialty signal fraudulent behavior. Traditional fraud detection refers to the experiments directly classifying physicians as fraudulent or non-fraudulent, leveraging machine learning algorithms to discriminate between classes. We present our novel data engineering approaches for both anomaly detection and traditional fraud detection including data processing, fraud mapping, and the creation of a combined dataset consisting of all three Medicare parts. We incorporate the List of Excluded Individuals and Entities database to identify real world fraudulent physicians for model evaluation. Regarding features, the final datasets for anomaly detection contain only claim counts for every procedure a physician submits while traditional fraud detection incorporates aggregated counts and payment information, specialty, and gender. Additionally, we compare cross-validation to the real world application of building a model on a training dataset and evaluating on a separate test dataset for severe class imbalance and rarity. | |
Identifier: | FA00013215 (IID) | |
Degree granted: | Dissertation (Ph.D.)--Florida Atlantic University, 2019. | |
Collection: | FAU Electronic Theses and Dissertations Collection | |
Note(s): | Includes bibliography. | |
Subject(s): |
Big data Medicare fraud Data analytics Machine learning |
|
Held by: | Florida Atlantic University Libraries | |
Sublocation: | Digital Library | |
Persistent Link to This Record: | http://purl.flvc.org/fau/fd/FA00013215 | |
Use and Reproduction: | Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. | |
Use and Reproduction: | http://rightsstatements.org/vocab/InC/1.0/ | |
Host Institution: | FAU | |
Is Part of Series: | Florida Atlantic University Digital Library Collections. |