You are here
Data Quality in Data Mining and Machine Learning
- Date Issued:
- 2007
- Summary:
- With advances in data storage and data transmission technologies, and given the increasing use of computers by both individuals and corporations, organizations are accumulating an ever-increasing amount of information in data warehouses and databases. The huge surge in data, however, has made the process of extracting useful, actionable, and interesting knowled_qe from the data extremely difficult. In response to the challenges posed by operating in a data-intensive environment, the fields of data mining and machine learning (DM/ML) have successfully provided solutions to help uncover knowledge buried within data. DM/ML techniques use automated (or semi-automated) procedures to process vast quantities of data in search of interesting patterns. DM/ML techniques do not create knowledge, instead the implicit assumption is that knowledge is present within the data, and these procedures are needed to uncover interesting, important, and previously unknown relationships. Therefore, the quality of the data is absolutely critical in ensuring successful analysis. Having high quality data, i.e., data which is (relatively) free from errors and suitable for use in data mining tasks, is a necessary precondition for extracting useful knowledge. In response to the important role played by data quality, this dissertation investigates data quality and its impact on DM/ML. First, we propose several innovative procedures for coping with low quality data. Another aspect of data quality, the occurrence of missing values, is also explored. Finally, a detailed experimental evaluation on learning from noisy and imbalanced datasets is provided, supplying valuable insight into how class noise in skewed datasets affects learning algorithms.
Title: | Data Quality in Data Mining and Machine Learning. |
228 views
144 downloads |
---|---|---|
Name(s): |
Van Hulse, Jason Khoshgoftaar, Taghi M., Thesis advisor Florida Atlantic University, Degree Grantor College of Engineering and Computer Science Department of Computer and Electrical Engineering and Computer Science |
|
Type of Resource: | text | |
Genre: | Electronic Thesis Or Dissertation | |
Date Created: | 2007 | |
Date Issued: | 2007 | |
Publisher: | Florida Atlantic University | |
Place of Publication: | Boca Raton, Fla. | |
Physical Form: | application/pdf | |
Extent: | 226 p. | |
Language(s): | English | |
Summary: | With advances in data storage and data transmission technologies, and given the increasing use of computers by both individuals and corporations, organizations are accumulating an ever-increasing amount of information in data warehouses and databases. The huge surge in data, however, has made the process of extracting useful, actionable, and interesting knowled_qe from the data extremely difficult. In response to the challenges posed by operating in a data-intensive environment, the fields of data mining and machine learning (DM/ML) have successfully provided solutions to help uncover knowledge buried within data. DM/ML techniques use automated (or semi-automated) procedures to process vast quantities of data in search of interesting patterns. DM/ML techniques do not create knowledge, instead the implicit assumption is that knowledge is present within the data, and these procedures are needed to uncover interesting, important, and previously unknown relationships. Therefore, the quality of the data is absolutely critical in ensuring successful analysis. Having high quality data, i.e., data which is (relatively) free from errors and suitable for use in data mining tasks, is a necessary precondition for extracting useful knowledge. In response to the important role played by data quality, this dissertation investigates data quality and its impact on DM/ML. First, we propose several innovative procedures for coping with low quality data. Another aspect of data quality, the occurrence of missing values, is also explored. Finally, a detailed experimental evaluation on learning from noisy and imbalanced datasets is provided, supplying valuable insight into how class noise in skewed datasets affects learning algorithms. | |
Identifier: | FA00000858 (IID) | |
Degree granted: | Dissertation (Ph.D.)--Florida Atlantic University, 2007. | |
Collection: | FAU Electronic Theses and Dissertations Collection | |
Note(s): |
Includes bibliography. College of Engineering and Computer Science |
|
Subject(s): |
Data mining--Quality control Machine learning Electronic data processing--Quality control |
|
Held by: | Florida Atlantic University Libraries | |
Persistent Link to This Record: | http://purl.flvc.org/fau/fd/FA00000858 | |
Sublocation: | Digital Library | |
Use and Reproduction: | Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. | |
Use and Reproduction: | http://rightsstatements.org/vocab/InC/1.0/ | |
Host Institution: | FAU | |
Is Part of Series: | Florida Atlantic University Digital Library Collections. |