You are here
A COMPARATIVE STUDY OF STRUCTURED VERSUS UNSTRUCTURED TEXT DATA
- Date Issued:
- 2023
- Abstract/Description:
- In today’s world, data is generated at an unprecedented rate, and a significant portion of it is unstructured text data. The recent advancements in Natural Language Processing have enabled computers to understand and interpret human language. Data mining techniques were once unable to use text data due to the high dimensionality of text processing models. This limitation was overcome with the ability to represent data as text. This thesis aims to compare the predictive performance of structured versus unstructured text data in two different applications. The first application is in the field of real estate. We compare the performance of tabular real-estate data and unstructured text descriptions of homes to predict the house price. The second application is in translating Electronic Health Records (EHR) tabular data to text data for survival classification of COVID-19 patients. Lastly, we present a range of strategies and perspectives for future research.
Title: | A COMPARATIVE STUDY OF STRUCTURED VERSUS UNSTRUCTURED TEXT DATA. |
43 views
18 downloads |
---|---|---|
Name(s): |
Cardenas, Erika , author Khoshgoftaar, Taghi M. , Thesis advisor Florida Atlantic University, Degree grantor Department of Computer and Electrical Engineering and Computer Science College of Engineering and Computer Science |
|
Type of Resource: | text | |
Genre: | Electronic Thesis Or Dissertation | |
Date Created: | 2023 | |
Date Issued: | 2023 | |
Publisher: | Florida Atlantic University | |
Place of Publication: | Boca Raton, Fla. | |
Physical Form: | application/pdf | |
Extent: | 48 p. | |
Language(s): | English | |
Abstract/Description: | In today’s world, data is generated at an unprecedented rate, and a significant portion of it is unstructured text data. The recent advancements in Natural Language Processing have enabled computers to understand and interpret human language. Data mining techniques were once unable to use text data due to the high dimensionality of text processing models. This limitation was overcome with the ability to represent data as text. This thesis aims to compare the predictive performance of structured versus unstructured text data in two different applications. The first application is in the field of real estate. We compare the performance of tabular real-estate data and unstructured text descriptions of homes to predict the house price. The second application is in translating Electronic Health Records (EHR) tabular data to text data for survival classification of COVID-19 patients. Lastly, we present a range of strategies and perspectives for future research. | |
Identifier: | FA00014220 (IID) | |
Degree granted: | Thesis (MS)--Florida Atlantic University, 2023. | |
Collection: | FAU Electronic Theses and Dissertations Collection | |
Note(s): | Includes bibliography. | |
Subject(s): |
Natural language processing (Computer science) Text data mining |
|
Persistent Link to This Record: | http://purl.flvc.org/fau/fd/FA00014220 | |
Use and Reproduction: | Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. | |
Host Institution: | FAU |