A COMPARATIVE STUDY OF STRUCTURED VERSUS UNSTRUCTURED TEXT DATA

Date Issued:
2023
Abstract/Description:
In today’s world, data is generated at an unprecedented rate, and a significant portion of it is unstructured text data. The recent advancements in Natural Language Processing have enabled computers to understand and interpret human language. Data mining techniques were once unable to use text data due to the high dimensionality of text processing models. This limitation was overcome with the ability to represent data as text. This thesis aims to compare the predictive performance of structured versus unstructured text data in two different applications. The first application is in the field of real estate. We compare the performance of tabular real-estate data and unstructured text descriptions of homes to predict the house price. The second application is in translating Electronic Health Records (EHR) tabular data to text data for survival classification of COVID-19 patients. Lastly, we present a range of strategies and perspectives for future research.
Name(s): Cardenas, Erika, author
Khoshgoftaar, Taghi M., Thesis advisor
Florida Atlantic University, Degree grantor
Department of Computer and Electrical Engineering and Computer Science
College of Engineering and Computer Science
Type of Resource: text
Genre: Electronic Thesis Or Dissertation
Date Created: 2023
Publisher: Florida Atlantic University
Place of Publication: Boca Raton, Fla.
Physical Form: application/pdf
Extent: 48 p.
Language(s): English
Identifier: FA00014220 (IID)
Degree granted: Thesis (MS)--Florida Atlantic University, 2023.
Collection: FAU Electronic Theses and Dissertations Collection
Note(s): Includes bibliography.
Subject(s): Natural language processing (Computer science)
Text data mining
Persistent Link to This Record: http://purl.flvc.org/fau/fd/FA00014220
Use and Reproduction: Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Host Institution: FAU