Current Search: Natural language processing Computer science (x)
View All Items
- Title
- A COMPARATIVE STUDY OF STRUCTURED VERSUS UNSTRUCTURED TEXT DATA.
- Creator
- Cardenas, Erika, Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
In today’s world, data is generated at an unprecedented rate, and a significant portion of it is unstructured text data. The recent advancements in Natural Language Processing have enabled computers to understand and interpret human language. Data mining techniques were once unable to use text data due to the high dimensionality of text processing models. This limitation was overcome with the ability to represent data as text. This thesis aims to compare the predictive performance of...
Show moreIn today’s world, data is generated at an unprecedented rate, and a significant portion of it is unstructured text data. The recent advancements in Natural Language Processing have enabled computers to understand and interpret human language. Data mining techniques were once unable to use text data due to the high dimensionality of text processing models. This limitation was overcome with the ability to represent data as text. This thesis aims to compare the predictive performance of structured versus unstructured text data in two different applications. The first application is in the field of real estate. We compare the performance of tabular real-estate data and unstructured text descriptions of homes to predict the house price. The second application is in translating Electronic Health Records (EHR) tabular data to text data for survival classification of COVID-19 patients. Lastly, we present a range of strategies and perspectives for future research.
Show less - Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014220
- Subject Headings
- Natural language processing (Computer science), Text data mining
- Format
- Document (PDF)
- Title
- OCR2SEQ: A NOVEL MULTI-MODAL DATA AUGMENTATION PIPELINE FOR WEAK SUPERVISION.
- Creator
- Lowe, Michael A., Khoshgoftaar, Taghi M., Florida Atlantic University, Department of Computer and Electrical Engineering and Computer Science, College of Engineering and Computer Science
- Abstract/Description
-
With the recent large-scale adoption of Large Language Models in multidisciplinary research and commercial space, the need for large amounts of labeled data has become more crucial than ever to evaluate potential use cases for opportunities in applied intelligence. Most domain specific fields require a substantial shift that involves extremely large amounts of heterogeneous data to have meaningful impact on the pre-computed weights of most large language models. We explore extending the...
Show moreWith the recent large-scale adoption of Large Language Models in multidisciplinary research and commercial space, the need for large amounts of labeled data has become more crucial than ever to evaluate potential use cases for opportunities in applied intelligence. Most domain specific fields require a substantial shift that involves extremely large amounts of heterogeneous data to have meaningful impact on the pre-computed weights of most large language models. We explore extending the capabilities a state-of-the-art unsupervised pre-training method; Transformers and Sequential Denoising Auto-Encoder (TSDAE). In this study we show various opportunities for using OCR2Seq a multi-modal generative augmentation strategy to further enhance and measure the quality of noise samples used when using TSDAE as a pretraining task. This study is a first of its kind work that leverages converting both generalized and sparse domains of relational data into multi-modal sources. Our primary objective is measuring the quality of augmentation in relation to the current implementation of the sentence transformers library. Further work includes the effect on ranking, language understanding, and corrective quality.
Show less - Date Issued
- 2023
- PURL
- http://purl.flvc.org/fau/fd/FA00014367
- Subject Headings
- Natural language processing (Computer science), Deep learning (Machine learning)
- Format
- Document (PDF)
- Title
- An evaluation of machine learning algorithms for tweet sentiment analysis.
- Creator
- Prusa, Joseph D., Khoshgoftaar, Taghi M., Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Sentiment analysis of tweets is an application of mining Twitter, and is growing in popularity as a means of determining public opinion. Machine learning algorithms are used to perform sentiment analysis; however, data quality issues such as high dimensionality, class imbalance or noise may negatively impact classifier performance. Machine learning techniques exist for targeting these problems, but have not been applied to this domain, or have not been studied in detail. In this thesis we...
Show moreSentiment analysis of tweets is an application of mining Twitter, and is growing in popularity as a means of determining public opinion. Machine learning algorithms are used to perform sentiment analysis; however, data quality issues such as high dimensionality, class imbalance or noise may negatively impact classifier performance. Machine learning techniques exist for targeting these problems, but have not been applied to this domain, or have not been studied in detail. In this thesis we discuss research that has been conducted on tweet sentiment classification, its accompanying data concerns, and methods of addressing these concerns. We test the impact of feature selection, data sampling and ensemble techniques in an effort to improve classifier performance. We also evaluate the combination of feature selection and ensemble techniques and examine the effects of high dimensionality when combining multiple types of features. Additionally, we provide strategies and insights for potential avenues of future work.
Show less - Date Issued
- 2015
- PURL
- http://purl.flvc.org/fau/fd/FA00004460, http://purl.flvc.org/fau/fd/FA00004460
- Subject Headings
- Social media., Natural language processing (Computer science), Machine learning., Algorithms., Fuzzy expert systems., Artificial intelligence.
- Format
- Document (PDF)
- Title
- Context-based Image Concept Detection and Annotation.
- Creator
- Zolghadr, Esfandiar, Furht, Borko, Florida Atlantic University, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Scene understanding attempts to produce a textual description of visible and latent concepts in an image to describe the real meaning of the scene. Concepts are either objects, events or relations depicted in an image. To recognize concepts, the decision of object detection algorithm must be further enhanced from visual similarity to semantical compatibility. Semantically relevant concepts convey the most consistent meaning of the scene. Object detectors analyze visual properties (e.g., pixel...
Show moreScene understanding attempts to produce a textual description of visible and latent concepts in an image to describe the real meaning of the scene. Concepts are either objects, events or relations depicted in an image. To recognize concepts, the decision of object detection algorithm must be further enhanced from visual similarity to semantical compatibility. Semantically relevant concepts convey the most consistent meaning of the scene. Object detectors analyze visual properties (e.g., pixel intensities, texture, color gradient) of sub-regions of an image to identify objects. The initially assigned objects names must be further examined to ensure they are compatible with each other and the scene. By enforcing inter-object dependencies (e.g., co-occurrence, spatial and semantical priors) and object to scene constraints as background information, a concept classifier predicts the most semantically consistent set of names for discovered objects. The additional background information that describes concepts is called context. In this dissertation, a framework for building context-based concept detection is presented that uses a combination of multiple contextual relationships to refine the result of underlying feature-based object detectors to produce most semantically compatible concepts. In addition to the lack of ability to capture semantical dependencies, object detectors suffer from high dimensionality of feature space that impairs them. Variances in the image (i.e., quality, pose, articulation, illumination, and occlusion) can also result in low-quality visual features that impact the accuracy of detected concepts. The object detectors used to build context-based framework experiments in this study are based on the state-of-the-art generative and discriminative graphical models. The relationships between model variables can be easily described using graphical models and the dependencies and precisely characterized using these representations. The generative context-based implementations are extensions of Latent Dirichlet Allocation, a leading topic modeling approach that is very effective in reduction of the dimensionality of the data. The discriminative contextbased approach extends Conditional Random Fields which allows efficient and precise construction of model by specifying and including only cases that are related and influence it. The dataset used for training and evaluation is MIT SUN397. The result of the experiments shows overall 15% increase in accuracy in annotation and 31% improvement in semantical saliency of the annotated concepts.
Show less - Date Issued
- 2016
- PURL
- http://purl.flvc.org/fau/fd/FA00004745, http://purl.flvc.org/fau/fd/FA00004745
- Subject Headings
- Computer vision--Mathematical models., Pattern recognition systems., Information visualization., Natural language processing (Computer science), Multimodal user interfaces (Computer systems), Latent structure analysis., Expert systems (Computer science)
- Format
- Document (PDF)