COMPUTATION IN SELF-ATTENTION NETWORKS
Title: | COMPUTATION IN SELF-ATTENTION NETWORKS. |
---|---|
Name(s): | Morris, Paul (author); Barenholtz, Elan (thesis advisor); Florida Atlantic University (degree grantor); Center for Complex Systems and Brain Sciences, Charles E. Schmidt College of Science |
Type of Resource: | text |
Genre: | Electronic Thesis Or Dissertation |
Date Created: | 2022 |
Date Issued: | 2022 |
Publisher: | Florida Atlantic University |
Place of Publication: | Boca Raton, Fla. |
Physical Form: | application/pdf |
Extent: | 99 p. |
Language(s): | English |
Abstract/Description: | Neural network models with many tunable parameters can be trained to approximate functions that transform a source distribution, or dataset, into a target distribution of interest. In contrast to low-parameter models with simple governing equations, the dynamics of transformations learned in deep neural network models are abstract and the correspondence of dynamical structure to predictive function is opaque. Despite their “black box” nature, neural networks converge to functions that implement complex tasks in computer vision, Natural Language Processing (NLP), and the sciences when trained on large quantities of data. Where traditional machine learning approaches rely on clean datasets with appropriate features, sample densities, and label distributions to mitigate unwanted bias, modern Transformer neural networks with self-attention mechanisms use Self-Supervised Learning (SSL) to pretrain on large, unlabeled datasets scraped from the internet without concern for data quality. SSL tasks have been shown to learn functions that match or outperform their supervised learning counterparts in many fields, even without task-specific finetuning. The recent paradigm shift to pretraining large models with massive amounts of unlabeled data has given credibility to the hypothesis that SSL pretraining can produce functions that implement generally intelligent computations. |
Identifier: | FA00014061 (IID) |
Degree granted: | Dissertation (PhD)--Florida Atlantic University, 2022. |
Collection: | FAU Electronic Theses and Dissertations Collection |
Note(s): | Includes bibliography. |
Subject(s): | Neural networks (Computer science); Machine learning; Self-supervised learning |
Persistent Link to This Record: | http://purl.flvc.org/fau/fd/FA00014061 |
Use and Reproduction: | Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. |
Use and Reproduction: | http://rightsstatements.org/vocab/InC/1.0/ |
Host Institution: | FAU |
Is Part of Series: | Florida Atlantic University Digital Library Collections. |
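
The abstract above refers to Transformer self-attention mechanisms. As a rough illustration only (not code from the dissertation), the sketch below implements standard single-head scaled dot-product self-attention in NumPy; the array shapes, variable names, and toy sizes are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (illustrative).

    X            : (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_head) learned projection matrices
    Returns      : (seq_len, d_head) context vectors
    """
    Q = X @ W_q                           # queries
    K = X @ W_k                           # keys
    V = X @ W_v                           # values
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)    # (seq_len, seq_len) pairwise similarities
    weights = softmax(scores, axis=-1)    # each row is a distribution over tokens
    return weights @ V                    # attention-weighted sum of values

# Toy usage with hypothetical sizes: 4 tokens, 8-dim embeddings, 4-dim head.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q = rng.normal(size=(8, 4))
W_k = rng.normal(size=(8, 4))
W_v = rng.normal(size=(8, 4))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 4)
```

In a full Transformer this operation is repeated across multiple heads and layers; during SSL pretraining the projection matrices are among the parameters tuned to predict held-out or subsequent tokens from unlabeled text.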