COMPUTATION IN SELF-ATTENTION NETWORKS


Title: COMPUTATION IN SELF-ATTENTION NETWORKS.
Name(s): Morris, Paul, author
Barenholtz, Elan, Thesis advisor
Florida Atlantic University, Degree grantor
Center for Complex Systems and Brain Sciences
Charles E. Schmidt College of Science
Type of Resource: text
Genre: Electronic Thesis Or Dissertation
Date Created: 2022
Date Issued: 2022
Publisher: Florida Atlantic University
Place of Publication: Boca Raton, Fla.
Physical Form: application/pdf
Extent: 99 p.
Language(s): English
Abstract/Description: Neural network models with many tunable parameters can be trained to approximate functions that transform a source distribution, or dataset, into a target distribution of interest. In contrast to low-parameter models with simple governing equations, the dynamics of the transformations learned by deep neural network models are abstract, and the correspondence between dynamical structure and predictive function is opaque. Despite their “black box” nature, neural networks converge to functions that implement complex tasks in computer vision, Natural Language Processing (NLP), and the sciences when trained on large quantities of data. Where traditional machine learning approaches rely on clean datasets with appropriate features, sample densities, and label distributions to mitigate unwanted bias, modern Transformer neural networks with self-attention mechanisms use Self-Supervised Learning (SSL) to pretrain on large, unlabeled datasets scraped from the internet without concern for data quality. Models pretrained with SSL objectives have been shown to learn functions that match or outperform their supervised learning counterparts in many fields, even without task-specific fine-tuning. The recent paradigm shift to pretraining large models on massive amounts of unlabeled data has lent credibility to the hypothesis that SSL pretraining can produce functions that implement generally intelligent computations.
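Note: The abstract refers to the self-attention mechanism of Transformer networks. As a point of reference only (this sketch is not drawn from the dissertation itself), the following minimal NumPy example illustrates single-head scaled dot-product self-attention; the sequence length, embedding size, and random projection matrices are illustrative assumptions.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X.

    X          : (seq_len, d_model) input token embeddings
    Wq, Wk, Wv : (d_model, d_k) projection matrices
    Returns    : (seq_len, d_k) attention-weighted context vectors.
    """
    Q = X @ Wq                                   # queries
    K = X @ Wk                                   # keys
    V = X @ Wv                                   # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                           # mix of values per query token

# Illustrative usage: 5 tokens with 8-dimensional embeddings, projected to d_k = 4
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)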
Identifier: FA00014061 (IID)
Degree granted: Dissertation (PhD)--Florida Atlantic University, 2022.
Collection: FAU Electronic Theses and Dissertations Collection
Note(s): Includes bibliography.
Subject(s): Neural networks (Computer science)
Machine learning
Self-supervised learning
Persistent Link to This Record: http://purl.flvc.org/fau/fd/FA00014061
Use and Reproduction: Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Use and Reproduction: http://rightsstatements.org/vocab/InC/1.0/
Host Institution: FAU
Is Part of Series: Florida Atlantic University Digital Library Collections.