Model-based classification of speech audio


Title: Model-based classification of speech audio.
Name(s): Thoman, Chris.
College of Engineering and Computer Science
Department of Computer and Electrical Engineering and Computer Science
Type of Resource: text
Genre: Electronic Thesis Or Dissertation
Issuance: monographic
Date Issued: 2009
Publisher: Florida Atlantic University
Physical Form: electronic
Extent: xiv, 186 p. : ill.
Language(s): English
Summary: This work explores the process of model-based classification of speech audio signals using low-level feature vectors. The process of extracting low-level features from audio signals is described, along with a discussion of established techniques for training and testing mixture-model-based classifiers and for using these models in conjunction with feature selection algorithms to select optimal feature subsets. The results of a number of classification experiments using a publicly available speech database, the Berlin Database of Emotional Speech, are presented. These include experiments in optimizing feature extraction parameters and comparing feature selection results drawn from over 700 candidate feature vectors for the tasks of classifying speaker gender, identity, and emotion. In the experiments, final classification accuracies of 99.5%, 98.0%, and 79% were achieved for the gender, identity, and emotion tasks, respectively.
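The mixture-model classification approach described in the summary can be sketched as follows. This is a minimal illustration only, assuming scikit-learn and synthetic 2-D feature vectors in place of the thesis's actual low-level speech features; it is not the author's pipeline. The idea is the standard one: fit one Gaussian mixture model per class on that class's training vectors, then assign a test vector to the class whose model gives it the highest log-likelihood.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-ins for low-level speech feature vectors (hypothetical data).
rng = np.random.default_rng(0)
X_a = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # training vectors, class "a"
X_b = rng.normal(loc=5.0, scale=1.0, size=(200, 2))  # training vectors, class "b"

# Train one Gaussian mixture model per class.
models = {}
for label, X in (("a", X_a), ("b", X_b)):
    models[label] = GaussianMixture(n_components=2, random_state=0).fit(X)

def classify(x):
    """Return the class whose mixture model assigns x the highest log-likelihood."""
    x = np.atleast_2d(x)
    return max(models, key=lambda lbl: models[lbl].score(x))

print(classify([0.1, -0.2]))
print(classify([5.2, 4.9]))
```

Feature selection would wrap this loop: candidate feature subsets are scored by the cross-validated accuracy of the resulting per-class models, and the best-scoring subset is retained.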
Identifier: 426148870 (oclc), 210518 (digitool), FADT210518 (IID), fau:3410 (fedora)
Note(s): by Chris Thoman.
Thesis (M.S.C.S.)--Florida Atlantic University, 2009.
Includes bibliography.
Electronic reproduction. Boca Raton, Fla., 2009. Mode of access: World Wide Web.
Subject(s): Signal processing -- Digital techniques
Speech processing systems
Sound -- Recording and reproducing -- Digital techniques
Pattern recognition systems
Held by: FBoU FAUER
Persistent Link to This Record: http://purl.flvc.org/FAU/210518
Use and Reproduction: http://rightsstatements.org/vocab/InC/1.0/
Host Institution: FAU