Current Search: Speech processing systems
- Title
- Automatic V/UV/S classification of continuous speech without a predetermined training set.
- Creator
- Leung, Chung Sing., Florida Atlantic University, Kostopoulos, George, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
In speech analysis, a Voiced-Unvoiced-Silence (V/UV/S) decision is performed through pattern recognition, based on measurements made on the signal. The examined speech segment is assigned to a particular class, V/UV/S, based on a minimum probability-of-error decision rule which is obtained under the assumption that the measured parameters are distributed according to a multidimensional Gaussian probability density function. The means and covariances of the Gaussian distribution are determined from manually classified speech data included in a training set. If the recording conditions vary considerably, a new set of training data is required. With the assumption that all three classes exist in the incoming speech signal, this research describes an automatic parametric learning method. Such a method estimates the means and covariances from the incoming speech signal and provides a reliable classification in any reasonable acoustic environment. This approach eliminates the necessity for the manual classification of training data and is self-adapting to the background acoustic environment as well as to speech level variations. Thus the presented approach can be readily applied to on-line continuous speech classification without prior recognition.
- Date Issued
- 1989
- PURL
- http://purl.flvc.org/fcla/dt/12246
- Subject Headings
- Speech synthesis, Speech processing systems
- Format
- Document (PDF)
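The minimum probability-of-error rule described in the abstract above can be sketched as follows. This is an illustrative stand-in, not the thesis's implementation: the two features (log short-time energy, zero-crossing rate), the per-class statistics, the diagonal covariances, and the equal class priors are all hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log of a diagonal-covariance multivariate Gaussian density."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def classify(x, classes):
    """Minimum probability-of-error rule with equal priors: assign the
    segment to the class whose Gaussian gives it the highest likelihood."""
    return max(classes, key=lambda c: log_gaussian(x, classes[c]["mean"], classes[c]["var"]))

# Hypothetical per-class statistics over two illustrative features:
# (log short-time energy, zero-crossing rate). In the thesis these would
# be learned automatically from the incoming signal.
classes = {
    "V":  {"mean": (8.0, 0.05), "var": (1.0, 0.001)},
    "UV": {"mean": (5.0, 0.40), "var": (1.5, 0.010)},
    "S":  {"mean": (1.0, 0.10), "var": (0.5, 0.005)},
}

print(classify((7.8, 0.06), classes))  # high energy, low ZCR -> "V"
```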
- Title
- APPLICATION OF LINE SPECTRUM PAIRS TO TONE DETECTION (SINEWAVE, FREQUENCIES, SINUSOIDAL, PREDICTIVE, AUTOCORRELATION).
- Creator
- WODKE, KENNETH E., Florida Atlantic University, Erdol, Nurgun, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
This thesis deals with the application of Line Spectrum Pairs to tone detection. Linear Predictive Coding (LPC) is described as background to deriving the Line Spectrum Pairs. Two sources of LPC prediction coefficients are used to calculate Line Spectrum Pairs. One source is the polynomial roots of an LPC inverse filter; various locations of up to three pairs of complex conjugate roots are used to provide filter coefficients. The radii of the conjugate roots are varied to see the effect on the calculated Line Spectrum Pairs. A second source of the filter coefficients is single and multiple sinusoidal tones that are LPC-analyzed by the autocorrelation method to generate filter prediction coefficients. The frequencies and amplitudes of the summed sinusoids, and the length of the LPC analysis window, are varied to determine the ability to detect the sinusoids by calculating the related Line Spectrum Pairs.
- Date Issued
- 1986
- PURL
- http://purl.flvc.org/fcla/dt/14328
- Subject Headings
- Speech processing systems
- Format
- Document (PDF)
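A minimal sketch of how Line Spectrum Pairs can be computed from LPC prediction coefficients, using the standard sum/difference-polynomial definition; the grid resolution and bisection refinement below are implementation choices for illustration, not taken from the thesis.

```python
import math

def lsp_frequencies(a, grid=4096):
    """Line Spectrum Pair frequencies (radians in (0, pi)) of the LPC
    inverse filter A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p.

    P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z)
    have symmetric / antisymmetric coefficients, so on the unit circle
    each reduces (up to a unimodular phase factor) to a real function
    whose sign changes mark the LSP frequencies.
    """
    p = len(a)
    ext = [1.0] + [float(c) for c in a] + [0.0]   # A(z) padded to degree p+1
    P = [ext[k] + ext[p + 1 - k] for k in range(p + 2)]
    Q = [ext[k] - ext[p + 1 - k] for k in range(p + 2)]

    def fP(w):  # real value of e^{jw(p+1)/2} P(e^{-jw}) (symmetric -> cosine series)
        return sum(c * math.cos(w * ((p + 1) / 2 - k)) for k, c in enumerate(P))

    def fQ(w):  # antisymmetric Q reduces to a sine series the same way
        return sum(c * math.sin(w * ((p + 1) / 2 - k)) for k, c in enumerate(Q))

    freqs = []
    ws = [i * math.pi / grid for i in range(1, grid)]
    for f in (fP, fQ):
        for lo, hi in zip(ws, ws[1:]):
            if f(lo) * f(hi) < 0:                 # bracketed zero: bisect
                for _ in range(50):
                    mid = 0.5 * (lo + hi)
                    if f(lo) * f(mid) <= 0:
                        hi = mid
                    else:
                        lo = mid
                freqs.append(0.5 * (lo + hi))
    return sorted(freqs)

# Trivial check: A(z) = 1 gives P = 1 + z^-(p+1) and Q = 1 - z^-(p+1),
# whose unit-circle zeros in (0, pi) are pi/3 and 2*pi/3 for p = 2.
print([round(w / math.pi, 4) for w in lsp_frequencies([0.0, 0.0])])  # -> [0.3333, 0.6667]
```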
- Title
- Multiresolution analysis of glottal pulses.
- Creator
- Miguel, Agnieszka C., Florida Atlantic University, Erdol, Nurgun, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Glottal pulse models provide vocal tract excitation signals which are used in producing high quality speech. Most of the currently used glottal pulse models are obtained by concatenating a small number of parametric functions over the pitch period. In this thesis, a new glottal pulse model is proposed. It is an alternative approach, which is based on the projection of glottal volume velocity over multiresolution subspaces spanned by wavelets and scaling functions. A detailed multiresolution analysis of the glottal models is performed using the compactly supported orthogonal Daubechies wavelets. The wavelet representation has been tested for optimality in terms of the reconstruction error and the energy compactness of the coefficients. It is demonstrated that by choosing proper parameters of the wavelet representation, high compression ratios and low rms error can be achieved.
- Date Issued
- 1996
- PURL
- http://purl.flvc.org/fcla/dt/15334
- Subject Headings
- Signal processing, Speech processing systems, Wavelets (Mathematics)--Data processing
- Format
- Document (PDF)
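The multiresolution decomposition described above can be sketched with the simplest Daubechies wavelet (Haar, db1) standing in for the higher-order Daubechies filters the thesis uses; the pulse signal below is illustrative only.

```python
import math

def haar_dwt(x):
    """One level of the Haar (db1) wavelet transform: split the signal
    into a coarse approximation and detail coefficients."""
    s = 1 / math.sqrt(2)
    approx = [s * (a + b) for a, b in zip(x[0::2], x[1::2])]
    detail = [s * (a - b) for a, b in zip(x[0::2], x[1::2])]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: perfect reconstruction from the two subbands."""
    s = 1 / math.sqrt(2)
    x = []
    for a, d in zip(approx, detail):
        x += [s * (a + d), s * (a - d)]
    return x

def multires(x, levels):
    """Multiresolution analysis: recursively transform the approximation,
    projecting the signal onto nested coarse subspaces."""
    coeffs = []
    for _ in range(levels):
        x, d = haar_dwt(x)
        coeffs.append(d)
    return x, coeffs

# A smooth "pulse" concentrates its energy in the coarse coefficients;
# this energy compactness is what makes wavelet-domain compression of
# glottal pulses effective.
pulse = [math.sin(math.pi * n / 8) for n in range(8)]
approx, details = multires(pulse, 2)
total = sum(v * v for v in pulse)
coarse = sum(v * v for v in approx)
print(f"energy in coarse subband: {coarse / total:.2%}")
```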
- Title
- Missing speech packet reconstruction based on the short-time energy and the zero-crossings.
- Creator
- Castelluccia, Claude., Florida Atlantic University, Erdol, Nurgun
- Abstract/Description
-
A waveform substitution technique using interpolation based on such slowly varying parameters of speech as short-time energy and average zero-crossing rate is developed for a packetized speech communication system. The system uses 64 Kbps conventional PCM for encoding and takes advantage of active talkspurts and silence intervals to increase the utilization efficiency of a digital link. The short-time energy and average zero-crossing rates calculated for the purpose of determining talkspurts are transmitted in a preceding packet. Hence, when a packet is pronounced "lost", its envelope and frequency characteristics are obtained from the previous packet and used to synthesize a substitution waveform that is free of the annoying sounds caused by abrupt changes in amplitude. Informal listening tests show that tolerable packet loss rates of up to 40% are achievable with these procedures.
- Date Issued
- 1991
- PURL
- http://purl.flvc.org/fcla/dt/14704
- Subject Headings
- Packet switching (Data transmission), Speech processing systems
- Format
- Document (PDF)
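The two slowly varying parameters on which the substitution above is based can be computed per frame as follows; the example frames are hypothetical.

```python
def frame_params(frame):
    """Short-time energy and average zero-crossing rate of one frame:
    the envelope and rough frequency-content descriptors used for
    waveform substitution."""
    energy = sum(s * s for s in frame)
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (len(frame) - 1)
    return energy, zcr

# When a packet is declared lost, its energy and ZCR are taken from the
# parameters carried in the preceding packet, and a substitution
# waveform is scaled to match them.
voiced_like = [1.0, 0.9, 0.7, 0.2, -0.3, -0.8, -1.0, -0.6]  # one slow oscillation
noisy_like = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.1]   # rapid sign changes
print(frame_params(voiced_like))
print(frame_params(noisy_like))
```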
- Title
- Voice activity detection over multiresolution subspaces.
- Creator
- Schultz, Robert Carl., Florida Atlantic University, Erdol, Nurgun, College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Society's increased demand for communications requires searching for techniques that preserve bandwidth. It has been observed that much of the time spent during telephone communications is actually idle time with no voice activity present. Detecting these idle periods and suppressing transmission during them can reduce bandwidth requirements during high-traffic periods. While techniques exist to perform this detection, certain types of noise make reliable detection difficult. The use of wavelets with multiresolution subspaces can aid detection by providing noise whitening and signal matching. This thesis explores this approach and proposes a detection technique.
- Date Issued
- 1999
- PURL
- http://purl.flvc.org/fcla/dt/15740
- Subject Headings
- Speech processing systems, Signal processing--Digital techniques, Wavelets (Mathematics)
- Format
- Document (PDF)
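As a baseline for the detection problem discussed above, a generic energy-threshold voice activity detector can be sketched as follows. This is not the wavelet-subspace method the thesis proposes; the noise-floor estimate, threshold factor, and hangover length are arbitrary illustrative choices.

```python
def vad(frames, factor=3.0, hangover=2):
    """Minimal energy-threshold voice activity detector: frames whose
    energy exceeds a multiple of a crude noise-floor estimate are marked
    active, with a short hangover to avoid clipping word endings."""
    energies = [sum(s * s for s in f) for f in frames]
    noise_floor = min(energies) + 1e-12   # crude: assumes some idle frames exist
    active, hold = [], 0
    for e in energies:
        if e > factor * noise_floor:
            hold = hangover
            active.append(True)
        elif hold > 0:
            hold -= 1
            active.append(True)
        else:
            active.append(False)
    return active

# Hypothetical frames: silence, two loud frames, then silence again.
frames = [[0.01] * 4, [0.5] * 4, [0.6] * 4, [0.01] * 4, [0.01] * 4, [0.01] * 4]
print(vad(frames))  # -> [False, True, True, True, True, False]
```

The hangover keeps the two frames after the last loud one marked active; this is the simple mechanism real detectors use so that trailing low-energy consonants are not cut off.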
- Title
- Model-based classification of speech audio.
- Creator
- Thoman, Chris., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
This work explores the process of model-based classification of speech audio signals using low-level feature vectors. The process of extracting low-level features from audio signals is described along with a discussion of established techniques for training and testing mixture model-based classifiers and using these models in conjunction with feature selection algorithms to select optimal feature subsets. The results of a number of classification experiments using a publicly available speech database, the Berlin Database of Emotional Speech, are presented. This includes experiments in optimizing feature extraction parameters and comparing different feature selection results from over 700 candidate feature vectors for the tasks of classifying speaker gender, identity, and emotion. In the experiments, final classification accuracies of 99.5%, 98.0% and 79% were achieved for the gender, identity and emotion tasks respectively.
- Date Issued
- 2009
- PURL
- http://purl.flvc.org/FAU/210518
- Subject Headings
Signal processing--Digital techniques, Speech processing systems, Sound--Recording and reproducing--Digital techniques, Pattern recognition systems
- Format
- Document (PDF)
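The feature selection step described above is commonly implemented as greedy sequential forward selection; the sketch below assumes a wrapper-style score function (here a toy stand-in for a trained classifier's accuracy), not the specific algorithm or the 700-plus candidate features of the thesis.

```python
def forward_select(features, score, k):
    """Greedy sequential forward selection: repeatedly add the feature
    that most improves the wrapper score of the current subset.
    `score` maps a tuple of feature names to, e.g., a classification
    accuracy obtained by training and testing a model on that subset."""
    chosen = []
    while len(chosen) < k:
        best = max(
            (f for f in features if f not in chosen),
            key=lambda f: score(tuple(chosen) + (f,)),
        )
        chosen.append(best)
    return chosen

# Toy score: "mfcc" helps most, "pitch" helps only alongside "mfcc",
# "noise" always hurts. A real wrapper would retrain a classifier here.
def toy_score(subset):
    acc = 0.5
    if "mfcc" in subset:
        acc += 0.3
    if "pitch" in subset and "mfcc" in subset:
        acc += 0.1
    if "noise" in subset:
        acc -= 0.05
    return acc

print(forward_select(["noise", "pitch", "mfcc"], toy_score, 2))  # -> ['mfcc', 'pitch']
```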
- Title
- Sensitivity analysis of blind separation of speech mixtures.
- Creator
- Bulek, Savaskan., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
Blind source separation (BSS) refers to a class of methods by which multiple sensor signals are combined with the aim of estimating the original source signals. Independent component analysis (ICA) is one such method that effectively resolves static linear combinations of independent non-Gaussian distributions. We propose a method that can track variations in the mixing system by seeking a compromise between adaptive and block methods by using mini-batches. The resulting permutation indeterminacy is resolved based on the correlation continuity principle. Methods employing higher order cumulants in the separation criterion are susceptible to outliers in the finite sample case. We propose a robust method based on low-order non-integer moments by exploiting the Laplacian model of speech signals. We study separation methods for even (over)-determined linear convolutive mixtures in the frequency domain based on joint diagonalization of matrices employing time-varying second order statistics. We investigate the sources affecting the sensitivity of the solution under the finite sample case such as the set size, overlap amount and cross-spectrum estimation methods.
- Date Issued
- 2010
- PURL
- http://purl.flvc.org/FAU/2953201
- Subject Headings
Blind source separation--Mathematical models, Signal processing--Digital techniques, Neural networks (Computer science), Automatic speech recognition, Speech processing systems
- Format
- Document (PDF)
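The correlation continuity principle mentioned in the abstract can be illustrated as follows: the separated outputs of the current mini-batch are reordered so that each channel correlates best with the channel it continues from the previous mini-batch (absolute correlation, since ICA leaves the sign ambiguous). The exhaustive search over permutations is a sketch for small channel counts, not the thesis's implementation.

```python
from itertools import permutations

def correlation(x, y):
    """Normalized cross-correlation at lag zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def resolve_permutation(prev_batch, curr_batch):
    """Correlation continuity: reorder the current mini-batch's separated
    outputs so channel i best continues prev_batch[i]. Absolute values
    absorb ICA's sign indeterminacy."""
    k = len(curr_batch)
    best = max(
        permutations(range(k)),
        key=lambda p: sum(
            abs(correlation(prev_batch[i], curr_batch[p[i]])) for i in range(k)
        ),
    )
    return [curr_batch[j] for j in best]

# Hypothetical example: the separator swapped its two outputs between
# mini-batches; correlation continuity swaps them back.
prev_batch = [[1, -1, 1, -1], [1, 1, -1, -1]]
curr_batch = [[1, 1, -1, -1], [1, -1, 1, -1]]   # channels permuted
print(resolve_permutation(prev_batch, curr_batch))
```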
- Title
- Spectral refinement to speech enhancement.
- Creator
- Charoenruengkit, Werayuth., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
- Abstract/Description
-
The goal of a speech enhancement algorithm is to remove noise and recover the original signal with as little distortion and residual noise as possible. Most successful real-time algorithms operate in the frequency domain, where the frequency amplitude of clean speech is estimated per short-time frame of the noisy signal. State-of-the-art short-time spectral amplitude estimator algorithms estimate the clean spectral amplitude in terms of the power spectral density (PSD) function of the noisy signal. The PSD has to be computed from a large ensemble of signal realizations. In practice, however, it may only be estimated from a finite-length sample of a single realization of the signal. Estimation errors introduced by these limitations deviate the solution from the optimal. Various spectral estimation techniques, many with added spectral smoothing, have been investigated for decades to reduce the estimation errors. These algorithms, however, do not significantly address the quality of speech as perceived by a human listener. This dissertation presents analysis and techniques that offer spectral refinements toward speech enhancement. We present an analytical framework of the effect of spectral estimate variance on the performance of speech enhancement. We use the variance quality factor (VQF) as a quantitative measure of estimated spectra. We show that reducing the spectral estimator VQF significantly reduces the VQF of the enhanced speech. The Autoregressive Multitaper (ARMT) spectral estimate is proposed as a low-VQF spectral estimator for use in speech enhancement algorithms.
An innovative method of incorporating a speech production model using multiband excitation is also presented as a technique to emphasize the harmonic components of the glottal speech input. The preconditioning of the noisy estimates by exploiting other avenues of information, such as pitch estimation and the speech production model, effectively increases the localized narrow-band signal-to-noise ratio (SNR) of the noisy signal, which is subsequently denoised by the amplitude gain. Combined with voicing structure enhancement, the ARMT spectral estimate delivers enhanced speech with sound clarity desirable to human listeners. The resulting improvements in enhanced speech are significant in both objective and subjective measurements.
- Date Issued
- 2009
- PURL
- http://purl.flvc.org/FAU/186327
- Subject Headings
Adaptive signal processing--Digital techniques, Spectral theory (Mathematics), Noise control, Fuzzy algorithms, Speech processing systems--Digital techniques
- Format
- Document (PDF)
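The link between spectral-estimate variance and the amount of averaging, which motivates the low-VQF estimator above, can be illustrated with plain Bartlett-style periodogram averaging (not the proposed ARMT estimator); the white-noise input, segment sizes, and peak-to-mean spread measure below are illustrative choices.

```python
import cmath
import math
import random

def periodogram(x):
    """Periodogram PSD estimate |DFT|^2 / N (direct DFT; fine for short frames)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n)
    ]

def averaged_psd(x, seg_len):
    """Bartlett-style average of periodograms over non-overlapping
    segments: same expected value, variance reduced roughly in
    proportion to the number of segments averaged."""
    segs = [x[i:i + seg_len] for i in range(0, len(x) - seg_len + 1, seg_len)]
    psds = [periodogram(s) for s in segs]
    return [sum(p[k] for p in psds) / len(psds) for k in range(seg_len)]

def spread(p):
    """Peak-to-mean ratio: a crude proxy for estimator variability
    on white noise, whose true PSD is flat."""
    return max(p) / (sum(p) / len(p))

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(512)]
single = periodogram(noise[:32])          # one raw periodogram
avg = averaged_psd(noise, 32)             # average of 16 periodograms

print(f"single periodogram peak/mean: {spread(single):.2f}")
print(f"averaged estimate peak/mean:  {spread(avg):.2f}")
```

For white noise the true spectrum is flat, so a smaller peak-to-mean spread means a lower-variance estimate; averaging trades frequency resolution for exactly this variance reduction.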