Differences

is_phoneme_length_and_phoneme_energy_useful_in_automatic_speaker_recognition [2014/03/15 12:10]
bziolko
is_phoneme_length_and_phoneme_energy_useful_in_automatic_speaker_recognition [2014/03/15 12:11] (current)
bziolko
Line 5:
 We discuss whether automatic speaker recognition can be enhanced by analyzing prosodic features of speech at the phonemic level, concentrating on the parameters of duration, energy and power. Duration was measured in milliseconds; energy was unitless, expressed as the root mean square of the normalized acoustic signal; and power was defined as the amount of energy per millisecond. To determine these speaker-dependent parameters, analyses were performed on the “CORPORA” database, a Polish speech corpus of 45 speakers. The prosodic properties of each speaker were expressed as feature vectors, and we computed each speaker's distance to the mean vector derived for the whole group.
  
-As expected, phonemic temporal values were speaker-dependent. Phoneme duration was found to be a good descriptor of each speaker's speech rate. Average phoneme energy and power (were also speaker-specific.
+As expected, phonemic temporal values were speaker-dependent. Phoneme duration was found to be a good descriptor of each speaker's speech rate. Average phoneme energy and power were also speaker-specific.
  
 Our research suggests that temporal speech signal features can complement the standard feature-vector analysis of time-energy distribution used in computer-based speaker recognition systems, and can also be applied to speech modeling (e.g. speech rate normalization), automatic speech recognition and natural speech synthesis.
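
The parameter definitions given in the abstract (duration in msec, energy as the root mean square of the normalized signal, power as energy per msec, and each speaker's distance to the group mean vector) can be illustrated with a minimal sketch. The function names, the assumption that samples are already normalized to the range [-1, 1], and the choice of Euclidean distance are illustrative assumptions; the abstract does not specify the normalization or the distance measure that was used.

```python
import math


def phoneme_features(samples, sample_rate_hz):
    """Return (duration_ms, energy, power) for one phoneme segment.

    Assumes `samples` are already normalized to [-1, 1]
    (the normalization method is not given in the abstract).
    """
    duration_ms = 1000.0 * len(samples) / sample_rate_hz
    # Unitless energy: root mean square of the normalized samples.
    energy = math.sqrt(sum(x * x for x in samples) / len(samples))
    # Power: amount of energy per millisecond.
    power = energy / duration_ms
    return (duration_ms, energy, power)


def mean_vector(vectors):
    """Component-wise mean of equally sized feature vectors."""
    n = len(vectors)
    size = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / n for i in range(size))


def distance_to_mean(vector, group_mean):
    """Euclidean distance (an assumed metric) to the group mean vector."""
    return math.dist(vector, group_mean)


# Hypothetical per-speaker vectors: (mean duration in ms, mean energy, mean power).
speakers = {
    "speaker_a": (80.0, 0.20, 0.0025),
    "speaker_b": (100.0, 0.30, 0.0030),
}
group_mean = mean_vector(list(speakers.values()))
distances = {name: distance_to_mean(vec, group_mean) for name, vec in speakers.items()}
```

Because duration, energy and power are on very different scales, an unscaled Euclidean distance is dominated by duration; in practice each component would probably be standardized first, which this sketch omits.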
  
Copyright © XXII PVC Organizing Committee 2013. All Rights Reserved.