**A multistage algorithm for fricative spotting**
  
Tel-Hai College
  
The task of finding all the appearances of a phoneme or a group of phonemes in a given speech utterance is known as phoneme spotting. It can be valuable in many situations in which a differential manipulation of these phonemes is required, such as improving the aesthetic quality of phonemes affected by poor pronunciation or recording-environment flaws, or enhancing the ability of the hearing impaired to understand speech utterances containing phonemes that may otherwise be difficult to perceive. In this paper we focus on the spotting of fricatives, which are known to introduce difficulties in both situations. We present a simple and efficient algorithm for the detection and demarcation of fricatives in recorded speech or songs. The algorithm consists of several stages. In the first stage, a feature vector is computed for each short frame of the speech signal, composed of moments of the zero-crossing rate, spectral peaks and band energy ratio. Each frame is classified as fricative or non-fricative using a linear discriminant analysis (LDA) based algorithm, pre-trained on a small set of phonemes. To reduce the number of false positives, a decision-tree classifier is applied in the second stage, trained to discriminate between fricatives and stops, the most prevalent false positives. The decision-tree classifier was compared with, and outperformed, several other classifiers, such as a support vector machine (SVM), trained on the same database. Evaluation of the algorithm using hundreds of sentences with over 4600 fricatives from the TIMIT speech database yielded a high detection rate (90%), with a relatively low percentage of false positives (specificity of 98%). The number of false positives can be further reduced, at the cost of a somewhat decreased detection rate. The algorithm uses only information contained in the acoustic signal, which makes it equally applicable to all languages and dialects. It is based on simple features and standard classifiers, which contributes to efficiency and reduced computational complexity. Furthermore, each portion of audio is processed independently of other portions, making it suitable for real-time implementation.
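
To make the frame-wise idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It computes only one of the listed features, the zero-crossing rate, per frame, and then merges runs of flagged frames into segments (the "demarcation" step). The fixed threshold is a hand-picked stand-in for the trained LDA classifier, and the second-stage decision tree is omitted. The sampling rate, frame length, hop size, threshold value and all function names are illustrative assumptions that do not come from the paper.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sampling rate in Hz (hypothetical, not from the paper)


def split_frames(signal, frame_len, hop):
    """Split a list of samples into (possibly overlapping) fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def flag_frames(signal, frame_len=320, hop=160, zcr_threshold=0.3):
    """Flag each frame whose zero-crossing rate exceeds a threshold.

    Assumption: the threshold is hand-picked for illustration. The paper
    instead classifies a richer feature vector with a pre-trained LDA model.
    """
    return [zero_crossing_rate(f) > zcr_threshold
            for f in split_frames(signal, frame_len, hop)]


def merge_segments(flags, frame_len=320, hop=160):
    """Turn runs of consecutive flagged frames into (start, end) sample ranges."""
    segments, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i  # a new run of flagged frames begins
        elif not flagged and start is not None:
            # the run ended with frame i - 1
            segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None:  # the run extends to the last frame
        segments.append((start * hop, (len(flags) - 1) * hop + frame_len))
    return segments


def synthetic_signal(seed=0):
    """Toy input: a 200 Hz tone, then white noise, then the tone again.

    The noise burst is only a rough, noise-like stand-in for a fricative.
    """
    rng = random.Random(seed)
    tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(1600)]
    noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]
    return tone + noise + tone


if __name__ == "__main__":
    flags = flag_frames(synthetic_signal())
    print(merge_segments(flags))
```

Because each frame is scored using only its own samples, the flagging step can run on streaming audio, which is in the spirit of the real-time claim in the abstract. A frame that straddles the tone/noise boundary may or may not be flagged, so the reported segment edges are accurate only to within roughly one hop.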
Copyright © XXII PVC Organizing Committee 2013. All Rights Reserved.