M Tech Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/3

  • Open Access
    Person recognition from their hum
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Madhavi, Maulik C.; Patil, Hemant A.
    In this thesis, the design of a person recognition system based on a person's hum is presented. Since a hum is a nasalized sound and the LP (Linear Prediction) model does not characterize nasal sounds sufficiently, our approach is based on Mel filterbank-based cepstral features for the person recognition task. The first task consisted of data collection and corpus design for humming: hums of old Hindi songs from around 170 subjects were used. Feature extraction schemes were then developed. Because the Mel filterbank follows human auditory perception, MFCC was used as the state-of-the-art feature set. The filterbank structure was then modified to compute the Gaussian Mel scale-based MFCC (GMFCC) and Inverse Mel scale-based MFCC (IMFCC) feature sets. Two feature sets are mainly explored in this thesis. The first, MFCC-VTMP, captures phase information via MFCC using the VTEO (Variable length Teager Energy Operator) in the time domain; the second, Variable length Teager Energy Operator based MFCC (VTMFCC), captures vocal-source information. The proposed MFCC-VTMP feature set has two characteristics: it captures phase information, and it exploits the properties of VTEO. VTEO is an extension of TEO and is a nonlinear energy-tracking operator. Feature sets such as VTMFCC capture vocal-source information, which reflects the excitation mechanism in the speech (hum) production process and was found to be complementary to vocal-tract information. Consequently, score-level fusion of different source and system features improves person recognition performance.
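    The two operations this abstract leans on, VTEO and score-level fusion, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the thesis's MFCC-VTMP or VTMFCC pipeline. VTEO is taken in the commonly cited form Psi_i[x(n)] = x^2(n) - x(n-i)x(n+i), where i = 1 reduces to the classical TEO. Fusion is assumed to be a weighted sum of z-normalized per-speaker scores. The function names (vteo, zscore, fuse_scores) and the weight alpha are hypothetical.

```python
import numpy as np

def vteo(x, i=1):
    """Variable-length Teager energy operator (assumed form).

    Computes x(n)^2 - x(n-i) * x(n+i) for n = i .. len(x)-i-1. No padding is
    applied, so the output is 2*i samples shorter than the input.
    i = 1 gives the classical TEO.
    """
    x = np.asarray(x, dtype=float)
    if i < 1 or x.size <= 2 * i:
        raise ValueError("need lag i >= 1 and len(x) > 2*i")
    # x[i:-i] is x(n); x[:-2i] is x(n-i); x[2i:] is x(n+i)
    return x[i:-i] ** 2 - x[:-2 * i] * x[2 * i:]

def zscore(s):
    """Normalize a vector of per-speaker scores to zero mean, unit variance."""
    s = np.asarray(s, dtype=float)
    std = s.std()
    return (s - s.mean()) / std if std > 0 else np.zeros_like(s)

def fuse_scores(system_scores, source_scores, alpha=0.5):
    """Score-level fusion: weighted sum of z-normalized scores.

    alpha is a hypothetical weight; in practice it would be tuned on held-out data.
    """
    return alpha * zscore(system_scores) + (1 - alpha) * zscore(source_scores)

# A pure tone A*cos(w*n) gives a constant VTEO output equal to A^2 * sin^2(w*i).
n = np.arange(200)
tone = 2.0 * np.cos(0.1 * n)
print(np.allclose(vteo(tone, i=3), 4.0 * np.sin(0.3) ** 2))  # True

# Toy scores for three enrolled speakers from a system (vocal-tract) feature
# and a source (excitation) feature; the fused decision is the argmax.
fused = fuse_scores([0.0, 1.0, 5.0], [0.0, 2.0, 1.0], alpha=0.5)
print(int(np.argmax(fused)))  # 2
```

    The tone check illustrates the energy-tracking property: for a sinusoid, the operator output depends only on amplitude, frequency and the lag i. It does not depend on the sample index.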
  • Open Access
    Gaussian mixture models for spoken language identification
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2006) Manwani, Naresh; Mitra, Suman K.; Joshi, Manjunath
    Language Identification (LID) is the problem of identifying the language of a spoken utterance irrespective of its topic, its speaker or the duration of the speech. Although a large amount of work has been done on automatic Language Identification, the accuracy and complexity of LID systems remain major challenges. Researchers have used different feature extraction methods for speech and different baseline systems for learning. To understand the role of these choices, a comparative study of a few algorithms was conducted, and its results were used to select an appropriate feature extraction method and baseline system for LID. Based on this study, we used Gaussian Mixture Models (GMMs) trained with the Expectation Maximization (EM) algorithm as our baseline system. Mel Frequency Cepstral Coefficients (MFCC), together with their delta and delta-delta coefficients, are used as the speech features applied to the system. English and three Indian languages (Hindi, Gujarati and Telugu) are used to evaluate performance. In this dissertation we have tried to improve the performance of GMMs for LID. Two modified EM algorithms are used to overcome the limitations of the EM algorithm: the first is the Split and Merge EM algorithm, and the second is Model Selection Based Self-Splitting Gaussian Mixture Learning. We have also prepared a speech database for the three Indian languages (Hindi, Gujarati and Telugu), which we used in our experiments.
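    The following minimal NumPy sketch shows the generic baseline described above: delta features, one diagonal-covariance GMM per language trained with plain EM, and a decision by highest average frame log-likelihood. It is not the dissertation's code. Several choices are assumptions: the diagonal covariances, the regression window N = 2 with edge-frame padding for deltas, random-frame initialization, a fixed iteration count, and the helper names (deltas, train_gmm, identify). MFCC extraction itself is assumed to happen elsewhere. Neither the Split and Merge EM variant nor the self-splitting variant is shown.

```python
import numpy as np

def deltas(c, N=2):
    """Regression deltas over +-N frames; edge frames are repeated (assumed padding)."""
    c = np.asarray(c, dtype=float)
    T = c.shape[0]
    padded = np.pad(c, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(c)
    for n in range(1, N + 1):
        # padded[t + N + n] is c[t + n]; padded[t + N - n] is c[t - n]
        d += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return d / denom

def logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

def log_gauss_diag(X, means, variances):
    """log N(x_t | mean_k, diag(var_k)) for every frame/component pair -> (T, K)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (D * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def train_gmm(X, K, n_iter=50, reg=1e-6, seed=0):
    """Plain EM for a diagonal-covariance GMM (no split/merge)."""
    rng = np.random.default_rng(seed)
    T = X.shape[0]
    means = X[rng.choice(T, size=K, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + reg, (K, 1))
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame
        log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate mixture weights, means and diagonal variances
        Nk = resp.sum(axis=0) + 1e-10
        weights = Nk / T
        means = (resp.T @ X) / Nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / Nk[:, None] - means ** 2, 0.0) + reg
    return weights, means, variances

def avg_loglik(X, model):
    weights, means, variances = model
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    return float(np.mean(logsumexp(log_p, axis=1)))

def identify(X, models):
    """Return the language whose GMM gives the highest average frame log-likelihood."""
    return max(models, key=lambda lang: avg_loglik(X, models[lang]))

# Toy usage with synthetic frames (stand-ins for MFCC+delta vectors, not real speech).
rng = np.random.default_rng(1)
models = {"lang_a": train_gmm(rng.normal(0.0, 1.0, (500, 3)), K=2),
          "lang_b": train_gmm(rng.normal(4.0, 1.0, (500, 3)), K=2)}
print(identify(rng.normal(4.0, 1.0, (100, 3)), models))  # lang_b
```

    In a real system, each frame vector would be the MFCCs stacked with deltas(mfcc) and deltas(deltas(mfcc)). The number of mixture components K is exactly the model-selection question that the self-splitting variant addresses.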