Theses and Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1

  • Item (Open Access)
    i-vector-Based Speaker and Person Recognition
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2017) Naik, Apeksha J.; Patil, Hemant A.
    Speaker recognition is the process of determining, with the help of machines, whether a speech sample was uttered by the claimed speaker. Speaker recognition, or voice biometrics, is well suited to many real-world applications compared with other available biometrics because of its simplicity and ease of implementation. Although the field has grown over the past decade, some limitations remain. In this thesis, we studied those limitations and implemented three speaker recognition systems that address some of these issues. The first system is a Gaussian Mixture Model-Universal Background Model (GMM-UBM)-based Automatic Speaker Verification (ASV) system, in which target models are adapted from a well-trained UBM. This is the classical approach to speaker verification. Its drawbacks are that it is slow during the verification phase and that it does not account for channel variability. To overcome these problems, we moved to the recent state-of-the-art i-vector-based technique, in which each utterance is represented by a low-dimensional vector called an i-vector. This low-dimensional representation makes the verification phase very fast. Another advantage of the method is that it gives better speaker recognition performance under various channel conditions. The i-vector-based technique is implemented with two different pattern classifiers, namely Cosine Distance Scoring (CDS) and Probabilistic Linear Discriminant Analysis (PLDA). In addition, we implemented all three systems using the recently proposed Phase-Encoded Mel Cepstral Coefficients (PEMCC) features and compared the results with those of the baseline Mel Frequency Cepstral Coefficients (MFCC) features. Furthermore, score-level fusion of MFCC and PEMCC features was performed, and it gave better speaker verification performance than either MFCC or PEMCC features alone, which indicates that the two feature sets carry complementary information. All the experiments in this thesis are carried out on the TIMIT database as well as on the statistically meaningful NIST SRE 2002 database. The second part of this thesis focuses on humming-based person recognition. Here, the hum of a person (instead of a speech utterance) is used to verify the identity claim. Teager energy operator (TEO)-based features, termed MFCC-VTMP features, were used to develop the GMM-UBM-based system and the i-vector CDS-based system. The performance of MFCC-VTMP features is better than that of the baseline MFCC features for both systems. Finally, the thesis summarizes the work presented, along with future research directions.
  • Item (Open Access)
    Gaussian mixture models for spoken language identification
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2006) Manwani, Naresh; Mitra, Suman K.; Joshi, Manjunath
    Language Identification (LID) is the problem of identifying the language of any spoken utterance irrespective of the topic, the speaker, or the duration of the speech. Although a large amount of work has been done on automatic Language Identification, the accuracy and complexity of LID systems remain major challenges. Researchers have used different methods of speech feature extraction and different baseline systems for learning. To understand the role of these choices, a comparative study was conducted over a few algorithms. The results of this study were used to select an appropriate feature extraction method and baseline system for LID. Based on these results, we used Gaussian Mixture Models (GMM), trained with the Expectation Maximization (EM) algorithm, as our baseline system. Mel Frequency Cepstral Coefficients (MFCC), together with their delta and delta-delta cepstral coefficients, are used as the speech features applied to the system. English and three Indian languages (Hindi, Gujarati and Telugu) are used to test the performance. In this dissertation, we have tried to improve the performance of GMM for LID. Two modified EM algorithms are used to overcome the limitations of the EM algorithm. The first approach is the Split and Merge EM algorithm. The second variation is Model Selection Based Self-Splitting Gaussian Mixture Learning. We have also prepared a speech database for the three Indian languages, namely Hindi, Gujarati and Telugu, which we have used in our experiments.
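
As a rough illustration of the GMM-based decision rule that the second abstract describes, the sketch below scores a sequence of feature frames against one GMM per language and picks the language with the highest average log-likelihood. It is not the dissertation's implementation: the function names, the diagonal-covariance assumption, the two-dimensional toy "features", and the model parameters in `TOY_MODELS` are all hypothetical, and real models would be trained with EM on MFCC, delta and delta-delta features.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of frames under a GMM.

    gmm is a list of (weight, mean, var) components (assumed layout).
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)
        # log-sum-exp over components, shifted by the peak for stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total / len(frames)


def identify_language(frames, models):
    """Return (best_language, scores) using one GMM per language."""
    scores = {
        lang: gmm_avg_log_likelihood(frames, gmm) for lang, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models with made-up parameters (not trained on real speech).
TOY_MODELS = {
    "lang_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "lang_b": [(0.5, [-3.0, 3.0], [1.0, 1.0]), (0.5, [-5.0, 5.0], [1.0, 1.0])],
}

if __name__ == "__main__":
    frames = [[0.1, -0.2], [1.9, 2.1], [0.3, 0.4]]
    best, scores = identify_language(frames, TOY_MODELS)
    print(best, scores)
```

Averaging the log-likelihood over frames keeps scores comparable across utterances of different durations, which fits the abstract's requirement that identification should not depend on the length of the speech.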