Theses and Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1

Search Results

  • Item (Open Access)
    Feature based approach for singer identification
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Radadia, Purushotam G.; Patil, Hemant A.
    One of the most challenging problems in Music Information Retrieval (MIR) is identifying the singer of a given song in the presence of strong instrumental accompaniment. Besides instrumental sounds, other factors also severely affect Singer IDentification (SID) accuracy, such as the quality of the recording devices, the transmission channels, and other singing voices present within a song. In this work, we propose singer identification on a large database of 500 Hindi (Indian language) Bollywood songs, the largest database ever used in any SID study. The vocal portions of each song are segmented manually. Different features are employed in this thesis in addition to the state-of-the-art feature set, Mel Frequency Cepstral Coefficients (MFCC). To identify a singer, three classifiers are employed, viz., a 2nd order polynomial classifier, a 3rd order polynomial classifier, and the state-of-the-art Gaussian Mixture Model (GMM) classifier. Furthermore, to alleviate the effect of recording devices and transmission channels, Cepstral Mean Subtraction (CMS) is applied to MFCC (CMSMFCC), which gives better results than the baseline MFCC alone. Among the three classifiers, the 3rd order polynomial classifier performs best. Score-level fusion of MFCC and CMSMFCC is also used in this thesis and improves the results significantly.
  • Item (Open Access)
    Person recognition from their hum
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Madhavi, Maulik C.; Patil, Hemant A.
    In this thesis, the design of a person recognition system based on a person's hum is presented. Since a hum is a nasalized sound and the LP (Linear Prediction) model does not characterize nasal sounds sufficiently, our approach uses Mel filterbank-based cepstral features for the person recognition task. The first task consisted of data collection and corpus design for humming; for this purpose, hums of old Hindi songs from around 170 subjects are used. Feature extraction schemes were then developed. Since the Mel filterbank follows human auditory perception, MFCC was used as the state-of-the-art feature set. The filterbank structure was then modified to compute the Gaussian Mel scale-based MFCC (GMFCC) and Inverse Mel scale-based MFCC (IMFCC) feature sets. Two features are mainly explored in this thesis. The first feature set, MFCC-VTMP, captures phase information via MFCC using the VTEO (Variable length Teager Energy Operator) in the time domain; the second, called Variable length Teager Energy Operator based MFCC (VTMFCC), captures vocal-source information. The proposed MFCC-VTMP feature set has two characteristics: it captures phase information, and it exploits the properties of VTEO. VTEO is an extension of the TEO (Teager Energy Operator) and is a nonlinear energy-tracking operator. Feature sets such as VTMFCC capture vocal-source information, which reflects the excitation mechanism in the speech (hum) production process. This source information is found to be complementary to the vocal tract (system) information, so score-level fusion of different source and system features improves person recognition performance.
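The singer identification abstract applies Cepstral Mean Subtraction (CMS) to MFCC to reduce recording-device and channel effects. A fixed channel multiplies the spectrum; after the log and the (linear) DCT this becomes, approximately, a constant offset added to every cepstral frame, so subtracting each coefficient's mean over the recording removes it. A minimal sketch, assuming the MFCC frames are already available as a NumPy array of shape (frames, coefficients); the function name `cepstral_mean_subtraction` and the toy data are illustrative, not taken from the thesis.

```python
import numpy as np


def cepstral_mean_subtraction(cepstra) -> np.ndarray:
    """Subtract the per-coefficient mean over all frames of one recording.

    cepstra: array of shape (num_frames, num_coeffs), e.g. MFCC frames.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
mfcc = rng.normal(size=(300, 13))  # toy stand-in: 300 frames x 13 coefficients
channel = rng.normal(size=13)      # constant cepstral offset from a fixed channel
print(np.allclose(cepstral_mean_subtraction(mfcc + channel),
                  cepstral_mean_subtraction(mfcc)))  # True
```

Because the offset is the same for every frame, it cancels exactly; a real channel is only approximately constant in the cepstral domain, so CMS reduces rather than eliminates its effect.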
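The singer identification abstract compares 2nd and 3rd order polynomial classifiers but does not describe their training. The sketch below assumes a common least-squares formulation: every feature frame is expanded into all monomials of its coefficients up to the chosen order, per-singer weights are fit so that expanded frames map to one-hot singer targets, and a test recording is scored by averaging the per-frame outputs. The helper names (`poly_expand`, `train_polynomial_classifier`, `identify_singer`) and the toy Gaussian "frames" are hypothetical.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_expand(frames: np.ndarray, order: int) -> np.ndarray:
    """Expand each frame (row) into a constant 1 plus all monomials of its
    coefficients up to degree `order`."""
    n, d = frames.shape
    columns = [np.ones(n)]
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(d), degree):
            # Product of the selected coefficients, e.g. c0*c0*c2 for idx=(0, 0, 2).
            columns.append(np.prod(frames[:, list(idx)], axis=1))
    return np.stack(columns, axis=1)


def train_polynomial_classifier(frames_per_singer, order):
    """Fit weights (num_terms x num_singers) by least squares so that each
    expanded training frame maps to the one-hot target of its singer."""
    expanded = np.vstack([poly_expand(f, order) for f in frames_per_singer])
    targets = np.zeros((expanded.shape[0], len(frames_per_singer)))
    start = 0
    for k, f in enumerate(frames_per_singer):
        targets[start:start + len(f), k] = 1.0
        start += len(f)
    weights, *_ = np.linalg.lstsq(expanded, targets, rcond=None)
    return weights


def identify_singer(test_frames, weights, order):
    """Average the per-frame polynomial scores and return the best singer index."""
    scores = poly_expand(test_frames, order).mean(axis=0) @ weights
    return int(np.argmax(scores))


rng = np.random.default_rng(0)
singer_a = rng.normal(-2.0, 0.5, size=(200, 3))  # toy stand-ins for feature frames
singer_b = rng.normal(2.0, 0.5, size=(200, 3))
w3 = train_polynomial_classifier([singer_a, singer_b], order=3)
print(identify_singer(rng.normal(2.0, 0.5, size=(50, 3)), w3, order=3))
```

Raising the order adds cross-terms that let the decision surface bend, at the cost of a rapidly growing number of weights (1 + 3 + 6 + 10 = 20 terms here for three coefficients at order 3).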
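Both abstracts report gains from score-level fusion (MFCC with CMSMFCC, and source with system features) without stating the fusion rule. A common choice, assumed here, is to normalize each system's scores across the enrolled candidates and take a weighted sum; `alpha` is a hypothetical weight that would normally be tuned on held-out data, and the scores below are toy numbers, not results from either thesis.

```python
import numpy as np


def zscore(scores):
    """Normalize one system's scores across the enrolled candidates."""
    scores = np.asarray(scores, dtype=float)
    centered = scores - scores.mean()
    std = scores.std()
    return centered / std if std > 0 else centered


def fuse_scores(scores_a, scores_b, alpha=0.5):
    """Weighted sum of two normalized score vectors (one entry per candidate)."""
    return alpha * zscore(scores_a) + (1.0 - alpha) * zscore(scores_b)


# Toy log-likelihood scores for three enrolled candidates (not real results).
mfcc_scores = [-41.2, -39.8, -40.5]
cms_mfcc_scores = [-12.0, -12.9, -11.1]
fused = fuse_scores(mfcc_scores, cms_mfcc_scores, alpha=0.5)
print(int(np.argmax(fused)))  # 2
```

Normalizing first keeps one system's larger score range from dominating the sum; in this toy case the two systems disagree (candidate 1 vs. candidate 2), and the fused score settles on candidate 2.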
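The hum-based recognition abstract builds its features on the VTEO, described as an extension of the Teager Energy Operator. The sketch below assumes the form commonly given for it, Ψ_i{x(n)} = x²(n) − x(n−i)·x(n+i), where the lag i = 1 recovers the classical TEO. For a sinusoid x(n) = A·cos(Ωn + φ) the output is the constant A²·sin²(iΩ), which is why the operator is said to track energy (it depends on both amplitude and frequency). The MFCC-VTMP and VTMFCC pipelines themselves are not reproduced here; the function name `vteo` is illustrative.

```python
import numpy as np


def vteo(x, lag: int = 1) -> np.ndarray:
    """Variable length Teager energy x[n]^2 - x[n-lag] * x[n+lag].

    Returned for n = lag .. len(x) - lag - 1, the samples where both
    neighbours exist. lag = 1 gives the classical TEO.
    """
    x = np.asarray(x, dtype=float)
    if lag < 1 or 2 * lag >= len(x):
        raise ValueError("lag must satisfy 1 <= lag < len(x) / 2")
    return x[lag:-lag] ** 2 - x[: -2 * lag] * x[2 * lag:]


n = np.arange(200)
x = 2.0 * np.cos(0.3 * n + 0.5)  # A = 2, digital frequency 0.3 rad/sample
print(vteo(x, lag=1)[:3])  # each value ~ A^2 * sin^2(0.3)
print(vteo(x, lag=3)[:3])  # each value ~ A^2 * sin^2(0.9)
```

Each output sample needs only three input samples, so the operator responds almost instantaneously to amplitude and frequency changes; a larger lag looks at samples farther apart.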