M Tech Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/3
Item Open Access
Acoustic analysis of musical pillars of Vitthala temple, Hampi
(Dhirubhai Ambani Institute of Information and Communication Technology, 2014) Lakshmipriya, V. K.; Patil, Hemant A.
This thesis is a systematic investigation of the acoustics of the musical pillars of the Vitthala temple at Hampi, India. The columns of different pillars produce the sounds of different musical instruments (in particular, instruments used in Indian classical music) when struck with a finger. Earlier work in this area comprised a detailed dimensional and spectral analysis of these musical columns. The use of beam theories to model the columns and calculate their flexural frequencies had also been suggested. In this thesis, the performance of two major beam theories, viz., Euler-Bernoulli beam theory and Timoshenko beam theory, is compared based on the calculated flexural frequencies. The Euler-Bernoulli model performs better even though it is the simplest beam model and does not consider the shear and rotation effects of beam bending. The concept of linear prediction (LP) from speech processing is applied to the columns of the musical pillars. The LP spectrum obtained matches closely with the spectrum of the sound from the pillar. The concatenated tube model is also used to estimate the varying area function of the columns. Furthermore, an attempt at synthesizing the sound of the musical pillars is presented using the digital waveguide model. Even though the synthesized sound is audibly similar to the pillar's sound, the model needs many improvements before its spectral characteristics match those of the sound from the musical pillar.

Item Open Access
Speaker recognition over VoIP network
(Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Goswami, Parth A.; Patil, Hemant A.
This thesis deals with an Automatic Speaker Recognition (ASR) system over narrowband Voice over Internet Protocol (VoIP) networks.
A VoIP network introduces several artifacts, such as the speech codec, packet loss and packet re-ordering, network jitter, and echo. In this thesis, packet loss is taken as the research issue, in order to investigate the performance degradation it causes in an ASR system. As voice packets travel over an Internet Protocol (IP) network, they tend to take different routes. Some of them are dropped by the channel due to congestion and some are rejected by the receiver. This packet loss reduces the perceptual quality of speech. Therefore, it is natural to expect that packet loss may affect the performance of an ASR system. To alleviate this degradation in ASR system performance, novel interleaving schemes and a lossy training method are proposed. It is shown in the present work that these interleaving schemes and the lossy training method significantly improve the performance of an ASR system.

Item Open Access
Person recognition from their hum
(Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Madhavi, Maulik C.; Patil, Hemant A.
In this thesis, the design of a person recognition system based on a person's hum is presented. As a hum is a nasalized sound and the LP (Linear Prediction) model does not characterize nasal sounds sufficiently, our approach in this work is based on Mel filterbank-based cepstral features for the person recognition task. The first task consisted of data collection and corpus design for humming. For this purpose, hums of old Hindi songs from around 170 subjects are used. Then feature extraction schemes were developed. The Mel filterbank follows human auditory perception, so MFCC was used as the state-of-the-art feature set. The filterbank structure was then modified in order to compute Gaussian Mel scale-based MFCC (GMFCC) and Inverse Mel scale-based MFCC (IMFCC) feature sets. In this thesis, mainly two features are explored.
The first feature set, MFCC-VTMP, captures the phase information via MFCC utilizing the VTEO (Variable length Teager Energy Operator) in the time domain, and the second, the Variable length Teager Energy Operator based MFCC (VTMFCC), captures the vocal-source information. The proposed feature set MFCC-VTMP has two characteristics, viz., it captures phase information and it uses the property of the VTEO. The VTEO is an extension of the TEO and is a nonlinear energy tracking operator. Feature sets like VTMFCC capture the vocal-source information. This information reflects the excitation mechanism in the speech (hum) production process. It is found to be complementary to the vocal tract information. So the score-level fusion of different source and system features improves the person recognition performance.
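
As an illustration of the energy tracking operator mentioned above, here is a minimal Python sketch. It assumes the commonly cited discrete form of the VTEO, psi_i[x(n)] = x(n)^2 - x(n-i) x(n+i), which reduces to the TEO for i = 1. The abstract does not state the exact formulation used in the thesis, nor how the operator is combined with MFCC, so the function name `vteo` and the parameter `lag` are hypothetical and the sketch covers only the operator itself.

```python
import math


def vteo(x, lag=1):
    """Variable-length Teager energy of a sequence x (assumed definition).

    psi[n] = x[n]^2 - x[n - lag] * x[n + lag]
    lag = 1 gives the ordinary Teager Energy Operator (TEO).
    Only samples with both neighbours available are returned.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [x[n] * x[n] - x[n - lag] * x[n + lag]
            for n in range(lag, len(x) - lag)]


# Illustrative input: a pure cosine with hypothetical amplitude and frequency.
amp, omega = 2.0, 0.3
signal = [amp * math.cos(omega * n) for n in range(200)]

teo_out = vteo(signal, lag=1)    # ordinary TEO
vteo_out = vteo(signal, lag=3)   # VTEO with a dependency index of 3
```

For a pure cosine of amplitude A and digital frequency omega, this operator returns the constant A^2 sin^2(i * omega) at every sample, which is why it is described as tracking energy: the output depends on both the amplitude and the frequency of the oscillation.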