M Tech Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/3
2 results
Search Results
Item Open Access
Design of countermeasures for replay spoof speech attack
(Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Tak, Hemlata; Patil, Hemant A.
An Automatic Speaker Verification (ASV) system is a biometric person-authentication system that verifies a claimed speaker's identity from his/her voice with the help of machines. ASV systems are vulnerable to various types of spoofing attacks, such as impersonation, speech synthesis (SS), voice conversion (VC), replay, and twins. The replay attack poses one of the most difficult challenges for the use of ASV systems in practical scenarios, as it requires neither specific expert knowledge nor advanced equipment. In this work, we present a standalone replay Spoof Speech Detection (SSD) task to classify natural vs. replayed speech. In earlier studies, researchers mainly used vocal tract system-based (segmental) information for replay SSD. However, during the replay mechanism, excitation source-based information is also affected (in particular, the pitch (F0) source harmonics degrade in the higher frequency regions) due to the recording environment and replay devices. Hence, in this thesis, we explore an excitation source-based feature set along with system-based features for the replay SSD task. In particular, we propose the novel Linear Frequency Residual Cepstral Coefficients (LFRCC) for the replay SSD task. The objective of using this feature set is to explore possible complementary excitation source information present in Linear Prediction (LP) residual-based features. In addition, we also propose system-based features, namely, Instantaneous Amplitude (IA) and Instantaneous Frequency (IF) features, obtained using the Hilbert Transform (HT) demodulation technique.
These HT-based Instantaneous Amplitude Cepstral Coefficients (IACC) and Instantaneous Frequency Cepstral Coefficients (IFCC) feature sets are able to capture the information present in the slowly-varying envelope and the fast-varying changes in frequency, respectively. Experiments were performed on the ASVspoof 2017 Challenge database with Gaussian Mixture Model (GMM) and Convolutional Neural Network (CNN) classifiers. The score-level fusion of source-based and system-based features significantly improved the performance. Furthermore, for a fixed feature set, fusing the GMM and CNN classifiers at the score level yields a significant reduction in % Equal Error Rate (EER). We have also analyzed the effect of classifier-level fusion for the replay SSD task.
Item Open Access
Acoustic analysis of musical pillars of Vitthala temple, Hampi
(Dhirubhai Ambani Institute of Information and Communication Technology, 2014) Lakshmipriya, V. K.; Patil, Hemant A.
This thesis is a systematic investigation of the acoustics of the musical pillars of the Vitthala temple at Hampi, India. The columns of different pillars produce sounds of different musical instruments (in particular, instruments used in Indian classical music) when struck with a finger. Earlier works in this area comprised a detailed dimensional and spectral analysis of these musical columns. The use of beam theories to model the columns and calculate their flexural frequencies was also suggested earlier. In this thesis, the performance of two major beam theories, viz., Euler-Bernoulli beam theory and Timoshenko beam theory, is compared based on the calculated flexural frequencies. The Euler-Bernoulli model gives better performance despite being the simplest beam model and not considering the shear and rotation effects of beam bending. The concept of linear prediction (LP) from speech processing is applied to the columns of the musical pillars.
The LP spectrum obtained closely matches the spectrum of the sound from the pillar. The concatenated tube model is also used to estimate the varying area function of the columns. Furthermore, an attempt to synthesize the sound of the musical pillars is presented using the digital waveguide model. Even though the synthesized sound is audibly similar to the pillar’s sound, the model needs many improvements to match its spectral characteristics with those of the sound from the musical pillar.
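As a rough illustration of the flexural-frequency calculation mentioned in the abstract above, the following minimal sketch evaluates the standard Euler-Bernoulli formula f_n = (beta_n L)^2 / (2 pi L^2) * sqrt(E I / (rho A)) for a uniform solid cylindrical column. Everything specific here is an assumption for illustration and is not taken from the thesis: the function name, the boundary condition (both ends clamped or both free, which share the same roots), and the sample dimensions and granite-like material values.

```python
import math

# First three roots (beta_n * L) of cos(x) * cosh(x) = 1.
# These roots apply to both clamped-clamped and free-free uniform beams.
BETA_L = (4.7300, 7.8532, 10.9956)


def euler_bernoulli_frequencies(length_m, diameter_m, youngs_pa, density_kg_m3):
    """Return the first three flexural frequencies (Hz) of a uniform solid cylinder.

    Hypothetical helper: Euler-Bernoulli theory only, so shear deformation
    and rotary inertia are ignored.
    """
    area = math.pi * diameter_m ** 2 / 4        # cross-sectional area A
    inertia = math.pi * diameter_m ** 4 / 64    # second moment of area I
    stiffness_term = math.sqrt(youngs_pa * inertia / (density_kg_m3 * area))
    return [
        (beta_l ** 2) / (2 * math.pi * length_m ** 2) * stiffness_term
        for beta_l in BETA_L
    ]


# Assumed sample values (not from the thesis): 1 m long, 10 cm diameter,
# E = 50 GPa and density = 2700 kg/m^3 as rough figures for granite.
freqs = euler_bernoulli_frequencies(1.0, 0.1, 50e9, 2700.0)
for mode, f in enumerate(freqs, start=1):
    print(f"mode {mode}: {f:.1f} Hz")
```

Under these assumptions the frequencies scale with the inverse square of the column length and linearly with its diameter, so the mode ratios depend only on the boundary condition. A Timoshenko model would add shear and rotary-inertia corrections that this sketch leaves out.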