Theses and Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1
Search Results (2 items)
Item Open Access Development of Countermeasures for Voice Liveness and Spoofed Speech Detection (Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Chodingala, Piyushkumar Kiritbhai; Patil, Hemant A.
An Automatic Speaker Verification (ASV) or voice biometric system performs machine-based authentication of speakers using their voice signals. ASV systems are used in applications such as banking transactions over mobile phones, where personal information and banking details demand robust security. Furthermore, Voice Assistants (VAs) are known for the convenience of controlling surrounding devices, such as the user's personal devices, door locks, and electrical appliances. However, ASV and VA systems are vulnerable to various spoofing attacks, such as impersonation, twins, Voice Conversion (VC), Speech Synthesis (SS), and replay. In particular, a user's voice command can be recorded and played back by an imposter (attacker) at negligible cost, which makes the replay attack both the most harmful and the easiest to mount. Hence, this thesis aims to develop countermeasures that protect ASV and VA systems from replay attacks; it also develops the Voice Liveness Detection (VLD) task as a countermeasure against replay attacks. In this thesis, the novel Cochlear Filter Cepstral Coefficients-based Instantaneous Frequency using Quadrature Energy Separation Algorithm (CFCCIF-QESA) feature set is proposed for replay Spoofed Speech Detection (SSD) on ASV systems. The performance of the proposed feature set is evaluated on publicly available datasets, namely, ASVspoof 2017 v2.0 and BTAS 2016. Furthermore, the significance of the Delay and Sum (DAS) beamformer over the state-of-the-art Minimum Variance Distortionless Response (MVDR) beamformer is demonstrated for replay SSD on VAs. Finally, wavelet-based features are proposed for the VLD task.
The performance of the proposed wavelet-based approaches is evaluated on the recently released POp noise COrpus (POCO).

Item Open Access Design of robust automatic speaker verification system in adverse conditions (2020) Rajpura, Divyesh G.; Patil, Hemant A.
Automatic Speaker Verification (ASV) aims to verify the identity of a person from his/her voice with the help of machines. It has become an essential component of many speech-related applications through its use as a biometric authentication system. Traditional ASV approaches achieve good performance on clean, high-quality, near-field speech; however, they remain challenged under adverse conditions, such as noisy environments, mismatched conditions, far-field speech, and short-duration utterances. This thesis investigates the robustness of traditional ASV approaches in these adverse conditions. In contrast to near-field speech, far-field speech is degraded by reverberation, the proximity of the microphones, and the quality of the recording devices. Therefore, to reduce the effects of noise in far-field speech, we investigate Teager Energy Operator (TEO)-based feature sets, namely, Instantaneous Amplitude Cepstral Coefficients (IACC) and Instantaneous Frequency Cepstral Coefficients (IFCC), along with the conventional Mel Frequency Cepstral Coefficients (MFCC) feature set. Short-duration utterances are a common problem in real-life ASV applications: limited phonetic variability makes it difficult to find speaking patterns and to extract speaker-specific information. In this context, we analyze the robustness of various statistical and Deep Neural Network (DNN)-based speaker representations, namely, i-vector, x-vector, and d-vector. Another contribution of this thesis is in the field of Voice Conversion (VC).
Singing Voice Conversion (SVC) in the presence of background music has a wide range of applications, such as dubbing of songs and singing voice synthesis. However, recent approaches to SVC pay little attention to background music. To address this issue, we propose a new framework consisting of music separation followed by voice conversion. Due to the limited availability of speaker-specific data, we also perform an extensive analysis using different transfer-learning and fine-tuning-based systems.
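Both abstracts build on Teager energy analysis (the Quadrature Energy Separation Algorithm in the first thesis, the TEO-based IACC/IFCC feature sets in the second). As an illustrative sketch only, and not code from either thesis, the discrete Teager Energy Operator at the heart of these methods can be written in a few lines of NumPy:

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[x](n) = x(n)^2 - x(n-1)*x(n+1).

    The output is two samples shorter than the input, since the operator
    needs one neighbour on each side.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a pure tone x(n) = A*cos(Omega*n), the operator returns exactly
# A^2 * sin^2(Omega): a single cheap operator that jointly reflects the
# amplitude A and the digital frequency Omega of the signal.
fs = 16000                      # assumed sampling rate (illustrative)
n = np.arange(1024)
A, f = 0.5, 440.0
omega = 2 * np.pi * f / fs
psi = teager_energy(A * np.cos(omega * n))
assert np.allclose(psi, A ** 2 * np.sin(omega) ** 2)
```

This constant-output property for a pure tone is what energy separation algorithms such as QESA exploit to demodulate a signal into instantaneous-amplitude and instantaneous-frequency components, which are then turned into cepstral features (IACC/IFCC) in the theses above.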