Theses and Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1
Item (Open Access): Handcrafted Feature Design for Voice Liveness Detection and Countermeasures for Spoof Attacks (2021)
Khoria, Kuldeep; Patil, Hemant A.

Automatic Speaker Verification (ASV) systems are highly vulnerable to spoofing attacks, in which an imposter manipulates the biometric system to gain access by unfair means. ASV systems are exposed to several kinds of spoofing attacks, namely, Speech Synthesis (SS), Voice Conversion (VC), impersonation, twins, and replay. A replay attack on a voice biometric is mounted by surreptitiously recording a genuine speech signal and then presenting it to the ASV system as if it were authentic. Among all spoofing attacks, the replay attack is the simplest to execute (or mount) yet hard to detect; in particular, a replay attack carried out with a high-quality recording and playback device is very hard to detect because the replayed signal closely resembles the genuine speaker. Given this vulnerability of ASV to replayed speech, this thesis addresses the voice liveness detection (VLD) task, i.e., verifying whether a live speaker is present in front of the ASV system or the speaker's voice is being replayed. In addition, the thesis develops effective countermeasures to protect these systems against SS- and VC-based spoofing attacks. Two novel feature sets are developed for the VLD task as countermeasures against replay attack, namely, Constant-Q Transform (CQT)-based features and Spectral Root Cepstral Coefficients (SRCC). The performance of the proposed feature sets is evaluated on the recently released POp noise COrpus (POCO), with a Short-Time Fourier Transform (STFT)-based feature set as the baseline. Further, a novel feature set, namely, Cochlear Filter Cepstral Coefficients with Instantaneous Frequency using the Energy Separation Algorithm (CFCCIF-ESA), is proposed for the detection of SS- and VC-based spoofing attacks. Experiments with the CFCCIF-ESA feature set are performed on the ASVspoof 2015 dataset, and the results are compared with the baseline Constant-Q Cepstral Coefficients (CQCC), Linear Frequency Cepstral Coefficients (LFCC), and state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) feature sets.
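As an illustration of the CQT-based front end named in this abstract, the following is a minimal sketch of extracting constant-Q cepstral-style features from an utterance, assuming librosa and SciPy are available. The file name, sampling rate, hop length, number of bins, and number of retained coefficients are illustrative assumptions, not the thesis's configuration.

```python
import librosa
import numpy as np
from scipy.fftpack import dct

# Load a speech utterance (path and 16 kHz sampling rate are illustrative).
y, sr = librosa.load("utterance.wav", sr=16000)

# Constant-Q Transform: geometrically spaced bins give finer resolution at
# low frequencies, where pop noise from a live speaker is concentrated.
C = np.abs(librosa.cqt(y, sr=sr, hop_length=256, n_bins=84, bins_per_octave=12))

# Log compression followed by a DCT along the frequency axis yields
# cepstral-style coefficients (the thesis pipeline may differ in detail).
log_C = np.log(C + 1e-10)
cqt_cepstra = dct(log_C, type=2, axis=0, norm="ortho")[:20, :]  # keep 20 coefficients per frame
```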
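A rough sketch of SRCC follows the same outline: instead of the logarithm used in conventional cepstral analysis, the filterbank energies are raised to a fractional power (the spectral root) before the DCT. The root value gamma, the choice of a mel filterbank, and the frame settings below are assumptions for illustration only.

```python
import librosa
import numpy as np
from scipy.fftpack import dct

def srcc(y, sr, n_fft=512, hop=160, n_filters=40, gamma=0.3, n_coeff=13):
    """Spectral root cepstral coefficients: power-law (root) compression
    replaces the log of conventional cepstral analysis."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2       # power spectrogram
    fbank = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_filters)   # filterbank (mel is assumed here)
    E = fbank @ S                                                       # filterbank energies
    E_root = np.power(E + 1e-10, gamma)                                 # spectral root compression
    return dct(E_root, type=2, axis=0, norm="ortho")[:n_coeff, :]

y, sr = librosa.load("utterance.wav", sr=16000)
features = srcc(y, sr)   # shape: (n_coeff, n_frames)
```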
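The Energy Separation Algorithm referred to in CFCCIF-ESA (and underlying the TEO-based features in the next item) builds on the discrete Teager Energy Operator. The sketch below shows one common discrete formulation (DESA-2) for estimating instantaneous amplitude and frequency; the thesis's exact ESA variant and the preceding cochlear filterbank stage are not reproduced here.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def desa2(x, eps=1e-10):
    """DESA-2 energy separation: instantaneous amplitude and frequency (rad/sample)."""
    psi_x = teager_energy(x)
    y = np.zeros_like(x)
    y[1:-1] = x[2:] - x[:-2]                       # y[n] = x[n+1] - x[n-1]
    psi_y = teager_energy(y)
    # Instantaneous frequency estimate (radians per sample).
    arg = 1.0 - psi_y / (2.0 * psi_x + eps)
    omega = 0.5 * np.arccos(np.clip(arg, -1.0, 1.0))
    # Instantaneous amplitude envelope estimate.
    amp = 2.0 * psi_x / (np.sqrt(np.abs(psi_y)) + eps)
    return amp, omega
```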
Item (Open Access): Design of Robust Automatic Speaker Verification System in Adverse Conditions (2020)
Rajpura, Divyesh G.; Patil, Hemant A.

Automatic Speaker Verification (ASV) aims to verify the identity of a person from his/her voice with the help of machines. It has become an essential component of many speech-related applications owing to its use for biometric authentication. Traditional ASV approaches achieve good performance on clean, high-quality, near-field speech; however, ASV remains challenging under adverse conditions, such as noisy environments, mismatched conditions, far-field recordings, and short-duration speech. This thesis investigates the robustness of traditional ASV approaches in adverse conditions. In contrast to near-field speech, the far-field speech signal is degraded by reverberation, the distance between the speaker and the microphone, and the quality of the recording devices. Therefore, to reduce the effects of noise in far-field speech, we investigate Teager Energy Operator (TEO)-based feature sets, namely, Instantaneous Amplitude Cepstral Coefficients (IACC) and Instantaneous Frequency Cepstral Coefficients (IFCC), alongside the conventional Mel Frequency Cepstral Coefficients (MFCC) feature set. Short-duration utterances are another common problem in real-life ASV applications: limited phonetic variability makes it difficult to find speaking patterns and extract speaker-specific information from short utterances. In this context, we analyze the robustness of various statistical and Deep Neural Network (DNN)-based speaker representations, namely, i-vectors, x-vectors, and d-vectors. Another contribution of this thesis is in the field of Voice Conversion (VC). Singing Voice Conversion (SVC) in the presence of background music has a wide range of applications, such as song dubbing and singing voice synthesis; however, recent SVC approaches pay little attention to the background music. To address this issue, we propose a new framework consisting of music separation followed by voice conversion. Owing to the limited availability of speaker-specific data, we also perform an extensive analysis using different transfer learning and fine-tuning-based systems.
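For context on the conventional MFCC baseline mentioned in both abstracts, here is a minimal extraction sketch using librosa; the file name, sampling rate, frame settings, and number of coefficients are illustrative and may differ from the analysis parameters used in the theses.

```python
import librosa

# Illustrative settings; the theses' exact analysis parameters may differ.
y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=512, hop_length=160)
print(mfcc.shape)   # (13, n_frames)
```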
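To illustrate how fixed-dimensional speaker representations such as i-vectors, x-vectors, or d-vectors are typically compared at verification time, the sketch below scores two embeddings with cosine similarity. Real systems usually add a backend such as PLDA and a calibrated decision threshold; the threshold and function names here are purely illustrative assumptions.

```python
import numpy as np

def cosine_score(enrol_emb, test_emb):
    """Cosine similarity between an enrolment and a test speaker embedding."""
    enrol = enrol_emb / (np.linalg.norm(enrol_emb) + 1e-12)
    test = test_emb / (np.linalg.norm(test_emb) + 1e-12)
    return float(np.dot(enrol, test))

def verify(enrol_emb, test_emb, threshold=0.5):
    """Accept the claimed identity if the score exceeds a tuned threshold."""
    return cosine_score(enrol_emb, test_emb) >= threshold
```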