Publication:
Morse wavelet transform-based features for voice liveness detection

dc.contributor.affiliation: DA-IICT, Gandhinagar
dc.contributor.author: Gupta, Priyanka
dc.contributor.author: Patil, Hemant
dc.contributor.researcher: Gupta, Priyanka (201721001)
dc.date.accessioned: 2025-08-01T13:09:02Z
dc.date.issued: 01-03-2024
dc.description.abstract: The need for Voice Liveness Detection (VLD) has emerged particularly for the security of Automatic Speaker Verification (ASV) systems. Existing Spoofed Speech Detection (SSD) systems rely on attack-specific approaches to detect spoofed speech. However, to safeguard ASV systems against all kinds of spoofing attacks (known as well as unknown), it is important to determine whether speech is uttered live (genuine) or not. To that effect, in this work, we propose the detection of pop noise using the Morse wavelet for the VLD task. Pop noise is a discriminative acoustic cue that is present in live speech and absent or diminished in spoofed speech. It is captured by the microphone in the form of sudden bursts of air from a live speaker's mouth, owing to the close proximity of the speaker to the microphone. To validate this hypothesis, we present an analysis of pop noise energy w.r.t. distance and find that it decreases exponentially with distance. Furthermore, pop noise is said to be present in very low frequency regions. To capture the pop noise effectively, we propose to exploit the excellent frequency resolution of the Continuous Wavelet Transform (CWT) using Generalized Morse Wavelets (GMWs). GMWs are a superfamily of analytic wavelets. We analyse the suitability of GMWs for pop noise detection for the VLD task using the POp noise COrpus (POCO). The wavelet parameters are fine-tuned for the VLD task. Furthermore, the performance of the VLD system is evaluated for various frequency subbands, and it is observed that the subband of 1 to … gives the best accuracy, 90.55% and 88.43% on the Dev and Eval sets, respectively. In addition, phoneme-based analysis shows that the performance of the VLD system depends on the type of phonemes in the utterances. It is shown that phonemes such as plosives and fricatives show more distinct pop noise than other phonemes.
Furthermore, an extension of the POCO dataset is used for experiments in which simulated reverberation is added to spoofed signals, assuming the attacker (or the recording device) is positioned at various distances. This makes it possible to study the effect of speaker-attacker distance. Similar to the previous results, it is observed that for the reverberated case too, the optimal frequency subband for the VLD task is 1 to …, across all the distances. Furthermore, the proposed feature set is evaluated using three classifiers, namely, Convolutional Neural Network (CNN), Light CNN (LCNN), and Residual Neural Network (ResNet), on the POCO dataset as well as the reverberated POCO dataset. It is observed that CNN gives the highest accuracy of 88.43% on the Eval set of the POCO dataset. The proposed features are also evaluated under the assumptions of two ideal scenarios: when the ASV system is strictly under attack, and when it is strictly not under attack. It is observed that the proposed Morse wavelet-based VLD system rejected 89% of the spoofed utterances and accepted 88.30% of the genuine utterances.
dc.format.extent: Jan-27
dc.identifier.citation: Priyanka Gupta and Hemant A. Patil, "Morse wavelet transform-based features for voice liveness detection," Computer Speech & Language, Elsevier, ISSN: 1095-8363, vol. 84, Mar. 2024, Article no. 102952, doi: 10.1016/j.csl.2023.101571. [Published: 4 Oct. 2023]
dc.identifier.doi: 10.1016/j.csl.2023.101571
dc.identifier.issn: 1095-8363
dc.identifier.scopus: 2-s2.0-85173140430
dc.identifier.uri: https://ir.daiict.ac.in/handle/dau.ir/1568
dc.identifier.wos: WOS:001160048300001
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.ispartofseries: Vol. 84; No.
dc.source: Computer Speech & Language
dc.source.uri: https://www.sciencedirect.com/science/article/pii/S0885230823000906
dc.title: Morse wavelet transform-based features for voice liveness detection
dspace.entity.type: Publication
relation.isAuthorOfPublication: fdb7041b-280e-498b-b2ee-34f9bc351f4c
relation.isAuthorOfPublication.latestForDiscovery: fdb7041b-280e-498b-b2ee-34f9bc351f4c
