Theses and Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1

Search Results

Now showing 1 - 3 of 3
  • Item (Open Access)
    Significance of Teager Energy Operator for Speech Applications
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Therattil, Anand Saju; Patil, Hemant A.
    Speech is used in various applications apart from voice communication, such as pathology detection, severity-level classification of dysarthria, and replay spoof speech detection for voice biometrics and voice assistants. The first part of this thesis deals with the development of a countermeasure (CM) system for replay Spoof Speech Detection (SSD). A replay attack on a voice biometric system refers to the fraudulent attempt made by an imposter to spoof another person's identity by replaying pre-recorded voice samples in front of an Automatic Speaker Verification (ASV) system or Voice Assistants (VAs). The second part studies and analyses dysarthria, a neuromotor speech disorder, using various speech processing and deep learning approaches. Dysarthria, Parkinson's disease, Cerebral Palsy, etc., produce atypical speech, as they impair neuromotor functions of the human body. Among these, dysarthria is one of the most common causes of atypical speech. Analysis of a patient's dysarthric condition depends on the severity level, which is generally assessed by Speech-Language Pathologists (SLPs). However, to make the assessment immune to human biases and errors, this thesis is oriented towards developing a severity-level classification system for dysarthric speech using signal processing and deep learning approaches. It presents an analysis of dysarthric vs. normal speech using Teager Energy Operator (TEO)-based Teager Energy Cepstral Coefficients (TECC) and Squared Energy Operator (SEO)-based Squared Energy Cepstral Coefficients (SECC) as frontend features. These features, provided as input to deep learning and pattern recognition models, predict the severity-level class for dysarthria. Lastly, the generalization of the countermeasure system for replay attacks on ASV systems and VAs is analysed using the TEO-based TECC feature set.
The generalization of the CM system is presented through cross-database evaluation between the Voice Spoofing Detection Corpus (VSDC), ASVspoof 2017 version 2.0, and ASVspoof 2019 PA datasets. Further, analyses of One-Point Replay (1PR) and Two-Point Replay (2PR) attacks are presented in this thesis.
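The two energy operators contrasted in the abstract above have simple discrete forms: the Teager Energy Operator is Psi[x(n)] = x(n)^2 - x(n-1)*x(n+1), while the Squared Energy Operator is just x(n)^2. A minimal illustrative sketch of both follows; note this is only the raw operators, not the full TECC/SECC cepstral feature extraction (filterbanks, cepstral analysis) developed in the thesis.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: Psi[x(n)] = x(n)^2 - x(n-1)*x(n+1).

    Output is two samples shorter than the input, since the operator
    needs one past and one future sample at each point.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def squared_energy(x):
    """Squared Energy Operator: simply x(n)^2 at each sample."""
    x = np.asarray(x, dtype=float)
    return x ** 2

# For a pure sinusoid A*sin(omega*n), the TEO output is the constant
# A^2 * sin(omega)^2 -- it tracks both amplitude and frequency, whereas
# the squared energy oscillates with the waveform itself.
n = np.arange(200)
omega = 0.1 * np.pi
teo_of_sine = teager_energy(np.sin(omega * n))
```

The key property this sketch exposes is why TEO-based features differ from conventional energy: TEO responds to the product of amplitude and frequency, which is what makes it sensitive to the airflow irregularities the thesis exploits for dysarthric and replayed speech.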
  • Item (Open Access)
    Deep Learning for Severity Level-based Classification of Dysarthria
    (2021) Gupta, Siddhant; Patil, Hemant A.
    Dysarthria is a motor speech disorder in which the muscles required to speak get damaged or paralyzed, adversely affecting the articulatory elements of speech and rendering the output voice unintelligible. Dysarthria is considered to be one of the most common forms of speech disorder. It occurs as a result of several neurological and neuro-degenerative diseases, such as Parkinson's Disease, Cerebral Palsy, etc. People suffering from dysarthria face difficulties in conveying vocal messages and emotions, which in many cases leads to depression and social isolation. Dysarthria has become a major speech technology issue, as systems that work efficiently for normal speech, such as Automatic Speech Recognition systems, do not provide satisfactory results for corresponding dysarthric speech. In addition, people suffering from dysarthria are generally limited by their motor functions; therefore, the development of voice-assisted systems for them becomes all the more crucial. Furthermore, analysis and classification of dysarthric speech can be useful in tracking the progression of the disease and its treatment in a patient. In this thesis, dysarthria has been studied as a speech technology problem, classifying dysarthric speech into four severity levels. Since people with dysarthria face problems during long speech utterances, short-duration speech segments (maximum 1 s) have been used for the task, to explore the practical applicability of the thesis work. In addition, analysis of dysarthric speech has been done using different methods, such as time-domain waveforms, Linear Prediction profiles, Teager Energy Operator profiles, the Short-Time Fourier Transform, etc., to find the best representative feature for the classification task. With the rise of Artificial Intelligence, deep learning techniques have gained significant popularity in machine classification and pattern recognition tasks.
Therefore, to keep the thesis work relevant, several machine learning and deep learning techniques, such as Gaussian Mixture Models (GMM), Convolutional Neural Networks (CNN), Light Convolutional Neural Networks (LCNN), and Residual Neural Networks (ResNet), have been adopted. The severity level-based classification task has been evaluated on various popular measures, such as classification accuracy and F1-scores. In addition, for comparison with short-duration speech, classification has also been done on long-duration speech (more than 1 s) data. Furthermore, to enhance the relevance of the work, experiments have been performed on the statistically meaningful and widely used Universal Access Speech Corpus.
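The short-segment classification setup described above can be sketched end to end in miniature: frame a roughly 1 s segment, extract a per-frame log-energy feature vector, and classify it into one of four severity levels. Everything below is hypothetical and illustrative; a toy nearest-centroid classifier on synthetic data stands in for the GMM/CNN/LCNN/ResNet models and the Universal Access Speech Corpus used in the actual thesis.

```python
import numpy as np

def frame_energy_features(signal, frame_len=200):
    """Split a short segment into fixed-length frames and return
    the log-energy of each frame as a feature vector."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

class NearestCentroidClassifier:
    """Toy stand-in for the learned classifiers in the thesis:
    each severity class is represented by its mean feature vector,
    and a segment is assigned to the nearest class centroid."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Euclidean distance from every sample to every class centroid.
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(dists, axis=1)]
```

Usage on synthetic data: generate noise segments whose amplitude scale differs per "severity" class, fit on the features, and predict; because log-energy separates the classes cleanly, the toy classifier recovers the labels. Real dysarthric severity obviously does not reduce to amplitude; the point is only the segment-to-features-to-class pipeline shape.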
  • Item (Open Access)
    Deep learning techniques for speech pathology applications
    (2020) Purohit, Mirali Virendrabhai; Patil, Hemant A.
    Human-machine interaction has gained more attention due to its interesting applications in industry and day-to-day life. In recent years, speech technologies have grown rapidly because of advances in the fields of machine learning and deep learning. Various deep learning architectures have shown state-of-the-art results in different areas, such as computer vision, the medical domain, etc. Massive success has been achieved in developing speech-based systems, i.e., Intelligent Personal Assistants (IPAs), chatbots, Text-To-Speech (TTS), etc. However, there are certain limitations to these systems: speech processing systems work efficiently only on normal-mode speech and hence show poor performance on other kinds of speech, such as impaired speech, far-field speech, shouted speech, etc. This thesis work contributes to the improvement of impaired speech processing. To address this problem, the work takes two major approaches: 1) classification, and 2) conversion techniques. A new paradigm, namely weak speech supervision, is explored to overcome the data scarcity problem and is proposed for the classification task. In addition, the effectiveness of a residual network-based classifier is shown over the traditional convolutional neural network-based model for the multi-class classification of pathological speech. Further, using Voice Conversion (VC)-based techniques, variants of generative adversarial networks are proposed to repair impaired speech in order to improve the performance of Voice Assistants (VAs). The performance of these various architectures is shown via objective and subjective evaluations. Inspired by the work done using VC-based techniques, this thesis also contributes to the voice conversion field. To that effect, a state-of-the-art system, namely an adaptive generative adversarial network, is proposed and analyzed by comparing it with a recent state-of-the-art method for voice conversion.