M Tech (EC) Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/6

Search Results

Now showing 1 - 2 of 2
  • Item (Open Access)
    Phase Based Methods for Various Speech Applications
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Pusuluri, Aditya; Patil, Hemant A.
    Vocal communication plays a fundamental role in human interaction and expression. Right from the first cry to adult speech, the signal conveys information about the well-being of the individual. Lack of coordination between the speech muscles and the brain leads to voice pathologies. Some pathologies related to infants are asphyxia, Sudden Infant Death Syndrome (SIDS), etc. Other voice pathologies that affect the speech production system are dysarthria, cerebral palsy, and Parkinson's disease. Dysarthria, a neurological motor speech disorder, is characterized by impaired speech intelligibility that can vary across severity levels. This work focuses on exploring the importance of Modified Group Delay Cepstral Coefficient (MGDCC)-based features in capturing the distinctive acoustic characteristics associated with dysarthric severity-level classification, particularly the irregularities in speech. A Convolutional Neural Network (CNN) and the traditional Gaussian Mixture Model (GMM) are used as the classification models in this study. MGDCC is compared with state-of-the-art magnitude-based features, namely, Mel Frequency Cepstral Coefficients (MFCC) and Linear Frequency Cepstral Coefficients (LFCC). In addition, this work also analyses the noise robustness of MGDCC; to that effect, experiments were performed on various noise types and SNR levels, where MGDCC performed markedly better than the other feature sets. Further, this study also analyses cross-database scenarios for dysarthric severity-level classification. Voice Onset Time (VOT) was analysed, and experiments were performed using MGDCC to detect dysarthric speech against normal speech. The performance of MGDCC was then compared with the baseline features using precision, recall, and F1-score, and finally the latency period was analysed for practical deployment of the system. This work also explores the application of phase-based features to emotion recognition and pop noise detection.
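    The MGDCC features described above are built on the group delay function, the negative derivative of the Fourier transform phase. A standard discrete computation avoids explicit phase unwrapping via the identity tau(w) = (X_R*Y_R + X_I*Y_I) / |X(w)|^2, where X is the DFT of the frame x[n] and Y is the DFT of n*x[n]. The sketch below (NumPy, illustrative only) shows this plain group delay function; the cepstral smoothing and the alpha/gamma exponent modifications that turn it into MGDCC are omitted.

    ```python
    import numpy as np

    def group_delay(frame, n_fft=512, eps=1e-10):
        """Group delay function of one windowed speech frame.

        Uses the phase-unwrapping-free identity:
            tau(w) = (X_R * Y_R + X_I * Y_I) / |X(w)|^2
        where X = DFT(x[n]) and Y = DFT(n * x[n]).
        `eps` guards against division by zero at spectral nulls.
        """
        n = np.arange(len(frame))
        X = np.fft.rfft(frame, n_fft)
        Y = np.fft.rfft(n * frame, n_fft)
        return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)
    ```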
    As technological advancements progress, dependence on machines is inevitable. Therefore, to facilitate effective interaction between humans and machines, it has become crucial to develop proficient techniques for Speech Emotion Recognition (SER). The MGDCC feature set is compared against MFCC and LFCC features using a CNN classifier and the Leave-One-Speaker-Out technique. Furthermore, because MGDCC captures information in low-frequency regions and pop noise occurs at low frequencies, phase-based features are also applied to voice liveness detection. The results are obtained from a CNN classifier using 5-fold cross-validation and are compared against the MFCC and LFCC feature sets. This work also proposes time averaging-based features in order to understand the amount of information captured along the temporal axis, since a cry signal does not exhibit many temporal variations. The research conducted in this study utilizes a 10-fold stratified cross-validation approach with machine learning classifiers, specifically Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Random Forest (RF). This work also proposes CQT-based Constant-Q Harmonic Coefficients (CQHC) and Constant-Q Pitch Coefficients (CQPC) for the classification of infant cries into normal and pathological, since an effective joint representation of the spectral and pitch components of a spectrum has not yet been achieved, leaving scope for improvement. The results are compared against the MFCC, LFCC, and CQCC baseline feature sets using machine learning and deep learning classifiers, such as Convolutional Neural Networks (CNN), Gaussian Mixture Models (GMM), and Support Vector Machines (SVM), with 5-fold cross-validation accuracy as the metric.
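    The 10-fold stratified cross-validation setup with SVM, KNN, and RF classifiers described above can be sketched with scikit-learn. The feature matrix below is a random placeholder standing in for the thesis's actual MGDCC/CQHC/CQPC feature vectors; dimensions and hyperparameters are assumptions, not values from the work.

    ```python
    import numpy as np
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier

    # Placeholder data: 100 cry segments x 40 coefficients, with binary
    # labels (normal vs. pathological). Real inputs would be the extracted
    # feature vectors described in the abstract.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 40))
    y = rng.integers(0, 2, size=100)

    # Stratified folds keep the normal/pathological ratio in every fold.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    classifiers = {
        "SVM": SVC(kernel="rbf"),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    scores = {name: cross_val_score(clf, X, y, cv=cv).mean()
              for name, clf in classifiers.items()}
    ```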
  • Item (Open Access)
    Classification of Pathological Infant Cries and Dysarthric Severity-Level
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Kachhi, Aastha Bidhenbhai; Patil, Hemant A.; Sailor, Hardik B.
    Vocal communication is the most important part of any individual's life for conveying their needs. Everything from the first cry of a neonate to mature adult speech requires proper brain coordination; any lack of coordination between the brain and the speech production system leads to pathology. Asphyxia, asthma, Sudden Infant Death Syndrome (SIDS), deafness, etc. are some of the infant cry pathologies, while neuromotor speech disorders such as dysarthria, Parkinson's disease, cerebral palsy, etc. are some of the adult speech-related pathologies. These pathologies damage or paralyse articulatory movements in speech production, rendering words unintelligible. Infants as well as adults suffering from any of these pathologies face difficulties in conveying their emotions. Infant cry classification and analysis is a highly non-invasive method for identifying the reason behind the crying. The work in this thesis is directed towards analysing and classifying normal vs. pathological cries using signal processing approaches. Various signal processing methods, such as the Constant-Q Transform (CQT), Heisenberg's Uncertainty Principle (U-Vector), and the Teager Energy Operator (TEO), are analysed in this thesis. Spectrographic analysis using ten different cry modes in a cry signal is also performed, and an attempt has been made to analyse various pathologies using the form-invariance property of the CQT. In addition to infant cry analysis, classification of normal vs. pathological cries using 10-fold cross-validation on the Gaussian Mixture Model (GMM) and Support Vector Machine (SVM) has been adopted. In recent years, dysarthria has also become a major issue for speech technology models, such as Automatic Speech Recognition systems, and dysarthric severity-level classification has gained immense attention from researchers.
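    The Teager Energy Operator mentioned above has a simple discrete form, psi[x(n)] = x(n)^2 - x(n-1) * x(n+1), which tracks the instantaneous "energy" needed to generate the signal. A minimal NumPy sketch (illustrative, not the thesis's implementation):

    ```python
    import numpy as np

    def teager_energy(x):
        """Discrete Teager Energy Operator:
            psi[x(n)] = x(n)^2 - x(n-1) * x(n+1)
        Returns len(x) - 2 values, since the endpoints lack a neighbour.
        """
        x = np.asarray(x, dtype=float)
        return x[1:-1] ** 2 - x[:-2] * x[2:]
    ```

    A useful property: for a pure discrete tone A*sin(W*n), the operator is exactly constant at A^2 * sin(W)^2, which is why TEO-based profiles respond sharply to the amplitude and frequency irregularities found in pathological speech and cries.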
    Dysarthric severity-level classification aids in tracking the advancement of the disease and its treatment. In this thesis, dysarthric speech has been analysed using various signal processing operators, such as the TEO and the Linear Energy Operator (LEO), for four different dysarthric severity levels against normal speech. With the increasing use of artificial intelligence, there has been a significant increase in the use of deep learning methods for pattern classification tasks. To that effect, for the severity-level classification of dysarthric speech, deep learning techniques such as the Convolutional Neural Network (CNN), Light CNN (LCNN), and Residual Neural Network (ResNet) have been adopted. Finally, the performance of the various signal processing-based features has been measured using various evaluation methods, such as F1-score, the J-statistic, Matthews Correlation Coefficient (MCC), Jaccard's Index, Hamming Loss, Linear Discriminant Analysis (LDA), and the latency period, for better practical deployment of the system.
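    Most of the evaluation measures listed above are available directly in scikit-learn. The sketch below computes them on hypothetical predictions for a four-level severity task; the label vectors are made-up examples, not results from the thesis.

    ```python
    from sklearn.metrics import (f1_score, matthews_corrcoef,
                                 jaccard_score, hamming_loss)

    # Hypothetical true vs. predicted severity levels (0-3) for 10 samples.
    y_true = [0, 0, 1, 1, 2, 2, 3, 3, 0, 1]
    y_pred = [0, 1, 1, 1, 2, 3, 3, 3, 0, 1]

    f1  = f1_score(y_true, y_pred, average="macro")   # macro-averaged F1
    mcc = matthews_corrcoef(y_true, y_pred)           # in [-1, 1]
    jac = jaccard_score(y_true, y_pred, average="macro")
    ham = hamming_loss(y_true, y_pred)                # fraction mislabelled
    # Youden's J-statistic has no direct multiclass helper; per class it is
    # sensitivity + specificity - 1, computed from the confusion matrix.
    ```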