M Tech (EC) Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/6

Now showing 1 - 10 of 17

Open Access
Phase Based Methods for Various Speech Applications
(Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Pusuluri, Aditya; Patil, Hemant A.
Vocal communication plays a fundamental role in human interaction and expression. Right from the first cry to adult speech, the signal conveys information about the well-being of the individual. Lack of coordination between the speech muscles and the brain leads to voice pathologies. Some pathologies related to infants are asphyxia, Sudden Infant Death Syndrome (SIDS), etc. Other voice pathologies that affect the speech production system are dysarthria, cerebral palsy, and Parkinson's disease. Dysarthria, a neurological motor speech disorder, is characterized by impaired speech intelligibility that can vary across severity levels. This work focuses on exploring the importance of Modified Group Delay Cepstral Coefficients (MGDCC)-based features in capturing the distinctive acoustic characteristics associated with dysarthric severity-level classification, particularly for irregularities in speech. A Convolutional Neural Network (CNN) and a traditional Gaussian Mixture Model (GMM) are used as the classification models in this study. MGDCC is compared with state-of-the-art magnitude-based features, namely, Mel Frequency Cepstral Coefficients (MFCC) and Linear Frequency Cepstral Coefficients (LFCC). In addition, this work analyses the noise robustness of MGDCC. To that effect, experiments were performed on various noise types and SNR levels, where MGDCC was reported to outperform the other feature sets. Further, this study also analyses cross-database scenarios for dysarthric severity-level classification. Voice Onset Time (VOT) was analysed, and experiments were performed using MGDCC to detect dysarthric speech against normal speech. The performance of MGDCC was then compared with baseline features using precision, recall, and F1-score, and finally, the latency period was analysed for practical deployment of the system. This work also explores the application of phase-based features to the emotion recognition task and pop noise detection.
As technological advancements progress, dependence on machines is inevitable. Therefore, to facilitate effective interaction between humans and machines, it has become crucial to develop proficient techniques for Speech Emotion Recognition (SER). The MGDCC feature set is compared against MFCC and LFCC features using a CNN classifier and the Leave One Speaker Out technique. Furthermore, because MGDCC can capture information in low-frequency regions and pop noise occurs at lower frequencies, phase-based features are also applied to voice liveness detection. The results are obtained from a CNN classifier using 5-fold cross-validation and are compared against MFCC and LFCC feature sets. This work proposed time averaging-based features in order to understand the amount of information being captured across the temporal axis, as there would not be many temporal variations in a cry signal. The research conducted in this study utilizes a 10-fold stratified cross-validation approach with machine learning classifiers, specifically Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF). This work also presents CQT-based Constant-Q Harmonic Coefficients (CQHC) and Constant-Q Pitch Coefficients (CQPC) for the classification of infant cries into normal and pathological, since an effective joint representation of the spectral and pitch components of a spectrum has not yet been achieved, leaving scope for improvement. The results are compared by considering the MFCC, LFCC, and CQCC feature sets as the baseline features using machine learning and deep learning classifiers, such as Convolutional Neural Networks (CNN), Gaussian Mixture Models (GMM), and Support Vector Machines (SVM), with 5-fold cross-validation accuracy as the metric.
Open Access
A Spectrally Efficient MIMO System with Sparse Matrix Precoding
(Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Yadav, Prabhanshu; Vasavada, Yash
This thesis proposes a novel technique of sparse matrix-based precoding at the transmitter of a Multiple Input Multiple Output (MIMO) system. We propose two sparse matrix precoded MIMO systems. Our first proposal improves the spectral efficiency beyond the existing spectral efficiency of the Precoding-aided Spatial Modulation (PSM-MIMO) system. Our second proposal increases spectral efficiency compared to an existing MIMO system. Both proposals use a two-stage precoding approach in which the conventional zero-forcing (ZF) MIMO precoder, which inverts the matrix MIMO channel, is combined with a sparse matrix precoding. With the conventional ZF precoder, the degrees of freedom (DoF) available at the transmitter equal the number of antennas at the receiver. By adding another layer of precoding using a sparse matrix, we increase the DoF at the transmitter, thereby facilitating an increase in spectral efficiency. We demonstrate a proof of concept (PoC) through simulation-driven experiments. Our PoC is based on Maximum Likelihood (ML) detection at the receiver. ML detection has quite high complexity. We propose a belief propagation algorithm at the receiver which is more practical to implement in a real-world system. The belief propagation algorithm leverages the sparseness of the precoding matrix and has low computational complexity.
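As a rough illustration of the conventional ZF stage mentioned in this abstract (the stage that inverts the channel matrix), the following minimal sketch precodes two symbols for a 2x2 channel and checks that the noiseless receiver sees the intended symbols. The channel values, the symbols, and all function names are assumptions made for illustration. The thesis's sparse-matrix second stage, transmit power normalization, and receiver noise are not modelled here.

```python
# Hypothetical 2x2 zero-forcing (ZF) precoding sketch; not the thesis's method.

def inverse_2x2(h):
    """Invert a 2x2 complex matrix given as a list of rows."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is (nearly) singular")
    return [[h[1][1] / det, -h[0][1] / det],
            [-h[1][0] / det, h[0][0] / det]]

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def zf_precode(h, symbols):
    """Pre-multiply the symbols by the channel inverse (ZF precoder)."""
    return mat_vec(inverse_2x2(h), symbols)

# Assumed example channel and QPSK-like symbols.
channel = [[1.0 + 0.5j, 0.3 - 0.2j],
           [0.2 + 0.1j, 0.9 - 0.4j]]
symbols = [1 + 1j, -1 + 1j]

transmitted = zf_precode(channel, symbols)
received = mat_vec(channel, transmitted)  # noiseless channel
```

Because the precoder is the channel inverse, `received` matches `symbols` up to floating-point error, so each receive antenna sees its own symbol without inter-stream interference.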
Open Access
Self-Supervised Speech Representation for Speech Recognition
(Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Chaturvedi, Shreya Sanjay; Patil, Hemant A.; Sailor, Hardik B.
Voice Assistants (VAs) are nowadays an integral part of human life. Low-resource applications of VAs, such as regional languages, children's speech, medical conversation, etc., are the key challenges faced during the development of these VAs. From a broader perspective, VAs consist of three parts, namely, Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. This thesis focuses on one of them, i.e., ASR. In particular, optimization of low-resource ASR is targeted, with children's speech as the application. Initially, a data augmentation technique was proposed to improve the performance of isolated hybrid DNN-HMM ASR for children's speech. To this end, a CycleGAN-based augmentation technique was used, where children-to-children voice conversion is performed. Here, for conversion of characteristics, the speech signals were categorized into two classes based on a threshold on the fundamental frequency of speech. In this work, a detailed experimental analysis of various augmentation techniques, such as SpecAugment, speed perturbation, and volume perturbation, is carried out with respect to ASR. Further, to optimize low-resource ASR, self-supervised learning, namely wav2vec 2.0, has been explored. It is a semi-supervised approach, where pretraining is performed with unlabelled data and the model is then finetuned with labelled data. In addition, Noisy Student Teacher (NST) learning is fused with self-supervised learning techniques. The key achievement of this work was the efficient use of unlabelled data; even though the process involves iterative training, redundant training was negligible. The pseudo-labelled data was filtered before being used for finetuning. After Acoustic Model (AM) decoding, the Language Model (LM) was also used to optimize the performance. Additional work was also done in the direction of replay Spoofed Speech Detection (SSD).
In this work, the significance of the Delay and Sum (DAS) beamformer was investigated over the State-of-the-Art (SoTA) Minimum Variance Distortionless Response (MVDR) beamforming technique for replay SSD.
Open Access
Improved Multi-scale Retinex For Image Enhancement Using Guided Filter And Customized Sigmoid Function With Its Implementation On FPGA
(Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Bhanwal, Paras; Agrawal, Yash; Khare, Manish
Image enhancement is a technique used in digital image processing to remove or overcome the effects of noise, low illumination, blurriness, or color loss in a digital image. These effects arise during the process of image acquisition. Various other factors, such as environmental conditions and data loss during image transmission, can also affect the image quality. The presence of these effects degrades the overall image quality. In applications such as medical imaging, defence, aerial surveillance, traffic monitoring, and others, where digital images are used for crucial purposes, it becomes very important to enhance the image before it can be used for the required purpose. When images are acquired by a camera in low-light conditions, poor contrast and color loss can be seen in several regions of the acquired image. To enhance the image under such conditions, researchers have proposed various techniques. Some techniques produce good contrast but lack in color reproduction, while others produce good colors along with good contrast but intensify the noise present in the dark regions of the image. In order to mitigate the issue of noise amplification while providing good color and contrast, we have proposed a retinex-based image enhancement technique that uses a customized sigmoid function and a guided filter. We have compared the proposed method with existing image enhancement methods on both a qualitative and a quantitative basis. For qualitative analysis, we have tested the proposed method on multiple images obtained under different environmental conditions and in different surroundings. For quantitative analysis, we have used various image quality measures, such as entropy, peak signal-to-noise ratio, and others, for comparison. The proposed technique provides good contrast in the areas affected by poor contrast and produces good colors in the same areas.
The proposed method is capable of suppressing the enhancement of noise, hence showcasing its superiority over the compared techniques.
Open Access
Significance of Teager Energy Operator for Speech Applications
(Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Therattil, Anand Saju; Patil, Hemant A.
Speech is used in various applications apart from voice communication, such as pathology detection, severity-level classification of dysarthria, and replay spoof speech detection for voice biometrics and voice assistants. The first part of this thesis deals with the development of a countermeasure (CM) system for replay Spoof Speech Detection (SSD). A replay attack on a voice biometric system refers to a fraudulent attempt made by an imposter to spoof another person's identity by replaying pre-recorded voice samples in front of an Automatic Speaker Verification (ASV) system or Voice Assistants (VAs). Next, dysarthria, which is a neuromotor speech disorder, is studied and analysed using various speech processing and deep learning approaches. Dysarthria, Parkinson's disease, cerebral palsy, etc. give rise to atypical speech and impair neuromotor functions of the human body. Among these, dysarthria is one of the most common forms of atypical speech. Analysis of a patient's dysarthric condition depends on the severity level, which is generally provided by Speech-Language Pathologists (SLPs). However, to make the assessment immune to human biases and errors, this thesis is oriented towards developing a severity-level classification system using signal processing and deep learning approaches for dysarthric speech. The thesis presents an analysis of dysarthric vs. normal speech using the Teager Energy Operator (TEO)-based Teager Energy Cepstral Coefficients (TECC) and the Squared Energy Operator (SEO)-based Squared Energy Cepstral Coefficients (SECC) as the frontend features. These features are provided as input to deep learning and pattern recognition models, which predict the severity-level class for dysarthria. Lastly, the generalization of the countermeasure system for replay attacks on ASV systems and VAs is analysed using the TEO-based TECC feature set.
The generalization of the CM system is presented through cross-database evaluation between the Voice Spoofing Detection Corpus (VSDC), ASVspoof 2017 version 2.0, and ASVspoof 2019 PA datasets. Further, the analysis of One Point Replay (1PR) and Two Point Replay (2PR) is presented in this thesis.
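For readers unfamiliar with the Teager Energy Operator named in this abstract, its standard discrete-time form is psi[x(n)] = x(n)^2 - x(n-1) x(n+1). The sketch below applies that definition to a pure sinusoid; the function name and the test signal are assumptions made for illustration, and the cepstral processing that turns TEO outputs into TECC features is not shown.

```python
import math

def teager_energy(x):
    """Discrete TEO: psi[n] = x[n]^2 - x[n-1] * x[n+1] for interior samples."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Assumed test signal: A * cos(omega * n).
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(50)]

energy = teager_energy(signal)

# For a pure sinusoid the TEO output is the constant A^2 * sin^2(omega),
# so it tracks both amplitude and frequency of the oscillation.
expected = amplitude ** 2 * math.sin(omega) ** 2
```

Every value in `energy` equals `expected` up to floating-point error, which is why the operator is often described as an estimate of the energy needed to generate the signal, not only of its squared amplitude.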
Open Access
Identification of Block-Sparse Systems using Adaptive Filtering Algorithms
(Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Sonia; Das, Rajib Lochan
An adaptive filter is a system with a linear filter that has a transfer function controlled by variable parameters and a means to adjust those parameters according to an optimization algorithm. Adaptive filters are used for linear time-variant systems, where the characteristics of the system keep changing with time. Therefore, adaptive filters are required for applications in which some parameters of the desired processing operation are not known in advance or are changing. In the world of adaptive algorithms, sparse system identification has received a lot of interest. System identification is regularly encountered in numerous applications, including acoustic echo cancellation, interference reduction in industrial settings, and biomedical engineering. During the last ten years, system identification has been widely used in a variety of signal processing applications, including wireless communication, radar imaging, and echo cancellation. A sparse impulse response is one in which a significant portion of the energy or information is concentrated in a small number of its coefficients. In various cases, such as network echo cancellation, where the impulse responses are sparse, there are few non-zero or large coefficients and numerous tap-weights with zero or tiny values. Sparse systems come in a variety of forms. The conventional one is referred to as a block-sparse system, such as TV transmission channels. The non-zero coefficients of block-sparse systems form one or more clusters, where a cluster is a set of non-zero or large coefficients, in contrast to generic sparse impulse responses, where large coefficients are distributed at random. This thesis considers various existing adaptive algorithms, viz., LMS, NLMS, PNLMS, ZA-NLMS, ZA-PNLMS, BS-PNLMS, and BS-IPNLMS, to identify a block-sparse system, comparing them in terms of mean square error and the convergence rate of the coefficients.
It then proposes an algorithm with some modifications to obtain a better convergence rate for the coefficients of an unknown system, which is assumed to be block-sparse for this research.
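To make the system-identification setting in this abstract concrete, here is a minimal sketch of plain NLMS, one of the baseline algorithms listed, estimating an unknown block-sparse impulse response from noiseless input and output data. The 16-tap system, step size, signal length, and function names are all assumptions made for illustration. The block-sparse variants (BS-PNLMS, BS-IPNLMS) and the thesis's proposed modification are not reproduced.

```python
import random

def fir_filter(h, x):
    """Output of an FIR system h driven by input x (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def nlms_identify(x, d, num_taps, step_size=0.5, eps=1e-8):
    """Estimate FIR coefficients from input x and desired output d with NLMS."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input sample first
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        y = sum(w_i * b_i for w_i, b_i in zip(w, buf))
        error = d_n - y
        # Normalise the step by the input energy in the tap-delay line.
        gain = step_size * error / (sum(b * b for b in buf) + eps)
        w = [w_i + gain * b_i for w_i, b_i in zip(w, buf)]
    return w

random.seed(0)

# Assumed block-sparse system: one cluster of four non-zero taps out of 16.
true_system = [0.0] * 16
true_system[4:8] = [0.8, -0.5, 0.3, 0.1]

x = [random.gauss(0.0, 1.0) for _ in range(3000)]
d = fir_filter(true_system, x)

estimate = nlms_identify(x, d, num_taps=16)
max_error = max(abs(a - b) for a, b in zip(true_system, estimate))
```

In this noiseless setup `max_error` shrinks towards zero as the filter converges. Proportionate and block-sparse variants differ mainly in how they distribute the step size across taps so that the active cluster converges faster.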
Open Access
Classification of Pathological Infant Cries and Dysarthric Severity-Level
(Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Kachhi, Aastha Bidhenbhai; Patil, Hemant A.; Sailor, Hardik B.
Vocal communication is the most important means by which individuals convey their needs. Everything from the first cry of a neonate to mature adult speech requires proper brain coordination. Any lack of coordination between the brain and the speech production system leads to pathology. Asphyxia, asthma, Sudden Infant Death Syndrome (SIDS), deafness, etc. are some of the infant cry pathologies, and neuromotor speech disorders, such as Dysarthria, Parkinson's Disease, Cerebral Palsy, etc., are some of the adult speech-related pathologies. These pathologies lead to damaged or paralysed articulatory movements in speech production, rendering words unintelligible. Infants as well as adults suffering from any of these pathologies face difficulties in conveying emotions. Infant cry classification and analysis is a non-invasive method for identifying the reason behind the crying. The work in this thesis is directed towards analysing and classifying normal vs. pathological cries using signal processing approaches. Various signal processing methods, such as the Constant Q Transform (CQT), Heisenberg's Uncertainty Principle (U-Vector), and the Teager Energy Operator (TEO), are analysed in this thesis. Spectrographic analysis of ten different cry modes in a cry signal is also carried out in this work. In addition, an attempt has been made to analyse various pathologies using the form invariance property of the CQT. Beyond the infant cry analysis, classification of normal vs. pathological cries has been performed using 10-fold cross-validation with a Gaussian Mixture Model (GMM) and a Support Vector Machine (SVM). In recent years, dysarthria has also become one of the major speech technology issues for models such as Automatic Speech Recognition systems. Dysarthric severity-level classification has gained immense attention from researchers in recent years.
Dysarthric severity-level classification aids in knowing the advancement of the disease and its treatment. In this thesis, dysarthric speech has been analysed using various signal processing operators, such as TEO and the Linear Energy Operator (LEO), for four different dysarthric severity levels against normal speech. With the increasing use of artificial intelligence, there has been a significant increase in the use of deep learning methods for pattern classification tasks. To that effect, for the severity-level classification of dysarthric speech, deep learning techniques, such as Convolutional Neural Network (CNN), Light CNN (LCNN), and Residual Neural Network (ResNet), have been adopted. Finally, the performance of various signal processing-based features has been measured using various performance evaluation methods, such as F1-Score, J-Statistic, Matthews Correlation Coefficient (MCC), Jaccard's Index, Hamming Loss, Linear Discriminant Analysis (LDA), and latency period for better practical deployment of the system.
Open Access
Development of Countermeasures for Voice Liveness and Spoofed Speech Detection
(Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Chodingala, Piyushkumar Kiritbhai; Patil, Hemant A.
An Automatic Speaker Verification (ASV) or voice biometric system performs machine-based authentication of speakers using voice signals. ASV has applications such as banking transactions using mobile phones. Personal information and banking details demand more robust security of ASV systems. Furthermore, Voice Assistants (VAs) are known for the convenience of controlling most of the surrounding devices, such as the user's personal devices, door locks, electric appliances, etc. However, these ASV and VA systems are also vulnerable to various spoofing attacks, such as details, twins, Voice Conversion (VC), Speech Synthesis (SS), and replay. In particular, the user's voice command can be conveniently recorded and played back by the imposter (attacker) at negligible cost. Hence, the most harmful attack (the replay attack) of morphing the user's voice command can be performed easily. This thesis therefore aims to develop countermeasures to protect these ASV and VA systems from replay attacks. In addition, this thesis is also an attempt to develop Voice Liveness Detection (VLD) as a countermeasure for replay attacks. In this thesis, the novel Cochlear Filter Cepstral Coefficients-based Instantaneous Frequency using Quadrature Energy Separation Algorithm (CFCCIF-QESA) feature set is proposed for replay Spoofed Speech Detection (SSD) on ASV systems. The performance of the proposed feature set is evaluated using publicly available datasets, such as ASVSpoof 2017 v2.0 and BTAS 2016. Furthermore, the significance of the Delay and Sum (DAS) beamformer over the state-of-the-art Minimum Variance Distortionless Response (MVDR) beamformer is investigated for replay SSD on VAs. Finally, wavelet-based features are proposed for the VLD task. The performance of the proposed wavelet-based approaches is evaluated using the recently released POp noise COrpus (POCO).
Open Access
Implementation of ALU using RTL to GDSII flow and on NEXYS 4 DDR FPGA board
(2021) Kachhadiya, Radhika J.; Parekh, Rutu; Agrawal, Yash
An ALU is a major part of the CPU and performs various arithmetic and logical operations. It is one of the most frequently used modules in the processor. This thesis presents the implementation of an 8-bit ALU using the RTL-to-GDSII flow. The tools used for implementation are the Cadence tools Genus and Innovus. The technology nodes used for implementation are 45 nm and 180 nm. The major focus of this thesis is design optimization in terms of area, delay, and power, as the industry demands chips with high speed and low power. Further, the results for 45 nm and 180 nm have been compared. Using 45 nm technology improves area by 89.59%, delay by 43.23%, and power by 4.56%. In addition, a 4-bit ALU is implemented on the NEXYS 4 DDR FPGA board.
Open Access
Analysis of security vulnerabilities in 5G standalone network
(2021) Haque, Meemoh; Mekala, Priyanka; Goel, Supriya
The first release of the 5G protocol specifications, 3rd Generation Partnership Project (3GPP) Release 15, was released in 2017, and the first 5G protocol security specifications were published in 2018. Various developments have been made in the area of 5G, which aims to connect various aspects of human life. Whether in rural or urban areas, 5G aims to provide higher network speed, low latency, and ubiquitous connectivity. 5G illustrates the convergence of various use cases of wireless communication and computer networking, which includes components such as Software Defined Networks (SDN), Network Functions Virtualization (NFV), and the edge cloud. The convergence of both technologies leads to various security challenges in SDN/NFV when connected with the 5G network. In the future, 5G will play a crucial role in our lives; thus, the network must ensure that all its components and the services it provides to users are secure. The threat landscape in 5G is huge, as a large number of devices will be connected to the network. The manuscript discusses various vulnerabilities and security threats that exist in 5G networks. A complete end-to-end 5G standalone test bed is used for analyzing the security threats. Device capabilities, i.e., core and radio capabilities, and pre-authentication signalling messages are not security protected in 5G. Security features have been improved compared to legacy networks, such as the encryption of the International Mobile Subscriber Identity (IMSI), but known vulnerabilities that existed in LTE still exist in 5G and need to be investigated before deploying 5G worldwide.
