Theses and Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1
Item (Open Access): Deep learning techniques for speech pathology applications (2020)
Purohit, Mirali Virendrabhai; Patil, Hemant A.

Human-machine interaction has gained more attention due to its applications in industry and day-to-day life. In recent years, speech technologies have grown rapidly because of advances in machine learning and deep learning. Various deep learning architectures have shown state-of-the-art results in different areas, such as computer vision and the medical domain. Considerable success has been achieved in developing speech-based systems, e.g., Intelligent Personal Assistants (IPAs), chatbots, and Text-To-Speech (TTS). However, these systems have certain limitations. Speech processing systems work efficiently only on normal-mode speech and hence perform poorly on other kinds of speech, such as impaired speech, far-field speech, and shouted speech. This thesis contributes to improving the handling of impaired speech. To address this problem, the work takes two major approaches: 1) classification, and 2) conversion. A new paradigm, weak speech supervision, is explored to overcome the data scarcity problem and is proposed for the classification task. In addition, a residual network-based classifier is shown to be more effective than a traditional convolutional neural network-based model for multi-class classification of pathological speech. Further, using Voice Conversion (VC)-based techniques, variants of generative adversarial networks are proposed to repair impaired speech and thereby improve the performance of Voice Assistants (VAs). The performance of these architectures is shown via objective and subjective evaluations. Inspired by the work done using the VC-based technique, this thesis also contributes to the voice conversion field. To that effect, a state-of-the-art system, the adaptive generative adversarial network, is proposed and analyzed by comparing it with a recent state-of-the-art method for voice conversion.

Item (Open Access): Defending machine learning models against adversarial attacks using GANs (Dhirubhai Ambani Institute of Information and Communication Technology, 2019)
Malaviya, Shubham M.; Vasavada, Yash

We have used a Generative Adversarial Network (GAN) to defend against adversarial attacks. Pixel-wise and perceptual distance measures for images are used in the GAN training. Five different distance measures are used: reconstruction error, Structural SIMilarity (SSIM), Multi-Scale SSIM, Peak Signal-to-Noise Ratio (PSNR), and Frechet Inception Distance (FID). Although the accuracies achieved against adversarial attacks with the proposed idea are not on par with state-of-the-art approaches such as [38], the generator trained using FID is able to generate good-quality images in fewer iterations. Using only a perceptual distance measure in the cost function does not guarantee the convergence of GAN training.