M Tech Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/3
8 results
Search Results
Item Open Access
Unsupervised speaker-invariant feature representations for QbE-STD
(Dhirubhai Ambani Institute of Information and Communication Technology, 2018) R., Sreeraj; Patil, Hemant A.
Query-by-Example Spoken Term Detection (QbE-STD) is the task of retrieving, from a huge collection of audio data, the audio documents relevant to a user query given in spoken form. The idea in QbE-STD is to match the audio documents with the user query directly at the acoustic level. Hence, macro-level speech information, such as language, context, and vocabulary, has little impact. This gives QbE-STD an advantage over Automatic Speech Recognition (ASR) systems. ASR systems face major challenges with audio databases that contain multilingual audio documents, out-of-vocabulary words, little transcribed or labeled audio data, etc. QbE-STD systems have three main subsystems: feature extraction, feature representation, and matching. In this thesis, we focus on improving the feature representation subsystem of QbE-STD. The speech signal needs to be transformed into a speaker-invariant representation in order to be used in speech recognition tasks such as QbE-STD. Speech-related information in an audio signal is primarily carried by the sequence of phones present in the audio. Hence, to make the features more related to speech, we have to analyze the phonetic information in the speech. In this context, we propose two representations in this thesis, namely, Sorted Gaussian Mixture Model (SGMM) posteriorgrams and Synthetic Minority Oversampling TEchnique-based (SMOTEd) GMM posteriorgrams. Sorted GMM tries to represent phonetic information using a set of components in the GMM, while SMOTEd GMM tries to improve the balance of the various phone classes by providing a uniform number of features for all the phones. Another approach to improving the speaker invariance of the audio signal is to reduce the variations caused by speaker-related factors in speech.
We have focused on one such factor: the spectral variations that exist between speakers due to differences in vocal tract length. To reduce the impact of this variation on the feature representation, we propose to use two models, one representing each gender, characterized by different spectral scaling, based on the Vocal Tract Length Normalization (VTLN) approach. Recent technologies in QbE-STD use neural networks and faster computational algorithms. Neural networks are mainly used in the feature representation subsystems of QbE-STD. Hence, we also tried to build a simple Deep Neural Network (DNN) framework for the task of QbE-STD. The DNN thus designed is referred to as unsupervised DNN (uDNN). This thesis is a study of different approaches that could improve the performance of QbE-STD. We have built the state-of-the-art model and analyzed the performance of the QbE-STD system. Based on the analysis, we proposed algorithms that can affect the performance of the system. We also further studied the limitations and drawbacks of the proposed algorithms. Finally, this thesis concludes by presenting some potential research directions.

Item Open Access
FPGA implementation of environment/noise classification using neural networks
(Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Ambasana, Nikita B.; Zaveri, Mazad S
The purpose of this thesis is to give an insight into the implementation of a system of neural networks, for the tasks of Noise/Environment Modeling, Feature Extraction and Classification of Noise/Environment, on a Field Programmable Gate Array (FPGA). A methodology for creating a baseline architecture for a new system of neural networks has been followed, to give worst-case estimates. After the necessary analysis, an estimate of the hardware utilization within a specific FPGA (XC3S250E Spartan 3E Device) and of the Time for Computation, for each of the machines used, is given.
It also summarizes the Performance-Price Ratio in terms of Time of Computation and Hardware for Logic implementation, for different degrees of parallelism in the system.

Item Open Access
Speaker recognition over VoIP network
(Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Goswami, Parth A.; Patil, Hemant A.
This thesis deals with the Automatic Speaker Recognition (ASR) system over narrowband Voice over Internet Protocol (VoIP) networks. There are several artifacts of VoIP networks, such as the speech codec, packet loss, packet re-ordering, network jitter, and echo. In this thesis, packet loss is considered as the research issue, in order to investigate the performance degradation of an ASR system due to packet loss. As the voice packets travel over an Internet Protocol (IP) network, they tend to take different routes. Some of them are dropped by the channel due to congestion and some are rejected by the receiver. This packet loss reduces the perceptual quality of speech. Therefore, it is natural to expect that packet loss may affect the performance of an ASR system. To alleviate this degradation in ASR system performance due to packet loss, novel interleaving schemes and a lossy training method are proposed. It is shown in the present work that these interleaving schemes and the lossy training method significantly help in improving the performance of an ASR system.

Item Open Access
Particle swarm optimization based synthesis of analog circuits using neural network performance macromodels
(Dhirubhai Ambani Institute of Information and Communication Technology, 2009) Saxena, Neha; Mandal, Sushanta Kumar
This thesis presents an efficient and fast synthesis procedure for an analog circuit. The proposed synthesis procedure uses artificial neural network (ANN) models in combination with a particle swarm optimizer.
ANNs have been used to develop macro-models from SPICE-simulated data of the analog circuit; these models take transistor sizes as input and produce circuit specifications as output in negligible time. The particle swarm optimizer explores the specified design space and generates transistor sizes as potential solutions. Several synthesis results are presented which show good accuracy with respect to SPICE simulations. Since the proposed procedure does not require a SPICE simulation in the synthesis loop, it substantially reduces the design time in circuit design optimization.

Item Open Access
FPGA implementation of direct sequence spread spectrum techniques
(Dhirubhai Ambani Institute of Information and Communication Technology, 2008) Choudhary, Vivek Kumar; Dubey, Rahul
This work presents the performance, noise analysis and FPGA implementation of the Direct Sequence Spread Spectrum technique. Signal performance improves as the number of parity bits in the Hamming code algorithm increases. With more parity bits the noise is reduced, so the received signal is closer to its original value, but adding parity bits also increases the bandwidth requirement. This work is based on the IS-95 standard for CDMA (Code Division Multiple Access) Digital Cellular.

Item Open Access
Study of bayesian learning of system characteristics
(Dhirubhai Ambani Institute of Information and Communication Technology, 2008) Sharma, Abhishek; Jotwani, Naresh D.
This thesis report deals with the scheduling algorithms implemented in computer systems and with the creation of a probabilistic network that predicts the behavior of the system. The aim of this thesis is to provide better, optimized results for any system where scheduling can be done. The material presented in this report provides an overview of the field and paves the way to the subsequent topics, which give the detailed theory of Bayesian networks, learning Bayesian networks, and the concepts related to process scheduling.
A Bayesian network is a graphical model for probabilistic relationships among a set of random variables (either discrete or continuous). These models have several advantages for data analysis. The goal of learning is to find the Bayesian network that best represents the joint probability distribution. One approach is to find the network that maximizes the likelihood of the data or (more conveniently) its logarithm. We describe the methods for learning both the parameters and the structure of a Bayesian network, including techniques for learning with complete data. We relate Bayesian network learning methods to learning from data samples generated from the operating system scheduling environment. The various results were produced, tested and verified for the scheduling algorithms (FCFS, SJF, RR and PW) by an Operating System Scheduling Simulator implemented in the Java programming language. Here, the given code is modified according to the requirements to fulfil the necessary task.

Item Open Access
Music genre classification using principal component analysis and auto associative neural network
(Dhirubhai Ambani Institute of Information and Communication Technology, 2006) Ballaney, Abhishek V.; Mitra, Suman K.; Maitra, Anutosh
The aim of music genre classification is to classify music pieces according to their style. Principal Component Analysis (PCA) is applied to raw music signals to capture the major components for each genre. As a large number of principal components are obtained for different cases, the purpose of applying PCA is not satisfied. This led to feature vector extraction from the music signal and building a model to capture the feature vector distribution of a music genre. Timbre modelling is done using Mel Frequency Cepstral Coefficients (MFCCs). The modelling of decision logic is based on Auto Associative Neural Network (AANN) models, which are feed-forward neural networks that perform identity mapping on the input space.
The property of a five-layer AANN model to capture the feature vector distribution is used to build a music genre classification system. This system is developed using a music database of 1000 songs spread equally over 10 genres.

Item Open Access
Design of architecture of artificial neural network : design and construction of a model for creation of an architecture of artificial neural network based on distributed genetic algorithms
(Dhirubhai Ambani Institute of Information and Communication Technology, 2004) Rahi, Sajid S.; Chaudhary, Sanjay
The objective of the work is to design and construct a model for the creation of the architecture of a feed-forward artificial neural network. Distributed genetic algorithms are used to design and construct the system. This thesis describes various encoding schemes suggested by researchers for the evolution of the architecture of artificial neural networks using genetic algorithms. This research proposes a new encoding scheme called object-based encoding for the evolution of architecture, and also proposes data structures, genetic operators and repair algorithms for the system development. For the evolution of weights during training, a genetic algorithm is used, and a two-dimensional variable-length encoding scheme is proposed. For the same, two-point layer crossover and average crossover are proposed. The experiments are carried out on the developed system for problems like 3-bit even parity. The experiments determine which combinations of genetic operators are more efficient for designing artificial neural network architectures.