M Tech Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/3

Search Results

Now showing 1 - 10 of 10
  • ItemOpen Access
    Translation of Hindi in Roman Script into English: Use of Transformer
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Modi, Parth; Joshi, Manjunath V.
    Translation from one language to another is a complex problem in machine learning, and one in which machines still cannot achieve satisfactory results. The recent focus for solving this challenge has been on neural machine translation (NMT) techniques using architectures such as the recurrent neural network (RNN) and long short-term memory (LSTM). Even though they give slightly better results than the previously available conventional techniques, the transformer can outperform these NMT techniques. To the best of our knowledge, no work has yet been carried out on translating Hindi sentences written in Roman (English) letters into English. In this report, we discuss how the transformer architecture, which uses an attention mechanism, is used to translate Hindi sentences written in Roman letters into English sentences. Since no dataset was available until now, our work also involves creating a dataset for training and testing. Our results are compared with other approaches using the BLEU score as a measure.
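The attention operation at the core of the transformer mentioned above can be sketched in a few lines of NumPy (an illustrative sketch of scaled dot-product attention, not the thesis implementation; the function name and toy shapes are my own):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                            # attention-weighted sum of values

# Toy example: 3 query positions, 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

Each output row is a mixture of the value vectors, weighted by how strongly the corresponding query attends to each key.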
  • ItemOpen Access
    Feature Selection Methods in Twin Support Vector Machines
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Dodiya, Ruchita; Anand, Pritam
    For the development of a machine learning model, both parameter tuning and feature selection are necessary. The model's hyperparameters need to be tuned to the best values because they have a significant impact on how well the model works, and the objective of feature selection is to identify the most important subset of features that contribute to reliable predictions and model understanding. The primary goal of this study is to examine the effectiveness of feature selection techniques when used with Twin Support Vector Machines (TWSVM) and traditional Support Vector Machines (SVM). We want to determine which feature selection technique yields the best performance increase for TWSVM and SVM by conducting extensive experiments on multiple datasets. The results of this study give important information about how feature selection can improve classification accuracy and effectiveness. The methodology used in this study involves applying different kinds of parameter tuning and feature selection techniques to SVM and TWSVM using linear and RBF kernels. We used a hybrid approach to parameter tuning and feature selection: we optimized the hyperparameters using the Grid Search and Simulated Annealing (SA) methods, and then, with SA-based parameter tuning, we combined the Binary Gravitational Search Algorithm (BGSA) and Teaching-Learning-Based Optimization (TLBO) for feature selection. We use these techniques to enhance the performance of SVM and TWSVM models by tuning their parameters and selecting useful features. Our results show that feature selection methods are more effective at selecting relevant features while using less computation time in TWSVM compared to SVM.
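Simulated Annealing, one of the tuning methods named above, can be sketched generically (a minimal stdlib sketch in which a toy one-dimensional function stands in for validation error as a function of a hyperparameter; all names and constants are illustrative, not from the thesis):

```python
import math
import random

def simulated_annealing(objective, init, neighbour, t0=1.0, cooling=0.95, steps=200):
    """Generic SA minimiser: worse candidates are accepted with probability
    exp(-delta / T), so early on the search can escape local minima."""
    random.seed(0)                      # fixed seed for a reproducible demo
    current = best = init
    f_cur = f_best = objective(init)
    t = t0
    for _ in range(steps):
        cand = neighbour(current)
        f_cand = objective(cand)
        delta = f_cand - f_cur
        if delta < 0 or random.random() < math.exp(-delta / t):
            current, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = current, f_cur
        t *= cooling                    # cool down: fewer uphill moves over time
    return best, f_best

# Toy stand-in for validation error vs. a hyperparameter x; minimum at x = 1.
obj = lambda x: (x - 1.0) ** 2 + 0.1
best, err = simulated_annealing(obj, init=3.0,
                                neighbour=lambda x: x + random.uniform(-0.5, 0.5))
print(best, err)
```

In the hybrid scheme described in the abstract, the objective would instead be cross-validated classifier error and the candidate would encode SVM/TWSVM parameters.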
  • ItemOpen Access
    Automatic Text Translation of Multilingual Sentences using Transformer
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Hari Charan, Edara Veera Venkata; Joshi, Manjunath V.; Hati, Avik
    Machine translation from one language to another is a complex problem in machine learning, and one in which the machine still cannot achieve satisfactory results. The recent focus for solving this challenge has been on neural machine translation (NMT) techniques using architectures such as the recurrent neural network (RNN) and long short-term memory (LSTM). But the transformer architecture is able to outperform these NMT techniques. The transformer architecture has been successfully utilized to build models that target a single language pair or translation among multiple languages. However, research is currently lacking in the area of translating multilingual sentences, where each sentence is a mixture of languages. In this work we establish a model based on the transformer architecture that can translate multilingual sentences into a single language, with the help of a multilingual neural machine translation (MNMT) model and custom-made datasets.
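A common way to make one transformer serve several language directions in MNMT setups is to prepend a token naming the target language to the source sentence. A hypothetical sketch of that data preparation (the `<2en>` token format is an assumption for illustration, not the thesis convention):

```python
def prepare_mnmt_pair(source_sentence, target_lang):
    """Prepend a target-language token so a single shared model can be
    trained to translate into several languages (illustrative sketch)."""
    return f"<2{target_lang}> {source_sentence}"

# A code-mixed (multilingual) sentence routed to English output:
src = "mujhe coffee chahiye please"
print(prepare_mnmt_pair(src, "en"))  # <2en> mujhe coffee chahiye please
```

The model then learns, from the token alone, which output language is required, regardless of the language mix on the input side.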
  • ItemOpen Access
    Deep learning techniques for speech pathology applications
    (2020) Purohit, Mirali Virendrabhai; Patil, Hemant A.
    Human-machine interaction has gained more attention due to its interesting applications in industry and day-to-day life. In recent years, speech technologies have grown rapidly because of advances in the fields of machine learning and deep learning. Various deep learning architectures have shown state-of-the-art results in different areas, such as computer vision, the medical domain, etc. We have achieved massive success in developing speech-based systems, i.e., Intelligent Personal Assistants (IPAs), chatbots, Text-To-Speech (TTS), etc. However, there are certain limitations to these systems. Speech processing systems work efficiently only on normal-mode speech and hence show poor performance on other kinds of speech, such as impaired speech, far-field speech, shouted speech, etc. This thesis contributes to the improvement of impaired speech processing. To address this problem, the work takes two major approaches: 1) classification, and 2) conversion techniques. A new paradigm, namely weak speech supervision, is explored to overcome the data scarcity problem and is proposed for the classification task. In addition, the effectiveness of a residual network-based classifier is shown over the traditional convolutional neural network-based model for multi-class classification of pathological speech. With this, using Voice Conversion (VC)-based techniques, variants of generative adversarial networks are proposed to repair impaired speech and improve the performance of Voice Assistants (VAs). The performance of these architectures is shown via objective and subjective evaluations. Inspired by the work done using the VC-based technique, this thesis also contributes to the voice conversion field. To that effect, a state-of-the-art system, namely an adaptive generative adversarial network, is proposed and analyzed by comparing it with the recent state-of-the-art method for voice conversion.
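The residual connection behind the residual network-based classifier mentioned above can be sketched as follows (a NumPy toy of a single fully-connected residual block, assuming my own shapes and names; not the thesis architecture):

```python
import numpy as np

def residual_block(x, W1, W2):
    """Residual (skip) connection as used in ResNet-style classifiers:
    the block learns a correction f(x) that is added back to its input,
    which eases optimisation of deep networks."""
    relu = lambda z: np.maximum(z, 0.0)
    return relu(x + relu(x @ W1) @ W2)   # output = relu(x + f(x))

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))          # batch of 2 toy feature vectors
W1 = 0.1 * rng.standard_normal((4, 4))
W2 = 0.1 * rng.standard_normal((4, 4))
out = residual_block(x, W1, W2)
print(out.shape)  # (2, 4)
```

With zero weights the block reduces to `relu(x)`, i.e. the skip path passes the input through unchanged, which is why residual classifiers are often easier to train than plain convolutional stacks.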
  • ItemOpen Access
    Learning cross domain relations using deep learning
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Kotecha, Dhara; Joshi, Manjunath V.
    Generative Adversarial Networks (GANs) have achieved exemplary performance in generating realistic images. They also perform image-to-image translation and produce good results for the same. In this thesis, we explore the use of GANs for performing cross-domain image mapping for facial expression transfer, in which the expressions of a source image are transferred onto a target image. We use a DiscoGAN (Discovery GAN) model for the task. Using a DiscoGAN, an image of the target is generated with the facial features of the source. It uses a feature matching loss along with the GAN objective and reconstruction loss. We propose a method to train the DiscoGAN with paired data of source and target images. In order to learn cross-domain image mapping, we train the DiscoGAN with a batch size of 1. In our next work, we propose an algorithm to binarize degraded document images. We incorporate U-Net for the task at hand, modelling document image binarization as a classification problem wherein we generate an image which is the result of classifying each pixel as text or background. Optimizing the cross-entropy loss function, we translate the input degraded image to the corresponding binarized image. Our approach of using U-Net ensures low-level feature transfer from the input degraded image to the output binarized image, and thus it is better than using a simple convolutional neural network. Our method of training leads to the desired results faster when both the degraded document and the ground-truth binarized images are available for training, and it also generalizes well. The results obtained are significantly better than the state-of-the-art techniques, and the approach is simpler than other deep learning approaches for document image binarization.
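The per-pixel cross-entropy objective described for binarization can be sketched numerically (a NumPy illustration with toy 2x2 "images"; not the thesis code, and the clipping constant is a standard numerical-safety choice of mine):

```python
import numpy as np

def pixel_bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy over all pixels: each pixel is classified
    as text (1) or background (0), matching the per-pixel formulation above."""
    pred = np.clip(pred, eps, 1 - eps)   # avoid log(0)
    return float(-np.mean(target * np.log(pred)
                          + (1 - target) * np.log(1 - pred)))

# Predicted text probabilities vs. ground-truth binarisation
pred = np.array([[0.9, 0.1], [0.8, 0.2]])
truth = np.array([[1.0, 0.0], [1.0, 0.0]])
good = pixel_bce(pred, truth)
bad = pixel_bce(1 - pred, truth)   # same confidences, predictions inverted
print(good, bad)  # the loss is far lower for the accurate prediction
```

Minimising this loss over all pixels is exactly the "classify each pixel as text or background" formulation: confident correct pixels contribute little, confident wrong pixels dominate the gradient.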
  • ItemOpen Access
    Imbalanced bioassay data classification for drug discovery
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Shah, Jeni Snehal; Joshi, Manjunath V.
    Any method developed for pattern recognition will show inferior performance if the dataset presented to it is imbalanced, i.e., if the samples belonging to one class are much more numerous than the samples from the other class(es). Due to this, imbalanced dataset classification has been an active area of research in machine learning. In this thesis, a novel approach to classifying imbalanced bioassay data is presented. Bioassay data classification is an important task in drug discovery. Bioassay data consists of feature descriptors of various compounds and the corresponding label which denotes a compound's potency as a drug: active or inactive. This data is highly imbalanced, with the percentage of active compounds ranging from 0.1% to 1.4%, leading to inaccuracies in classification for the minority class. An approach for classification is proposed in which separate models are trained using different features derived by training stacked autoencoders (SAEs). After learning the features using SAEs, feed-forward neural networks (FNNs) are used for classification, trained to minimize a class-sensitive cost function. Before learning the features, data cleaning is performed using the Synthetic Minority Oversampling Technique (SMOTE) and by removing Tomek links. Different levels of features can be obtained using SAEs. While some active samples may not be correctly classified by a network trained on a certain feature space, it is assumed that they can be classified correctly in another feature space. This is the underlying assumption behind learning hierarchical feature vectors and learning separate classifiers for each feature space.
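The interpolation idea behind SMOTE can be sketched as follows (a simplified NumPy illustration of SMOTE-style oversampling, not the imbalanced-learn implementation used in practice; names and the toy data are my own):

```python
import numpy as np

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling sketch: each synthetic sample lies on the
    segment between a minority point and one of its k nearest minority
    neighbours, so new samples stay inside the minority region."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        d = np.linalg.norm(minority - x, axis=1)  # distances to other points
        d[i] = np.inf                             # exclude the point itself
        nn = minority[rng.choice(np.argsort(d)[:k])]   # one of k nearest
        synthetic.append(x + rng.random() * (nn - x))  # random interpolation
    return np.array(synthetic)

# Scarce "active" class of 3 compounds in a toy 2-D feature space
active = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new = smote_like_oversample(active, n_new=5)
print(new.shape)  # (5, 2)
```

In the cleaning pipeline above, oversampling like this is paired with Tomek-link removal, which deletes borderline majority/minority pairs that sit closest to each other.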
  • ItemOpen Access
    Learning to rank: using Bayesian networks
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Gupta, Parth; Majumder, Prasenjit; Mitra, Suman K.
    Ranking is one of the key components of an Information Retrieval system. Recently, supervised learning has been applied to learn the ranking function, an approach collectively called 'Learning to Rank'. In this study we present one approach to solving this problem. We intend to test this problem in different stochastic environments and hence choose Bayesian Networks for machine learning. This work also includes experimental results on the standard learning-to-rank dataset Letor4.0 [6]. We call our approach BayesNetRank. We compare the performance of BayesNetRank with a Support Vector Machine (SVM)-based approach called RankSVM [5]. Performance analysis is also carried out to identify the kinds of queries for which the proposed system gives results at either extreme. Evaluation results are shown using two rank-based evaluation metrics, Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG).
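NDCG, one of the two evaluation metrics used, can be computed as follows (a small Python sketch using the common exponential-gain formulation; note this gain function is an assumption, as evaluation toolkits differ in the exact definition):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: graded relevance, discounted by the
    log2 of the (1-based) rank position."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    """NDCG = DCG of the system's ranking / DCG of the ideal ranking."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# Relevance labels of documents in the order the system ranked them
print(ndcg([2, 0, 1]))   # < 1.0: ranks 2 and 3 are swapped
print(ndcg([2, 1, 0]))   # 1.0: the ideal ordering
```

Because of the logarithmic discount, mistakes near the top of the ranking cost far more than the same mistakes further down, which is why NDCG suits ranked retrieval better than plain accuracy.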
  • ItemOpen Access
    Human action recognition in video
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Kumari, Sonal; Mitra, Suman K.
    Action recognition is a central problem in computer vision. An action is any meaningful movement of a human, used to convey information or to interact naturally without any mechanical devices. It is of utmost importance in designing an intelligent and efficient human-computer interface. The applications of action recognition are manifold, ranging from sign language through medical rehabilitation to virtual reality. Human action recognition is motivated by applications such as video retrieval, human-robot interaction, and interaction with hearing- and speech-impaired people. In any action recognition system, a video stream can be captured using a fixed camera, which may be mounted on the computer or elsewhere. Then some preprocessing steps are performed to remove noise caused by illumination effects, blurring, false contours, etc. Background subtraction is done to remove the static or slowly varying background. In this thesis, multiple background subtraction algorithms are tested and one of them is then selected for the action recognition system. Background subtraction is also known as foreground/background segmentation, background segmentation, or foreground extraction; these terms are used interchangeably in this thesis. The selection of the background segmentation algorithm is made on the basis of the results of these algorithms on the action database. A good background segmentation result provides a more robust basis for object class recognition. The following four methods for extracting the foreground are tested: (1) frame difference, (2) background subtraction, (3) adaptive Gaussian mixture model (adaptive GMM) [25], and (4) improved adaptive Gaussian mixture model (improved adaptive GMM) [26], of which the last gives the best result. The action region can then be extracted from the original video sequences with the help of the extracted foreground object.
The next step is feature extraction, which deals with extracting important features (like corner points, optical flow, shape, motion vectors, etc.) from the image frame that can be used for tracking across the video frame sequence. Feature reduction is an optional step which reduces the dimension of the feature vector. To recognize actions, any learning and classification algorithm can be employed. The system is trained using a training dataset; a new video can then be classified according to the action occurring in it. The following three features are applied to the action recognition task: (1) distance between centroid and corner points, (2) optical flow motion estimation [28, 29], and (3) the discrete Fourier transform (DFT) of an image block. Among these, the proposed DFT feature plays a very important role in uniquely identifying a specific action from the database. The proposed novel action recognition model uses the discrete Fourier transform (DFT) of small image blocks.

    For the experimentation, MuHAVi data [33] and DA-IICT data are used, which include various kinds of actions by various actors. The following two supervised recognition techniques are used: K-nearest neighbour (KNN) and a classifier using the Mahalanobis metric. KNN is a parameterized classification technique in which the parameter K must be optimized; the Mahalanobis classifier is a non-parameterized classification technique, so there is no need to worry about parameter optimization. To check the accuracy of the proposed algorithm, sensitivity and false alarm rate tests are performed. The results of these tests show that the proposed algorithm is quite accurate at recognizing actions in video. To compare the results of the recognition system, confusion matrices are created and compared with those of other recognition techniques. All the experiments are performed in MATLAB®.
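The simplest of the four foreground-extraction methods listed above, frame difference, can be sketched as follows (a NumPy illustration with a toy moving block; the threshold value is an arbitrary choice for the demo, not taken from the thesis):

```python
import numpy as np

def frame_difference(prev, curr, thresh=25):
    """Frame-difference foreground extraction: pixels whose intensity
    changes by more than `thresh` between consecutive frames are marked
    as moving foreground (1); everything else is background (0)."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return (diff > thresh).astype(np.uint8)

# Toy frames: a bright 2x2 "object" moves one pixel to the right
prev = np.zeros((4, 4), dtype=np.uint8)
curr = np.zeros((4, 4), dtype=np.uint8)
prev[1:3, 0:2] = 200
curr[1:3, 1:3] = 200
mask = frame_difference(prev, curr)
print(mask)
```

Only the columns the object vacated and newly entered light up in the mask; the overlap region cancels out, which is one reason the GMM-based methods in the list are more robust for slow motion.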

  • ItemOpen Access
    Moment based image segmentation
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2010) Chawla, Charu; Mitra, Suman K.
    Usually, a digital image of a scene is not the same as the actual scene; it may be degraded because of the environment, camera focus, lighting conditions, etc. Segmentation is the key step before performing other operations like description, recognition, scene understanding, indexing, etc. Image segmentation is the identification of homogeneous regions in the image. This is accomplished by segmenting an image into subsets and later assigning the individual pixels to classes. There are various approaches to segmentation that identify an object and its spatial information. These approaches employ some features of the input image(s); the concept of a feature is used to denote a piece of information which is relevant for solving the computational task related to a certain application. The moment is an invariant feature used in the pattern recognition field to recognize a test object from a database. The key point of using moments is to provide a unique identification for each object irrespective of its transformations. A moment is a weighted average of pixel intensities. So far it has been used for object recognition; the idea here is to use moments in the object classification field. The proposed method computes a set of moments as a feature for each pixel to capture information about the image. This information can be used further in detailed analysis or in decision-making systems via classification techniques. A moment requires an area over which to compute it; hence, a window-based method is used for each pixel in the image. All possible windows are defined in which the current pixel is placed at different positions, and a moment is computed for each window representation. The moments define a relationship between that pixel and its neighbours, and the set of moments computed forms the feature vector of that pixel. After obtaining the feature vectors of the pixels, the k-means classification technique is used to classify these vectors into k classes.
Different types of moments are used to classify the images, namely statistical, geometric, and Legendre moments. Experiments are performed using moments with different window sizes to analyze their effect on execution time and other features. A comparative study is performed on the various moments using different window sizes; the comparison is done using the mismatch between moments, window sizes, and their computation times. The implementation is also evaluated on noisy images. The results indicate that the proposed method gives better results than pixel-based classification. The statistical moment gives better results compared to the geometric and Legendre moments; its computation time is also lower because it does not involve a polynomial function in its computation. The window size also affects the segmentation: a small window size preserves edge information in the segmented image, while the computation time and noise tolerance of the proposed algorithm increase as the window size increases. Hence, the selection of window size involves a trade-off between computation time and image quality. All the experiments have been performed on both grayscale and colour images in MATLAB®.
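The window-based moment computation described above can be sketched for geometric moments (a NumPy illustration; the choice of (m00, m10, m01) as the per-pixel feature vector is mine for brevity, as the thesis uses fuller sets of moments and multiple window placements):

```python
import numpy as np

def window_moments(img, i, j, w=3):
    """Geometric moments m_pq = sum x^p y^q I(x, y) over a w x w window
    centred on pixel (i, j); the resulting vector describes how intensity
    is distributed around the pixel, i.e. its relation to its neighbours."""
    h = w // 2
    win = img[max(i - h, 0):i + h + 1, max(j - h, 0):j + h + 1].astype(float)
    ys, xs = np.mgrid[0:win.shape[0], 0:win.shape[1]]
    m00 = win.sum()            # zeroth moment: total intensity ("mass")
    m10 = (xs * win).sum()     # first moments: intensity-weighted position
    m01 = (ys * win).sum()
    return np.array([m00, m10, m01])

img = np.arange(25, dtype=float).reshape(5, 5)
feat = window_moments(img, 2, 2)   # feature vector of the centre pixel
print(feat)
```

Computing such a vector for every pixel and clustering the vectors with k-means is the segmentation scheme the abstract describes.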
  • ItemOpen Access
    Study of Bayesian learning of system characteristics
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2008) Sharma, Abhishek; Jotwani, Naresh D.
    This thesis deals with the scheduling algorithms implemented in computer systems and with the creation of a probabilistic network which predicts the behaviour of the system. The aim of this thesis is to provide better, optimized results for any system where scheduling can be done. The material presented in this report provides an overview of the field and paves the way to studying subsequent topics, giving detailed theory on Bayesian networks, learning Bayesian networks, and the concepts related to process scheduling. A Bayesian network is a graphical model for probabilistic relationships among a set of random variables (either discrete or continuous); these models have several advantages for data analysis. The goal of learning is to find the Bayesian network that best represents the joint probability distribution. One approach is to find the network that maximizes the likelihood of the data or (more conveniently) its logarithm. We describe methods for learning both the parameters and the structure of a Bayesian network, including techniques for learning with complete data. We apply Bayesian network learning methods to data samples generated from an operating system scheduling environment. The various results are produced, tested, and verified for the scheduling algorithms (FCFS, SJF, RR and PW) by an operating system scheduling simulator implemented in the Java programming language; the given code is modified according to the requirements of the task.
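The behavioural gap between scheduling policies that such a simulator measures can be illustrated with average waiting time (a small Python sketch; the burst times are a textbook-style toy, not data from the thesis, and the thesis simulator itself is in Java):

```python
def avg_waiting_time(burst_times):
    """Average waiting time when processes run in the given order:
    each process waits for the total burst time of all those before it."""
    wait = 0
    total = 0.0
    for b in burst_times:
        total += wait
        wait += b
    return total / len(burst_times)

bursts = [24, 3, 3]                       # arrival order, run as-is = FCFS
fcfs = avg_waiting_time(bursts)
sjf = avg_waiting_time(sorted(bursts))    # Shortest Job First reorders them
print(fcfs, sjf)  # 17.0 3.0 — SJF minimises average waiting time
```

A learned Bayesian network over features of the workload (burst lengths, arrival pattern, etc.) is what lets the system predict which policy will behave better before running it.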
    This thesis report basically deals with the scheduling algorithms implemented in our computer systems and about the creation of probabilistic network which predicts the behavior of system. The aim of this thesis is to provide a better and optimized results for any system where scheduling can be done. The material presented in this report will provide an overview of the field and pave the way to studying subsequent topics which gives the detailed theories on Bayesian networks, learning the Bayesian networks and the concepts related to the process scheduling. Bayesian network is graphical model for probabilistic relationships among a set of random variables (either discrete or continuous). These models having several advantages over data analysis. The goal of learning is to find the Bayesian network that best represents the joint probability distribution. One approach is to find the network that maximizes the likelihood of the data or (more conveniently) its logarithm. We describe the methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with complete data also. We relate Bayesian network methods for learning, to learn from data samples generated from the operating system scheduling environment. The various results produced, tested and verified for scheduling algorithms (FCFS, SJF, RR and PW) by an Operating System Scheduling Simulator implemented in programming language JAVA. Here, the given code is modified according to requirements and fulfilling the necessary task.