Theses and Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1
Search Results
Item (Open Access): Translation of Hindi in Roman Script into English: Use of Transformer (Dhirubhai Ambani Institute of Information and Communication Technology, 2023). Modi, Parth; Joshi, Manjunath V.

Translation from one language to another is a complex problem in machine learning, and one in which machines still cannot achieve satisfactory results. The recent focus for solving this challenge has been on neural machine translation (NMT) techniques using architectures such as the recurrent neural network (RNN) and long short-term memory (LSTM). Even though they give slightly better results than the previously available conventional techniques, the transformer can outperform these NMT techniques. To the best of our knowledge, no work has yet been carried out on translating Hindi sentences written in Roman (English) letters into English. In this report, we discuss how the transformer architecture, which uses an attention mechanism, is used to translate Hindi sentences written in Roman letters into English sentences. Since no dataset was available until now, our work also involves creating a dataset for training and testing. Our results are compared with other approaches using the BLEU score as a measure.
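The abstract above gives no implementation details; as a rough illustration of the kind of transformer-based sequence-to-sequence translation it describes, the sketch below runs a single teacher-forced training step with PyTorch's built-in nn.Transformer on toy token IDs. The vocabulary sizes, layer counts, and toy batch are assumptions for illustration only (positional encodings are omitted, and a recent PyTorch is assumed for batch_first); this is not the thesis's actual model or data.

    # Minimal sketch (Python / PyTorch): one teacher-forced step of a toy transformer translator.
    import torch
    import torch.nn as nn

    SRC_VOCAB, TGT_VOCAB, D_MODEL = 8000, 8000, 128   # assumed vocabulary and model sizes

    src_embed = nn.Embedding(SRC_VOCAB, D_MODEL)       # romanised-Hindi token embeddings
    tgt_embed = nn.Embedding(TGT_VOCAB, D_MODEL)       # English token embeddings
    transformer = nn.Transformer(d_model=D_MODEL, nhead=8,
                                 num_encoder_layers=2, num_decoder_layers=2,
                                 batch_first=True)
    generator = nn.Linear(D_MODEL, TGT_VOCAB)          # projects decoder states to token logits

    src = torch.randint(0, SRC_VOCAB, (1, 10))         # one toy source sentence (10 token IDs)
    tgt_in = torch.randint(0, TGT_VOCAB, (1, 9))       # shifted target, i.e. decoder input
    tgt_out = torch.randint(0, TGT_VOCAB, (1, 9))      # tokens the decoder should predict

    tgt_mask = transformer.generate_square_subsequent_mask(tgt_in.size(1))
    hidden = transformer(src_embed(src), tgt_embed(tgt_in), tgt_mask=tgt_mask)
    logits = generator(hidden)                          # (batch, tgt_len, TGT_VOCAB)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, TGT_VOCAB), tgt_out.reshape(-1))
    print(float(loss))

In a real setup, the romanised-Hindi/English sentence pairs of the created dataset would be tokenised into such ID sequences, and translation quality would then be scored with BLEU, as the abstract states.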
Item (Open Access): Feature Selection Methods in Twin Support Vector Machines (Dhirubhai Ambani Institute of Information and Communication Technology, 2023). Dodiya, Ruchita; Anand, Pritam

For the development of a machine learning model, both parameter tuning and feature selection are necessary. The model's hyperparameters need to be tuned to their best values because they have a significant impact on how well the model works, while the objective of feature selection is to identify the most important subset of features that contributes to reliable predictions and model understanding. The primary goal of this study is to examine the effectiveness of feature selection techniques when used with Twin Support Vector Machines (TWSVM) and traditional Support Vector Machines (SVM). We aim to determine which feature selection technique yields the best performance increase for TWSVM and SVM by conducting extensive experiments on multiple datasets. The results of this study give important insight into how feature selection improves classification accuracy and effectiveness. The methodology involves applying different kinds of parameter tuning and feature selection techniques to SVM and TWSVM using linear and RBF kernels. We used a hybrid approach to parameter tuning and feature selection: the hyperparameters were optimized using Grid Search and Simulated Annealing (SA), and then, with SA-based parameter tuning, we combined the Binary Gravitational Search Algorithm (BGSA) and Teaching-Learning-Based Optimization (TLBO) for feature selection. We use these techniques to enhance the performance of SVM and TWSVM models by tuning their parameters and selecting useful features. Our results show that feature selection methods are more effective at selecting relevant features, while using less computation time, in TWSVM compared to SVM.
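As a hedged sketch of the general pipeline described above (hyperparameter tuning followed by wrapper-style feature selection), the Python/scikit-learn code below tunes an RBF SVM with grid search and then scores candidate feature subsets by cross-validation. scikit-learn provides no TWSVM, and BGSA/TLBO are not implemented here; a random binary mask stands in for the metaheuristic search purely to show the shape of the wrapper loop.

    # Minimal sketch (Python / scikit-learn): tune an SVM, then evaluate feature subsets.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # Step 1: hyperparameter tuning (grid search; the thesis also uses simulated annealing).
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
                        cv=5)
    grid.fit(X, y)
    best = grid.best_params_

    # Step 2: wrapper-style feature selection -- score candidate binary feature masks.
    rng = np.random.default_rng(0)
    best_mask, best_score = None, -np.inf
    for _ in range(20):                      # stand-in for BGSA / TLBO iterations
        mask = rng.integers(0, 2, X.shape[1]).astype(bool)
        if not mask.any():
            continue
        score = cross_val_score(SVC(kernel="rbf", **best), X[:, mask], y, cv=5).mean()
        if score > best_score:
            best_mask, best_score = mask, score

    print(best, round(best_score, 3), int(best_mask.sum()), "features kept")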
Item (Open Access): Modeling performance and power matrix of disparate computer systems using machine learning techniques (Modeling Compiler Systems Selection) (Dhirubhai Ambani Institute of Information and Communication Technology, 2022). Mankodi, Amit; Bhatt, Amit; Chaudhury, Bhaskar

In the last couple of decades, there has been exponential growth in the processor, cache, and memory features of computer systems. These hardware features play a vital role in determining the performance and power of a software application when executed on different computer systems. Furthermore, any minor alteration in hardware features or applications can impact performance and power consumption. Compute-intensive (compute-bound) applications depend more heavily on processor features, while data-intensive (memory-bound) applications depend more heavily on memory features. To meet customized budgets in performance and power, selecting computer systems with appropriate hardware features (processor, cache, and memory) becomes extremely important. Adhering to user-specific budgets, however, requires access to physical systems to gather performance and power utilization data. Expecting a user to have access to physical systems for this task is prohibitive in cost; therefore, it becomes essential to develop a virtual model that obviates the need for physical systems.

Researchers have used system-level simulators for decades to build simulated computer systems from processor, cache, and memory features and provide estimates of performance and power. In one approach, building virtual systems using a full-system simulator (FSS) provides the closest possible estimate of performance and power to a physical system. In the recent past, machine learning algorithms have been trained on such accurate FSS models to predict performance and power for varying features in similar systems, achieving fairly accurate results. However, building multiple computer systems in a full-system simulator is complex and extremely slow, and the problem is compounded by the fact that access to such accurate simulators is limited. An alternative approach is to use the open-source gem5 simulator in its emulation mode to rapidly build simulated systems. Unfortunately, this compromises measurement accuracy in performance and power compared to FSS models. When these results are used to train a machine learning algorithm, the predictions are slightly less accurate than those from models trained using FSS data. To make this approach useful, one needs to reduce the prediction inaccuracy introduced by the nature and design of the gem5 functionality and, as a consequence, the variation introduced by the type of application, whether compute-intensive or data-intensive.

This dissertation undertakes the above challenge: whether one can effectively combine the speed of the open-access gem5 simulated system with the accuracy of a physical system to obtain accurate machine learning predictions. If this challenge is met, a user would be able to select a system, either in the cloud or in the real world, to run applications within one's power and performance budget.

In our proposed methodology, we first created several gem5 models using the emulation mode for available systems with varying features, such as the type of processor (instruction set architecture, speed, and cache configuration) and the type of memory, its speed, and its size. We executed compute-intensive and data-intensive benchmark applications on these models to obtain performance results. In the second step, 80% of the models generated using the gem5 simulator in emulation mode were used to train machine learning algorithms such as linear, support vector, Gaussian, tree-based, and neural network models. The remaining 20% of the models were used for performance prediction. It was found that the tree-based algorithm predicted performance values closest to the simulated systems' results obtained using the above-mentioned gem5 model. We subsequently took the hardware configuration and application execution statistics generated by the gem5 model and fed them to the Multicore Power, Area, and Timing (McPAT) modeling tool to estimate power usage.

To check the accuracy of the gem5 simulator results, the above-mentioned benchmark applications were run on real systems with identical features. The application code was modified to invoke Performance Application Programming Interface (PAPI) functions to measure power consumption. There was a sizeable difference between the results of the gem5 model and the real system in terms of performance and power.

We conceptualized the idea of using scaling and transfer learning to bridge the difference between predicted and actual values. We proposed a scaling technique that establishes an application-specific scaling factor using a correlation coefficient between hardware features and performance/power. This scaling factor captures the difference and is applied to a set of predicted values to bring them into conformity with those of the physical system. The results demonstrate that, for selected benchmark applications, the scaling technique achieves a prediction accuracy of 75%-90% for performance and 60%-95% for power. This validates that the scaling technique effectively brings predicted performance and power values closer to those of physical systems, enabling the selection of appropriate computer system(s).

Another method to achieve better prediction values is to develop a model based on the existing transfer learning technique. To use the transfer learning method, we train the decision tree algorithm on two sets of data: one from a simulated system and the second from a closely matching physical system. Using the trained models, we attempt to predict the performance and power of a target physical system that differs from the source physical system used for training. This model uses performance and power from a source physical system during training to bring the predicted values closer to those of the target system. The results from the transfer learning technique for selected benchmark applications show a mean prediction accuracy for different target systems of between 10% and 50%.

In this work, we have demonstrated that our proposed techniques, scaling and transfer learning, are effective in estimating fairly accurate performance and power values for a physical system using the predicted values from a machine learning model trained on a gem5 simulated-systems dataset. Therefore, these techniques provide a method to estimate performance and power values for physical computer systems with known hardware features, without needing access to those systems. With the estimated performance and power values coupled with the hardware features of the physical systems, we can select system(s) based on user-provided budgets of performance and power.
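A minimal sketch of the central idea, under stated assumptions: train a tree-based regressor on simulator-derived hardware-feature-to-performance data, then scale its predictions toward a physical system using a small set of measured configurations. The synthetic data and the simple ratio-based scaling factor below are placeholders; the dissertation's gem5/McPAT data and its correlation-based scaling factor are not reproduced here.

    # Minimal sketch (Python / scikit-learn): simulator-trained regression plus scaling.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    # Columns stand in for hardware features (clock GHz, cache KB, memory MHz).
    X_sim = rng.uniform([1.0, 256, 1333], [4.0, 8192, 3200], size=(200, 3))
    runtime_sim = 50 / X_sim[:, 0] + 2000 / X_sim[:, 1] + rng.normal(0, 0.5, 200)

    model = DecisionTreeRegressor(max_depth=6).fit(X_sim, runtime_sim)  # 80/20 split omitted

    # Pretend a few configurations were also measured on real hardware.
    X_cal = X_sim[:10]
    runtime_real = 1.3 * runtime_sim[:10]          # toy physical systems run 30% slower here
    scale = runtime_real.mean() / model.predict(X_cal).mean()

    X_new = np.array([[2.5, 1024, 2400]])
    print("simulator-trained prediction:", model.predict(X_new)[0])
    print("scaled toward physical system:", scale * model.predict(X_new)[0])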
Item (Open Access): Automatic Text Translation of Multilingual Sentences using Transformer (Dhirubhai Ambani Institute of Information and Communication Technology, 2022). Hari Charan, Edara Veera Venkata; Joshi, Manjunath V.; Hati, Avik

Machine translation from one language to another is a complex problem in machine learning and one in which the machine still cannot achieve satisfactory results. The recent focus for solving this challenge has been on neural machine translation (NMT) techniques using architectures such as the recurrent neural network (RNN) and long short-term memory (LSTM), but the transformer architecture is able to outperform these NMT techniques. The transformer architecture has been successfully utilized to build models that target a single language pair or translation among multiple languages, but research is currently lacking in the area of translating multilingual sentences, where each sentence is a mixture of languages. In this work we establish a model based on the transformer architecture that can translate multilingual sentences into a single language, with the help of a multilingual neural machine translation (MNMT) model and custom-made datasets.

Item (Open Access): 3D shape deformations: a Lie group based approach (Dhirubhai Ambani Institute of Information and Communication Technology, 2020). Bansal, Sumukh; Tatu, Aditya

3D shapes are ubiquitous in many fundamental tasks of computer graphics and geometry processing. For many applications, new shapes have to be generated from existing ones, for which it is imperative to understand and model the shape of an object and its deformation. This thesis focuses on shape deformations and their applications. Real-world 3D objects undergo complex, often non-rigid deformations. One way to model such deformations is using local affine transformations. It is thus important, for applications like 3D animation, to understand the structure of affine transformations and to develop robust and efficient computational tools on the set of affine transformations. With such tools, applications like interactive shape deformation and mesh interpolation can be dealt with effectively. In this thesis, an interpolation framework for affine transformations, based on a Lie group representation of a tetrahedron, is proposed. The proposed framework provides an intuitive closed-form interpolation in all cases, in contrast to existing approaches, and preserves properties like isometry, reversibility, and monotonic change of volume. The proposed Lie group representation of the tetrahedron is extended to represent triangular and tetrahedral meshes. A detailed analysis of the invariance of the representation and interpolation to some of the choices made is provided in the thesis. We demonstrate the applicability of the framework for several applications like interactive shape deformation, shape interpolation, morphing, and deformation transfer. The proposed interactive shape deformation algorithm is close to real-time, while the mesh interpolation algorithm is able to handle non-registered meshes and large deformation cases. The interactive shape deformation algorithm is amenable to data-driven methods, and we hope to explore data-driven methods using our mesh representation in the near future.
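For readers unfamiliar with Lie-group-style interpolation of transformations, the sketch below shows one standard construction: follow the one-parameter subgroup A(t) = A0 * expm(t * logm(A0^{-1} A1)) between two invertible linear transforms, here a rotation combined with a stretch. This is a generic SciPy illustration, not the thesis's tetrahedron-based representation, and it assumes the relative transform has a real matrix logarithm.

    # Minimal sketch (Python / NumPy + SciPy): one-parameter-subgroup interpolation of transforms.
    import numpy as np
    from scipy.linalg import expm, logm

    def interpolate(A0, A1, t):
        """Interpolate between two invertible 3x3 transforms at parameter t in [0, 1]."""
        G = logm(np.linalg.solve(A0, A1))     # generator of the relative transform A0^{-1} A1
        return A0 @ expm(t * np.real(G))      # discard numerically negligible imaginary parts

    theta = np.pi / 2
    A0 = np.eye(3)
    A1 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 2.0]])          # quarter-turn in the plane plus a stretch along z

    for t in (0.0, 0.5, 1.0):
        print(t, np.round(interpolate(A0, A1, t), 3))

At t = 0.5 this path passes through a 45-degree rotation with an intermediate stretch, illustrating the kind of monotonic, closed-form behaviour the abstract attributes to the proposed framework.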
Item (Open Access): Deep learning techniques for speech pathology applications (2020). Purohit, Mirali Virendrabhai; Patil, Hemant A.

Human–machine interaction has gained more attention due to its interesting applications in industry and day-to-day life. In recent years, speech technologies have grown rapidly because of advancements in the fields of machine learning and deep learning. Various deep learning architectures have shown state-of-the-art results in different areas, such as computer vision, the medical domain, etc. Massive success has been achieved in developing speech-based systems, i.e., Intelligent Personal Assistants (IPAs), chatbots, Text-To-Speech (TTS), etc. However, these systems have certain limitations: speech processing systems work efficiently only on normal-mode speech and hence show poor performance on other kinds of speech such as impaired speech, far-field speech, shouted speech, etc. This thesis contributes to the improvement of impaired speech processing. To address this problem, the work takes two major approaches: 1) classification and 2) conversion. A new paradigm, namely weak speech supervision, is explored to overcome the data scarcity problem and is proposed for the classification task. In addition, the effectiveness of a residual network-based classifier over the traditional convolutional neural network-based model is shown for multi-class classification of pathological speech. Furthermore, using Voice Conversion (VC)-based techniques, variants of generative adversarial networks are proposed to repair impaired speech and improve the performance of Voice Assistants (VAs). The performance of these architectures is shown via objective and subjective evaluations. Inspired by the work done using the VC-based technique, this thesis also contributes to the voice conversion field: a state-of-the-art system, namely the adaptive generative adversarial network, is proposed and analyzed by comparing it with a recent state-of-the-art method for voice conversion.

Item (Open Access): Learning cross domain relations using deep learning (Dhirubhai Ambani Institute of Information and Communication Technology, 2018). Kotecha, Dhara; Joshi, Manjunath V.

Generative Adversarial Networks (GANs) have achieved exemplary performance in generating realistic images. They also perform image-to-image translation and produce good results for that task. In this thesis, we explore the use of GANs for cross-domain image mapping for facial expression transfer, in which the expressions of a source image are transferred to a target image. We use a DiscoGAN (Discovery GAN) model for the task. Using a DiscoGAN, an image of the target is generated with the facial features of the source. It uses a feature matching loss along with the GAN objective and a reconstruction loss. We propose a method to train the DiscoGAN with paired data of source and target images, and in order to learn the cross-domain image mapping, we train the DiscoGAN with a batch size of 1. In our next work in this thesis, we propose an algorithm to binarize degraded document images. We incorporate a U-Net for the task and model document image binarization as a classification problem, generating an image in which each pixel is classified as text or background. Optimizing the cross-entropy loss function, we translate the input degraded image to the corresponding binarized image. Our approach of using a U-Net ensures low-level feature transfer from the input degraded image to the output binarized image and is thus better than using a simple convolutional neural network. Our method of training reaches the desired results faster when both the degraded document and the ground-truth binarized images are available for training, and it also generalizes well. The results obtained are significantly better than the state-of-the-art techniques, and the approach is simpler than other deep learning approaches for document image binarization.
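As a toy illustration of binarization posed as per-pixel classification, the sketch below trains a two-level encoder-decoder (a stand-in for the thesis's U-Net; skip connections are omitted) with a binary cross-entropy loss, using random tensors in place of degraded and ground-truth document images. Shapes, layer sizes, and optimiser settings are assumptions, not the thesis's configuration.

    # Minimal sketch (Python / PyTorch): per-pixel text-vs-background classification.
    import torch
    import torch.nn as nn

    class TinySegNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.down = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                                      nn.MaxPool2d(2))
            self.up = nn.Sequential(nn.Upsample(scale_factor=2, mode="nearest"),
                                    nn.Conv2d(8, 1, 3, padding=1))   # one logit per pixel

        def forward(self, x):
            return self.up(self.down(x))

    net = TinySegNet()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()                   # binary cross entropy over pixels

    degraded = torch.rand(4, 1, 64, 64)                # placeholder degraded document patches
    target = (torch.rand(4, 1, 64, 64) > 0.5).float()  # placeholder binarized ground truth

    for _ in range(5):                                 # a few toy optimisation steps
        opt.zero_grad()
        loss = loss_fn(net(degraded), target)
        loss.backward()
        opt.step()
    print(float(loss))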
Item (Open Access): Imbalanced bioassay data classification for drug discovery (Dhirubhai Ambani Institute of Information and Communication Technology, 2018). Shah, Jeni Snehal; Joshi, Manjunath V.

Methods developed for pattern recognition show inferior performance if the dataset presented to them is imbalanced, i.e., if the samples belonging to one class greatly outnumber the samples from the other class(es). Because of this, imbalanced dataset classification has been an active area of research in machine learning. In this thesis, a novel approach to classifying imbalanced bioassay data is presented. Bioassay data classification is an important task in drug discovery. Bioassay data consist of feature descriptors of various compounds and the corresponding label denoting each compound's potency as a drug: active or inactive. This data is highly imbalanced, with the percentage of active compounds ranging from 0.1% to 1.4%, leading to inaccurate classification of the minority class. An approach is proposed in which separate models are trained using different features derived by training stacked autoencoders (SAEs). After learning the features using SAEs, feed-forward neural networks (FNNs) are used for classification, trained to minimize a class-sensitive cost function. Before learning the features, data cleaning is performed using the Synthetic Minority Oversampling Technique (SMOTE) and by removing Tomek links. Different levels of features can be obtained using an SAE. While some active samples may not be correctly classified by a network trained on a certain feature space, it is assumed that they can be classified correctly in another feature space. This is the underlying assumption behind learning hierarchical feature vectors and training separate classifiers for each feature space.
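A minimal sketch of the data-cleaning step described above, assuming the third-party imbalanced-learn and scikit-learn packages are installed: re-balance a highly imbalanced toy dataset with SMOTE plus Tomek-link removal and train a small feed-forward network. The thesis's stacked-autoencoder features and class-sensitive cost function are not reproduced here, and the toy data stands in for real bioassay descriptors.

    # Minimal sketch (Python): SMOTE + Tomek-link cleaning, then a small feed-forward classifier.
    from imblearn.combine import SMOTETomek
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import classification_report

    # Highly imbalanced toy data (~1% positives), mimicking active vs. inactive compounds.
    X, y = make_classification(n_samples=5000, n_features=30, weights=[0.99, 0.01],
                               random_state=0)

    # Oversample the minority class and remove Tomek links in one step.
    X_res, y_res = SMOTETomek(random_state=0).fit_resample(X, y)

    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
    clf.fit(X_res, y_res)
    # Illustration only: a real evaluation would use a held-out test split.
    print(classification_report(y, clf.predict(X)))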
Item (Open Access): Learning to rank: using Bayesian networks (Dhirubhai Ambani Institute of Information and Communication Technology, 2011). Gupta, Parth; Majumder, Prasenjit; Mitra, Suman K.

Ranking is one of the key components of an Information Retrieval system. Recently, supervised learning has been applied to learn the ranking function, an approach collectively called 'Learning to Rank'. In this study we present one approach to this problem. We intend to test the problem in different stochastic environments and hence choose Bayesian Networks for machine learning. This work also includes experimental results on the standard learning-to-rank dataset Letor 4.0 [6]. We call our approach BayesNetRank and compare its performance with a Support Vector Machine (SVM)-based approach called RankSVM [5]. A performance analysis is also carried out to identify the kinds of queries for which the proposed system gives results at either extreme. Evaluation results are reported using two rank-based evaluation metrics, Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG).
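For reference, the two evaluation measures mentioned above can be computed for a single ranked list as in the plain-Python sketch below. Real Letor-style evaluation averages these over all queries and typically uses graded relevance for NDCG; the toy binary ranking here is illustrative only.

    # Minimal sketch (Python): average precision and NDCG for one toy ranked list.
    import math

    def average_precision(rels):
        """rels: binary relevance (0/1) of documents in ranked order."""
        hits, total = 0, 0.0
        for i, r in enumerate(rels, start=1):
            if r:
                hits += 1
                total += hits / i
        return total / max(hits, 1)

    def ndcg(rels, k=None):
        """Normalized discounted cumulative gain at cutoff k (whole list if k is None)."""
        rels = rels[:k] if k else rels
        dcg = sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))
        ideal = sorted(rels, reverse=True)
        idcg = sum(r / math.log2(i + 1) for i, r in enumerate(ideal, start=1))
        return dcg / idcg if idcg > 0 else 0.0

    ranked = [1, 0, 1, 0, 0, 1]    # toy ranking produced by some learning-to-rank model
    print(average_precision(ranked), ndcg(ranked, k=5))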
Item (Open Access): Human action recognition in video (Dhirubhai Ambani Institute of Information and Communication Technology, 2011). Kumari, Sonal; Mitra, Suman K.

Action recognition is a central problem in computer vision, closely related to activity recognition and object detection. An action is any meaningful movement of a human, used to convey information or to interact naturally without any mechanical devices, and it is of utmost importance in designing an intelligent and efficient human–computer interface. The applications of action recognition are manifold, ranging from sign language through medical rehabilitation to virtual reality. Human action recognition is motivated by applications such as video retrieval, human–robot interaction, and interaction with deaf and mute people. In an action recognition system, a video stream is captured using a fixed camera, which may be mounted on the computer or elsewhere. Preprocessing steps are then performed to remove noise caused by illumination effects, blurring, false contours, etc. Background subtraction is done to remove the static or slowly varying background. In this thesis, multiple background subtraction algorithms are tested, and one of them is selected for the action recognition system. Background subtraction is also known as foreground/background segmentation, background segmentation, or foreground extraction; these terms are used interchangeably in this thesis. The background segmentation algorithm is selected on the basis of the results of these algorithms on the action database, since a good background segmentation result provides a more robust basis for object class recognition. The following four methods for extracting the foreground are tested: (1) frame difference, (2) background subtraction, (3) adaptive Gaussian mixture model (adaptive GMM) [25], and (4) improved adaptive Gaussian mixture model (improved adaptive GMM) [26], of which the last gives the best result. The action region can then be extracted from the original video sequences with the help of the extracted foreground object. The next step is feature extraction, which extracts important features (such as corner points, optical flow, shape, and motion vectors) from the image frame to be used for tracking across the video frames. Feature reduction is an optional step that reduces the dimension of the feature vector. To recognize actions, any learning and classification algorithm can be employed: the system is trained using a training dataset, and a new video can then be classified according to the action occurring in it. The following three features are applied for the action recognition task: (1) the distance between the centroid and corner points, (2) optical flow motion estimation [28, 29], and (3) the discrete Fourier transform (DFT) of an image block. Among these, the proposed DFT feature plays a very important role in uniquely identifying a specific action from the database; the proposed novel action recognition model uses the DFT of a small image block. For the experimentation, MuHAVi data [33] and DA-IICT data, which include various kinds of actions by various actors, are used. Two supervised recognition techniques are used: K-nearest neighbor (KNN) and a classifier using the Mahalanobis metric. KNN is a parameterized classification technique in which the parameter K must be optimized, whereas the Mahalanobis classifier is non-parameterized, so no parameter optimization is needed. To check the accuracy of the proposed algorithm, sensitivity and false alarm rate tests are performed; the results show that the proposed algorithm is quite accurate at recognizing actions in video. To compare the recognition system with other recognition techniques, confusion matrices are created and compared. All the experiments are performed in MATLAB®.
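A minimal sketch of two steps in the pipeline described above, using NumPy on synthetic frames: simple frame differencing for foreground extraction, followed by a DFT-magnitude feature computed on a small image block. The thesis's adaptive GMM background models, full feature set, and KNN/Mahalanobis classification are not reproduced here; the random frames and threshold are placeholders.

    # Minimal sketch (Python / NumPy): frame-difference foreground mask and a DFT block feature.
    import numpy as np

    rng = np.random.default_rng(0)
    prev_frame = rng.random((120, 160))
    curr_frame = prev_frame.copy()
    curr_frame[40:60, 70:90] += 0.5                 # a toy "moving" region

    # Frame difference with a fixed threshold gives a crude foreground mask.
    foreground = np.abs(curr_frame - prev_frame) > 0.2

    # Take an 8x8 block from the action region and use its DFT magnitude as a feature.
    ys, xs = np.nonzero(foreground)
    y0, x0 = ys.min(), xs.min()
    block = curr_frame[y0:y0 + 8, x0:x0 + 8]
    feature = np.abs(np.fft.fft2(block)).ravel()    # 64-dimensional descriptor

    print(int(foreground.sum()), "foreground pixels; feature length:", feature.size)

In a full system, such per-block descriptors would be computed over the extracted action region for every training video and then fed to a classifier such as KNN, as the abstract describes.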