Theses and Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1

Search Results

Now showing 1 - 10 of 12
  • Open Access
    Impact of Weather Conditions on Macroscopic Traffic Stream Variables in an Intelligent Transportation System
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2021) Nigam, Archana; Srivastava, Sanjay
    Accurate prediction of macroscopic traffic stream variables such as speed and flow is essential for traffic operation and management in an Intelligent Transportation System (ITS). Adverse weather conditions like fog, rainfall, and snowfall affect drivers’ visibility, vehicle mobility, and road capacity. Accurate traffic forecasting during inclement weather is a non-linear and complex problem because it involves various hidden features, such as time of day, road characteristics, and drainage quality. With recent computational technologies and the availability of large datasets, such problems are now addressed with data-driven approaches. Traditional data-driven approaches use shallow architectures, which ignore hidden influencing factors and have been shown to be limited in high-dimensional traffic states. Deep learning models have proven more accurate than shallow models for predicting traffic stream variables because their layer-wise architecture extracts hidden features. The impact of weather conditions on traffic depends on various hidden features. For example, the effect of rainfall on traffic is not directly proportional to the distance between a weather station and a road segment because of terrain constraints, and prolonged rainfall weakens the drainage system and reduces soil absorption capacity, which causes waterlogging. Therefore, to capture the spatial and prolonged impact of weather conditions, we proposed a soft spatial and temporal threshold mechanism. Another concern is that weather data has a lower spatial and temporal resolution than traffic data. Because the resulting missing weather data cannot simply be ignored, spatial interpolation techniques such as Thiessen polygons, the inverse distance weighted method, and linear regression are used to fill it in.
Deep learning models require large amounts of data for accurate prediction. ITS infrastructure provides dense and complete traffic data, but it is costly to install and maintain; therefore, most road segments depend on cost-effective alternative sources of traffic data, which provide sparse, incomplete, and erroneous information. To overcome the data sparsity issue, we proposed a mechanism to generate fine-grained synthetic traffic data using the SUMO traffic simulator. We studied the impact of rainfall on traffic stream variables on arterial, sub-arterial, and collector roads. An empirical model was designed and calibrated for a variety of traffic and weather conditions, and the Krauss car-following model in SUMO was extended to use this empirical model when computing vehicle speed. The simulation model was validated by comparing the synthetic data with ground truth data under various traffic and weather conditions. We found that the empirical model accurately captures the effect of rainfall on traffic stream variables, and the synthetic data closely matches the ground truth data. We adopted multiple deep learning models because of their ability to extract spatiotemporal features from traffic and weather data. A Convolutional Neural Network (CNN) extracts correlations between neighboring pixels. The sequence learning models, Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM), learn dependencies in the data from past and current information. We designed two hybrid deep learning models, CNN-LSTM and LSTM-LSTM. In each hybrid, the first model extracts spatiotemporal features and the second uses these features as memory, predicting the traffic stream variables from that memory and the temporal input.
The hybrid models are effective at learning long-term dependencies between traffic and weather data. To validate the deep learning models, we performed experiments on synthetic traffic data generated by SUMO with the empirical model for different road types (arterial, sub-arterial, and collector) and different road networks (single, small, and large). The results show that a deep learning model trained with traffic and rainfall data gives better prediction accuracy than one trained without rainfall data. The LSTM-LSTM model outperforms the other models in all scenarios. On the large road network, where roads are prone to waterlogging and long-term dependencies matter, LSTM-LSTM outperforms the other deep learning models, including RNN, CNN, LSTM, CNN-LSTM, and existing models. In the worst-case scenario, the prediction error of LSTM-LSTM is between 3% and 15% for prediction horizons of 15 to 60 minutes, which is in line with the accuracy needed for ITS applications.
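The inverse distance weighted (IDW) and Thiessen polygon interpolation named in this abstract can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the thesis's implementation: the gauge coordinates, rainfall values, planar Euclidean distance, and the distance exponent `power=2` are all hypothetical choices.

```python
import math


def _dist(a, b):
    # Planar Euclidean distance (an assumption; real stations may need geodesic distance)
    return math.hypot(a[0] - b[0], a[1] - b[1])


def idw(stations, target, power=2.0):
    """Inverse distance weighted estimate of a value at `target`.

    stations: list of ((x, y), value) pairs from gauges that reported data.
    power: distance exponent; 2 is a common default, assumed here.
    """
    num = den = 0.0
    for loc, value in stations:
        d = _dist(loc, target)
        if d == 0.0:
            return value  # target sits exactly on a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def thiessen(stations, target):
    """Thiessen (Voronoi) polygon estimate: copy the nearest station's value."""
    _, value = min(stations, key=lambda s: _dist(s[0], target))
    return value


# Hypothetical rain gauges (km coordinates, rainfall in mm/h) and a road segment
gauges = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
segment = (0.5, 0.0)
print(idw(gauges, segment))       # about 11.0
print(thiessen(gauges, segment))  # 10.0
```

IDW blends every reporting station and weights nearer ones more heavily, while the Thiessen estimate simply copies the nearest one; raising `power` pushes IDW toward the Thiessen result.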
  • Open Access
    Deep Learning for Severity Level-based Classification of Dysarthria
    (2021) Gupta, Siddhant; Patil, Hemant A.
    Dysarthria is a motor speech disorder in which the muscles required for speech are damaged or paralyzed, adversely affecting articulation and rendering the speech unintelligible. It is considered one of the most common speech disorders. Dysarthria results from several neurological and neurodegenerative diseases, such as Parkinson’s disease and cerebral palsy. People with dysarthria face difficulties in conveying vocal messages and emotions, which in many cases leads to depression and social isolation. Dysarthria has become a major speech technology issue because systems that work efficiently for normal speech, such as Automatic Speech Recognition (ASR) systems, do not give satisfactory results on dysarthric speech. In addition, people with dysarthria are generally limited in their motor functions, which makes voice-assisted systems all the more crucial for them. Furthermore, analysis and classification of dysarthric speech can be useful for tracking disease progression and treatment in a patient. In this thesis, dysarthria is studied as a speech technology problem: classifying dysarthric speech into four severity levels. Since people with dysarthria have difficulty producing long utterances, short-duration speech segments (at most 1 s) are used, to explore the practical applicability of the work. Dysarthric speech is also analyzed using time-domain waveforms, the Linear Prediction profile, the Teager Energy Operator profile, the Short-Time Fourier Transform, and other representations, to identify the most representative feature for the classification task. With the rise of Artificial Intelligence, deep learning techniques have gained significant popularity in classification and pattern recognition tasks.
Therefore, to keep the work relevant, several machine learning and deep learning techniques are adopted: Gaussian Mixture Models (GMM), Convolutional Neural Networks (CNN), Light Convolutional Neural Networks (LCNN), and Residual Neural Networks (ResNet). The severity level-based classification is evaluated using classification accuracy and F1-score. For comparison with short-duration speech, classification is also performed on long-duration speech (more than 1 s). Furthermore, to enhance the relevance of the work, experiments are performed on the statistically meaningful and widely used Universal Access-Speech Corpus.
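Accuracy and F1-score for a four-class severity task can be computed as in the sketch below. The abstract does not say how per-class F1 scores are combined, so this minimal Python version assumes unweighted (macro) averaging; the integer labels 0 to 3 and the predictions are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted severity matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro-F1, an assumed averaging)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); taken as 0 when the class is absent from both lists
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical severity labels 0..3 for eight test utterances
truth = [0, 0, 1, 1, 2, 2, 3, 3]
pred = [0, 0, 1, 2, 2, 2, 3, 0]
print(accuracy(truth, pred))  # 0.75
print(macro_f1(truth, pred, [0, 1, 2, 3]))
```

Macro averaging gives each severity level equal weight, which matters when some levels have far fewer utterances than others; a support-weighted average would be the alternative.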
  • Open Access
    Deep learning techniques for speech pathology applications
    (2020) Purohit, Mirali Virendrabhai; Patil, Hemant A.
    Human-machine interaction has gained attention due to its applications in industry and day-to-day life. In recent years, speech technologies have grown rapidly because of advances in machine learning and deep learning. Various deep learning architectures have shown state-of-the-art results in areas such as computer vision and the medical domain. Speech-based systems such as Intelligent Personal Assistants (IPAs), chatbots, and Text-To-Speech (TTS) have been developed with massive success. However, these systems have limitations: speech processing systems work efficiently only on normal-mode speech and hence perform poorly on other kinds of speech, such as impaired, far-field, and shouted speech. This thesis contributes to the improvement of impaired speech through two major approaches: 1) classification and 2) conversion. For classification, a new paradigm, weak speech supervision, is explored and proposed to overcome the data scarcity problem. In addition, a residual network-based classifier is shown to be more effective than a traditional convolutional neural network-based model for multi-class classification of pathological speech. For conversion, variants of generative adversarial networks are proposed as Voice Conversion (VC)-based techniques that repair impaired speech and thereby improve the performance of Voice Assistants (VAs). The performance of these architectures is shown via objective and subjective evaluations. Inspired by this VC-based work, the thesis also contributes to the voice conversion field itself: a state-of-the-art system, namely the adaptive generative adversarial network, is proposed and analyzed by comparing it with a recent state-of-the-art voice conversion method.
  • Open Access
    Apparel attributes classification using deep learning
    (2020) Desai, Harsh Sanjaykumar; Jat, P.M
    Apparel attribute classification has practical applications in e-commerce. The project was carried out for the www.Blibli.com website, an e-commerce platform in Indonesia and a partner of Coviam Technologies. This report describes an approach that uses Natural Language Processing and Deep Learning techniques to classify apparel-specific attributes such as material, neck/collar, and sleeve type. The attribute labels of classified products will be used as filters on the search results page to improve the website's search mechanism. We classified 95% of apparel products by the material attribute and achieved 87% test accuracy on neck/collar attribute classification. The report is divided into four main parts: Introduction, Dataset Preparation, Methodology, and Experimentation. Lastly, other related work performed during the internship and future work are discussed.
  • Open Access
    Clickbait detection using deep learning Techniques
    (2020) Parikh, Apurva Ketanbhai; Majumder, Prasenjit
    With the growing shift toward news consumption primarily through social media sites such as Twitter and Facebook, most news agencies promote their stories on social media platforms. Some of these agencies publish fake news on social media to generate revenue by enticing users to click on their articles. To increase readership, agencies use eye-catching headlines accompanied by an article link, which draw readers to the article. Such attractive headlines are called clickbait. Usually, a clickbait article does not meet the reader's expectations. In this work, we develop an end-to-end clickbait detection system using the Transformer-based model Bidirectional Encoder Representations from Transformers (BERT). We also identified a few clickbait-specific features that, we hypothesised, can be used along with the BERT model to build a better classifier. Our proposed BERT-based approach significantly outperformed the baseline paper, which used a BiLSTM.
  • Open Access
    Applications of deep-learning at digital communication receiver
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2020) Nanavati, Tilak Digantkumar; Vasavada, Yash
    Modulation and demodulation are fundamental modules of communication systems. The modulation techniques Offset QPSK (OQPSK), π/2 BPSK, π/4 QPSK, and GMSK are frequently applied in power-constrained wireless communication links (e.g., the terminal transmission links of several 2G, 3G, and 4G terrestrial and satellite air-interface standards). However, a detailed numerical comparison of their performance and functional characteristics is currently lacking in the literature: prior studies have compared at most two of these four schemes (typically OQPSK versus GMSK). One of the objectives of this thesis is to bridge this gap. We provide a detailed comparison of (i) the spectral regrowth and (ii) the probability of bit error versus Eb/N0 of these four modulation schemes in the presence of AM/AM and AM/PM non-linearities with varying backoff (BO). We believe that our results and key observations will help in selecting an appropriate modulation technique when designing practical communication systems. Another crucial component of communication and signal processing systems is the estimation of channel parameters. In practical communication systems, varying channel conditions and non-linear channel impairments make estimation more challenging. We propose a Deep Learning (DL) application at the digital communication receiver to estimate channel impairments that are difficult to describe with a rigid, mathematically tractable model. Another objective of our research is to develop a learned parameter estimator that effectively captures non-linear functional mappings and produces accurate estimates. For Phase Offset (PO) impairment estimation, our proposed approach achieves accuracy competitive with its baseline equivalent.
Lastly, we demonstrate a learning-based modulation classifier that potentially solves the misclassification problem presented in an earlier study.
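A probability of bit error versus Eb/N0 curve is typically estimated by Monte Carlo simulation, as in the minimal Python sketch below. It covers only π/2 BPSK on an ideal linear AWGN channel with one sample per symbol and no pulse shaping, so it deliberately leaves out the AM/AM and AM/PM non-linearities and backoff studied in the thesis; under these assumptions the simulated rate should track the textbook coherent BPSK curve. The function names and the seed are illustrative.

```python
import math
import random


def pi2_bpsk_ber(ebn0_db, n_bits=100_000, seed=1):
    """Monte Carlo bit error rate of pi/2-BPSK on an ideal linear AWGN channel.

    One sample per symbol, no pulse shaping, no amplifier non-linearity.
    """
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    sigma = math.sqrt(1 / (2 * ebn0))  # noise std per real dimension for Eb = 1
    errors = 0
    for k in range(n_bits):
        bit = rng.getrandbits(1)
        rot = 1j ** (k % 4)  # constellation rotates by pi/2 every symbol
        tx = (1 - 2 * bit) * rot  # bit 0 -> +1, bit 1 -> -1, then rotate
        rx = tx + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        decided = 0 if (rx * rot.conjugate()).real > 0 else 1  # de-rotate, then slice
        errors += decided != bit
    return errors / n_bits


def bpsk_ber_theory(ebn0_db):
    """Pb = Q(sqrt(2 Eb/N0)) = 0.5 * erfc(sqrt(Eb/N0)) for coherent BPSK."""
    return 0.5 * math.erfc(math.sqrt(10 ** (ebn0_db / 10)))


for snr_db in (0, 4, 8):
    print(snr_db, pi2_bpsk_ber(snr_db), bpsk_ber_theory(snr_db))
```

The per-symbol π/2 rotation does not change the error rate on this channel because the receiver undoes it before slicing; its benefit is a smaller envelope variation, which only shows up once a pulse-shaped waveform passes through a non-linear amplifier, as in the thesis setting.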
  • Open Access
    Augmenting dialogue generation using dialogue act embeddings: a transfer learning approach
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2020) Bisht, Abhimanyu Singh; Majumder, Prasenjit
    This work examines contemporary end-to-end dialogue systems with the aim of improving dialogue generation in an open-domain setting. It provides an overview of popular literature on dialogue generation, followed by a brief look at how human dialogue is understood from the perspectives of Linguistics and Cognitive Science. We draw useful ideas from these research areas and implement them in a transfer learning approach in which a pretrained language model is supplemented with dialogue act information using special embeddings. The hypothesis behind the proposed approach is that dialogue act information will aid the generation process. The proposed approach is then compared with a baseline on the DailyDialog [12] dataset, using perplexity as the evaluation metric. Although the proposed approach is a significant improvement over the baseline, ablation analysis shows that the Dialogue Act Embeddings contribute only marginally to this improvement.
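Perplexity, the metric used for the DailyDialog comparison in this abstract, is the exponentiated average negative log-likelihood per token. A minimal Python sketch, assuming natural-log token probabilities are already available from the language model; the tokenization and how the thesis aggregates across dialogues are not stated in the abstract.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each token given its context.

    PPL = exp(-(1/N) * sum_i log p(w_i | w_<i)); lower is better.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Two tokens with model probabilities 1/2 and 1/8: their geometric mean is 1/4,
# so the perplexity is 4 (up to floating-point rounding)
print(perplexity([math.log(0.5), math.log(0.125)]))
```

For corpus-level perplexity, the log-probabilities of all tokens in the test set are pooled into one list rather than averaging per-dialogue perplexities, since the latter gives short dialogues disproportionate weight.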
  • Open Access
    Auditory representation learning
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Sailor, Hardik B.; Patil, Hemant A.
    Representation learning (RL), or feature learning, has had a major impact on signal processing applications. The goal of RL approaches is to learn meaningful representations directly from the data that help the pattern classifier. In particular, unsupervised RL has gained significant interest for feature learning in various signal processing areas, including speech and audio processing. Recently, various RL methods have been used to learn auditory-like representations from speech signals or their spectral representations. In this thesis, we propose a novel auditory representation learning model based on the Convolutional Restricted Boltzmann Machine (ConvRBM). Auditory-like subband filters are learned when the model is trained directly on raw speech and audio signals of arbitrary length. The learned frequency scale is nonlinear, similar to standard auditory frequency scales; however, the ConvRBM frequency scale is adapted to the sound statistics. The primary motivation for developing the model is its application to Automatic Speech Recognition (ASR). Experiments on standard ASR databases show that the ConvRBM filterbank performs better than the Mel filterbank. A stability analysis of the model is presented using the Lipschitz continuity condition. The proposed model is improved using annealing dropout and Adam optimization. A noise-robust representation is achieved by combining the ConvRBM filterbank with energy estimation using the Teager Energy Operator (TEO). As part of a consortium project sponsored by MeitY, Govt. of India, the ConvRBM is used as the front-end of an ASR system for speech-based access to information on agricultural commodities in the Gujarati language.
Inspired by this success in ASR, we applied our model to three audio classification tasks: Environmental Sound Classification (ESC), synthetic and replay Spoof Speech Detection (SSD) for Automatic Speaker Verification (ASV), and Infant Cry Classification (ICC). We further propose a two-layer auditory model obtained by stacking two ConvRBMs. We refer to it as the Unsupervised Deep Auditory Model (UDAM); it performed well compared to the single-layer ConvRBM in the ASR task.
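The Teager Energy Operator used here for energy estimation (and, as a TEO profile, in the dysarthria thesis listed above) has a simple discrete form, ψ[n] = x[n]² − x[n−1]·x[n+1]. The minimal Python sketch below implements only that standard definition; how the thesis frames the signal or combines the operator with the ConvRBM subband outputs is not given in the abstract and is not shown. The test tone parameters are hypothetical.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The first and last samples lack a two-sided neighbourhood, so the
    returned profile is two samples shorter than x.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure tone x[n] = A*cos(omega*n + phi), psi[n] is constant: A**2 * sin(omega)**2
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n + 0.5) for n in range(50)]
profile = teager_energy(tone)
print(profile[0], A * A * math.sin(omega) ** 2)
```

Because ψ depends on both amplitude and frequency, it tracks the energy needed to generate a signal rather than just its squared amplitude, which is one reason it is combined with filterbank outputs for noise-robust features.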
  • Open Access
    Imbalanced bioassay data classification for drug discovery
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Shah, Jeni Snehal; Joshi, Manjunath V.
    Pattern recognition methods generally show inferior performance when the dataset presented to them is imbalanced, i.e., when the samples of one class greatly outnumber those of the other class(es). For this reason, imbalanced dataset classification has been an active area of research in machine learning. In this thesis, a novel approach to classifying imbalanced bioassay data is presented. Bioassay data classification is an important task in drug discovery. Bioassay data consists of feature descriptors of various compounds, each with a label denoting its potency as a drug: active or inactive. This data is highly imbalanced, with the percentage of active compounds ranging from 0.1% to 1.4%, which leads to poor classification of the minority class. The proposed approach trains separate models on different features derived from stacked autoencoders (SAE). After the features are learned with SAEs, feed-forward neural networks (FNN) trained to minimize a class-sensitive cost function are used for classification. Before feature learning, the data is preprocessed by oversampling the minority class with the Synthetic Minority Oversampling Technique (SMOTE) and removing Tomek links. SAEs yield features at different levels. While some active samples may be misclassified by a network trained on one feature space, it is assumed that they can be classified correctly in another feature space. This is the underlying assumption behind learning hierarchical feature vectors and training a separate classifier for each feature space.
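SMOTE oversampling and Tomek-link detection, the two preprocessing steps named in this abstract, can be sketched with NumPy as below. This is an illustrative brute-force version (quadratic in the number of samples), not the thesis's code; the neighbour count `k`, the toy data, and the choice to return Tomek pairs instead of deleting a particular member are assumptions. The imbalanced-learn library provides production implementations, for example `SMOTETomek` in `imblearn.combine`.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples (SMOTE).

    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picks = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picks] - X_min[base])


def tomek_links(X, y):
    """Return index pairs (i, j) with different labels that are mutual nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argmin(d, axis=1)
    return [(i, int(j)) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]


# Toy minority class: three points at the corners of a unit triangle
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
synthetic = smote(minority, n_new=10, k=2, rng=0)

# Toy 1-D data: samples 0 and 1 are mutual nearest neighbours with different labels
X_toy = np.array([[0.0], [0.1], [5.0], [5.2], [10.0]])
y_toy = np.array([0, 1, 0, 0, 1])
print(tomek_links(X_toy, y_toy))  # [(0, 1)]
```

Which member of a Tomek link to drop is a design choice the abstract leaves open; a common option is to remove only the majority-class sample so that the scarce active compounds are kept.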
  • Open Access
    Personalized News-Feeds Recommendation System
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2017) Paliwal, Ankit; Dasgupta, Sourish
    Personalization of recommendations is very important for both users and organizations. Users want their experience on a website to be as comfortable as possible, and organizations want to attract more and more users to their platforms. Whether for shopping online or gathering information from across the world, recommendation engines are changing the way people interact with online systems and are helping to improve their experiences. A personalized news-feed recommender is one such system; it helps users of a platform stay up to date with news from all over the globe. Every user has their own preferences and interests. Our aim was to design a system that shows users what they are interested in and does not bother them with unwanted material. There has been much past research on this topic, and each work has handled the problem in its own way. The core idea behind these systems, however, remains more or less the same: to capture the user's interests. Once a system can do that precisely, the recommendation part becomes easy. Different authors use different tools and technologies to do this: some use Topic Modelling, others use Deep Learning, and still others use variations of hybrid recommendation systems. In this work, I used Topic Modelling together with the idea of penalizing topics based on what the user prefers to see and what they do not. The whole program runs as a layer above the model generated using Latent Dirichlet Allocation (LDA).