Theses and Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1


Search Results

Now showing 1 - 6 of 6
  • Item (Open Access)
    Impact of Weather Conditions on Macroscopic Traffic Stream Variables in an Intelligent Transportation System
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2021) Nigam, Archana; Srivastava, Sanjay
    Accurate prediction of macroscopic traffic stream variables such as speed and flow is essential for traffic operation and management in an Intelligent Transportation System (ITS). Adverse weather conditions like fog, rainfall, and snowfall affect the driver's visibility, the vehicle's mobility, and road capacity. Accurate traffic forecasting during inclement weather is a non-linear and complex problem, as it involves various hidden features such as time of day, road characteristics, drainage quality, etc. With recent computational technologies and the availability of huge datasets, such problems can be solved using data-driven approaches. Traditional data-driven approaches used shallow architectures, which ignore hidden influencing factors and have proven limitations in high-dimensional traffic states. Deep learning models are more accurate at predicting traffic stream variables than shallow models because they extract hidden features through their layerwise architecture. The impact of weather on traffic depends on various hidden features. The effect of rainfall on traffic is not directly proportional to the distance between the weather station and the road segment, because of terrain constraints. Prolonged rainfall weakens the drainage system and reduces the soil's absorption capability, which causes waterlogging. Therefore, to capture the spatial and prolonged impact of weather conditions, we proposed a soft spatial and temporal threshold mechanism. Another concern is that traffic data has a much higher spatial and temporal resolution than weather data. Missing weather data is therefore difficult to ignore, and spatial interpolation techniques such as the Thiessen polygon, inverse distance weighting, and linear regression are used to fill in the missing values.
The deep learning models require a large amount of data for accurate prediction. The ITS infrastructure provides dense and complete traffic data, but the installation and maintenance of ITS infrastructure are costly; therefore, the majority of road segments depend on cost-effective alternative sources of traffic data, which provide sparse, incomplete, and erroneous information. To overcome this data sparsity, we proposed a mechanism to generate fine-grained synthetic traffic data using the SUMO traffic simulator. We studied the impact of rainfall on traffic stream variables on arterial, sub-arterial, and collector roads. An empirical model was designed and calibrated for a variety of traffic and weather conditions, and the Krauss car-following model in SUMO was extended to use this empirical model for computing vehicle speed. The simulation model was validated by comparing the synthetic data with ground truth data under various traffic and weather conditions. We find that the empirical model accurately captures the effect of rainfall on traffic stream variables, and the synthetic data matches the ground truth data very well. We adopted multiple deep learning models because of their ability to extract spatiotemporal features from traffic and weather data. The Convolutional Neural Network (CNN) model is suited to extracting correlations between neighboring pixels. The sequence learning models, the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM), learn dependencies in the data from past and current information. We designed hybrid deep learning models, CNN-LSTM and LSTM-LSTM: the first stage extracts spatiotemporal features, and the second stage uses these features as memory and predicts the traffic stream variables from the memory and the temporal input.
The hybrid models are effective in learning the long-term dependency between traffic and weather data. To validate the deep learning models, we performed various experiments using the synthetic traffic data generated by SUMO with the empirical model, for different road types (arterial, sub-arterial, and collector) and different road network sizes (single road, small, and large). The results show that a deep learning model trained with both traffic and rainfall data gives better prediction accuracy than one trained without rainfall data. The LSTM-LSTM model performs better than the other models in all scenarios. On the large road network, where roads are prone to waterlogging, LSTM-LSTM outperforms the other deep learning models under long-term dependency, including RNN, CNN, LSTM, CNN-LSTM, and existing models. In the worst-case scenario, the traffic prediction error of LSTM-LSTM is between 3% and 15% for 15- to 60-minute future time instances, which is in line with the accuracy needed for ITS applications.
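The abstract names inverse distance weighting among the interpolation techniques used to fill missing weather readings. The snippet below is a minimal NumPy sketch of that idea; the function name `idw_fill` and the `power` parameter are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def idw_fill(station_coords, station_values, target_coord, power=2.0):
    """Estimate a missing weather reading at target_coord by
    inverse-distance weighting of nearby station readings."""
    station_coords = np.asarray(station_coords, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    d = np.linalg.norm(station_coords - np.asarray(target_coord, dtype=float), axis=1)
    if np.any(d == 0):                       # target coincides with a station
        return float(station_values[np.argmin(d)])
    w = 1.0 / d ** power                     # closer stations weigh more
    return float(np.sum(w * station_values) / np.sum(w))
```

For example, a road segment midway between two stations reporting 10 mm and 20 mm of rainfall is assigned 15 mm.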
  • Item (Open Access)
    English Handwritten Word Recognition
    (2021) Shah, Vidit; Khare, Manish; Bhilare, Shruti
    Vast amounts of data are generated every day, which helps with the automation of several tasks. Automated recognition of handwritten words from images is one such challenging task, accomplished by extracting the important features of an image. The major challenge for handwritten word recognition, compared with optical (printed) word recognition, is the inherent variation in handwriting styles, so it is of utmost importance to build handwritten word recognition models with high accuracy. Such models can be used in pharmaceuticals, for example to convert prescription or report images into scanned documents and store the relevant information from them. In this work, I build a deep-learning-based model that recognizes English handwritten words from images. The dataset used is the publicly available IAM word dataset. A CNN extracts features, such as edges, from the images. An RNN learns from previous states and predicts the output for the next state, a process called sequence learning. Combining the feature-extraction strength of the CNN with the sequence learning of the RNN, i.e., a C-RNN, I obtained 72.46% accuracy and an 11.88% character error rate. Accuracy depends on the dataset used for training.
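The abstract reports a character error rate but does not name the decoding scheme; C-RNN word recognizers commonly use CTC, whose greedy decoding step (argmax per timestep, collapse repeats, drop blanks) can be sketched as follows. The charset layout and blank index are assumptions for illustration.

```python
import numpy as np

def ctc_greedy_decode(logits, charset, blank=0):
    """Greedy CTC decoding: take the best label at each timestep,
    collapse consecutive repeats, then remove blank labels."""
    best = np.argmax(logits, axis=1)        # (T,) best label per timestep
    out, prev = [], blank
    for label in best:
        if label != prev and label != blank:
            out.append(charset[label - 1])  # characters are indexed after blank
        prev = label
    return "".join(out)
```

A blank between two identical labels keeps a doubled letter (e.g. "ll" in "hello") from being collapsed into one.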
  • Item (Open Access)
    Image Super Resolution Using Deep Neural Networks
    (2021) Singh, Harsh Vardhan; Kumar, Ahlad
    The recent outbreak of COVID-19 has motivated researchers to contribute to the area of medical imaging using artificial intelligence and deep learning. Super-resolution (SR) has produced remarkable results in the past few years using deep learning methods. The ability of deep learning methods to learn the non-linear mapping from low-resolution (LR) images to their corresponding high-resolution (HR) images has led to compelling SR results in diverse areas of research. In this work, we propose a deep learning based image super-resolution architecture in the Tchebichef transform domain. This is achieved by integrating a transform layer into the proposed architecture through a customized Tchebichef convolutional layer (TCL). The role of the TCL is to convert the LR image from the spatial domain to the orthogonal transform domain using Tchebichef basis functions. The inverse transformation is performed by another layer, the Inverse Tchebichef Convolutional Layer (ITCL), which converts the LR images from the transform domain back to the spatial domain. Using the Tchebichef transform domain for SR takes advantage of the high- and low-frequency representation of images, which simplifies the super-resolution task. We further introduce a transfer learning approach to enhance the quality of COVID-19 medical images. We show that our architecture enhances the quality of X-ray and CT images of COVID-19, providing better image quality that helps in clinical diagnosis. Experimental results show that our architecture achieves competitive results compared with most deep learning methods while using fewer trainable parameters.
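To make the transform-layer idea concrete, the sketch below constructs a discrete orthonormal polynomial basis on {0, ..., N-1} by QR-factorising a Vandermonde matrix (which, up to sign, agrees with the orthonormal Tchebichef polynomials) and uses it for a separable forward/inverse 2-D transform. This is an illustrative reconstruction, not the authors' TCL/ITCL implementation.

```python
import numpy as np

def tchebichef_basis(N):
    """Row n is the degree-n member of a discrete orthonormal polynomial
    basis on {0, ..., N-1}; up to sign this is the Tchebichef basis."""
    x = np.arange(N, dtype=float)
    V = np.vander(x, N, increasing=True)   # columns: 1, x, x^2, ...
    Q, _ = np.linalg.qr(V)                 # Gram-Schmidt orthonormalisation
    return Q.T

def tchebichef_forward(img):
    """Separable 2-D transform: spatial domain -> coefficient domain."""
    B, C = tchebichef_basis(img.shape[0]), tchebichef_basis(img.shape[1])
    return B @ img @ C.T

def tchebichef_inverse(coeffs):
    """Inverse transform: coefficient domain -> spatial domain."""
    B, C = tchebichef_basis(coeffs.shape[0]), tchebichef_basis(coeffs.shape[1])
    return B.T @ coeffs @ C
```

Because the basis is orthonormal, the inverse is simply the transpose, which is what lets an ITCL-style layer undo the TCL-style layer exactly.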
  • Item (Open Access)
    Document Language Classification Using Deep Learning Approaches
    (2021) Shah, Sarathi Surendra; Joshi, Manjunath V.
    Optical character recognition (OCR) refers to the task of recognizing characters or text from digital document images. OCR has been a widely researched area for many years due to its applications in various fields: it enables natural language processing of documents, text-to-speech conversion, semantic analysis of text, searching within documents, etc. Multilingual OCR works with documents containing more than one language. Different OCR models have been created and optimized for particular languages; when dealing with multiple languages or document translation, one must first detect the language of the document and then give it as input to the model specific to that language. So, when performing OCR on multilingual documents, it is better to first recognize the language of the document and then pass it to the OCR model optimized for that language. Most research in this area focuses on identifying scripts, but since a Convolutional Neural Network (CNN) can learn appropriate features, our work focuses on language detection using learned features. We have proposed two CNN classification models: one classifies Gujarati and English at the word level, and the other classifies six Indian languages at the page level. For page-level classification, we use a hierarchical method in which a binary classification followed by a multiclass classification improves detection accuracy. Current approaches largely do not use such a hierarchy and hence fail to identify the language correctly. The proposed hierarchical approach detects six Indian languages, namely Tamil, Telugu, Kannada, Hindi, Marathi, and Gujarati, using a CNN on printed documents based on the text content of a page. Experiments performed on scanned government documents indicate that the proposed approach performs better than other similar methods.
An advantage of our approach is that it is based on features extracted from the entire page rather than from words or characters, and it can also be applied to handwritten documents.
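The hierarchical scheme (a binary split followed by a multiclass classifier) amounts to a simple routing function. The sketch below uses stand-in callables rather than trained CNNs, and the particular grouping of languages into branches is a hypothetical placeholder, since the abstract does not state which languages share a branch.

```python
def hierarchical_classify(page, binary_clf, branch_clfs):
    """Two-stage language detection: a binary classifier routes the page
    to one of two groups, then that group's multiclass classifier
    produces the final language label."""
    group = binary_clf(page)            # stage 1: coarse split (0 or 1)
    return branch_clfs[group](page)     # stage 2: fine-grained label

# Toy stand-ins for the trained CNNs (hypothetical grouping):
binary = lambda page: 0 if page["script"] == "dravidian" else 1
branches = {
    0: lambda page: page["hint"],       # e.g. Tamil / Telugu / Kannada
    1: lambda page: page["hint"],       # e.g. Hindi / Marathi / Gujarati
}
```

The benefit of the split is that each stage-2 classifier only has to separate a few visually similar languages, rather than all six at once.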
  • Item (Open Access)
    Image Captioning Using Visual And Semantic Attention Mechanism
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2021) Patel, Abhikumar; Khare, Manish; Kumar, Ahlad
    Image captioning is the task of generating captions/descriptions for an image. It has many applications in various fields, such as image indexing for content-based image retrieval, self-driving cars, assistance for visually impaired persons, smart surveillance systems, and many more, and it connects two major research communities: computer vision and natural language processing. The main challenges in image captioning are to recognize the important objects, their attributes, and the visual relationships between objects within an image, and then to generate syntactically and semantically correct sentences. Currently, most image captioning architectures are based on the encoder-decoder model, in which the image is first encoded using a CNN to obtain an abstract representation and then decoded using an RNN to produce a caption. I selected a base paper that applies visual attention over the image to attend to the most appropriate region while generating each word of the caption. However, that work misses one important factor when generating captions: the visual relationships between the objects present in the image. I therefore added a relationship detector module to the model to account for these relationships. Combining this module with the existing Show, Attend and Tell model yields captions that consider the relationships between objects, which ultimately enhances caption quality. I performed experiments on various publicly available standard datasets: Flickr8k, Flickr30k, and MSCOCO.
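The visual-attention step in a Show, Attend and Tell style decoder is a softmax-weighted sum of region features, recomputed at every generated word. The NumPy sketch below illustrates that mechanism generically; the bilinear scoring function and the dimensions are assumptions, not the thesis's exact model.

```python
import numpy as np

def soft_attention(region_feats, decoder_state, W):
    """region_feats: (R, D) CNN features for R image regions;
    decoder_state: (H,) current RNN hidden state;
    W: (D, H) learned scoring matrix.
    Returns attention weights (R,) and the context vector (D,)."""
    scores = region_feats @ W @ decoder_state      # (R,) relevance per region
    scores = scores - scores.max()                 # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over regions
    context = alpha @ region_feats                 # weighted sum of features
    return alpha, context
```

The context vector is fed to the RNN at each step, so the decoder looks at a different part of the image for each word it emits.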
  • Item (Open Access)
    Blind inpainting and super-resolution using convolutional neural network
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2016) Sohoney, Surabhi; Joshi, Manjunath V.
    In this work, we propose a combined approach to two image processing problems: Image Inpainting and Image Super-Resolution (SR). A number of efficient techniques have been developed for solving these two problems with deep learning, separately. Researchers have developed hierarchical approaches that first in-paint and then super-resolve, but there has not been much advancement in solving them simultaneously. There are many applications where both inpainting and super-resolution are desired at once, such as digital reconstruction of invaluable artwork in heritage sites, immersive walk-through systems, etc. We present a supervised learning based approach for simultaneous blind inpainting and super-resolution using a Deep Convolutional Neural Network. The network learns the mapping between corrupted image patches and true image patches, as well as the mapping from low-resolution features to high-resolution features. The trained network accepts a corrupted low-resolution (LR) image as input and outputs a clean high-resolution (HR) image. Our network is capable of removing complex patterns from an image while providing higher resolution; however, our focus is limited to simultaneous scratch inpainting and super-resolution.
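The supervised setup pairs corrupted LR inputs with clean HR targets. The snippet below sketches how one such training pair might be synthesised from an HR patch; block-average downsampling and a zeroed scratch mask are assumptions for illustration, not the thesis's exact corruption model.

```python
import numpy as np

def make_training_pair(hr_patch, scale=2, scratch=None):
    """Build one (corrupted LR, clean HR) training pair.
    hr_patch: (H, W) grayscale patch with H and W divisible by scale;
    scratch: optional boolean mask (H//scale, W//scale) of pixels to corrupt."""
    h, w = hr_patch.shape
    # downsample by block averaging to get the LR version
    lr = hr_patch.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    corrupted = lr.copy()
    if scratch is not None:
        corrupted[scratch] = 0.0            # simulate scratch damage
    return corrupted, hr_patch
```

Training on such pairs forces a single network to learn both tasks jointly: filling in the scratched pixels and recovering the lost resolution.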