Theses and Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1


Now showing 1 - 10 of 33
  • Item (Open Access)
    Shadow Detection and Removal from video using Deep Learning
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Dodiya, Krutika; Khare, Manish; Gohel, Bakul
    The removal of shadows from images is crucial in computer vision, as it can enhance the interpretability and visual quality of images. This research work proposes a cascade U-Net architecture for shadow removal, consisting of two stages of U-Net architecture. In the first stage, a U-Net is trained using the shadow images and their corresponding ground truth to predict the shadow-free images. The second stage uses the predicted shadow-free images and the ground truth as input to another U-Net, which further refines the shadow removal results. This cascade U-Net architecture enables the model to learn and refine the shadow removal progressively, leveraging both the initial predictions and the ground truth. Experimental evaluations on benchmark datasets demonstrate that our approach achieves notably good performance in both qualitative and quantitative evaluations. Using objective metrics such as the Structural Similarity Index (SSIM) and Root Mean Square Error (RMSE), as well as subjective evaluations in which human observers rate the quality of the shadow removal results, our approach was found to outperform other state-of-the-art methods. Overall, our proposed cascade U-Net architecture offers a promising solution for shadow removal that can improve image quality and interpretability.
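    A minimal sketch of the quantitative evaluation mentioned above, assuming RGB image files and scikit-image; the file names and helper function are illustrative, not part of the thesis.

```python
import numpy as np
from skimage import io, img_as_float
from skimage.metrics import structural_similarity as ssim

def evaluate_shadow_removal(pred_path, gt_path):
    """Compare a predicted shadow-free image against its ground truth."""
    pred = img_as_float(io.imread(pred_path))  # assumed RGB, scaled to [0, 1]
    gt = img_as_float(io.imread(gt_path))

    # RMSE over all pixels and channels (lower is better).
    rmse = np.sqrt(np.mean((pred - gt) ** 2))

    # SSIM averaged over channels (higher is better).
    score = ssim(gt, pred, channel_axis=-1, data_range=1.0)
    return rmse, score

# Hypothetical file names, for illustration only.
# rmse, score = evaluate_shadow_removal("pred.png", "gt.png")
# print(f"RMSE: {rmse:.4f}, SSIM: {score:.4f}")
```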
  • Item (Open Access)
    Image Processing Using Digital Programming on FPGA
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Kachchhi, Hardi; Agrawal, Yash; Khare, Manish
    Image processing is a way to transform an image into digital form and then perform operations on it that help to improve images for human interpretation and extract useful information from them. It is essential for a wide range of applications: it allows enhancing and restoring images, extracting features for object recognition, compressing images for efficient storage and transmission, analyzing images for computer vision tasks, enabling medical diagnostics and treatment, and interpreting data from remote sensing. Field Programmable Gate Arrays (FPGAs) are preferred for image processing due to their parallel processing capabilities, reconfigurability, low latency, energy efficiency, pipelining support, customization options, real-time processing capabilities, and ease of integration. These advantages make FPGAs a powerful tool for implementing high-performance and efficient image processing solutions across various applications. To implement various filters in image processing, we have developed a method that performs various edge detection techniques on an FPGA and displays the image on a monitor through a Video Graphics Array (VGA) controller. Edge detection filters and blurring filters are an indispensable part of image processing in various fields due to their ability to extract information, enhance visual quality, and enable decision-making based on visual data.
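    The edge detection filters named above are implemented in FPGA hardware in the thesis; as a hedged software reference for the kind of filter involved, the following Python sketch applies the standard Sobel masks with SciPy. The function and variable names are assumptions for illustration, not the thesis design.

```python
import numpy as np
from scipy.ndimage import convolve

# Standard 3x3 Sobel masks for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edges(gray):
    """Return the gradient magnitude of a grayscale image (2-D float array)."""
    gx = convolve(gray, SOBEL_X)
    gy = convolve(gray, SOBEL_Y)
    return np.hypot(gx, gy)

# Usage (hypothetical): edges = sobel_edges(my_grayscale_image)
```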
  • Item (Open Access)
    Copy-Move Tampering: Some New Approaches of the Detection and Localization in a Digital Image
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Diwan, Anjali; Roy, Anil K.
    Images speak. They tell us stories. Digital images carry a plethora of information. The availability of cost-effective digital camera-enabled devices has made capturing images a child’s play. Statistics show that the use of social networking sites has influenced people’s appetite for digital images. Billions and billions of photos are uploaded, shared and forwarded on these platforms. This makes every user an active source of digital information. The availability of easy-to-use image editing software makes novices as well as experts capable of creating realistic alterations in these digital images. These alterations could be harmless changes for fun or serious image tampering with malicious intentions. This fact raises eyebrows and questions the authenticity of a digital image. When these digital images are used for specific purposes like news broadcasts, research publications, sports, entertainment, fashion, advertisements, legal proceedings etc., this problem becomes more critical and challenging. Therefore, digital image tampering has long attracted the research community of image processing. Among various tampering operations, copy-move tampering is one of the easiest and therefore the most common approach. In the copy-move tampering process, copying and pasting are done on the same image. Hence colour, noise component, intensity range and other properties of the image remain almost unchanged. This makes tampering detection difficult when no clue about the tampering is available other than the image itself. Further, to camouflage tampering, some tricks to hide its footprint are used, such as blurring of the edges of the copy-pasted part of the image. Technically this can be achieved by some image processing methods, e.g., JPEG compression, addition of Gaussian noise, brightness change, colour reduction, contrast adjustment etc. Occasionally some geometrical transformations, such as scaling and rotation of the copied regions before pasting them somewhere else in the same image, are also noticed. All these make tampering detection a challenging task. Our study focuses on copy-move tampering detection in a digital image, either simple, i.e., without any post-processing trick, or affected by different geometrical transformations and image processing methods. We first look into tampering detection, i.e., identification and localization, using a block-based technique. The first two approaches of this thesis are block-based: one uses LPP (Locality Preserving Projection) and the second is based on NPE (Neighborhood Preserving Embedding). Both are dimensionality reduction techniques that preserve neighborhood information. We find that the LPP-based approach worked well for simple copy-move tampering but performed poorly in the case of multiple copy-move tampering and for images with self-similar structures such as some historical monuments. The NPE-based approach showed considerable improvement on simple copy-move images with post-processing and on multiple copy-move tampering detection; however, it could not nail the tampering detection in the case of self-similar images. Also, the block-based technique happens to be computationally expensive since it does a pixel-by-pixel comparison in search of a detailed clue of the tampered regions. When the copy-move region is affected by a geometrical transformation, one needs a more robust clue for tampering detection. This clue must be rotation- and scale-invariant.
This made us concentrate on the keypoint-based approach for the simple reason that image keypoints are invariant to geometrical transformations. We propose to use a combination of the CenSurE keypoint detector and the FREAK descriptor, which detects tampering even when the image undergoes a change of scale or rotation, or both, following a copy-move attempt. We find that this approach also works well for simple and multiple copy-move tampering detection, as in the case of our two block-based approaches. The problem occurs when an image has only a few keypoints. This is observed in the case of smooth images of natural landscapes, such as images of the sky, the sea or a uniform field. To address such situations, we propose our fourth and last approach, which is based on a CNN (Convolutional Neural Network) and image keypoints. We have combined image information generated by a CNN with CenSurE keypoints to detect and localize copy-move tampered regions. This approach enables tampering detection when the copy-move region is affected by different post-processing operations and geometrical transformations, even for images of varying texture, whether smooth, coarse, or highly textured. All four approaches are discussed in this thesis in detail. We used several standard datasets available in the public domain for performing exhaustive experiments: the CMFD, GRIP, CoMoFoD, MICC-F600, MICC-F220, Coverage and CASIA-II datasets. Comparison of our results with some of the recently reported results of other research groups helps us conclude that our approaches perform better in most of the cases and remain comparable in the rest. We also discuss the future scope of our work.
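    A hedged sketch of the keypoint stage described above: it detects CenSurE (STAR) keypoints and computes FREAK descriptors using OpenCV's contrib modules, then matches descriptors within the same image to surface candidate duplicated regions. It assumes the opencv-contrib-python build; the thresholds and self-match filtering are illustrative choices, not the thesis parameters.

```python
import cv2
import numpy as np

def candidate_copy_move_matches(gray, max_hamming=40, min_offset=10):
    """Return pairs of similar-but-distant keypoints (a possible copy-move clue)."""
    star = cv2.xfeatures2d.StarDetector_create()   # CenSurE keypoint detector
    freak = cv2.xfeatures2d.FREAK_create()         # binary FREAK descriptor
    keypoints = star.detect(gray, None)
    keypoints, descriptors = freak.compute(gray, keypoints)
    if descriptors is None or len(keypoints) < 2:
        return []

    # Match the image's descriptors against themselves (Hamming distance for binary codes).
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(descriptors, descriptors, k=2)

    pairs = []
    for candidates in matches:
        if len(candidates) < 2:
            continue
        best, second = candidates
        # The best match is usually the keypoint itself; fall back to the runner-up.
        m = second if best.queryIdx == best.trainIdx else best
        p1 = np.array(keypoints[m.queryIdx].pt)
        p2 = np.array(keypoints[m.trainIdx].pt)
        # Keep similar descriptors whose keypoints are spatially far apart.
        if m.distance < max_hamming and np.linalg.norm(p1 - p2) > min_offset:
            pairs.append((tuple(p1), tuple(p2)))
    return pairs

# Usage (hypothetical):
# gray = cv2.imread("suspect.png", cv2.IMREAD_GRAYSCALE)
# print(len(candidate_copy_move_matches(gray)), "candidate matched pairs")
```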
  • Item (Open Access)
    On designing DNA codes and their applications
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2019) Limbachiya, Dixita; Gupta, Manish K.
    Bio-computing uses complexes of biomolecules such as DNA (deoxyribonucleic acid), RNA (ribonucleic acid) and proteins to perform computational processes for encoding and processing data. In 1994, L. Adleman introduced the field of DNA computing by solving an instance of the Hamiltonian path problem using a collection of DNA sequences and biotechnology lab methods. The idea of DNA hybridization was used to perform this experiment. DNA hybridization is the backbone for any computation using DNA sequences; however, it is also a cause of errors. To use DNA for computing, a specific set of DNA sequences (DNA codes) which satisfies particular properties (DNA code constraints) that avoid cross-hybridization is designed to perform a particular task. The contributions of this dissertation can be broadly divided into two parts: 1) designing DNA codes using algebraic coding theory, and 2) codes for DNA data storage systems to encode data in DNA. The main research objective in designing DNA codes over the quaternary alphabet {A, C, G, T} is to find the largest possible set of M codewords, each of length n, such that they are at least at distance d from one another and satisfy the desired constraints that are feasible with respect to practical implementation. In the literature, various computational and theoretical approaches have been used to design sets of DNA codes which are sufficiently dissimilar. Furthermore, DNA codes have been constructed using coding-theoretic approaches over fields and rings. In this dissertation, one such approach is used to generate DNA codes from the ring R = Z4 + wZ4, where w^2 = 2 + 2w. Some of the algebraic properties of the ring R are explored. In order to define an isometry from the elements of the ring R to DNA, a new distance called the Gau distance is defined. The Gau distance motivates the distance-preserving map called the Gau map f. Linear and closure properties of the Gau map are obtained. General conditions on the generator matrix over the ring R to satisfy the reverse and reverse-complement constraints on the DNA code are derived. Using this map, several new classes of DNA codes which satisfy the Hamming distance, reverse and reverse-complement constraints are given. Families of DNA codes via Simplex-type codes, first-order and rth-order Reed-Muller-type codes and Octa-type codes are developed. Some general results on the generator matrix to satisfy the reverse and reverse-complement constraints are given. Some of the constructed DNA codes are optimal with respect to the bounds on M, the size of the code. These DNA codes can be used for a myriad of applications, one of which is data storage. DNA is stable, robust and reliable. Theoretically, it is estimated that one gram of DNA can store 455 EB (1 exabyte = 10^18 bytes). These properties make DNA a potential candidate for data storage. However, there are various practical constraints for a DNA data storage system. In this work, we construct DNA codes with some of the DNA constraints to design efficient codes to store data in DNA. One of the practical constraints in designing DNA codes for storage is the repetition of the same DNA nucleotide (runlengths). Hence, it is essential that each DNA codeword avoids long runlengths. In this thesis, codes are proposed for data storage that disallow runlengths of any base, in order to develop error-free codes for DNA data storage.
A fixed GC-weight u (the number of occurrences of G and C nucleotides in a DNA codeword) is another requirement for DNA codewords used in DNA storage. DNA codewords with large GC-weight lead to insertion and deletion (indel) errors in the DNA reading and amplification processes; thus, it is crucial to consider a fixed GC-weight for a DNA code. In this work, we propose methods that generate families of codes for DNA data storage systems that satisfy the no-runlength and fixed GC-weight constraints for the DNA codewords used for data storage. The first uses constrained codes over the quaternary alphabet and the second uses DNA Golay subcodes with a ternary encoding. The constrained quaternary coding is presented to generate DNA codes for data storage. We give a construction algorithm for finding families of DNA codes with the no-runlength and fixed GC-weight constraints. The number of DNA codewords of fixed GC-weight with the no-runlength constraint is enumerated. We note that prior work only gave bounds on the number of such codewords, while in this work we count the number of these DNA codewords exactly. We observe that the bound mentioned in the previous work does not take into account the distance of the code, which is essential for data reliability. Thus, we consider distance to obtain a lower bound on the number of codewords along with the fixed GC-weight and no-runlength constraints. In the second method, we demonstrate the Golay subcode method to encode data in a variable-chunk architecture of the DNA using ternary encoding. N. Goldman et al. introduced the first proof of concept of DNA data storage in 2013 by encoding data in DNA without using error correction, which motivated us to implement this method. While implementing it, a bottleneck was identified that limited the amount of data that can be encoded, due to the fixed-length chunk architecture used for data encoding. In this work, we propose a modified scheme using a non-linear family of ternary codes based on the Golay subcode that includes a flexible-length chunk architecture for data encoding in DNA. By using the ternary Golay subcode, two substitution errors can be corrected. In a nutshell, the significant contributions of this thesis are designing DNA codes with specific constraints. First, DNA codes from the ring using algebraic coding, by defining a new type of distance (the Gau distance) and map (the Gau map), are proposed. These DNA codes satisfy the reverse, reverse-complement and complement constraints with minimum Hamming distance. Several families of these DNA codes and their properties are studied. Second, DNA codes using constrained coding and the Golay subcode method are developed that satisfy the no-runlength and GC-weight constraints for a DNA data storage system.
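    To make the two storage constraints concrete, here is a hedged Python sketch that checks the no-runlength and fixed GC-weight properties of a DNA codeword and enumerates, by brute force, the codewords of a given length that satisfy both; the function names and the enumeration are illustrative only, not the counting or construction methods of the thesis.

```python
from itertools import product

BASES = "ACGT"

def has_no_runlength(codeword):
    """True if no base is immediately repeated (no run of length >= 2)."""
    return all(a != b for a, b in zip(codeword, codeword[1:]))

def gc_weight(codeword):
    """Number of G and C bases in the codeword."""
    return sum(base in "GC" for base in codeword)

def constrained_codewords(n, u):
    """All length-n codewords with the no-runlength constraint and GC-weight exactly u."""
    return [
        "".join(word)
        for word in product(BASES, repeat=n)
        if has_no_runlength(word) and gc_weight(word) == u
    ]

# Example: length-4 codewords with GC-weight 2 and no repeated adjacent bases.
# print(len(constrained_codewords(4, 2)))
```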
  • Item (Open Access)
    Image segmentation fusion by edge detection techniques
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Koshti, Nishant; Banerjee, Asim
    Image segmentation is one of the basic building blocks in image processing. It is a pre-processing task to make an image processable for further operations such as noise removal, decomposition, morphological operations etc. It is the first step in object identification in an image. It may also be used in compression to compress different areas, or segments, of an image at different compression qualities. It differentiates objects in an image from the background of the image. There are different types of segmentation techniques such as color, region growing, split and merge, grayscale, edge detection etc. The technique that should be applied mostly depends on the kind of image given. Segmentation mainly derives the homogeneity of an image. That is, it partitions an image into distinct regions that are meant to correlate strongly with objects or features of interest in the image. Segmentation can also be regarded as a process of grouping together pixels that have similar attributes. The level to which the subdivision is carried depends on the problem being solved. That is, segmentation should stop when the objects of interest in an application have been isolated. There is no point in carrying segmentation past the level of detail required to identify those elements.
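    As a hedged illustration of edge-driven segmentation of the kind discussed above, the sketch below detects Canny edges, closes them morphologically, and labels the enclosed regions using scikit-image; the parameter values are arbitrary examples, not settings from the thesis.

```python
from skimage import io, color, feature, morphology, measure
from scipy import ndimage as ndi

def segment_by_edges(image_path, sigma=2.0):
    """Label regions of an image whose boundaries are found by edge detection."""
    gray = color.rgb2gray(io.imread(image_path))
    edges = feature.canny(gray, sigma=sigma)                 # binary edge map
    closed = morphology.binary_closing(edges, morphology.disk(3))
    filled = ndi.binary_fill_holes(closed)                   # closed contours become regions
    labels = measure.label(filled)                           # one integer label per region
    return labels

# Usage (hypothetical): labels = segment_by_edges("scene.png")
```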
  • Item (Open Access)
    Learning cross domain relations using deep learning
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Kotecha, Dhara; Joshi, Manjunath V.
    Generative Adversarial Networks (GANs) have achieved exemplary performance in generating realistic images. They also perform image-to-image translation and produce good results for the same. In this thesis, we explore the use of GANs for performing cross-domain image mapping for facial expression transfer. In facial expression transfer, the expression of a source image is transferred onto a target image. We use a DiscoGAN (Discovery GAN) model for the task. Using a DiscoGAN, an image of the target is generated with the facial features of the source. It uses a feature matching loss along with the GAN objective and a reconstruction loss. We propose a method to train the DiscoGAN with paired data of source and target images. In order to learn the cross-domain image mapping, we train the DiscoGAN with a batch size of 1. In our next work in this thesis, we propose an algorithm to binarize degraded document images. We incorporate U-Net for the task at hand. We model document image binarization as a classification problem wherein we generate an image which is the result of classifying each pixel as text or background. Optimizing the cross-entropy loss function, we translate the input degraded image to the corresponding binarized image. Our approach of using U-Net ensures low-level feature transfer from the input degraded image to the output binarized image, and thus it is better than using a simple convolutional neural network. Our method of training leads to the desired results faster when both the degraded document and the ground truth binarized images are available for training, and it also generalizes well. The results obtained are significantly better than the state-of-the-art techniques, and the approach is simpler than other deep learning approaches for document image binarization.
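    A minimal sketch of the per-pixel classification framing described above, assuming PyTorch: a toy encoder-decoder stands in for the U-Net (no skip connections), and one training step minimizes a pixel-wise binary cross-entropy between the predicted and ground-truth binarizations on random stand-in tensors.

```python
import torch
import torch.nn as nn

class TinyBinarizer(nn.Module):
    """Toy encoder-decoder that scores each pixel as text (1) or background (0)."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decode = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),                    # one logit per pixel
        )

    def forward(self, x):
        return self.decode(self.encode(x))

# One illustrative training step on random tensors standing in for
# a degraded document patch and its ground-truth binarization.
model = TinyBinarizer()
criterion = nn.BCEWithLogitsLoss()                  # pixel-wise cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

degraded = torch.rand(1, 1, 64, 64)                       # grayscale input patch
target = (torch.rand(1, 1, 64, 64) > 0.5).float()         # binary ground truth

loss = criterion(model(degraded), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```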
  • Item (Open Access)
    Detection and localization of tampering in a digital medical image using discrete wavelet transform
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2015) Gadhiya, Tushar; Roy, Anil K.; Mitra, Suman K.
    The use of digital images has increased tremendously in medical science as a diagnostic tool. It has made investigation easier and quicker. But at the same time it raises the question of the authenticity of the image under scrutiny. Authenticity of a digital image has been very important in areas like scientific research, legal proceedings, lifestyle publications, brand marketing, forensic investigations, government documents etc. With the help of powerful and easy-to-use image editing software like Microsoft Paint and Photoshop, it has become extremely easy to tamper with a digital image for a malicious objective. The digital form of the image draws the attention of many researchers towards automatic diagnosis systems for image analysis and enhancement. These kinds of systems use harmless image manipulation operations like brightness enhancement, gamma correction, contrast enhancement etc., which improve the quality of the image and help in better diagnosis, so they should not be considered as tampering. Likely and reported tampering with malicious intention may be found in medical claims, health insurance or even legal battles in which a medical problem may influence the judicial decision. Since the use of digital images in the medical profession is still in a nascent stage, we address the likely-to-be-wrong use of such input in this thesis. We propose an algorithm to enable anybody to detect whether or not tampering has been done with such malicious intention. And if it has, an almost precise localization of such tampering can also be done successfully in a suspect digital medical image. The basis of our proposed algorithm is the hash-based representation of a digital image. We use the discrete wavelet transform as a tool. It allows us to identify the direction of tampering. The direction of tampering helps us converge on the tampered object in the localization area. We show that our algorithm is robust against harmless manipulations and sensitive enough to even a minute tampering. In the case of multiple tampering, the proposed method is able to identify the location as well as the direction of each tampering, while some of the existing methods fail in this area. Our proposed technique is fast and generates a smaller hash, as it works with a smaller hash function in comparison with similar available techniques.
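    As a hedged illustration of building a hash-based representation from discrete wavelet transform coefficients (the general tool used in the thesis, not its actual algorithm), this sketch derives a compact binary hash from the wavelet approximation band using PyWavelets; the resize step, wavelet choice and hash length are assumptions for illustration.

```python
import numpy as np
import pywt
from skimage.transform import resize

def dwt_hash(gray, size=32):
    """Binary hash from the approximation band of a 2-level Haar DWT."""
    small = resize(gray, (size, size), anti_aliasing=True)
    approx = small
    for _ in range(2):                          # two DWT levels keep coarse structure
        approx, _details = pywt.dwt2(approx, "haar")
    return (approx > np.median(approx)).astype(np.uint8).ravel()

def hamming_distance(h1, h2):
    """Number of differing bits between two hashes of equal length."""
    return int(np.count_nonzero(h1 != h2))

# Usage (hypothetical): compare the hash of a suspect image against a reference;
# a large Hamming distance suggests the suspect image has been altered.
```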
  • Item (Open Access)
    Object-background segmentation from video
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2015) Domadiya, Prashant; Mitra, Suman K.
    Fast and accurate algorithms for background-foreground separation are an essential part of any video surveillance system. GMM (Gaussian Mixture Model) based object segmentation methods give accurate results for background-foreground separation problems but are computationally expensive. In contrast, modeling with only a single Gaussian improves the time complexity, but with a reduction in accuracy due to variations in illumination and the dynamic nature of the background. It is observed that these variations affect only a few pixels in an image; most of the background pixels are unimodal. We propose a method to account for the dynamic nature of the background and low lighting conditions. It is an adaptive approach where each pixel is modeled as either a unimodal Gaussian or multimodal Gaussians. The flexibility in the number of Gaussians used to model each pixel, along with a learn-only-when-required approach, reduces the time complexity of the algorithm significantly. To resolve problems related to false negatives due to the homogeneity of color and texture in the foreground and background, a spatial smoothing is carried out by K-means, which improves the overall accuracy of the proposed algorithm. Shadows cause problems in many applications which rely on segmentation results. Since shadows cause variation in the RGB values of pixels, an RGB-value-dependent GMM-based method cannot remove shadows from the detection results. A preprocessing stage involving an illumination-invariant representation takes care of the object shadow as well.
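    For context, the snippet below runs the standard GMM-based background subtractor shipped with OpenCV (MOG2) over a video; it is a hedged baseline illustration of per-pixel mixture modeling, not the adaptive unimodal/multimodal scheme proposed in this thesis, and the video file name is hypothetical.

```python
import cv2
import numpy as np

# MOG2 models each pixel with a mixture of Gaussians and can also flag shadows.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

capture = cv2.VideoCapture("surveillance.avi")   # hypothetical input video
frame_index = 0
while True:
    ok, frame = capture.read()
    if not ok:
        break
    mask = subtractor.apply(frame)   # 255 = foreground, 127 = shadow, 0 = background
    foreground_pixels = int(np.count_nonzero(mask == 255))
    print(f"frame {frame_index}: {foreground_pixels} foreground pixels")
    frame_index += 1
capture.release()
```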

  • Item (Open Access)
    Estimating depth from monocular video under varying illumination
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2014) Sarupuri, Bhuvaneshwari; Tatu, Aditya
    The ability to perceive depth and reconstruct the 3D surface of a scene is a basic function in many areas of computer vision. Since a 2D image is the projection of a 3D scene onto two dimensions, the information about depth is lost. Many methods have been introduced to estimate depth using a single image, two images or multiple images, but most of the previous work on depth estimation has been carried out in the field of stereo vision. These stereo techniques need two images and a whole setup to acquire them, and there are many setbacks in correspondence and hardware implementation. Many cues can be used to model the relation between depth and image features in order to learn depth from a single image using multi-scale Markov Random Fields [1]. Here we use Gabor filters to extract the texture variation cue and improve the depth estimate using shape features. The same approach is used for estimating depth from videos by incorporating temporal coherence. In order to do this, optical flow is used, and we introduce a novel method of computing optical flow using texture features. Since texture features extract dominant properties of an image which are almost invariant to illumination, the texture-based optical flow is robust to large uniform illumination changes, which has many applications in outdoor navigation and surveillance.
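    A hedged sketch of extracting a texture-variation cue with Gabor filters (the cue named above, though not the thesis's exact filter bank or parameters): the snippet filters a grayscale image at several orientations and frequencies with scikit-image and stacks the magnitude responses as per-pixel texture features.

```python
import numpy as np
from skimage.filters import gabor

def gabor_texture_features(gray, frequencies=(0.1, 0.2, 0.4), n_orientations=4):
    """Stack Gabor magnitude responses into an (H, W, n_filters) feature array."""
    responses = []
    for frequency in frequencies:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            real, imag = gabor(gray, frequency=frequency, theta=theta)
            responses.append(np.hypot(real, imag))    # magnitude response
    return np.stack(responses, axis=-1)

# Usage (hypothetical): features = gabor_texture_features(gray_frame)
```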
  • Item (Open Access)
    Automatic target image detection for morphing
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2013) Vyas, Jaladhi; Joshi, Manjunath V.
    In this thesis, we propose a novel approach for automatic target image detection for morphing based on 3D textons and contrast. Given a source image consisting of a human frontal face and training images containing human and animal faces, our algorithm finds the target image automatically from the target database. There are two major advantages of our approach. It solves the problem of manual selection of the target image, as done by researchers in the morphing community, and by detecting it automatically, one may achieve a smooth transition from source to destination. Our algorithm aims at finding the best target animal face image, considering a human face as the source. A histogram model based on 3D textons and contrast is built, and the chi-square distance between the histogram models of the source and target images is used to find the best target. After detecting the target image, the control points for the source and target images are automatically detected using facial geometry, an eye map operator and K-means clustering. The superiority of our algorithm over other methods is that it needs only the source image and the training database, and the entire morphing process is done automatically. The experiments were conducted using four classes of images, namely human, cheetah, lion and monkey, in which the human class is used as the source. Our target detection results are verified using the Structural Similarity Index (SSIM) measure between the source and the intermediate morphed image. Experiments on a fairly large dataset have been carried out to show the usefulness and capability of our method.
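    As a hedged illustration of the histogram comparison step named above, the snippet below computes a chi-square distance between two histograms in Python; the symmetric form with a small epsilon is an illustrative choice, not necessarily the exact formulation used in the thesis.

```python
import numpy as np

def chi_square_distance(hist_p, hist_q, eps=1e-10):
    """Symmetric chi-square distance between two histograms of equal length."""
    p = np.asarray(hist_p, dtype=float)
    q = np.asarray(hist_q, dtype=float)
    p /= p.sum() + eps                        # normalize to unit mass
    q /= q.sum() + eps
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

# Usage (hypothetical): compare texton/contrast histograms of a source face and a
# candidate target face; the smallest distance picks the best target.
# d = chi_square_distance(source_hist, candidate_hist)
```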