Theses and Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1
Search Results
Item Open Access Semantic Segmentation Based Object Detection for Autonomous Driving (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Prajapati, Harsh; Maiti, Tapas Kumar
This research focuses on the autonomous driving problem, which must be solved to meet the increasing demand for autonomous systems in today's world. The key aspect in addressing this challenge is the real-time identification and recognition of objects within the driving environment. To accomplish this, we employ the semantic segmentation technique, integrating computer vision, machine learning, deep learning, the PyTorch framework, image processing, and the Robot Operating System (ROS). Our approach involves creating an experimental setup using an edge device, specifically a Raspberry Pi, in conjunction with the ROS framework. By deploying a deep learning model on the edge device, we aim to build a robust and efficient autonomous system that can accurately identify and recognize objects in real time.

Item Open Access Position Estimation of Intelligent Artificial Systems Using 3D Point Cloud (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Patel, Vraj; Maiti, Tapas Kumar
Point cloud data represents the three-dimensional reality captured by sensors such as LiDAR scanners, depth cameras and stereo cameras. The capacity of point clouds to provide rich geometric information about the surroundings makes them essential in many applications: robotics, autonomous cars, augmented reality, virtual reality, and 3D reconstruction all use point clouds. They allow for object detection, localization, mapping, scene comprehension, and immersive visualization. Working with point clouds, on the other hand, presents substantial complications. Some primary issues are managing a vast volume of data, dealing with noise and outliers, handling occlusions and missing data, and conducting efficient processing and analysis. Furthermore, point clouds frequently necessitate complicated registration, segmentation, feature extraction, and interpretation methods, which are computationally costly. Addressing these issues is critical for realizing the full potential of point cloud data in a variety of real-world applications.
SLAM is a key technique in robotics and computer vision that addresses the challenge of estimating a robot's pose and constructing a map of its environment. It finds applications in driverless cars, drones, and augmented reality, enabling autonomous navigation without external infrastructure or GPS. Challenges include sensor noise, drift, and uncertainty, requiring robust sensor calibration, motion modeling, and data association. Real-time speed, computing constraints, and memory limitations are important considerations. Advanced techniques such as feature extraction, point cloud registration, loop closure detection, and Graph-SLAM optimization algorithms are used.
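Point cloud registration, one of the building blocks listed above, is compact enough to illustrate directly. Below is a minimal sketch of a generic rigid-alignment step (the SVD-based Kabsch solution, assuming point correspondences are already known); it is background material only, not the thesis's SLAM pipeline, which additionally handles correspondence search, loop-closure detection, and graph optimization.

    import numpy as np

    def rigid_align(source, target):
        """Best-fit rotation R and translation t mapping source onto target.

        source, target: (N, 3) arrays of corresponding 3-D points.
        Minimizes sum ||R @ source_i + t - target_i||^2 via SVD (Kabsch).
        """
        src_c = source.mean(axis=0)
        tgt_c = target.mean(axis=0)
        H = (source - src_c).T @ (target - tgt_c)      # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = tgt_c - R @ src_c
        return R, t

    # Toy usage: recover a known rotation and translation from noisy points.
    rng = np.random.default_rng(0)
    src = rng.random((100, 3))
    theta = np.pi / 6
    R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                       [np.sin(theta),  np.cos(theta), 0.0],
                       [0.0, 0.0, 1.0]])
    tgt = src @ R_true.T + np.array([0.5, -0.2, 1.0]) + 0.001 * rng.normal(size=src.shape)
    R, t = rigid_align(src, tgt)
    print(np.allclose(R, R_true, atol=1e-2))  # True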
Sensor fusion, map representation, and data association techniques are vital for reliable SLAM performance. Lightweight LiDAR SLAM has significant implications for various fields, including robotics, autonomous navigation, and augmented reality: developing compact and efficient LiDAR SLAM systems makes it possible to unlock the potential of lightweight platforms, enabling their deployment in a wide range of applications that require real-time position mapping and localization capabilities while ensuring practicality, portability, and cost-effectiveness. The aim is therefore to create a compact, lightweight LiDAR-based SLAM system that can be easily integrated into various platforms without compromising the accuracy and reliability of the SLAM algorithms. Hence, we implemented a lightweight SLAM algorithm on our dataset with various background situations, with a few modifications to the existing SLAM algorithm to improve the results. We performed SLAM using a LiDAR sensor alone, without the use of an IMU or GPS sensor.

Item Open Access Common Object Segmentation in Dynamic Image Collection using Attention Mechanism (Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Baid, Sana; Hati, Avik
Semantic segmentation of image groups is a crucial task in computer vision that aims to identify shared objects in multiple images. This work presents a deep neural network framework that exploits the congruity between images, thereby co-segmenting common objects. The proposed network is an encoder-decoder network in which the encoder extracts high-level semantic feature descriptors and the decoder generates segmentation masks. The task of co-segmentation between the images is boosted by an attention mechanism that leverages semantic similarity between feature descriptors; this attention mechanism is responsible for understanding the correspondence between the features and thereby determining the shared objects. The resultant masks localize the shared foreground objects while suppressing everything else as background. We have explored multiple attention mechanisms in a two-image input setup and have extended the best-performing model to a dynamic image input setup. The term "dynamic image" connotes that a varying number of images can be input to the model simultaneously, and the result is the segmentation of the common object from all of the input images. The model is trained end to end on an image group dataset generated from the PASCAL VOC 2012 [7] dataset. Experiments are conducted on other benchmark datasets as well, and the results show the superiority of our model. Moreover, an important advantage of the proposed model is that it runs in linear time, as opposed to the quadratic time complexity observed in most other works.

Item Open Access Comparative Study: Neural Networks on MCUs at the Edge (2021) Anand, Harshita; Bhatt, Amit
Computer vision has evolved enormously over the years: the sizes of processors and cameras have shrunk while their computational power has risen, and they have become affordable, making it feasible to integrate vision into embedded systems. It has several critical applications that require high accuracy and fast real-time response in order to deliver a good user experience. Neural networks (NNs) are an attractive choice for embedded vision architectures due to their superior performance and better accuracy compared with traditional processing algorithms. Because the security and latency issues of larger systems make them unattractive for certain time-dependent applications, we require an always-on system; such an application has a highly constrained power budget and typically needs to run on tiny microcontroller systems with limited memory and compute capability. The NN design must take these constraints into account.
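To make the memory constraint concrete, the sketch below builds a small PyTorch CNN and compares a rough estimate of its weight and activation footprint against hypothetical flash and RAM budgets. The model, the int8 quantization assumption, and the budget figures are illustrative only and are not taken from the thesis.

    import torch
    import torch.nn as nn

    # Hypothetical MCU budgets (illustrative, not from the thesis).
    FLASH_BUDGET_KB = 1024   # weights typically live in flash
    RAM_BUDGET_KB = 256      # activations live in RAM

    # A small example CNN for 96x96 grayscale input with two output classes.
    model = nn.Sequential(
        nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
    )

    # Weight footprint assuming post-training int8 quantization (1 byte/param).
    n_params = sum(p.numel() for p in model.parameters())

    # Rough activation footprint: track the largest intermediate tensor
    # (a crude proxy; a real deployment tool also accounts for buffer reuse).
    x = torch.zeros(1, 1, 96, 96)
    peak = x.numel()
    for layer in model:
        x = layer(x)
        peak = max(peak, x.numel())

    print(f"params: {n_params}, weights ~{n_params / 1024:.1f} KB "
          f"(flash budget {FLASH_BUDGET_KB} KB)")
    print(f"largest activation ~{peak / 1024:.1f} KB (RAM budget {RAM_BUDGET_KB} KB)")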
We have performed NN model explorations and evaluated embedded vision applications, including person detection, object detection, image classification, and facial recognition, on resource-constrained microcontrollers. We trained a variety of neural network architectures from the literature, comparing their accuracies and memory/compute requirements. We show that NN architectures can be optimized to fit within the computational and memory limits of microcontroller systems without sacrificing accuracy. We also examine the depthwise separable convolutional neural network (DS-CNN) and the convolutional neural network (CNN), both of which are utilized in the MobileNet architecture. This thesis presents a comparative analysis of the performance of edge devices in the field of embedded computer vision. The three parameters under major focus in this study are latency, accuracy, and millions of operations.

Item Open Access On designing DNA codes and their applications (Dhirubhai Ambani Institute of Information and Communication Technology, 2019) Limbachiya, Dixita; Gupta, Manish K.
Bio-computing uses complexes of biomolecules such as DNA (deoxyribonucleic acid), RNA (ribonucleic acid) and proteins to perform computational processes for encoding and processing data. In 1994, L. Adleman introduced the field of DNA computing by solving an instance of the Hamiltonian path problem using a collection of DNA sequences and biotechnology lab methods. The idea of DNA hybridization was used to perform this experiment. DNA hybridization is the backbone of any computation using DNA sequences; however, it is also a source of errors. To use DNA for computing, a specific set of DNA sequences (a DNA code) satisfying particular properties (DNA code constraints) that avoid cross-hybridization is designed to perform a particular task. The contributions of this dissertation can be broadly divided into two parts: 1) designing DNA codes by using algebraic coding theory, and 2) codes for DNA data storage systems to encode data in DNA. The main research objective in designing DNA codes over the quaternary alphabet {A, C, G, T} is to find the largest possible set of M codewords, each of length n, such that they are at least at distance d from one another and satisfy the desired constraints that are feasible with respect to practical implementation. In the literature, various computational and theoretical approaches have been used to design sets of DNA codes that are sufficiently dissimilar. Furthermore, DNA codes have been constructed using coding-theoretic approaches over fields and rings. In this dissertation, one such approach is used to generate DNA codes from the ring R = Z4 + wZ4, where w^2 = 2 + 2w. Some algebraic properties of the ring R are explored. In order to define an isometry from the elements of the ring R to DNA, a new distance called the Gau distance is defined. The Gau distance motivates a distance-preserving map called the Gau map f, whose linearity and closure properties are obtained. General conditions on the generator matrix over the ring R to satisfy the reverse and reverse-complement constraints on the DNA code are derived. Using this map, several new classes of DNA codes satisfying the Hamming distance, reverse and reverse-complement constraints are given.
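The reverse, reverse-complement, and minimum Hamming distance constraints mentioned above have standard definitions in the DNA-coding literature and can be checked directly. The sketch below verifies them for a small candidate code; it does not reproduce the thesis's Gau map or the ring-based construction.

    from itertools import combinations

    # Watson-Crick complements over the DNA alphabet {A, C, G, T}.
    COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def hamming(x, y):
        """Number of positions at which two equal-length codewords differ."""
        return sum(a != b for a, b in zip(x, y))

    def reverse(x):
        return x[::-1]

    def reverse_complement(x):
        return "".join(COMP[b] for b in reversed(x))

    def satisfies_constraints(code, d):
        """Minimum-distance, reverse and reverse-complement constraints.

        Every distinct pair must be at Hamming distance >= d, and every pair
        (including a codeword with itself) must stay >= d from reverses and
        reverse complements.
        """
        if any(hamming(x, y) < d for x, y in combinations(code, 2)):
            return False
        for x in code:
            for y in code:
                if hamming(x, reverse(y)) < d:
                    return False
                if hamming(x, reverse_complement(y)) < d:
                    return False
        return True

    # Toy code of length 4 checked at minimum distance d = 2.
    print(satisfies_constraints(["AACC", "AGAG", "CACA"], d=2))  # True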
Families of DNA codes via Simplex-type codes, first-order and r-th order Reed-Muller type codes, and Octa-type codes are developed. Some general results on the generator matrix to satisfy the reverse and reverse-complement constraints are given. Some of the constructed DNA codes are optimal with respect to the bounds on M, the size of the code. These DNA codes can be used for a myriad of applications, one of which is data storage. DNA is stable, robust and reliable; theoretically, it is estimated that one gram of DNA can store 455 EB (1 exabyte = 10^18 bytes). These properties make DNA a potential candidate for data storage. However, there are various practical constraints for a DNA data storage system. In this work, we construct DNA codes with some of these constraints to design efficient codes for storing data in DNA. One of the practical constraints in designing DNA codes for storage is repeated bases (runlengths) of the same DNA nucleotide; hence, it is essential that each DNA codeword avoids long runlengths. In this thesis, codes that disallow runlengths of any base are proposed so as to develop error-free DNA data storage codes. A fixed GC-weight u (the number of occurrences of G and C nucleotides in a DNA codeword) is another requirement for DNA codewords used in DNA storage: DNA codewords with large GC-weight lead to insertion and deletion (indel) errors in the DNA reading and amplification processes, so it is crucial to consider a fixed GC-weight for a DNA code. In this work, we propose methods that generate families of codes for DNA data storage systems satisfying the no-runlength and fixed GC-weight constraints. The first is a constrained code that uses quaternary coding, and the second is a DNA Golay subcode that uses ternary encoding. The constrained quaternary coding is presented to generate DNA codes for data storage, and we give a construction algorithm for finding families of DNA codes with the no-runlength and fixed GC-weight constraints. The number of DNA codewords of fixed GC-weight with the no-runlength constraint is enumerated; we note that prior work only gave bounds on the number of such codewords, whereas here we count these codewords exactly. We also observe that the bound given in previous work does not take into account the distance of the code, which is essential for data reliability; we therefore consider the distance to obtain a lower bound on the number of codewords satisfying the fixed GC-weight and no-runlength constraints. In the second method, we demonstrate the Golay subcode method to encode data in a variable-chunk architecture of DNA using ternary encoding. N. Goldman et al. introduced the first proof of concept of DNA data storage in 2013 by encoding data in DNA without using error correction, which motivated us to implement this method. While implementing it, a bottleneck was identified that limited the amount of data that could be encoded, caused by the fixed-length chunk architecture used for data encoding. In this work, we propose a modified scheme using a non-linear family of ternary codes based on the Golay subcode that allows a flexible-length chunk architecture for data encoding in DNA. By using the ternary Golay subcode, two substitution errors can be corrected.
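The no-runlength and fixed GC-weight constraints are likewise easy to state in code. The following sketch checks them for individual codewords and, for very small lengths, counts the valid codewords by brute force; the thesis derives such counts analytically rather than by enumeration.

    from itertools import product

    DNA = "ACGT"

    def gc_weight(word):
        """Number of G and C nucleotides in the codeword."""
        return sum(b in "GC" for b in word)

    def has_run(word):
        """True if any nucleotide is immediately repeated (run length >= 2)."""
        return any(a == b for a, b in zip(word, word[1:]))

    def count_constrained(n, u):
        """Brute-force count of length-n words with GC-weight u and no runs.

        Feasible only for small n; shown purely to illustrate the constraints.
        """
        return sum(
            1
            for w in product(DNA, repeat=n)
            if gc_weight(w) == u and not has_run(w)
        )

    print(count_constrained(n=4, u=2))  # valid length-4, GC-weight-2 codewords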
In a nutshell, the significant contributions of this thesis are the design of DNA codes with specific constraints. First, DNA codes from the ring are proposed using algebraic coding, by defining a new type of distance (the Gau distance) and map (the Gau map); these DNA codes satisfy the reverse, reverse-complement and complement constraints with a minimum Hamming distance, and several families of these codes and their properties are studied. Second, DNA codes using constrained coding and the Golay subcode method are developed that satisfy the no-runlength and GC-weight constraints for a DNA data storage system.

Item Open Access Image question and answering (Dhirubhai Ambani Institute of Information and Communication Technology, 2019) Mehta, Archan; Khare, Manish
In the past few years, question answering on images has received a lot of attention from researchers. It is a much-researched topic, since it covers both the domain of computer vision and that of natural language processing, and recent advances have shown that it also involves knowledge representation and reasoning. Datasets for image question answering have been created since 2014, but all have their own drawbacks, and implementing them with different methods yields different accuracies. I have studied and analyzed different approaches to the problem and examined which technique is more efficient on a given dataset. Different datasets contain different types of question pairs and images. We have also analyzed the different types of datasets used and, based on that, which one is better and gives better results, as well as what future work is possible and how both datasets and methods can be improved.

Item Open Access Locality preserving projection: a study and applications (Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Shikkenawis, Gitam; Mitra, Suman K
Locality Preserving Projection (LPP) is a recently proposed approach for dimensionality reduction that preserves neighbourhood information and obtains a subspace that best detects the essential manifold structure of the data. It is currently widely used for finding the intrinsic dimensionality of data, which is usually high dimensional. This characteristic of LPP has made it popular among other available dimensionality reduction approaches such as Principal Component Analysis (PCA). A study of LPP reveals that it tries to preserve information about the nearest neighbours of data points, which may lead to misclassification in the overlapping regions of two or more classes during data analysis. It has also been observed that the dimension-reducing capacity of conventional LPP is much less than that of PCA. A new proposal called Extended LPP (ELPP), which amicably resolves both issues, is introduced. In particular, a new weighting scheme is designed that gives importance to data points at a moderate distance, in addition to the nearest points. This helps resolve the ambiguity occurring in the overlapping regions as well as increase the reducibility capacity.
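For context, conventional LPP (the baseline that ELPP modifies) projects data by solving a generalized eigenvalue problem built from a neighbourhood graph with heat-kernel weights. The sketch below implements only that baseline; the ELPP weighting scheme described above is not reproduced here, and the neighbourhood size k and kernel width t are arbitrary choices.

    import numpy as np
    from scipy.linalg import eigh

    def lpp(X, n_components=2, k=5, t=1.0):
        """Conventional Locality Preserving Projection (not the ELPP variant).

        X: (n_samples, n_features) data matrix.
        Returns a projection matrix of shape (n_features, n_components).
        """
        n = X.shape[0]
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise d^2
        W = np.zeros((n, n))
        for i in range(n):
            nbrs = np.argsort(sq[i])[1:k + 1]        # k nearest, skipping self
            W[i, nbrs] = np.exp(-sq[i, nbrs] / t)    # heat-kernel weights
        W = np.maximum(W, W.T)                       # symmetrise the graph
        D = np.diag(W.sum(axis=1))
        L = D - W                                    # graph Laplacian
        # Generalized eigenproblem: X^T L X a = lambda X^T D X a.
        M1 = X.T @ L @ X
        M2 = X.T @ D @ X + 1e-9 * np.eye(X.shape[1])  # regularise for stability
        _, vecs = eigh(M1, M2)
        return vecs[:, :n_components]                # smallest eigenvalues first

    # Toy usage: project 100 random 10-D points onto 2 dimensions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    Y = X @ lpp(X)
    print(Y.shape)  # (100, 2)

ELPP would, in essence, replace the heat-kernel weights above with the extended weighting scheme that also emphasises moderately distant points.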
LPP is used in a variety of dimensionality reduction applications, one of which is face recognition. Face recognition is one of the most widely used biometric technologies for person identification. Face images are represented as high-dimensional pixel arrays and, due to the high correlation between neighbouring pixel values, they often lie on an intrinsically low-dimensional manifold; the distribution of data in a high-dimensional space is non-uniform and is generally concentrated around some kind of low-dimensional structure. Hence, one way of performing face recognition is to reduce the dimensionality of the data and find the subspace of the manifold in which face images reside. Both LPP and ELPP are used for face and expression recognition tasks. As the aim is to separate the clusters in the embedded space, class membership information may add more discriminating power. With this in mind, the proposal is further extended to a supervised version of LPP (SLPP) that uses the known class labels of data points to enhance the discriminating power while inheriting the properties of ELPP.

Item Open Access Fingerprint image preprocessing for robust recognition (Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Munshi, Paridhi; Mitra, Suman K
The fingerprint is the oldest and most widely used form of biometric identification. Since fingerprints are mainly used in forensic science, accuracy in fingerprint identification is highly important, and this accuracy depends on the quality of the image. Most fingerprint identification systems are based on minutiae matching, and a critical step in correct matching of fingerprint minutiae is to reliably extract minutiae from the fingerprint images. However, fingerprint images may not be of good quality; they may be degraded and corrupted due to variations in skin, pressure and impression conditions. Most feature extraction algorithms work on binary images instead of the grayscale image, and the results of feature extraction depend upon the quality of the binary image used. Keeping these points in mind, image preprocessing including enhancement and binarization is proposed in this work. This preprocessing is employed prior to minutiae extraction to obtain a more reliable estimate of minutiae locations and hence a robust matching performance. In this dissertation, we give an introduction to the fingerprint structure and identification system, followed by a discussion of the proposed methodology and the implementation of the technique for fingerprint image enhancement. A rough-set based method for binarization is then proposed, followed by a discussion of the methods for minutiae extraction. Experiments are conducted on real fingerprint images to evaluate the performance of the implemented techniques.

Item Open Access Back-view based visual hand gesture recognition system (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Sharma, Harish; Banerjee, Asim
Gesture recognition is a fascinating area of research due to its applications in HCI (human-computer interaction), entertainment, and communication for deaf and mute people, among others. A gesture can be dynamic or static depending upon the application: static gestures can be called postures, while dynamic gestures are collections or sequences of postures. Our method is an attempt to classify various postures in American Sign Language (ASL) for a wearable computer device like "Sixth Sense" (developed at the MIT Media Lab) [17]. We work with a new set of features, including the vertical-horizontal histogram of a posture shape, and use a Linear Discriminant Analysis (LDA) classifier for classification.
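The vertical-horizontal histogram feature and the LDA classifier mentioned above can be sketched as follows. The use of scikit-learn, the 32x32 mask size, and the random toy data are assumptions made purely for a runnable example; the thesis does not specify this implementation.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def vh_histogram(mask):
        """Vertical-horizontal histogram of a binary posture mask:
        normalised column sums followed by normalised row sums."""
        mask = mask.astype(float)
        total = mask.sum() + 1e-9
        vertical = mask.sum(axis=0) / total      # one bin per column
        horizontal = mask.sum(axis=1) / total    # one bin per row
        return np.concatenate([vertical, horizontal])

    # Toy training data: random 32x32 binary masks for two posture classes.
    rng = np.random.default_rng(0)
    masks = rng.integers(0, 2, size=(40, 32, 32))
    labels = np.array([0] * 20 + [1] * 20)
    features = np.stack([vh_histogram(m) for m in masks])

    clf = LinearDiscriminantAnalysis()
    clf.fit(features, labels)
    print(clf.predict(features[:5]))  # predicted posture classes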
Our work also attempts to raise some of the issues that can arise during posture-shape recognition and to show how a simple classification technique with a new feature set can give fairly good results.

Item Open Access Human action recognition in video (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Kumari, Sonal; Mitra, Suman K.
Action recognition is a central problem in computer vision. An action is any meaningful movement of a human, used to convey information or to interact naturally without any mechanical devices, and recognizing it is of utmost importance in designing an intelligent and efficient human-computer interface. The applications of action recognition are manifold, ranging from sign language through medical rehabilitation to virtual reality; human action recognition is further motivated by applications such as video retrieval, human-robot interaction, and interaction with deaf and mute people. In any action recognition system, a video stream is captured by a fixed camera, which may be mounted on the computer or elsewhere. Preprocessing steps are then applied to remove noise caused by illumination effects, blurring, false contours, and so on, and background subtraction is performed to remove the static or slowly varying background. In this thesis, multiple background subtraction algorithms are tested and one of them is selected for the action recognition system. Background subtraction is also known as foreground/background segmentation, background segmentation, or foreground extraction; these terms are used interchangeably in this thesis. The background segmentation algorithm is selected on the basis of the results of these algorithms on the action database, since a good background segmentation result provides a more robust basis for object class recognition. The following four methods for extracting the foreground are tested: (1) frame difference, (2) background subtraction, (3) the adaptive Gaussian mixture model (adaptive GMM) [25], and (4) the improved adaptive Gaussian mixture model (improved adaptive GMM) [26], of which the last gives the best result. The action region can then be extracted from the original video sequences with the help of the extracted foreground object. The next step is feature extraction, which extracts important features (such as corner points, optical flow, shape, and motion vectors) from the image frame that can be used for tracking across the video frame sequence; feature reduction is an optional step that reduces the dimension of the feature vector. To recognize actions, any learning and classification algorithm can be employed: the system is trained using a training dataset, and a new video can then be classified according to the action occurring in it. The following three features are applied to the action recognition task: (1) the distance between the centroid and corner points, (2) optical flow motion estimation [28, 29], and (3) the discrete Fourier transform (DFT) of an image block. Among these, the proposed DFT feature plays a very important role in uniquely identifying a specific action in the database, and the proposed novel action recognition model uses the DFT of small image blocks. For the experiments, the MuHAVi dataset [33] and DA-IICT data, which include various kinds of actions performed by various actors, are used.
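The block-DFT feature can be illustrated as follows: each grayscale frame is divided into small non-overlapping blocks, and the magnitude of each block's 2-D DFT is concatenated into a feature vector. The block size and the magnitude-only representation are assumptions made here for a runnable example.

    import numpy as np

    def block_dft_features(frame, block=8):
        """Concatenated 2-D DFT magnitudes of non-overlapping blocks.

        frame: 2-D grayscale array whose sides are multiples of `block`.
        """
        h, w = frame.shape
        feats = []
        for i in range(0, h - block + 1, block):
            for j in range(0, w - block + 1, block):
                patch = frame[i:i + block, j:j + block]
                feats.append(np.abs(np.fft.fft2(patch)).ravel())  # magnitudes
        return np.concatenate(feats)

    # Toy usage: a 64x64 frame gives 64 blocks of 64 coefficients each.
    rng = np.random.default_rng(0)
    frame = rng.random((64, 64))
    print(block_dft_features(frame).shape)  # (4096,)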
The following two supervised recognition techniques are used: the K-nearest neighbour (KNN) classifier and a classifier using the Mahalanobis metric. KNN is a parameterized classification technique in which the parameter K has to be optimized, whereas the Mahalanobis classifier is non-parameterized, so no parameter optimization is needed. To check the accuracy of the proposed algorithm, sensitivity and false alarm rate tests are performed; the results show that the proposed algorithm is quite accurate for action recognition in video. To compare the results of the recognition system, confusion matrices are created and compared with those of other recognition techniques. All experiments are performed in MATLAB®.
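For completeness, a Mahalanobis-metric classifier of the kind mentioned above can be sketched as follows: each class is summarised by its mean, distances are measured with the inverse of a pooled within-class covariance, and a sample is assigned to the nearest class mean. This is a generic formulation, not the thesis's MATLAB implementation.

    import numpy as np

    def fit_mahalanobis(X, y):
        """Per-class means and the inverse of the pooled within-class covariance."""
        classes = np.unique(y)
        means = {c: X[y == c].mean(axis=0) for c in classes}
        pooled = np.mean([np.cov(X[y == c], rowvar=False) for c in classes], axis=0)
        pooled += 1e-6 * np.eye(X.shape[1])       # regularise before inverting
        return means, np.linalg.inv(pooled)

    def predict_mahalanobis(X, means, inv_cov):
        """Assign each row of X to the class with the smallest Mahalanobis distance."""
        preds = []
        for x in X:
            dists = {c: float((x - m) @ inv_cov @ (x - m)) for c, m in means.items()}
            preds.append(min(dists, key=dists.get))
        return np.array(preds)

    # Toy usage with two well-separated Gaussian "action" classes.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(4, 1, (30, 5))])
    y = np.array([0] * 30 + [1] * 30)
    means, inv_cov = fit_mahalanobis(X, y)
    print((predict_mahalanobis(X, means, inv_cov) == y).mean())  # training accuracy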