Theses and Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1
7 results
Search Results
Item Open Access
Comparative Study: Neural Networks on MCUs at the Edge (2021)
Anand, Harshita; Bhatt, Amit
Computer vision has evolved considerably over the years: processors and cameras have shrunk while their computational power has risen and their cost has fallen, making it feasible to integrate them into embedded systems. It has several critical applications that require high accuracy and fast real-time response in order to achieve a good user experience. Neural networks (NNs) are an attractive choice for embedded vision architectures because of their superior performance and better accuracy compared with traditional processing algorithms. Because security and latency issues make larger systems unattractive for certain time-dependent applications, we require an always-on system; such an application has a highly constrained power budget and typically needs to run on tiny microcontroller systems with limited memory and compute capability. The NN model design must take these constraints into account. We have performed NN model explorations and evaluated embedded vision applications, including person detection, object detection, image classification, and facial recognition, on resource-constrained microcontrollers. We trained a variety of neural network architectures from the literature and compared their accuracies and memory/compute requirements. We show that NN architectures can be optimized to fit within the computational and memory limits of microcontroller systems without sacrificing accuracy. We also delve into the concepts of the depth-wise separable convolutional neural network (DS-CNN) and the convolutional neural network (CNN), both of which are used in the MobileNet architecture. This thesis aims to present a comparative analysis of the performance of edge devices in the field of embedded computer vision.
The three parameters under major focus in this study are latency, accuracy, and million operations.

Item Open Access
Beamforming using learning based algorithms (2020)
Parekh, Naitik; Vasavada, Yash
With an increasing number of subscribers to terrestrial cellular and satellite-based services, there is a resulting rise in the demand for data rate and a growing need for advanced antenna and signal processing schemes that improve power and spectral efficiency. Adaptive beamforming using antenna arrays is one such technique. When multiple signals impinge on the antenna array, beamforming can be used to increase the signal-to-noise ratio (SNR), achieved by increasing the directivity of the formed beam along the direction of interest, and thereby to perform source separation and interference mitigation. In this thesis, we propose several new algorithms to improve the practical effectiveness of beamforming. These algorithms range from computationally complex closed-form solutions to iterative estimation and optimization techniques. Some of them require precise knowledge of the channel model, while others rest on prior assumptions that, when violated, degrade the performance of the system. Neural network (NN) based solutions are gaining popularity in communication system design. The NN operates in blind mode and does not require a detailed a priori mathematical model of the channel. It has shown promising results in accurately approximating some known algorithms with reduced complexity, and it can effectively trade performance against complexity. Most applications of NNs aim at reducing the computational complexity of existing approaches; little or no effort has been spent on developing an indigenous approach to beamforming using NNs. We have proposed a few beamforming schemes using NNs.
Our results show that the learned models can provide improvements in interference suppression and in the number of pilot symbols required.

Item Open Access
Thinning of handwritten Gujarati numerals (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)
Panara, Bhavika; Mitra, Suman K.; Banerjee, Asim
Skeletonization, also known as thinning, is one of the crucial approaches in feature extraction and recognition tasks. Thinning is an important pre-processing technique in many image processing applications, such as character recognition, shape analysis, and computer vision. It is the process of reducing a multi-pixel-wide stroke to a single-pixel-wide line while preserving its lines, curves, and arcs. Thinning makes the recognition task easier and more efficient because the thinned image of a character is less complex than the original character. Hence, an OCR system relies heavily on the performance of the thinning algorithm. This thesis proposes a medial-axis-based thinning algorithm to obtain the skeleton from the original character. A two-phase approach is used to obtain the final skeleton image. In the first phase, we generate a primary skeleton image using the medial-axis-based approach. In the second phase, we use an auto-encoder neural network to obtain a better skeleton. The proposed thinning algorithm helps preserve the shape of the character and also ensures unit width. The experiment is conducted on handwritten Gujarati numerals. We use a large set of performance measures for evaluation, and we achieve better results than other existing thinning algorithms.

Item Open Access
Learning cross domain relations using deep learning (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)
Kotecha, Dhara; Joshi, Manjunath V.
Generative Adversarial Networks (GANs) have achieved exemplary performance in generating realistic images. They also perform image-to-image translation with good results.
In this thesis, we explore the use of GANs for cross-domain image mapping for facial expression transfer. In facial expression transfer, the expression of the source image is transferred to the target image. We use a DiscoGAN (Discovery GAN) model for the task. Using a DiscoGAN, an image of the target is generated with the facial features of the source. It uses a feature matching loss along with the GAN objective and a reconstruction loss. We propose a method to train the DiscoGAN with paired data of source and target images. In order to learn the cross-domain image mapping, we train the DiscoGAN with a batch size of 1. In our next work, we propose an algorithm to binarize degraded document images, using U-Net for the task. We model document image binarization as a classification problem in which we generate an image that results from classifying each pixel as text or background. By optimizing the cross-entropy loss function, we translate the input degraded image into the corresponding binarized image. Our approach of using U-Net ensures low-level feature transfer from the input degraded image to the output binarized image, and thus it is better than using a simple convolutional neural network. Our method of training reaches the desired results faster when both the degraded document and the ground-truth binarized images are available for training, and it also generalizes well. The results obtained are significantly better than the state-of-the-art techniques, and the approach is simpler than other deep learning approaches for document image binarization.

Item Open Access
Generative Adversarial Networks for Speech Technology Applications (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)
Shah, Neil; Patil, Hemant A.
The deep learning renaissance has enabled machines to understand observed data in terms of a hierarchy of representations.
This allows machines to learn complicated nonlinear relationships between representative pairs. In the context of speech, deep learning architectures such as Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are the traditional supervised learning algorithms, employing Maximum Likelihood (ML)-based optimization. These techniques minimize the numerical error between the generated output and the ground truth. However, a performance gap between the generated representation and the ground truth remains in various speech applications, because the numerical estimate may not correlate with the human perception mechanism. Generative Adversarial Networks (GANs), on the other hand, reduce the distributional divergence rather than minimizing the numerical error, and hence may synthesize samples with improved perceptual quality. However, the vanilla GAN (v-GAN) architecture generates a spectrum that may belong to the true desired distribution but may not correspond to the given spectral frames at the input. To address this issue, the Minimum Mean Square Error (MMSE)-regularized MMSE-GAN and CNN-GAN architectures are proposed for the Speech Enhancement (SE) task. The objective evaluation shows improvement in speech quality and in suppression of background interference over the state-of-the-art techniques. The effectiveness of the proposed MMSE-GAN is explored in other speech technology applications, such as Non-Audible Murmur-to-Whisper Speech Conversion (NAM2WHSP), Query-by-Example Spoken Term Detection (QbE-STD), and Voice Conversion (VC). In QbE-STD, a DNN-based GAN with cross-entropy regularization is proposed for extracting an unsupervised posterior feature representation (uGAN-PG), trained on labeled Gaussian Mixture Model (GMM) posteriorgrams.
Moreover, the ability of the Wasserstein GAN (WGAN) to improve optimization stability and to provide a meaningful loss metric that correlates with the generated sample quality and the generator's convergence is also exploited. To that effect, MMSE-WGAN is proposed for the VC task, and its performance is compared with the MMSE-GAN and DNN-based approaches.

Item Open Access
Multi-class diagnosis of diabetic retinopathy using deep learning (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)
Shrivastava, Udit; Joshi, Manjunath V.
Diabetic retinopathy (DR) is the main cause of blindness in the modern world. As per studies, around 40-45% of people suffering from diabetes have DR in the later stages of their lives. All forms of diabetic eye disease have the potential to cause vision impairment or blindness. Early stages of DR show very small and intricate features such as micro-aneurysms (swelling of blood vessels) and hard exudates (protein deposits), whereas the severe and proliferative stages show more prominent features such as hemorrhages (blood clots), neovascularization (abnormal growth of vessels), and macular edema. Detecting such small and complex features in fundus images is a tedious and time-consuming process that requires an experienced ophthalmologist. This calls for an automated diagnosis system that can vastly reduce the burden on clinicians. In this thesis, we propose a Convolutional Neural Network (CNN) based automated diagnosis system that can accurately classify the various stages of diabetic retinopathy. A hierarchical approach is adopted in which the classification task is broken down into two stages. In the first stage, we perform binary classification and identify the true positive and negative samples; in the second stage, five-class classification is performed on the images that were classified as true positive, false positive, or false negative in the first stage.
Our proposed method uses the Inception-v3 network for feature extraction, taking the features of the second-last layer together with the features from the second-last layer of an auxiliary classifier. These extracted features are concatenated into a single feature vector to train a Support Vector Machine (SVM). For multiclass classification, the SVM classifies each sample into one of the five classes. Experiments are conducted on the "Kaggle" dataset, and our proposed approach attains an accuracy of 91% on validation data for binary classification and 78% for multiclass classification. The results obtained are better than those of recent methods for multiclass classification of diabetic retinopathy.

Item Open Access
Design of architecture of artificial neural network: design and construction of a model for creation of an architecture of artificial neural network based on distributed genetic algorithms (Dhirubhai Ambani Institute of Information and Communication Technology, 2004)
Rahi, Sajid S.; Chaudhary, Sanjay
The objective of this work is to design and construct a model for creating the architecture of a feed-forward artificial neural network. Distributed genetic algorithms are used to design and construct the system. This thesis describes various encoding schemes suggested by researchers for evolving the architecture of an artificial neural network using genetic algorithms. This research proposes a new encoding scheme, called object-based encoding, for the evolution of the architecture, and also proposes data structures, genetic operators, and repair algorithms for the system development. A genetic algorithm is also used to evolve the weights during training; for this, a two-dimensional variable-length encoding scheme is proposed, along with two-point layer crossover and average crossover. Experiments are carried out on the developed system for problems such as 3-bit even parity.
The experiments determine which combination of genetic operators is more efficient for designing a better artificial neural network architecture.