PhD Theses
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/2
2 results
Search Results
- Item - Open Access Facial expression recognition: feature based approaches to deep learning techniques (Dhirubhai Ambani Institute of Information and Communication Technology, 2020) Sujata; Mitra, Suman K. Facial expression recognition (FER) is a pattern recognition problem that has held the attention of computer vision researchers for the last three decades. The problem remains open because of challenges such as blurring, illumination variation, pose variation, and face images captured in unconstrained environments. Early work studied hand-crafted features followed by a classical classifier, across a variety of features and classifiers. Hand-crafted features associated with changes in expression are hard to extract because of individual differences and variations in emotional state. With the introduction of deep neural networks (DNNs) and convolutional neural networks (CNNs), FER techniques have changed both in efficiency and in how they handle the challenges mentioned above. The modular approach presented here mimics the human ability to identify a person from a limited part of the face. Facial parts such as the eyes, nose, lips, and forehead contribute more to the expression recognition task. This thesis presents approaches for FER, ranging from classical feature-based approaches to deep learning techniques. First, we propose a two-dimensional Taylor expansion for facial feature extraction and for handling local illumination. Most existing methods deal only with global illumination variations and therefore yield unsatisfactory recognition performance under natural illumination variations, which are usually uncontrolled in the real world.
Hence, to address the illumination variation issue, we introduce the Laplace-Logarithmic (LL) domain to further improve performance. We apply the proposed 2D Taylor expansion theorem in the facial feature extraction phase and formulate the 2DTFP method. In our second FER approach, we propose a histogram of second-order gradients (HSOG) for feature extraction. Most of the popular local image descriptors in the literature, such as SIFT, HOG, DAISY, LBP and GLOH, use only first-order gradient information, which relates to the slope and elasticity (e.g., length, area) of a surface, and therefore only partly characterize the geometric properties of an image. We exploit a local image descriptor that extracts the histogram of second-order gradients (HSOG), which captures local curvatures in the sense of differential geometry (cliffs, ridges, summits, valleys, basins, and so on). A shape index is computed from these curvatures; its different values correspond to different shapes, which in turn correspond to different facial expressions. Much work in this field extracts local texture features and uses them for classification. Because this information is very local, the feature vector obtained for the full image has a very high dimension, which poses computational challenges for real-time expression recognition. In recent times, Dimensionality Reduction methods have been used successfully in image recognition tasks. Here we propose two Dimensionality Reduction methods: E-PCA (Euler Principal Component Analysis) and CS-ONPP (Orthogonal Neighborhood Preserving Projection with Class Similarity-based neighborhood). These methods substantially reduce the feature vector length while maintaining the same recognition accuracy. Classical FER methods do well in certain well-controlled cases.
The fundamental issue with classification approaches based on hand-crafted features is that they require domain knowledge and do not generalize well to complex datasets. Deep learning is fast becoming a go-to tool for many artificial intelligence problems because of its ability to outperform other approaches, and even humans, on many problems. A DNN has millions of parameters. To obtain an optimal set of parameters, a lot of training data is needed. Even when a lot of data is available, training generally requires multiple iterations and takes a toll on computing resources. Fine-tuning a network means tweaking the parameters of an already-trained network so that it adapts to the new task at hand. Here we propose two deep learning-based methods. The first method is DNNFG (DNN based on Fourier transform followed by Gabor filtering), in which we use the pre-trained VGG16 model with fine-tuning to extract facial features. VGG16 is chosen for its effective performance in visual detection and its fast convergence. It has about 138 million parameters and contains 13 convolutional layers followed by 3 fully connected (FC) layers. Since the VGG framework was not designed for FER tasks, we modified it according to our requirements. The second method is 2DNN (Double-channel based Deep Neural Network), which uses the VGGFace architecture. VGGFace is trained on 2.6M face images of 2.6k different people; its architecture is the same as that of VGG16, and only the training images differ. To adapt VGGFace to the FER problem, it is fine-tuned. The 2DNN method readily uses both local and global information about the expressions. The DNN-based methods improved recognition accuracy compared to the classical approaches. FER experiments are performed on four benchmark databases: JAFFE, VIDEO, CK+, and OULU-CASIA.
In summary, the thesis addresses the classical facial expression recognition approaches and their shortcomings, and then moves to deep learning-based approaches to handle these shortcomings. The deep learning-based approaches performed well compared to hand-crafted methods. The thesis also shows experimentally that a modular approach performs better than a holistic approach.
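The abstract above notes that a shape index is computed from local curvatures and that its values distinguish shapes such as ridges, valleys, summits and basins. The minimal Python sketch below illustrates that idea only. It assumes the common Koenderink-style definition S = (2/pi) * atan((k1 + k2) / (k1 - k2)) with k1 >= k2, and it approximates the curvatures by the eigenvalues of the intensity Hessian. The thesis may use a different normalization or curvature estimate, and the function names `hessian_eigenvalues` and `shape_index` are hypothetical.

```python
import math


def hessian_eigenvalues(fxx, fxy, fyy):
    """Eigenvalues (k1 >= k2) of the symmetric 2x2 Hessian [[fxx, fxy], [fxy, fyy]].

    Assumption: these eigenvalues stand in for the principal curvatures,
    which is only a rough approximation where the image gradient is small.
    """
    mean = 0.5 * (fxx + fyy)
    radius = math.hypot(0.5 * (fxx - fyy), fxy)
    return mean + radius, mean - radius


def shape_index(k1, k2):
    """Koenderink-style shape index in [-1, 1]; None for a flat patch.

    Sign convention (an assumption of this sketch): both curvatures
    positive gives +1, both negative gives -1.
    """
    if k1 < k2:
        k1, k2 = k2, k1
    if k1 == 0.0 and k2 == 0.0:
        return None  # planar patch: the shape index is undefined
    # atan2 handles the umbilic case k1 == k2 without dividing by zero
    return (2.0 / math.pi) * math.atan2(k1 + k2, k1 - k2)


# Second derivatives (fxx, fxy, fyy) of a few simple quadratic surfaces
examples = {
    "basin  f = x^2 + y^2": (2.0, 0.0, 2.0),
    "valley f = x^2": (2.0, 0.0, 0.0),
    "saddle f = x*y": (0.0, 1.0, 0.0),
    "ridge  f = -y^2": (0.0, 0.0, -2.0),
    "summit f = -x^2 - y^2": (-2.0, 0.0, -2.0),
}
for name, second_derivs in examples.items():
    k1, k2 = hessian_eigenvalues(*second_derivs)
    print(name, shape_index(k1, k2))
```

Under this sign convention the five surfaces map, up to floating-point rounding, to +1, +0.5, 0, -0.5 and -1 respectively, so a histogram of shape-index values summarizes which local shapes dominate a facial region.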
- Item - Open Access Variants of orthogonal neighborhood preserving projections for image recognition (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Koringa, Purvi Amrutlal; Mitra, Suman K. With the increase in the resolution of image capturing sensors and data storage capacity, a huge increase in image data is seen in past decades. This information upsurge has created a huge challenge for machines to perform tasks such as image recognition, image reconstruction etc. In image data, each observation or a pixel can be considered as a feature or a dimension, thus an image can be represented as a data point in the very high-dimensional space. Most of these high-dimensional images lie on or near a low-dimensional manifold. Performing machine learning algorithms on this high-dimensional data is computationally expensive and usually generates undesired results because of the redundancy present in the image data. Dimensionality Reduction (DR) methods exploit this redundancy within the high-dimensional image space and explore the underlying low-dimensional manifold structure based on some criteria or image properties such as correlation, similarity, pair-wise distances or neighborhood structure. This study focuses on variants of one such DR technique, Orthogonal Neighborhood Preserving Projections (ONPP). ONPP searches for a low-dimensional representation that preserves the local neighborhood structure of high-dimensional space. This thesis studies and addresses some of the issues with the existing method and provides the solution for the same.
ONPP is a three-step procedure: the first step defines a local neighborhood, the second step defines a locally linear neighborhood relationship in the high-dimensional space, and the third step seeks a lower-dimensional subspace that preserves the relationship found in the second step. The major issues with the existing ONPP technique are the local linearity assumption even with varying neighborhood size, the strict distance-based or class-membership-based neighborhood selection rule, non-normalized projections, and susceptibility to the presence of outliers in the data. This study proposes variants of ONPP that modify each of these steps to tackle the above-mentioned problems and better suit image recognition applications. This thesis also proposes a 2-dimensional variant that overcomes the limitation of Neighborhood Preserving Projections (NPP) and Orthogonal Neighborhood Preserving Projections (ONPP) when performing image reconstruction. All the new proposals are tested on benchmark data-sets of face recognition and handwritten numerals recognition. In all cases, the new proposals outperform the conventional method in terms of recognition accuracy with reduced subspace dimensions.
Keywords: Dimensionality Reduction, manifold learning, embeddings, Neighborhood Preserving Projection (NPP), Orthogonal Neighborhood Preserving Projections (ONPP), image recognition, face recognition, text recognition, image reconstruction
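The three steps of ONPP described in the abstract above can be sketched in a few lines of NumPy. This is a minimal illustration of the conventional (baseline) procedure as it is usually described, not the variants proposed in the thesis. The function name `onpp`, its parameters, the Euclidean k-nearest-neighbour rule and the regularization constant `reg` are assumptions of this sketch.

```python
import numpy as np


def onpp(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Baseline ONPP sketch. X has shape (n_samples, n_features).

    Returns the orthonormal projection V (n_features, n_components)
    and the embedded data Y = X @ V.
    """
    n = X.shape[0]

    # Step 1: define local neighborhoods (k nearest neighbours, Euclidean)
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbour
    neighbors = np.argsort(dist2, axis=1)[:, :n_neighbors]

    # Step 2: locally linear reconstruction weights; each row of W sums to 1
    W = np.zeros((n, n))
    ones = np.ones(n_neighbors)
    for i in range(n):
        Z = X[neighbors[i]] - X[i]  # neighbours centred on x_i
        G = Z @ Z.T                 # local Gram matrix
        # Regularize so the solve is well-posed (assumed constant)
        G = G + (reg * np.trace(G) + 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(G, ones)
        W[i, neighbors[i]] = w / w.sum()

    # Step 3: orthonormal basis that best preserves the weights:
    # eigenvectors of X^T (I - W)^T (I - W) X with the smallest eigenvalues
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    eigvals, eigvecs = np.linalg.eigh(X.T @ M @ X)  # ascending eigenvalues
    V = eigvecs[:, :n_components]
    return V, X @ V


# Usage on synthetic data (random, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 10))
V_demo, Y_demo = onpp(X_demo, n_neighbors=5, n_components=3)
print(V_demo.shape, Y_demo.shape)
```

Because the projection is linear, a new image vector can be embedded with a single matrix product, `x_new @ V_demo`. The issues listed in the abstract map onto this sketch directly: the neighbour rule in step 1, the linearity assumption in step 2, and the unnormalized projection in step 3.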
