M Tech Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/3
Item Open Access
Analysis of voice biometric attacks: detection of synthetic vs natural speech (Dhirubhai Ambani Institute of Information and Communication Technology, 2014) S, Adarsa; Patil, Hemant A.

The improvement in text-to-speech (TTS) synthesis also poses the problem of biometric attacks on speaker verification systems. In this context, it is necessary to analyse the performance of these systems in terms of the false acceptance rate for impostors using artificial speech, and to incorporate features that make them robust to such attacks. The aim of this study is to understand different aspects of speech and hence extract appropriate features for distinguishing natural from synthetic speech. The study focuses on understanding those aspects which give naturalness to human speech and which present-day TTS systems fail to capture. Three different aspects, viz., Fourier transform phase, nonlinearity and speech prosody, are analysed. The performance of each feature is evaluated and a comparative study of the features is presented. The results obtained provide an evaluation of the naturalness of the synthetic speech used and yield features that improve robustness against biometric attacks in speaker verification systems.

Item Open Access
Locality preserving projection: a study and applications (Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Shikkenawis, Gitam; Mitra, Suman K

Locality Preserving Projection (LPP) is a recently proposed approach for dimensionality reduction that preserves neighbourhood information and obtains a subspace that best detects the essential data manifold structure. It is currently widely used for finding the intrinsic dimensionality of data, which is usually of high dimension. This characteristic of LPP has made it popular among other available dimensionality reduction approaches such as Principal Component Analysis (PCA).
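The core LPP recipe (build a nearest-neighbour affinity graph with heat-kernel weights, then solve a generalised eigenproblem for the projection directions) can be sketched as follows. This is an illustrative sketch, not the thesis's implementation; the function name, neighbourhood size `k` and kernel width `t` are assumptions:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_components=2, k=5, t=1.0):
    """Project data onto an LPP subspace (illustrative sketch).

    X: (n_samples, n_features) data matrix.
    k: number of nearest neighbours for the affinity graph (assumed value).
    t: heat-kernel width (assumed value).
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # k-nearest-neighbour affinity with heat-kernel weights, symmetrised.
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]  # skip self at index 0
    W = np.zeros((n, n))
    for i in range(n):
        W[i, idx[i]] = np.exp(-d2[i, idx[i]] / t)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    # Generalised eigenproblem: smallest eigenvalues of
    # X^T L X a = lambda X^T D X a give the projection directions.
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-9 * np.eye(X.shape[1])  # small ridge for stability
    vals, vecs = eigh(A, B)
    return vecs[:, :n_components]  # (n_features, n_components) projection
```

The smallest generalised eigenvectors give the locality-preserving directions; the small ridge term added to the right-hand matrix is only for numerical stability in this sketch.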
A study of LPP reveals that it tries to preserve information only about the nearest neighbours of data points, which may lead to misclassification in the overlapping regions of two or more classes during data analysis. It has also been observed that the dimension reducibility capacity of conventional LPP is much less than that of PCA. A new proposal called Extended LPP (ELPP), which amicably resolves both issues mentioned above, is introduced. In particular, a new weighting scheme is designed that gives importance to data points at a moderate distance, in addition to the nearest points. This helps to resolve the ambiguity occurring in the overlapping regions as well as to increase the reducibility capacity. LPP is used in a variety of dimensionality reduction applications, one of which is Face Recognition. Face Recognition is one of the most widely used biometric technologies for person identification. Face images are represented as high-dimensional pixel arrays, and due to the high correlation between neighbouring pixel values they often belong to an intrinsically low-dimensional manifold. The distribution of data in a high-dimensional space is non-uniform and is generally concentrated around some kind of low-dimensional structure. Hence, one way of performing Face Recognition is to reduce the dimensionality of the data and find the subspace of the manifold in which face images reside. Both LPP and ELPP are used for Face and Expression Recognition tasks. As the aim is to separate the clusters in the embedded space, class membership information may add more discriminating power.
With this in mind, the proposal is further extended to a supervised version of LPP (SLPP) that uses the known class labels of data points to enhance the discriminating power while inheriting the properties of ELPP.

Item Open Access
Fingerprint image preprocessing for robust recognition (Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Munshi, Paridhi; Mitra, Suman K

The fingerprint is the oldest and most widely used form of biometric identification. Since fingerprints are mainly used in forensic science, accuracy in fingerprint identification is highly important. This accuracy depends on the quality of the image. Most fingerprint identification systems are based on minutiae matching, and a critical step in correct matching of fingerprint minutiae is to reliably extract minutiae from the fingerprint images. However, fingerprint images may not be of good quality. They may be degraded and corrupted due to variations in skin, pressure and impression conditions. Most feature extraction algorithms work on binary images instead of the gray-scale image, and the results of the feature extraction depend on the quality of the binary image used. Keeping these points in mind, image preprocessing including enhancement and binarization is proposed in this work. This preprocessing is employed prior to minutiae extraction to obtain a more reliable estimation of minutiae locations and hence a robust matching performance. In this dissertation, we give an introduction to the fingerprint structure and the identification system. A discussion of the proposed methodology and the implementation of a technique for fingerprint image enhancement is given. Then a rough-set based method for binarization is proposed, followed by a discussion of the methods for minutiae extraction.
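The rough-set based binarization is the thesis's own contribution and is not reproduced here. As a minimal baseline sketch of the binarization step that such preprocessing pipelines commonly start from, a block-wise local-mean threshold might look like the following (the block size is an illustrative assumption):

```python
import numpy as np

def binarize_local_mean(img, block=16):
    """Adaptive binarization sketch: threshold each block by its local mean.

    img: 2-D grayscale fingerprint image (uint8 or float).
    block: side length of the square neighbourhood (assumed value).
    Dark ridge pixels map to 1, brighter valley pixels to 0.
    """
    img = img.astype(float)
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = img[y:y + block, x:x + block]
            # Pixels darker than the local mean are treated as ridge.
            out[y:y + block, x:x + block] = (patch < patch.mean()).astype(np.uint8)
    return out
```

A local (rather than global) threshold adapts to uneven pressure and illumination across the print, which is the practical motivation for adaptive binarization in this context.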
Experiments are conducted on real fingerprint images to evaluate the performance of the implemented techniques.

Item Open Access
Person identification using face and speech (Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Parmar, Ajay; Joshi, Manjunath V.

In this thesis, we present a multimodal biometric system using face and speech features. A multimodal biometric system uses two or more intrinsic physical or behavioural traits to provide a better recognition rate than unimodal biometric systems. Face recognition is built using principal component analysis (PCA) and Gabor filters: PCA is applied to the Gabor filter bank response of the face images. Speaker recognition is built using amplitude modulation - frequency modulation (AM-FM) features, which are the weighted instantaneous frequencies of the analytic signal. Finally, a weighted sum of the scores of the face and speaker recognition systems is used for person identification. The performance of our system is evaluated using the ORL database for face images and the ELSDSR database for speech. Experimental results show a better recognition rate for the multimodal system when compared to unimodal systems.

Item Open Access
Person recognition from their hum (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Madhavi, Maulik C.; Patil, Hemant A.

In this thesis, the design of a person recognition system based on a person's hum is presented. As humming is a nasalized sound and the LP (Linear Prediction) model does not characterize nasal sounds sufficiently, our approach in this work is based on Mel filterbank-based cepstral features for the person recognition task. The first task consisted of data collection and corpus design for humming. For this purpose, hummed renditions of old Hindi songs from around 170 subjects are used. Then feature extraction schemes were developed. The Mel filterbank follows human auditory perception, so MFCC was used as the state-of-the-art feature set.
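The standard MFCC pipeline mentioned above (framing, windowing, power spectrum, triangular mel filterbank, log, DCT) might be sketched as below. All parameter values (sampling rate, filter count, frame length, hop) are illustrative assumptions, not the thesis's settings:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs=8000, n_filters=24, n_ceps=13, frame_len=256, hop=128):
    """MFCC sketch: frame, window, mel-warp the power spectrum,
    and take the DCT of the log filterbank energies."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular filters equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for j in range(l, c):
            fbank[i, j] = (j - l) / max(c - l, 1)  # rising edge
        for j in range(c, r):
            fbank[i, j] = (r - j) / max(r - c, 1)  # falling edge
    log_energy = np.log(power @ fbank.T + 1e-10)
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

Variants such as IMFCC or GMFCC would change only the warping of the filterbank centre frequencies; the rest of the pipeline stays the same.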
Then modifications were made to the filterbank structure in order to compute Gaussian Mel scale-based MFCC (GMFCC) and Inverse Mel scale-based MFCC (IMFCC) feature sets. In this thesis, two features are mainly explored. The first feature set captures phase information via MFCC utilizing the VTEO (Variable length Teager Energy Operator) in the time domain, i.e., MFCC-VTMP; the second captures vocal-source information and is called Variable length Teager Energy Operator based MFCC, i.e., VTMFCC. The proposed feature set MFCC-VTMP has two characteristics, viz., it captures phase information and it exploits the properties of VTEO. VTEO is an extension of the TEO and is a nonlinear energy tracking operator. Feature sets like VTMFCC capture vocal-source information, which reflects the excitation mechanism in the speech (hum) production process and is found to be complementary to the vocal tract information. Hence, a score-level fusion of different source and system features improves the person recognition performance.

Item Open Access
Human action recognition in video (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Kumari, Sonal; Mitra, Suman K.

Action recognition is a central problem in computer vision. An action is any meaningful movement of a human body, used to convey information or to interact naturally without any mechanical devices. It is of utmost importance in designing an intelligent and efficient human-computer interface. The applications of action recognition are manifold, ranging from sign language through medical rehabilitation to virtual reality. Human action recognition is motivated by applications such as video retrieval, human-robot interaction, and interaction with deaf and mute people.
In any action recognition system, a video stream is captured by a fixed camera, which may be mounted on the computer or elsewhere. Some preprocessing steps are then performed to remove noise caused by illumination effects, blurring, false contours, etc. Background subtraction is done to remove the static or slowly varying background. In this thesis, multiple background subtraction algorithms are tested and one of them is selected for the action recognition system. Background subtraction is also known as foreground/background segmentation, background segmentation or foreground extraction; these terms are used interchangeably in this thesis. The background segmentation algorithm is selected on the basis of the results of these algorithms on the action database, since a good background segmentation result provides a more robust basis for object class recognition. The following four methods for extracting the foreground are tested: (1) frame difference, (2) background subtraction, (3) adaptive Gaussian mixture model (adaptive GMM) [25], and (4) improved adaptive Gaussian mixture model (improved adaptive GMM) [26], of which the last gives the best result. The action region can then be extracted from the original video sequences with the help of the extracted foreground object. The next step is feature extraction, which extracts important features (such as corner points, optical flow, shape and motion vectors) from the image frame for tracking across the video frame sequence. Feature reduction is an optional step which reduces the dimension of the feature vector. To recognize actions, any learning and classification algorithm can be employed: the system is trained using a training dataset, and a new video can then be classified according to the action occurring in the video.
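A minimal sketch of the first two foreground extraction methods listed above, frame differencing and a simple running-average background subtraction, could look like this (the threshold and learning rate are illustrative assumptions):

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, thresh=25):
    """Method (1) sketch: foreground mask by thresholded absolute
    difference between consecutive frames (thresh is an assumed value)."""
    diff = np.abs(curr_frame.astype(int) - prev_frame.astype(int))
    return (diff > thresh).astype(np.uint8)

def running_average_background(frames, alpha=0.05, thresh=25):
    """Method (2) sketch: maintain a running-average background model
    and flag pixels that deviate from it (alpha is an assumed value)."""
    bg = frames[0].astype(float)
    masks = []
    for f in frames[1:]:
        f = f.astype(float)
        masks.append((np.abs(f - bg) > thresh).astype(np.uint8))
        bg = (1 - alpha) * bg + alpha * f  # slowly adapt to the scene
    return masks
```

The adaptive GMM variants cited in the text generalise this by modelling each pixel with a mixture of Gaussians instead of a single running mean.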
The following three features are applied for the action recognition task: (1) the distance between the centroid and corner points, (2) optical flow motion estimation [28, 29], and (3) the discrete Fourier transform (DFT) of small image blocks. Among these, the proposed DFT feature plays a very important role in uniquely identifying a specific action from the database; the proposed novel action recognition model uses the DFT of small image blocks. For the experimentation, MuHAVi data [33] and DA-IICT data are used, which include various kinds of actions performed by various actors. The following two supervised recognition techniques are used: K-nearest neighbour (KNN) and a classifier using the Mahalanobis metric. KNN is a parameterized classification technique in which the parameter K must be optimized, whereas the Mahalanobis classifier is non-parameterized, so no parameter optimization is needed. To check the accuracy of the proposed algorithm, sensitivity and false alarm rate tests are performed; their results show that the proposed algorithm is quite accurate at recognizing actions in video. To compare the results of the recognition system, confusion matrices are created and compared with those of other recognition techniques. All the experiments are performed in MATLAB®.
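The block-DFT feature highlighted above could be sketched as follows; the block size and the number of retained low-frequency coefficients are illustrative assumptions, not the thesis's settings:

```python
import numpy as np

def dft_block_feature(frame, block=8, n_coeffs=4):
    """Sketch of a block-DFT feature: take the 2-D DFT of each small
    image block and keep the low-frequency magnitude coefficients
    (block size and n_coeffs are assumed values)."""
    h, w = frame.shape
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            F = np.fft.fft2(frame[y:y + block, x:x + block])
            mag = np.abs(F)
            # Low frequencies sit in the top-left corner of the spectrum.
            feats.append(mag[:n_coeffs, :n_coeffs].ravel())
    return np.concatenate(feats)
```

Keeping only low-frequency magnitudes yields a compact descriptor of the coarse shape of motion in each block, which is what makes such features discriminative across actions.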
Item Open Access
Eye localization in video: a hybrid approach (Dhirubhai Ambani Institute of Information and Communication Technology, 2010) Kansara, Bena; Mitra, Suman K.

Locating the eyes is an important step for operations such as orientation correction, which are necessary pre-processes for face recognition. As the eyes are one of the main features of the human face, the success of facial feature analysis and face recognition depends greatly on eye detection. It is advantageous to detect the eyes before other facial features because the positions of the other facial features can be estimated from the eye positions and the golden ratio. Since the relative position of the eyes and the interocular distance are nearly constant across individuals, eye localization is also useful in face normalization. Hence, eye localization is a very important component of any face recognition system. Various approaches to eye localization have been proposed; they can be classified as feature-based, template-based and appearance-based approaches. Feature-based methods explore eye characteristics - such as edges and the intensity of the iris - to identify distinctive features around the eyes. In template-based methods, a generic model of the eye shape is designed; this template is then matched to the face image pixel by pixel to find the eyes. Appearance-based methods detect eyes based on their photometric appearance. Template-based and appearance-based methods can detect eyes accurately but are inefficient in terms of time, while feature-based methods are efficient but less accurate. So, by combining a feature-based method with a template-based or appearance-based method, better results can be obtained. In the proposed algorithm, we combine the feature-based eye LEM approach proposed by Mihir Jain, Suman K. Mitra and Naresh D. Jotwani in 2008 and the appearance-based Bayesian classifier approach proposed by Everingham M. and Zisserman A. in 2006 to achieve eye localization.
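The pixel-by-pixel template matching described above can be sketched as an exhaustive normalised cross-correlation search. This is a generic sketch of the template-matching idea, not the cited LEM or Bayesian method:

```python
import numpy as np

def match_template(image, template):
    """Slide the template over the image and return the location with
    the highest normalised cross-correlation (generic sketch)."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum()) + 1e-10
    best, best_pos = -np.inf, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            score = (p * t).sum() / (np.sqrt((p ** 2).sum()) * t_norm + 1e-10)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best
```

The exhaustive scan is what makes template-based methods accurate but slow, which is exactly the trade-off the hybrid approach addresses by letting a cheap feature-based stage narrow the search region first.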
The work on localizing eyes in video is motivated by applications where eye localization serves a very useful purpose, such as detecting drowsiness in a person driving a car, or eye-based control of computer systems for people with motor difficulties. To carry out eye localization, after some preprocessing, which includes separating the frames from the video and converting them into gray-scale images, the proposed algorithm is applied to each of these frames. For the experimentation, we have taken videos of a few people in the normal blink condition as well as in the sleepy condition. All the videos were taken in the lab environment. To check the accuracy of the proposed algorithm, we have performed various tests, namely, the Wilcoxon signed rank test, the Mann-Whitney U test, the Kolmogorov-Smirnov test, and sensitivity and false alarm rate tests. The results of these tests show that the proposed algorithm is quite accurate in localizing the eyes in a video. All the experiments have been carried out in MATLAB.
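The sensitivity and false alarm rate figures used in the evaluations above reduce to simple ratios of confusion-matrix counts; a minimal sketch:

```python
def sensitivity_false_alarm(tp, fn, fp, tn):
    """Sensitivity (true positive rate) and false alarm rate
    (false positive rate) from confusion-matrix counts:
    sensitivity = TP / (TP + FN), false alarm = FP / (FP + TN)."""
    sensitivity = tp / (tp + fn)
    false_alarm = fp / (fp + tn)
    return sensitivity, false_alarm
```

For example, 8 correctly localized eye frames out of 10 true positives and 1 false detection out of 10 negative frames would give a sensitivity of 0.8 and a false alarm rate of 0.1.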