M Tech Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/3

  • ItemOpen Access
    Speaker recognition over VoIP network
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Goswami, Parth A.; Patil, Hemant A.
    This thesis deals with Automatic Speaker Recognition (ASR) over narrowband Voice over Internet Protocol (VoIP) networks. A VoIP network introduces several artifacts, such as the speech codec, packet loss, packet re-ordering, network jitter, and echo. This thesis takes packet loss as its research issue and investigates the performance degradation it causes in an ASR system. As voice packets travel over an Internet Protocol (IP) network, they tend to take different routes. Some are dropped by the channel due to congestion and some are rejected by the receiver. This packet loss reduces the perceptual quality of speech, so it is natural to expect that it may also affect the performance of an ASR system. To alleviate this degradation, novel interleaving schemes and a lossy training method are proposed. The present work shows that these interleaving schemes and the lossy training method significantly improve the performance of an ASR system.
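
The abstract does not describe the proposed interleaving schemes, so the following is only a minimal sketch of a generic block interleaver, assuming a hypothetical frame list and a hypothetical burst of lost transmission slots. It illustrates the general idea: after de-interleaving, a burst of consecutive lost packets becomes isolated single-frame gaps.

```python
def interleave_order(n_frames, depth):
    """Transmission order: frame indices taken with a stride of `depth`."""
    return [i for start in range(depth) for i in range(start, n_frames, depth)]


def send_with_burst_loss(frames, depth, lost_slots):
    """Interleave, drop the transmission slots in `lost_slots`, de-interleave.

    Frames that never arrive are returned as None in their original position.
    """
    order = interleave_order(len(frames), depth)
    received = [None] * len(frames)
    for slot, original_index in enumerate(order):
        if slot not in lost_slots:
            received[original_index] = frames[original_index]
    return received


def longest_gap(received):
    """Length of the longest run of consecutive missing frames."""
    longest = current = 0
    for frame in received:
        current = current + 1 if frame is None else 0
        longest = max(longest, current)
    return longest


frames = list(range(12))   # hypothetical speech frames, one per packet
burst = {3, 4, 5}          # hypothetical burst: three consecutive slots lost

plain = send_with_burst_loss(frames, depth=1, lost_slots=burst)   # no interleaving
spread = send_with_burst_loss(frames, depth=4, lost_slots=burst)  # stride of 4

print(longest_gap(plain), longest_gap(spread))
```

With `depth=1` the three lost frames are adjacent; with `depth=4` the same burst removes frames that are four positions apart, which is typically easier for a recognizer or a concealment step to tolerate.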
  • ItemOpen Access
    Person recognition from their hum
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Madhavi, Maulik C.; Patil, Hemant A.
    This thesis presents the design of a person recognition system based on a person's hum. Because a hum is a nasalized sound and the LP (Linear Prediction) model does not characterize nasal sounds sufficiently, the approach in this work uses Mel filterbank-based cepstral features for the person recognition task. The first task consisted of data collection and corpus design for humming. For this purpose, humming of old Hindi songs by around 170 subjects is used. Feature extraction schemes were then developed. The Mel filterbank follows human auditory perception, so MFCC was used as the state-of-the-art feature set. The filterbank structure was then modified to compute Gaussian Mel scale-based MFCC (GMFCC) and Inverse Mel scale-based MFCC (IMFCC) feature sets. The thesis mainly explores two features. The first feature set, MFCC-VTMP, captures phase information via MFCC using the VTEO (Variable length Teager Energy Operator) in the time domain. The second, Variable length Teager Energy Operator based MFCC (VTMFCC), captures vocal-source information. The proposed MFCC-VTMP feature set has two characteristics: it captures phase information, and it uses the properties of the VTEO. The VTEO is an extension of the TEO and is a nonlinear energy tracking operator. Feature sets like VTMFCC capture vocal-source information, which reflects the excitation mechanism in the speech (hum) production process. This information is found to be complementary to the vocal tract information, so score-level fusion of the different source and system features improves person recognition performance.
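
The abstract names the VTEO but does not give its formula. The sketch below assumes the commonly stated discrete form, x[n]^2 - x[n-i] * x[n+i], where i = 1 gives the ordinary TEO; the parameter name `lag` and the test tone are illustrative assumptions, not taken from the thesis. For a pure tone A*cos(w*n) this operator returns the constant A^2 * sin^2(i*w), which is why it is described as an energy tracking operator.

```python
import math


def vteo(signal, lag=1):
    """Assumed VTEO form: x[n]^2 - x[n-lag] * x[n+lag].

    Only samples with both neighbours available are returned,
    so the output is 2 * lag samples shorter than the input.
    """
    return [
        signal[n] ** 2 - signal[n - lag] * signal[n + lag]
        for n in range(lag, len(signal) - lag)
    ]


amplitude, omega = 2.0, 0.3   # hypothetical tone parameters
tone = [amplitude * math.cos(omega * n) for n in range(64)]

teo_energy = vteo(tone, lag=1)    # ordinary TEO
vteo_energy = vteo(tone, lag=3)   # variable-length version with lag 3

# Both outputs are constant over time for a pure tone.
print(teo_energy[0], amplitude ** 2 * math.sin(omega) ** 2)
print(vteo_energy[0], amplitude ** 2 * math.sin(3 * omega) ** 2)
```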
  • ItemOpen Access
    Human action recognition in video
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Kumari, Sonal; Mitra, Suman K.
    Action recognition is a central problem in computer vision and is closely related to object detection. An action is any meaningful movement of a human, used to convey information or to interact naturally without any mechanical devices. It is of utmost importance in designing an intelligent and efficient human–computer interface. The applications of action recognition are manifold, ranging from sign language through medical rehabilitation to virtual reality. Human action recognition is motivated by applications such as video retrieval, human-robot interaction, and interaction with people who are deaf or speech-impaired. In any action recognition system, a video stream can be captured using a fixed camera, which may be mounted on the computer or elsewhere. Preprocessing steps then remove the noise caused by illumination effects, blurring, false contours, etc. Background subtraction removes the static or slowly varying background. In this thesis, multiple background subtraction algorithms are tested and one of them is selected for the action recognition system. Background subtraction is also known as foreground/background segmentation, background segmentation, or foreground extraction; these terms are used interchangeably in this thesis. The background segmentation algorithm is selected on the basis of the results of these algorithms on the action database. A good background segmentation result provides a more robust basis for object class recognition. Four methods for extracting the foreground are tested: (1) Frame Difference, (2) Background Subtraction, (3) Adaptive Gaussian Mixture Model (Adaptive GMM) [25], and (4) Improved Adaptive Gaussian Mixture Model (Improved Adaptive GMM) [26], of which the last gives the best result. The action region can then be extracted from the original video sequences with the help of the extracted foreground object.

    The next step is feature extraction, which extracts important features (such as corner points, optical flow, shape, and motion vectors) from the image frame that can be used for tracking across the video frame sequence. Feature reduction is an optional step that reduces the dimension of the feature vector. To recognize actions, any learning and classification algorithm can be employed. The system is trained using a training dataset; a new video can then be classified according to the action occurring in it. Three features are applied to the action recognition task: (1) the distance between the centroid and a corner point, (2) optical flow motion estimation [28, 29], and (3) the discrete Fourier transform (DFT) of an image block. Among these, the proposed DFT feature plays a very important role in uniquely identifying a specific action from the database. The proposed novel action recognition model uses the DFT of a small image block.
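
The abstract does not state the block size or which DFT coefficients form the feature vector. As a rough illustration only, the sketch below assumes a hypothetical 4x4 grayscale block and uses the magnitudes of all 2-D DFT coefficients as the feature; the function name and the two sample blocks are made up for the example.

```python
import cmath


def dft2_magnitudes(block):
    """Magnitudes of the 2-D DFT of a small block given as a list of rows.

    Coefficients are returned row by row, so index u * cols + v holds |F(u, v)|.
    A direct O(N^4) sum is used, which is fine for very small blocks.
    """
    rows, cols = len(block), len(block[0])
    features = []
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * cmath.pi * (u * y / rows + v * x / cols)
                    total += block[y][x] * cmath.exp(1j * angle)
            features.append(abs(total))
    return features


flat = [[5, 5, 5, 5] for _ in range(4)]      # uniform block: energy only at DC
stripes = [[0, 9, 0, 9] for _ in range(4)]   # vertical stripes: DC plus F(0, 2)

flat_features = dft2_magnitudes(flat)
stripe_features = dft2_magnitudes(stripes)

print([round(f, 3) for f in flat_features[:4]])
print([round(f, 3) for f in stripe_features[:4]])
```

The two blocks have similar mean intensity but clearly different spectra, which hints at why block-wise DFT magnitudes can separate different local appearance and motion patterns.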

    For the experiments, MuHAVi data [33] and DA-IICT data are used, which include various kinds of actions performed by various actors. Two supervised recognition techniques are used: K-nearest neighbor (KNN) and a classifier using the Mahalanobis metric. KNN is a parameterized classification technique in which the parameter K must be optimized. The Mahalanobis classifier is a non-parameterized classification technique, so no parameter optimization is needed. To check the accuracy of the proposed algorithm, sensitivity and false alarm rate tests are performed. The results of these tests show that the proposed algorithm is quite accurate in recognizing actions in video. To compare the results of the recognition system, confusion matrices are created and compared with those of other recognition techniques. All the experiments are performed in MATLAB®.
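
To illustrate why K has to be tuned for KNN, here is a minimal sketch with Euclidean distance and majority voting. The thesis experiments were run in MATLAB; this Python version, the 2-D feature vectors, and the labels "walk" and "run" are hypothetical stand-ins, and the Mahalanobis classifier is not shown.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training samples nearest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features for two action classes.
train = [
    ((0.0, 0.0), "walk"),
    ((0.2, 0.1), "walk"),
    ((0.1, 0.3), "walk"),
    ((1.0, 1.0), "run"),
    ((0.9, 1.2), "run"),
    ((0.45, 0.45), "run"),   # a "run" sample lying close to the "walk" cluster
]
query = (0.4, 0.4)

label_k1 = knn_predict(train, query, k=1)
label_k3 = knn_predict(train, query, k=3)
print(label_k1, label_k3)
```

The same query gets a different label for K = 1 and K = 3, so in practice K is usually chosen by validation on held-out data.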

  • ItemOpen Access
    Vehicle detection and tracking
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2009) Rao, K. Ramprasad; Joshi, Manjunath V.
    Real-time traffic monitoring is one of the most challenging problems in machine vision. It is one of the most sought-after research topics because of the wide spectrum of promising applications in areas such as smart surveillance and military applications. We present a method of extracting moving targets from a real-time video stream. This approach detects and classifies vehicles in image sequences of traffic scenes recorded by a stationary camera. Our method aims at segregating cars from non-cars and tracking them through the video sequence. A classification criterion based on the features is applied to these targets to classify them into two categories: cars and non-cars. Each vehicle can be described by its features. The template region is estimated by means of a minimum distance approach with respect to the centroid of the obtained blob of the target. Extraction of features from each frame ensures the efficiency of the tracking system.
  • ItemOpen Access
    Study of face recognition systems
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2006) Patel, Hima M.; Mitra, Suman K.
    Face recognition comes under the general area of object recognition and has attracted researchers in the pattern recognition community for the past thirty years. The significance of this area has grown rapidly, largely for surveillance purposes. This thesis is a study of face recognition techniques. Three new algorithms have been proposed, implemented, and tested using standard databases, and encouraging results have been obtained for all of them. The first algorithm uses modular Principal Component Analysis (PCA) for feature extraction and a multi-class SVM classifier for classification. The algorithm has been tested on frontal face images and on face images with variations in expression, pose, and illumination conditions. Experimental results show 100% classification accuracy on frontal faces, 95% accuracy on expression variation images, 78% on pose variation images, and 67% on illumination variation images. The next algorithm concentrates solely on the illumination variation problem. An edginess method based on one-dimensional processing of signals is used to extract an edginess map. Applying PCA to the edginess images gives the weight vectors, which are used as features for a multi-class SVM classifier. An accuracy of 100% has been obtained, showing the method to be tolerant to illumination variations. The final part of the thesis proposes a Bayesian framework for face recognition. The nodes of the Bayesian classifier are modelled as Gaussian Mixture Models (GMMs), and the parameters of the nodes are learnt using the Maximum Likelihood Estimation (MLE) algorithm. Inference is done using the junction tree inference algorithm. An accuracy of 93.75% has been achieved.
  • ItemOpen Access
    Fractal based approach for face recognition
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2004) Athale, Suprita; Mitra, Suman K.
    An automated face recognition system is proposed in this dissertation. The system efficiently recognizes a candidate (test) image using the interdependence of the pixels that arises from the fractal compression of the image. This interdependence is inherent in the fractal code in the form of chains of pixels. The mechanism of capturing these chains from the fractal codes is called pixel chaining. The present face recognition system tries to match the pixel chains of the candidate image with those of the images in the database. The system works on the fractal codes rather than on the images themselves, which is an advantage when handling large databases of face images.

    The system performance is found to be very satisfactory, with a recognition rate of 98.4%. A minor improvement in performance over a few existing methods has been observed.