Theses and Dissertations

Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1

Search Results

Now showing 1 - 3 of 3
  • Item (Open Access)
    Estimating depth from monocular video under varying illumination
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2014) Sarupuri, Bhuvaneshwari; Tatu, Aditya
    The ability to perceive depth and reconstruct the 3D surface of a scene is fundamental to many areas of computer vision. Since a 2D image is the projection of a 3D scene onto two dimensions, depth information is lost. Many methods have been introduced to estimate depth from one, two, or multiple images, but most previous work on depth estimation comes from stereo vision. Stereo techniques need two images and a dedicated acquisition setup, and they suffer from difficulties in correspondence and hardware implementation. Many cues can be used to model the relation between depth and image features, and depth can be learned from a single image using multi-scale Markov Random Fields [1]. Here we use Gabor filters to extract a texture-variation cue and improve the depth estimate using shape features. The same approach is used to estimate depth from videos by incorporating temporal coherence. To do this, optical flow is used, and we introduce a novel method of computing optical flow from texture features. Since texture features capture dominant image properties that are almost invariant to illumination, the texture-based optical flow is robust to large uniform illumination changes, which has many applications in outdoor navigation and surveillance.
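    The texture cue described above can be sketched with a small Gabor filter bank. The sketch below is illustrative, not the thesis's implementation: the kernel size, scale, and wavelength are assumed parameters, and the kernel is made zero-mean so that, as the abstract argues, the responses ignore uniform illumination shifts.

    ```python
    import numpy as np

    def gabor_kernel(size, sigma, theta, wavelength):
        """Real-valued, zero-mean Gabor kernel (parameter choices are illustrative)."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)   # rotate into the filter orientation
        yr = -x * np.sin(theta) + y * np.cos(theta)
        g = np.exp(-(xr**2 + yr**2) / (2.0 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)
        return g - g.mean()                          # zero mean: insensitive to uniform brightness

    def convolve_same(image, kernel):
        """FFT-based 'same'-size convolution (circular boundary), numpy only."""
        H, W = image.shape
        kh, kw = kernel.shape
        padded = np.zeros((H, W))
        padded[:kh, :kw] = kernel
        padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # centre the kernel
        return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

    def texture_features(image, n_orientations=4, size=15, sigma=3.0, wavelength=6.0):
        """Stack of Gabor magnitude responses, one map per orientation, as a per-pixel texture cue."""
        thetas = np.arange(n_orientations) * np.pi / n_orientations
        return np.stack([np.abs(convolve_same(image, gabor_kernel(size, sigma, t, wavelength)))
                         for t in thetas])
    ```

    Because each kernel is zero-mean, adding a constant to the whole image leaves every response map unchanged, which is the invariance property the texture-based optical flow relies on.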
  • Item (Open Access)
    Pose estimation from one conic correspondence
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2014) Bhayani, Snehal I.; Tatu, Aditya
    In this thesis we attempt to solve the problem of camera pose estimation from one conic correspondence by exploiting epipolar geometry. We make two assumptions that further simplify the geometry: (a) the scene conic is a circle, and (b) the translation vector lies in a known plane. These assumptions are justified by noting that many artifacts in scenes (especially indoor scenes) contain circles that lie wholly in front of the camera, and that the plane containing the translation vector is quite possibly known. Within the epipolar-geometry framework, a matrix equation relates the camera pose to one conic correspondence and the normal vector of the scene plane. Under the assumptions, we simplify this system of polynomials so that the task of solving seven simultaneous polynomial equations in seven variables reduces to solving only two polynomials in two variables at a time, for which we design a geometric construction. The method yields a finite set of camera-pose solutions. We test our propositions on synthetic datasets and suggest an observation that helps select a unique solution from this finite set. On synthetic data the recovered solution is quite accurate, with an error of about 10^-4; on real datasets the solution is erroneous due to errors in the camera-calibration data we have, a fact we justify through an experiment. Additionally, the formulation of the seven equations relating the pose to the conic correspondence and the scene-plane position helps explain how the relative pose establishes point and conic correspondences between the two images. We then compare the performance of our geometric approach with the conventional approach of optimizing a cost function and show that the geometric approach gives more accurate pose solutions.
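    The conic-correspondence machinery above rests on a standard fact of projective geometry: writing a conic as a symmetric 3x3 matrix C, with homogeneous points x on the conic satisfying x^T C x = 0, a homography H maps the conic to C' = H^-T C H^-1. A minimal numpy check of this rule (the homography values below are arbitrary, chosen only for illustration):

    ```python
    import numpy as np

    # The unit circle x^2 + y^2 - 1 = 0 as a symmetric 3x3 conic matrix
    C = np.diag([1.0, 1.0, -1.0])

    # An arbitrary homography (illustrative values: rotation, translation, mild projectivity)
    H = np.array([[0.9, -0.2, 0.3],
                  [0.2,  0.9, -0.1],
                  [0.05, 0.0, 1.0]])

    # Transformation rule: if points map as x' = H x, the conic maps as C' = H^-T C H^-1
    Hinv = np.linalg.inv(H)
    C_prime = Hinv.T @ C @ Hinv

    # Verify: every mapped circle point satisfies the mapped conic equation
    for t in np.linspace(0, 2 * np.pi, 8, endpoint=False):
        x = np.array([np.cos(t), np.sin(t), 1.0])  # a point on the unit circle
        xp = H @ x
        assert abs(xp @ C_prime @ xp) < 1e-12
    ```

    The check works because x'^T C' x' = x^T H^T H^-T C H^-1 H x = x^T C x = 0; the same algebra underlies relating a scene circle to its two camera images.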
  • Item (Open Access)
    Text description of image
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2013) Sisodiya, Neha; Joshi, Manjunath V.
    An image contains content that is easy for human beings to understand but difficult for a machine to interpret. In this thesis, we propose an algorithm to obtain a textual description of image content: given an image as input, our system automatically generates a text description of that image as output. Our aim is to understand the scenario in an image, i.e., to describe the given image automatically in simple English sentences. The task involves four steps: 1) segmentation, 2) recognition, 3) labelling, and 4) sentence generation. In the first step, segmentation is carried out using a novel active-contour approach to separate the objects from the background; separating object boundaries into the distinct regions present in the image supports the second step, object recognition. Object recognition is the task of detecting and identifying objects in the scene from feature vectors extracted from the image regions. We extract features using SIFT (Scale-Invariant Feature Transform) because of its invariance properties, and use the resulting keypoint descriptors for labelling objects. Our method attempts to recognize occluded and cluttered objects in the image while simultaneously improving segmentation through recognition and vice versa. The next step labels each recognized object, i.e., determines which category it belongs to and associates a label with it, which is needed for the final step, sentence generation; we use an SVM (Support Vector Machine) classifier to classify the objects. The final step, generation, links the labels by their meanings to form meaningful sentences as the output of our system.
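    The labelling and sentence-generation steps can be sketched as follows. This is only an illustrative pipeline, not the thesis's method: a nearest-centroid classifier stands in for the SVM, the 2-D "feature vectors" stand in for SIFT descriptors, and the generator is a toy template rather than a meaning-linking scheme.

    ```python
    import numpy as np

    def label_objects(features, centroids):
        """Step 3 stand-in: assign each region the label of the nearest class centroid
        (the thesis uses an SVM classifier on SIFT descriptors instead)."""
        names = list(centroids)
        M = np.stack([centroids[n] for n in names])                    # (n_classes, d)
        dists = np.linalg.norm(features[:, None, :] - M[None], axis=2) # (n_regions, n_classes)
        return [names[i] for i in dists.argmin(axis=1)]

    def generate_sentence(labels):
        """Step 4 stand-in: a toy template that links labels into one English sentence."""
        if not labels:
            return "The image is empty."
        if len(labels) == 1:
            return f"The image contains a {labels[0]}."
        return ("The image contains " + ", ".join("a " + l for l in labels[:-1])
                + " and a " + labels[-1] + ".")

    # Hypothetical feature vectors for three segmented regions and two known classes
    centroids = {"dog": np.array([1.0, 0.0]), "ball": np.array([0.0, 1.0])}
    regions = np.array([[0.9, 0.1], [0.1, 0.8], [1.1, -0.2]])
    labels = label_objects(regions, centroids)
    print(generate_sentence(labels))  # -> The image contains a dog, a ball and a dog.
    ```

    The point of the sketch is the data flow: per-region feature vectors in, per-region labels out, and a single sentence assembled from those labels.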