M Tech Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/3
Item Open Access Integrating semantics into biomedical information retrieval (Dhirubhai Ambani Institute of Information and Communication Technology, 2015) Thakrar, Fenny; Majumder, Prasenjit
Integrating semantics into biomedical information retrieval is concerned with studying the meaning of concepts and the relationships between them. We use a semantic document representation approach to apply domain-specific knowledge in the information retrieval system. Single-word and multi-word concepts are extracted from each document using an external semantic structure, the UMLS Metathesaurus. Word sense disambiguation is performed on the extracted concepts to resolve their different senses, and the document is then represented in the form of UMLS concepts. The documents and queries are represented in semantic space and fed to an information retrieval system, which ranks the documents against the given query. We performed experiments on the TREC 2014 CDS Task data and its 30 queries. Two retrieval techniques, single-word and multi-word retrieval, were tested. The results obtained using conceptual information retrieval are compared with those obtained using traditional term-based retrieval. The conceptual IR approach outperformed the term-based IR system on the evaluation metrics MAP, P10 and RPrec. Within conceptual IR, single-word retrieval outperformed multi-word retrieval, and the system with query expansion outperformed the system without it.

Item Open Access Text retrieval from the degraded document images (Dhirubhai Ambani Institute of Information and Communication Technology, 2015) Vasani, Hiral; Mitra, Suman K.
Image binarization is used to obtain a black-and-white text document from a colored one. It can be treated as an image segmentation task that separates the text from the background.
Such a black-and-white document can be used in many applications, such as Optical Character Recognition (OCR). Text documents suffer from various types of degradation that make image binarization a challenging task. This thesis presents a technique that segments text from the background. In this method, the document image is first darkened to enhance the text (foreground). The image is also processed separately to suppress the background. The two resulting images are combined so that the suppressed background is taken from the second image and the enhanced text from the first. This pre-processed image is then binarized using an existing thresholding technique. The initial binarized image is post-processed to remove unwanted small components and other noise. The output image is compared with the ground truth using several evaluation measures, and the results of the algorithm are compared with those of existing binarization techniques.

Item Open Access Summarizing medical texts for effective retrieval (Dhirubhai Ambani Institute of Information and Communication Technology, 2015) Iyer, Ganesh R; Majumder, Prasenjit
User-centered health information retrieval is a challenging and important problem in information retrieval. In this work, we apply medical resources to bridge the vocabulary mismatch between lay users and medical documents. We also apply text summarization techniques to reduce each document to its relevant information while pruning irrelevant information. We provide a survey of medical resources and of the application of text summarization in information retrieval. The primary research goals were to investigate the use of medical resources in query expansion and of text summarization in indexing. The experiments were performed as part of a CLEF eHealth Task, an overview of which is provided.
From our experiments we observed that a summarized index can replace a full collection index. A compression rate of 40-80% outperformed the baseline, indicating that retrieval on the summarized collection can indeed improve performance. Using MeSH (Medical Subject Headings) as a thesaurus to supplement the query terms improved retrieval for certain queries. We obtained the best MAP score among all teams, 0.415, using query expansion with discharge summaries.
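
Note on the MAP metric reported in the abstracts above: mean average precision is the mean, over queries, of the average precision of each ranked result list. The following is a minimal Python sketch of that standard definition, not the evaluation code used in any of these dissertations. The function names, document ids and query ids are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the relevant doc ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    # Relevant docs that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items())
    return total / len(qrels)


# Hypothetical toy data: two queries with made-up document ids.
example_runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
example_qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}}
print(round(mean_average_precision(example_runs, example_qrels), 3))  # 0.667
```

In the toy data, q1 has average precision (1/1 + 2/3) / 2 and q2 has 1/2, so the mean is 2/3.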