Theses and Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1
Entity Based Query Processing For Retrieval And Summarization In Biomedical Domain (Open Access)
Sankhavara, Jainisha; Majumder, Prasenjit. Dhirubhai Ambani Institute of Information and Communication Technology, 2021.

The exponential growth of the biomedical literature poses new challenges for search. Addressing users' complex information needs requires rigorous semantic processing of biomedical text, and biomedical information access has emerged as a discipline for this reason. Traditional information access methods, such as matching, ranking, entity processing and entity-entity relationship processing, are challenged in this domain, yet they are the main building blocks used to frame queries that represent complex information needs in biomedical and clinical information access. This thesis applies different IR and bioNLP techniques to query processing and studies their effects on retrieval and summarization. Several biomedical query reformulation techniques are implemented and compared for biomedical document retrieval. Query expansion, one such technique, is carried out using relevance feedback and pseudo relevance feedback. Relevance feedback uses documents known to be relevant to the query, whereas pseudo relevance feedback has no such information and instead uses the top retrieved documents, assuming they are relevant. A combined approach based on feedback document discovery is proposed: it applies classification and clustering techniques to biomedical documents to identify good feedback documents, using relevance feedback for some documents and learning the relevance of the others.
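The difference between relevance feedback and pseudo relevance feedback comes down to where the feedback documents come from. The sketch below illustrates this under simplifying assumptions: a whitespace tokenizer with a tiny stopword list, a toy query-term-overlap ranker, and expansion terms weighted by length-normalised frequency in the feedback documents. All function names are hypothetical, and this is a generic feedback-based expansion, not the feedback document discovery method proposed in the thesis.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "of", "and", "in", "a", "for", "with", "is"}


def tokenize(text):
    # Lowercase and split on whitespace, dropping stopwords.
    return [t for t in text.lower().split() if t not in STOPWORDS]


def overlap_score(query_terms, doc):
    # Toy retrieval score: number of query-term occurrences in the document.
    tokens = tokenize(doc)
    return sum(tokens.count(t) for t in query_terms)


def expand_from_feedback(query_terms, feedback_docs, num_terms):
    # Weight each non-query term by its length-normalised frequency,
    # summed over all feedback documents, then append the top `num_terms`.
    weights = Counter()
    for doc in feedback_docs:
        tokens = tokenize(doc)
        if not tokens:
            continue
        for term, tf in Counter(tokens).items():
            if term not in query_terms:
                weights[term] += tf / len(tokens)
    ranked = sorted(weights.items(), key=lambda tw: (-tw[1], tw[0]))
    return query_terms + [term for term, _ in ranked[:num_terms]]


def relevance_feedback(query, judged_relevant, num_terms=2):
    # RF: feedback documents are ones a user has judged relevant to the query.
    return expand_from_feedback(tokenize(query), judged_relevant, num_terms)


def pseudo_relevance_feedback(query, collection, k=2, num_terms=2):
    # PRF: no judgements are available, so the top-k retrieved documents
    # are assumed to be relevant and used as feedback.
    query_terms = tokenize(query)
    top_k = sorted(collection, key=lambda d: -overlap_score(query_terms, d))[:k]
    return expand_from_feedback(query_terms, top_k, num_terms)


docs = [
    "BRAF mutation melanoma vemurafenib therapy",
    "melanoma BRAF inhibitor vemurafenib",
    "diabetes insulin therapy",
]
print(pseudo_relevance_feedback("BRAF melanoma", docs))
# ['braf', 'melanoma', 'vemurafenib', 'inhibitor']
```

Both variants share the same term-selection step; only the source of feedback documents changes, which is the point where a feedback document discovery step would plug in.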
This feedback document discovery based query expansion improves biomedical document retrieval over relevance feedback based query expansion. An improved version, in which entity features are weighted according to entity type and the query, is also proposed and further improves retrieval over the version without feature weighting. Automatic feedback-based query expansion depends on two components: feedback document selection and feedback term selection. In the biomedical domain, medical entities are more meaningful than surface words, so entity-based processing is necessary for any application in this domain. The thesis also surveys advances in biomedical entity identification, covering the identification process, challenges identified by the community, available resources, approaches, and a comparison of techniques proposed in the literature. UMLS is a biomedical resource that brings together many health and biomedical vocabularies and standards; it contains categorized biomedical entities and their semantic relations. A novel query expansion technique is proposed that uses UMLS knowledge for feedback term selection and expands queries with biomedical entities. It takes the UMLS entities in a query, together with their related entities in UMLS, and constructs a query-specific graph of biomedical entities from which expansion terms are selected. This approach improves over pseudo relevance feedback and state-of-the-art UMLS-based query reformulation approaches. The amount of information available to clinicians and clinical researchers is growing exponentially.
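A query-specific entity graph for term selection might look roughly like the following. This is a minimal sketch, assuming a hard-coded list of relation pairs as a hypothetical stand-in for UMLS (real relations would come from the licensed UMLS Metathesaurus) and assuming candidates are scored by how many query entities they are linked to; the graph construction and scoring used in the thesis may differ.

```python
from collections import defaultdict

# Hypothetical stand-in for UMLS relations: (entity, related entity) pairs.
# Illustrative only; not taken from the UMLS Metathesaurus.
RELATIONS = [
    ("melanoma", "skin neoplasm"),
    ("melanoma", "braf"),
    ("braf", "vemurafenib"),
    ("melanoma", "vemurafenib"),
    ("braf", "mapk pathway"),
    ("diabetes", "insulin"),
]


def build_query_graph(query_entities, relations):
    # Undirected adjacency sets, keeping only edges that touch a query entity.
    graph = defaultdict(set)
    for a, b in relations:
        if a in query_entities or b in query_entities:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def select_expansion_terms(query_entities, relations, num_terms=2):
    query_set = set(query_entities)
    graph = build_query_graph(query_set, relations)
    # Score each non-query entity by how many query entities it links to,
    # so concepts related to several query concepts are preferred.
    scores = {
        node: len(neighbours & query_set)
        for node, neighbours in graph.items()
        if node not in query_set
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [node for node, _ in ranked[:num_terms]]


print(select_expansion_terms(["melanoma", "braf"], RELATIONS))
# ['vemurafenib', 'mapk pathway']
```

Here "vemurafenib" wins because it is related to both query entities; the unrelated "diabetes"/"insulin" edge never enters the query graph.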
These documents are long, and there are many documents on each topic. Text summarization condenses such documents so that users can quickly grasp the relevant source information. In the biomedical domain, various summarization techniques have been developed in recent years, and text summarization can help medical practitioners with information and knowledge management tasks. This work focuses on query-focused biomedical text summarization, in which the summary should be related to the query. Entity-based processing is incorporated into the summarization process along with word-embedding-based similarity. The aim is to apply query reformulation in summarization and examine how it affects the summaries, in particular whether expanded queries yield better summaries.

Query Processing in Different Domains (Open Access)
Mishra, Sonal; Majumder, Prasenjit. Dhirubhai Ambani Institute of Information and Communication Technology, 2021.

Digital content is exploding in every domain, and the biomedical domain is no exception. As the number of biomedical documents grows, finding potentially relevant medical documents that can help diagnose a particular disease becomes a challenging problem. Medical queries are usually short, often just three to four words, and typically contain a disease name, a genetic variant and a treatment for the disease. Legal queries, by contrast, usually describe a situation, and the retrieved documents belong to the Prior Cases document collections. Various pre-retrieval query expansion methods are explored, such as expansion with word embeddings. The word embeddings are trained on the PubMed articles provided in the document collection. Experiments are performed on the TREC 2018 and TREC 2020 datasets.
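Pre-retrieval expansion with word embeddings can be sketched as adding, for each query term, its nearest neighbours by cosine similarity. The tiny hand-made vectors here are hypothetical placeholders for embeddings trained on PubMed articles, and `expand_with_embeddings` and its `per_term` parameter are illustrative names, not the thesis's implementation.

```python
import math

# Hypothetical toy embeddings standing in for vectors trained on PubMed articles.
EMBEDDINGS = {
    "melanoma": [0.9, 0.1, 0.0],
    "carcinoma": [0.8, 0.3, 0.1],
    "braf": [0.1, 0.9, 0.1],
    "kras": [0.2, 0.8, 0.2],
    "insulin": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand_with_embeddings(query_terms, embeddings, per_term=1):
    expanded = list(query_terms)
    for term in query_terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are left unexpanded
        # Rank words not already in the query by similarity to this term.
        neighbours = sorted(
            (w for w in embeddings if w not in expanded),
            key=lambda w: -cosine(embeddings[term], embeddings[w]),
        )
        expanded.extend(neighbours[:per_term])
    return expanded


print(expand_with_embeddings(["melanoma", "braf"], EMBEDDINGS))
# ['melanoma', 'braf', 'carcinoma', 'kras']
```

Because expansion happens before retrieval, it needs no feedback documents; its quality depends entirely on how well the embedding neighbourhoods reflect domain relatedness.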
The thesis describes these experiments and retrieval systems in detail, along with the intuition behind building the models. In this thesis we propose a cross relevance language model that is effective at finding potentially relevant documents in a biomedical document collection. Experiments on the TREC 2018 and 2019 Precision Medicine tracks and the FIRE AILA 2019 track show that the proposed cross relevance language model is more effective than the existing standard relevance language model for medical document retrieval.
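For reference, a common formulation of the standard relevance language model (often called RM1) estimates a term distribution P(w|R) ∝ Σ_D P(w|D) · Π_q P(q|D) over top-ranked feedback documents D. The sketch below assumes RM1 with Jelinek-Mercer smoothing (λ = 0.5), a uniform prior over feedback documents, and whitespace tokenization; it illustrates only the baseline, not the proposed cross relevance model, whose details the abstract does not give.

```python
from collections import Counter


def relevance_model(query_terms, feedback_docs, collection, lam=0.5, num_terms=3):
    """Rank terms by P(w|R) under RM1 with Jelinek-Mercer smoothing.

    P(w|R) ∝ sum over feedback documents D of P(w|D) * prod_q P(q|D),
    assuming a uniform prior over the feedback documents.
    """
    coll_tokens = [t for doc in collection for t in doc.split()]
    coll_counts = Counter(coll_tokens)
    coll_len = len(coll_tokens)
    vocabulary = set(coll_tokens)

    def p_term_given_doc(term, doc_counts, doc_len):
        # Mix the document's maximum-likelihood estimate with the collection model.
        return (1 - lam) * doc_counts[term] / doc_len + lam * coll_counts[term] / coll_len

    weights = Counter()
    for doc in feedback_docs:
        tokens = doc.split()
        if not tokens:
            continue
        doc_counts = Counter(tokens)
        # The query likelihood P(Q|D) weights this document's contribution.
        query_likelihood = 1.0
        for q in query_terms:
            query_likelihood *= p_term_given_doc(q, doc_counts, len(tokens))
        for w in vocabulary:
            weights[w] += p_term_given_doc(w, doc_counts, len(tokens)) * query_likelihood

    total = sum(weights.values())
    if total == 0:
        return []  # no query term occurs anywhere in the collection
    ranked = sorted(((w, s / total) for w, s in weights.items()), key=lambda ws: (-ws[1], ws[0]))
    return ranked[:num_terms]


collection = ["braf melanoma vemurafenib", "melanoma skin", "insulin diabetes"]
print([w for w, _ in relevance_model(["braf", "melanoma"], collection[:2], collection)])
# ['melanoma', 'braf', 'vemurafenib']
```

In practice the top terms of P(w|R) are added to the original query (the RM3 variant interpolates them with the original query model) before a second retrieval pass.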