Web content outlier detection using latent semantic indexing

Files

200511015.pdf (197.17 KB)

Date

2007

Authors

Paluri, Santosh Kumar

Publisher

Dhirubhai Ambani Institute of Information and Communication Technology

Abstract

Outliers are data elements that differ from the other elements in the category from which they are mined. Finding outliers in web data is known as web outlier mining. This thesis explores web content outlier mining, which has applications in electronic commerce, novelty detection in text, and related areas. Web content outliers are text documents whose contents differ from those of the other documents taken from the same domain. Existing approaches to this problem use lexical matching techniques such as n-grams, which are prone to problems like synonymy (expressing the same concept with different words); this leads to poor recall, an important measure for evaluating a search strategy. In this thesis we use Latent Semantic Indexing (LSI) to represent documents and terms as vectors in a reduced-dimensional space, thereby separating the outlying documents from the rest of the corpus. Experimental results using embedded outliers, reported in chapter four, indicate that the proposed approach is successful and performs better than existing approaches to mining web content outliers.
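
The general idea of projecting documents into a reduced LSI space and looking for documents that sit apart can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not state the term weighting, the number of retained dimensions, or the outlier criterion, so the raw term counts, the rank `k=2`, the "one minus mean cosine similarity" score, the toy corpus, and all function names below are assumptions chosen for illustration. To stay dependency-free, the truncated SVD is obtained from the eigenvectors of the document Gram matrix by power iteration.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Build a raw-count matrix: one row per term, one column per document."""
    tokenized = [Counter(d.lower().split()) for d in docs]
    vocab = sorted({term for counts in tokenized for term in counts})
    return [[counts[term] for counts in tokenized] for term in vocab]


def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # fixed start keeps runs repeatable
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for v.
        lam = sum(v[i] * work[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def lsi_document_vectors(docs, k=2):
    """Document coordinates in a rank-k LSI space.

    If A = U S V^T, then A^T A = V S^2 V^T, so each document j maps to
    (s_1 * v_1[j], ..., s_k * v_k[j]).
    """
    a = term_document_matrix(docs)
    n = len(docs)
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(n)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def outlier_scores(vectors):
    """Assumed criterion: 1 - mean cosine similarity to every other document."""
    scores = []
    for i, u in enumerate(vectors):
        sims = [cosine(u, v) for j, v in enumerate(vectors) if j != i]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


# Hypothetical toy corpus: four documents from one domain, one unrelated.
docs = [
    "car engine repair garage",
    "automobile engine repair",
    "car automobile garage service",
    "engine service garage car",
    "chocolate cake recipe oven",
]

scores = outlier_scores(lsi_document_vectors(docs, k=2))
for score, text in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {text}")
```

On this toy corpus the unrelated fifth document should receive the highest score, while the four vehicle-related documents, including the one that says "automobile" rather than "car", end up close together in the reduced space. A real experiment would likely need a weighting scheme such as tf-idf, a tuned rank, and a proper SVD routine from a numerical library.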

Keywords

Content analysis, Communication, Data mining, Web sites, Web databases, Semantics, Semantics of data, Semantic database models

Citation

Paluri, Santosh Kumar (2007). Web content outlier detection using latent semantic indexing. Dhirubhai Ambani Institute of Information and Communication Technology, vii, 36 p. (Acc.No: T00114)

URI

http://ir.daiict.ac.in/handle/123456789/151

Collections

M Tech Dissertations
