Affiliations: [a] Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur - 721 302, India. E-mails: [email protected], [email protected]
| [b] Cisco Systems, Inc., Bangalore, India. E-mail: [email protected]
Abstract: In this work, we present a domain-specific Information Retrieval (IR) system that identifies the topics of queries and documents and uses them to improve document retrieval. We focus on retrieving documents that contain the same specific type of information as the user query in the tourism domain. From our past experience in handling tourism-specific information, we observed that query intent in the tourism domain largely spans a few major types. Based on this observation, we present an approach to document retrieval based on query and document type identification. To this end, we identified the major types (topics) in the tourism domain and built an ontology of the domain. We developed a document classifier to identify the topic of web documents and a query classifier to identify the topic of the user query, both pertaining to the tourism domain. The proposed IR system performs retrieval by matching the type of the user query with the type of the documents. The experimental results show that tourism-specific topic identification of queries and documents improves the retrieval of documents containing more specific information that satisfies user queries in the tourism domain.
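The type-matching idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the topic names, the keyword lists in `TOPIC_KEYWORDS`, the keyword-counting `classify` function, and the term-overlap ranking in `retrieve` are hypothetical stand-ins, not the classifiers, ontology, or ranking method used in this work.

```python
# Hypothetical topics and keywords; the actual tourism topics and ontology
# of the paper are not reproduced here.
TOPIC_KEYWORDS = {
    "accommodation": {"hotel", "hostel", "room", "stay"},
    "transport": {"train", "bus", "flight", "taxi"},
    "food": {"restaurant", "cuisine", "food", "eat"},
}


def tokenize(text):
    """Lowercase the text and split on whitespace, stripping basic punctuation."""
    return [word.strip(".,?!").lower() for word in text.split()]


def classify(text):
    """Toy stand-in for a topic classifier: return the topic with the most
    keyword hits, or None when no keyword occurs at all."""
    tokens = tokenize(text)
    counts = {
        topic: sum(1 for word in tokens if word in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def retrieve(query, documents):
    """Keep documents whose topic matches the query topic, then rank the
    survivors by the number of terms shared with the query."""
    query_topic = classify(query)
    query_terms = set(tokenize(query))
    scored = []
    for doc in documents:
        # If the query has no recognizable topic, skip the topic filter.
        if query_topic is not None and classify(doc) != query_topic:
            continue
        overlap = len(query_terms & set(tokenize(doc)))
        if overlap > 0:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


if __name__ == "__main__":
    docs = [
        "Cheap hotel rooms near Darjeeling station",
        "Train and bus timings to Darjeeling",
        "Best restaurant food in Darjeeling",
    ]
    print(retrieve("hotel in Darjeeling", docs))
```

In this toy setup, all three documents mention the destination, but only the one classified under the same topic as the query survives the filter, which is the effect the type-matching step is meant to achieve.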