Affiliations: National Institute of Information and Communications
Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, 619-0289 Kyoto, Japan | University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, 113-8656
Tokyo, Japan
Note: Corresponding author. The main part of this work was done when
the author was at the University of Tokyo. Tel.: +81 77498 6828; Fax: +81 77498
6960; E-mail: [email protected]
Abstract: With the growing popularity of the Web, efficient approaches to
information overload are becoming increasingly necessary. Summarization of web
pages aims at detecting the most important content of pages so that a user
can obtain a compact version of a web document or a group of pages.
Traditionally, summaries are constructed on static snapshots of web pages.
However, web pages are dynamic objects that can change their contents anytime.
In this paper, we discuss research on temporal multi-document summarization
on the Web. We analyze the temporal contents of topically related collections
of web pages monitored for certain time intervals. The contents derived from
the temporal versions of web documents are summarized to provide information on
hot topics and popular events in the collection. We propose two summarization
methods that use changing and static contents of web pages downloaded at
defined time intervals. The first uses a sliding window mechanism and the
second is based on analyzing the time series of the document frequencies of
terms. Additionally, we introduce a novel sentence selection algorithm designed
for time-dependent scenarios such as temporal summarization.
Keywords: Web document summarization, temporal web page analysis, change detection and relevance, web collection
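
Illustration: the abstract mentions analyzing the time series of the document frequencies of terms. The sketch below shows one way such a series could be built from page versions downloaded at successive time points. It is a minimal sketch under our own assumptions, not the method of the paper: the whitespace tokenizer, the function names (tokenize, df_time_series, slope), and the use of a least-squares slope as a trend score are all hypothetical choices made for illustration.

```python
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]


def df_time_series(snapshots):
    """Build one document-frequency series per term.

    snapshots[t] is the list of page texts downloaded at time point t.
    Returns {term: [df at t=0, df at t=1, ...]}.
    """
    per_time = []
    for pages in snapshots:
        df = Counter()
        for page in pages:
            # Count each term at most once per page (document frequency).
            df.update(set(tokenize(page)))
        per_time.append(df)
    vocab = set().union(*per_time)
    return {term: [df[term] for df in per_time] for term in vocab}


def slope(series):
    # Hypothetical trend score: least-squares slope of the series over time.
    n = len(series)
    if n < 2:
        return 0.0
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


if __name__ == "__main__":
    # Toy collection: three time points, each with a few page versions.
    snapshots = [
        ["web summary", "web page"],
        ["web summary earthquake", "web earthquake"],
        ["earthquake web", "earthquake news", "earthquake"],
    ]
    series = df_time_series(snapshots)
    ranked = sorted(series, key=lambda term: slope(series[term]), reverse=True)
    for term in ranked:
        print(term, series[term], round(slope(series[term]), 2))
```

In this toy collection, a term whose document frequency rises across the time points receives a positive slope and is ranked first, which is one simple way to surface candidate hot topics. Sentence selection would be a separate step and is not sketched here.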