Abstract: Summaries of Web sites help Web users get an idea of the site
contents without having to spend time browsing the sites. Currently, such
summaries are constructed manually by volunteer experts, as in the DMOZ Open
Directory Project. This research is directed towards automating
the Web site summarization task. To this end, an approach is developed that
applies machine learning and natural language processing techniques to
summarize a Web site automatically. The information content of the
automatically generated summaries is compared, via a formal evaluation process
involving human subjects, to DMOZ summaries, home page browsing and
time-limited site browsing, for a number of academic and commercial Web sites.
Statistical evaluation of the scores of the subjects' answers to a list of
questions about the sites demonstrates that the automatically generated
summaries convey as much information to the reader as DMOZ summaries do, and
more information than the two browsing options.