Article type: Research Article
Authors: Velasquez, Juan D.
Affiliations: Department of Industrial Engineering, University of Chile, Santiago, Chile
Abstract: Constructing a web site is a considerable challenge because it integrates different elements such as the hyperlink structure, colors, pictures, movies and textual content. Of these, the right textual content can be the key to attracting users to the site. Many users reach a web site through a search engine such as Google or Yahoo!, and continue exploring the site only if it contains the information they are looking for. In this paper, a methodology to extract the main words in a static web site is proposed. These words are called web site keywords, and using them in the site's textual content can yield significant improvements from the user's point of view. A key element of the methodology is determining which pages in a web site attract the most attention from users while they browse. Web users can be classified into two categories according to their browsing behaviour: amateur and experienced. An amateur user has little or no experience with web-based systems; their browsing is normally erratic, and it can take them a considerable amount of time to find what they are looking for. An experienced user is more familiar with web-based systems; their browsing is more controlled and purpose driven, so they need less time to decide whether the site contains worthwhile information. Importantly, for experienced users there is a correlation between the time spent on a web page during a session and the extent of their interest in the page content. Using this characteristic, a feature vector is created from the time spent on each page during a user's session. These vectors are the input for two clustering algorithms, SOFM and K-means, which enable the extraction of significant patterns about users with similar or identical browsing behaviour and content preferences. These patterns then form the basis for identifying the web site keywords. To validate the proposed methodology, web data originating from a complex static web site belonging to a Chilean bank was used. From the clusters found, a set of web site keywords was identified and its utility was tested on a group of real users, illustrating the effectiveness of the proposed methodology.
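The time-based feature vector and the clustering step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the log records, the page names, the normalisation to fractions of session time, and the helper names `session_vectors` and `kmeans` are all assumptions made for illustration, and only a plain K-means is shown (the SOFM step is omitted).

```python
from collections import defaultdict

def session_vectors(log, pages):
    """Map (session_id, page, seconds) records to per-session vectors holding
    the fraction of the session's time spent on each page (assumed encoding)."""
    totals = defaultdict(lambda: defaultdict(float))
    for session_id, page, seconds in log:
        totals[session_id][page] += seconds
    vectors = {}
    for session_id, per_page in totals.items():
        session_time = sum(per_page.values())
        if session_time > 0:  # skip sessions with no recorded time
            vectors[session_id] = [per_page.get(p, 0.0) / session_time for p in pages]
    return vectors

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd-style K-means with caller-supplied starting centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical pages and session log, invented for this example.
pages = ["home", "loans", "cards"]
log = [
    ("s1", "home", 10), ("s1", "loans", 90),
    ("s2", "home", 20), ("s2", "loans", 80),
    ("s3", "home", 15), ("s3", "cards", 85),
    ("s4", "home", 5), ("s4", "cards", 95),
]

vectors = session_vectors(log, pages)
points = [vectors[s] for s in sorted(vectors)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[2]])

# The page with the largest share of time in each centroid is a candidate
# source of keywords for that group of users.
top_pages = [pages[max(range(len(pages)), key=lambda i: c[i])] for c in centroids]
print(labels, top_pages)
```

In this sketch each cluster centroid summarises where a group of sessions spent its time; the text of the pages that dominate a centroid would be the natural place to look for candidate web site keywords.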
Keywords: Web site keywords, web site text content, web usage mining, web content mining
DOI: 10.3233/IDA-2012-0526
Journal: Intelligent Data Analysis, vol. 16, no. 2, pp. 327-348, 2012