Affiliations: Department of Information and Communications Systems Engineering, University of the Aegean, Karlovassi, 83200, Samos, Greece | Department of Cultural Technology and Communication, University of the Aegean, Mytilene, 81100, Lesvos, Greece
Abstract: This paper proposes a statistical approach for estimating the evolution of categorized web page populations in web directories. The proposal is based on the capture-recapture method used in wildlife biology studies, adapted with the assumptions and amendments needed to conduct the experiments on the web. In these experiments, web pages are likened to animals, and specific categories of web pages are likened to particular animal species whose abundance, birth rates and survival rates are estimated. The capture-recapture model we follow treats the populations under study as open. Thus the population evolves over time: new web pages enter the study, while others are removed or become inactive, resembling the natural processes of migration or death. Artificial intelligence classifiers capable of categorizing web pages play the role of the biologists who recognize the species under study. We conducted four simulations, based on four real classification cases, to evaluate the robustness of the model in the web setting. The paper provides the implementation details of the proposed web-based capture-recapture model, along with its initial assessment.
Keywords: Web evolution, web intelligence, web page categorization