Affiliations: [a] Human Computer Interaction and Robotics, University of Science and Technology, Daejeon, Korea | [b] Imaging Media Research Center, Korea Institute of Science and Technology, Seoul, Korea
Corresponding author: Suhyun Kim, Imaging Media Research Center, Korea Institute of Science and Technology, Seoul 02792, Korea. Tel.: +82 2 958 5114; Fax: +82 2 958 5769; E-mail: [email protected]
Abstract: Real-world graphs are massive and often prohibitively expensive to analyze. One solution is sampling: extracting a small subgraph that faithfully represents the original graph. Prior research has produced several sampling methods, but the samples they produce fail to match important properties of the original graph and preserve its topology poorly. We observe that existing methods do not explore the neighborhood of sampled nodes fairly and hence yield suboptimal samples. In this paper, we introduce a novel approach that maintains a list of candidate nodes populated with all the neighbors of the nodes sampled so far. This lets us balance the depth and breadth of graph exploration and produce better samples. We evaluate our approach on several real-world datasets and show that it surpasses existing state-of-the-art approaches in maintaining the properties of the original graph and retaining its structure. For quantitative evaluation, we also report the Kolmogorov-Smirnov distance and the Jensen-Shannon distance.
Keywords: Graph sampling, big graphs, social network analysis
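The candidate-list idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the next node is chosen from the candidate list, so the sketch assumes a uniform random choice. It further assumes an undirected graph given as an adjacency dictionary with mutually comparable node labels (for example, integers), a restart from a random unsampled node when the candidate list runs empty, and a sample returned as the induced subgraph. The function name `candidate_list_sample` and its parameters are hypothetical.

```python
import random


def candidate_list_sample(adj, k, seed=None):
    """Sample k nodes by growing a set of candidate nodes.

    adj  : dict mapping each node to an iterable of its neighbors
           (undirected graph; every neighbor is also a key of adj).
    k    : number of nodes to sample.
    seed : optional seed for reproducibility.

    Returns (sampled_nodes, induced_edges).
    """
    if not 0 < k <= len(adj):
        raise ValueError("k must be between 1 and the number of nodes")

    rng = random.Random(seed)
    all_nodes = sorted(adj)

    # Start from a random node; its neighbors form the first candidates.
    start = rng.choice(all_nodes)
    sampled = {start}
    candidates = {n for n in adj[start] if n not in sampled}

    while len(sampled) < k:
        if candidates:
            # Assumption: uniform choice among all current candidates,
            # i.e. among all neighbors of the nodes sampled so far.
            node = rng.choice(sorted(candidates))
        else:
            # Assumption: if the current component is exhausted,
            # restart from a random node that is not yet sampled.
            node = rng.choice([n for n in all_nodes if n not in sampled])
        candidates.discard(node)
        sampled.add(node)
        # Add the new node's unsampled neighbors to the candidates.
        candidates.update(n for n in adj[node] if n not in sampled)

    # The sample is the subgraph induced by the sampled nodes.
    induced_edges = {
        (u, v) for u in sampled for v in adj[u] if v in sampled and u < v
    }
    return sampled, induced_edges
```

Because every candidate is adjacent to at least one sampled node, the choice can fall either on a neighbor of the most recently sampled node (going deeper) or on a neighbor of an earlier one (going wider), which is one way to read the balance between depth and breadth mentioned in the abstract.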