Interpolative self-training approach for link prediction

Aghababaei, Somayyeh; Makrehchi, Masoud

doi:10.3233/IDA-184390

Article type: Research Article

Authors: Aghababaei, Somayyeh [a, *] | Makrehchi, Masoud [b]

Affiliations: [a] LoyaltyOne, Toronto, ON, Canada | [b] Social Computing and Human Computation Lab, University of Ontario Institute of Technology, Oshawa, ON, Canada

Correspondence: [*] Corresponding author: Somayyeh Aghababaei, LoyaltyOne, Toronto, ON, Canada.

Abstract: In this paper, we propose learning social networks from incomplete relationship data. Link prediction is addressed as a semi-supervised learning problem in which the task is to predict a larger part of a network using available knowledge of a smaller part. Under this assumption, social network extraction becomes a classification problem. Because the majority of links are unknown in real-world scenarios, we hypothesize that self-training, the most common semi-supervised learning method, can provide an effective approach for learning from unlabelled data. We propose an interpolative self-training technique that leverages node information to generate a set of examples in the learning phase, with their connections as the associated labels. The approach generates data by interpolating the documents assigned to a pair of nodes. Documents, as the implicit content shared between individual nodes, provide a basis for estimating their similarity. The generated training data are then employed in a link prediction model under two scenarios. The first scenario treats link prediction as a conventional classification problem with examples from both the positive (link) and negative (no-link) classes. The second scenario addresses the more realistic case in which only some positive examples (links or connections) are known. Social networks are usually very sparse. In the classification framework of link prediction, this sparsity implies an imbalanced class distribution: among all possible links there are few connections (positive class) and many disconnections (negative class). To deal with this class skew and improve the performance of the classifier, a data selection method based on node similarity is proposed. To evaluate the proposed methods, a set of experiments was conducted on co-authorship networks from 18 different domains. The results indicate that significantly high performance is achievable for most of the networks using the proposed self-training approach.
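The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch under explicit assumptions, not the authors' implementation. Pair features here are cosine similarity and Jaccard overlap of the two nodes' word bags, a stand-in for the paper's document interpolation, whose exact form is not given in the abstract. The classifier is a plain logistic regression, and the similarity-based data selection is approximated by treating the least similar unknown pairs as reliable negatives for the positive-only scenario. The function names (`pair_features`, `select_negatives`, `self_train`), the toy data, and the 0.9 confidence threshold are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def pair_features(docs, u, v):
    """Hypothetical pair features (stand-in for document interpolation):
    cosine similarity and Jaccard overlap of the two nodes' word bags."""
    a, b = Counter(docs[u]), Counter(docs[v])
    union = a.keys() | b.keys()
    jaccard = len(a.keys() & b.keys()) / len(union) if union else 0.0
    return [cosine(a, b), jaccard]


def predict_proba(w, x):
    """Logistic model; w[-1] is the bias, z is clamped to avoid overflow."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    z = max(-50.0, min(50.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=500, lr=0.5):
    """Batch gradient-descent logistic regression, weights start at zero."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict_proba(w, x) - t
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def select_negatives(docs, positives, candidates, k):
    """Hypothetical similarity-based selection for the positive-only scenario:
    the k least similar unknown pairs are treated as reliable negatives."""
    known = set(positives)
    pool = [p for p in candidates if p not in known]
    pool.sort(key=lambda p: pair_features(docs, *p)[0])  # stable sort by cosine
    return pool[:k]


def self_train(docs, labelled, unlabelled, threshold=0.9, max_rounds=5):
    """Retrain, pseudo-label confident pairs, repeat until nothing is added."""
    labelled = dict(labelled)
    pool = [p for p in unlabelled if p not in labelled]
    w = None
    for _ in range(max_rounds):
        pairs = list(labelled)
        w = train_logreg([pair_features(docs, *p) for p in pairs],
                         [labelled[p] for p in pairs])
        added = False
        for p in pool:
            prob = predict_proba(w, pair_features(docs, *p))
            if prob >= threshold or prob <= 1 - threshold:
                labelled[p] = int(prob >= 0.5)  # confident pseudo-label
                added = True
        pool = [p for p in pool if p not in labelled]
        if not added or not pool:
            break
    return w, labelled


# Hypothetical toy co-authorship data: each node's "document" is a bag of words.
docs = {
    "a": ["graph", "link", "prediction"],
    "b": ["graph", "link", "mining"],
    "c": ["protein", "folding", "structure"],
    "d": ["protein", "structure", "dynamics"],
    "e": ["link", "prediction", "learning"],
}
candidates = list(combinations(sorted(docs), 2))
positives = [("a", "b"), ("c", "d")]  # only some links are known
negatives = select_negatives(docs, positives, candidates, k=2)
seed = {**{p: 1 for p in positives}, **{p: 0 for p in negatives}}
w, labelled = self_train(docs, seed, candidates)
```

Retraining from scratch each round keeps the sketch short. Pairs whose score stays between the two thresholds remain unlabelled rather than being forced into a class, which is the usual safeguard against self-training amplifying its own early mistakes.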

Keywords: Link prediction, classification problem, semi-supervised learning, self-training

Journal: Intelligent Data Analysis, vol. 23, no. 6, pp. 1379-1395, 2019

Published: 8 November 2019
