
Learning in unlabeled networks – An active learning and inference approach

Abstract

The task of determining the labels of all network nodes based on knowledge of the network structure and the labels of some training subset of nodes is called within-network classification. It may happen that none of the node labels are known and, additionally, there is no information about the number of classes (types of labels) to which nodes can be assigned. In such a case, a subset of nodes has to be selected for initial label acquisition. The question that arises is: “labels of which nodes should be collected and used for learning in order to provide the best classification accuracy for the whole network?”. Active learning and inference is a practical framework to study this problem.

In this paper, a set of methods for active learning and inference for within-network classification is proposed and validated. The first step in the entire process is the calculation of a utility score for each node based on the network structure. The scores enable ranking of the nodes. Based on the created ranking, a set of nodes for which the labels are acquired is selected (e.g. by taking the top or bottom N from the ranking). The new measure-neighbour methods proposed in the paper suggest not obtaining the labels of nodes from the ranking but rather acquiring the labels of their neighbours. The paper examines 29 distinct formulations of utility scores and selection methods, reporting their impact on the results of two collective classification algorithms: the Iterative Classification Algorithm (ICA) and Loopy Belief Propagation (LBP).

We advocate that the accuracy of the presented methods depends on the structural properties of the examined network. We claim that measure-neighbour methods work better than the regular methods for networks with a higher clustering coefficient and worse than the regular methods for networks with a low clustering coefficient. According to our hypothesis, the clustering coefficient of a network allows us to recommend an appropriate active learning and inference method.

Experimental studies were carried out on six real-world networks. In order to investigate our hypothesis, all analysed networks were categorized into three groups based on their structural characteristics. In addition, the representativeness of the initial set of nodes for which the labels are obtained, and its influence on classification accuracy, was examined.

1. Introduction

In many real-world networks, the set of nodes and the connections between them are known, but the information about their characteristics can be fragmentary and incoherent. On many occasions, however, the information about node labels is essential, e.g. knowing users’ preferences or demographic profile is needed in the process of personalised recommendation of products or services. Of course, all labels can be obtained by asking everybody about them but, due to the scale of some networks and the anonymity of many users in the online world, this may be a very time-consuming, costly and ineffective process. In order to reduce the resources required for manual acquisition of labels for all nodes, a more sophisticated method is needed: one that uncovers the labels of only a limited number of nodes and, based on this knowledge, performs automatic classification of the remaining nodes. The research presented in this paper addresses this issue by proposing an effective method for the selection of starting nodes and the acquisition of their labels, which serve as a training set for the within-network classification task.

We also claim that no single method works accurately in all cases, and we show in our experiments that the accuracy of different methods strongly depends on some of the structural characteristics of a network.

For illustration purposes, we present an example of a marketing campaign that will help to understand the presented research. Let us assume that a marketing campaign has to be addressed to a given community of customers. The knowledge about relationships between customers is available (e.g. derived from their monitored interactions); hence, we can create a customer social network. The main purpose of the campaign is to propose a new product only to those who are likely to buy it within the next year. However, the top management allocates to the campaign only a fixed amount of resources, which is not sufficient to target all community members. Thus, the question is which customers should be initially targeted in order to optimise the return on investment (ROI) of the entire campaign. ROI strongly depends on the quality of classification of community members into two classes: (i) customers and (ii) non-customers, as we save resources by not sending the offer to non-customers.

One of the approaches to model the campaign is to use collective classification. In order to perform the collective classification task, it is required to retrieve class labels for an initial population of nodes and then to use them in the inference process. Before classifying the whole network, some selected nodes need to be provided with the offer, and their positive and negative responses, together with the response rate, should be collected. Afterwards, based on their behaviour as well as the relationships between social network nodes, collective classification can model the responses of the remaining nodes. The main issue is to determine which nodes should be selected for label acquisition in order to maximise the performance of classification. An intuitive answer is that we should start with the nodes that estimate the whole network most accurately. Solutions to the problem of which nodes’ labels should be obtained in order to perform collective classification are called active learning or active inference approaches, because they actively, not randomly, support the selection of the learning set.

The problem of selecting appropriate nodes to start the collective classification process is studied in this paper. First, the literature in the areas of (i) collective classification and (ii) active learning and inference methods is reviewed in Section 2. Section 3 presents the method for selection of initial nodes for classification purposes proposed in this paper, and Section 4 recalls the basic collective classification algorithms. The experiments using the proposed method and the datasets used in the process are described in Section 5 and discussed in Section 6. Finally, Section 7 summarises the main contributions of the paper.

2. Related work

The research area that is in the focus of this paper is active learning and inference methods for within-network classification and the literature that relates to this topic is presented below. However, first the problem of collective classification is discussed to give a general background of the field and facilitate the understanding for non-expert readers.

Fig. 1.

An illustration of within-network classification task. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)


2.1. Collective classification

Although the problem of classification in traditional machine learning is not new, with the explosion of Web-based social networks [23] a new branch in this area, called collective (relational) classification, has emerged. The main difference between collective classification and the traditional approach to the classification problem is that the former allows data to be dependent, whereas the latter assumes independent and identically distributed (i.i.d.) data. The collective approach considers both the characteristics of nodes and the topology of the network in the process of assigning a node to a specific class. This means that not only the features of the node to be classified are taken into account, but also the attributes and labels of related nodes (e.g. direct neighbours) [16]. Two approaches can be distinguished for the classification of nodes in a network: (i) within-network (Fig. 1) and (ii) across-network inference (Fig. 2). In within-network classification [10], training nodes are directly connected to the nodes whose labels are to be assigned in the classification process. In across-network classification [22], models learnt from one network are applied to another, similar network.

Fig. 2.

An illustration of across-network classification task. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)


One of the issues in collective classification is to determine the set of features that should be used in order to maximize classification accuracy. Recent research in this area shows that the values of new attributes derived from structural properties of the graph, such as betweenness centrality, may improve the accuracy of the classification task [14]. Another confirmation of that fact, and some further discussion, comes from other research [18]. Another element that should be considered in collective classification is the order of visiting nodes in the graph to perform re-labelling, since this order influences the input values derived from the structure. To address that problem, a variety of algorithms have been proposed. Random ordering [20] is a simple strategy used with iterative classification algorithms and can be quite robust. The most popular examples of collective classification methods are the Iterative Classification Algorithm (ICA) and the Gibbs Sampling Algorithm (GSA), introduced by Geman & Geman in the image processing context [15]. Both of them belong to the so-called approximate local inference algorithms, which are based on local conditional classifiers [30]. Another technique is Loopy Belief Propagation (LBP) [28], a global approximate inference method used for collective classification. However, according to recent studies [17], the above-mentioned methods are not robust enough to work efficiently and accurately in sparsely labelled and large-scale environments. This is a very important conclusion, as the majority of Web- and technology-based social networks are very sparse and large. Another drawback is that they cannot be easily deployed in multi-dimensional networks [19], which creates space for developing new, robust collective classifiers.

According to [12], a very promising area of research, which may contribute to solving the issues identified above, is building compound ensemble collective classifiers. Although ensemble methods have been shown to perform accurately for i.i.d. data, there is a lack of similar analysis for relational data. For instance, in [7] it was shown that bagging reduces the total classification error in i.i.d. data by reducing the error due to variance. The extension of i.i.d. ensembles to improve classification accuracy for relational domains was presented in [11]. That work includes a method for constructing ensembles while accounting for the increased variance of network data, and an ensemble method for reducing variance in the inference process. Another promising work [13] showed that a different ensemble method, stacking, improves collective classification by reducing inference bias.

2.2. Active learning and inference in networks

As mentioned above, active learning and inference are methods used to select nodes whose labels should be acquired in order to perform collective classification [1,2,29,31,32]. The main goal of these methods is to improve classification accuracy by choosing nodes in a non-random way. In contrast to passive methods, where all labels are obtained at once, active methods perform this task iteratively. Research results show that, in some cases, the number of nodes that must be queried for labels to achieve similar accuracy is logarithmic compared to passive methods [4]. It should also be emphasized that the goal of active inference and learning methods differs from, e.g., seeding strategies, where the most influential nodes are selected. Active inference and learning in the context of collective classification aims at selecting nodes that minimize the classification error for the whole network. The main drawback of active learning and inference methods is that the set of queried labels loses its i.i.d. characteristics, which may lead to spending the querying budget on biased sampling [34] or to propagating information in areas where surrounding nodes may cover inner “islands” that are labeled differently [5].

To overcome some of the discussed limitations, active inference and learning offer different approaches when dealing with separable and non-separable data, e.g. agnostic active learning [3] or query by committee [33].

One of the approaches to active learning in relational data is the Reflect and Correct (RAC) method introduced in [5] and further developed in [6]. It is based on the simple intuition that misclassified nodes are gathered close to each other; they are clustered together. Thus, misclassifying one node might cause wrong labelling of its neighbours. Therefore, it is reasonable to acquire the actual labels of representative nodes from such clusters and use them in the inference. In order to find these clusters, a label utility function can be applied [6]. The authors introduce three types of features – local, relational and global – which are used to train the classifier. These features measure three different aspects of misclassification: the local feature focuses on the attributes of a node, the relational feature takes into consideration its neighbourhood, and the global feature measures the difference between the prior belief about the class distributions and the posterior distributions based on the predictions. Having all these features available, the authors of [6] use a training graph and the predictions of a collective model on this graph to learn the distribution of labels. This approach relies on the reasonable assumption that there is some budget to spend on acquiring labels. The authors compared the results of the introduced RAC method against two other approaches, a viral marketing approach and greedy acquisition, showing that the RAC method outperforms the others.

Another approach was proposed in [25]. The paper introduces a technique for node selection in an active learning framework: it selects a set of nodes that should be used in collective classification given a limited budget. Using the idea of smoothness (similar distributions of independent attributes as well as relational features between nodes), it decides which nodes to select. The smoothness idea assigns high utility to nodes that are close to each of the queried nodes. It is also proposed how to compute the utility for each non-surveyed node and how to sample within the budget. However, in this method the authors assumed that the network structure may not be available a priori, so the queries may reveal the labels and the neighbourhood of a node. A similar approach was introduced in the relational active learning proposal in [21]. The key idea behind this approach was to acquire labels for those nodes whose predictions are potentially most certain. It is worth emphasizing that this is inconsistent with many conventional utility metrics used in i.i.d. settings, which favour labelling nodes with high uncertainty.

Fig. 3.

Major steps of the active learning and inference method for within-network classification. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)


3. Relational active learning and inference

3.1. The method for active learning and inference in within-network classification based on node selection

The proposed method for active learning and inference in within-network classification task consists of five main steps, see Fig. 3. First, for a given unlabelled network, the utility scores for each network node are obtained by calculating the node’s structural measures such as degree centrality, closeness centrality, etc. In general, the utility score should reflect the usefulness of the node’s label in the process of within-network classification. Further discussion on considered utility scores is provided in Section 3.2. In addition, new ‘measure’-neighbour approaches for assessment of the nodes’ utility have been proposed in Section 3.3.

Afterwards, the previously obtained utility scores are sorted in ascending or descending order, forming a ranking of the nodes. Depending on the type of utility score, the most useful nodes are those with the highest or lowest score values. In the next step, the method selects the nodes for which the label will be queried, based on the top N items in the ranking. Once the process of label acquisition is completed, an appropriate relational classification algorithm can be applied to perform within-network classification. Note that this last step is not the main contribution of the paper and many different classification algorithms can be applied. In this research, two of them were selected as the most representative and widely used, i.e. the Iterative Classification Algorithm (ICA) and Loopy Belief Propagation (LBP).
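The five steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names are ours, node degree stands in for the utility score, and the `oracle` and `classifier` arguments stand in for real label acquisition and for ICA/LBP.

```python
def degree_utility(graph):
    # Step 1: a utility score per node; here simply the node degree
    # (graph is an adjacency dict: node -> list of neighbours).
    return {v: len(nbrs) for v, nbrs in graph.items()}

def select_top_n(scores, n):
    # Steps 2-3: rank nodes by descending score and take the top N.
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking[:n]

def active_learning_pipeline(graph, oracle, n, classifier):
    scores = degree_utility(graph)             # step 1: utility scores
    queried = select_top_n(scores, n)          # steps 2-3: ranking and selection
    labels = {v: oracle(v) for v in queried}   # step 4: label acquisition
    return classifier(graph, labels)           # step 5: within-network classification

# Toy run on a star graph: the hub has the highest degree, so its label
# is the one acquired.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
result = active_learning_pipeline(
    star,
    oracle=lambda v: "X",
    n=1,
    classifier=lambda g, labels: labels,       # stand-in: return acquired labels
)
```

In the paper, the final `classifier` argument would be ICA or LBP and the `oracle` would be the (costly) manual labelling step.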

3.2. Utility scores

Utility scores reflect the usefulness of a node’s label in the process of within-network classification. It is expected that learning the relational classification model using some previously acquired labels will result in a small classification error. The process of label acquisition in active learning would ideally be expressed as a simple optimization problem of node selection, but in the within-network classification setting this is not possible. The general requirement of the selection mechanism is to pick those nodes’ labels from the label set L whose usage in relational inference will result in the smallest possible misclassification error E.

In general terms, the expected misclassification error for all unlabelled nodes $v_i \in V^{UK}$ on their labels $Y_i^{UK}$ with attributes $x_i$ depends on the abilities of the relational classification algorithm $\Phi$ that is learnt on the previously acquired labels $Y^K$ of the initial node set:

$$\sum_{Y_i \notin Y^K} E\bigl(Y_i \mid X = x_i,\ Y^K,\ \Phi(Y^K)\bigr).\tag{1}$$

Therefore, the main problem is to obtain the unknown values of $Y^K$ for which the classification algorithm $\Phi$ will provide proper results. Then, the aggregated error must reflect an expectation over all possible values of $Y^K$:

$$\sum_{Y_i \notin Y^K} \sum_{l \in L} P(Y^K = l) \times E\bigl(Y_i \mid X = x_i,\ Y^K = l,\ \Phi(Y^K = l)\bigr),\tag{2}$$

where $P(Y^K = l)$ is the chance that $Y^K$ takes the value of label $l$. Although the presented error is in general suitable for across-network classification, it does not comply with within-network classification. It is impossible to assess the correctness of classification for all nodes related to $Y_i \notin Y^K$ due to the lack of these labels. Thus, it is impossible to propose any utility score that directly leads to classification error minimization.

Nevertheless, it is still possible to make use of other utility scores that reflect the structural properties of nodes, relying on the assumption that the classification error depends on these measures. Depending on the characteristics of the underlying network, a proper measure can be chosen from the vast variety of node structural measures [24], such as indegree centrality, outdegree centrality, betweenness centrality, clustering coefficient, hubness, authority and page rank.
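For illustration, two of these measures can be computed directly from an adjacency-dict representation; this sketch assumes an undirected graph, and libraries such as networkx provide the remaining measures (betweenness, page rank, hubness/authority).

```python
def degree_score(graph):
    # Degree centrality (unnormalized): the number of neighbours of each node.
    return {v: len(nbrs) for v, nbrs in graph.items()}

def clustering_score(graph):
    # Local clustering coefficient: the fraction of a node's neighbour pairs
    # that are themselves connected by an edge.
    scores = {}
    for v, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            scores[v] = 0.0      # fewer than two neighbours: coefficient is 0
            continue
        links = sum(1 for i, a in enumerate(nbrs)
                      for b in nbrs[i + 1:] if b in graph[a])
        scores[v] = 2.0 * links / (k * (k - 1))
    return scores
```

Either dictionary can then be fed straight into a ranking-based selection step.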

3.3. A new ‘measure’-neighbour approach to utility score

To extend the typical structural measures approach enumerated in Section 3.2, the authors proposed and evaluated another method. Assuming that some nodes with the highest measure values may actually be located on the border of ‘classes’, it may be useful to pick not these nodes themselves, but their neighbours. Intuition suggests that this may hold especially for the betweenness or page rank measures, since nodes with high betweenness and page rank may be located at the border of clusters or groups, or may even play the role of bridges. In this case, it may be worth acquiring the label of a neighbour instead of the label of the bridge itself. For example, in the case of betweenness, we identify the nodes with the highest betweenness values and select their neighbours for label acquisition. By analogy, we can create an indegree-neighbour utility score, a page rank-neighbour utility score, etc. All of them focus on the neighbours of nodes with a given property.

In order to confirm or reject the above concept, the authors performed a set of experiments comparing the results of this approach with the typical measure-based methods, see e.g. Tables 4 and 6.

The neighbour selection heavily depends on the structure of the network. It may happen that in particular cases some nodes selected from the rankings do not have neighbours. Therefore, the actual number of neighbouring nodes may be smaller than the size of the sampled ranking. It was assumed that for each node selected from the ranking only one neighbour is selected. Thus, for instance, if 10% of the network’s nodes are selected according to a particular ranking, we may end up with fewer than 10% of the network constituted by neighbours (see Fig. 13). Moreover, for the propagation algorithm (LBP), when a node from the training set has no neighbours, the information about its label will not be propagated during the classification process. In contrast, the ICA method does not depend on the network structure in this sense: even if the labelled node has no neighbours, labels may still be assigned to nodes in other, even disconnected, components.

However, in general, this did not have an adverse effect on the obtained results. The exception was one dataset (CS_PHD), where the network was highly disconnected and almost no nodes were labelled by the neighbour algorithm. In the other cases, despite the fact that fewer nodes were used as input to the classification algorithm, the LBP-neighbour method outperformed classical LBP in both approaches – top and bottom (i.e. when initial nodes were selected from the top and bottom of the rankings created based on the utility scores, respectively).
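The ‘measure’-neighbour selection described above can be sketched as follows, assuming (as in the experimental set-up of Section 5.1) a uniform draw of exactly one neighbour per ranked node; isolated nodes contribute nothing, which is why the acquired set may be smaller than requested. The function name is ours.

```python
import random

def measure_neighbour_selection(graph, scores, n, rng=None):
    # For each of the top-n nodes in the ranking, acquire the label of one
    # uniformly drawn neighbour instead of the node itself. Isolated nodes
    # are skipped, and ranked nodes may share neighbours, so the returned
    # set can be smaller than n.
    rng = rng or random.Random()
    ranking = sorted(scores, key=scores.get, reverse=True)
    selected = set()
    for v in ranking[:n]:
        if graph[v]:                      # skip nodes without neighbours
            selected.add(rng.choice(graph[v]))
    return selected
```

Swapping `scores` for a betweenness or page rank dictionary yields the betweenness-neighbour and page rank-neighbour variants.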

4. Within-network classification algorithms

There exist several algorithms for within-network classification. Two of them were utilized in the experiments in Section 5.

The first algorithm is the Iterative Classification Algorithm (ICA). The basic idea behind ICA is quite simple. Considering a node $v_i \in V^{UK}$, where $V^{UK}$ is the set of nodes with unknown labels, $V^{UK} \subseteq V$ and $V$ is the set of all nodes in the network, we aim at discovering $v_i$’s label $l_i$. With the labels of $v_i$’s neighbourhood known, ICA utilizes a local classifier $\Phi$ that takes the attribute values of nodes with known labels ($V^K$) and returns the most appropriate label value $l_i$ for $v_i$ from the class label set $L$, i.e. $l_i \in L$. If the knowledge of the neighbouring labels is only partial, the classification process needs to be repeated iteratively. In each iteration, the labelling of each node $v_i$ is done using the current best estimates of the local classifier $\Phi$, and this continues until the label assignments stabilize. The local classifier might be any function that is able to accomplish the classification task, ranging from simple to complex models such as Naive Bayes, a decision tree or an SVM.

Algorithm 1 depicts the ICA algorithm as pseudo-code, where the local classifier is trained using the initially labelled nodes $V^K$. It can be observed that the attributes utilized in classification depend on the current label assignment (lines 8 and 9 in Algorithm 1). Thus, the classification phase needs to be repeated until the labels stabilize or the maximum number of iterations is reached. The computation of node attributes (lines 2 and 8) is the calculation of various structural measures describing the profile of each node, including label-dependent and/or label-independent features [18]. Note that the optimization of the model (line 4) is based on local knowledge, since the attribute $x_i$ of $v_i$ reflects only local information from $v_i$’s perspective.
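A minimal sketch of the ICA loop, assuming an adjacency-dict graph and with a majority vote over currently labelled neighbours standing in for the local classifier $\Phi$ (the paper’s implementation uses a Random Forest; any classifier could be plugged in here):

```python
def ica_sketch(graph, known_labels, max_iter=50):
    # graph: adjacency dict; known_labels: labels of the initial set V^K.
    labels = dict(known_labels)
    unknown = [v for v in graph if v not in known_labels]
    for _ in range(max_iter):
        changed = False
        for v in unknown:
            nbr_labels = [labels[u] for u in graph[v] if u in labels]
            if not nbr_labels:
                continue   # no labelled neighbours yet; revisit next iteration
            best = max(set(nbr_labels), key=nbr_labels.count)
            if labels.get(v) != best:
                labels[v] = best
                changed = True
        if not changed:    # label assignment has stabilized
            break
    return labels
```

The stand-in classifier only looks at neighbour labels; a real $\Phi$ would also consume the structural attributes recomputed in lines 2 and 8 of Algorithm 1.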

Algorithm 1.

Iterative Classification Algorithm (ICA), the idea based on [30]


Another method applied in the experiments was the Loopy Belief Propagation algorithm (LBP). It is an alternative approach to ICA for performing collective classification. The main difference is that it defines a global objective function to be optimized, instead of performing local classifier optimization as in ICA.

LBP is an iterative message-passing algorithm. The messages are transferred between all connected nodes $v_i$ and $v_j$, where $v_i, v_j \in V$, $(v_i, v_j) \in E$, and $E$ is the set of network edges. These messages might be interpreted as the belief about the extent to which $v_j$’s label should be based on $v_i$’s label.

The global objective function optimized in LBP is derived from the idea of a pairwise Markov Random Field (pairwise MRF) [35]. The message for propagation is calculated as presented in Eq. (3):

$$m_{ij}(l_j) = \alpha \sum_{l_i \in L} \Psi_{ij}(l_i, l_j)\,\phi(l_i) \prod_{v_k \in V^{UK} \setminus \{v_j\}} m_{ki}(l_i),\tag{3}$$

where $m_{ij}(l_j)$ denotes the message to be sent from node $v_i$ to $v_j$, $\alpha$ is a normalization constant that ensures each message sums to 1, and $\Psi$ and $\phi$ denote the clique potentials. For further details please see [30].

The calculation of the belief $b(l_i)$ for node $v_i$ can be expressed as in Eq. (4):

$$b(l_i) = \alpha\,\phi(l_i) \prod_{v_j \in V^{UK}} m_{ji}(l_i).\tag{4}$$

The LBP algorithm consists of two main phases: message passing, which is repeated until the messages stabilize, and belief computation, see Algorithm 2.
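As an illustration only, the two phases can be sketched in pure Python. The caller supplies the node potential `phi[v][l]` and the pairwise potential `psi(li, lj)`; in this sketch, the product in the message update runs over the sender’s other neighbours, which approximates the set $V^{UK} \setminus v_j$ in Eq. (3) for a fully unlabelled neighbourhood.

```python
from math import prod

def lbp_sketch(edges, labels, phi, psi, max_iter=50, tol=0.01):
    # edges: undirected pairs (vi, vj); labels: the label set L.
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, []).append(j)
        nbrs.setdefault(j, []).append(i)
    # one message per directed pair, initialised uniformly
    m = {(i, j): {l: 1.0 for l in labels} for i in nbrs for j in nbrs[i]}
    for _ in range(max_iter):
        delta = 0.0
        for (i, j) in list(m):
            # Eq. (3): sum over l_i of psi * phi * incoming messages
            new = {lj: sum(psi(li, lj) * phi[i][li] *
                           prod(m[(k, i)][li] for k in nbrs[i] if k != j)
                           for li in labels)
                   for lj in labels}
            z = sum(new.values())          # alpha: each message sums to 1
            new = {l: s / z for l, s in new.items()}
            delta = max(delta, max(abs(new[l] - m[(i, j)][l]) for l in labels))
            m[(i, j)] = new
        if delta < tol:                    # stop condition: messages stabilized
            break
    # Eq. (4): belief from the node potential and all incoming messages
    beliefs = {}
    for v in nbrs:
        b = {l: phi[v][l] * prod(m[(u, v)][l] for u in nbrs[v]) for l in labels}
        z = sum(b.values())
        beliefs[v] = {l: b[l] / z for l in labels}
    return beliefs
```

On a two-node graph with a homophilous `psi`, the confident potential at one node pulls the neutral node towards the same label, as expected of belief propagation.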

Algorithm 2.

Loopy Belief Propagation (LBP), the idea based on [30]


5. Experiments

5.1. Experimental set-up

In order to evaluate the proposed method for active learning and inference in terms of classification accuracy, the Iterative Classification (ICA) and Loopy Belief Propagation (LBP) algorithms were tested with various utility scores. The experimental scenario aims at examining the following structural measures used as utility scores:

  • indegree centrality,

  • outdegree centrality,

  • betweenness centrality,

  • clustering coefficient,

  • hubness,

  • authority,

  • page rank.

All of them were applied in two selection methods: nodes with the top (the greatest) and bottom (the smallest) values of the individual scores. Independently, the new ‘measure’-neighbour method proposed in Section 3.3 was also evaluated. Its idea is to choose the neighbours of the nodes with the greatest/smallest value of a given utility score.

In total, 29 selection methods were tested: 14 for original structural measures (7 measures; ‘top’ or ‘bottom’ for each), 14 for ‘measure’-neighbour methods and a random selection. The random selection was repeated 14 times and the average error was taken as its final validation result.
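The count of 29 can be reproduced by enumerating the combinations directly; the tuples below are only a bookkeeping device.

```python
measures = ["indegree", "outdegree", "betweenness", "clustering coefficient",
            "hubness", "authority", "page rank"]

methods = []
for measure in measures:
    for end in ("top", "bottom"):
        methods.append((measure, end, "original"))    # 7 x 2 = 14 original
        methods.append((measure, end, "neighbour"))   # 7 x 2 = 14 'measure'-neighbour
methods.append(("random", None, None))                # plus random selection
```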

The experiments were carried out on the original datasets, with labels acquired according to the particular setting of selection method and utility score. In this way, each dataset was split into known and unknown node sets. The models were learnt on the acquired labels in nine distinct proportions (from 10% to 90% of known labels) and tested on the remaining part. In order to evaluate the quality of classification, the classification error was recorded. Following previously gathered experience on the configuration of the classification algorithms [17], the implementation of ICA was based on the Random Forest base classifier [8], and LBP used 50 iterations or a 0.01 relative change of labels as the stop condition. The ‘measure’-neighbour version of training set selection (Section 3.3) used a uniform draw from adjacent nodes.

Table 1

Basic properties of datasets utilized in experiments

Dataset | Group | Nodes | Edges | Directed | Classes (labels) | Avg. node degree | Avg. path length | No. of connected components | Modularity | Graph density | Network diameter | Avg. nodes clustering coeff.
AMD_NETWORK | 1 | 319 | 34,385 | no | 16 | 215.58 | 1.322 | 1 | 0.102 | 0.678 | 2 | 0.824
NET_SCIENCE | 1 | 1588 | 2742 | yes | 26 | 1.727 | 1.997 | 395 | 0.955 | 0.001 | 7 | 0.319
PAIRS_FSG | 2 | 4931 | 61,449 | yes | 3 | 12.462 | 4.278 | 1 | 0.594 | 0.003 | 10 | 0.122
PAIRS_FSG_SMALL | 2 | 1972 | 12,213 | yes | 3 | 6.193 | 5.358 | 13 | 0.688 | 0.003 | 14 | 0.127
YEAST | 2 | 2361 | 7182 | yes | 13 | 3.042 | 4.648 | 59 | 0.59 | 0.001 | 16 | 0.065
CS_PHD | 3 | 1451 | 924 | yes | 16 | 0.636 | 2.265 | 531 | 0.967 | 0 | 10 | 0.001
Table 2

The description of groups of datasets and profiles of their collective classification results

Group | Datasets | Network profile | Results profile
1 | AMD_NETWORK, NET_SCIENCE | small-world-like profile, short avg. path length; the smallest network diameter; clustering coeff. > 0.3 | high error level, decreasing for an increasing % of the training set; good performance of ‘measure’-neighbour methods; LBP neighbour (bottom page rank) outperforms the others; most ‘measure’-neighbour methods outperform random
2 | YEAST, PAIRS_FSG, PAIRS_FSG_SMALL | non-small-world modular networks, moderate avg. path length; modularity ∈ (0.5; 0.9); graph density ∈ (0.01; 0.1); clustering coeff. < 0.15 | relatively small variance of results; the more classes, the worse the results; for smaller modularity and density, and greater clustering coeff., LBP-neighbour outperforms the others
3 | CS_PHD | random-like network with very low probability of edges between nodes, highly disconnected; many isolated nodes (avg. node degree < 0.7); density ≈ 0; clustering coeff. close to 0 | LBP error > 0.8; ICA better than LBP, but still poor; ‘measure’-neighbour methods worse than original and random

5.2. Datasets

The experiments were carried out on six datasets. The AMD_NETWORK graph represents attendance at conference seminars. The dataset was the result of a project that took place during “The Last HOPE” conference held on July 18–20, 2008 in New York City, USA. Radio Frequency Identification devices were distributed among the participants of the conference, which allowed them to be uniquely identified and the sessions they attended to be tracked. The dataset was built from information about the descriptions of participants’ interests, their interactions via instant messages, as well as their location over the course of the conference. Location tracking allowed a list of attendances to be extracted for each conference talk. In general, the most interesting data from the experimental point of view are: information about conference participants, conference talks and presence at the talks.

The CS_PHD genealogy dataset is a network containing ties between PhD students and their advisors in the field of theoretical computer science, where the arcs lead from advisors to students [27].

The third dataset NET_SCIENCE contains a co-authorship network of scientists working in the area of network science [26]. It was extracted from the bibliographies of two review articles on networks.

The YEAST dataset is a biological protein–protein interaction network [9].

The PAIRS_FSG dataset is a dictionary from the University of South Florida with word association, rhyme and word fragment norms. Its graph reflects correlations between nouns, verbs and adjectives. In the experiments we use the original PAIRS_FSG data as well as its reduced version PAIRS_FSG_SMALL.

The profiles of all datasets are briefly presented in Table 1. In order to investigate our hypothesis that the accuracy of classification depends on network characteristics, the datasets were divided into three groups (see column ‘Group’ in Table 1 as well as the description of groups in Table 2). This was done based on commonly used network characteristics, such as average node degree, average path length, modularity, graph density, network diameter, and the average clustering coefficient of nodes. Based on these characteristics, the networks in Group 1 are small-world networks, those in Group 2 are non-small-world modular networks, and those in Group 3 are random networks that are very sparse (at the edge of the phase transition condition for random networks).

The graphs in the first group have a short average path length, the smallest network diameter of all analysed networks, and a clustering coefficient larger than 0.3. With a relatively large clustering coefficient and a short average path length, the networks in Group 1 exhibit the characteristics of small-world networks. The second group contains networks with a moderate average path length, modularity in the range (0.5; 0.9), graph density in the range (0.01; 0.1) and a clustering coefficient smaller than 0.15. These characteristics indicate that the networks in this group are non-small-world modular networks. Group 3 contains only one dataset, which is highly disconnected, with many isolated nodes, density and clustering coefficient approaching zero, and over 400 nodes with degree equal to 0. This group represents random networks with a very low probability that a link exists between two randomly selected nodes; consequently, no giant component is present and such networks consist of many small connected components. Further detailed characteristics of the networks, together with the number of distinct classes, are available in the Online Supplemental Material in Sections 1 and 2.
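The two characteristics that drive this grouping can be computed directly from an adjacency structure. The sketch below is an illustration only (it assumes a simple dict-of-sets graph representation, not the exact tooling used in the paper) of the average local clustering coefficient and the average shortest path length:

```python
from collections import deque

def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph given
    as {node: set_of_neighbours}. Nodes with degree < 2 contribute 0."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among the neighbours of v (each pair once).
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all connected node pairs (BFS from
    every node); not informative for disconnected graphs such as CS_PHD."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(d for n, d in dist.items() if n != s)
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")

# A toy graph: a triangle with one pendant node.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
```

On this toy graph the average clustering coefficient is 7/12 and the average path length 4/3; Group 1 networks combine a large value of the former with a small value of the latter.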

Fig. 4.

Comparison of classification error for ICA and LBP, each shown only for its most efficient utility score, for both the original utility scores and the ‘measure’-neighbour variants. (a) AMD dataset (Group 1); (b) NET_SCIENCE dataset (Group 1); (c) PAIRS_FSG dataset (Group 2); (d) PAIRS_FSG_SMALL dataset (Group 2); (e) YEAST dataset (Group 2); (f) CS_PHD dataset (Group 3).


6.Experimental results and discussion

As the experiments were performed with a large number of parameter combinations, the obtained results can be analysed from many different perspectives. Overall, it can be noticed that the accuracy of the investigated approaches varies and the methods themselves cannot be compared in a straightforward way. This is clearly visible in Fig. 4, which depicts the results obtained for the best combination of ICA, ICA neighbour, LBP, and LBP neighbour settings, compared with random selection. All of the results are presented in Tables 3, 4, 5 and 6.

Three basic factors influence the output: the structural profile of the network, the method of selecting nodes for the initial label acquisition (the seed selection strategy), and the method of within-network collective classification. The percentage of classes uncovered during the initial selection process also contributes to the classification accuracy.

In general, there is no single combination of node selection method and inference concept that would be best for every kind of network and every size of the initial node set. However, some combinations of reasoning algorithms with seed acquisition methods are better than others for particular network profiles, see Section 6.1. This suggests that the results depend on the network profile.

One observation that can be derived from the experimental results is the better performance of node selection methods based on the ‘measure’-neighbour approach described in Section 3.3 compared with the original rankings, especially for the datasets in Groups 1 and 2. This suggests that the bigger the clustering coefficient of the network, i.e. the higher the probability that clusters exist in the network, the better the classification results. This is visible if we juxtapose the results for individual measures in Table 3 with Table 4 and in Table 5 with Table 6. The comparison showing how often the top or bottom of the rankings for given measures outperformed each other is presented in Table 7. Overall, the ‘measure’-neighbour methods surpass random selection more often than the measure-based methods do, see the last column in Tables 4 and 6. Moreover, analysing the last column in Tables 3, 4, 5, and 6, we find that there is always at least one ICA and at least one LBP node selection method outperforming the random approach. In some cases, e.g. for AMD and PAIRS_FSG (Tables 4 and 6), all or almost all ‘measure’-neighbour methods are better than random selection. The existence of methods better than random in every case is very important from a practical point of view: it shows that searching for more effective inference methods can always be successful.
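As an illustration of the difference between the two selection families compared above, the following sketch contrasts taking seeds directly from a utility-score ranking with acquiring the labels of the ranked nodes' neighbours. This is a simplified rendering of the idea only; the exact procedure is defined in Section 3.3.

```python
def select_seeds_measure(adj, scores, budget, from_top=True):
    """Baseline strategy: take `budget` nodes straight from the top
    (or bottom) of the utility-score ranking."""
    ranking = sorted(adj, key=lambda v: scores[v], reverse=from_top)
    return ranking[:budget]

def select_seeds_neighbour(adj, scores, budget, from_top=True):
    """'Measure'-neighbour strategy: walk the same ranking, but acquire
    the labels of each ranked node's neighbours instead of the node
    itself, until the labelling budget is exhausted."""
    ranking = sorted(adj, key=lambda v: scores[v], reverse=from_top)
    selected, chosen = [], set()
    for v in ranking:
        for w in sorted(adj[v]):
            if w not in chosen:
                chosen.add(w)
                selected.append(w)
                if len(selected) == budget:
                    return selected
    return selected

# Star graph: the hub has the highest degree, so the neighbour variant
# spends its budget on the hub's peripheral neighbours instead.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
scores = {v: len(adj[v]) for v in adj}  # degree as the utility score
```

With `budget=1`, the baseline picks the hub itself, while the neighbour variant labels one of its spokes; in clustered networks the latter tends to place labels inside the communities surrounding high-utility nodes.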

Experimental results also revealed that, regardless of the node selection method used in active learning (degree, betweenness, etc.) and the selection strategy (top or bottom of the ranking), none of the methods was able to satisfactorily generalise on networks that are very sparse and random by nature (especially the CS_PHD network, which belongs to Group 3, see Section 5.2). For such networks, the results were quite similar to random seeding, see Tables 3, 4, 5 and 6.

When analysing the active learning methods giving the best results for ICA, in most cases the inference results were not as susceptible to the percentage of known nodes as for the LBP method. This means that the global propagation of information across the network applied by LBP is more dependent on the size of the training set than the ICA method. In general, acquisition of labels by means of the best proposed methods (e.g. nodes with the greatest degree or neighbours of nodes with the lowest page rank) in conjunction with the ICA and LBP algorithms, in most cases, outperformed random selection. However, when using LBP the results may suffer from one of its basic properties: if the selection method does not provide nodes from all connected components, the information about labels is not propagated to the separated parts of the network.
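This limitation of LBP can be checked directly: given a candidate seed set, one can list the connected components that would receive no label information. A minimal sketch, again assuming a dict-of-sets graph representation:

```python
def connected_components(adj):
    """Connected components of an undirected graph {node: set_of_neighbours}."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adj[v] - component)
        seen |= component
        components.append(component)
    return components

def uncovered_components(adj, seeds):
    """Components containing no labelled seed; LBP cannot propagate any
    label information into these parts of the network."""
    return [c for c in connected_components(adj) if not c & set(seeds)]

# Two components; seeding only the first leaves the second unreachable.
adj = {1: {2}, 2: {1}, 3: {4}, 4: {3}}
```

For highly fragmented networks such as CS_PHD, almost any small seed set leaves many components uncovered, which is consistent with the poor LBP results reported below.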

In addition, as described in Section 4, the Loopy Belief Propagation (LBP) method is heavily network dependent, because the underlying network structure and a global objective function determine the propagation, while the ICA method utilises only the local structure of the network. The experimental results confirm that these differences heavily influenced the results of the individual classification methods. For example, compare the results for AMD and NET_SCIENCE (good performance of LBP for small-world-like networks) with CS_PHD (poor LBP results for a loosely connected network with small node degrees).

Table 3

Classification error in active learning based on ICA for distinct selection strategies; initial nodes taken directly from the ranking

Dataset | Lab. nodes | Top indegree | Top outdegree | Top betweenness | Top clust. coeff. | Top hubness | Top authority | Top page rank | Down indegree | Down outdegree | Down betweenness | Down clust. coeff. | Down hubness | Down authority | Down page rank | Random | # better
AMD10%0.9830.9830.910.8790.9760.9760.910.8750.8750.8960.910.8750.8750.8750.9067
20%0.8830.8830.9020.9810.8830.8830.8830.8980.8980.9060.8980.9380.9380.8980.8780
30%0.8970.8970.9060.8750.9110.9110.9330.8840.8840.8530.8970.9110.9110.8840.8581
40%0.9430.9430.9060.9380.8850.8910.8850.9120.9120.9270.9430.9120.9120.9170.8370
50%0.8810.8810.8880.9190.8810.9690.9310.950.950.90.9060.950.950.950.8310
60%0.9610.9610.9140.9060.8750.8750.9610.9140.9140.9140.8980.8910.8910.8910.8110
70%0.9690.9690.9060.9380.9480.9480.9060.8330.8330.9480.8960.8330.9580.8330.7980
80%0.9380.9380.9220.9690.9530.9530.9380.9380.9380.9690.9530.9380.9380.9380.8050
90%0.9380.9380.9380.8750.9060.9060.9380.9380.9380.8750.9060.9060.9060.9380.7820
NET_SCIENCE10%0.9670.9690.9410.9440.9540.9350.9340.9730.910.9220.9190.9360.9350.940.9354
20%0.9350.9250.9330.9260.9160.9180.920.9110.9090.9220.9140.9420.9150.9230.9112
30%0.9060.9040.9330.9020.9020.9220.9120.9140.9110.9150.9210.9250.9180.9160.9084
40%0.9320.9090.9230.9110.9030.910.9150.930.9090.9120.9230.9030.9250.920.9020
50%0.9140.9040.9120.8990.9150.9180.9220.910.8990.9220.910.9260.9140.9120.8992
60%0.9060.9110.9250.9010.9230.9130.9230.9250.9540.9110.9160.9230.9270.9040.8980
70%0.9160.9160.920.9020.9110.9160.9230.9250.9230.9130.9070.9020.9070.9130.8980
80%0.8910.9080.9220.9110.9220.9110.9220.9220.980.9320.9180.8980.9150.9280.8982
90%0.8840.8980.8980.9050.9320.9180.9250.9050.9250.9390.9050.9180.9180.9050.8961
PAIRS_FSG10%0.8550.9840.8450.8450.7610.8870.8390.8150.8070.850.850.8820.850.8150.8244
20%0.8160.8890.880.9490.8550.9130.8340.8830.8370.840.9550.9460.9550.560.7771
30%0.9280.9380.9590.9930.82810.8410.6070.9450.890.9930.9380.9760.6070.662
40%0.8920.8640.9880.920.9120.9480.90.610.9360.6790.9280.8880.9080.610.6432
50%0.2910.2950.3020.3350.320.290.3490.3290.3190.3370.2920.3330.3390.3340.3085
60%0.2880.2950.2950.3350.3250.2840.2930.3280.3250.3310.280.3120.4430.5280.3076
70%0.2790.2970.2930.3320.330.280.2910.3350.3130.3410.2790.3130.5280.3460.3066
80%0.270.2930.2590.3330.3390.2710.2760.3440.3080.350.260.2970.4390.360.3098
90%0.2570.2960.2370.3260.3420.2630.2630.3620.30.370.2810.2810.5770.3830.3058
PAIRS_FSG_SM10%0.6710.6680.6830.6680.6890.6950.6870.6610.6690.7010.7040.7090.70.6990.6651
20%0.6540.6780.6950.6950.7150.7150.6970.6710.6570.7130.7370.7320.7340.7060.6612
30%0.6680.6730.7150.7090.7350.7360.7010.680.6690.7380.7390.7570.7370.7230.6640
40%0.6630.6890.7260.6980.7570.760.7160.6760.6410.7280.7370.7540.7550.7270.6672
50%0.6630.6630.7250.7210.7560.780.7410.6950.6860.740.7140.7620.7690.7280.6652
60%0.6790.6960.7610.7470.7780.8040.7520.6890.6890.7380.7030.7620.7680.7310.6670
70%0.6640.6520.7760.7860.8130.8270.7690.6740.6640.7590.7250.7780.7740.7230.6653
80%0.6540.6620.8090.8350.830.840.7940.7020.6640.7610.7050.7560.7790.7480.6693
90%0.7010.6650.7670.8680.8930.8430.7970.6140.650.8120.6950.7560.7870.7820.6612
YEAST10%0.9650.950.930.7510.9220.9340.8730.9280.8810.940.8010.8110.7740.8460.8324
20%0.9030.9010.780.7760.7880.7680.7650.750.7660.8160.7780.8160.8330.7960.7684
30%0.9170.920.770.7620.780.7510.7710.7540.790.8040.7770.8310.8110.7840.7480
40%0.9260.7130.7040.7390.7140.7410.7680.7710.9170.8340.7880.8280.8120.7730.7445
50%0.7130.6950.880.7230.6810.7440.7610.8130.8110.8730.8190.8610.80.7810.744
60%0.9240.6540.6610.6970.6580.7410.7530.8020.9350.8270.8430.870.80.7770.7374
70%0.6470.640.6360.6870.6330.7490.7420.7910.8770.8310.9220.8910.8040.7670.735
80%0.620.6280.6280.6790.6390.7610.7380.7990.8990.8440.8750.9130.8120.7930.7295
90%0.6120.6290.6290.6370.6330.70.7380.8310.8820.8270.8310.8730.7850.7930.7246
CS_PHD10%0.9270.9390.8510.7540.7550.7740.9410.940.7770.8290.820.7510.750.9730.8358
20%0.9540.9190.7330.9120.8920.9260.6860.9360.9190.9250.8820.9080.9090.9150.8442
30%0.8020.9290.7040.7970.7970.9270.6680.9110.9180.9390.8840.7970.7940.90.8737
40%0.8120.6410.6890.8430.8410.7870.6530.7710.7610.7940.8410.8430.8490.8960.87613
50%0.6760.710.6780.7060.710.6460.6910.870.9120.7820.7080.7080.6590.7180.79712
60%0.6890.6170.6730.6890.6890.6350.7060.7390.6920.6470.6920.6920.6970.7150.79214
70%0.6360.6490.6460.6990.6960.6430.6960.730.6870.7430.6990.6990.7120.7050.74113
80%0.6480.610.6290.6060.610.6150.7320.7140.6760.6950.6060.610.610.6530.75614
90%0.7940.5610.6260.6260.5890.5890.7760.7380.6360.710.6170.5890.5890.6640.74212

Note: The last column represents how many of the non-random selection strategies outperformed the random case.

Table 4

Classification error in active learning based on the ‘measure’-neighbour versions of distinct selection strategies with ICA

Dataset | Lab. nodes | Top indegree | Top outdegree | Top betweenness | Top clust. coeff. | Top hubness | Top authority | Top page rank | Down indegree | Down outdegree | Down betweenness | Down clust. coeff. | Down hubness | Down authority | Down page rank | Random | # better
AMD10%0.8690.9270.9070.8830.8860.890.8550.8920.8890.8650.9030.9070.8960.9170.90610
20%0.8510.870.7990.8650.8370.8440.8830.8940.8980.8220.8210.8210.8970.830.87810
30%0.830.7420.7950.7810.8050.8220.8460.80.7890.7770.8070.8120.7940.8280.85814
40%0.7490.750.7630.8160.7260.780.7280.7490.7720.7790.7860.7720.7760.7660.83714
50%0.7380.7980.7570.7680.7190.7040.7420.7060.7460.7620.7220.6830.7570.7880.83114
60%0.7510.7620.7070.7220.6730.7080.710.7680.6940.7940.7110.7850.7060.7380.81114
70%0.6690.7520.6770.7360.6890.690.6520.70.7390.6710.7060.6910.7270.7030.79814
80%0.6780.6710.6040.6510.6620.6760.6530.6940.7290.7470.6730.7120.7220.7240.80514
90%0.6140.6620.5830.6130.6350.6290.5930.6710.6890.6690.6690.630.7090.6720.78214
NET_SCIENCE10%0.9370.9140.9280.9370.9430.9390.9320.9370.9350.9640.9350.9530.9330.940.9355
20%0.9040.9050.9010.8840.9320.9380.9270.9220.940.9520.9350.9610.9360.910.9115
30%0.9010.9020.9160.8760.9140.8930.9110.9030.8930.8990.8990.9280.9150.8850.9089
40%0.9010.9050.8930.9010.8970.8890.9160.8950.8910.8960.9030.9150.9330.8930.9029
50%0.8860.8850.890.890.9020.910.8940.8830.8840.9050.9420.9170.9250.8870.8998
60%0.9020.8910.8810.8850.9040.8980.8970.8820.8910.8840.8970.9060.9030.890.8989
70%0.8970.8870.890.870.8920.9020.8880.8820.9120.8790.8920.9170.9180.8670.89810
80%0.8790.8850.8990.880.8880.8930.8970.8760.8710.8820.90.9260.9280.8610.89810
90%0.8920.870.8990.8690.8720.9070.8810.8890.8770.8840.880.9170.9160.8730.89610
PAIRS_FSG10%0.7960.9650.8070.830.7670.980.9770.80110.8030.8240.8440.8260.9790.8246
20%0.7560.8340.8240.8590.730.810.8090.8020.8270.8060.850.8420.8740.7930.7772
30%0.8390.8540.8380.890.7870.8350.8280.7870.8170.8280.8770.8820.8880.780.660
40%0.8340.8080.7990.9020.7760.8610.8290.7770.8520.820.9080.9040.9010.7810.6430
50%0.2930.2910.3010.3050.310.2990.3020.3030.2950.3030.2930.2950.3010.2970.30813
60%0.2930.3070.2980.30.3060.3010.3030.3010.3020.2930.2870.2970.3020.30.30714
70%0.2940.2950.3060.3060.3040.2940.2980.2930.3010.2950.2960.2960.2950.2970.30614
80%0.2870.2940.3010.3070.2990.2940.2940.2960.3050.2970.2870.2980.3010.2940.30914
90%0.290.2860.3020.3010.310.3040.3050.2990.2910.2970.2910.2940.2910.2910.30513
PAIRS_FSG_SM10%0.6360.6680.6620.6580.6620.6690.6660.6650.6610.6470.6460.6630.6640.6470.66510
20%0.660.6630.6580.6550.6710.6730.6710.6620.6470.6590.6410.6530.6430.650.6619
30%0.660.6580.6730.650.6760.6940.6750.6520.6480.6430.6510.6390.6470.630.66410
40%0.6510.6570.6820.6540.6680.6780.6520.6440.650.6270.6490.6420.6460.6240.66711
50%0.6630.6470.6650.6620.6660.70.680.6520.650.6170.6350.6360.6480.6170.66510
60%0.6570.6560.6640.6570.680.6990.6810.6470.6550.6170.6280.6310.6290.6320.66711
70%0.6660.6540.6820.6610.6710.6810.6760.6230.6620.6060.6240.6370.6560.6170.6659
80%0.6460.6440.6620.660.6780.7020.6820.6430.6580.6220.6090.6230.5980.6140.66911
90%0.660.6560.6790.6740.6770.6980.6730.640.6690.6240.6220.5980.5990.6360.6618
YEAST10%0.7560.890.7760.9090.9250.7840.8890.89210.9250.90.9410.930.9410.8323
20%0.7110.7240.7260.7150.7630.7210.7380.91210.9170.7460.9340.7770.9510.7688
30%0.7060.6970.7020.7050.6950.7030.7230.7410.8550.8450.7760.7960.7230.7489
40%0.7090.7050.7020.7030.690.6870.7280.7320.9470.7560.7250.7770.7480.7270.74410
50%0.7050.7050.7030.6980.6740.6950.7110.7490.9350.7650.7160.7720.7140.7120.7410
60%0.7240.6850.6860.6950.6920.6950.7150.7090.7290.720.7220.730.6960.6990.73714
70%0.6870.6890.6990.6860.6830.6880.7070.6980.7190.6920.7020.7140.7180.6870.7314
80%0.680.7010.6890.6830.6810.6960.6870.6870.7040.6920.70.7060.6840.6910.72914
90%0.6990.6930.6920.6870.680.7110.7040.6840.6830.6830.6980.710.6850.6870.72414
CS_PHD10%0.9340.9530.8840.6960.830.6920.8920.9510.7810.7480.7620.780.9440.9190.8357
20%0.9750.760.6520.7210.7150.820.8970.9280.9630.9140.8540.6790.9010.930.8446
30%0.8910.850.930.8140.9170.8550.8830.9360.8480.7670.7820.7920.9460.9220.8737
40%0.9370.9080.7410.950.9490.9210.9640.8750.7790.8430.9790.8960.9310.9230.8764
50%0.8780.9380.8190.7140.6580.9210.9390.9020.9220.9560.9020.9380.8190.9560.7972
60%0.9230.8280.9390.8010.8180.8070.960.910.8030.9110.7740.8160.9510.7890.7922
70%0.7750.7010.8780.9580.7650.8890.9140.860.7910.8540.8910.930.8720.9170.7411
80%0.680.9580.9210.9420.9250.8290.8030.9690.9680.9320.8390.9220.930.8480.7561
90%0.8140.8310.9220.9470.9510.6890.9290.9130.90.6450.8860.8160.9560.9340.7422

Note: The last column represents how many of the non-random selection strategies outperformed the random case.

Table 5

Classification error in active inference based on LBP for distinct selection strategies; initial nodes taken directly from the ranking

Dataset | Lab. nodes | Top indegree | Top outdegree | Top betweenness | Top clust. coeff. | Top hubness | Top authority | Top page rank | Down indegree | Down outdegree | Down betweenness | Down clust. coeff. | Down hubness | Down authority | Down page rank | Random | # better
AMD10%0.9310.9310.8820.8680.9310.9310.9270.9130.9130.8960.8790.9170.9170.9130.8630
20%0.9380.9380.9340.8910.9380.9380.9380.8950.8950.8980.930.8980.8980.8950.8170
30%0.8970.8970.8930.8880.8970.8970.8930.8840.8840.8840.8970.8660.8660.8840.7930
40%0.9010.9010.9120.9170.9010.9010.9010.8960.8960.9170.9060.8960.8960.9170.770
50%0.8940.8940.8880.90.8940.8940.8940.9250.9250.9190.8940.9250.9250.9440.7290
60%0.8910.8910.8520.9060.8910.8910.8910.8830.8830.9220.8670.9140.9140.9220.7140
70%0.8330.8330.8650.9170.8330.8330.8330.9480.9480.9380.8650.9480.9480.9480.6920
80%0.8280.8280.8440.9840.8280.8280.8280.9530.9530.9530.7970.9530.9530.9530.6810
90%0.8750.8750.8750.9060.9060.9060.8750.9380.9380.9690.8440.9380.9380.9380.6510
NET_SCIENCE10%0.9950.9860.9920.9950.9990.9990.9980.9860.9970.9980.9990.9990.9990.9840.9590
20%0.9860.9860.9920.9860.9990.9990.9970.9770.9870.9930.9970.9990.9990.9620.9290
30%0.9810.9840.9920.9790.9990.9990.9980.9650.9740.9880.9970.9990.9990.9370.9040
40%0.9810.970.9930.9670.9990.9990.9980.9660.970.9820.9950.9990.9990.9320.8830
50%0.9820.9670.9990.960.9990.9990.9990.960.9630.9810.9960.9990.9990.9320.8610
60%0.9830.9660.9980.9560.9980.9980.9980.9570.9680.9740.990.9980.9980.9280.8480
70%0.9750.9610.9980.9570.9980.9980.9980.9640.9750.9640.9950.9980.9980.9390.840
80%0.9630.9520.9970.9590.9970.9970.9970.9560.9730.9590.9970.9970.9970.9180.8270
90%0.9590.9390.9930.9590.9930.9930.9930.9390.9730.9460.9930.9930.9930.8640.8130
PAIRS_FSG10%0.9650.8930.9570.9360.9950.9410.9680.9210.9380.9410.9360.9410.920.9081
20%0.9460.8770.9190.9760.9430.9250.9940.8890.9460.9340.9820.9790.9730.8890.860
30%0.9860.8760.8930.9930.9210.9690.9970.8480.9720.9380.9970.9720.9760.8480.8440
40%0.9720.8590.9320.9920.980.9610.8230.9640.9120.9960.9760.9720.8230.8230
50%0.290.2720.2780.3050.3160.2850.2950.2740.2950.2590.2580.2750.2680.2760.2450
60%0.2840.2610.2760.3150.2860.2890.30.2590.2820.240.2410.2730.2620.2650.2340
70%0.3090.2570.2870.2980.280.3090.3180.2570.2570.2470.2320.2760.2570.270.2230
80%0.330.2570.3090.2760.2430.3260.3470.2480.250.2370.2070.2630.2550.2730.2131
90%0.4580.2530.4510.2940.2450.4560.4760.2670.2310.2350.190.2350.2590.2920.2081
PAIRS_FSG_SM10%0.6670.7220.6780.6810.7260.6810.6650.7120.6710.7160.7270.7090.6850.6840.6661
20%0.6660.630.6790.6690.6920.6730.6790.6650.6640.6530.6480.7450.660.6390.6391
30%0.680.6410.6960.6560.6820.6960.7020.6930.6440.6230.6230.640.6320.6030.6618
40%0.6830.650.7110.7150.7430.7060.720.620.6430.5780.6730.6350.5930.5910.6647
50%0.6880.6360.7390.6540.6940.7510.7580.590.640.570.6180.6090.5850.5660.6910
60%0.6850.6570.7680.6760.7050.7810.7960.5670.6410.5480.6010.6050.5630.590.5914
70%0.6640.6380.8150.6880.7050.8370.8420.5160.630.5250.5890.620.5110.5760.6216
80%0.7250.6360.8910.7280.7430.9060.8980.7510.6180.4890.6180.5880.4910.5370.667
90%0.7210.6650.9750.8530.7720.990.9950.4320.5890.4370.5890.5530.4110.4570.6698
YEAST10%0.8650.7960.8250.8690.8450.8730.9480.8410.8890.930.9530.9520.8480.8635
20%0.830.7460.7950.8120.8220.840.8680.81710.8840.9030.950.920.8220.8032
30%0.8190.7370.7760.7370.7950.8320.8450.80610.8730.8950.9460.9030.80.7752
40%0.8120.7290.7770.7480.8040.8310.8350.7910.9920.8630.8960.9460.8860.7640.7431
50%0.8120.7370.7760.7750.7670.8140.8430.7550.9350.8520.8890.9310.8150.7340.6960
60%0.7970.7110.7750.8140.7730.8250.8220.730.9470.8660.920.9220.8050.7350.743
70%0.8150.7480.8120.8110.80.8350.8080.6860.8660.8490.8890.9210.7660.7350.6780
80%0.8370.7890.8030.8060.7950.8370.820.6510.8960.7950.7650.9150.7380.7740.6631
90%0.8140.7640.7810.7760.760.8860.7760.650.8690.7090.6840.9160.6670.8230.6712
CS_PHD10%0.9980.9990.9990.9980.9970.99910.99910.9990.9980.9980.9980.9910.9972
20%0.9720.9980.9990.980.9790.98210.99910.9940.980.980.980.9680.9969
30%0.9450.9840.9760.9570.9560.94810.9990.9990.9930.9560.9570.9560.9460.99210
40%0.9330.950.9530.9310.9290.92910.9980.9970.9780.9290.9310.9330.9390.98711
50%0.940.9380.9470.940.9380.92810.9980.9940.9770.9380.940.9420.9060.9811
60%0.9250.9320.9460.9410.9390.94110.9980.9860.9790.9390.9410.9390.9110.97610
70%0.9280.9530.9370.9280.9250.925110.9660.9750.9280.9280.9280.890.96711
80%0.9110.9480.9390.930.9250.9310.9910.8970.9720.930.930.9480.8830.96511
90%0.9350.9530.9250.9160.9160.91610.9530.8970.9440.9160.9160.9440.8320.95813

Note: The last column represents how many of the non-random selection strategies outperformed the random case.

Table 6

Classification error in active inference based on the ‘measure’-neighbour versions of distinct selection strategies with LBP

Dataset | Lab. nodes | Top indegree | Top outdegree | Top betweenness | Top clust. coeff. | Top hubness | Top authority | Top page rank | Down indegree | Down outdegree | Down betweenness | Down clust. coeff. | Down hubness | Down authority | Down page rank | Random | # better
AMD10%0.7890.8270.8410.8450.7750.810.810.8370.8140.8390.8210.830.8440.8180.86314
20%0.7350.7360.7290.7310.7440.7340.7030.760.70.7370.7420.7640.7460.7760.81714
30%0.6760.6480.6790.680.6570.6750.6560.6630.6710.6740.640.6860.6440.6960.79314
40%0.6310.6230.5950.5890.5690.5780.6070.6580.5850.560.5820.6240.6330.6110.7714
50%0.5570.5490.5410.6080.5660.5320.6030.5790.5640.580.5280.5490.4920.5450.72914
60%0.5030.4920.5220.5280.4550.4540.5280.5360.5360.5420.50.4950.5080.4620.71414
70%0.4320.4390.4130.4470.4430.4420.4570.4540.4940.5230.4840.4570.4640.4550.69214
80%0.4180.4030.3860.4750.4360.4010.4470.4240.4110.4580.3610.3940.3990.4730.68114
90%0.4020.3280.2990.3930.3720.3230.3940.4120.380.3730.2950.3140.4130.3450.65114
NET_SCIENCE10%0.9710.920.9260.9230.9690.9760.9620.9230.9640.9720.96910.9990.9020.9595
20%0.9030.8660.9180.8480.9510.9480.9340.8780.9150.9310.9490.9990.9990.8180.9297
30%0.8630.8390.8970.7870.9190.9290.9030.8050.8580.8960.9230.9980.9970.740.9049
40%0.8310.7720.8780.7380.9070.9020.8930.7920.8470.8590.880.9960.9970.6990.8839
50%0.7860.7450.8460.6840.8720.8730.8560.730.8130.7970.8640.9960.9710.6450.8619
60%0.7760.7030.8310.6470.8410.8310.8210.6990.750.7850.8350.9930.9370.590.84812
70%0.720.6930.7860.6360.7820.7860.7970.6610.7310.7390.8290.9920.9160.6120.8412
80%0.7020.6760.7290.6410.7520.7420.7670.6680.6930.6530.7930.9960.8670.5910.82712
90%0.6760.6450.710.6440.7150.7120.7320.6350.6880.6390.7520.9870.8140.5540.81312
PAIRS_FSG10%0.9160.9130.8920.9020.9150.8880.890.87610.9160.9270.9060.9220.8640.9087
20%0.8770.8390.840.9090.8580.8950.9060.8610.9110.8730.9540.9170.9130.7970.864
30%0.8540.8350.7910.9020.8090.8980.8640.7920.8590.8820.9430.9070.9150.8210.8445
40%0.7970.8340.8120.910.750.8920.7980.7950.8580.8510.9540.9170.9160.7570.8236
50%0.2390.1990.230.2270.2250.2340.2360.2010.220.1880.2070.2030.3320.180.24513
60%0.2220.1880.2160.2140.2140.2310.2310.1730.20.1820.1830.1930.1710.1780.23414
70%0.2140.1910.2030.1950.20.2190.2320.1630.1950.1640.1770.1920.1610.1610.22313
80%0.2050.1690.2060.1750.1830.2160.2090.140.1830.150.1690.1730.1510.1540.21313
90%0.2040.1720.210.1690.1710.2040.20.1350.1680.1370.1680.1580.1340.1330.20813
PAIRS_FSG_SM10%0.6160.6050.6330.6150.6390.6340.6320.6020.6140.7080.6210.7360.7390.5910.66611
20%0.7380.5590.5960.5870.5830.6140.6220.7450.5440.5370.7390.7220.7250.5220.6399
30%0.5470.530.5690.5660.5560.5750.7340.5160.5210.6940.740.6990.6880.490.6619
40%0.5190.7240.540.5310.5480.7330.5810.7170.7150.6750.7050.4850.6660.7170.6646
50%0.730.4840.5360.4980.5310.7480.5550.7040.7150.7220.6970.6950.4320.6670.697
60%0.4940.7060.5210.4850.5210.7420.740.6770.7080.3730.7310.7370.3850.40.5917
70%0.7280.7320.4910.4690.4880.750.7310.4160.7110.3880.740.6930.6770.7140.6215
80%0.7260.7260.7420.4330.4710.5160.7550.7050.7040.3640.6540.6790.7290.3390.666
90%0.7050.740.6990.7120.4560.7330.7370.720.6930.6570.6820.710.6470.6640.6694
YEAST10%0.7750.7580.7770.8170.770.8220.8170.78110.8260.8730.8650.8410.8330.86311
20%0.70.7450.6990.7140.7130.7280.740.7210.8370.8540.8830.8560.8150.8038
30%0.8070.6390.6310.6560.680.8560.6710.74410.7610.7480.8690.6970.7510.77510
40%0.6140.5770.5640.6460.6040.6340.8150.7450.9460.8120.8190.7640.8090.6660.7437
50%0.7580.5350.550.6190.5540.6110.6270.6460.8150.770.7740.7180.770.6680.6968
60%0.6890.5110.5620.7580.7530.5630.5870.6320.690.6620.7470.7550.6070.5940.7410
70%0.6780.7010.6640.570.5220.7770.5520.5720.7290.6060.5950.5930.5760.5440.67811
80%0.6850.7030.5190.740.5280.6880.530.6740.6970.5790.7340.6840.4960.5310.6636
90%0.7020.4960.5260.5670.7140.5150.6940.6330.5190.7010.5090.7210.4760.4960.6719
CS_PHD10%110.999111111110.9950.9950.9940.9973
20%0.9990.9990.9990.9990.9941110.9980.9990.9860.9950.9960.9980.9964
30%0.9970.990.9990.9980.9970.99810.99810.9980.9960.9890.9940.990.9923
40%0.980.9990.9910.9860.9860.99110.9970.9980.9750.9780.9840.9920.9876
50%0.9960.9930.9820.9780.9810.9830.9980.9990.9970.9990.9820.990.9830.9880.981
60%0.9810.9870.9950.9930.9910.9840.9970.9940.9860.9940.980.9770.9750.990.9761
70%0.9870.9820.9540.9920.9970.9730.9990.9990.9950.9920.9680.9730.9760.9790.9671
80%0.9920.9860.9790.9920.9780.988110.990.9970.9620.9780.9740.9720.9651
90%0.9880.9880.9860.9940.980.9810.9940.9810.9790.980.9820.980.9840.9670.9580

Note: The last column represents how many of the non-random selection strategies outperformed the random case.

Table 7

The comparison showing how often the top or the bottom of the rankings for given measures outperformed each other; all datasets merged

Method        | In-degree | Out-degree | Betweenness | Clustering coeff. | Hub centrality | Authority | Page rank
              | Top / Bottom | Top / Bottom | Top / Bottom | Top / Bottom | Top / Bottom | Top / Bottom | Top / Bottom
LBP           | 19 / 53 | 52 / 18 | 32 / 39 | 30 / 28 | 35 / 27 | 31 / 30 | 13 / 59
LBP-neighbour | 27 / 44 | 56 / 15 | 38 / 34 | 42 / 29 | 52 / 20 | 36 / 36 | 20 / 52
ICA           | 48 / 24 | 37 / 33 | 40 / 31 | 36 / 31 | 37 / 33 | 48 / 23 | 52 / 17
ICA-neighbour | 38 / 34 | 40 / 32 | 33 / 39 | 35 / 37 | 41 / 31 | 41 / 31 | 28 / 44

Notes: Each number indicates how many times the results of a particular strategy (top or bottom of the ranking) were better than the other one for a given measure; in some cases both strategies were equal and provided the same error level. Overall, there were 72 comparisons.

6.1.Influence of network characteristics on classification results

To facilitate the analysis of the large number of experimental results, an appropriate ‘results profile’ has been created for each group of networks presented in Section 5.2 (see Table 2). Classification performed on the networks from Group 1, which exhibit the small-world phenomenon, features good accuracy of the ‘measure’-neighbour methods, with the LBP neighbour approach (bottom page rank) outperforming the other approaches. In addition, most of the ‘measure’-neighbour methods outperform the random case. The classification error, which for this group is high for small training sets, decreases substantially for bigger training sets. The classification results for the second group of networks exhibit a relatively small variance of error. In general, the more classes in the dataset, the worse the classification accuracy. Moreover, for smaller modularity and density as well as a greater clustering coefficient, the LBP neighbour approach outperforms the others.

Finally, for the third group of networks, which are close to random networks with a very low connection probability, the ‘measure’-neighbour methods are worse than the original and random approaches. The classification results are rather poor in all cases, but because the ICA method does not depend on connections within the network for classification, it outperforms the LBP-based methods.

Fig. 5.

Robust fit regression between clustering coefficient (left plot)/average path length (right plot) and the number of times when ICA method with regular selection strategy is better than random one (sum of the last column from Table 3 for each network). Plot on the right does not include CS_PHD network as the network is not connected so average path length is not informative. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)

Fig. 6.

Regression between clustering coefficient (left plot)/average path length (right plot) and the number of times when ICA approach with ‘measure’-neighbour selection strategies is better than random one (sum of the last column from Table 4 for each network). Right plot does not include CS_PHD network as the network is not connected so average path length is not informative. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)

Fig. 7.

Regression between clustering coefficient (left plot)/average path length (right plot) and the number of times when LBP method with regular selection strategy is better than random one (sum of the last column from Table 5 for each network). Plot on the right does not include CS_PHD network as the network is not connected so average path length is not informative. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)

Fig. 8.

Regression between clustering coefficient (left plot)/average path length (right plot) and the number of times when LBP approach with ‘measure’-neighbour selection strategies is better than random one (sum of the last column from Table 6 for each network). Right plot does not include CS_PHD network as the network is not connected so average path length is not informative. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)


Additionally, we performed a set of robust fit regressions in order to investigate the relation between structural properties and the classification results (see Figs 5, 6, 7 and 8). In this part of the study, by results we mean the number of times a given approach (ICA measure-based, ICA measure-neighbour-based, LBP measure-based, LBP measure-neighbour-based) was better than random for each analysed network (the sum of the last column from Tables 3, 4, 5 and 6 for each network). We took into account two metrics: (i) clustering coefficient and (ii) average path length, as these are the measures that enable a network to be classified as random, small-world or ordered. The results show that for measure-based methods (for both ICA and LBP), the smaller the clustering coefficient and the larger the average path length, the better the classification performance. The exactly opposite trend is visible for measure-neighbour approaches. However, the results for average path length should rather be neglected, as in all cases R² is very close to 0. Concentrating on the clustering coefficient and the obtained results, we can recommend measure-based selection strategies for networks that are disconnected and very random in their nature (Group 3), and 'measure'-neighbour selection strategies for networks that are connected (Groups 1 and 2).
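The two structural metrics used in this regression can be computed directly from an adjacency representation. Below is a minimal pure-Python sketch on a toy undirected graph (hypothetical data, not one of the analysed datasets):

```python
from collections import deque
from itertools import combinations

# Toy undirected graph as adjacency sets (hypothetical example).
adj = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
}

def local_clustering(node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_path_length():
    """BFS from every node; only meaningful for connected networks,
    which is why CS_PHD is excluded from the right-hand plots."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

clustering_coefficient = sum(local_clustering(n) for n in adj) / len(adj)
```

For the experiments themselves, a library such as networkx provides the same quantities via `average_clustering` and `average_shortest_path_length`.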

6.2.Representativeness of the selected training sets

While considering the relational classification results, it should be investigated to what extent the selected nodes used for training and propagation appropriately represent the whole network, especially in terms of class conditional distribution.

In order to assess the representativeness of a selected training set, the standard Kullback–Leibler divergence (a.k.a. relative entropy) was used, which measures the difference between two probability distributions. It quantifies how much information is lost when one probability distribution (in our case, the distribution of classes in a given sample – 10%, …, 90% of the whole dataset) is used to approximate another one, here the probability distribution of classes in the whole dataset. The smaller the divergence, the smaller the loss; 0 means that no information is lost and both distributions are the same. Results for all datasets are presented in the Online Supplemental Material in Section 3. Here, we present only the calculations for two networks: (i) PAIRS_FSG with 3 classes (Fig. 9) and (ii) NET_SCIENCE with 26 classes (Fig. 10). The former has the smallest and the latter the largest number of classes out of all tested datasets.
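The divergence reported here can be computed from the two empirical class distributions. A sketch, assuming a small smoothing constant for classes absent from the sample (the handling of zero counts is an assumption, not stated in the text):

```python
import math
from collections import Counter

def kl_divergence(all_labels, sample_labels):
    """D(P || Q), where P is the class distribution of the whole dataset
    and Q is the class distribution of the sample used to approximate it."""
    eps = 1e-12  # smoothing for classes missing from the sample (assumption)
    p, q = Counter(all_labels), Counter(sample_labels)
    n_p, n_q = len(all_labels), len(sample_labels)
    return sum(
        (c / n_p) * math.log((c / n_p) / max(q[label] / n_q, eps))
        for label, c in p.items()
    )

# Identical class proportions give a divergence of 0:
# kl_divergence(['a', 'a', 'b', 'b'], ['a', 'b'])  ->  0.0
```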

Fig. 9.

Kullback–Leibler divergence for the PAIRS_FSG network. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)

Fig. 10.

Kullback–Leibler divergence for the NET_SCIENCE network. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)

Fig. 11.

Percentage of the classes uncovered in the initial node set used for learning; ID – indegree, OD – outdegree, B – betweenness, CC – clustering coefficient, H – hubness, A – authority, PR – page rank. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)


Both Figs 9 and 10 show the Kullback–Leibler divergence for the selected networks, for the different structural measures used to create the node rankings and for three ways of ordering the rankings: descending, ascending and random. Although the divergence is generally bigger for the NET_SCIENCE network than for PAIRS_FSG, the absolute values are relatively small. The highest value is 0.105 for NET_SCIENCE, for the ranking created using the hub measure, when only the 10% of nodes with the smallest hub values were selected. Moreover, for all networks (please see the Online Supplemental Material) the divergence is highest for small sample sizes (10%, 20% and 30%). This is intuitive, but more importantly, the maximum value of the divergence never exceeds 1 (see the Online Supplemental Material). Taking into account the fact that the measure is unbounded, this value is acceptable from the perspective of data sampling.

The fall of the Kullback–Leibler divergence with the increasing percentage of nodes used for learning is quite obvious and visible in Figs 9 and 10. Even though the distribution fit is imperfect for smaller shares of selected nodes, very good representativeness (a KL divergence value at the level of 0.01) is already achieved for 50% of nodes, see Fig. 10.

One of the main challenges in active learning and inference is to acquire all classes that exist within a given dataset during the initial node selection. The sampling quality can also be measured by assessing what percentage of classes is uncovered in the sampling process. This is very important because if not all classes are discovered in the phase of uncovering initial labels, the method will not be able to generalize to these classes during the classification process. The percentage of classes uncovered during each initial node selection process is presented in Fig. 11. The PAIRS_FSG, PAIRS_FSG_SMALL and YEAST networks are omitted, as all classes were always uncovered in the selection process regardless of the method used. The classification results for those networks are also relatively good compared with the remaining datasets.
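The class-coverage statistic behind Fig. 11 can be sketched as follows; the ranking, labels and sampling fractions below are hypothetical:

```python
def coverage_curve(ranked_nodes, labels, all_classes, fractions=(0.1, 0.3, 0.5)):
    """Percentage of classes uncovered (i.e. present) among the first
    fraction of a node ranking, for each sampling fraction."""
    curve = {}
    for f in fractions:
        k = max(1, round(f * len(ranked_nodes)))
        uncovered = {labels[n] for n in ranked_nodes[:k]}
        curve[f] = 100.0 * len(uncovered) / len(all_classes)
    return curve

# Toy ranking of 10 nodes over 3 classes (hypothetical data):
ranked = list(range(10))
labels = {0: 'a', 1: 'a', 2: 'b', 3: 'a', 4: 'c',
          5: 'b', 6: 'a', 7: 'c', 8: 'b', 9: 'a'}
curve = coverage_curve(ranked, labels, {'a', 'b', 'c'})
```

On this toy example a 10% sample uncovers only one of the three classes, while a 50% sample uncovers all of them, mirroring the behaviour discussed above for small sample sizes.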

Fig. 12.

The visualisation of the NET_SCIENCE network; colours represent classes (labels). (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)

Fig. 13.

The comparison of the number of nodes which theoretically should be sampled against actually sampled for the YEAST dataset and the ‘measure’-neighbour method. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-150686.)


The smallest percentage of classes was discovered for the CS_PHD network. In some cases, even when 90% of the data was sampled, some classes still remained undiscovered. Comparing this outcome with the classification results, it can be noticed that the classification error for this dataset is very high – not smaller than 0.96 for LBP (the 'measure'-neighbour version) and 0.55 for ICA (Fig. 4). This is also partially visible for the NET_SCIENCE network: not all classes are uncovered for 10% or 20%, and the classification error for these percentages exceeded 0.8 (Fig. 4). This mainly results from the profile of these networks: they consist of dozens of classes and not all of them can be found within 10% or 20% of the selected nodes, see Table 1.

6.3.Top vs. bottom selection from rankings

The next step of the analysis is to determine how the results are influenced by using nodes from the top or the bottom of particular rankings.

The results revealed that in most cases the methods using top nodes from the rankings were better (see Table 7). However, some exceptions to this rule can be noticed. LBP performed better for in-degree-based rankings when nodes from the bottom of the rankings were used. This is because, in some datasets, nodes with low in-degree had large out-degree, so they were able to propagate their label effectively, and it was the same label within their direct neighbourhoods, see e.g. Fig. 12.

Since the selection of the training set based on 'measure'-neighbour sampling heavily depends on the structure of the network, it happened that the number of neighbouring nodes utilized for learning and inference was smaller than the nominal number of nodes taken from the ranking. Due to the nature of LBP, regardless of the training set selection method (original or 'measure'-neighbour), if a node taken from the ranking has no neighbour, the information about its label will not be propagated. The ICA method, on the other hand, is able to overcome this problem, and a label may be assigned even to disconnected nodes. This phenomenon can be observed, e.g., for the YEAST dataset in Fig. 13. In general, this LBP drawback did not influence the results much, except for one dataset – CS_PHD – where the network is highly disconnected and almost no nodes were labelled by the 'measure'-neighbour algorithm. In all other cases, the LBP-neighbour method outperformed typical LBP in both approaches – top and bottom – despite the fact that fewer nodes were used as input for the classification algorithm.
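The effect described above, where 'measure'-neighbour selection labels fewer nodes than the nominal ranking size, can be illustrated with a small sketch (tie-breaking and deduplication are assumptions; the selection procedure itself is described in Section 3.3):

```python
def measure_neighbour_select(adj, scores, k, top=True):
    """Rank nodes by a structural score, take the top (or bottom) k,
    but acquire the labels of their NEIGHBOURS rather than of the
    ranked nodes themselves."""
    ranking = sorted(scores, key=scores.get, reverse=top)
    labelled = set()
    for node in ranking[:k]:
        labelled |= adj[node]  # an isolated node contributes nothing
    return labelled

# Toy graph with an isolated node 4 (hypothetical data):
adj = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}, 4: set()}
scores = {0: 5, 1: 1, 2: 4, 3: 2, 4: 3}
# Top-3 by score is [0, 2, 4], yet because node 4 is isolated only
# four labels are acquired:
# measure_neighbour_select(adj, scores, 3)  ->  {0, 1, 2, 3}
```

In a highly disconnected network such as CS_PHD, many ranked nodes behave like node 4 here, which is why LBP, relying purely on propagation, received almost no labelled input.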

7.Conclusions and future work

Active learning is an important problem that arises when we need to decide which network sample should be taken, and its node labels (classes) initially acquired, in order to classify the rest of the network. In this paper various strategies of active learning for within-network classification were studied.

In particular, two representative classification algorithms: locally-driven Iterative Classification Algorithm (ICA) and globally-based Loopy Belief Propagation (LBP) were tested.

For each of them, seven main structure-based measures for node ranking were experimentally examined: (1) indegree, (2) outdegree, (3) betweenness, (4) clustering coefficient, (5) hubness, (6) authority and (7) page rank. Additionally, a new 'measure'-neighbour set of methods was proposed in Section 3.3. Its novel idea is to select for the initial acquisition not the nodes taken from a given ranking but their neighbours. Besides, for each ranking list either top or bottom nodes were considered for label discovery. In total, 29 selection methods were tested: 14 for the original structural measures (7 measures, each with either the 'top' or the 'bottom' approach), 14 for 'measure'-neighbour selection, plus the random one. All of them were compared with each other.

Experiments were carried out on six real-world datasets with different network profiles and diverse numbers of classes.

The outcomes revealed that the distinct method settings perform differently depending on both (1) network profiles and (2) the complexity of class conditional probabilities. For example, inference applied within networks that exhibit small-world properties (with high clustering coefficient) gives good performance for 'measure'-neighbour methods. However, this does not hold for random networks with very low connectivity, where 'measure'-neighbour approaches are outperformed by the original and random methods (networks from Group 3).

Also, the results significantly depend on the distribution of individual classes among nodes with a given measure; e.g., top linked nodes (high degree) can belong to some classes more frequently than on average within the network (class imbalance). This is quite visible if we compare the results for selection methods whose Kullback–Leibler divergence between the selected set and the entire node set is not close to zero. The classification accuracy of such approaches is usually worse than in the cases when the class distributions of the sample set and of the whole dataset closely match each other.

Overall, the new 'measure'-neighbour selection methods proposed in the paper performed better than their original counterparts. They also surpassed random selection in the final inference error level more often. This leads to one of the main findings of this paper: relational inference is more effective if we learn on the labels of the neighbours of nodes with a given structural property (degree, page rank, etc.) rather than on the labels of those nodes themselves.

It should also be emphasized that none of the presented methods was able to generalize the datasets with many classes (>10) at a satisfactory level, especially for smaller percentages of learning nodes.

In general, the experimental results presented in the paper have shown that the final classification quality depends on many factors, such as the selection strategy, the size of the learning set (percentage of all nodes), the inference algorithm and the network-specific profile. However, in each case we can find many selection strategies that result in a lower classification error than the simple random approach. This is very important for real-world applications, especially when the total cost of wrong classification is high: the better a selection mechanism is adapted to a given environment, the lower the misclassification level and the costs. This plays an important role, e.g., in the marketing of frequently changing products or services, where it is hardly possible to collect feedback from larger communities of potential customers. Similarly, in high-risk screening for very rare diseases with very expensive diagnostic tests and fatal consequences, it is difficult to acquire an accurate group of patients that may suffer from such an illness. Thus, more effective methods, including network-based ones, need to be applied. Better initial selection methodologies for complex networks may reduce costs in such cases.

The development of general adaptation rules that would enable the node selection method to be adjusted to the network structural profile and class distributions is a new future research direction derived from the conclusions of this paper. Additionally, active learning can be seen as an iterative process with adaptive selection of more and more nodes, which would complicate the learning even further.

Acknowledgements

This work was partially supported by the European Union as part of the European Social Fund, the European Commission under the 7th Framework Programme, Coordination and Support Action, Grant Agreement Number 316097, ENGINE – European research centre of Network intelliGence for INnovation Enhancement (http://engine.pwr.wroc.pl/), and The National Science Centre, the decision no. DEC-2013/09/B/ST6/02317.
