
Fuzzy-Rough induced spectral ensemble clustering

Abstract

Ensemble clustering helps achieve fast clustering under abundant computing resources by constructing multiple base clusterings. Compared with a standard single clustering algorithm, ensemble clustering integrates the advantages of multiple clustering algorithms and has stronger robustness and applicability. Nevertheless, most ensemble clustering algorithms treat each base clustering result equally and ignore the differences between clusters. If a cluster in a base clustering is reliable/unreliable, it should play a critical/uncritical role in the ensemble process. Fuzzy-rough sets offer a high degree of flexibility in enabling the vagueness and imprecision present in real-valued data to be modelled effectively. In this paper, a novel fuzzy-rough induced spectral ensemble approach is proposed to improve clustering performance. Specifically, the significance of clusters is differentiated, and the unacceptable degree and reliability of the clusters formed in base clusterings are induced based on the fuzzy-rough lower approximation. Based on the defined cluster reliability, a new co-association matrix is generated to enhance the effect of diverse base clusterings. Finally, a novel consensus spectral function is defined by the constructed adjacency matrix, which can lead to significantly better results. Experimental results confirm that the proposed approach works effectively and outperforms many state-of-the-art ensemble clustering algorithms and base clusterings, which illustrates the superiority of the novel algorithm.

1 Introduction

Clustering is an unsupervised learning method that usually refers to dividing existing unlabeled instances into several clusters according to the similarity between objects, without any prior information, so that instances in the same cluster are highly similar while instances in different clusters differ substantially [9, 11, 43]. Ensemble clustering utilises a consensus function to unify multiple partitions of the same dataset into one clustering result. It usually constructs a base clustering pool by repeatedly running a single clustering approach or executing multiple clustering algorithms. Then, the consensus function is built through voting methods, hypergraph partitioning, or evidence accumulation to obtain better clustering results [18, 25]. Many existing ensemble clustering studies have confirmed that ensemble clustering can usually improve the clustering result compared to a single clustering algorithm [1, 14, 24].

1.1 Background

Existing established clustering algorithms are mainly based on the theories of model, grid, density, partition, and hierarchy [2, 8]. Different types of clustering algorithms are good at handling data of diverse types, distributions, and scales. In particular, with the development of deep learning [27], the performance of various clustering methods has been further improved. For example, in [28], a deep-learning feature extractor for time-series data is designed for relation extraction, and the clustering performance is significantly improved. Nonetheless, given the unknown data distribution in real problems, it is difficult to determine which clustering algorithm will produce better clustering results. Conventional solutions often try different methods and choose the algorithm that performs best. Ensemble clustering is expected to establish a general scheme that combines the advantages of multiple clustering algorithms and forms an optimal clustering result. It is especially feasible now that distributed computing technology is mature, allowing it to adapt to unknown and complex data.

The studies related to ensemble clustering are mainly divided into three categories: pair-wise co-occurrence based, graph partitioning based, and median partition based algorithms [18]. The first type constructs a co-occurrence matrix by counting how many times each pair of instances is assigned to the same cluster across the base clusterings; two instances that frequently co-occur should be classified into the same cluster in the final clustering [19]. The similarity function constructed from the co-occurrence matrix can be used in any similarity matrix based clustering algorithm to acquire the final optimal clustering result, such as hierarchical clustering and spectral clustering [7, 30]. The idea of the co-occurrence matrix was first proposed in [5]. Correspondingly, a method of evidence accumulation clustering (EAC) based on this theory was proposed for the ensemble clustering problem. Subsequent research has introduced various improvements, such as using normalised edges and matrix completion [29, 45]. In graph partitioning, a graph model and consensus function are usually constructed to partition the graph into multiple parts representing the final clusters. The primary purpose of graph partitioning is to achieve k-way min-cut partitioning, ensuring that the similarity between subgraphs is as small as possible [32]. Constructing a graph model is predominantly based on instances (vertices in a hypergraph) or clusters (hyperedges in a hypergraph) in the base clusterings. For example, the cluster-based similarity partitioning algorithm (CSPA) considers the local piecewise similarity and constructs a similarity graph as well as a graph partitioning method to perform ensemble clustering [23]. Compared with CSPA, the link-based ensemble clustering constructs a dense graph from the implied similarity between each instance and individual cluster; it achieves strong clustering results but is computationally expensive [36]. The last type (median partition based algorithms) transforms ensemble clustering into an objective optimisation problem, which finds a median partition most similar to every base clustering by solving the objective function [17]. However, the problem is NP-hard [4]. Fortunately, some approximate methods, such as expectation maximisation (EM) [40] and weighted consensus clustering (WCC) [33], have been proposed to find approximate solutions. In addition to the common types introduced above, ensemble algorithms based on voting [21], mutual information [20], finite mixture models [3] and other theories [22, 39] are also meaningful research directions in the field of ensemble clustering.

1.2 Motivations

Ensemble clustering is mainly divided into two steps. One is to generate a base clustering pool, for example, running the same clustering algorithm multiple times with different parameters, running various clustering algorithms multiple times, and performing clustering in subspaces. The other step is to select a consensus function, mainly based on the theories such as co-occurrence matrix, graph segmentation, and information entropy. An overview of ensemble clustering algorithms is depicted in Fig. 1.

Fig. 1

The outline of ensemble clustering.


Among the numerous types of ensemble clustering solutions, the pair-wise co-occurrence based algorithms are relatively simple, easy to implement, and have played a significant role in the ensemble clustering field. Nevertheless, these algorithms always treat all clusters in the base clusterings equally, ignoring the differences between clusters [35]. Some attempts have been made at cluster weighting to distinguish the effect of different clusters, such as weighting schemes based on information entropy [14] and random walks [16]. The authors used the related theories to distinguish different clusters and mine implicit relationships between instances, and the corresponding experiments proved that distinguishing different clusters is effective. However, these approaches perform ensemble clustering using only the labels of the base clusterings, without involving the original features, which may lose vital information implied in the data features.

Fig. 2

The motivation of the proposed method.


Compared with the algorithms that consider the base clustering results only, effectively combining the base clusterings with the original features helps further improve the performance of ensemble clustering. Fuzzy-rough sets offer a high degree of flexibility in enabling the vagueness and imprecision present in real-valued data to be simultaneously modelled effectively [12, 38]. The idea of upper and lower approximations can well depict the membership of instances to each category, which is helpful for measuring cluster reliability in ensemble clustering. The motivation of the proposed method is described in Fig. 2, where the solid black arrow and blue box illustrate the difference from similar works, namely the incorporation of the original features while distinguishing cluster reliability.

1.3 Contributions

To distinguish the validity of different clusters and combine the role of features, in this paper, the fuzzy-rough lower approximation is used to induce cluster reliability in all base clusterings. A novel fuzzy-rough induced spectral ensemble clustering (FREC) algorithm is proposed to enhance the performance of pair-wise co-occurrence based ensemble clustering. The contribution of the paper is threefold:

  • 1 Proposing the novel idea of cluster reliability through the fuzzy-rough lower approximation of each instance to enable the distinction of diverse cluster significance during clustering;

  • 2 Developing a new adjacency matrix based on cluster reliability to effectively enhance the effect of diverse base clusterings and improve the clustering performance;

  • 3 Establishing a consensus function and spectral ensemble clustering algorithm with its superiority confirmed through a comparative study and analysis on various benchmark datasets.

The experiments compare FREC against eleven state-of-the-art clustering algorithms on ten benchmark datasets, as well as against the parallel algorithm that ignores the differences of clusters in the base clusterings. The results show that FREC achieves significantly better clustering performance, and its advantage grows as the ensemble size increases.

The remainder of the paper is structured as follows. The preliminaries of the rough set and fuzzy-rough set are introduced in Section 2. The FREC algorithm is introduced in detail in Section 3. In Section 4, the experimental results are given and analysed. Finally, a summary is presented in Section 5.

2 Preliminaries

This section reviews the mathematical concepts concerning rough set and fuzzy-rough set, which are relevant to the reliability of the cluster developed in this paper.

2.1 Rough set

The study on rough set theory [13, 41, 49] provides a methodology that can be employed to extract knowledge from a domain in a concise way: it is able to minimise information loss whilst reducing the amount of information involved. Central to rough set theory is the concept of indiscernibility. Let (U, A) be an information system, where U is a set of instances and A is a set of attributes (features) such that $a : U \rightarrow V_a$ for every $a \in A$. $V_a$ is the set of values that attribute a may take. For each feature subset $P \subseteq A$, an associated P-indiscernibility relation can be determined:

(1)
$IND(P) = \{(x, y) \in U^2 \mid \forall a \in P,\ a(x) = a(y)\}.$

Obviously, IND(P) is an equivalence relation on U. The partition of U determined by IND(P) is herein denoted by U/P, which can be defined such that

(2)
$U/P = \otimes\{U/a \mid a \in P\},$

where ⊗ is defined as follows for sets V and W:

(3)
$V \otimes W = \{X \cap Y \mid X \in V,\ Y \in W,\ X \cap Y \neq \emptyset\}.$

For any object $x \in U$, the equivalence class determined by IND(P) is denoted by $[x]_P$. Let $X \subseteq U$. X can be approximated using only the information contained in P by constructing the P-lower and P-upper approximations of X [48]:

(4)
$\underline{P}X = \{x \mid [x]_P \subseteq X\},$
(5)
$\overline{P}X = \{x \mid [x]_P \cap X \neq \emptyset\}.$
The pair $\langle \underline{P}X, \overline{P}X \rangle$ is called a rough set. Informally, the former depicts the set of those objects which can be said with certainty to belong to the concept approximated, and the latter is the set of objects which either definitely or possibly belong to the concept approximated. The difference between the upper and lower approximations is the area known as the boundary region that represents the area of uncertainty. If the boundary region is empty, there is no uncertainty regarding the concept which is being approximated and all objects belong to the subset of objects of interest with full certainty.
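To make these definitions concrete, the following minimal Python sketch (illustrative only, not from the paper) partitions a toy categorical information system by IND(P) and computes the P-lower and P-upper approximations of a crisp concept X; the data and helper names are assumptions for the example.

```python
# A minimal sketch of crisp P-lower and P-upper approximations (Eqs. 4-5)
# over a small categorical information system.
from itertools import groupby

def equivalence_classes(U, P, value):
    """Partition U by IND(P): objects agreeing on every attribute in P."""
    key = lambda x: tuple(value(x, a) for a in P)
    return [set(group) for _, group in groupby(sorted(U, key=key), key=key)]

def lower_upper(U, P, X, value):
    """Return the P-lower and P-upper approximations of a crisp concept X."""
    lower, upper = set(), set()
    for cls in equivalence_classes(U, P, value):
        if cls <= X:     # [x]_P wholly contained in X -> certain members
            lower |= cls
        if cls & X:      # [x]_P intersects X -> possible members
            upper |= cls
    return lower, upper

# Toy usage: four objects, one attribute 'colour', concept X = {0, 1, 2}.
data = {0: 'red', 1: 'red', 2: 'blue', 3: 'blue'}
low, up = lower_upper(set(data), ['colour'], {0, 1, 2},
                      value=lambda x, a: data[x])
print(low, up)   # lower = {0, 1}, upper = {0, 1, 2, 3}; boundary = {2, 3}
```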

2.2 Fuzzy-rough set

Fuzzy-rough sets [6, 12, 38] encapsulate the related but distinct concepts of vagueness (for fuzzy sets) and indiscernibility (for rough sets), both of which occur as a result of uncertainty in knowledge. Compared to rough sets, fuzzy-rough sets offer a high degree of flexibility in enabling the vagueness and imprecision present in real-valued data to be simultaneously modelled effectively. In fuzzy-rough sets, the fuzzy lower and upper approximations to approximate a fuzzy concept X can be defined as:

(6)
$\mu_{\underline{R_P}X}(x) = \inf_{y \in U} \mathcal{I}\big(\mu_{R_P}(x, y), \mu_X(y)\big),$
(7)
$\mu_{\overline{R_P}X}(x) = \sup_{y \in U} \mathcal{T}\big(\mu_{R_P}(x, y), \mu_X(y)\big).$

Here, $\mathcal{I}$ is a fuzzy implicator and $\mathcal{T}$ is a T-norm. $R_P$ is a $\mathcal{T}$-transitive fuzzy similarity relation induced by the subset of features P:

(8)
$\mu_{R_P}(x, y) = \mathcal{T}_{a \in P}\big\{\mu_{R_a}(x, y)\big\},$
where $\mu_{R_a}(x, y)$ is the degree to which objects x and y are similar with respect to feature a; it may be defined in many ways, for example:
(9)
$\mu_{R_a}(x, y) = \exp\!\left(-\frac{(a(x) - a(y))^2}{2\sigma_a^2}\right),$
(10)
$\mu_{R_a}(x, y) = \max\!\left(\min\!\left(\frac{a(y) - (a(x) - \sigma_a)}{a(x) - (a(x) - \sigma_a)},\ \frac{(a(x) + \sigma_a) - a(y)}{(a(x) + \sigma_a) - a(x)}\right),\ 0\right).$
As with their crisp counterparts, $\mu_{\underline{R_P}X}(x)$ and $\mu_{\overline{R_P}X}(x)$ indicate the degrees to which the object x must and may belong to the approximated fuzzy concept X, respectively.
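The fuzzy counterparts can be sketched in the same spirit. The snippet below is an illustrative numpy rendering of Equations (6)-(9); it assumes the Gaussian similarity of Equation (9), the min T-norm for combining features, and the Kleene-Dienes implicator I(a, b) = max(1 - a, b). These operator choices and the function names are assumptions for the example, not prescribed by the paper.

```python
import numpy as np

def fuzzy_similarity(X, sigma):
    """R_P(x, y): per-feature Gaussian similarity (Eq. 9) combined
    with the min T-norm over features (Eq. 8)."""
    diff = X[:, None, :] - X[None, :, :]                     # shape (n, n, d)
    return np.exp(-diff ** 2 / (2 * sigma ** 2)).min(axis=2)

def fuzzy_lower_upper(R, mu_X):
    """Fuzzy lower/upper approximations (Eqs. 6-7), using the
    Kleene-Dienes implicator I(a, b) = max(1 - a, b) and the min T-norm."""
    lower = np.maximum(1.0 - R, mu_X[None, :]).min(axis=1)   # inf_y I(R(x,y), mu_X(y))
    upper = np.minimum(R, mu_X[None, :]).max(axis=1)         # sup_y T(R(x,y), mu_X(y))
    return lower, upper

# Toy usage: four objects, two real-valued features, crisp concept X = {x0, x1}.
X = np.array([[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.90]])
R = fuzzy_similarity(X, sigma=X.std(axis=0))
lower, upper = fuzzy_lower_upper(R, mu_X=np.array([1.0, 1.0, 0.0, 0.0]))
print(lower.round(2), upper.round(2))
```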

3 Fuzzy-rough induced spectral ensemble clustering

3.1 The unacceptable degree of clusters

The validity of a cluster can be well judged by considering the unacceptable degree (UD) of clusters in a base clustering. Among multiple base clusterings, if the assignment of a cluster in one base clustering is consistently endorsed by the other base clusterings, this cluster should play a more critical role in the final consensus clustering. Conversely, if the assignment of a cluster is consistently rejected by the other base clusterings, the cluster should play a minor role.

For illustration purposes, some formalised descriptions are first introduced below. Let $U = \{x_i \mid i \in 1, 2, \ldots, n\}$ be a set of instances, where each instance $x_i$ contains d features. $B = \{\beta_j \mid j \in 1, 2, \ldots, m\}$ is a set of base clusterings, where $\beta_j = \{\beta_j^k \mid k \in 1, 2, \ldots, K\}$ and $\beta_j^k$ denotes the set of instances in the k-th cluster of the j-th base clustering. The degree to which $x_i$ belongs to $\beta_j^k$ with respect to the feature set A can be calculated by the fuzzy lower approximation $\mu_{\underline{R_A}\beta_j^k}(x_i)$.

Since $\mathcal{I}(x, 1) = 1$ holds for every fuzzy implicator $\mathcal{I}$, $\mu_{\underline{R_A}\beta_j^k}(x_i)$ can be simplified to:

(11)
$\mu_{\underline{R_A}\beta_j^k}(x_i) = \inf_{y \in U} \mathcal{I}\big(\mu_{R_A}(x_i, y), \mu_{\beta_j^k}(y)\big) = \min\Big\{\min_{y \in \beta_j^k}\big\{\mathcal{I}(\mu_{R_A}(x_i, y), 1)\big\},\ \min_{y \notin \beta_j^k}\big\{\mathcal{I}(\mu_{R_A}(x_i, y), 0)\big\}\Big\} = \min_{y \notin \beta_j^k}\big\{\mathcal{I}(\mu_{R_A}(x_i, y), 0)\big\}.$

As proved in [31], if $\mathcal{I}$ is an S-implication, a QL-implication or an R-implication enjoying contrapositive symmetry, then $\mathcal{I}(x, 0) = \mathcal{N}(x)$, where $\mathcal{N}$ is the strong negator inducing $\mathcal{I}$. In particular, for the classical strong negation $\mathcal{N}_C(x) = 1 - x$, Equation (11) can be further simplified to:

(12)
$\mu_{\underline{R_A}\beta_j^k}(x_i) = \min_{y \notin \beta_j^k}\big\{1 - \mu_{R_A}(x_i, y)\big\} = 1 - \max_{y \notin \beta_j^k}\big\{\mu_{R_A}(x_i, y)\big\}.$

Equation (12) implies that the lower approximation of $x_i$ to $\beta_j^k$ depends on the most similar instance outside the cluster, which plays a crucial role in ensemble clustering. It indicates that the farther apart two clusters are, the greater the lower approximation of each instance to the cluster to which it belongs. At the same time, it means that the distinction between clusters is more obvious, that is, the cluster allocation scheme is more reasonable.

For different base clusterings, the assignment of clusters is distinct, but the data location is fixed, that is, multiple base clusterings are acting on the same dataset. For a specific cluster in one base clustering, the resulting assignment has two cases:

  • 1 Another base clustering approves this assignment;

  • 2 Another base clustering denies this assignment.

In this paper, a novel concept of UD is proposed to measure cluster reliability. Here, two exemplar artificial datasets D1 (shown in Fig. 3) and D2 (shown in Fig. 4) are employed to illustrate the UD in the two cases.

Fig. 3

Two exemplar base clusterings β1 and β2 in dataset D1.


The first case is relatively simple, as shown in Fig. 3, including two exemplar base clusterings β1 and β2 in D1. For a specifically given cluster (e.g., β11 ) in base clustering β1, considering the distribution of this cluster in another base clustering β2, an obvious fact is that if the particular cluster in β1 is a subset of one cluster in β2, the assignment of the cluster (e.g. β11 ) can be considered to be fully admitted by β2. At this point, the UD of the specific cluster in β1 is 0, which means that the objects of the cluster in β1 meeting the above condition can be divided into one cluster in both base clustering β1 and β2.

For the three clusters ( β11 , β12 , β13 ) of β1 shown in Fig. 3(a), considering the base clustering β2, β11 is a subset of β21 , and both β12 and β13 are subsets of β22 , so the UD is 0 for all three clusters in β1. It is worth noting that this relationship is not symmetric, that is, the clusters in β1 are admitted by β2, which does not mean that the clusters in β2 are admitted by β1.

Another situation is shown in Fig. 4. For a particular cluster in β3, if the cluster objects are split into a plurality of clusters in another base clustering β4, it can be considered that the specific cluster is not admitted or accepted by β4. Here, β31 is split into β41 and β42 by β4, and β32 is split into β43 and β44 by β4, which means β4 disagrees with the allocation of β31 and β32 . Further, the UD can be well measured by the lower approximation of cluster objects in β3 to the base clustering β4.

Fig. 4

Two exemplar base clusterings β3 and β4 in dataset D2.


More specifically, for a specific cluster in a base clustering, considering its position distribution in another base clustering, if the objects of this particular cluster have a larger lower approximation to the clusters into which they are relocated in the other base clustering, it indicates that the given cluster prefers the allocation of the other base clustering. At the same time, it reflects the extent to which the other base clustering does not accept the assignment of the given cluster.

Objects from $\beta_j^k$ may be located in one or multiple clusters in another base clustering $\beta_l$; such relocated clusters can be represented by

(13)
$R_{jl}^k = \big\{\beta_l^s \mid \beta_j^k \cap \beta_l^s \neq \emptyset,\ \beta_l^s \in \beta_l\big\},\quad l \neq j.$

For the example cluster β31 in Fig. 4(a), the set R341 is { β41,β42 }. Based on the analysis above, the UD γjlk of the cluster βjk relative to another base clustering βl can be defined as:

(14)
$\gamma_{jl}^k = \begin{cases} 0, & |R_{jl}^k| = 1, \\ \dfrac{1}{|\beta_j^k|}\displaystyle\sum_{\beta_l^s \in R_{jl}^k}\ \sum_{x_i \in \beta_j^k \cap \beta_l^s} \mu_{\underline{R_A}\beta_l^s}(x_i), & |R_{jl}^k| > 1, \end{cases}$
where $|R_{jl}^k|$ indicates the number of clusters in the set $R_{jl}^k$, and $|\beta_j^k|$ represents the number of instances in $\beta_j^k$.

To illustrate the concepts involved, the objects of the exemplar clusters β31 and β32 in Fig. 4(a) are given in Table 1, and the clusters into which they are relocated in β4 are recorded in Table 2.

Table 1

Two exemplar clusters in Fig. 4(a)

| Cluster | Sample | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| β31 | x-axis | 1.0 | 1.0 | 1.5 | 2.2 | 1.8 | 1.9 | 2.5 | 1.5 | 2.5 | 3.0 |
|  | y-axis | 4.5 | 4.8 | 4.1 | 3.4 | 3.5 | 4.7 | 4.7 | 5.5 | 5.5 | 3.8 |

| Cluster | Sample | x11 | x12 | x13 | x14 | x15 | x16 | x17 | x18 | x19 | x20 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| β32 | x-axis | 5.8 | 5.1 | 5.5 | 6.2 | 6.5 | 8.2 | 9.0 | 8.7 | 8.5 | 9.0 |
|  | y-axis | 5.3 | 5.0 | 6.0 | 5.0 | 5.5 | 1.0 | 2.0 | 1.5 | 0.5 | 1.0 |
Table 2

Relocated cluster of the objects in β31 and β32

| Sample (β31) | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Relocated cluster | β41 | β41 | β41 | β41 | β41 | β42 | β42 | β42 | β42 | β42 |

| Sample (β32) | x11 | x12 | x13 | x14 | x15 | x16 | x17 | x18 | x19 | x20 |
|---|---|---|---|---|---|---|---|---|---|---|
| Relocated cluster | β43 | β43 | β43 | β43 | β43 | β44 | β44 | β44 | β44 | β44 |

Take x1, x6, x11 and x16 as an example (located in diverse clusters in β4), the respective lower approximation of the above four objects to β4 are obtained by using the Algebraic T-norm TP(a,b)=ab and the fuzzy similarity function (9):

$x_1 \in \beta_4^1: \mu_{\underline{R_A}\beta_4^1}(x_1) = 0.05,\qquad x_6 \in \beta_4^2: \mu_{\underline{R_A}\beta_4^2}(x_6) = 0.05,$
$x_{11} \in \beta_4^3: \mu_{\underline{R_A}\beta_4^3}(x_{11}) = 0.91,\qquad x_{16} \in \beta_4^4: \mu_{\underline{R_A}\beta_4^4}(x_{16}) = 0.94.$

Through further calculations, the lower approximation of all objects in β3 to the base clustering β4 are shown in Table 3.

Table 3

Lower approximation of β31 and β32 to the base clustering β4

| Cluster | Relocation | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| β31 | β41 | 0.05 | 0.05 | 0.07 | 0.06 | 0.10 | - | - | - | - | - |
|  | β42 | - | - | - | - | - | 0.05 | 0.11 | 0.10 | 0.19 | 0.07 |

| Cluster | Relocation | x11 | x12 | x13 | x14 | x15 | x16 | x17 | x18 | x19 | x20 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| β32 | β43 | 0.91 | 0.90 | 0.96 | 0.86 | 0.91 | - | - | - | - | - |
|  | β44 | - | - | - | - | - | 0.94 | 0.86 | 0.91 | 0.97 | 0.95 |

Then, the UD of β31 and β32 with respect to base clustering β4 is computed by Equation (14) as follows:

$\gamma_{34}^1 = \frac{0.05 + 0.05 + 0.07 + 0.06 + 0.10 + 0.05 + 0.11 + 0.10 + 0.19 + 0.07}{10} = 0.09,$
$\gamma_{34}^2 = \frac{0.91 + 0.90 + 0.96 + 0.86 + 0.91 + 0.94 + 0.86 + 0.91 + 0.97 + 0.95}{10} = 0.92.$

Considering the ground distribution of β31 and β32 in Fig. 4, apparently, the allocation scheme of β31 is more reasonable, with a lower UD of 0.09. While β32 groups the objects with an enormous difference, the UD is 0.92. The result shows that the UD acquired is consistent with the actual situation of the clusters.
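As a small sanity check, the aggregation of Equation (14) can be reproduced directly from the rounded lower approximations listed in Table 3; the tiny discrepancy with the figures above is purely due to rounding of the tabulated values.

```python
# UD of beta_3^1 and beta_3^2 w.r.t. beta_4 (Eq. 14), using the Table 3 values.
lower_beta31 = [0.05, 0.05, 0.07, 0.06, 0.10,   # objects relocated to beta_4^1
                0.05, 0.11, 0.10, 0.19, 0.07]   # objects relocated to beta_4^2
lower_beta32 = [0.91, 0.90, 0.96, 0.86, 0.91,   # objects relocated to beta_4^3
                0.94, 0.86, 0.91, 0.97, 0.95]   # objects relocated to beta_4^4

gamma_34_1 = sum(lower_beta31) / len(lower_beta31)   # ~0.085, reported as 0.09
gamma_34_2 = sum(lower_beta32) / len(lower_beta32)   # ~0.917, reported as 0.92
print(gamma_34_1, gamma_34_2)
```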

Further, the global UD $\gamma_j^k$ of a cluster in βj with respect to the remaining m - 1 base clusterings can be calculated as follows:

(15)
$\gamma_j^k = \frac{1}{m - 1}\sum_{l \neq j}\gamma_{jl}^k,\quad l = 1, 2, \ldots, m.$

The unacceptable degree computing (UDC) algorithm is outlined in Algorithm 1. Given the inputs U and m, the first step is to initialise the set Γ as empty and generate m base clusterings by any construction method, such as repeating k-means m times or using diverse results from multiple clustering algorithms [34]. The loop in Lines 3 to 14 traverses each cluster in all base clusterings and computes its UD. Specifically, βjk represents the current cluster whose UD is being calculated, and βl indicates any base clustering in the pool other than βj. Mean(·) denotes an averaging operation, and all computed UDs are stored in Γ. Finally, in Line 15, Γ is returned for subsequent calculations.

Algorithm 1

Unacceptable Degree Computing (UDC)

UDC ( U , m)

Input:

   U , input space containing n objects,

  m, number of base clusterings.

Output:

  Γ, set of unacceptable degree of all clusters,

  B, set of base clusterings.

1:  Initialise: Γ = ∅.

2:  B ← Generate m base clustering.

3:  foreach βj in B do

4:   foreach βjk in βj do

5:    S = ∅

6:    foreach βl in {B - βj} do

7:      Rjlk ← Equation (13)

8:      γjlk ← Equation (14)

9:     S = S ∪ γjlk

10:    end

11:     γjk = Mean(S)

12:    Γ = Γ ∪ γjk

13:   end

14:  end

15:  return Γ, B
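A compact Python sketch of Algorithm 1 is given below. It assumes a Gaussian fuzzy similarity (Equation (9)) combined with the min T-norm, a k-means pool for the base clusterings, and helper names (gaussian_similarity, udc) that are illustrative rather than the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def gaussian_similarity(X):
    """Pairwise fuzzy similarity R_A(x, y): per-feature Gaussian kernel (Eq. 9)
    combined with the min T-norm over features (Eq. 8)."""
    sigma = X.std(axis=0) + 1e-12
    diff = X[:, None, :] - X[None, :, :]
    return np.exp(-diff ** 2 / (2 * sigma ** 2)).min(axis=2)

def udc(X, m, rng=None):
    """Unacceptable Degree Computing (Algorithm 1), illustrative sketch.
    Returns Gamma[(j, k)] = UD of cluster k in base clustering j, and the pool B."""
    rng = np.random.default_rng(rng)
    n = len(X)
    R = gaussian_similarity(X)
    # Base clustering pool: k-means with a randomly chosen number of clusters.
    B = [KMeans(n_clusters=int(rng.integers(2, max(3, int(np.sqrt(n)) + 1))),
                n_init=5).fit_predict(X) for _ in range(m)]
    Gamma = {}
    for j, beta_j in enumerate(B):
        for k in np.unique(beta_j):
            members = np.where(beta_j == k)[0]
            uds = []
            for l, beta_l in enumerate(B):
                if l == j:
                    continue
                relocated = np.unique(beta_l[members])        # R_{jl}^k (Eq. 13)
                if len(relocated) == 1:
                    uds.append(0.0)                           # Eq. (14), first case
                    continue
                total = 0.0
                for i in members:
                    inside = beta_l == beta_l[i]              # cluster of x_i in beta_l
                    # Eq. (12): lower approx = 1 - max similarity to other clusters
                    total += 1.0 - R[i, ~inside].max()
                uds.append(total / len(members))              # Eq. (14), second case
            Gamma[(j, int(k))] = float(np.mean(uds))          # Eq. (15)
    return Gamma, B
```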

3.2 Defining the co-association matrix

Fig. 5

Matrices O3, O4 and A of the example in Fig. 4.

Fig. 6

Redefined matrices O˜3, O˜4 and A˜ of the example in Fig. 4.


The co-association matrix is obtained by summing and averaging a series of co-occurrence matrices, and it represents the frequency with which two objects co-occur in multiple base clusterings. Each base clustering βj produces a separate co-occurrence matrix, which can be expressed as

(16)
$O_j = \{o_{ih}^j\}_{n \times n},$
where $o_{ih}^j$ represents whether $x_i$ and $x_h$ co-occur in $\beta_j$. Let $C_j(x_i)$ indicate the serial number of the cluster to which $x_i$ belongs in the j-th base clustering; then $o_{ih}^j$ can be denoted by
(17)
$o_{ih}^j = \begin{cases} 1, & C_j(x_i) = C_j(x_h), \\ 0, & C_j(x_i) \neq C_j(x_h). \end{cases}$
Further, the co-association matrix can be expressed as

(18)
$A = \{a_{ih}\}_{n \times n},$
where $a_{ih}$ is calculated by
(19)
$a_{ih} = \frac{1}{m}\sum_{j=1}^{m} o_{ih}^j.$

Given the exemplar clusters in Fig. 4 and corresponding coordinates in Table 1, the matrices O3, O4 and A are recorded in Fig. 5. Take x1 and x6 as an example (marked as a blue box):

$C_3(x_1) = C_3(x_6) \Rightarrow o_{16}^3 = 1,\ o_{61}^3 = 1,\qquad C_4(x_1) \neq C_4(x_6) \Rightarrow o_{16}^4 = 0,\ o_{61}^4 = 0.$

Then, the elements a16 and a61 of A are calculated by

$o_{16}^3 = 1,\ o_{16}^4 = 0 \Rightarrow a_{16} = (1 + 0)/2 = 0.5,\qquad o_{61}^3 = 1,\ o_{61}^4 = 0 \Rightarrow a_{61} = (1 + 0)/2 = 0.5.$

A higher UD for a cluster indicates that other base clusterings are more likely to disapprove of the cluster’s allocation scheme. At this point, the role of the cluster should be weakened. Otherwise, the function of the cluster should be reinforced. Therefore, the reliability of a cluster can be described as a decreasing function of the UD. In this paper, the reliability of the k-th cluster in βj is defined as

(20)
$\mu_j^k = 1 - \gamma_j^k.$

Similar to Equations (16), (17), (18) and (19), the redefined co-occurrence matrix O˜j and co-association matrix A˜ are expressed as

(21)
$\tilde{O}_j = \{\tilde{o}_{ih}^j\}_{n \times n},$

(22)
$\tilde{A} = \{\tilde{a}_{ih}\}_{n \times n},$
where
(23)
$\tilde{o}_{ih}^j = \begin{cases} \mu_j^k, & C_j(x_i) = C_j(x_h), \\ 0, & C_j(x_i) \neq C_j(x_h), \end{cases}$
(24)
$\tilde{a}_{ih} = \frac{1}{m}\sum_{j=1}^{m} \tilde{o}_{ih}^j.$

Again, for the example of x1 and x6,

$C_3(x_1) = C_3(x_6) \Rightarrow \tilde{o}_{16}^3 = 0.91,\ \tilde{o}_{61}^3 = 0.91,\qquad C_4(x_1) \neq C_4(x_6) \Rightarrow \tilde{o}_{16}^4 = 0,\ \tilde{o}_{61}^4 = 0.$

Then, the elements a˜16 and a˜61 of A˜ are computed by

$\tilde{o}_{16}^3 = 0.91,\ \tilde{o}_{16}^4 = 0 \Rightarrow \tilde{a}_{16} = (0.91 + 0)/2 = 0.46,\qquad \tilde{o}_{61}^3 = 0.91,\ \tilde{o}_{61}^4 = 0 \Rightarrow \tilde{a}_{61} = (0.91 + 0)/2 = 0.46.$

The matrices O˜3 , O˜4 and A˜ of the exemplar clusters in Fig. 4 are recalculated and shown in Fig. 6.

Algorithm 2

Co-Association Matrix Construction (CMC)

CMC ( U , Γ, B)

Input:

U , input space containing n objects,

  Γ, set of unacceptable degree of all clusters,

  B, set of base clusterings.

Output: A˜ , co-association matrix.

1:  Initialise: A˜ = {a˜ih}n×n (a˜ih = 1), O˜j = {o˜ihj}n×n (o˜ihj = 0).

2:  foreach i = 1 to n do

3:   foreach h = i + 1 to n do

4:    Q = ∅

5:    foreach βj in B do

6:     if Cj (xi) ≠ Cj (xh) then o˜ihj = 0

7:     else

8:      k = Cj (xi)

9:       μjk = 1-γjk

10:       o˜ihj = μjk

11:     end

12:     Q = Q ∪ o˜ihj

13:    end

14:     a˜ih = Mean(Q)

15:     a˜hi = a˜ih

16:   end

17:  end

18:  return A˜

The co-association matrix construction (CMC) algorithm is detailed in Algorithm 2. Firstly, the initialisation step is performed. Lines 2 to 17 represent the overall process, including the main loop that builds the co-occurrence and co-association matrices. Note that all matrices are calculated only for the upper triangular part due to symmetry. In Line 15, the lower triangular part of A˜ directly takes the values from the existing result in Line 14, which significantly diminishes unnecessary calculations. The final co-association matrix A˜ is output in Line 18.
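The following vectorised sketch mirrors Algorithm 2 and Equations (20)-(24). It is illustrative rather than the authors' code and assumes Gamma is the UD dictionary produced by the udc sketch above.

```python
import numpy as np

def cmc(B, Gamma):
    """Co-Association Matrix Construction (Algorithm 2), illustrative sketch.
    B is a list of label vectors; Gamma maps (j, cluster_label) -> UD."""
    n, m = len(B[0]), len(B)
    A = np.zeros((n, n))
    for j, beta_j in enumerate(B):
        same = beta_j[:, None] == beta_j[None, :]                   # co-occurrence (Eq. 17)
        mu = np.array([1.0 - Gamma[(j, int(k))] for k in beta_j])   # reliability (Eq. 20)
        A += np.where(same, mu[None, :], 0.0)                       # weighted O~_j (Eq. 23)
    A /= m                                                          # A~ (Eq. 24)
    np.fill_diagonal(A, 1.0)   # diagonal kept at 1, as initialised in Algorithm 2
    return A
```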

3.3 Consensus function

A mapping from a set of clusterings to a single final clustering is called a consensus function. Considering the superior performance of spectral clustering in complex shapes and cross data, in this paper, the optimised co-association matrix is used in the spectral method to acquire the consensus result.

Given a graph model G = (V, L), where V indicates the vertex set and L represents the link set, its adjacency matrix can be constructed in various ways, such as considering the neighbours or defining a distance threshold. Letting the objects in U be the vertexes of the graph, the adjacency matrix can be expressed as A˜. This means that if a˜ih is 0, there is no edge connecting xi and xh; otherwise, the edge exists and its similarity is a˜ih. From this, the diagonal matrix D can be expressed by

(25)
$D = \{d_{ih}\}_{n \times n},$
where
(26)
$d_{ih} = \begin{cases} 0, & i \neq h, \\ \sum_{q=1}^{n} \tilde{a}_{iq}, & i = h. \end{cases}$

The Laplacian matrix L of the graph G can be further defined as

(27)
$L = D - \tilde{A}.$

Normalisation makes the diagonal entries of the Laplacian matrix equal to one and scales the off-diagonal entries correspondingly. In this case, the normalised Laplacian matrix $L_{nor}$ is defined as

(28)
$L_{nor} = D^{-\frac{1}{2}} L D^{-\frac{1}{2}}.$

The components of the eigenvectors corresponding to the smallest eigenvalues of the graph Laplacian can be used for meaningful clustering [42]. In Equation (28), the eigenvectors corresponding to the first K smallest eigenvalues of Lnor will be used in an independent clustering algorithm, generally k-means, due to its speed and efficiency.

As summarised in Algorithm 3, D and Lnor are computed sequentially in Lines 2 to 6. EV(·) in Line 7 represents a function that generates the first K eigenvectors, and k-means(·) in Line 8 indicates a fast clustering algorithm detailed in [34]. Finally, in Line 9, R is used to return the consensus clustering result.

Algorithm 3

Consensus Spectral Clustering (CSC)

CSC ( A˜ , K)

Input:

   A˜ , co-association matrix,

  K, number of clusters.

Output: R , consensus result.

1:  Initialise: D = {dih}n×n (dih = 0).

2:  foreach i = 1 to n do

3:   dii = Σ_{q=1}^{n} a˜iq

4:  end

5:   L = D - A˜

6:   Lnor = D^(-1/2) L D^(-1/2)

7:  F← EV(Lnor)

8:   R k-means(F, K)

9:  return R
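A possible numpy/scikit-learn rendering of Algorithm 3 is sketched below; the small epsilon guarding the degree vector and the k-means settings are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def csc(A, K):
    """Consensus Spectral Clustering (Algorithm 3), illustrative sketch.
    A is the co-association (adjacency) matrix, K the number of clusters."""
    d = A.sum(axis=1)                               # degrees (Eq. 26)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = np.diag(d) - A                              # Laplacian (Eq. 27)
    L_nor = D_inv_sqrt @ L @ D_inv_sqrt             # normalised Laplacian (Eq. 28)
    _, eigvecs = np.linalg.eigh(L_nor)              # eigenvalues in ascending order
    F = eigvecs[:, :K]                              # first K eigenvectors
    return KMeans(n_clusters=K, n_init=10).fit_predict(F)
```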

3.4 Fuzzy-rough induced spectral ensemble clustering

According to the description of the above three subsections, the overall fuzzy-rough induced spectral ensemble clustering is depicted in Algorithm 4. Given a dataset U, the number of base clusterings m and the actual number of clusters K, Algorithm 1 is first executed in Line 1 to generate the UD set Γ of all clusters and the set of base clusterings B. Further, the result returned by Algorithm 1 is combined with the dataset U as the parameters of Algorithm 2 to calculate the co-association matrix A˜. Finally, the matrix A˜ is used in the consensus spectral clustering (CSC) to generate the final clustering result R.

Algorithm 4

Fuzzy-Rough Induced Spectral Ensemble Clustering (FREC)

FREC ( U , m, K)

Input:

   U , input space containing n objects,

  m, number of base clusterings,

  K, number of clusters.

Output: R , consensus result.

1: Γ, B = UDC( U , m) //Algorithm 1

2:  A˜ = CMC( U , Γ, B) //Algorithm 2

3:  R = CSC( A˜ , K) //Algorithm 3

4: return R
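Composing the three sketches above yields a minimal end-to-end version of Algorithm 4 (again illustrative, relying on the udc, cmc and csc helpers defined earlier):

```python
def frec(X, m, K, rng=None):
    """Fuzzy-Rough induced spectral Ensemble Clustering (Algorithm 4), sketch."""
    Gamma, B = udc(X, m, rng)   # Algorithm 1: cluster UDs and base clusterings
    A = cmc(B, Gamma)           # Algorithm 2: reliability-weighted co-association
    return csc(A, K)            # Algorithm 3: consensus spectral clustering

# Usage: labels = frec(X, m=100, K=3) on a normalised feature matrix X.
```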

4 Experimental evaluation

This section presents the experimental evaluation of FREC and other algorithms on ten popular datasets from the UCI repository. For convenience, the datasets Cardiotocography, Image Segmentation, and Steel Plates Faults are represented by the abbreviations Cardio, IS, and SPF, respectively. After introducing the experimental setup, the results and discussion are divided into five parts. Section 4.2 analyses the trend of the clustering performance as the ensemble size increases. To test the impact of the cluster reliability induced by the fuzzy-rough lower approximation, Section 4.3 compares the effect of FREC and the original parallel algorithm EAC on all benchmark datasets. Besides, the average result of 100 base clusterings is also used to compare and validate the ensemble performance. In Sections 4.4 and 4.5, a detailed analysis of FREC and other state-of-the-art clustering algorithms is reported. Finally, the time complexity of the proposed method and the running time of each algorithm are analysed in Section 4.6.

Table 4

Benchmark datasets used for evaluation

| Datasets | Attributes | Class | Size |
|---|---|---|---|
| Heart | 13 | 2 | 270 |
| Cleveland | 13 | 5 | 297 |
| Dermatology | 34 | 6 | 358 |
| Movement | 90 | 15 | 360 |
| Appendicitis | 7 | 2 | 106 |
| Led7digit | 7 | 10 | 500 |
| Mammographic | 5 | 2 | 830 |
| Cardio | 21 | 10 | 2126 |
| IS | 19 | 7 | 2130 |
| SPF | 27 | 7 | 1941 |

4.1 Experimental setup

In the experimental investigation, all datasets are normalised first. The homogeneity score (HS) and normalised mutual information (NMI) are used to evaluate the performance of each clustering method [10, 15]. The base clustering pool B in Algorithm 1 is generated by running the k-means method 100 times, where K is randomly selected from the interval [2, √n], in line with [14]. In Section 4.2, the ensemble size increases sequentially from 10 to 100 with a step size of 10. Meanwhile, each ensemble algorithm of a specific size is run 100 times, and the results are averaged. Considering the better results of ensemble clustering at a larger ensemble size, the ensemble size is set to 100 for all ensemble methods in Sections 4.3 and 4.4. At the same time, for a fair comparison, the different algorithms are run 100 times to get the average results.
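As an illustration of this protocol, the evaluation loop could be organised as follows with scikit-learn's homogeneity_score and normalized_mutual_info_score; the evaluate helper and its defaults are assumptions for the sketch, not the authors' script.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import homogeneity_score, normalized_mutual_info_score

def evaluate(X_raw, y_true, m=100, runs=100):
    """Normalise the data, run FREC repeatedly and average HS / NMI (a sketch)."""
    X = MinMaxScaler().fit_transform(X_raw)
    K = len(np.unique(y_true))
    hs, nmi = [], []
    for r in range(runs):
        labels = frec(X, m=m, K=K, rng=r)
        hs.append(homogeneity_score(y_true, labels))
        nmi.append(normalized_mutual_info_score(y_true, labels))
    return float(np.mean(hs)), float(np.mean(nmi))
```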

Fig. 7

HS results with the ensemble size.


Ten state-of-the-art ensemble clustering algorithms, namely, locally weighted evidence accumulation (LWEA) [14], locally weighted graph partitioning (LWGP) [14], probability trajectory accumulation (PTA) [16], probability trajectory based graph partitioning (PTGP) [16], ensemble clustering by propagating cluster-wise similarities with hierarchical consensus function (ECPCS-HC) [18], ensemble clustering by propagating cluster-wise similarities with meta-cluster-based consensus function (ECPCS-MC) [18], evidence accumulation clustering (EAC) [5], weighted evidence accumulation clustering (WEAC) [15], graph partitioning with multi-granularity link analysis (GPMGLA) [15], and spectral ensemble clustering (SEC) [26] are selected to compare the ensemble performance of FREC. Moreover, two other non-ensemble state-of-the-art clustering methods, deep temporal clustering representation (DTCR) [37] and robust temporal feature network (RTFN) [50] are also used to compare the performance of the newly proposed method. For FREC, Łukasiewicz t-norm and Equation (9) are used to calculate the fuzzy similarity. As for other compared algorithms, there is no extra parameter for EAC, and the specific parameters of the remaining methods are set according to the recommendations or optimal values given in the corresponding papers. More specifically, the core settings are listed as follows.

  • - LWEA, LWGP: θ = 0.4;

  • - PTA, PTGP: K=N/2 , T=N/2 where N indicates the number of the graph nodes;

  • - ECPCSHC, ECPCSMC: t = 20;

  • - WEAC, GPMGLA: α = 0.5, β = 2;

  • - SEC: μ = 1;

  • - DTCR: m1 = 100, m2 = 50, m3 = 50, λ = 1e - 1, lr = 1e - 4;

  • - RTFN: CNNchannel = 128, kernelsize = 11, lrate (0) =0.01, drate = 0.1.

Fig. 8

NMI results with the increased ensemble size.

Fig. 9

NMI results of FREC versus the parallel EAC and base clustering.


4.2 The influence of ensemble size

Figures 7 and 8 show the experimental results for different ensemble sizes, including the two indexes HS and NMI on ten benchmark datasets. The various contrastive algorithms use different colours, lines and markers, as the legend details. Apparently, for the proposed FREC, as the ensemble size increases, the results on nearly all datasets show an upward trend regardless of the evaluation criterion, which is in line with the objective of ensemble methods. More specifically, considering HS, FREC shows significantly superior performance on the datasets Heart, Appendicitis, Led7digit and Mammographic, i.e. no matter how large the ensemble size is, FREC invariably outperforms the other ten ensemble techniques. For Dermatology and SPF, if the ensemble size is less than 40, the effect of FREC is slightly lower than that of GPMGLA and LWEA, respectively, but exceeds that of the other nine methods; if the ensemble size is greater than 40, the proposed method performs superiorly, outperforming all other methods. While FREC is not optimal at every ensemble size on Cleveland, Cardio, and IS, it still exceeds most comparison approaches and always shows the best or second best performance at the largest ensemble size. Finally, for Movement, as the ensemble size increases, the performance of FREC stabilises rapidly and shows no apparent trend; however, it can still surpass almost all contrastive algorithms.

With the evaluation index NMI, the results are similar. For Heart, Appendicitis, Led7digit and Mammographic, FREC achieves the best values at any ensemble size, and the performance grows and stabilises as the ensemble size increases. Regarding Cardio, although FREC is slightly lower than some methods if the ensemble size is less than 90, FREC still performs satisfactorily at the largest ensemble size, where it ranks third. For the remaining datasets, the curves of FREC may be slightly lower than those of individual algorithms at particular ensemble sizes. Still, FREC consistently achieves better results, especially at the largest ensemble size.

4.3 Comparison to the parallel EAC and base clustering

The algorithm EAC, i.e. the original co-association based ensemble scheme that does not consider cluster reliability, consistently performs poorly in both HS and NMI. As exemplified in Fig. 7(b), 7(d), 7(h) and 7(i), EAC yields the worst results regardless of the ensemble size when compared with the other ten ensemble algorithms. To more comprehensively recognise the effect of using the cluster reliability induced by the fuzzy-rough set, this part compares FREC and the parallel EAC in detail in the form of a histogram.

Since all ensemble strategies primarily achieve more satisfactory results at larger ensemble sizes, FREC and EAC use the pool containing 100 base clusterings. Moreover, both algorithms are run 100 times to acquire the average results. In addition, the algorithms in the base clustering pool (k-means) are also averaged to compare the ensemble performance.

Table 5

HS results of FREC versus other ensemble algorithms

| Algorithm | Heart (Ave) | Heart (Best) | Cleveland (Ave) | Cleveland (Best) | Dermatology (Ave) | Dermatology (Best) | Movement (Ave) | Movement (Best) | Appendicitis (Ave) | Appendicitis (Best) |
|---|---|---|---|---|---|---|---|---|---|---|
| LWEA | 0.192 v | 0.192 | 0.244 v | 0.244 | 0.858 v | 0.858 | 0.546 v | 0.546 | 0.041 v | 0.041 |
| LWGP | 0.214 v | 0.214 | 0.260 v | 0.260 | 0.851 v | 0.851 | 0.580 v | 0.594 | 0.244 v | 0.245 |
| PTA | 0.001 v | 0.001 | 0.217 v | 0.217 | 0.603 v | 0.603 | 0.564 v | 0.564 | 0.041 v | 0.041 |
| PTGP | 0.001 v | 0.001 | 0.220 v | 0.220 | 0.603 v | 0.603 | 0.567 v | 0.573 | 0.041 v | 0.041 |
| ECPCSHC | 0.001 v | 0.001 | 0.241 v | 0.241 | 0.859 v | 0.859 | 0.521 v | 0.521 | 0.198 v | 0.198 |
| ECPCSMC | 0.243 v | 0.243 | 0.250 v | 0.250 | 0.912 v | 0.912 | 0.581 v | 0.584 | 0.180 v | 0.180 |
| WEAC | 0.198 v | 0.198 | 0.243 v | 0.243 | 0.858 v | 0.858 | 0.562 v | 0.562 | 0.136 v | 0.136 |
| GPMGLA | 0.235 v | 0.235 | 0.256 v | 0.256 | 0.907 v | 0.907 | 0.590 v | 0.590 | 0.180 v | 0.180 |
| SEC | 0.224 v | 0.380 | 0.227 v | 0.227 | 0.732 v | 0.923 | 0.520 v | 0.584 | 0.122 v | 0.154 |
| DTCR | 0.288 v | 0.288 | 0.250 v | 0.260 | 0.851 v | 0.851 | 0.460 v | 0.460 | 0.298 * | 0.298 |
| RTFN | 0.375 v | 0.375 | 0.270 * | 0.270 | 0.880 v | 0.923 | 0.558 v | 0.558 | 0.248 v | 0.248 |
| FREC | 0.380 | 0.380 | 0.261 | 0.261 | 0.926 | 0.926 | 0.591 | 0.591 | 0.253 | 0.253 |
| Summary | (11/0/0) |  | (10/0/1) |  | (11/0/0) |  | (11/0/0) |  | (10/0/1) |  |

| Algorithm | Led7digit (Ave) | Led7digit (Best) | Mammographic (Ave) | Mammographic (Best) | Cardio (Ave) | Cardio (Best) | IS (Ave) | IS (Best) | SPF (Ave) | SPF (Best) |
|---|---|---|---|---|---|---|---|---|---|---|
| LWEA | 0.532 v | 0.532 | 0.005 v | 0.005 | 0.363 v | 0.363 | 0.556 v | 0.556 | 0.334 v | 0.334 |
| LWGP | 0.565 v | 0.566 | 0.005 v | 0.005 | 0.368 v | 0.382 | 0.560 v | 0.610 | 0.328 v | 0.357 |
| PTA | 0.474 v | 0.474 | 0.005 v | 0.005 | 0.330 v | 0.330 | 0.643 * | 0.643 | 0.285 v | 0.285 |
| PTGP | 0.477 v | 0.507 | 0.005 v | 0.005 | 0.332 v | 0.363 | 0.603 v | 0.633 | 0.303 v | 0.341 |
| ECPCSHC | 0.429 v | 0.429 | 0.005 v | 0.005 | 0.357 v | 0.357 | 0.521 v | 0.521 | 0.264 v | 0.264 |
| ECPCSMC | 0.544 v | 0.545 | 0.001 v | 0.001 | 0.352 v | 0.356 | 0.519 v | 0.519 | 0.301 v | 0.301 |
| WEAC | 0.532 v | 0.532 | 0.237 v | 0.237 | 0.368 v | 0.368 | 0.524 v | 0.524 | 0.330 v | 0.330 |
| GPMGLA | 0.566 v | 0.566 | 0.001 v | 0.001 | 0.375 v | 0.375 | 0.519 v | 0.519 | 0.296 v | 0.296 |
| SEC | 0.477 v | 0.554 | 0.098 v | 0.289 | 0.304 v | 0.364 | 0.452 v | 0.629 | 0.275 v | 0.376 |
| DTCR | 0.500 v | 0.500 | 0.300 * | 0.300 | 0.330 v | 0.330 | 0.535 v | 0.535 | 0.260 v | 0.260 |
| RTFN | 0.575 v | 0.575 | 0.290 v | 0.290 | 0.380 * | 0.380 | 0.593 v | 0.593 | 0.351 v | 0.351 |
| FREC | 0.609 | 0.609 | 0.292 | 0.292 | 0.379 | 0.380 | 0.613 | 0.613 | 0.362 | 0.380 |
| Summary | (11/0/0) |  | (10/0/1) |  | (10/0/1) |  | (10/0/1) |  | (11/0/0) |  |
Table 6

NMI results of FREC versus other ensemble algorithms

| Algorithm | Heart (Ave) | Heart (Best) | Cleveland (Ave) | Cleveland (Best) | Dermatology (Ave) | Dermatology (Best) | Movement (Ave) | Movement (Best) | Appendicitis (Ave) | Appendicitis (Best) |
|---|---|---|---|---|---|---|---|---|---|---|
| LWEA | 0.192 v | 0.192 | 0.221 v | 0.221 | 0.875 v | 0.875 | 0.581 v | 0.581 | 0.041 v | 0.041 |
| LWGP | 0.214 v | 0.214 | 0.232 v | 0.232 | 0.867 v | 0.867 | 0.590 v | 0.611 | 0.220 v | 0.221 |
| PTA | 0.001 v | 0.001 | 0.194 v | 0.194 | 0.648 v | 0.648 | 0.578 v | 0.578 | 0.041 v | 0.041 |
| PTGP | 0.001 v | 0.001 | 0.197 v | 0.197 | 0.648 v | 0.648 | 0.581 v | 0.589 | 0.041 v | 0.041 |
| ECPCSHC | 0.001 v | 0.001 | 0.217 v | 0.217 | 0.912 v | 0.912 | 0.564 v | 0.564 | 0.185 v | 0.185 |
| ECPCSMC | 0.242 v | 0.242 | 0.224 v | 0.224 | 0.912 v | 0.912 | 0.593 v | 0.594 | 0.157 v | 0.157 |
| WEAC | 0.199 v | 0.199 | 0.219 v | 0.219 | 0.877 v | 0.877 | 0.594 v | 0.594 | 0.130 v | 0.130 |
| GPMGLA | 0.234 v | 0.234 | 0.229 v | 0.229 | 0.906 v | 0.906 | 0.597 v | 0.597 | 0.157 v | 0.157 |
| SEC | 0.231 v | 0.389 | 0.216 v | 0.295 | 0.758 v | 0.923 | 0.547 v | 0.606 | 0.127 v | 0.369 |
| DTCR | 0.287 v | 0.287 | 0.227 v | 0.231 | 0.905 v | 0.905 | 0.460 v | 0.460 | 0.294 * | 0.294 |
| RTFN | 0.382 v | 0.382 | 0.245 * | 0.245 | 0.869 v | 0.923 | 0.584 v | 0.584 | 0.241 v | 0.241 |
| FREC | 0.389 | 0.389 | 0.234 | 0.234 | 0.925 | 0.925 | 0.598 | 0.598 | 0.247 | 0.247 |
| Summary | (11/0/0) |  | (10/0/1) |  | (11/0/0) |  | (11/0/0) |  | (10/0/1) |  |

| Algorithm | Led7digit (Ave) | Led7digit (Best) | Mammographic (Ave) | Mammographic (Best) | Cardio (Ave) | Cardio (Best) | IS (Ave) | IS (Best) | SPF (Ave) | SPF (Best) |
|---|---|---|---|---|---|---|---|---|---|---|
| LWEA | 0.567 v | 0.567 | 0.007 v | 0.007 | 0.353 v | 0.353 | 0.571 v | 0.571 | 0.333 v | 0.333 |
| LWGP | 0.573 v | 0.577 | 0.007 v | 0.007 | 0.357 | 0.369 | 0.584 v | 0.617 | 0.326 v | 0.349 |
| PTA | 0.486 v | 0.486 | 0.007 v | 0.007 | 0.318 v | 0.318 | 0.669 * | 0.669 | 0.300 v | 0.300 |
| PTGP | 0.497 v | 0.524 | 0.007 v | 0.007 | 0.316 v | 0.343 | 0.618 v | 0.650 | 0.294 v | 0.336 |
| ECPCSHC | 0.487 v | 0.487 | 0.007 v | 0.007 | 0.355 v | 0.355 | 0.566 v | 0.566 | 0.305 v | 0.305 |
| ECPCSMC | 0.559 v | 0.560 | 0.002 v | 0.002 | 0.353 v | 0.356 | 0.562 v | 0.562 | 0.314 v | 0.314 |
| WEAC | 0.556 v | 0.556 | 0.237 v | 0.237 | 0.366 * | 0.366 | 0.571 v | 0.571 | 0.334 v | 0.334 |
| GPMGLA | 0.572 v | 0.572 | 0.002 v | 0.002 | 0.373 * | 0.373 | 0.563 v | 0.563 | 0.311 v | 0.311 |
| SEC | 0.526 v | 0.585 | 0.104 v | 0.292 | 0.303 v | 0.354 | 0.505 v | 0.660 | 0.275 v | 0.363 |
| DTCR | 0.480 v | 0.480 | 0.300 * | 0.300 | 0.310 v | 0.310 | 0.555 v | 0.555 | 0.210 v | 0.210 |
| RTFN | 0.584 v | 0.585 | 0.292 v | 0.292 | 0.361 * | 0.361 | 0.575 v | 0.575 | 0.339 v | 0.339 |
| FREC | 0.610 | 0.610 | 0.295 | 0.295 | 0.358 | 0.359 | 0.643 | 0.643 | 0.342 | 0.361 |
| Summary | (11/0/0) |  | (10/0/1) |  | (7/1/3) |  | (10/0/1) |  | (11/0/0) |  |

To avoid repeating similar results, only the NMI is used to report the experimental evaluation. As shown in Fig. 9, FREC consistently achieves a better clustering effect than the parallel EAC and the base clustering on all datasets. Especially for Heart, Dermatology, Movement and Mammographic, FREC reports the best clustering results while achieving a satisfactory improvement, illustrating the advantage of considering cluster reliability and the superiority of FREC.

4.4 Comparison to other clustering algorithms

In order to comprehensively evaluate the performance of the proposed algorithm, experimental comparisons are carried out against the other eleven state-of-the-art methods. The results are summarised in Tables 5 and 6. Note that the results of EAC have been analysed in Section 4.3 and will not be repeated in this part.

Recalling the results reported above, ensemble algorithms generally work best with a larger ensemble size. Thus, all compared ensemble methods employ the results with an ensemble size of 100, and the average and best results for each dataset are shown in the columns Ave and Best, respectively. To describe the experimental results more clearly, the best results are highlighted in bold. Moreover, the second best results are highlighted with an underline to make the information in the table easier to follow.

Considering the metric HS, FREC achieves the optimal average performance relative to the other eleven algorithms on most datasets, including Heart, Dermatology, Movement, Led7digit and SPF. As for the results in the column Best, although occasionally a little inferior to one or two approaches, FREC is the best or second best in most cases. For the remaining datasets, the average performance of FREC is slightly lower than that of individual algorithms. Still, FREC shows a satisfactory clustering effect compared with the other techniques.

Turning to the evaluation with NMI, the results are highly analogous to HS. For the datasets Heart, Cleveland, Dermatology, Movement, Appendicitis, Led7digit, Mammographic and SPF, the proposed method consistently surpasses most contrasting approaches. For the dataset IS, consistent with HS, FREC is slightly inferior to PTA, ranking second among all ensemble methods. The main difference from HS is the dataset Cardio, where FREC is slightly lower than WEAC and GPMGLA by 0.008 and 0.015, respectively. Nevertheless, compared with the remaining algorithms, FREC still shows excellent performance.

In general, the average and best results of FREC are equal in most cases, which means that FREC is relatively stable and its results are not obtained by chance. At the same time, regardless of the average or best results, FREC always achieves the best or second best effect, which illustrates the rationality of the research in this paper.

4.5 Statistical analysis

A paired t-test is used throughout the present experimental studies to show any statistically significant differences between different approaches. This helps ensure that the results are not obtained by chance. The t-test results are summarised at the end of each table, counting the number of statistically better (v), equivalent (blank) or worse (*) cases for FREC in comparison to each algorithm. In all experiments reported, the threshold of significance is set to 0.05. For example, in Table 6, (11/0/0) in the column Heart indicates that the clustering performance returned by FREC is better than the other ensemble methods in eleven cases, equivalent in no case, and worse in no case. It can be clearly seen that whether the evaluation index is HS or NMI, the statistical results of FREC are better than those of the other methods in most cases; in particular, for the HS indicator, FREC surpasses all other algorithms on more than half of the datasets. The statistical analysis shows that over the 100 repeated experiments, the overall performance of FREC is relatively stable and better than that of most algorithms.
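A paired t-test of this kind can be run with scipy.stats.ttest_rel, for example as in the sketch below; the marker convention mirrors the tables, and the helper name is illustrative.

```python
from scipy import stats

def significance_marker(frec_scores, other_scores, alpha=0.05):
    """Paired t-test over repeated runs on one dataset: 'v' if FREC is
    significantly better, '*' if significantly worse, '' if equivalent."""
    t, p = stats.ttest_rel(frec_scores, other_scores)
    if p >= alpha:
        return ''            # no statistically significant difference
    return 'v' if t > 0 else '*'
```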

4.6 Time complexity analysis

As shown in Algorithm 4, the computing cost of the proposed FREC mainly includes three parts: (1) For UDC, this function mainly consists of three loops with a time complexity of O(mk(k - 1)). In particular, every other instance needs to be traversed to find the lower approximation when calculating the UD in the inner loop, and this process consumes O(n²); (2) As for CMC, this part mainly calculates the upper triangular part of A˜, and the time complexity is O(n²k); (3) Finally, CSC computes the eigenvectors of the adjacency matrix and performs fast clustering, with a time complexity of O(ndk). Thus, the total cost of FREC is O(mk(k - 1) + n² + ndk).

Table 7

Time complexity comparison of different algorithms (seconds)

| Dataset | LWEA | LWGP | PTA | PTGP | ECPCSHC | ECPCSMC | WEAC | GPMGLA | SEC | DTCR | RTFN | FREC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Heart | 0.01 | 0.12 | 0.02 | 0.06 | 0.18 | 0.21 | 0.01 | 1.65 | 0.01 | 36.36 | 1.34 | 18.76 |
| Cleveland | 0.01 | 0.18 | 0.02 | 0.05 | 0.22 | 0.25 | 0.01 | 2.54 | 0.01 | 38.52 | 1.47 | 19.50 |
| Dermatology | 0.01 | 0.11 | 0.01 | 0.06 | 0.24 | 0.31 | 0.01 | 3.47 | 0.01 | 227.09 | 1.49 | 19.59 |
| Movement | 0.01 | 0.11 | 0.02 | 0.06 | 0.20 | 0.23 | 0.01 | 2.43 | 0.01 | 437.69 | 1.63 | 22.41 |
| Appendicitis | 0.01 | 0.12 | 0.01 | 0.05 | 0.05 | 0.06 | 0.01 | 0.36 | 0.01 | 15.51 | 0.18 | 4.34 |
| Led7digit | 0.01 | 0.13 | 0.01 | 0.05 | 0.37 | 0.35 | 0.06 | 4.81 | 0.01 | 28.61 | 2.64 | 36.49 |
| Mammographic | 0.01 | 0.31 | 0.01 | 0.04 | 0.76 | 0.72 | 0.16 | 12.80 | 0.01 | 38.08 | 2.81 | 76.43 |
| Cardio | 0.01 | 0.54 | 0.01 | 0.10 | 3.37 | 2.57 | 1.45 | 55.96 | 0.01 | 368.16 | 4.52 | 404.51 |
| IS | 0.01 | 0.56 | 0.01 | 0.06 | 3.84 | 3.27 | 1.60 | 72.76 | 0.01 | 352.29 | 4.48 | 398.55 |
| SPF | 0.01 | 0.39 | 0.01 | 0.06 | 2.76 | 2.21 | 1.13 | 52.29 | 0.01 | 428.80 | 4.36 | 407.69 |

To compare the running time gap with other methods, the running time (in seconds) of each algorithm is reported in Table 7. The experiments use an i7-12700 CPU with 24 GB of memory. It can be seen that, after taking the data features into account, the running time of FREC is significantly higher than that of the other comparison methods, and it grows markedly as the number of instances increases. In comparison, methods such as LWEA, SEC, and RTFN achieve better running times. These results show that the time efficiency of FREC is relatively poor and requires further optimisation in subsequent work.

5 Conclusion

This paper explores the role of cluster reliability, derived using fuzzy-rough sets, in co-occurrence based ensemble clustering, and develops a fuzzy-rough induced ensemble approach. The experimental results indicate that the reliability induced by the fuzzy-rough lower approximation is effective and can be reasonably employed in the task of ensemble clustering. Compared with other ensemble algorithms that ignore attributes and only employ base clustering results, FREC demonstrates the advantage of incorporating feature information. Meanwhile, compared with the parallel version and the base clustering, FREC again shows its superiority.

Nonetheless, the timing experiments show that FREC is relatively slow. This is mainly due to the high time complexity of the algorithm: for datasets with many samples, calculating the lower approximation for each object takes a lot of time. In future work, the idea of the KD-tree [44] can be introduced to further improve the running speed of the algorithm. In addition, in Equation (23), if two instances do not belong to the same cluster, directly assigning 0 to the corresponding entry of the adjacency matrix may not be the best choice; further mining the implicit connection between instances of different clusters may help improve the clustering performance.

Whilst promising, further work remains. The performance of the ensemble strategy and multi-density cluster designs is worth further exploration. In addition, the implicit relationship between objects that lie in the same base clustering but in different clusters is a valuable route of investigation. Moreover, a more comprehensive comparison of ensemble methods over diverse datasets from real-world domains, such as mammographic risk assessment [46, 47], would lay the foundation for a broader series of issues for future research.

References

[1] Nazari A., Dehghan A., Nejatian S., Rezaie V. and Parvin H., A comprehensive study of clustering ensemble weighting based on cluster quality and diversity, Pattern Analysis and Applications 22 (2019), 133–145.

[2] Zubaroglu A. and Atalay V., Data stream clustering: a review, Artificial Intelligence Review 54 (2021), 1201–1236.

[3] Topchy A., Jain A.K. and Punch W., A mixture model for clustering ensembles, Proceedings of the 2004 SIAM International Conference on Data Mining (2004), 379–390.

[4] Topchy A., Jain A.K. and Punch W., Clustering ensembles: Models of consensus and weak partitions, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005), 1866–1881.

[5] Fred A.L. and Jain A.K., Combining multiple clusterings using evidence accumulation, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005), 835–850.

[6] Radzikowska A.M. and Kerre E.E., A comparative study of fuzzy rough sets, Fuzzy Sets and Systems 126 (2002), 137–155.

[7] Karna A. and Gibert K., Automatic identification of the number of clusters in hierarchical clustering, Neural Computing and Applications 34 (2022), 119–134.

[8] Ghosal A., Nandy A., Das A.K., Goswami S. and Panday M., A short review on different clustering techniques and their applications, Emerging Technology in Modelling and Graphics (2020), 69–83.

[9] Li C., Cerrada M., Cabrera D., Sanchez R.V., Pacheco F., Ulutagay G. and Valente de Oliveira J., A comparison of fuzzy clustering algorithms for bearing fault diagnosis, Journal of Intelligent & Fuzzy Systems 34 (2018), 3565–3580.

[10] Arisdakessian C.G., Nigro O.D., Steward G.F., Poisson G. and Belcaid M., CoCoNet: an efficient deep learning tool for viral metagenome binning, Bioinformatics 37 (2021), 2803–2810.

[11] Wu C., Peng Q., Lee J., Leibnitz K. and Xia Y., Effective hierarchical clustering based on structural similarities in nearest neighbor graphs, Knowledge-Based Systems 228 (2021), 107295.

[12] Dubois D. and Prade H., Putting rough sets and fuzzy sets together, Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, 1992.

[13] Li D., Zhang H., Li T., Bouras A., Yu X. and Wang T., Hybrid missing value imputation algorithms using fuzzy c-means and vaguely quantified rough set, IEEE Transactions on Fuzzy Systems 30 (2021), 1396–1408.

[14] Huang D., Wang C.-D. and Lai J.-H., Locally weighted ensemble clustering, IEEE Transactions on Cybernetics 48 (2017), 1460–1473.

[15] Huang D., Lai J. and Wang C., Combining multiple clusterings via crowd agreement estimation and multi-granularity link analysis, Neurocomputing 170 (2015), 240–250.

[16] Huang D., Lai J. and Wang C., Robust ensemble clustering using probability trajectories, IEEE Transactions on Knowledge and Data Engineering 28 (2015), 1312–1326.

[17] Huang D., Lai J. and Wang C., Ensemble clustering using factor graph, Pattern Recognition 50 (2016), 131–142.

[18] Huang D., Wang C., Peng H., Lai J. and Kwoh C., Enhanced ensemble clustering via fast propagation of cluster-wise similarities, IEEE Transactions on Systems, Man and Cybernetics: Systems 51 (2018), 508–520.

[19] Huang D., Wang C., Wu J., Lai J. and Kwoh C., Ultra-scalable spectral clustering and ensemble clustering, IEEE Transactions on Knowledge and Data Engineering 32 (2019), 1212–1226.

[20] Rashedi E. and Mirzaei A., A hierarchical clusterer ensemble method based on boosting theory, Knowledge-Based Systems 45 (2013), 83–93.

[21] Saeed F., Salim N. and Abdo A., Voting-based consensus clustering for combining multiple clusterings of chemical structures, Journal of Cheminformatics 4 (2012), 1–8.

[22] Yang F., Li X., Li Q. and Li T., Exploring the diversity in cluster ensemble generation: Random sampling and random projection, Expert Systems with Applications 41 (2014), 4844–4866.

[23] Karypis G. and Kumar V., A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM Journal on Scientific Computing 20 (1998), 359–392.

[24] Li G., Mahmoudi M.R., Qasem S.N., Tuan B.A. and Pho K.-H., Cluster ensemble of valid small clusters, Journal of Intelligent & Fuzzy Systems 39 (2020), 525–542.

[25] Liu H.Q., Zhang Q. and Zhao F., Interval fuzzy spectral clustering ensemble algorithm for color image segmentation, Journal of Intelligent & Fuzzy Systems 35 (2018), 5467–5476.

[26] Liu H., Wu J., Liu T., Tao D. and Fu Y., Spectral ensemble clustering via weighted k-means: Theoretical and practical evidence, IEEE Transactions on Knowledge and Data Engineering 29 (2017), 1129–1143.

[27] Xing H., Xiao Z., Qu R., Zhu Z. and Zhao B., An efficient federated distillation learning system for multitask time series classification, IEEE Transactions on Instrumentation and Measurement 71 (2022), 1–12.

[28] Xing H., Xiao Z., Zhan D., Luo S., Dai P. and Li K., SelfMatch: Robust semisupervised time-series classification with self-distillation, International Journal of Intelligent Systems 37 (2022), 8583–8610.

[29] Yi J., Yang T., Jin R., Jain A.K. and Mahdavi M., Robust ensemble clustering by matrix completion, 2012 IEEE 12th International Conference on Data Mining (2012), 1176–1181.

[30] Zhu J., Jang-Jaccard J., Liu T. and Zhou J., Joint spectral clustering based on optimal graph and feature selection, Neural Processing Letters 53 (2021), 257–273.

[31] Fodor J.C., Contrapositive symmetry of fuzzy implications, Fuzzy Sets and Systems 69 (1995), 141–156.

[32] Golalipour K., Akbari E., Hamidi S.S., Lee M. and Enayatifar R., From clustering to clustering ensemble selection: A review, Engineering Applications of Artificial Intelligence 104 (2021), 104388.

[33] Franek L. and Jiang X., Ensemble clustering by means of clustering embedding in vector spaces, Pattern Recognition 47 (2014), 833–842.

[34] MacQueen J., Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1 (1967), 281–297.

[35] Zhang M., Weighted clustering ensemble: A review, Pattern Recognition (2021), 108428.

[36] Iam-On N., Boongoen T. and Garrett S., LCE: a link-based cluster ensemble method for improved gene expression data analysis, Bioinformatics 26 (2010), 1513–1519.

[37] Ma Q., Zheng J., Li S. and Cottrell G., Learning representations for time series clustering, Advances in Neural Information Processing Systems 32 (2019).

[38] Jensen R. and Shen Q., New approaches to fuzzy-rough feature selection, IEEE Transactions on Fuzzy Systems 17 (2008), 824–838.

[39] Khedairia S. and Khadir M.T., A multiple clustering combination approach based on iterative voting process, Journal of King Saud University - Computer and Information Sciences 34 (2022), 1370–1380.

[40] Li T. and Ding C., Weighted consensus clustering, Proceedings of the 2008 SIAM International Conference on Data Mining (2008), 798–809.

[41] Zhang T., Ma F., Yue D., Peng C. and O'Hare G.M., Interval type-2 fuzzy local enhancement based rough k-means clustering considering imbalanced clusters, IEEE Transactions on Fuzzy Systems 28 (2019), 1925–1939.

[42] Von Luxburg U., A tutorial on spectral clustering, Statistics and Computing 17 (2007), 395–416.

[43] Peng X., Zhu H., Feng J., Shen C., Zhang H. and Zhou J.T., Deep clustering with sample-assignment invariance prior, IEEE Transactions on Neural Networks and Learning Systems 31 (2019), 4857–4868.

[44] Chen Y., Zhou L., Bouguila N., Wang C., Chen Y. and Du J., BLOCK-DBSCAN: Fast clustering for large scale data, Pattern Recognition 109 (2021), 107624.

[45] Li Y., Yu J., Hao P. and Li Z., Clustering ensembles based on normalized edges, Pacific-Asia Conference on Knowledge Discovery and Data Mining (2007), 664–671.

[46] Qu Y., Yue G., Shang C., Yang L., Zwiggelaar R. and Shen Q., Multi-criterion mammographic risk analysis supported with multi-label fuzzy-rough feature selection, Artificial Intelligence in Medicine 100 (2019), 101722.

[47] Qu Y., Fu Q., Shang C., Deng A., Zwiggelaar R., George M. and Shen Q., Fuzzy-rough assisted refinement of image processing procedure for mammographic risk assessment, Applied Soft Computing 91 (2020), 106230.

[48] Yao Y., Relational interpretations of neighborhood operators and rough set approximation operators, Information Sciences 111 (1998), 239–259.

[49] Pawlak Z., Rough sets: Theoretical aspects of reasoning about data, Springer Science & Business Media 9 (2012).

[50] Xiao Z., Xu X., Xing H., Luo S., Dai P. and Zhan D., RTFN: a robust temporal feature network for time series classification, Information Sciences 571 (2021), 65–86.