Article

Semi-Supervised Multi-Label Dimensionality Reduction Learning by Instance and Label Correlations

1 Yunnan Key Lab of Computer Technology Applications, Kunming University of Science and Technology, Kunming 650500, China
2 Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
3 School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85287, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Submission received: 30 December 2022 / Revised: 25 January 2023 / Accepted: 1 February 2023 / Published: 3 February 2023

Abstract

The label learning mechanism is challenging to integrate into the training model of the multi-label feature space dimensionality reduction problem, making most current multi-label dimensionality reduction methods supervised. Many methods focus only on label correlations and ignore the instance correlations between the original feature space and the low-dimensional space. Additionally, very few techniques consider how to constrain the projection matrix so as to identify specific and common features in the feature space. In this paper, we propose a new approach of semi-supervised multi-label dimensionality reduction learning by instance and label correlations (SMDR-IC, in short). Firstly, we reformulate MDDM, which incorporates label correlations, as a least-squares problem so that the label propagation mechanism can be effectively embedded into the model. Secondly, we investigate instance correlations using the k-nearest neighbor technique, and then introduce $l_1$-norm and $l_{2,1}$-norm regularization terms to identify the specific and common features of the feature space. Experiments on numerous public multi-label data sets show that SMDR-IC performs better than other related multi-label dimensionality reduction methods.

1. Introduction

Multi-label learning tasks, like other machine learning paradigms, are frequently accompanied by high-dimensional feature spaces. When learning directly from large-scale, high-dimensional data, algorithms often perform poorly in classification [1]. First, the overabundance of redundant and irrelevant information in the high-dimensional space makes it more difficult for the model to identify the interclass structure of the data; second, the multicollinearity between the feature attributes of high-dimensional data results in poor generalization ability; and third, in high-dimensional feature spaces, traditional distances (such as the Euclidean distance) cannot adequately measure the manifold structure between samples, even though the Euclidean distance plays a significant role in the majority of classifiers. The term “curse of dimensionality” [2] is used to describe these problems. As a result, dimensionality reduction becomes an important preprocessing step for multi-label learning on high-dimensional data.
Contrary to traditional single-label learning tasks, which presuppose that the labels of instances are mutually exclusive, the labels of multi-label instances are inter-correlated. In multi-label image classification, for instance, “desert” and “camel” frequently co-occur, whereas “sea” and “desert” rarely do. This fact motivates people to infer the unknown labels of samples from known labels by relying on the correlations between labels. Unfortunately, it is challenging to incorporate such a label-learning mechanism into the model to estimate the labels of unlabeled instances due to the particularity of feature dimensionality reduction. Because of this, the majority of multi-label dimensionality reduction techniques developed recently are supervised methods that require enough labeled instances [3,4,5,6]. These supervised approaches, however, overlook the fact that, in many real-world applications, it is difficult to annotate enough unlabeled instances because of the high labeling costs. As a consequence, labeling a sizable training data set is typically impractical in realistic situations [7]. Unlabeled instances, by contrast, are typically plentiful and easy to obtain. Consequently, several semi-supervised multi-label dimensionality reduction techniques have been developed (see [8,9,10]), which can enhance learning performance by efficiently combining a large number of unlabeled instances with a scarce number of labeled instances.
Most of these semi-supervised dimensionality reduction techniques start by developing various label propagation methods based on label correlations and then apply them to the k-nearest neighbor (kNN) graph to generate soft labels for the unlabeled instances. After the labeled training samples are expanded in this way, the projection matrix from the high-dimensional space to the low-dimensional space is trained by maximizing the between-class distance and minimizing the within-class distance of the sample features. This approach, known as multi-label linear discriminant analysis (MLDA) [6], or one of its extended variants, is at the core of these methods. Although the MLDA framework can integrate label correlations, it ignores the instance correlations between the original feature space and the low-dimensional space.
Given that multi-label data with high-dimensional features contain a significant amount of redundant and irrelevant information, it is necessary to selectively extract specific and common features of the feature space while reducing dimensions, and to remove the negative effects of unimportant features. This is comparable to feature extraction and feature selection. The $l_{2,1}$-norm is frequently employed in multi-label feature selection because it can choose distinguishing features for all instances through joint sparsity (each feature receives either a low score or a high score across all instances) [11,12]. The drawback of the $l_{2,1}$-norm is equally clear, though: it does not take into account the distinctive features or the redundant correlation of features [13]. To compensate for the shortcomings of the $l_{2,1}$-norm and improve the projection matrix's ability to identify within-class features of samples, we use the $l_1$-norm as a model regularization term to learn the highly sparse specific features of the low-dimensional feature space of samples.
In this paper, we propose a novel method, namely semi-supervised multi-label dimensionality reduction learning by instance and label correlations (SMDR-IC), based on dependence maximization. This method effectively utilizes the information from both labeled and unlabeled instances, simultaneously considers label and instance correlations, and also incorporates specific and common features of the feature space. Our major contributions are summarized as follows.
The Hilbert–Schmidt Independence Criterion (HSIC) [14] has been mathematically shown to be an effective measure of the dependence between the original feature description and the associated class label. Motivated by this, we use the matrix factorization technique to reconstruct the HSIC empirical estimator in MDDM [4] as a least-squares problem, enabling the label propagation mechanism to be seamlessly incorporated into the dimensionality reduction learning model.
Consideration is given to the instance correlations. In order to use instance correlations in dimensionality reduction, we introduce a new assumption, which states that if two instances have a high degree of correlation in the original feature space, they should also have a high degree of correlation in the low-dimensional feature space. The instance correlations are assessed using the k-nearest neighbor approach.
Through the use of the $l_1$-norm and $l_{2,1}$-norm regularization terms to select the appropriate features, the specific features and the common features of the feature space are simultaneously investigated in our method, which helps to enhance the performance of dimensionality reduction.
The rest of the paper is organized as follows. The related work is briefly reviewed in Section 2. Section 3 introduces the details of our proposed SMDR-IC method. Experimental results are analyzed in Section 4. Finally, Section 5 concludes this paper.

2. Related Work

2.1. Dimensionality Reduction

In this section, we mainly review the related works on dimensionality reduction, including unsupervised, supervised, and semi-supervised methods.
For a long time, people have been concerned with the topic of data feature dimensionality reduction. Since unsupervised dimensionality reduction techniques do not use label information, they can theoretically be applied to reduce the dimensionality of multi-label instances without using label data. A popular dimensionality reduction technique is principal component analysis (PCA), which constructs the projection matrix by maximizing feature variances or minimizing the squared reconstruction error [15]. Other unsupervised techniques, such as locally linear embedding (LLE) [16], Laplacian eigenmaps (LE) [17], and flexible manifold embedding (FME) [18], have also been reported. Latent semantic indexing (LSI) [19], which was first used for document analysis and information retrieval, has since evolved into a successful unsupervised dimensionality reduction method. These techniques primarily aim at obtaining a low-dimensional representation by maintaining the manifold structure of instances. The large amount of redundant and irrelevant information in high-dimensional data features, however, makes it impossible for a single feature learning method to extract a varied representation of the data. Consequently, supervised and semi-supervised dimensionality reduction modes that couple feature and label information are becoming increasingly popular for the multi-label dimension reduction problem. The goal of multi-label informed latent semantic indexing (MLSI) [20], which extends LSI to a supervised method, is to obtain a projection matrix that maximizes feature variances and binary label variances through a linear combination; this method aims to capture correlations between labels while also preserving the information of the inputs. However, it does not investigate the internal relationship between features and labels.
Currently, there are three basic frameworks for dimensionality reduction of the multi-label data feature space. The first strategy for training the projection matrix is to identify the principal directions in the label space and the feature space and maximize the linear correlation between them. The theoretical foundation of this tactic comes from Hotelling, who proposed in [21] that canonical correlation analysis (CCA) can be viewed as the problem of locating the basis vectors of two groups of variables so as to maximize the correlation between the projections of the variables on these basis vectors. CCA can be used directly, without any adjustments, in the multi-label scenario. Hardoon et al. [5] were the first to use the CCA technique to address the multi-label dimensionality reduction problem. The CCA method of multi-label dimensionality reduction described in [5] aims to maximize the linear correlation between the feature set derived from the low-dimensional projection space and the label set. As a foundational technique, CCA has led to numerous extensions in multi-label feature dimensionality reduction. LS-CCA [22], for example, expands CCA with a least-squares formulation and several regularized variants, showing that CCA can be transformed into slightly different least-squares problems. 2SDSR [23] extends CCA by combining it with other feature reduction methods. The drawback of the CCA framework is that extending it to the semi-supervised setting is challenging.
One of the most well-known supervised dimensionality reduction techniques is linear discriminant analysis (LDA) [24], which utilizes label information to define the between-class and within-class scatter matrices and then maximizes the Rayleigh quotient between the two matrices to find a projection matrix that makes instances of the same class close and instances of different classes far apart in a low-dimensional space. This technique provides the second main framework for dimensionality reduction. By employing various label weighting settings, LDA has been extended to multiple types of multi-label LDA versions [6,25,26,27,28,29]: wMLDAb [25] adopts a binary weight, wMLDAe [26] an entropy-based weight, wMLDAc (i.e., MLDA) [6] a correlation-based weight, wMLDAf [27] a fuzzy-based weight, and wMLDAd [28] a dependence-based weight. MLDA-LC [29], in particular, constructs an adjacency graph to represent instance similarity as a graph Laplacian matrix and then combines the Laplacian matrix with the MLDA method to reveal the local structure of multi-label instances. The advantage of the LDA framework is that the optimization problems built on it can easily be converted into matrix eigenvalue and eigenvector problems. The disadvantage is that, because the scatter matrices are constructed using the Euclidean distance, this traditional distance cannot adequately capture the data's complex manifold structure when the feature dimension is high.
Gretton et al. [14] developed the mathematical theory of HSIC in 2005, which provides the third key framework for dimensionality reduction. The empirical estimator of HSIC, as well as an explanation of how HSIC measures the relationship between the original feature description and the associated class label, can be found in [14]. Since then, multi-label dimensionality reduction learning has begun focusing on HSIC. MDDM (multi-label dimensionality reduction via dependence maximization) [4], a supervised baseline technique of multi-label dimensionality reduction within the HSIC framework, aims to learn the projection matrix by maximizing feature-label dependence (MDDM and HSIC are briefly reviewed in Section 2.2). In its initial form, MDDM was developed using two different projection strategies: MDDMp and MDDMf. The former relies on orthonormal projection directions, whereas the latter makes the projected features orthonormal. In order to avoid direct eigendecomposition of the large-scale matrix, SSMDDM [30] presents an effective approach for finding the optimal solution of MDDM: it reformulates MDDM as a least-squares problem and develops a shared-subspace MDDM for multi-label dimensionality reduction. However, the label correlations, which are crucial for multi-label learning, are not taken into account in the least-squares problem of MDDM as recast by SSMDDM.
In the last ten years, a number of semi-supervised multi-label dimensionality reduction techniques have been put forth to make use of labeled and unlabeled instances. Some of the methods combine the learning of a classifier with the learning of a low-dimensional embedding, such as SSDR-MC [8], BSSML [31], and others [32]. The corresponding supervised multi-label dimensionality reduction techniques are also available in semi-supervised forms. Examples include [33], which introduces a semi-supervised CCA based on Laplacian regularization; MSDA [34], which adds two regularization terms to the MLDA objective function by setting up two matrices (an adjacency matrix and a similarity matrix); SSMLDR [9], which first obtains soft labels of unlabeled instances by label propagation and then uses the soft labels of all instances, both labeled and unlabeled, to construct the scatter matrices of MLDA; and SMDRdm [10], which, similarly to [9], constructs the scatter matrices by estimating the soft labels of unlabeled instances using label propagation and further plugs the empirical measure of HSIC into the LDA framework to train the projection matrix with inter-class scatter minimization and dependence maximization as the objective function. SSMLDR and SMDRdm only pay attention to label correlations and ignore instance correlations between the original feature space and the low-dimensional feature space. Furthermore, these two methods are based on LDA, a framework that heavily relies on the distance function, making them more vulnerable to outliers and increasing the soft label error [35,36,37]. Recently, Mikalsen et al. [38] extended dimensionality reduction to noisy multi-label cases by developing a noise-robust label propagation method, which they use to label unlabeled samples before applying the MDDM method to reduce the dimension of the data features; this approach is named NMLSDR.
Although these semi-supervised strategies have shown good experimental results, they cannot overcome the restrictions of the three frameworks mentioned above. These framework models can only be improved to a limited extent, and additional regularization term constraints cannot be freely added, which is why current dimensionality reduction methods rarely consider the specific and common features of the feature space.

2.2. The Brief Review of HSIC and MDDM

As a helpful measure of dependence, the Hilbert–Schmidt Independence Criterion (HSIC) has been used in numerous machine learning applications. Let $X = [x_1, x_2, \ldots, x_n]$ be a training data set consisting of $n$ instances, and $Y = [y_1, y_2, \ldots, y_n]$ be the label matrix, where $y_i$ denotes the class label vector of $x_i$. Given a multi-label data set $\{(X, Y)\}$ with joint distribution $P_{XY}$, let $k$ and $l$ denote the kernel functions; the feature kernel matrix and the label kernel matrix are defined as $K \in \mathbb{R}^{n \times n}$ with $K_{ij} = k(x_i, x_j)$, and $L \in \mathbb{R}^{n \times n}$ with $L_{ij} = l(y_i, y_j)$, respectively. Then, the empirical estimator of the HSIC is given by [14]:
$\mathrm{HSIC}(X, Y, P_{XY}) = (n-1)^{-2} \, \mathrm{tr}(HKHL),$ (1)
where $\mathrm{tr}(\cdot)$ indicates the trace operation of a matrix, and $H \in \mathbb{R}^{n \times n}$ is the centering matrix defined as $H_{ij} = \delta_{ij} - \frac{1}{n}$, where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise.
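For illustration, the following minimal Python sketch (not the authors' code) evaluates the empirical HSIC of Equation (1) for precomputed kernel matrices; the toy data, the linear kernels, and the function name are our own assumptions.

```python
import numpy as np

def hsic_empirical(K, L):
    """Empirical HSIC estimate (n-1)^{-2} tr(HKHL) for precomputed
    n x n feature and label kernel matrices K and L."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix H_ij = delta_ij - 1/n
    return np.trace(H @ K @ H @ L) / (n - 1) ** 2

# toy usage with linear kernels on random data (rows are instances)
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))          # 20 instances, 5 features
Y = rng.integers(0, 2, size=(20, 3))      # 20 instances, 3 binary labels
print(hsic_empirical(X @ X.T, Y @ Y.T))
```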
By maximizing the correlation between the original feature description and the relevant class labels, the HSIC empirical estimator is used in MDDM to project the original data into a low-dimensional feature space. Denote the projection matrix as $P$. An instance $x$ is projected into a new space by $\phi(x) = P^T x$, and the induced kernel functions are given as $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle = \langle P^T x_i, P^T x_j \rangle$ and $l(y_i, y_j) = \langle y_i, y_j \rangle$, where $\langle x_i, x_j \rangle$ denotes the inner product defined as $\langle x_i, x_j \rangle = x_i^T x_j$ [4].
Dropping $(n-1)^{-2}$ and noting that $K = X^T P P^T X$ and $L = Y^T Y$, the optimization of Equation (1) can be written as searching for the optimal linear projection:
$P^* = \arg\max_{P} \, \mathrm{tr}(H X^T P P^T X H L).$ (2)
To avoid a trivial solution, an additional constraint for  P is introduced, which leads to the following expression [4]:
$\max_{P} \, \mathrm{tr}(H X^T P P^T X H L), \quad \text{s.t. } p_i^T (\mu X X^T + (1-\mu) I) p_j = \delta_{ij} \ (1 \le i, j \le d),$ (3)
where $d$ is the dimension of the lower-dimensional space, $P = [p_1, p_2, \ldots, p_d]$, and $\mu \in [0, 1]$ is a pre-defined parameter to control the importance between the two constraints. When $\mu = 0$, the projection matrix is orthonormal, which is called an orthogonal projection [4]; when $\mu = 1$, the projected features are uncorrelated on the training data, which is called uncorrelated subspace dimensionality reduction [4].
It is easy to verify that the optimal solutions of Equation (3) are characterized by the following generalized eigenvalue problem:
$X H L H X^T p = \lambda (\mu X X^T + (1-\mu) I) p.$ (4)
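A small sketch of how Equation (4) could be solved in practice is given below; it uses SciPy's generalized symmetric eigensolver and linear label kernels, and the function name, data layout, and choice of solver are illustrative assumptions rather than the original implementation.

```python
import numpy as np
from scipy.linalg import eigh

def mddm_projection(X, Y, r, mu=0.5):
    """Sketch of the MDDM projection of Eq. (4): X is d x n (columns are
    instances), Y is c x n (columns are label vectors), r is the target
    dimension.  Returns a d x r projection matrix."""
    d, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n
    L = Y.T @ Y                             # linear label kernel
    A = X @ H @ L @ H @ X.T                 # left-hand matrix of Eq. (4)
    B = mu * (X @ X.T) + (1 - mu) * np.eye(d)
    w, V = eigh(A, B)                       # generalized symmetric eigenproblem
    return V[:, np.argsort(w)[::-1][:r]]    # eigenvectors of the r largest eigenvalues
```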

3. Materials and Methods

In this section, we first go over some important notation and symbols, and thereafter elaborate on our proposed SMDR-IC method.

3.1. Preliminaries

Let $X = [x_1^T; \ldots; x_l^T; x_{l+1}^T; \ldots; x_n^T] \in \mathbb{R}^{n \times d}$ be a data set of $n$ $d$-dimensional instances, where the first $l$ instances are labeled and the remaining $u$ are unlabeled, with $l + u = n$. $L = \{1, 2, \ldots, c\}$ denotes the label set, where $c$ is the number of possible labels. Drawing on [9], an additional $(c+1)$-th label is appended to the label set in order to detect outliers. Define the initial label matrix $Y = [y_1^T; \ldots; y_l^T; y_{l+1}^T; \ldots; y_n^T] = [Y_l; Y_u] \in \mathbb{R}^{n \times (c+1)}$, where $Y_l$ denotes the initial labels of the labeled instances and $Y_u$ denotes the initial labels of the unlabeled instances. For the labeled instances, $Y_{ij} = 1$ if the $i$-th instance is labeled as $j$, and $Y_{ij} = 0$ otherwise. For the unlabeled instances, $Y_{ij} = 1$ if $j = c+1$, and $Y_{ij} = 0$ otherwise. We denote the predicted label matrix as $F = [F_1^T; \ldots; F_l^T; F_{l+1}^T; \ldots; F_n^T] = [F_l; F_u] \in \mathbb{R}^{n \times (c+1)}$, where $F_i \in \mathbb{R}^{c+1}$ $(1 \le i \le n)$ are column vectors and $0 \le F_{ij} \le 1$. $F_l$ denotes the predicted labels of the labeled instances and $F_u$ denotes the predicted labels of the unlabeled instances.
The objective is to learn a projection matrix $P \in \mathbb{R}^{d \times t}$ that projects an instance $x$ from the original feature space $\mathbb{R}^d$ to a lower-dimensional representation $z \in \mathbb{R}^t$, with
$z = x^T P,$ (5)
where $t \ll d$.

3.2. Obtaining Soft Label by Label Propagation

3.2.1. Neighborhood Graph Construction

To accomplish label propagation, a graph consisting of labeled and unlabeled instances is built to evaluate the similarities among neighboring instances. The weighted adjacency matrix $W$ is defined by using a $k$NN graph over the $n$ instances, as shown below:
$W_{ij} = \begin{cases} 1, & \text{if } x_i \in k\mathrm{NN}(x_j) \text{ or } x_j \in k\mathrm{NN}(x_i), \\ 0, & \text{otherwise}, \end{cases}$ (6)
where $k\mathrm{NN}(x_i)$ contains the $k$-nearest neighbors of $x_i$ computed by the Euclidean distance. Because of its simplicity and wide applicability, the weight in Equation (6) is simply set to a 0-1 weight. This adjacency matrix $W$ can also be obtained with other weight settings (e.g., a Gaussian heat kernel) and other distance measures.
We normalize $W$ as a stochastic matrix $\tilde{W}$ to ensure that the sum of the transition probabilities from node $i$ to the other nodes of the graph equals 1, as follows:
$\tilde{W} = D^{-1} W,$ (7)
where $D = \mathrm{diag}(d_{11}, \ldots, d_{nn})$ and $d_{ii} = \sum_{j=1}^{n} W_{ij}$. $\tilde{W}_{ij}$ can be considered as the probability of a transition from node $i$ to node $j$ along the edge between them.
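A minimal sketch of this graph construction and normalization, under the assumption of a brute-force Euclidean kNN search, is shown below (function and variable names are ours).

```python
import numpy as np

def knn_graph(X, k=10):
    """Symmetric 0-1 kNN adjacency (Eq. (6)) and its row-stochastic
    normalization W_tilde = D^{-1} W (Eq. (7)).  X is n x d (rows are instances)."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # Euclidean distances
    np.fill_diagonal(dist, np.inf)              # exclude self-neighbors
    nn = np.argsort(dist, axis=1)[:, :k]        # indices of the k nearest neighbors
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, nn.ravel()] = 1.0
    W = np.maximum(W, W.T)                      # x_i in kNN(x_j) OR x_j in kNN(x_i)
    W_tilde = W / W.sum(axis=1, keepdims=True)  # D^{-1} W
    return W, W_tilde
```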

3.2.2. Label Propagation

Multi-label learning differs from single-label learning: the former assumes that the labels of instances are inter-correlated, while the latter assumes that instance labels are independent of one another. To convert the single-label propagation version to the multi-label propagation version, we first normalize the initial label matrix:
$\tilde{Y}_{ij} = \begin{cases} \frac{1}{|Y_i|}, & \text{if } Y_{ij} = 1, \\ 0, & \text{otherwise}, \end{cases}$ (8)
where $|Y_i|$ denotes the number of labels that the instance $x_i$ belongs to.
The following equation updates the probability that instance x i has the j-th class label:
$F_{ij}(t+1) = \lambda_i \sum_{k=1}^{n} \tilde{W}_{ik} F_{kj}(t) + (1 - \lambda_i) \tilde{Y}_{ij}.$ (9)
Obviously, in each iteration, the probability for each instance propagates partially from its neighbors and partially from its own labels. Based on the labeled and unlabeled instances, we divide the matrices $\tilde{W}$, $\tilde{Y}$, and $F$ into the following block forms:
$\tilde{W} = \begin{bmatrix} \tilde{W}_{ll} & \tilde{W}_{lu} \\ \tilde{W}_{ul} & \tilde{W}_{uu} \end{bmatrix}, \quad \tilde{Y} = \begin{bmatrix} \tilde{Y}_l \\ \tilde{Y}_u \end{bmatrix}, \quad F = \begin{bmatrix} F_l \\ F_u \end{bmatrix}.$ (10)
For the labeled instances, we fix the labels as $F_l = \tilde{Y}_l$ and set $\lambda_l = 0$. For the unlabeled instances, the iteration can be written as follows:
$F_u(t+1) = I_{\lambda_u} \tilde{W}_{ul} F_l(t) + I_{\lambda_u} \tilde{W}_{uu} F_u(t) + (I - I_{\lambda_u}) \tilde{Y}_u,$ (11)
where $I \in \mathbb{R}^{u \times u}$ is an identity matrix and $I_{\lambda_u} \in \mathbb{R}^{u \times u}$ is a diagonal matrix with diagonal elements $\lambda_u$. Since $F_l(t) = \tilde{Y}_l$ and $F_u(0) = \tilde{Y}_u$, we have:
$F_u(t+1) = \sum_{i=0}^{t} (I_{\lambda_u} \tilde{W}_{uu})^i I_{\lambda_u} \tilde{W}_{ul} \tilde{Y}_l + (I_{\lambda_u} \tilde{W}_{uu})^{t+1} \tilde{Y}_u + \sum_{i=0}^{t} (I_{\lambda_u} \tilde{W}_{uu})^i (I - I_{\lambda_u}) \tilde{Y}_u.$ (12)
$\{F_u(t)\}$ is a convergent sequence; the convergence analysis is presented in [9]. After a finite number of iterations, $F_u$ converges to
$F_u = (I - I_{\lambda_u} \tilde{W}_{uu})^{-1} \big( I_{\lambda_u} \tilde{W}_{ul} \tilde{Y}_l + (I - I_{\lambda_u}) \tilde{Y}_u \big).$ (13)
It is easily found that the sum of each row of $F_u$ is equal to 1. This means that the elements of $F$ are probability values and $F_{ij}$ can be viewed as the posterior probability of the instance $x_i$ belonging to the $j$-th class. In particular, $F_{i,c+1}$ represents the probability of the instance $x_i$ being an outlier. After obtaining the predicted labels $F_{ij}$ for each instance $x_i$, we refer to the labels $F_{ij}$ $(1 \le j \le c)$ as soft labels.
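The closed-form propagation of Equation (13) can be sketched as follows; the function name and the convention that the first rows of the matrices correspond to the labeled instances are illustrative assumptions.

```python
import numpy as np

def propagate_soft_labels(W_tilde, Y_tilde, n_labeled, lam_u=0.99):
    """Closed-form soft labels for the unlabeled instances (Eq. (13)).
    W_tilde: n x n row-stochastic graph, Y_tilde: n x (c+1) normalized
    initial labels; the first n_labeled rows are the labeled instances."""
    l = n_labeled
    W_ul = W_tilde[l:, :l]
    W_uu = W_tilde[l:, l:]
    Y_l, Y_u = Y_tilde[:l], Y_tilde[l:]
    u = W_uu.shape[0]
    I_lam = lam_u * np.eye(u)               # diagonal matrix I_{lambda_u}
    F_u = np.linalg.solve(np.eye(u) - I_lam @ W_uu,
                          I_lam @ W_ul @ Y_l + (np.eye(u) - I_lam) @ Y_u)
    return np.vstack([Y_l, F_u])            # F_l is fixed to Y_tilde_l
```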

3.3. Dimensionality Reduction

In this subsection, we utilize the soft labels to learn the projection matrix $P$. Firstly, we describe in detail how to incorporate the label propagation mechanism and label correlations into MDDM to construct a least-squares formulation; then we integrate instance correlations as well as specific and common features of the feature space into our approach; finally, we discuss how the proposed dimensionality reduction model is optimized.

3.3.1. Design of the Semi-Supervised Mode

It is evident from the label propagation Equations (6) and (9) that label propagation estimates the soft labels of unlabeled instances by the similarity of instance features, whereas label correlations are almost never employed. Therefore, to improve the performance of the propagation mechanism, we take label correlations into consideration. Following previous work [6], the correlation, namely the cosine similarity, between two distinct label classes is expressed as follows:
$C_{kl} = \cos(Y^{(k)}, Y^{(l)}) = \frac{\langle Y^{(k)}, Y^{(l)} \rangle}{\|Y^{(k)}\| \, \|Y^{(l)}\|}.$ (14)
Then, we have $\tilde{F}_u = F_u C$ and $\tilde{F} = [F_l; \tilde{F}_u]$. The resulting soft label matrix is then used to compute the label kernel matrix $L$ of Section 2.2, effectively transforming MDDM into a semi-supervised technique. $L$ can be rewritten as:
$L = \tilde{F} \tilde{F}^T.$ (15)
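A small sketch of Equations (14) and (15) is given below; computing $C$ from the label matrix of the labeled instances (including the outlier column, so that the dimensions match $F_u$) is our assumption, and all names are illustrative.

```python
import numpy as np

def label_correlations(Y_labeled):
    """Cosine correlations between label columns (Eq. (14));
    Y_labeled is the initial label matrix of the labeled instances."""
    norms = np.linalg.norm(Y_labeled, axis=0, keepdims=True) + 1e-12  # avoid division by zero
    return (Y_labeled.T @ Y_labeled) / (norms.T @ norms)

# usage sketch: weight the propagated soft labels of the unlabeled part and
# form the label kernel of Eq. (15)
# C = label_correlations(Y_l)              # (c+1) x (c+1)
# F_tilde = np.vstack([F_l, F_u @ C])      # F_tilde = [F_l; F_u C]
# L = F_tilde @ F_tilde.T                  # L = F_tilde F_tilde^T
```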

3.3.2. Reformulate MDDM to Least Square

To reformulate MDDM as a least-squares problem, we first define two necessary matrices $S_1$ and $S_2$ as follows:
$S_1 = \mu X^T X + (1-\mu) I, \quad S_2 = X^T H L H X.$ (16)
Then, the generalized eigenvalue problem of Equation (4) can be written as $S_2 p = \lambda S_1 p$. If $S_1$ is invertible (when $\mu \ne 1$, $S_1$ is always invertible), the optimal $P$ is given by the eigenvectors of $S_1^{-1} S_2$ corresponding to the largest eigenvalues. Now, we apply the matrix decomposition technique to decompose $S_1^{-1} S_2$. The singular value decomposition of $X$ is defined as:
$X = U \, \mathrm{diag}(\Sigma_t, 0) \, V^T,$ (17)
where $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{d \times d}$ are orthogonal matrices and $t = \mathrm{rank}(X)$; $\Sigma_t \in \mathbb{R}^{t \times t}$ is a diagonal matrix whose diagonal elements are the singular values of $X$, and $\mathrm{diag}(\Sigma_t, 0) \in \mathbb{R}^{n \times d}$ is a matrix whose first $t$ diagonal elements are the singular values of $X$ and whose other entries are 0.
Let $U = [U_1, U_2]$ and $V = [V_1, V_2]$, where $U_1 \in \mathbb{R}^{n \times t}$, $U_2 \in \mathbb{R}^{n \times (n-t)}$, $V_1 \in \mathbb{R}^{d \times t}$, and $V_2 \in \mathbb{R}^{d \times (d-t)}$. Then $S_1$ and $S_2$ can be rewritten as:
$S_1 = V_1 [\mu \Sigma_t^2 + (1-\mu) I] V_1^T, \quad S_2 = V_1 \Sigma_t U_1^T H \tilde{F} \tilde{F}^T H U_1 \Sigma_t V_1^T.$ (18)
According to Equation (18), we can calculate $S_1^{-1} S_2$ as:
$S_1^{-1} S_2 = V_1 [\mu \Sigma_t^2 + (1-\mu) I]^{-1} \Sigma_t U_1^T H \tilde{F} \tilde{F}^T H U_1 \Sigma_t V_1^T.$ (19)
We define a diagonal matrix B as follows:
$B = [\mu \Sigma_t^2 + (1-\mu) I]^{-\frac{1}{2}}.$ (20)
Notice that $B$ is invertible and $B = B^T$. According to Equations (19) and (20), we rewrite $S_1^{-1} S_2$ as:
$S_1^{-1} S_2 = V_1 B B^T \Sigma_t U_1^T H \tilde{F} \tilde{F}^T H U_1 \Sigma_t B B^{-1} V_1^T.$ (21)
Denote $T = \tilde{F}^T H U_1 \Sigma_t B \in \mathbb{R}^{c \times t}$, and let $T = P_1 \Lambda P_2^T$ be the singular value decomposition of $T$, where $P_1 \in \mathbb{R}^{c \times c}$, $P_2 \in \mathbb{R}^{t \times t}$, and $\Lambda \in \mathbb{R}^{c \times t}$ is a diagonal matrix. Then we have:
$S_1^{-1} S_2 = V_1 B P_2 \Lambda^T P_1^T P_1 \Lambda P_2^T B^{-1} V_1^T = V_1 B P_2 \tilde{\Lambda} P_2^T B^{-1} V_1^T,$ (22)
where $\tilde{\Lambda} = \Lambda^T \Lambda$. Thus, the solution of problem Equation (4), which consists of the eigenvectors corresponding to the eigenvalues of $S_1^{-1} S_2$, is given by the following equation:
$P = V_1 B P_2.$ (23)
Consider the following least squares problem:
$\min_{P} \|X P - Z\|_F^2,$ (24)
where we assume that both the observation matrix X and the target matrix Z are centered. The optimal solution of Equation (24) is given by [39]:
$P = (X^T X)^{\dagger} X^T Z,$ (25)
where $(\cdot)^{\dagger}$ is the pseudo-inverse of a matrix. If we set $Z = U_1 \Sigma_t B P_2$, then we have:
$P = (V_1 \Sigma_t^2 V_1^T)^{\dagger} V_1 \Sigma_t U_1^T U_1 \Sigma_t B P_2 = V_1 B P_2,$ (26)
which is exactly the same as Equation (23). This implies that the MDDM formulation in Equation (4) is equivalent to the least-squares formulation in Equation (24). On the basis of this equivalence, we can attach additional constraint conditions, such as instance correlations and sparsity constraints, as regularization terms.
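The construction of the target matrix $Z$ from the thin SVD of the (centered) data, as used in the equivalence above, could be sketched as follows; the rank threshold and the function name are illustrative choices, not the authors' implementation.

```python
import numpy as np

def least_squares_target(X, F_tilde, mu=0.5):
    """Build the target matrix Z = U1 Sigma_t B P2 from the thin SVD of the
    (centered) data X (n x d), so that min_P ||XP - Z||_F^2 recovers the
    MDDM solution P = V1 B P2 (Eqs. (17)-(26))."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    U1, s, _ = np.linalg.svd(X, full_matrices=False)     # thin SVD of X
    t = int(np.sum(s > 1e-10))                           # numerical rank of X
    U1, s = U1[:, :t], s[:t]
    B = np.diag(1.0 / np.sqrt(mu * s**2 + (1 - mu)))     # B = [mu Sigma^2 + (1-mu)I]^(-1/2)
    T = F_tilde.T @ H @ U1 @ np.diag(s) @ B              # T = F_tilde^T H U1 Sigma_t B
    _, _, P2t = np.linalg.svd(T, full_matrices=True)     # T = P1 Lambda P2^T
    return U1 @ np.diag(s) @ B @ P2t.T                   # Z = U1 Sigma_t B P2
```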

3.3.3. Incorporating Instance Correlations

In our dimensionality reduction approach, we not only consider the label correlations as shown in Equations (15) and (18) but also incorporate the instance correlations. By assuming that two instances $x_i$ and $x_j$ may be related in the label space if they are correlated in the feature space, previous classification algorithms [40,41] incorporate instance correlations. In fact, instances in multi-label dimensionality reduction problems should also maintain the interrelation of their features and labels before and after dimensionality reduction. As a result, if two instances $x_i$ and $x_j$ are correlated in the original feature space, we assume they will also be related in the low-dimensional feature space.
Instead of evaluating the instance correlations using the cosine similarity, we adopt the k-nearest neighbor (kNN) mechanism to reduce the effects of noisy and redundant features. Thus, the weighted adjacency matrix W in Equation (6) can be exploited to define the following regularization term for assessing the interrelation of feature space:
$\min_{P} \sum_{i,j}^{n} W_{ij} \|x_i^T P - x_j^T P\|^2 = \mathrm{tr}((XP)^T L (XP)) = \mathrm{tr}(O^T L O),$ (27)
where $O = XP$ is the output low-dimensional matrix and $L = W^* - W$ indicates the $n \times n$ Laplacian matrix of $W$; $W^*$ is a diagonal matrix with $W^*_{ii} = \sum_{j=1}^{n} W_{ij}$. After incorporating this regularization, we can rewrite our objective function in Equation (24) as follows:
$\min_{P} \frac{1}{2} \|XP - Z\|_F^2 + \frac{\alpha}{2} \mathrm{tr}(O^T L O).$ (28)
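A short sketch of the instance-correlation regularizer in Equations (27) and (28) is shown below, assuming the 0-1 kNN adjacency $W$ from Equation (6); the function name is ours.

```python
import numpy as np

def laplacian_regularizer(X, P, W):
    """Instance-correlation regularizer tr((XP)^T L (XP)) with the graph
    Laplacian L = W* - W of the kNN adjacency W (Eq. (27))."""
    Lap = np.diag(W.sum(axis=1)) - W   # W* is diagonal with the row sums of W
    O = X @ P                          # low-dimensional output
    return np.trace(O.T @ Lap @ O)
```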

3.3.4. Incorporating Specific and Common Features

Furthermore, two norm regularization terms that constrain the sparsity of the matrix $P$ are also incorporated into our technique to enhance the performance of dimensionality reduction. One is the $l_1$-norm, which enforces sparsity among all elements of $P$ and shrinks some parameters to zero, allowing specific features of the original feature space to be selected. The other is the $l_{2,1}$-norm, which guarantees row sparsity of $P$ and is advantageous for choosing common features of the original feature space. For instance, in Figure 1, when the sparse projection matrix $P$ with five rows and three columns is used to reduce the dimensions of instances $x_1$ and $x_2$, the first feature of the projected instances $z_1$ and $z_2$ is dominated by the original features $f_1$ and $f_4$, the second is made up of the original features $f_1$, $f_2$, and $f_3$, and the third is determined by $f_2$ and $f_5$. These can be thought of as the specific features of each feature in the low-dimensional feature space, and we employ the $l_1$-norm to give $P$ this property. Since features $f_1$ and $f_2$ contribute to the first, second, and third features of $z_1$ and $z_2$, they can be regarded as common features of the original feature space, and we leverage the $l_{2,1}$-norm to capture these common features.
The final objective function of our proposed approach can be rewritten as Equation (29) after incorporating these two regularization terms.
$\min_{P} \frac{1}{2} \|XP - Z\|_F^2 + \frac{\alpha}{2} \mathrm{tr}(O^T L O) + \beta \|P\|_1 + \gamma \|P\|_{2,1},$ (29)
where $\alpha$, $\beta$, and $\gamma$ are constant coefficients.

3.3.5. Optimization

Although Equation (29) is a convex optimization problem, the objective function is not smooth because of the non-smoothness of the $l_1$-norm and $l_{2,1}$-norm regularization terms. To address this non-smooth optimization problem, we first relax $\|P\|_{2,1}$ by $\mathrm{tr}(P^T A P)$ following [11], where $A$ is a $d \times d$ diagonal matrix whose $i$-th diagonal element is $A_{ii} = \frac{1}{2 \|P_i\|_2}$, with $P_i$ the $i$-th row of $P$. Then, to handle the $l_1$-norm regularization term, we employ the accelerated proximal gradient (APG) method.
In the general accelerated proximal gradient method, a convex optimization problem can be defined as:
$\min_{P \in \mathcal{H}} F(P) = f(P) + g(P),$ (30)
where $\mathcal{H}$ indicates a real Hilbert space. $f(P)$ and $g(P)$ are both convex, but smooth and non-smooth, respectively. The gradient of $f$ is also Lipschitz continuous, i.e., $\|\nabla f(P_1) - \nabla f(P_2)\|_F^2 \le L_f \|\Delta P\|_F^2$, where $\Delta P = P_1 - P_2$ and $L_f$ is the Lipschitz constant. Proximal gradient algorithms, rather than directly minimizing $F(P)$, minimize a sequence of separable quadratic approximations to $F(P)$, denoted as:
$Q_{L_f}(P, P^{(m)}) = f(P^{(m)}) + \langle \nabla f(P^{(m)}), P - P^{(m)} \rangle + \frac{L_f}{2} \|P - P^{(m)}\|_F^2 + g(P).$ (31)
Let $G^{(m)} = P^{(m)} - \frac{1}{L_f} \nabla f(P^{(m)})$; then
$P^* = \arg\min_{P} Q_{L_f}(P, P^{(m)}) = \arg\min_{P} \, g(P) + \frac{L_f}{2} \|P - G^{(m)}\|_F^2.$ (32)
According to Equations (29) and (30), f ( P ) and g ( P ) are defined as follows:
$f(P) = \frac{1}{2} \|XP - Z\|_F^2 + \frac{\alpha}{2} \mathrm{tr}(O^T L O) + \gamma \|P\|_{2,1},$ (33)
$g(P) = \beta \|P\|_1.$ (34)
According to Equation (33), we can calculate $\nabla f(P)$ as:
$\nabla f(P) = X^T X P - X^T Z + \alpha X^T L X P + \gamma A P.$ (35)
According to Equations (32)–(34), the projection matrix P can be optimized by
$P^* = \arg\min_{P} Q_{L_f}(P, P^{(m)}) = \arg\min_{P} \frac{L_f}{2} \|P - G^{(m)}\|_F^2 + g(P) = \arg\min_{P} \frac{1}{2} \|P - G^{(m)}\|_F^2 + \frac{\beta}{L_f} \|P\|_1.$ (36)
Lin et al. [42] showed that setting $P^{(m)} = P_m + \frac{b_{m-1} - 1}{b_m}(P_m - P_{m-1})$ for a sequence $\{b_m\}$ satisfying $b_{m+1}^2 - b_{m+1} \le b_m^2$ can improve the convergence rate to $O(m^{-2})$, where $P_m$ is the result of $P$ at the $m$-th iteration and $P^{(m)}$ is the intermediate variable at the $m$-th iteration. Additionally, for the $l_1$-norm regularization term $g(P)$, if $\mathcal{H}$ is a normed space endowed with the Frobenius norm $\|\cdot\|_F$ and $g(\cdot)$ is the $l_1$-norm, then $P_{m+1}$ is generated by soft-thresholding the entries of $G^{(m)}$ as:
$P_{m+1} = S_{\varepsilon}[G^{(m)}] = \arg\min_{P} \frac{1}{2} \|P - G^{(m)}\|_F^2 + \varepsilon \|P\|_1,$ (37)
where $S_{\varepsilon}[\omega]$ is the soft-thresholding operation, $\omega \in \mathbb{R}$ and $\varepsilon > 0$, defined as:
$S_{\varepsilon}[\omega] = \begin{cases} \omega - \varepsilon, & \text{if } \omega > \varepsilon, \\ \omega + \varepsilon, & \text{if } \omega < -\varepsilon, \\ 0, & \text{otherwise}. \end{cases}$ (38)
Then, in each iteration, P m + 1 can be obtained by the following soft-thresholding operation:
$P_{m+1} = S_{\frac{\beta}{L_f}}[G^{(m)}].$ (39)
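The soft-thresholding operator of Equations (38) and (39) amounts to the following short NumPy sketch (the function name is ours).

```python
import numpy as np

def soft_threshold(G, eps):
    """Entry-wise soft-thresholding S_eps[omega] of Eq. (38),
    applied to every entry of the matrix G."""
    return np.sign(G) * np.maximum(np.abs(G) - eps, 0.0)

# one APG update of Eq. (39): P_next = soft_threshold(G_m, beta / L_f)
```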
Here, we calculate the Lipschitz constant $L_f$. Given $P_1$ and $P_2$, according to Equation (35), we have:
$\|\nabla f(P_1) - \nabla f(P_2)\|_F^2 = \|X^T X \Delta P + \alpha X^T L X \Delta P + \gamma A \Delta P\|_F^2 \le 3 (\|X^T X \Delta P\|_F^2 + \|\alpha X^T L X \Delta P\|_F^2 + \|\gamma A \Delta P\|_F^2) \le 3 (\|X^T X\|_2^2 + \|\alpha X^T L X\|_2^2 + \|\gamma A\|_2^2) \|\Delta P\|_F^2,$ (40)
where $\Delta P = P_1 - P_2$. Obviously, $L_f$ can be taken as
$L_f = 3 (\|X^T X\|_2^2 + \|\alpha X^T L X\|_2^2 + \|\gamma A\|_2^2).$ (41)
We fix the value of $A$, which is determined by the initial value of $P$, to ensure that $L_f$ is always a constant value.
Algorithm 1 summarizes the pseudo-code of the SMDR-IC method. This algorithm produces the projection matrix $P$, which maps the features of instances from the $d$-dimensional to the $t$-dimensional space, where $t = \mathrm{rank}(X)$ and $t \le d$. If the dimensionality is to be reduced to $r$ $(r < t)$, the first $r$ columns of the projection matrix $P$ can be chosen.
Algorithm 1 SMDR-IC: Semi-supervised Multi-label Dimensionality Reduction Learning by Instance and Label Correlations
Require: Feature matrix $X \in \mathbb{R}^{n \times d}$, label matrix $Y \in \mathbb{R}^{n \times c}$, and parameters $k$, $\mu$, $\alpha$, $\beta$, $\gamma$, $\sigma$
Ensure: Projection matrix $P \in \mathbb{R}^{d \times t}$, $t = \mathrm{rank}(X)$
1: $F_l \leftarrow Y$, $F_u \leftarrow 0$;
2: Calculate the neighborhood graph matrix $W$ for label propagation and instance correlations by using k-nearest neighbors;
3: Obtain the soft label matrix $F$;
4: Calculate the label correlation matrix $C$ and obtain the label matrix $\tilde{F}$;
5: Calculate the target matrix $Z$;
6: $b_0 \leftarrow 0$, $b_1 \leftarrow 1$, $m \leftarrow 1$;
7: $P_0 \leftarrow 0$, $P_1 \leftarrow (X^T X + \sigma I)^{-1} X^T Z$;
8: Calculate the diagonal matrix $A$;
9: Calculate the Lipschitz constant $L_f$;
10: repeat
      $P^{(m)} \leftarrow P_m + \frac{b_{m-1} - 1}{b_m}(P_m - P_{m-1})$;
      $G^{(m)} \leftarrow P^{(m)} - \frac{1}{L_f} \nabla f(P^{(m)})$;
      $P_{m+1} \leftarrow S_{\frac{\beta}{L_f}}(G^{(m)})$;
      $b_{m+1} \leftarrow \frac{1 + \sqrt{4 b_m^2 + 1}}{2}$;
      Calculate the diagonal matrix $A_{m+1}$;
    until the stop criterion is reached;
11: $P \leftarrow P_m$;
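For reference, a compact Python sketch of the APG iteration in Algorithm 1 is given below; the warm-start regularizer $\sigma$, the fixed iteration count, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def smdr_ic_apg(X, Z, W, alpha, beta, gamma, n_iter=500, sigma=1e-3):
    """Minimal APG sketch of the SMDR-IC objective (Eq. (29)); X: n x d,
    Z: n x t target matrix, W: kNN adjacency."""
    n, d = X.shape
    Lap = np.diag(W.sum(axis=1)) - W                                 # graph Laplacian
    P_prev = np.zeros((d, Z.shape[1]))
    P_cur = np.linalg.solve(X.T @ X + sigma * np.eye(d), X.T @ Z)    # warm start (step 7)
    A = np.diag(1.0 / (2.0 * np.linalg.norm(P_cur, axis=1) + 1e-12))
    Lf = 3 * (np.linalg.norm(X.T @ X, 2) ** 2
              + np.linalg.norm(alpha * X.T @ Lap @ X, 2) ** 2
              + np.linalg.norm(gamma * A, 2) ** 2)                   # Lipschitz constant (Eq. (41))
    b_prev, b_cur = 0.0, 1.0
    for _ in range(n_iter):
        V = P_cur + (b_prev - 1) / b_cur * (P_cur - P_prev)          # extrapolation P^(m)
        grad = X.T @ X @ V - X.T @ Z + alpha * X.T @ Lap @ X @ V + gamma * A @ V
        G = V - grad / Lf
        P_next = np.sign(G) * np.maximum(np.abs(G) - beta / Lf, 0.0) # soft-thresholding
        A = np.diag(1.0 / (2.0 * np.linalg.norm(P_next, axis=1) + 1e-12))
        b_prev, b_cur = b_cur, (1 + np.sqrt(4 * b_cur ** 2 + 1)) / 2
        P_prev, P_cur = P_cur, P_next
    return P_cur
```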
An alternative optimization to the accelerated proximal gradient method is the alternating direction method of multipliers (ADMM), which has the advantage that f ( P ) and g ( P ) are completely independent, so they can both be non-smooth.

4. Results

4.1. Benchmark Data Sets

To verify the efficiency of our proposed method, we conduct experiments on fifteen publicly available real-world data sets from Mulan (http://mulan.sourceforge.net/datasets-mlc.html, accessed on 5 March 2022), the statistics of which are summarized in Table 1. These data sets are divided into four categories: music, image, biology, and text. Emotions is a multi-label music data set, Scene and Corel5k are multi-label image data sets, Yeast is a multi-label biology data set, and the remaining are subsets of Yahoo, which is a multi-label text (web) data set. All data sets are standardized to have a mean of zero and a variance of one.

4.2. Evaluation Metrics

The performances of different dimensionality reduction methods are evaluated by employing seven widely used evaluation metrics: Hamming Loss, Ranking Loss, Average Precision (AvgPrec), OneError, Macro-F1 (MacroF1), Micro-F1 (MicroF1), and Coverage. These evaluation criteria are given in [4,43] along with detailed descriptions.
For convenience, we modify Hamming Loss, Ranking Loss, and OneError to 1-Hamming, 1-Ranking, and 1-OneError, respectively, so that higher values always mean better for the first six evaluation metrics, and lower values indicate better performance for Coverage.

4.3. Comparison Methods and Parameters Settings

The previously mentioned multi-label dimensionality reduction methods, CCA [5], MDDM [4], and six MLDA variants (wMLDAb [25], wMLDAe [26], wMLDAc [6], wMLDAf [27], wMLDAd [28], and MLDA-LC [29]), are all taken into account to compare the performance of SMDR-IC. These supervised methods can only be trained on the labeled portion of the training instances since they require labeled instances. MLDA-LC, specifically, adopts an instance correlation measure similar to our technique to constrain the consistency between the class-wise within-class distances of instances in the low-dimensional feature space and the original feature space, allowing the LDA framework to capture local structures. Moreover, three semi-supervised approaches, SSMLDR [9], SMDRdm [10], and NMLSDR [38], are compared with SMDR-IC. NMLSDR is a recently proposed dimensionality reduction method for noisy multi-label data. For a fair comparison, we replace NMLSDR's noise-coping label propagation strategy with our approach's label propagation mechanism.
We begin by projecting the high-dimensional instances into a low-dimensional space using these dimensionality reduction methods. The widely used ML-kNN [43] classifier is then applied to classify the instances in the low-dimensional space.
In the experiments, we set $k = 10$ to construct the kNN neighborhood graph. To be more convincing, the label propagation parameters are set to the same values as those in SSMLDR, i.e., $\lambda_l = 0$ for labeled instances and $\lambda_u = 0.99$ for unlabeled instances. $\mu$ in Equation (16) is set to $0.5$, as in MDDM [4], to constrain the orthogonality of the objective matrix. The regularization parameters $\alpha$, $\beta$, and $\gamma$ are tuned by a grid search strategy over $\{10^i \mid i = -5, -4, \ldots, 1\}$ and the best results are reported. All comparison method parameters were chosen based on their publications' recommendations.
The experiment was performed using Matlab R2020a on a desktop PC with an Intel(R) Core(TM) i7-7700k 4.20GHz CPU, 16GB RAM, and NVIDIA GeForce GTX 1080 8G GPU, with a 64-bit Windows 10 operating system.

4.4. Experimental Results

In our experiments, we randomly partition 70% of the instances into the training set and the remaining 30% into the test set. In the training set, we randomly select 20% of the instances as the labeled set and the remaining instances as the unlabeled set to learn a projection matrix $P$, which is used to project the $d$-dimensional training and test sets to an $r$-dimensional representation. To mitigate random effects, the random partitioning and selection are repeated 10 times on each data set for each comparing method, and the averaged results are reported.
Due to the restrictions of the MLDA framework, the five methods wMLDAb, wMLDAe, wMLDAc, wMLDAf, and wMLDAd have a target dimension of at most $c - 1$, while the maximum target dimension of MLDA-LC is the rank of the initial feature matrix. Therefore, we set the dimension of the projected features to $t = c - 1$ for all approaches.
Using the training set after dimensionality reduction, we train an ML-kNN classifier and validate its performance on the low-dimensional test set. As shown in Table 2, SMDR-IC demonstrates excellent effectiveness across the seven measures on all data sets. Indeed, SMDR-IC produces the highest results on more than five evaluation criteria for 11 data sets, while its other results are only marginally inferior.
Meanwhile, we randomly selected more labeled instances to evaluate the effectiveness of SMDR-IC. As shown in Table 3 and Table 4, SMDR-IC performs better when 50% of the training set's instances are labeled than when 20% are. SMDR-IC with 70% labeled instances shows higher performance than with 20% labeled instances and somewhat worse performance than with 50% labeled instances, but it still ranks first on average. This outcome is primarily because the merits of the supervised approaches start to emerge as the fraction of labeled instances becomes large enough.
Regarding the experimental results reported in Table 2, Table 3 and Table 4, the semi-supervised methods outperform the supervised methods because the supervised methods obtain the geometric structure of instances by using only labeled instances, whereas the semi-supervised methods leverage both labeled and unlabeled instances to obtain a more comprehensive geometric structure. Furthermore, because the label propagation mechanism of the NMLSDR method is replaced by that of the SMDR-IC method, the comparison results demonstrate that the learning of instance similarity, common features, and specific features all play significant roles in capturing the geometric structure of instances.
To further investigate the performance differences between the compared methods and the proposed approach SMDR-IC, the necessary analyses were carried out using the Friedman test [44]. Table 5 lists the Friedman statistics $F_F$ for each evaluation metric and the critical values. The null hypothesis that all comparison algorithms perform equally well is rejected for each evaluation metric at significance level $\alpha = 0.05$. The proposed approach, SMDR-IC, is viewed as the control method, and the Bonferroni–Dunn test is utilized as the post-hoc test [44]. The critical distance (CD), defined as $\mathrm{CD} = q_a \sqrt{\frac{K(K+1)}{6N}}$, is used to assess the average ranking differences between any two algorithms. Here, $q_a = 3.2680$ $(K = 12, N = 15)$, and $\mathrm{CD} = 4.3025$. Figure 2 depicts the CD diagrams for each evaluation metric. The red line in each subfigure shows that the distances between SMDR-IC and certain comparison methods are less than the CD, indicating statistical similarity. We observe that SMDR-IC and MDDM have significant statistical similarities for Ranking Loss, Average Precision, OneError, and Coverage, which is consistent with the fact that MDDM is the baseline method of SMDR-IC. SMDR-IC ranks second under MacroF1 and first under the other six evaluation metrics.
We also selected four domain data sets (Emotions, Scene, Yeast, and Arts) and five comparing methods, including both supervised and semi-supervised ones, to further study the performance of SMDR-IC under different target dimensionalities. Figure 3, Figure 4, Figure 5 and Figure 6 show the average results concerning 1-Hamming, 1-Ranking, AvgPrec, and Coverage, respectively. It can be seen that SMDR-IC almost always outperforms the other comparing methods on these evaluation metrics. These results further demonstrate the effectiveness of SMDR-IC for multi-label dimensionality reduction.

4.5. Sensitivity Analysis of Parameters

Three crucial parameters, $\alpha$, $\beta$, and $\gamma$, are present in our dimensionality reduction model. $\alpha$ controls the contribution of embedding instance correlations, while $\beta$ and $\gamma$ regulate the sparsity of the projection matrix. It is therefore necessary to conduct a sensitivity analysis of these parameters for SMDR-IC. All parameter values are taken from $\{10^{-7}, 10^{-6}, \ldots, 10^2, 10^3\}$ on four domain data sets, including Emotions, Scene, Yeast, and Arts. We tune one parameter at a time in steps of one order of magnitude, while fixing the other parameters at their best settings.
Figure 7, Figure 8 and Figure 9 show the influence of the three parameters on the four data sets. Parameter $\alpha$ controls the instance correlations, and a larger value means a higher importance of instance correlations. We can clearly see that when $\alpha$ increases from $10^{-7}$ to $10^{0}$, the performance tends to increase slowly and remain stable, but when $\alpha$ exceeds 1, performance begins to decline, slowly on Yeast and Arts and rapidly on Emotions and Scene. This is explained by the fact that the influence of instance correlations becomes more significant as the value of $\alpha$ increases, potentially limiting the influence of other factors and resulting in poor performance. Parameters $\beta$ and $\gamma$ control the sparsity of specific features and common features, respectively. The trends are similar to those of $\alpha$: with increasing values of $\beta$ and $\gamma$, the performance tends to increase and then remain stable. Nevertheless, when $\beta$ or $\gamma$ exceeds $10^{1}$, the performance begins to decline, slowly on Yeast and Arts, and rapidly on Emotions and Scene.

4.6. Convergence and Time Complexity Analysis

The accelerated proximal gradient method (APG), a special version of the gradient descent method, is used to solve optimization problems with non-differentiable objective functions. The SMDR-IC optimization model Equation (29) can be transformed into the standard model solved by APG. For its convergence theory, see [42].
As the number of iterations is increased, Figure 10 illustrates the changes in the objective function values for the SMDR-IC dimensionality reduction model on the four data sets. It is clear that the value of the objective function rapidly declines as iteration times increase. More specifically, on the Emotions dataset after around 50 iterations, and on the Scene, Yeast, and Arts data sets after about 150, 40, and 2500 iterations, respectively, the objective function value tends to a fixed value. In conclusion, the APG method steadily converges when solving the SMDR-IC dimensionality reduction model.
The complexity analysis of the SMDR-IC method is concluded as follows. The time complexities of constructing the kNN graph matrix $W$ and computing the label correlation matrix $C$ are $O(n^2 d)$ and $O(l c^2)$, respectively, where $n$ is the number of labeled and unlabeled instances, $d$ is the data set dimensionality, $l$ is the number of labeled instances, and $c$ is the number of labels. The time complexity of obtaining the soft labels is approximately $O(n^2 c)$. The time cost of calculating the target matrix $Z$ is dominated by the SVD of $X$, which is $O(n^2 d + n d^2)$. Initializing $P$ costs $O(n d^2 + d^3)$. The time complexity of calculating the Lipschitz constant $L_f$ is $O(d^3)$. The time cost of the iteration steps is dominated by calculating the gradient of $f(P)$, which is $O(n d^2 + n^2 d)$. As a consequence, the total time complexity of SMDR-IC is $O(m(n^2 d + n d^2) + d^3)$, where $m$ is the number of iterations.
Table 6 lists the computation time of each method on four data sets. According to Table 6, we observe the following. Firstly, the semi-supervised dimensionality reduction methods take longer than the supervised ones. This is because semi-supervised methods use more instances than supervised ones in the learning process: they use both labeled and unlabeled instances to learn, whereas supervised methods use only labeled instances, and they additionally need to obtain soft labels for the unlabeled instances. Secondly, compared with the other semi-supervised dimensionality reduction methods, SMDR-IC has similar efficiency on data sets of medium size, but is slower on small and large data sets. This is because the final result of SMDR-IC is obtained by a gradient-based optimization method, and the number of iterations needed for convergence varies across data sets, making it difficult to find a universally applicable setting.

5. Conclusions

In this paper, we introduced a novel method, namely semi-supervised multi-label dimensionality reduction learning by instance and label correlations (SMDR-IC), which effectively utilizes the information from both labeled and unlabeled instances through label propagation. SMDR-IC exploits instance correlations by assuming that if two instances are correlated in the original feature space, they will also be related in the low-dimensional feature space. The label correlations are also taken into account in the reformulated least-squares model. Furthermore, $l_1$-norm and $l_{2,1}$-norm regularization terms are utilized to select specific features and common features of the feature space, respectively. Finally, extensive experiments on fifteen data sets show that our method outperforms other well-established multi-label dimensionality reduction methods.
The main shortcoming of the SMDR-IC technique is that the projection mapping ϕ considered is a linear operator, and the linear operator may not be capable of learning the manifold structure of high-dimensional space completely. Then, in future research, we will concentrate on developing more effective nonlinear projection operators and researching the classification method coupled with dimensionality reduction techniques. In addition, SMDR-IC is slightly disadvantaged in terms of efficiency, mainly due to the difficulty in finding universally applicable convergence determination conditions for iterative optimization for different data sets. Therefore, iterative convergence determination methods or more efficient solution methods applicable to different data sets are to be further studied in future work.

Author Contributions

Conceptualization, formal analysis, writing—original draft preparation, R.L. and J.D. (Jiaxing Du); methodology, software, validation, writing—review and editing, R.L., J.D. (Jiaman Ding) and L.J.; resources, supervision, funding acquisition, Y.C. and Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the open fund of the Yunnan Key Laboratory of Computer Technology Applications, and in part by the National Natural Science Foundation of China (No. 12063002 and 62262035).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sun, L.; Ji, S.; Ye, J. Multi-Label Dimensionality Reduction; CRC Press: Boca Raton, FL, USA, 2013. [Google Scholar]
  2. Bellman, R. Dynamic programming and Lagrange multipliers. Proc. Natl. Acad. Sci. USA 1956, 42, 767. [Google Scholar] [CrossRef] [PubMed]
  3. Siblini, W.; Kuntz, P.; Meyer, F. A review on dimensionality reduction for multi-label classification. IEEE Trans. Knowl. Data Eng. 2021, 33, 839–857. [Google Scholar] [CrossRef]
  4. Zhang, Y.; Zhou, Z.H. Multilabel dimensionality reduction via dependence maximization. ACM Trans. Knowl. Discov. Data (TKDD) 2010, 4, 1–21. [Google Scholar] [CrossRef]
  5. Hardoon, D.R.; Szedmak, S.; Shawe-Taylor, J. Canonical correlation analysis: An overview with application to learning methods. Neural Comput. 2004, 16, 2639–2664. [Google Scholar] [CrossRef] [PubMed]
  6. Wang, H.; Ding, C.; Huang, H. Multi-label linear discriminant analysis. In Proceedings of the Computer Vision—ECCV, Heraklion, Greece, 5–11 September 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 126–139. [Google Scholar]
  7. Kong, X.; Ng, M.K.; Zhou, Z.H. Transductive multilabel learning via label set propagation. IEEE Trans. Knowl. Data Eng. 2011, 25, 704–719. [Google Scholar] [CrossRef]
  8. Qian, B.; Davidson, I. Semi-supervised dimension reduction for multi-label classification. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, 11–15 July 2010; pp. 569–574. [Google Scholar]
  9. Guo, B.; Hou, C.; Nie, F.; Yi, D. Semi-supervised multi-label dimensionality reduction. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; pp. 919–924. [Google Scholar]
  10. Yu, Y.; Wang, J.; Tan, Q.; Jia, L.; Yu, G. Semi-supervised multi-label dimensionality reduction based on dependence maximization. IEEE Access 2017, 5, 21927–21940. [Google Scholar] [CrossRef]
  11. Nie, F.; Huang, H.; Cai, X.; Ding, C. Efficient and robust feature selection via joint l2,1-norms minimization. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; Volume 23, pp. 1813–1821. [Google Scholar]
  12. Hu, L.; Li, Y.; Gao, W.; Zhang, P.; Hu, J. Multi-label feature selection with shared common mode. Pattern Recognit. 2020, 104, 107344. [Google Scholar] [CrossRef]
  13. Li, J.; Li, P.; Hu, X.; Yu, K. Learning common and label-specific features for multi-Label classification with correlation information. Pattern Recognit. 2022, 121, 108259. [Google Scholar] [CrossRef]
  14. Gretton, A.; Bousquet, O.; Smola, A.; Schölkopf, B. Measuring statistical dependence with Hilbert-Schmidt norms. In Proceedings of the International Conference on Algorithmic Learning Theory, Singapore, 8–11 October 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 63–77. [Google Scholar]
  15. Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202. [Google Scholar] [CrossRef] [PubMed]
  16. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326. [Google Scholar] [CrossRef] [Green Version]
  17. Belkin, M.; Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv. Neural Inf. Process. Syst. 2001, 14, 585–591. [Google Scholar]
  18. Nie, F.; Xu, D.; Tsang, I.W.H.; Zhang, C. Flexible manifold embedding: A framework for semi-supervised and unsupervised dimension reduction. IEEE Trans. Image Process. 2010, 19, 1921–1932. [Google Scholar] [PubMed]
  19. Deerwester, S.; Dumais, S.T.; Furnas, G.W.; Landauer, T.K.; Harshman, R. Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 1990, 41, 391–407. [Google Scholar] [CrossRef]
  20. Yu, K.; Yu, S.; Tresp, V. Multi-label informed latent semantic indexing. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, 15–19 August 2005; pp. 258–265. [Google Scholar]
  21. Hotelling, H. Relations between two sets of variates. Biometrika 1936, 28, 321–377. [Google Scholar] [CrossRef]
  22. Sun, L.; Ji, S.; Ye, J. Canonical correlation analysis for multilabel classification: A least-squares formulation, extensions, and analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 194–200. [Google Scholar]
  23. Pacharawongsakda, E.; Theeramunkong, T. A two-stage dual space reduction framework for multi-label classification. In Proceedings of the Trends and Applications in Knowledge Discovery and Data Mining, Delhi, India, 11 May 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 330–341. [Google Scholar]
  24. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
  25. Park, C.H.; Lee, M. On applying linear discriminant analysis for multi-labeled problems. Pattern Recognit. Lett. 2008, 29, 878–887. [Google Scholar] [CrossRef]
  26. Chen, W.; Yan, J.; Zhang, B.; Chen, Z.; Yang, Q. Document transformation for multi-label feature selection in text categorization. In Proceedings of the Seventh IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA, 28–31 October 2007; pp. 451–456. [Google Scholar]
  27. Lin, X.; Chen, X.W. KNN: Soft relevance for multi-label classification. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Toronto, ON, Canada, 26–30 October 2010; pp. 349–358. [Google Scholar]
  28. Xu, J. A weighted linear discriminant analysis framework for multi-label feature extraction. Neurocomputing 2018, 275, 107–120. [Google Scholar] [CrossRef]
  29. Yuan, Y.; Zhao, K.; Lu, H. Multi-label linear Ddiscriminant analysis with locality consistency. In Proceedings of the Neural Information Processing, Montreal, QC, Canada, 8–13 December 2014; Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 386–394. [Google Scholar]
  30. Shu, X.; Lai, D.; Xu, H.; Tao, L. Learning shared subspace for multi-label dimensionality reduction via dependence maximization. Neurocomputing 2015, 168, 356–364. [Google Scholar] [CrossRef]
  31. Gönen, M. Coupled dimensionality reduction and classification for supervised and semi-supervised multilabel learning. Pattern Recognit. Lett. 2014, 38, 132–141. [Google Scholar] [CrossRef]
  32. Yu, T.; Zhang, W. Semisupervised multilabel learning with joint dimensionality reduction. IEEE Signal Process. Lett. 2016, 23, 795–799. [Google Scholar] [CrossRef]
  33. Blaschko, M.B.; Shelton, J.A.; Bartels, A.; Lampert, C.H.; Gretton, A. Semi-supervised kernel canonical correlation analysis with application to human fMRI. Pattern Recognit. Lett. 2011, 32, 1572–1583. [Google Scholar] [CrossRef] [Green Version]
  34. Li, H.; Li, P.; Guo, Y.J.; Wu, M. Multi-label dimensionality reduction based on semi-supervised discriminant analysis. J. Cent. South Univ. Technol. 2010, 17, 1310–1319. [Google Scholar] [CrossRef]
  35. Hubert, M.; Van Driessen, K. Fast and robust discriminant analysis. Comput. Stat. Data Anal. 2004, 45, 301–320. [Google Scholar] [CrossRef]
  36. Croux, C.; Dehon, C. Robust linear discriminant analysis using S-estimators. Can. J. Stat. 2001, 29, 473–493. [Google Scholar] [CrossRef]
  37. Hubert, M.; Rousseeuw, P.J.; Van Aelst, S. High-breakdown robust multivariate methods. Stat. Sci. 2008, 23, 92–119. [Google Scholar] [CrossRef]
  38. Mikalsen, K.Ø.; Soguero-Ruiz, C.; Bianchi, F.M.; Jenssen, R. Noisy multi-label semi-supervised dimensionality reduction. Pattern Recognit. 2019, 90, 257–270. [Google Scholar] [CrossRef]
  39. Golub, G.H.; Van Loan, C.F. Matrix Computations; JHU Press: Baltimore, MD, USA, 2013. [Google Scholar]
  40. Han, H.; Huang, M.; Zhang, Y.; Yang, X.; Feng, W. Multi-label learning with label specific features using correlation information. IEEE Access 2019, 7, 11474–11484. [Google Scholar] [CrossRef]
  41. Huang, S.J.; Zhou, Z.H. Multi-Label Learning by Exploiting Label Correlations Locally; AAAI Press: Palo Alto, CA, USA, 2012. [Google Scholar]
  42. Lin, Z.; Ganesh, A.; Wright, J.; Wu, L.; Chen, M.; Ma, Y. Fast Convex Optimization Algorithms for Exact Recovery of a Corrupted Low-Rank Matrix; Report no. UILU-ENG-09-2214, DC-246; Coordinated Science Laboratory: Urbana, IL, USA, 2009; Available online: https://hdl.handle.net/2142/74352 (accessed on 20 December 2022).
  43. Zhang, M.L.; Zhou, Z.H. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognit. 2007, 40, 2038–2048. [Google Scholar] [CrossRef] [Green Version]
  44. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
Figure 1. An explanation of specific and common features of feature space.
Figure 2. Bonferroni–Dunn test for SMDR-IC and other compared techniques. (a) 1-Hamming. (b) 1-Ranking. (c) AvgPrec. (d) 1-OneError. (e) MacroF1. (f) MicroF1. (g) Coverage.
Figure 3. Results of 1-Hamming on four data sets under different target dimensionalities. (a) Emotions. (b) Scene. (c) Yeast. (d) Arts.
Figure 4. Results of 1-Ranking on four data sets under different target dimensionalities. (a) Emotions. (b) Scene. (c) Yeast. (d) Arts.
Figure 5. Results of AvgPrec on four data sets under different target dimensionalities. (a) Emotions. (b) Scene. (c) Yeast. (d) Arts.
Figure 6. Results of Coverage on four data sets under different target dimensionalities. (a) Emotions. (b) Scene. (c) Yeast. (d) Arts.
Figure 7. Sensitivity analysis of parameter α on four data sets for five evaluation metrics. (a) 1-Hamming. (b) 1-Ranking. (c) AvgPrec. (d) 1-OneError. (e) Coverage.
Figure 8. Sensitivity analysis of parameter β on four data sets for five evaluation metrics. (a) 1-Hamming. (b) 1-Ranking. (c) AvgPrec. (d) 1-OneError. (e) Coverage.
Figure 9. Sensitivity analysis of parameter γ on four data sets for five evaluation metrics. (a) 1-Hamming. (b) 1-Ranking. (c) AvgPrec. (d) 1-OneError. (e) Coverage.
Figure 10. Convergence analysis of the proposed method on four data sets. (a) Emotions. (b) Scene. (c) Yeast. (d) Arts.
Table 1. Data statistics ("#Instances" represents the total number of instances, "#Attributes" indicates the number of features, including both the numeric and nominal features, "#Labels" is the number of class labels, and "Cardinality" means the average number of labels per instance).
Dataset | Domain | #Instances | #Attributes | #Labels | Cardinality
Emotions | music | 593 | 72 | 6 | 1.87
Scene | image | 2407 | 294 | 6 | 1.07
Corel5k | images | 5000 | 499 | 374 | 3.52
Yeast | biology | 2417 | 103 | 14 | 4.23
Arts | text | 5000 | 695 | 26 | 1.65
Business | text | 5000 | 658 | 30 | 1.60
Computers | text | 5000 | 1023 | 33 | 1.51
Education | text | 5000 | 827 | 33 | 1.46
Entertainment | text | 5000 | 961 | 21 | 1.41
Health | text | 5000 | 919 | 32 | 1.64
Recreation | text | 5000 | 910 | 22 | 1.43
Reference | text | 5000 | 1191 | 33 | 1.17
Science | text | 5000 | 1116 | 40 | 1.45
Social | text | 5000 | 1571 | 39 | 1.28
Society | text | 5000 | 955 | 27 | 1.67
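The cardinality reported in Table 1 is simply the mean number of relevant labels per instance. As a point of reference, the following minimal sketch derives such statistics from a binary instance-by-label matrix; the helper name and the toy matrix are illustrative only and are not taken from the paper.

```python
import numpy as np

def label_statistics(Y: np.ndarray) -> dict:
    """Basic multi-label data statistics from a binary label matrix Y (instances x labels)."""
    n_instances, n_labels = Y.shape
    cardinality = Y.sum(axis=1).mean()  # average number of relevant labels per instance
    return {"#Instances": n_instances,
            "#Labels": n_labels,
            "Cardinality": round(float(cardinality), 2)}

# Toy 4-instance, 3-label example (hypothetical data).
Y_toy = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 1, 1],
                  [0, 0, 1]])
print(label_statistics(Y_toy))  # {'#Instances': 4, '#Labels': 3, 'Cardinality': 1.75}
```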
Table 2. Experimental results of 20% labeled instances on fifteen data sets in terms of seven evaluation metrics.
Dataset | Metrics | CCA | wMLDAb | wMLDAe | wMLDAc | wMLDAf | wMLDAd | MLDA-LC | MDDM | SSMLDR | SMDRdm | NMLSDR | SMDR-IC
Emotions1-Hamming0.66240.68650.67860.67660.67680.68370.68230.69910.66570.73170.75070.7602
1-Ranking0.61950.65320.65330.63970.65050.65070.64270.70100.62230.74320.77410.7872
AvgPrec0.61050.63970.64100.62470.62420.63430.63700.67380.61170.71060.74250.7507
1-OneError0.46070.49720.47920.46850.45340.48430.49490.53820.45790.59160.64040.6466
MacroF10.37030.36140.39230.36880.37300.37790.38950.46990.33930.49300.53290.5373
MicroF10.39020.38730.40470.38960.38880.39850.40720.48370.35220.51510.55410.5643
Coverage2.85792.62082.64212.70622.61742.68482.70962.45062.81522.26242.11182.0287
Scene1-Hamming0.77620.80400.78490.77720.78620.78500.77790.80560.78060.86490.87590.8780
1-Ranking0.64870.66860.65640.64680.66390.65320.63270.71420.65030.84870.86210.8660
AvgPrec0.56090.58450.57900.56560.58230.57450.54700.63280.56540.76880.78810.7949
1-OneError0.35730.39560.38880.36470.38770.38160.34090.45480.36630.62960.65840.6710
MacroF10.33540.34360.37600.34500.37310.37140.31640.44420.34320.60120.62750.6365
MicroF10.33780.34400.37600.34790.37130.36930.31750.44130.34760.59540.62210.6311
Coverage1.83061.74081.78641.83931.76651.81831.91761.51241.83710.84650.77440.7592
Corel5k1-Hamming0.98720.98980.98730.98890.98760.98810.98910.98900.98980.99050.99050.9905
1-Ranking0.83290.83950.83130.83460.83170.83210.83250.83560.83730.84220.84160.8465
AvgPrec0.14910.20270.16430.15170.16350.16730.16260.16700.16460.21620.21020.2235
1-OneError0.11240.20760.13730.10340.13240.13650.11990.13730.12230.22710.21480.2382
MacroF10.00770.00820.01580.00580.01590.01580.01180.00990.00290.00130.00100.0024
MicroF10.04310.06890.08140.03170.07430.07300.04500.05380.01650.00530.00530.0118
Coverage131.5283130.7983133.2597130.2415133.1978135.4703133.7845129.6946128.3927129.4937129.6759127.7015
Yeast1-Hamming0.75550.77210.75880.75960.76390.75740.76160.76020.75740.76990.78190.7839
1-Ranking0.78220.79230.78130.77880.78350.78260.77900.77950.77570.79050.80210.8065
AvgPrec0.70150.71830.69810.69620.70250.69990.70010.69710.69270.70980.72740.7331
1-OneError0.68030.72440.67690.67220.67550.67190.68620.66630.67560.71400.72400.7348
MacroF10.36510.28140.34980.32780.34480.35180.32520.35690.30120.29140.33420.3367
MicroF10.57570.56380.57070.55910.57460.56970.57070.57430.55030.57180.60110.6041
Coverage7.03827.04717.02997.09487.04596.99997.19277.03607.20236.99796.79786.7818
Arts1-Hamming0.88340.92100.90460.90590.90350.90540.90200.89160.89930.93150.93210.9320
1-Ranking0.77340.77740.75570.74790.76960.77970.72570.79240.74260.82080.82340.8250
AvgPrec0.33550.34280.33240.29970.34390.34810.26050.38860.28520.44020.44790.4573
1-OneError0.18830.19190.19030.14540.19400.19230.10620.25130.14260.29690.31380.3203
MacroF10.09620.06190.08630.06770.08640.08280.04660.12220.06580.08800.10100.1009
MicroF10.17380.11540.15420.10700.15390.14960.08600.22250.11180.16250.18610.1896
Coverage7.37197.31997.91058.09557.52087.23578.60596.91668.25046.20776.14516.0962
Business1-Hamming0.94870.95810.94820.94820.94970.95800.93740.95000.93580.96920.96830.9698
1-Ranking0.92150.93150.92050.92190.92380.92940.90660.93220.90130.94950.95170.9550
AvgPrec0.70800.73150.67910.68200.69810.74130.60920.73540.59250.85440.85400.8618
1-OneError0.65270.66110.59830.60650.62670.71000.50390.67480.48850.86250.85690.8667
MacroF10.06400.05540.06390.05610.06910.05360.06340.09030.06500.05510.09020.0950
MicroF10.50080.52380.46510.46930.48710.52310.41030.54050.38740.65840.66360.6747
Coverage3.62813.20173.57733.69483.55383.43654.11163.26114.22592.76152.66292.5270
Computers1-Hamming0.93060.94520.93960.94310.93840.94390.93590.93680.93160.95530.95430.9560
1-Ranking0.86010.85400.84940.85120.85360.86140.84810.87330.81540.89150.89680.9013
AvgPrec0.44330.43420.41200.41270.42440.44060.40720.49690.33010.58990.59290.6127
1-OneError0.29110.27460.25570.24040.25590.27500.25430.35660.18230.51250.50070.5273
MacroF10.05640.04950.05620.03280.05470.04520.06060.07940.03600.05120.08740.0974
MicroF10.23410.22120.21130.16880.21230.20980.21140.29510.15200.36220.38830.3871
Coverage5.89506.02956.26716.24046.12395.98516.29415.44487.42034.91394.73154.5519
Education1-Hamming0.92290.94170.93440.94360.93510.94180.93340.92670.93590.95180.95070.9528
1-Ranking0.85490.86260.83750.83860.84740.85490.85330.86400.80300.88260.88300.8869
AvgPrec0.37210.37620.33780.32370.35020.36180.36340.41110.25450.45670.46280.4791
1-OneError0.19970.18380.17080.13890.17870.17670.19300.24810.09150.29060.30240.3207
MacroF10.06400.04800.05310.02950.05600.04530.05870.07940.03240.04600.06550.0605
MicroF10.20220.14700.15330.07950.15540.13240.17320.24320.08250.15240.19710.1870
Coverage5.69815.43156.30936.20815.90695.66805.78515.42137.46054.80734.80274.6636
Entertainment1-Hamming0.88420.91280.90140.90660.90180.90480.90340.89400.89500.92640.92760.9281
1-Ranking0.80740.82570.80350.79540.81580.81140.80060.83500.72230.85360.85610.8613
AvgPrec0.40780.45050.42430.39970.43390.42820.43170.47990.27640.52050.52520.5391
1-OneError0.23510.27230.26460.22710.25390.25170.27720.31720.11560.36590.36770.3852
MacroF10.12290.10960.12010.09460.11550.11100.12860.16220.05410.13640.14000.1481
MicroF10.23150.22080.22690.17070.22010.20730.23320.29910.09920.26920.28110.2873
Coverage4.69594.32354.81994.93844.53884.63014.89874.15616.41813.77753.69753.6377
Health1-Hamming0.92550.93820.93480.93740.93710.94250.93390.93180.92100.94970.95040.9515
1-Ranking0.89830.90210.88520.88660.89350.89730.89730.90390.84960.91700.92230.9261
AvgPrec0.49390.49860.48150.46570.50720.52760.50680.53920.35790.60000.62470.6384
1-OneError0.33540.33350.33550.29990.36440.38560.35970.40570.19730.47860.51630.5324
MacroF10.07570.07770.08050.06080.09090.06820.09030.11730.05070.06630.11690.1209
MicroF10.33020.25600.27130.22160.30750.27970.30040.36590.16790.30760.40560.4101
Coverage4.52474.36005.05154.97234.73034.68814.65444.36716.24853.94973.75763.6298
Recreation1-Hamming0.86660.91130.90030.89800.89920.89960.89670.88790.89470.93080.92920.9296
1-Ranking0.73230.73150.74710.72540.75130.75510.72550.76120.65710.76900.79300.7985
AvgPrec0.32110.33850.34910.31020.35490.35560.31980.38340.21700.38600.44090.4504
1-OneError0.15910.19370.19000.15300.19110.18310.16770.23180.07220.22650.29290.3019
MacroF10.09780.09100.10020.08270.10170.09260.09370.12940.04150.05560.12050.1306
MicroF10.15920.14940.16690.12790.16510.15940.14560.20970.06140.10380.19850.2024
Coverage6.67506.66506.25686.74246.15436.09116.76276.03178.22475.83065.30585.1896
Reference1-Hamming0.95060.95520.94900.95690.94970.95220.95530.95420.94400.96420.96240.9634
1-Ranking0.88200.87710.87180.87430.87520.87550.88190.88440.81670.89010.89310.8983
AvgPrec0.48540.46900.44410.46490.45670.47040.49580.51780.28410.55870.55820.5777
1-OneError0.32650.28740.27270.30320.27230.30070.34870.37910.08140.45750.43730.4644
MacroF10.05140.06640.06920.05350.07150.06990.06860.08430.02600.01870.06870.0708
MicroF10.28410.26170.26460.25810.26810.28800.29230.34350.06700.19440.36280.3611
Coverage4.34864.49054.67134.58094.57384.57174.34264.27576.44414.09263.99173.8267
Science1-Hamming0.94070.95020.94460.94730.94350.94540.94710.94240.94300.96080.95930.9596
1-Ranking0.81840.81480.81330.79910.80970.81770.81900.82780.77340.82670.83990.8449
AvgPrec0.31150.32960.31220.27640.30500.31580.33950.36150.21100.34820.39950.4079
1-OneError0.15820.18650.16030.12820.15720.15950.20280.22310.07230.19870.26420.2711
MacroF10.06240.06120.06230.04390.05970.05480.07410.08090.02930.01540.06980.0744
MicroF10.13610.14630.13990.09080.13690.12980.16540.19130.05490.05100.17420.1684
Coverage8.68138.94108.96089.44479.09488.76778.75348.436710.55848.51537.93437.7084
Social1-Hamming0.95760.96250.95860.95950.95740.96000.96180.96210.94770.96690.96620.9666
1-Ranking0.88810.89620.89530.89220.89160.89950.90190.90230.85150.91610.91220.9139
AvgPrec0.47380.51210.50250.46700.49200.50990.54990.55990.31960.58640.59010.5941
1-OneError0.29310.36130.34550.28130.33270.34330.41140.42340.14050.42220.45710.4574
MacroF10.04770.05360.07140.04830.06900.06930.07510.07000.03700.00330.06690.0756
MicroF10.19250.31290.30260.17690.30240.29210.34830.36880.12550.04300.36770.3565
Coverage5.25184.90094.92355.11835.12234.81344.72324.72226.69414.23534.32304.2311
Society1-Hamming0.88350.92300.90410.90620.90530.91070.90790.89170.90360.93570.93620.9367
1-Ranking0.79310.79540.78310.77990.79890.80150.79600.80120.72190.82830.83590.8366
AvgPrec0.37020.40580.35720.34430.37120.38040.38210.42210.24070.51110.52420.5291
1-OneError0.22210.22540.19430.18560.19540.21000.23170.30230.08040.42510.43610.4454
MacroF10.09440.07240.08330.07540.08580.08210.09430.10980.04280.05730.08170.0827
MicroF10.20090.17660.17050.16390.17220.18140.19690.25150.07690.27600.27900.2942
Coverage7.16237.17457.50517.62897.01887.06517.20657.00549.05936.38466.24476.1403
The best results are shown in bold.
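The column names in Tables 2–4, read together with the metric names in Table 5, indicate that Hamming loss, ranking loss, and one-error are reported as 1 − loss, so larger values are better for every column except Coverage (smaller is better). For reference, a minimal sketch of three of these metrics is given below, following the standard multi-label definitions used with ML-KNN [43]; it is an illustration with toy data, not the authors' evaluation code.

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of instance-label pairs that are predicted incorrectly."""
    return float(np.mean(Y_true != Y_pred))

def one_error(Y_true, scores):
    """Fraction of instances whose top-ranked label is not a relevant label."""
    top = np.argmax(scores, axis=1)
    return float(np.mean(Y_true[np.arange(len(top)), top] == 0))

def coverage(Y_true, scores):
    """Average depth (0-based rank) of the lowest-ranked relevant label per instance."""
    ranks = (-scores).argsort(axis=1).argsort(axis=1)  # rank 0 = highest-scored label
    return float(np.mean([ranks[i, Y_true[i] == 1].max() for i in range(len(Y_true))]))

# Toy usage: 2 instances, 3 labels (hypothetical scores and ground truth).
Y = np.array([[1, 0, 1], [0, 1, 0]])
S = np.array([[0.9, 0.2, 0.4], [0.1, 0.3, 0.8]])
P = (S > 0.5).astype(int)
print(1 - hamming_loss(Y, P))  # 0.5  (reported as "1-Hamming")
print(1 - one_error(Y, S))     # 0.5  (reported as "1-OneError")
print(coverage(Y, S))          # 1.0  (reported as "Coverage", smaller is better)
```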
Table 3. Experimental results of 50% labeled instances on fifteen data sets in terms of seven evaluation metrics.
Dataset | Metrics | CCA | wMLDAb | wMLDAe | wMLDAc | wMLDAf | wMLDAd | MLDA-LC | MDDM | SSMLDR | SMDRdm | NMLSDR | SMDR-IC
Emotions1-Hamming0.74820.76180.74900.75060.77010.74340.76560.75380.75290.76960.77320.7862
1-Ranking0.76000.79000.76740.77640.78940.76500.79460.77870.77500.79310.80290.8206
AvgPrec0.71910.75440.73680.73990.75350.72860.75960.74670.73970.75610.76770.7836
1-OneError0.59890.65450.64550.62810.65450.60960.65730.64100.63760.65110.67360.6972
MacroF10.55600.55130.56060.54890.58880.55110.57270.57300.55880.59720.59140.6080
MicroF10.57060.57480.57220.56410.60390.56120.59660.58040.57280.60910.61210.6319
Coverage2.12422.01402.15112.06072.00452.13431.97922.09892.08201.98261.95111.8753
Scene1-Hamming0.86360.87130.86830.87010.87050.87210.86470.86770.86880.88900.89320.8934
1-Ranking0.83860.83970.84220.84840.84330.84650.84210.84630.84610.88190.88900.8913
AvgPrec0.76160.76880.76850.77410.77260.77770.76060.77150.77310.81430.82260.8267
1-OneError0.62310.63910.63550.64220.64380.65310.61770.63780.64090.70030.71110.7173
MacroF10.60560.61490.61880.62520.62520.63540.59690.62580.62500.67700.68630.6893
MicroF10.59950.60930.61300.61740.61730.62920.59030.61680.61700.66970.67970.6836
Coverage0.89530.89240.87690.84740.87580.86000.87030.85670.85660.67940.64430.6328
Corel5k1-Hamming0.99040.99040.99040.99050.99040.99050.99050.99050.99050.99050.99050.9906
1-Ranking0.85740.86000.86030.85760.85870.85770.85700.85750.85750.85810.85680.8609
AvgPrec0.22130.23540.23160.22250.22930.23040.23010.22830.21970.23050.22630.2403
1-OneError0.22470.25110.23750.21740.23460.24040.23650.23660.21820.24440.23890.2577
MacroF10.00600.00960.00770.00560.00780.00770.00820.00490.00320.00350.00370.0055
MicroF10.02340.04310.03140.02050.03110.02770.03570.01950.01140.01280.01660.0236
Coverage119.3953118.0986117.7031119.2533119.0181119.0487120.1382119.3591119.7709120.0287121.4830117.9619
Yeast1-Hamming0.78160.78550.78490.78510.78740.78640.78770.79030.78160.78470.79300.7943
1-Ranking0.80420.80600.80750.80660.80700.81000.80790.81070.80400.80310.81380.8172
AvgPrec0.73040.73060.73540.73400.73580.73830.73320.73740.72970.73030.74570.7494
1-OneError0.72730.74090.74050.73460.73720.73820.73540.72870.74570.73870.75070.7576
MacroF10.34840.28390.34360.33730.33600.34910.31910.35690.30410.31230.35490.3455
MicroF10.60370.58110.60880.60290.60490.61090.60740.61910.59220.60020.62190.6181
Coverage6.80346.76186.72706.73986.74046.65766.73616.66986.78096.79906.62296.6229
Arts1-Hamming0.92890.93590.93150.93140.93290.93280.93130.92860.93020.93620.93600.9374
1-Ranking0.83740.84450.84120.84100.84120.84330.84020.83860.83490.84280.84420.8487
AvgPrec0.49150.50690.49990.49430.50180.50070.49050.49140.47900.48900.50260.5166
1-OneError0.37150.38680.38080.37070.38610.38170.36740.36800.35310.35750.38430.4026
MacroF10.17760.14730.18940.17590.18050.18710.17630.17680.16110.14350.16200.1720
MicroF10.27200.25040.28310.26930.28240.27690.26170.27720.24860.21950.25510.2708
Coverage5.72815.56275.64515.63215.65055.59435.66755.69475.84935.60335.61395.4561
Business1-Hamming0.96840.97020.96850.96870.96910.96900.96880.96790.96580.97000.97080.9715
1-Ranking0.95850.95730.95890.96010.95910.95810.95960.95830.95520.95450.96020.9618
AvgPrec0.86360.86360.85970.86110.86260.85950.85870.85840.84350.85880.86900.8768
1-OneError0.86250.85840.85350.85510.85760.85690.85310.85350.83320.86250.86680.8784
MacroF10.16690.11660.15130.15330.16260.15510.15590.16980.13310.07440.14690.1545
MicroF10.67850.66540.66990.67140.67810.67330.67070.67530.64810.66550.69160.6951
Coverage2.37562.38732.33712.30682.34372.41532.32752.34662.48452.51112.28972.2537
Computers1-Hamming0.94960.95570.95250.95280.95200.95460.95130.94760.94610.95510.95840.9590
1-Ranking0.90110.89750.90070.89990.89860.90310.90090.89750.88750.89760.91110.9149
AvgPrec0.58940.59510.58660.58380.58270.59160.58310.58400.54370.59490.63700.6433
1-OneError0.48230.50120.48430.47500.48090.48690.47990.48620.43050.50650.55730.5622
MacroF10.14050.09470.14500.12630.13920.12880.13450.14710.11360.08880.14540.1489
MicroF10.38720.38080.39480.37920.39810.38830.38480.39510.34090.38110.44230.4515
Coverage4.46574.62494.50404.52854.57444.42354.48814.58914.97714.65434.16044.0427
Education1-Hamming0.94960.95340.95110.95180.95160.95190.95130.94840.94980.95450.95470.9549
1-Ranking0.89280.89710.89240.89230.89480.89400.89420.89100.88640.89670.89740.9006
AvgPrec0.49260.50250.50030.49400.50450.49640.49890.49230.46920.50240.51270.5225
1-OneError0.34190.34850.35080.34390.36070.34210.34450.34270.31480.34820.36830.3764
MacroF10.11020.09210.11780.11210.11590.10690.11540.11520.10090.10260.10600.1131
MicroF10.27360.23520.27350.25180.26790.24650.24920.28020.22340.22870.25330.2510
Coverage4.45084.28944.47254.49394.43354.41774.45274.51974.66934.30864.31644.2052
Entertainment1-Hamming0.92180.93000.92440.92490.92440.92680.92660.92200.92040.93090.93340.9354
1-Ranking0.86170.86860.86590.86430.86540.86580.86240.86430.85230.86860.87550.8799
AvgPrec0.54150.56040.55700.55350.55840.55680.54260.55570.52020.56250.58330.5932
1-OneError0.39360.41920.41790.41070.41750.41090.39090.41590.36790.41430.44700.4577
MacroF10.21350.16870.20410.19460.20730.19200.17800.22280.16610.17160.20080.2146
MicroF10.36240.32500.35610.33420.35800.33230.31080.38150.29970.31100.36070.3742
Coverage3.57433.46693.50483.57793.50853.54653.58873.55193.78143.46383.31783.2151
Health1-Hamming0.95180.95490.95350.95300.95410.95460.95090.95080.94740.95370.95620.9570
1-Ranking0.93150.93290.93360.93190.93410.93100.92730.93200.91900.93250.93660.9382
AvgPrec0.65920.66840.66840.65890.66880.66320.63790.66170.60390.65670.68280.6910
1-OneError0.56150.57410.57570.56660.57470.56770.53040.56410.49360.55290.59270.6071
MacroF10.18850.15870.19930.20580.21250.19060.17610.19850.15500.16400.20720.2083
MicroF10.46990.44990.48320.46470.48850.46210.43810.48290.39930.42440.48050.4902
Coverage3.42393.38033.36553.41053.33803.49313.58393.41923.83113.40033.25233.2255
Recreation1-Hamming0.92000.93050.92630.92480.92720.92620.92530.92110.92350.93230.93420.9359
1-Ranking0.80630.81310.81420.81310.81310.81340.80460.80940.80050.80430.82140.8286
AvgPrec0.47280.48960.49220.48130.49250.48870.46960.47660.45340.45720.50560.5204
1-OneError0.33750.35830.36370.34620.36280.35530.33290.34010.31070.30960.37380.3917
MacroF10.20540.18110.22900.20530.22890.21410.20040.20880.17640.14540.20330.2135
MicroF10.27940.26920.30450.28300.30460.29390.26610.28480.24400.20440.29070.3020
Coverage5.05084.91224.89834.88674.86094.86805.07574.98295.16555.09154.69434.5397
Reference1-Hamming0.95540.96330.95820.95820.95820.96010.95840.95690.95220.96350.96540.9657
1-Ranking0.89300.89660.89000.88710.88910.89000.88560.89620.85770.89560.90730.9106
AvgPrec0.54350.56080.54600.53260.54040.54120.51830.55750.44690.56510.60420.6154
1-OneError0.40950.42660.41910.39930.40930.41360.38450.43190.30340.45390.49190.5069
MacroF10.09400.09130.10720.09440.10490.10840.07980.11440.07900.04830.11750.1184
MicroF10.36500.38010.36840.34700.36490.35770.32940.39320.25420.30070.41370.4255
Coverage3.97143.86014.08104.16694.10094.07734.22633.86705.10503.90793.50043.3993
Science1-Hamming0.94990.95750.95330.95400.95350.95430.95400.95010.95230.96040.96140.9622
1-Ranking0.84230.84590.84300.83810.84730.84980.84530.84710.83510.84430.86320.8650
AvgPrec0.39580.40740.40580.39650.41360.41060.39690.41220.36170.39130.46220.4695
1-OneError0.26270.27120.27430.26620.28140.27380.25750.28010.22230.24360.33690.3480
MacroF10.11730.09750.11390.11220.12120.11380.11050.11890.08990.04640.13590.1398
MicroF10.22070.19650.22000.20820.22760.21360.20090.23180.17110.10530.26030.2607
Coverage7.68217.61457.71657.83917.51117.50927.62007.51317.99437.69246.88436.8199
Social1-Hamming0.95320.96250.95810.95750.95680.95860.94940.95740.94700.96740.96830.9693
1-Ranking0.88960.89790.89350.88980.89160.89620.87270.90630.85500.91850.92290.9265
AvgPrec0.46690.48960.47360.45220.46630.47410.39580.55030.33060.59550.62800.6354
1-OneError0.30220.32270.31010.27750.29760.29900.22920.40790.16310.43700.51100.5137
MacroF10.07790.06940.09280.08100.09680.09060.05910.11290.05180.00550.13020.1376
MicroF10.22980.25240.25110.20560.23900.22430.19380.34940.12560.06250.43810.4310
Coverage5.10894.80454.99035.14615.10654.87195.81204.50016.49714.08983.83693.6669
Society1-Hamming0.92200.93650.92990.92690.92720.93010.92880.92430.92600.93700.93920.9404
1-Ranking0.83900.84310.83640.83470.83780.84170.82870.83750.81950.84320.85340.8594
AvgPrec0.50130.53360.50830.49450.50700.51040.47500.51040.45650.53610.56380.5761
1-OneError0.38830.42890.40360.38500.39600.39890.35420.40840.33680.45070.48650.5048
MacroF10.13980.11270.14610.14380.14720.14430.10510.15110.10910.09070.12980.1537
MicroF10.29730.28660.30590.28010.29560.29110.25070.32120.22730.27120.33730.3568
Coverage5.99905.89736.06956.18396.07355.98436.28535.97056.52855.98385.69125.5111
The best results are shown in bold.
Table 4. Experimental results of 70% labeled instances on fifteen data sets in terms of seven evaluation metrics.
Dataset | Metrics | CCA | wMLDAb | wMLDAe | wMLDAc | wMLDAf | wMLDAd | MLDA-LC | MDDM | SSMLDR | SMDRdm | NMLSDR | SMDR-IC
Emotions1-Hamming0.76150.77170.76720.77390.77670.77660.76710.77500.77280.77980.78550.7960
1-Ranking0.78480.79760.79620.79900.80150.81080.80290.79740.80210.81520.81590.8270
AvgPrec0.74580.76380.76300.76120.76420.77640.76900.76500.76720.77950.77700.7945
1-OneError0.63600.67810.67360.66570.67130.69330.68310.67580.67360.68370.68710.7185
MacroF10.57850.56750.58760.60400.59680.59890.58620.60710.59800.60490.60590.6244
MicroF10.59260.59880.59820.61970.61190.61970.59780.62350.61590.62840.62620.6443
Coverage2.02641.98482.00731.98931.92081.94491.97582.00901.96631.87751.87251.8281
Scene1-Hamming0.88280.88540.88340.88350.88680.88770.87400.88210.88490.89180.89180.8987
1-Ranking0.86670.87000.87280.86940.87560.87650.86460.87050.87110.88530.88640.8956
AvgPrec0.79870.80190.80310.80110.80970.81140.78680.80120.80300.81920.82200.8323
1-OneError0.67990.68230.68410.68120.69640.69710.65130.67950.68490.70640.71200.7256
MacroF10.66490.66090.66920.65960.67140.67730.63090.65960.66840.68060.68590.7021
MicroF10.65650.65560.65970.65330.66410.67080.62230.65140.65990.67560.68060.6977
Coverage0.75600.72810.72680.73720.71310.70710.77030.73330.73170.65990.66140.6051
Corel5k1-Hamming0.99050.99050.99050.99050.99050.99050.99050.99050.99060.99060.99060.9906
1-Ranking0.86140.86330.86220.86190.86000.86400.86230.86190.86090.86010.86060.8645
AvgPrec0.23590.24590.23980.23490.23420.23890.24010.23380.23030.23240.23460.2457
1-OneError0.25390.27020.25380.24390.24420.25190.25460.24280.24000.25050.25390.2694
MacroF10.00670.01110.00850.00670.00800.00790.00910.00650.00560.00410.00620.0088
MicroF10.02670.04490.03440.02320.02890.02770.03400.02670.01920.01850.02540.0334
Coverage117.6995116.8086116.7635117.3177118.7547116.2923117.1544117.0889117.6433117.9017118.3098115.5051
Yeast1-Hamming0.79340.78560.79050.78760.78910.79180.78840.79380.78750.79070.79080.7963
1-Ranking0.81680.80950.81360.80920.80830.81470.81070.81710.81070.81470.81310.8193
AvgPrec0.74520.74040.74390.73700.73750.74340.74110.74710.74070.74600.74200.7501
1-OneError0.74890.75810.75410.74350.74040.74410.75230.75070.75390.75440.75080.7581
MacroF10.34990.29160.34590.34210.35260.36270.33470.36920.31540.32760.34730.3519
MicroF10.62280.58810.61440.61210.61520.62410.61430.63270.60460.61550.61640.6249
Coverage6.59706.74196.63756.70216.72006.59066.70106.55656.69726.68506.63136.5409
Arts1-Hamming0.93500.93830.93650.93610.93780.93620.93600.93530.93490.93710.93770.9391
1-Ranking0.84860.85260.85330.85320.85500.85320.85250.84840.84750.85030.85230.8565
AvgPrec0.51670.52810.52710.52210.53180.52280.52170.51690.50740.51610.52240.5354
1-OneError0.40100.41960.41130.40410.41880.40190.40350.39960.38710.39750.40560.4243
MacroF10.20030.16160.19260.18570.19750.19920.19480.19300.17410.16980.17640.1966
MicroF10.29500.26430.29620.27630.30190.28610.27810.29000.26060.25950.27340.2929
Coverage5.44505.38835.32975.34755.27515.31535.32945.45475.51495.42225.37455.2227
Business1-Hamming0.97100.97110.97080.97110.97090.97150.97030.97060.96990.97030.97120.9722
1-Ranking0.96240.96080.96220.96250.96270.96220.96260.96240.96120.95470.96220.9642
AvgPrec0.87380.87120.87470.87320.87430.87440.87120.87160.86470.85950.87470.8815
1-OneError0.87320.86590.87270.87140.87090.87230.86850.86710.85880.85910.87090.8817
MacroF10.18180.13470.17430.17750.18430.18650.18850.18820.16970.08680.17690.1923
MicroF10.69490.67920.68850.69230.69220.69470.68690.69310.68450.66920.69740.7041
Coverage2.21932.27402.21012.20832.22492.20602.22092.18292.27062.47652.21142.1547
Computers1-Hamming0.95590.95930.95720.95810.95770.95840.95710.95350.95330.95530.95860.9601
1-Ranking0.91200.90970.91320.91440.91420.91440.91300.91050.90590.90120.91530.9172
AvgPrec0.63450.63720.63740.63690.64020.63810.63380.63110.60560.60140.64420.6573
1-OneError0.54910.55810.54970.55050.55190.54990.54730.54420.51040.51050.56040.5812
MacroF10.18200.12850.18330.17620.18290.17110.17870.18550.14840.09030.18360.1802
MicroF10.44830.43730.45670.44880.45840.43950.44470.44060.40770.37200.45570.4664
Coverage4.08484.18384.01964.05543.99994.03474.10994.14324.32294.52343.99193.9225
Education1-Hamming0.95440.95600.95530.95500.95540.95520.95540.95380.95360.95550.95550.9562
1-Ranking0.90020.90330.90150.90170.90240.90140.90340.89900.89360.90090.90300.9043
AvgPrec0.52500.53090.53180.52930.53290.52690.52890.52210.49520.52090.53100.5396
1-OneError0.38290.38650.38990.38650.39230.38280.38230.37740.34010.37670.38810.4003
MacroF10.15160.13520.13670.14780.14100.14690.15120.14410.10310.12220.12930.1405
MicroF10.29310.26240.27750.27470.29340.27730.27940.28540.20460.23760.26010.2824
Coverage4.18444.10464.18514.15384.13094.16184.08514.26234.41574.20774.12374.0771
Entertainment1-Hamming0.93300.93650.93210.93330.93410.93390.93330.93370.93210.93540.93700.9372
1-Ranking0.87730.88240.87920.87910.87890.88170.87980.87760.87500.87850.88170.8856
AvgPrec0.58580.60290.59340.59500.59950.59910.59180.59040.58200.59210.60710.6127
1-OneError0.44910.47350.46330.46490.47020.46520.45670.45610.44600.45710.48170.4854
MacroF10.24970.19540.23290.23330.23530.22720.21330.25550.20460.21620.23390.2436
MicroF10.40960.37290.38860.39030.39320.38270.36010.41770.35990.37420.39960.4026
Coverage3.25663.18813.21923.25113.22673.17513.24523.26773.30873.21893.20413.1018
Health1-Hamming0.95690.95850.95840.95780.95840.95830.95690.95720.95270.95580.95890.9593
1-Ranking0.93990.93950.94210.94100.94330.94140.93920.94090.93060.93760.94230.9437
AvgPrec0.69340.69980.70620.69860.70870.70350.69140.70070.65000.67800.70640.7113
1-OneError0.60550.61930.62570.61430.62600.62400.60230.61790.54650.58290.62280.6297
MacroF10.22800.18510.23450.23440.24680.23300.22830.23150.19430.19650.22960.2312
MicroF10.50620.49380.52350.51550.52700.50750.49460.52410.44960.46070.51910.5130
Coverage3.12293.15033.07043.09013.00033.11123.14843.08633.44173.19923.02342.9960
Recreation1-Hamming0.93070.93710.93500.93420.93470.93460.93340.93170.93240.93440.93650.9386
1-Ranking0.82660.83030.83430.83130.83490.83540.82810.82830.81990.81480.83200.8399
AvgPrec0.51860.53030.53500.52550.53750.53470.51540.52400.50590.48950.53010.5458
1-OneError0.39420.41150.41210.39950.41460.41190.38590.39810.37310.35250.40830.4265
MacroF10.24970.22760.26290.24690.25970.26660.23850.25190.20670.18240.24760.2666
MicroF10.32290.32080.33910.33010.34350.33970.30520.32660.29230.24560.32590.3440
Coverage4.57684.52874.42234.47894.42334.40324.56294.56534.76354.86954.47914.2944
Reference1-Hamming0.96390.96620.96410.96470.96460.96490.96250.96300.96140.96440.96700.9670
1-Ranking0.90680.90940.90850.90840.90940.90650.89960.91020.89120.90220.91240.9143
AvgPrec0.60230.61480.60550.60990.60630.60250.56970.60800.54150.58290.62570.6284
1-OneError0.49140.50560.49310.50250.49230.49110.44850.49330.41670.47050.51990.5254
MacroF10.12870.12300.13520.13660.13890.13540.11410.13410.09100.08850.13660.1471
MicroF10.43590.44120.43590.42950.43510.42170.38520.44610.35360.35380.44750.4476
Coverage3.51433.41973.46153.47663.41413.51313.76543.42434.02473.68283.33253.2698
Science1-Hamming0.95900.96210.96040.96020.96040.96050.95960.95890.95920.96080.96280.9634
1-Ranking0.86330.86270.86600.86570.86830.86860.86390.86340.85590.84770.86840.8735
AvgPrec0.46230.46530.47010.46520.47450.47420.45970.46600.43090.41210.47950.4910
1-OneError0.33570.34230.34830.34010.35320.34890.33180.34370.29850.27870.35700.3723
MacroF10.15900.12350.16390.15360.15180.15700.14790.15270.12280.07590.14500.1565
MicroF10.27020.24980.28120.26550.28110.28070.26160.28350.22110.17260.27730.2832
Coverage6.87516.90526.73876.73116.66566.68816.87976.83907.19297.53746.67816.4324
Social1-Hamming0.96250.96810.96530.96470.96540.96570.96230.96380.95820.96780.96910.9704
1-Ranking0.91600.92090.91910.92200.92080.91870.91010.92080.90070.91880.92810.9323
AvgPrec0.58490.60930.59640.59510.60040.59000.55980.60980.50720.59810.64890.6581
1-OneError0.44910.48230.46490.45820.46730.45530.42070.48550.35470.44220.53680.5434
MacroF10.14120.11510.14910.15170.16610.14690.11840.15430.10470.00800.16720.1622
MicroF10.37010.40690.39840.38160.40080.37940.34540.41470.27020.08850.46300.4661
Coverage4.08373.87273.94833.81663.89274.01214.29233.87634.67084.04243.57473.4423
Society1-Hamming0.93540.94130.93800.93650.93730.93750.93540.93560.93560.93940.94140.9421
1-Ranking0.85600.85840.85880.85540.85750.85840.84760.85600.84620.85300.85880.8628
AvgPrec0.55720.57870.56650.55480.56450.56360.53730.55970.53520.55840.58180.5898
1-OneError0.46880.50030.48270.46940.48270.47950.44450.47410.43890.47840.51630.5269
MacroF10.16850.13760.17080.16330.16700.17050.12120.16820.13010.11750.15730.1682
MicroF10.34700.32780.35260.33210.34710.34460.29070.35330.30100.31210.36650.3681
Coverage5.55975.51295.46955.62885.56915.55615.83835.55965.85175.65505.52355.4241
The best results are shown in bold.
Table 5. Friedman statistics F_F and the critical value of each evaluation metric.
Metric | F_F (K = 12, N = 15) | Critical Value (α = 0.05)
Hamming loss | 50.5529 | 1.8513
Ranking loss | 59.7234 | 1.8513
Average Precision | 64.3119 | 1.8513
One Error | 47.8750 | 1.8513
MacroF1 | 16.8654 | 1.8513
MicroF1 | 19.9092 | 1.8513
Coverage | 54.2087 | 1.8513
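Table 5 and the Bonferroni–Dunn diagrams in Figure 2 follow the comparison protocol of Demšar [44]: a Friedman test in its Iman–Davenport F_F form over N = 15 data sets and K = 12 methods, followed by a Bonferroni–Dunn post-hoc comparison against SMDR-IC. The sketch below is an illustration under that assumption; the toy average ranks are hypothetical, q_alpha must be taken from the table in [44], and SciPy is assumed to be available.

```python
import numpy as np
from scipy import stats

def friedman_ff(avg_ranks: np.ndarray, n_datasets: int) -> float:
    """Iman-Davenport F_F statistic from the average ranks of k algorithms over n data sets."""
    k, n = len(avg_ranks), n_datasets
    chi2 = 12 * n / (k * (k + 1)) * (np.sum(avg_ranks ** 2) - k * (k + 1) ** 2 / 4)
    return (n - 1) * chi2 / (n * (k - 1) - chi2)

def critical_value(k: int, n: int, alpha: float = 0.05) -> float:
    """Critical value of the F distribution with (k-1, (k-1)(n-1)) degrees of freedom."""
    return stats.f.ppf(1 - alpha, k - 1, (k - 1) * (n - 1))

def bonferroni_dunn_cd(k: int, n: int, q_alpha: float) -> float:
    """Critical difference for the Bonferroni-Dunn test; q_alpha comes from the table in [44]."""
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))

print(critical_value(k=12, n=15))               # ~1.85, consistent with Table 5
# Toy average ranks for k = 3 algorithms over n = 5 data sets (illustrative only).
print(friedman_ff(np.array([1.4, 2.0, 2.6]), n_datasets=5))
```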
Table 6. Calculation time (s) of different methods on four data sets.
Dataset | CCA | wMLDAb | wMLDAe | wMLDAc | wMLDAf | wMLDAd | MLDA-LC | MDDM | SSMLDR | SMDRdm | NMLSDR | SMDR-IC
Emotions | 0.0662 | 0.0297 | 0.0201 | 0.0276 | 0.0302 | 0.0520 | 0.0206 | 0.0242 | 0.0422 | 0.0395 | 0.0361 | 0.1429
Scene | 0.1193 | 0.1183 | 0.1203 | 0.1116 | 0.1224 | 0.1378 | 0.1697 | 0.1463 | 0.7838 | 0.7003 | 0.6598 | 0.8260
Yeast | 0.1435 | 0.1469 | 0.1407 | 0.1503 | 0.1569 | 1.0397 | 0.1557 | 0.1405 | 0.7012 | 0.5310 | 0.4905 | 0.6690
Arts | 0.5485 | 0.5748 | 0.5866 | 0.5789 | 1.1861 | 1.5268 | 0.9560 | 1.3994 | 6.1865 | 5.8123 | 5.2988 | 11.9221
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
