Article

Kernel Geometric Mean Metric Learning

School of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China
* Author to whom correspondence should be addressed.
Submission received: 19 July 2023 / Revised: 7 September 2023 / Accepted: 9 October 2023 / Published: 6 November 2023
(This article belongs to the Special Issue Machine/Deep Learning: Applications, Technologies and Algorithms)

Abstract

Geometric mean metric learning (GMML) is a metric learning approach proposed recently. It has many advantages over other existing metric learning techniques, such as an unconstrained convex objective function, a closed-form solution, fast computation, and interpretability. However, it is not effective at handling nonlinear problems. Since the kernel method is an effective way to solve nonlinear problems, a kernel geometric mean metric learning (KGMML) algorithm is proposed. The basic idea is to map the input space into a high-dimensional feature space through a nonlinear transformation, and to use the integral representation of the weighted geometric mean and the Woodbury matrix identity in the new feature space to generalize the analytical solution of the GMML algorithm into a form expressed through a kernel matrix, from which the KGMML algorithm is obtained. Experimental results on 15 datasets show that the proposed algorithm can effectively improve upon the accuracy of the GMML algorithm and other metric learning algorithms.

1. Introduction

It is well known that the distance measure is one of the most commonly used measures to describe the similarity between samples. At present, various distance metrics have been proposed, such as the Euclidean distance and the Mahalanobis distance. However, the expressions of these distance metrics are fixed, i.e., they have no adjustable parameters, which results in varying degrees of effectiveness when dealing with different problems. Thus, it has been proposed to learn an effective distance metric from training samples. From the definition of a distance metric, it follows that any binary function d(x, x′) defined on the feature space is called a distance function, provided that the four conditions of symmetry, self-similarity, non-negativity, and the triangle inequality are satisfied simultaneously. Thus, any binary function
$$d_M(x, x') = (x - x')^T M (x - x'),$$
is a distance function determined by a symmetric positive definite (SPD) matrix M, where x and x′ are two samples, and M is usually called the metric matrix. The purpose of metric learning is to use training samples to learn a metric matrix M such that the resulting distance function d_M(x, x′) improves the performance of the learning algorithm or satisfies some application requirement. Thus, metric learning has wide applications in many fields, such as pattern recognition [1,2], data mining [3,4,5], information security [6,7], bioinformatics [8,9], and medical diagnosis [10,11,12].
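To make the definition concrete, the following minimal NumPy sketch (our illustration, not code from the paper) evaluates the distance above for a hand-picked SPD matrix M; the sample values are arbitrary.

```python
import numpy as np

def metric_distance(x, x_prime, M):
    """Evaluate d_M(x, x') = (x - x')^T M (x - x') for an SPD matrix M."""
    diff = x - x_prime
    return float(diff @ M @ diff)

# Toy example with an arbitrary SPD matrix (illustrative values only).
x = np.array([1.0, 2.0])
x_prime = np.array([0.5, 1.0])
M = np.array([[2.0, 0.3],
              [0.3, 1.0]])          # symmetric positive definite
print(metric_distance(x, x_prime, M))
```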
Because of this wide application space, metric learning techniques have received a lot of attention, and many excellent algorithms have been proposed. Xing [13] first proposed a metric learning algorithm. The main idea of the algorithm is to learn a metric matrix so that the distance between similar pairs of samples is small and the distance between dissimilar pairs of samples is large. The algorithm makes the distribution of similar pairs of samples more compact in the new metric space, while the distribution of dissimilar pairs becomes more dispersed. The proposal of this algorithm marked the real development of metric learning, and many subsequent research works were inspired by it. Davis [14] proposed an information-theoretic metric learning algorithm. The basic idea of the algorithm is to assume the existence of a prior metric matrix M₀ and to require that the distance between similar pairs of samples is less than a threshold while the distance between dissimilar pairs of samples is greater than a threshold; the metric matrix M is then obtained by minimizing the relative entropy between the multivariate Gaussian distributions corresponding to M and M₀. Wang [15] proposed an information geometry metric learning algorithm. The basic idea of the algorithm is to use the class label information of the training samples to construct an ideal kernel matrix that reflects the desired distances between samples, and to construct an actual kernel matrix describing the realistic distance relationships between the samples from the sample feature vectors and the metric matrix M. The metric matrix M is then solved for by minimizing the distance between these two kernel matrices. Weinberger [16] proposed a metric learning algorithm based on the maximum margin. The basic idea is to define a margin between the similar samples of each sample and the samples of other classes. The metric matrix is then solved for by minimizing the distance between similar pairs of samples while maximizing the defined margin.
The geometric mean metric learning (GMML) algorithm was proposed by Zadeh et al. [17] in 2016. The essence of most metric learning algorithms is to minimize the distance between similar pairs of samples while maximizing the distance between dissimilar pairs of samples. Therefore, the sign of the term corresponding to the dissimilar pairs in the objective function is usually negative, whereas the strategy of the geometric mean metric learning model is to use the inverse of the metric matrix M to measure the distance between dissimilar pairs of samples. The advantage of this approach is that the sign of the term corresponding to the dissimilar pairs in the objective function becomes positive, which reduces the difficulty of solving the model. Its objective function is as follows:
$$\min_{M \succ 0} \sum_{(x_{1i},\, x_{2i}) \in D^{+}} d_M(x_{1i}, x_{2i}) + \sum_{(\bar{x}_{1j},\, \bar{x}_{2j}) \in D^{-}} d_{M^{-1}}(\bar{x}_{1j}, \bar{x}_{2j}),$$
where the similar pair sample set D^+ and the dissimilar pair sample set D^− can be expressed as
$$D^{+} = \big\{ (x_{1i}, x_{2i}) \mid x_{1i}, x_{2i} \in \mathbb{R}^d \text{ are in the same class}, \; i = 1, 2, \ldots, n \big\},$$
$$D^{-} = \big\{ (\bar{x}_{1j}, \bar{x}_{2j}) \mid \bar{x}_{1j}, \bar{x}_{2j} \in \mathbb{R}^d \text{ are in different classes}, \; j = 1, 2, \ldots, m \big\}.$$
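For illustration, the following sketch (our own; the helper names `build_pairs` and `scatter_matrices` are not from the paper) shows one plausible way to form the pair sets D^+ and D^− from labeled data and to accumulate the corresponding similarity and dissimilarity matrices used by GMML.

```python
import numpy as np
from itertools import combinations

def build_pairs(X, y):
    """Collect similar and dissimilar index pairs from labeled data (X: N x d, y: N)."""
    similar, dissimilar = [], []
    for i, j in combinations(range(len(y)), 2):
        (similar if y[i] == y[j] else dissimilar).append((i, j))
    return similar, dissimilar

def scatter_matrices(X, similar, dissimilar):
    """S sums outer products of similar-pair differences, D of dissimilar-pair differences."""
    d = X.shape[1]
    S = np.zeros((d, d))
    D = np.zeros((d, d))
    for i, j in similar:
        diff = X[i] - X[j]
        S += np.outer(diff, diff)
    for i, j in dissimilar:
        diff = X[i] - X[j]
        D += np.outer(diff, diff)
    return S, D
```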
Although the GMML algorithm has advantages such as an unconstrained convex objective function, a closed-form solution, interpretability, and fast computation [18], it is essentially a linear learning method and does not work well for nonlinear problems. Because kernel methods are the key technology for addressing nonlinear problems, various kernelized algorithms [19,20,21,22,23] have been proposed. In view of this, a kernel geometric mean metric learning (KGMML) algorithm is proposed in this paper. The closed-form solution of the GMML algorithm is first restated in the high-dimensional feature space. Then, the solution is generalized to a form expressed through a kernel matrix by using the integral representation of the weighted geometric mean and the Woodbury matrix identity, which leads to the KGMML algorithm. It retains the advantages of the GMML algorithm while effectively addressing nonlinear problems.
The structure of this paper is organized as follows. In Section 2, lemmas on the weighted geometric mean and the Woodbury matrix identity are presented. In Section 3, the optimization problem is formulated, and its extension to the weighted geometric mean and the corresponding solution are derived. The experimental setup, the parameter sensitivity analysis, and the experimental results are given in Section 4. Finally, our conclusions are summarized in Section 5.

2. Preliminaries

Lemma 1
([24]). For any $t \in (0, 1)$, if $A$ is an $n \times n$ positive definite matrix and $B$ is an $n \times n$ Hermitian matrix, then the following equation holds:
$$A \,\sharp_t\, B = \frac{2\sin(\pi t)}{\pi}\, A \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1}\,\big[(1-s)I + (1+s)B^{-1}A\big]^{-1}\, ds,$$
which is the integral representation of the weighted geometric mean of $A$ and $B$, where $I$ is the identity matrix and $\sharp_t$ denotes the weighted geometric mean with weight $t$.
Lemma 2
([25]). If $A$ is an $n \times n$ invertible matrix corrected by $UCV$, where $U$ is an $n \times k$ matrix, $C$ is a $k \times k$ matrix, and $V$ is a $k \times n$ matrix, then the Woodbury matrix identity is
$$(A + UCV)^{-1} = A^{-1} - A^{-1}U\big(C^{-1} + VA^{-1}U\big)^{-1}VA^{-1}.$$
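As a quick numerical sanity check of Lemma 2 (our own sketch with arbitrary matrix sizes, not from the paper), the following snippet verifies the Woodbury identity on random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2

# Random well-conditioned SPD matrix A and a rank-k correction UCV.
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = A @ A.T                          # make A symmetric positive definite
U = rng.standard_normal((n, k))
C = np.diag(rng.uniform(0.5, 2.0, k))
V = rng.standard_normal((k, n))

lhs = np.linalg.inv(A + U @ C @ V)
rhs = np.linalg.inv(A) - np.linalg.inv(A) @ U @ np.linalg.inv(
    np.linalg.inv(C) + V @ np.linalg.inv(A) @ U) @ V @ np.linalg.inv(A)

print(np.allclose(lhs, rhs))          # expected: True
```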
Lemma 3
([24]). For any $t \in (0, 1)$, if $A$ is an $n \times n$ positive definite matrix and $B$ is an $n \times n$ Hermitian matrix, then the following equation holds:
$$A \,\sharp_t\, B = A\,\big(B^{-1}A\big)^{-t}.$$
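Lemma 3 can likewise be checked numerically against the standard definition of the weighted geometric mean, $A \,\sharp_t\, B = A^{1/2}(A^{-1/2}BA^{-1/2})^t A^{1/2}$ (our own sketch; random SPD matrices are used).

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, sqrtm

def random_spd(n, rng):
    """Generate a random symmetric positive definite matrix."""
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)

def geometric_mean_definition(A, B, t):
    """A #_t B via the definition A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    A_half = sqrtm(A)
    A_half_inv = np.linalg.inv(A_half)
    return A_half @ fractional_matrix_power(A_half_inv @ B @ A_half_inv, t) @ A_half

def geometric_mean_lemma3(A, B, t):
    """A #_t B via Lemma 3: A (B^{-1} A)^{-t}."""
    return A @ fractional_matrix_power(np.linalg.inv(B) @ A, -t)

rng = np.random.default_rng(1)
A, B, t = random_spd(4, rng), random_spd(4, rng), 0.3
print(np.allclose(geometric_mean_definition(A, B, t),
                  geometric_mean_lemma3(A, B, t)))   # expected: True
```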

3. Main Results

3.1. Optimization Problem

The core of the kernel method is to transform a linearly inseparable point set in a low-dimensional space into a high-dimensional space through a nonlinear transformation Φ(·), making it linearly separable. In this process, it is not necessary to know the explicit analytical expression of the nonlinear transformation; it suffices to know the inner product ⟨Φ(x), Φ(x′)⟩ of the two transformed samples x and x′, and ⟨Φ(x), Φ(x′)⟩ can be represented by a kernel function, i.e., ⟨Φ(x), Φ(x′)⟩ = k(x, x′), where k(x, x′) is a kernel function. For ease of presentation, we first derive the model using Φ(·) and then replace it with the kernel function to obtain the final kernel geometric mean metric learning algorithm. Let D_Φ^+ and D_Φ^− denote the similar pair sample set D^+ and the dissimilar pair sample set D^− mapped into the high-dimensional space F, that is,
$$D_\Phi^{+} = \big\{ (\Phi(x_{1i}), \Phi(x_{2i})) \mid \Phi(x_{1i}), \Phi(x_{2i}) \in F \text{ are in the same class}, \; i = 1, 2, \ldots, n \big\},$$
$$D_\Phi^{-} = \big\{ (\Phi(\bar{x}_{1j}), \Phi(\bar{x}_{2j})) \mid \Phi(\bar{x}_{1j}), \Phi(\bar{x}_{2j}) \in F \text{ are in different classes}, \; j = 1, 2, \ldots, m \big\}.$$
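Since the experiments later use the Gaussian kernel with a parameter p, the following sketch (our illustration; the exact parameterization $k(x, x') = \exp(-\lVert x - x'\rVert^2 / p)$ is an assumption rather than the authors' stated form) shows how a kernel matrix between two sample sets can be computed without ever forming Φ explicitly.

```python
import numpy as np

def gaussian_kernel_matrix(X, Y, p):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / p) between rows of X and rows of Y.

    The exact parameterization of p is an assumption; adjust to match the chosen kernel.
    """
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / p)

# Example: kernel matrix between 5 and 3 random samples in R^4.
rng = np.random.default_rng(2)
K = gaussian_kernel_matrix(rng.standard_normal((5, 4)), rng.standard_normal((3, 4)), p=10.0)
print(K.shape)   # (5, 3)
```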
In the high-dimensional space, the objective function (2) can be written in the following form:
$$\begin{aligned} & \min_{M_\Phi \succ 0} \sum_{(\Phi(x_{1i}),\, \Phi(x_{2i})) \in D_\Phi^{+}} d_{M_\Phi}\big(\Phi(x_{1i}), \Phi(x_{2i})\big) + \sum_{(\Phi(\bar{x}_{1j}),\, \Phi(\bar{x}_{2j})) \in D_\Phi^{-}} d_{M_\Phi^{-1}}\big(\Phi(\bar{x}_{1j}), \Phi(\bar{x}_{2j})\big) \\ = {} & \min_{M_\Phi \succ 0} \sum_{(\Phi(x_{1i}),\, \Phi(x_{2i})) \in D_\Phi^{+}} \big(\Phi(x_{1i}) - \Phi(x_{2i})\big)^T M_\Phi \big(\Phi(x_{1i}) - \Phi(x_{2i})\big) + \sum_{(\Phi(\bar{x}_{1j}),\, \Phi(\bar{x}_{2j})) \in D_\Phi^{-}} \big(\Phi(\bar{x}_{1j}) - \Phi(\bar{x}_{2j})\big)^T M_\Phi^{-1} \big(\Phi(\bar{x}_{1j}) - \Phi(\bar{x}_{2j})\big), \end{aligned}$$
where $d_{M_\Phi}\big(\Phi(x_{1i}), \Phi(x_{2i})\big) = \big(\Phi(x_{1i}) - \Phi(x_{2i})\big)^T M_\Phi \big(\Phi(x_{1i}) - \Phi(x_{2i})\big)$ represents the distance metric in the high-dimensional space and $M_\Phi$ is the metric matrix.
Rewriting the distances using traces, Equation (5) can be turned into the following optimization problem:
$$\begin{aligned} & \min_{M_\Phi \succ 0} \sum_{(\Phi(x_{1i}),\, \Phi(x_{2i})) \in D_\Phi^{+}} \operatorname{tr}\Big( M_\Phi \big(\Phi(x_{1i}) - \Phi(x_{2i})\big)\big(\Phi(x_{1i}) - \Phi(x_{2i})\big)^T \Big) + \sum_{(\Phi(\bar{x}_{1j}),\, \Phi(\bar{x}_{2j})) \in D_\Phi^{-}} \operatorname{tr}\Big( M_\Phi^{-1} \big(\Phi(\bar{x}_{1j}) - \Phi(\bar{x}_{2j})\big)\big(\Phi(\bar{x}_{1j}) - \Phi(\bar{x}_{2j})\big)^T \Big) \\ = {} & \min_{M_\Phi \succ 0} \operatorname{tr}(M_\Phi S_\Phi) + \operatorname{tr}\big(M_\Phi^{-1} D_\Phi\big) = \min_{M_\Phi \succ 0} h(M_\Phi), \end{aligned}$$
where $S_\Phi$ and $D_\Phi$ are
$$S_\Phi = \sum_{(\Phi(x_{1i}),\, \Phi(x_{2i})) \in D_\Phi^{+}} \big(\Phi(x_{1i}) - \Phi(x_{2i})\big)\big(\Phi(x_{1i}) - \Phi(x_{2i})\big)^T, \qquad D_\Phi = \sum_{(\Phi(\bar{x}_{1j}),\, \Phi(\bar{x}_{2j})) \in D_\Phi^{-}} \big(\Phi(\bar{x}_{1j}) - \Phi(\bar{x}_{2j})\big)\big(\Phi(\bar{x}_{1j}) - \Phi(\bar{x}_{2j})\big)^T.$$
Differentiating $h(M_\Phi)$ with respect to $M_\Phi$ yields
$$\nabla h(M_\Phi) = S_\Phi - M_\Phi^{-1} D_\Phi M_\Phi^{-1}.$$
Then, setting $\nabla h(M_\Phi) = 0$ implies that
$$M_\Phi S_\Phi M_\Phi = D_\Phi,$$
and it is clear that the above equation is a Riccati equation. Since $S_\Phi$ and $D_\Phi$ are usually positive definite matrices, Equation (9) has a unique positive solution, which is the midpoint of the geodesic joining $S_\Phi^{-1}$ to $D_\Phi$ [26], that is,
$$M_\Phi = S_\Phi^{-1} \,\sharp_{1/2}\, D_\Phi = S_\Phi^{-1/2}\big(S_\Phi^{1/2} D_\Phi S_\Phi^{1/2}\big)^{1/2} S_\Phi^{-1/2}.$$

3.2. Extension to Weighted Geometric Mean and Solution

To incorporate the weighted geometric mean, it is necessary to consider how to weight the two terms of the objective function. Assigning linear weights to $S_\Phi^{-1}$ and $D_\Phi$ merely scales the metric matrix $M_\Phi$ by a constant factor. Consequently, it is not meaningful to assign linear weights to the two terms in Equation (5). Nevertheless, by employing nonlinear weights derived from the Riemannian geometry of the SPD manifold, the weights can be turned into a genuine trade-off between the two terms. Thus, the Riemannian distance $\delta_R$ is introduced, and the problem of minimizing $h(M_\Phi)$ is equivalent to solving the following optimization problem:
$$\min_{M_\Phi \succ 0} \delta_R^2\big(M_\Phi, S_\Phi^{-1}\big) + \delta_R^2\big(M_\Phi, D_\Phi\big),$$
where the Riemannian distance between SPD matrices $X$ and $Y$ is $\delta_R(X, Y) = \big\lVert \log\big(Y^{-1/2} X Y^{-1/2}\big) \big\rVert_F$, and $\lVert \cdot \rVert_F$ is the Frobenius norm of a matrix. A parameter $t \in [0, 1]$ is introduced to trade off the two terms:
$$\min_{M_\Phi \succ 0} (1-t)\, \delta_R^2\big(M_\Phi, S_\Phi^{-1}\big) + t\, \delta_R^2\big(M_\Phi, D_\Phi\big) = \min_{M_\Phi \succ 0} h_t(M_\Phi).$$
Because $h_t(M_\Phi)$ is still geodesically convex [17], Equation (12) has a unique positive solution, that is,
$$M_\Phi = S_\Phi^{-1} \,\sharp_t\, D_\Phi.$$
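In the linear setting, where S and D can be formed explicitly, this closed form can be evaluated directly. The sketch below (an illustration under our own naming, with a small ridge term added for numerical safety) computes $M = S^{-1} \,\sharp_t\, D$ from the definition $A \,\sharp_t\, B = A^{1/2}(A^{-1/2}BA^{-1/2})^t A^{1/2}$; t = 1/2 recovers the geodesic midpoint of Section 3.1.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, sqrtm

def weighted_geometric_mean(A, B, t):
    """A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2} for SPD A, B."""
    A_half = sqrtm(A)
    A_half_inv = np.linalg.inv(A_half)
    return A_half @ fractional_matrix_power(A_half_inv @ B @ A_half_inv, t) @ A_half

def gmml_metric(S, D, t=0.5, reg=1e-8):
    """Closed-form (linear) GMML-style metric M = S^{-1} #_t D.

    The small ridge `reg` keeps S invertible; this regularization is our addition,
    not part of the original formulation.
    """
    S_reg = S + reg * np.eye(S.shape[0])
    return weighted_geometric_mean(np.linalg.inv(S_reg), D, t)
```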
Theorem 1.
The solution $M_\Phi = S_\Phi^{-1} \,\sharp_t\, D_\Phi$ of the objective function (5) can be rewritten as
$$M_\Phi = Q_\Phi \big(K_{PQ}^T K_{PQ}\big)^{t-1} Q_\Phi^T,$$
where the kernel matrix $K_{PQ} = P_\Phi^T Q_\Phi = K_{P_1Q_1} - K_{P_1Q_2} - K_{P_2Q_1} + K_{P_2Q_2}$,
$$P_\Phi = \big[\Phi(x_{11}), \Phi(x_{12}), \ldots, \Phi(x_{1n})\big] - \big[\Phi(x_{21}), \Phi(x_{22}), \ldots, \Phi(x_{2n})\big] = P_{\Phi 1} - P_{\Phi 2},$$
$$Q_\Phi = \big[\Phi(\bar{x}_{11}), \Phi(\bar{x}_{12}), \ldots, \Phi(\bar{x}_{1m})\big] - \big[\Phi(\bar{x}_{21}), \Phi(\bar{x}_{22}), \ldots, \Phi(\bar{x}_{2m})\big] = Q_{\Phi 1} - Q_{\Phi 2},$$
and the blocks are the $n \times m$ kernel matrices with entries
$$\big(K_{P_1Q_1}\big)_{ij} = \big\langle \Phi(x_{1i}), \Phi(\bar{x}_{1j}) \big\rangle = k(x_{1i}, \bar{x}_{1j}), \qquad \big(K_{P_1Q_2}\big)_{ij} = \big\langle \Phi(x_{1i}), \Phi(\bar{x}_{2j}) \big\rangle = k(x_{1i}, \bar{x}_{2j}),$$
$$\big(K_{P_2Q_1}\big)_{ij} = \big\langle \Phi(x_{2i}), \Phi(\bar{x}_{1j}) \big\rangle = k(x_{2i}, \bar{x}_{1j}), \qquad \big(K_{P_2Q_2}\big)_{ij} = \big\langle \Phi(x_{2i}), \Phi(\bar{x}_{2j}) \big\rangle = k(x_{2i}, \bar{x}_{2j}).$$
Proof. 
According to Lemma 1,
$$\begin{aligned} M_\Phi = S_\Phi^{-1} \,\sharp_t\, D_\Phi & = \frac{2\sin(\pi t)}{\pi}\, S_\Phi^{-1} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1}\,\big[(1-s)I + (1+s)D_\Phi^{-1} S_\Phi^{-1}\big]^{-1}\, ds \\ & = \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1}\,\big[(1-s)S_\Phi + (1+s)D_\Phi^{-1}\big]^{-1}\, ds. \end{aligned}$$
According to the definitions of $P_\Phi$ and $Q_\Phi$, $S_\Phi$ and $D_\Phi$ can be written as $S_\Phi = P_\Phi P_\Phi^T$ and $D_\Phi = Q_\Phi Q_\Phi^T$. Substituting them into Equation (14), it is clear that
$$M_\Phi = \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1}\,\big[(1-s)P_\Phi P_\Phi^T + (1+s)\big(Q_\Phi Q_\Phi^T\big)^{-1}\big]^{-1}\, ds.$$
For convenience in the following discussion, we introduce the notation
$$G = \big[(1-s)P_\Phi P_\Phi^T + (1+s)\big(Q_\Phi Q_\Phi^T\big)^{-1}\big]^{-1}.$$
From Lemma 2, it follows that
$$G = (A + UCV)^{-1} = A^{-1} - A^{-1}U\big(C^{-1} + VA^{-1}U\big)^{-1}VA^{-1},$$
where $A = (1+s)\big(Q_\Phi Q_\Phi^T\big)^{-1}$, $U = P_\Phi$, $C = (1-s)I$, and $V = P_\Phi^T$. Then, substituting into Equation (15), one has
$$\begin{aligned} M_\Phi & = \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \left[ \frac{Q_\Phi Q_\Phi^T}{1+s} - \frac{Q_\Phi Q_\Phi^T}{1+s}\, P_\Phi \left( \frac{I}{1-s} + P_\Phi^T\, \frac{Q_\Phi Q_\Phi^T}{1+s}\, P_\Phi \right)^{-1} P_\Phi^T\, \frac{Q_\Phi Q_\Phi^T}{1+s} \right] ds \\ & = \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \left[ \frac{Q_\Phi Q_\Phi^T}{1+s} - \frac{Q_\Phi}{1+s}\, K_{PQ}^T \left( \frac{I}{1-s} + \frac{K_{PQ} K_{PQ}^T}{1+s} \right)^{-1} K_{PQ}\, \frac{Q_\Phi^T}{1+s} \right] ds \\ & = \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1}\, Q_\Phi \left[ \frac{I}{1+s} - \frac{K_{PQ}^T}{1+s} \left( \frac{I}{1-s} + \frac{K_{PQ} K_{PQ}^T}{1+s} \right)^{-1} \frac{K_{PQ}}{1+s} \right] Q_\Phi^T\, ds \\ & = \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1}\, Q_\Phi \big[ (1+s)I + (1-s) K_{PQ}^T K_{PQ} \big]^{-1} Q_\Phi^T\, ds \\ & = Q_\Phi \left\{ \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \big[ (1+s)\big(K_{PQ}^T K_{PQ}\big)^{-1} K_{PQ}^T K_{PQ} + (1-s) K_{PQ}^T K_{PQ} \big]^{-1} ds \right\} Q_\Phi^T \\ & = Q_\Phi \big(K_{PQ}^T K_{PQ}\big)^{-1} \left\{ \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \Big[ (1-s)I + (1+s)\big(K_{PQ}^T K_{PQ}\big)^{-1} \Big]^{-1} ds \right\} Q_\Phi^T \\ & = Q_\Phi \Big[ \big(K_{PQ}^T K_{PQ}\big)^{-1} \,\sharp_t\, I \Big] Q_\Phi^T. \end{aligned}$$
From Lemma 3, it follows that
$$M_\Phi = Q_\Phi \big(K_{PQ}^T K_{PQ}\big)^{-1} \Big( I \cdot \big(K_{PQ}^T K_{PQ}\big)^{-1} \Big)^{-t} Q_\Phi^T = Q_\Phi \big(K_{PQ}^T K_{PQ}\big)^{t-1} Q_\Phi^T.$$
   □
Theorem 2.
The distance $d_{M_\Phi}(x, x') = \big(\Phi(x) - \Phi(x')\big)^T M_\Phi \big(\Phi(x) - \Phi(x')\big)$ between any two samples $x$ and $x'$ can be rewritten as
$$d_{M_\Phi}(x, x') = \big(K_{xQ} - K_{x'Q}\big)\big(K_{PQ}^T K_{PQ}\big)^{t-1}\big(K_{xQ}^T - K_{x'Q}^T\big),$$
where $K_{xQ} = \Phi(x)^T Q_\Phi = K_{xQ_1} - K_{xQ_2}$, $K_{x'Q} = \Phi(x')^T Q_\Phi = K_{x'Q_1} - K_{x'Q_2}$, and the blocks are the $1 \times m$ row vectors
$$K_{xQ_1} = \Phi(x)^T Q_{\Phi 1} = \big[ k(x, \bar{x}_{11}), \ldots, k(x, \bar{x}_{1m}) \big], \qquad K_{xQ_2} = \Phi(x)^T Q_{\Phi 2} = \big[ k(x, \bar{x}_{21}), \ldots, k(x, \bar{x}_{2m}) \big],$$
$$K_{x'Q_1} = \Phi(x')^T Q_{\Phi 1} = \big[ k(x', \bar{x}_{11}), \ldots, k(x', \bar{x}_{1m}) \big], \qquad K_{x'Q_2} = \Phi(x')^T Q_{\Phi 2} = \big[ k(x', \bar{x}_{21}), \ldots, k(x', \bar{x}_{2m}) \big].$$
Proof. 
According to Theorem 1,
$$\begin{aligned} d_{M_\Phi}(x, x') & = \big(\Phi(x) - \Phi(x')\big)^T Q_\Phi \big(K_{PQ}^T K_{PQ}\big)^{t-1} Q_\Phi^T \big(\Phi(x) - \Phi(x')\big) \\ & = \big(\Phi(x) - \Phi(x')\big)^T \big(Q_{\Phi 1} - Q_{\Phi 2}\big)\big(K_{PQ}^T K_{PQ}\big)^{t-1}\big(Q_{\Phi 1} - Q_{\Phi 2}\big)^T \big(\Phi(x) - \Phi(x')\big) \\ & = \big(K_{xQ_1} - K_{xQ_2} - K_{x'Q_1} + K_{x'Q_2}\big)\big(K_{PQ}^T K_{PQ}\big)^{t-1}\big(K_{xQ_1}^T - K_{xQ_2}^T - K_{x'Q_1}^T + K_{x'Q_2}^T\big) \\ & = \big(K_{xQ} - K_{x'Q}\big)\big(K_{PQ}^T K_{PQ}\big)^{t-1}\big(K_{xQ}^T - K_{x'Q}^T\big). \end{aligned}$$
   □
Equation (19) can be taken as the final distance metric of the kernel geometric mean metric learning algorithm. The algorithm is summarized in Algorithm 1.
Algorithm 1 Kernel geometric mean metric learning algorithm.
Input: Similar pair sample set D^+, dissimilar pair sample set D^−, and two samples x, x′.
Parameters: t ∈ [0, 1], the weight coefficient in Equation (13); p, the kernel parameter.
Output: d_{M_Φ}(x, x′), the distance learned by KGMML.
Step 1. Compute the kernel matrices K_{P_1Q_1}, K_{P_1Q_2}, K_{P_2Q_1}, K_{P_2Q_2}, and K_{PQ} according to Theorem 1.
Step 2. Compute the kernel matrices K_{xQ} and K_{x′Q} according to Theorem 2.
Step 3. Compute the Schur (spectral) decomposition K_{PQ}^T K_{PQ} = UΛU^T and set A = (K_{PQ}^T K_{PQ})^{t−1} = UΛ^{t−1}U^T, where the columns of U are the eigenvectors of K_{PQ}^T K_{PQ} and Λ is the diagonal matrix of its eigenvalues.
Step 4. Compute d_{M_Φ}(x, x′) = (K_{xQ} − K_{x′Q}) A (K_{xQ}^T − K_{x′Q}^T).
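The following NumPy sketch is our own illustrative transcription of Algorithm 1 (the function and variable names are ours, and the Gaussian kernel parameterization is an assumption). It computes the KGMML distance between two samples given the similar pairs (X1, X2) and dissimilar pairs (Xb1, Xb2).

```python
import numpy as np

def gaussian_kernel(X, Y, p):
    """k(x, y) = exp(-||x - y||^2 / p); the parameterization of p is an assumption."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / p)

def kgmml_distance(x, x_prime, X1, X2, Xb1, Xb2, t=0.5, p=10.0, eps=1e-12):
    """KGMML distance following Algorithm 1.

    X1[i], X2[i]   : the i-th similar pair (x_1i, x_2i)
    Xb1[j], Xb2[j] : the j-th dissimilar pair (xbar_1j, xbar_2j)
    """
    # Step 1: block kernel matrices and K_PQ = K_P1Q1 - K_P1Q2 - K_P2Q1 + K_P2Q2.
    K_PQ = (gaussian_kernel(X1, Xb1, p) - gaussian_kernel(X1, Xb2, p)
            - gaussian_kernel(X2, Xb1, p) + gaussian_kernel(X2, Xb2, p))

    # Step 2: row vectors K_xQ and K_x'Q.
    x = np.atleast_2d(x)
    x_prime = np.atleast_2d(x_prime)
    K_xQ = gaussian_kernel(x, Xb1, p) - gaussian_kernel(x, Xb2, p)
    K_xpQ = gaussian_kernel(x_prime, Xb1, p) - gaussian_kernel(x_prime, Xb2, p)

    # Step 3: spectral decomposition of K_PQ^T K_PQ and its (t - 1) power.
    # eps guards against zero eigenvalues; this small regularization is our addition.
    w, U = np.linalg.eigh(K_PQ.T @ K_PQ)
    A = U @ np.diag(np.maximum(w, eps) ** (t - 1.0)) @ U.T

    # Step 4: the learned distance.
    diff = K_xQ - K_xpQ
    return (diff @ A @ diff.T).item()
```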

4. Experiment

4.1. Experimental Setup

To verify the effectiveness of Algorithm 1, simulation experiments were conducted on 15 UCI [27] datasets, whose basic information is shown in Table 1. The datasets are divided into small and large ones according to sample size (datasets 1–11 are small and datasets 12–15 are large; a dataset with more than 3000 samples is regarded as large, and one with fewer than 3000 samples as small).
Next, the distance metric learning algorithms used for comparison are described; the proposed method is compared with existing classical algorithms, which are briefly introduced in Table 2. The implementations of these algorithms were all obtained from the original authors' homepages. The kernel function used in the experiments is the Gaussian kernel. For KGMML, the settings of t and p are given in detail in the next subsection. In this paper, 5-fold cross-validation is repeated 10 times to measure the average accuracy. Moreover, as a preprocessing step, each feature of the experimental datasets is scaled to zero mean and unit variance.
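As an illustration of how such an evaluation might be wired together (our own sketch; the authors' experimental code is not available here, and a 1-nearest-neighbor classifier is assumed), the learned KGMML distance from the earlier sketch can be plugged into a nearest-neighbor rule inside a cross-validation loop.

```python
import numpy as np

def one_nn_error(X_train, y_train, X_test, y_test, distance):
    """1-NN classification error using an arbitrary pairwise distance function."""
    errors = 0
    for x, y in zip(X_test, y_test):
        d = [distance(x, x_tr) for x_tr in X_train]
        if y_train[int(np.argmin(d))] != y:
            errors += 1
    return errors / len(y_test)

def kfold_error(X, y, make_distance, k=5, seed=0):
    """k-fold cross-validated 1-NN error; make_distance(X_tr, y_tr) returns a distance function."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for f in range(k):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(k) if g != f])
        dist = make_distance(X[train], y[train])
        errs.append(one_nn_error(X[train], y[train], X[test], y[test], dist))
    return float(np.mean(errs))
```

Here, `make_distance` would, for example, build similar and dissimilar pairs from the training fold (e.g., with the `build_pairs` helper sketched earlier) and return a closure around `kgmml_distance`.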

4.2. Parameter Sensitivity Analysis

To determine the impact of these two parameters on the KGMML algorithm, experiments are conducted on five datasets selected from the 15 datasets. Five-fold cross-validation is used to choose the best t-value through a two-step search, whose results are shown in Figure 1 and Figure 2. First, the optimal t is selected from the set {0.1, 0.3, 0.5, 0.7, 0.9}, and the result is shown in Figure 1. Second, five t-values are tested on an interval with a step size of 0.02. As shown in Figure 2, the variation of accuracy over this interval is not significant, so we choose the middle t-value of 0.05. In Figure 1 and Figure 2, t is varied while p is fixed. For the kernel parameter, the accuracy in Figure 3 has an inflection point at p = 10, where the accuracy is higher; thus, the p-value is chosen as 10.

4.3. Experimental Results

The classification error of each compared algorithm on the 15 datasets is shown in Table 3, where the best result on each dataset is shown in boldface. Compared with the other six algorithms, our algorithm achieves the best result on eight datasets: Pima, Vehicle, German, Segment, Usps, Mnist, Lymphography, and Spambase. From the experimental results, we conclude that the proposed KGMML algorithm is more accurate than the GMML algorithm on all 15 datasets. The accuracy of the KGMML algorithm improves on that of the GMML algorithm by roughly 2% to 4% on the small datasets, and the improvement is generally smaller for the large datasets. Among the small datasets, the accuracy on the Hayes-Roth dataset improves by 18.17%; among the large datasets, the accuracy on the Spambase dataset improves by 8.13%. The analysis reveals that the accuracy improvement is greater when the number of features is relatively small, for both large and small datasets. A comparison of the average running times of the GMML algorithm and the KGMML algorithm shows that the proposed algorithm does not drag down the running efficiency of the original algorithm. All experimental methods are implemented in MATLAB R2018b (64-bit), and the simulations are run on a laptop with an Intel Core i5 (2.5 GHz) processor.
To show the performance advantages of the KGMML algorithm, a score statistic is computed for the KGMML algorithm against the other classical algorithms. The scoring process is as follows. Assume that A is the result obtained by the KGMML algorithm on a certain dataset and B is the result obtained by a competing algorithm on the same dataset. First, A and B are compared using a t-test at the 5% significance level. If A > B in the statistical sense, the result A obtained by the KGMML algorithm is considered to win against the result B of the other algorithm, and the score on this dataset is recorded as “1/0/0”. If A < B, the score is recorded as “0/0/1”. If A = B, i.e., A and B are equivalent in the statistical sense, they are considered tied and the score is recorded as “0/1/0”. For example, the notation “5/0/0” would signify that the outcomes achieved by the KGMML algorithm outperform those of the other algorithm on five datasets. The per-dataset scores and the overall counts for the KGMML algorithm are listed in Table 4.
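A minimal sketch of such a comparison (illustrative only; it assumes the per-fold accuracies of the two algorithms are available as paired samples) using SciPy's paired t-test at the 5% level:

```python
import numpy as np
from scipy import stats

def win_tie_loss(acc_kgmml, acc_other, alpha=0.05):
    """Return '1/0/0', '0/1/0', or '0/0/1' from paired per-fold accuracies."""
    _, p_value = stats.ttest_rel(acc_kgmml, acc_other)
    if p_value < alpha:                       # statistically significant difference
        return "1/0/0" if np.mean(acc_kgmml) > np.mean(acc_other) else "0/0/1"
    return "0/1/0"                            # no significant difference: tie

# Example with made-up accuracies from 10 repetitions of 5-fold CV (50 values each).
rng = np.random.default_rng(3)
a = 0.80 + 0.02 * rng.standard_normal(50)
b = 0.78 + 0.02 * rng.standard_normal(50)
print(win_tie_loss(a, b))
```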

5. Conclusions

In this paper, kernel geometric mean metric learning is proposed to obtain a nonlinear distance metric through the introduction of a kernel function. Traditional metric learning methods aim to learn a global linear metric, but most data exhibit nonlinear relationships, so such methods are not well suited to nonlinear problems. The experimental results on the UCI datasets show that the proposed algorithm can effectively improve upon the accuracy of the GMML algorithm, especially on small datasets, where the relative improvement can exceed 40%, and that it can address nonlinear problems.
In practice, only a small amount of data is labeled, while most of the data remains unlabeled; how to make the best use of this large amount of unlabeled data is therefore worth investigating. Under the partial label learning framework, the true class labels of the training samples are hidden in a set of candidate labels and are no longer explicit, which makes the construction of learning algorithms more difficult than in traditional classification problems, and relatively few results are available. In future work, we will try to alleviate the problem of inaccurate similar pairs in the kernel geometric mean metric learning algorithm. Building on the partial label metric learning algorithms proposed in recent years, a partial label algorithm based on kernel geometric mean metric learning will be developed.

Author Contributions

Conceptualization, J.H.; methodology, J.H.; software, T.Y.; validation, T.Y.; formal analysis, Z.F.; investigation, Y.Z.; resources, Y.Z.; data curation, T.Y.; writing—original draft preparation, Z.F.; writing—review and editing, Z.F.; visualization, Y.Z.; supervision, Y.Z.; project administration, R.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (62102062), the Humanities and Social Science Research Project of the Ministry of Education (21YJCZH037), and the Natural Science Foundation of Liaoning Province (2020-MS-134, 2020-MZLH-29).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors would like to thank the School of Information and Communication Engineering, Dalian Minzu University for assistance with simulation verifications related to this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lu, J.; Wang, R.; Mian, A.; Ajay, K.; Sudeep, S. Distance metric learning for pattern recognition. Pattern Recognit. 2018, 75, 1–3. [Google Scholar] [CrossRef]
  2. Wei, Z.; Cui, Y.; Zhou, X.; Yang, W.; Li, Y.; Yi, X.; Dai, H. A research on metric learning in computer vision and pattern recognition. In Proceedings of the 2018 Tenth International Conference on Advanced Computational Intelligence, Xiamen, China, 29–31 March 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 254–259. [Google Scholar]
  3. Yan, Y.; Xia, J.; Sun, D.; Hu, Q. Research on combination evaluation of operational stability of energy industry innovation ecosystem based on machine learning and data mining algorithms. Energy Rep. 2022, 8, 4641–4648. [Google Scholar] [CrossRef]
  4. Wang, F.; Sun, J. Survey on distance metric learning and dimensionality reduction in data mining. Data Min. Knowl. Discov. 2015, 29, 534–564. [Google Scholar] [CrossRef]
  5. Yan, M.; Zhang, Y.; Wang, H. Tree-Based Metric Learning for Distance Computation in Data Mining. In Asia-Pacific Web Conference, Proceedings of the 17th Asia-Pacific Web Conference, APWeb 2015, Guangzhou, China, 18–20 September 2015; Springer International Publishing: Cham, Switzerland, 2015; pp. 377–388. [Google Scholar]
  6. Mojisola, F.O.; Misra, S.; Febisola, C.F.; Abayomi-Alli, O.; Sengul, G. An improved random bit-stuffing technique with a modified RSA algorithm for resisting attacks in information security. Egypt. Inform. J. 2022, 23, 291–301. [Google Scholar] [CrossRef]
  7. Kraeva, I.; Yakhyaeva, G. Application of the metric learning for security incident playbook recommendation. In Proceedings of the 2021 IEEE 22nd International Conference of Young Professionals in Electron Devices and Materials, Souzga, Russia, 30 June–4 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 475–479. [Google Scholar]
  8. Bennett, J.; Pomaznoy, M.; Singhania, A.; Peters, B. A metric for evaluating biological information in gene sets and its application to identify co-expressed gene clusters in PBMC. PLoS Comput. Biol. 2021, 17, e1009459. [Google Scholar] [CrossRef]
  9. Makrodimitris, S.; Reinders, M.J.T.; Van Ham, R.C.H.J. Metric learning on expression data for gene function prediction. Bioinformatics 2020, 36, 1182–1190. [Google Scholar] [CrossRef] [PubMed]
  10. Yuan, T.; Dong, L.; Liu, B.; Huang, J.; Xiao, C. Deep Metric Learning by Exploring Confusing Triplet Embeddings for COVID-19 Medical Images Diagnosis. In Proceedings of the Workshop on Healthcare AI and COVID-19, Baltimore, MA, USA, 22 July 2022; pp. 1–10. [Google Scholar]
  11. Jin, Y.; Lu, H.; Li, Z.; Wang, Y. A cross-modal deep metric learning model for disease diagnosis based on chest X-ray images. Multimed. Tools Appl. 2023, 82, 33421–33442. [Google Scholar] [CrossRef] [PubMed]
  12. Xing, Y.; Meyer, B.J.; Harandi, M.; Drummond, T.; Ge, Z. Multimorbidity Content-Based Medical Image Retrieval and Disease Recognition Using Multi-label Proxy Metric Learning. IEEE Access 2023, 11, 50165–50179. [Google Scholar] [CrossRef]
  13. Xing, E.; Ng, A.Y.; Jordan, M.; Russell, S.J. Distance metric learning with application to clustering with side-information. Adv. Neural Inf. Process. Syst. 2002, 15, 1–8. [Google Scholar]
  14. Davis, J.V.; Kulis, B.; Jain, P.; Dhillon, I.S. Information-theoretic metric learning. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 209–216. [Google Scholar]
  15. Wang, S.; Jin, R. An information geometry approach for distance metric learning. In Proceedings of the Artificial Intelligence and Statistics, Clearwater Beach, FL, USA, 16–18 April 2009; pp. 591–598. [Google Scholar]
  16. Weinberger, K.Q.; Saul, L.K. Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 2009, 10, 207–244. [Google Scholar]
  17. Zadeh, P.; Hosseini, R.; Sra, S. Geometric mean metric learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 2464–2471. [Google Scholar]
  18. Zhou, Y.; Gu, H. Geometric mean metric learning for partial label data. Neurocomputing 2018, 275, 394–402. [Google Scholar] [CrossRef]
  19. Mika, S.; Ratsch, G.; Weston, J.; Scholkopf, B.; Mullers, K.R. Fisher discriminant analysis with kernels. In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (cat. no. 98th8468), Madison, WI, USA, 25 August 1999; IEEE: New York, NY, USA, 1999; pp. 41–48. [Google Scholar]
  20. Li, Z.; Kruger, U.; Xie, L.; Almansoori, A.; Su, H. Adaptive KPCA modeling of nonlinear systems. IEEE Trans. Signal Process. 2015, 63, 2364–2376. [Google Scholar] [CrossRef]
  21. Lee, J.M.; Qin, S.J.; Lee, I.B. Fault detection of non-linear processes using kernel independent component analysis. Can. J. Chem. Eng. 2007, 85, 526–536. [Google Scholar] [CrossRef]
  22. Zhang, L.; Zhou, W.D.; Jiao, L.C. Kernel clustering algorithm. Chin. J. Comput.-Chin. Ed. 2002, 25, 587–590. [Google Scholar]
  23. Choi, H.; Choi, S. Kernel isomap. Electron. Lett. 2004, 40, 1612–1613. [Google Scholar] [CrossRef]
  24. Fasi, M.; Iannazzo, B. Computing the weighted geometric mean of two large-scale matrices and its inverse times a vector. SIAM J. Matrix Anal. Appl. 2018, 39, 178–203. [Google Scholar] [CrossRef]
  25. Higham, N. Accuracy and Stability of Numerical Algorithms. SIAM 2002, 258. Available online: http://en.wikipedia.org/wiki/Woodburymatrixidentity (accessed on 8 October 2023).
  26. Bhatia, R. Positive Definite Matrices; Princeton University Press: Princeton, NJ, USA, 2009. [Google Scholar]
  27. Asuncion, A.; Newman, D. UCI Machine Learning Repository. 2007. Available online: https://archive.ics.uci.edu/ (accessed on 8 October 2023).
  28. Nguyen, B.; Morell, C.; De Baets, B. Supervised distance metric learning through maximization of the Jeffrey divergence. Pattern Recognit. 2017, 64, 215–225. [Google Scholar] [CrossRef]
  29. Kedem, D.; Tyree, S.; Sha, F.; Lanckriet, G.; Weinberger, K.Q. Non-linear metric learning. Adv. Neural Inf. Process. Syst. 2012, 25, 2582–2590. [Google Scholar]
  30. Bhutani, M.; Jawanpuria, P.; Kasai, H.; Mishra, B. Low-rank geometric mean metric learning. arXiv 2018, arXiv:1806.05454. [Google Scholar]
Figure 1. Varying t (p is fixed).
Figure 2. Varying t (p is fixed).
Figure 3. Varying p (t is fixed).
Table 1. Characteristics of experimental datasets.

    Datasets           # of Features   # of Instances   # of Classes
 1  Pima               8               768              2
 2  Vehicle            18              846              4
 3  German             24              1000             2
 4  Segment            18              2310             7
 5  Heart-Disease      13              270              2
 6  Lymphography       18              148              4
 7  Liver-Disorders    6               345              2
 8  Hayes-Roth         4               160              3
 9  Ionosphere         34              351              2
10  Glasses            9               214              6
11  Balance-Scale      4               625              3
12  Usps               256             9298             10
13  Mnist              784             4000             10
14  DNA                180             3186             2
15  Spambase           57              4601             2
Table 2. Brief description of the distance metric learning method used in this paper.

    Name        Description
 1  Euclidean   The Euclidean distance metric [28].
 2  DMLMJ       Distance metric learning through maximization of the Jeffrey divergence [28].
 3  LMNN        Large margin nearest neighbor classification [16].
 4  GB-LMNN     Non-linear transformations with gradient boosting [29].
 5  GMML        Geometric mean metric learning [17].
 6  Low-rank    Low-rank geometric mean metric learning [30].
 7  KGMML       The kernelized version of GMML.
Table 3. Classification errors (%) on the UCI datasets. The best result is highlighted in boldface.

    Datasets           GMML    DMLMJ   LMNN    GB-LMNN   Euclidean   Low-Rank   KGMML
 1  Pima               27.66   30.18   33.82   37.14     27.27       29.58      25.17
 2  Vehicle            22.09   25.75   46.53   41.24     33.53       41.32      21.21
 3  German             27.41   24.79   30.50   29.32     31.53       26.96      24.40
 4  Segment            4.13    3.64    5.19    4.55      6.93        5.59       3.22
 5  Heart-Disease      20.82   19.73   31.52   18.51     33.33       22.52      18.97
 6  Lymphography       56.88   73.62   70.76   60.21     75.13       57.57      53.75
 7  Liver-Disorders    35.00   30.05   30.46   34.88     31.88       41.86      30.17
 8  Hayes-Roth         37.69   16.84   31.33   31.35     16.67       39.55      19.52
 9  Ionosphere         15.34   11.27   5.71    4.29      1.43        17.26      11.54
10  Glasses            36.96   33.08   30.20   23.30     30.23       33.64      32.47
11  Balance-Scale      12.84   8.62    12.80   15.20     14.40       12.31      9.19
12  Usps               3.72    2.88    33.11   10.60     10.55       4.08       2.82
13  Mnist              9.65    16.44   86.80   82.62     17.12       75.34      8.32
14  DNA                23.65   21.75   22.32   22.84     27.63       26.24      23.05
15  Spambase           19.33   18.27   38.60   15.80     16.09       11.43      11.20
Table 4. Win/tie/loss counts between the results obtained by the KGMML algorithm and those of the other classical algorithms.
Datasets           KGMML vs. GMML   vs. DMLMJ   vs. LMNN   vs. GB-LMNN   vs. Euclidean   vs. Low-Rank
Pima               1/0/0            1/0/0       1/0/0      1/0/0         1/0/0           1/0/0
Vehicle            0/1/0            1/0/0       1/0/0      1/0/0         1/0/0           1/0/0
German             1/0/0            0/1/0       1/0/0      1/0/0         1/0/0           1/0/0
Segment            1/0/0            0/1/0       1/0/0      1/0/0         1/0/0           1/0/0
Usps               1/0/0            0/1/0       1/0/0      1/0/0         1/0/0           1/0/0
Mnist              0/1/0            1/0/0       1/0/0      1/0/0         1/0/0           1/0/0
Glasses            1/0/0            0/1/0       0/0/1      0/0/1         0/1/0           1/0/0
DNA                0/1/0            0/0/1       0/0/1      0/1/0         1/0/0           1/0/0
Heart-Disease      1/0/0            0/1/0       1/0/0      0/1/0         1/0/0           1/0/0
Lymphography       1/0/0            1/0/0       1/0/0      1/0/0         1/0/0           1/0/0
Liver-Disorders    1/0/0            0/1/0       0/1/0      1/0/0         0/1/0           1/0/0
Hayes-Roth         1/0/0            0/0/1       1/0/0      1/0/0         0/0/1           1/0/0
Ionosphere         1/0/0            0/1/0       0/0/1      0/0/1         0/0/1           1/0/0
Spambase           1/0/0            1/0/0       1/0/0      1/0/0         1/0/0           0/1/0
Balance            1/0/0            0/1/0       1/0/0      1/0/0         1/0/0           1/0/0
Total              12/3/0           5/8/2       11/1/3     9/2/2         11/2/2          14/1/0
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Cite as: Feng, Z.; Yun, T.; Zhou, Y.; Zheng, R.; He, J. Kernel Geometric Mean Metric Learning. Appl. Sci. 2023, 13, 12047. https://0-doi-org.brum.beds.ac.uk/10.3390/app132112047
