
A Unified Scalable Equivalent Formulation for Schatten Quasi-Norms

1 Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
2 Peng Cheng Laboratory, Shenzhen 518066, China
* Author to whom correspondence should be addressed.
Submission received: 7 July 2020 / Revised: 5 August 2020 / Accepted: 5 August 2020 / Published: 10 August 2020
(This article belongs to the Section Mathematics and Computer Science)

Abstract

The Schatten quasi-norm is an approximation of the rank that is tighter than the nuclear norm. However, most Schatten quasi-norm minimization (SQNM) algorithms suffer from the high computational cost of computing the singular value decomposition (SVD) of large matrices at each iteration. In this paper, we prove that for any $p, p_1, p_2 > 0$ satisfying $1/p = 1/p_1 + 1/p_2$, the Schatten $p$-(quasi-)norm of any matrix is equivalent to minimizing the product of the Schatten $p_1$-(quasi-)norm and Schatten $p_2$-(quasi-)norm of its two much smaller factor matrices. Then, we present and prove the equivalence between the product and its weighted sum formulations for two cases: $p_1 = p_2$ and $p_1 \neq p_2$. In particular, when $p > 1/2$, there is an equivalence between the Schatten $p$-quasi-norm of any matrix and the Schatten $2p$-norms of its two factor matrices. We further extend the theoretical results of two factor matrices to the cases of three and more factor matrices, from which we can see that for any $0 < p < 1$, the Schatten $p$-quasi-norm of any matrix is the minimization of the mean of the Schatten $(\lfloor 1/p \rfloor + 1)p$-norms of $\lfloor 1/p \rfloor + 1$ factor matrices, where $\lfloor 1/p \rfloor$ denotes the largest integer not exceeding $1/p$.

1. Introduction

The affine rank minimization problem arises directly in various areas of science and engineering, including statistics, machine learning, information theory, data mining, medical imaging, and computer vision. Some representative applications include matrix completion [1], robust principal component analysis (RPCA) [2], low-rank representation [3], multivariate regression [4], multi-task learning [5], and system identification [6]. To efficiently solve such problems, one usually relaxes the rank function to its tractable convex envelope, that is, the nuclear norm $\|\cdot\|_*$ (the sum of the singular values, also known as the trace norm or Schatten 1-norm), which leads to a convex optimization problem [1,7,8,9]. In fact, the nuclear norm of a matrix is the $\ell_1$-norm of the vector of its singular values, and thus it promotes a low-rank solution. However, it has been shown in [10,11] that the $\ell_1$-norm over-penalizes large entries of vectors and therefore yields a biased solution. By the relationship between the $\ell_1$-norm and the nuclear norm, the nuclear norm penalty shrinks all singular values equally, which likewise over-penalizes large singular values [12]. That is, the nuclear norm may make the solution deviate from the original solution, just as the $\ell_1$-norm does. Unlike the nuclear norm, the Schatten $p$-quasi-norm with $0 < p < 1$ is non-convex, and it gives a closer approximation to the rank function. Thus, Schatten quasi-norm minimization (SQNM) has received a significant amount of attention from researchers in various communities, such as image recovery [13], collaborative filtering [14,15], and MRI analysis [16].
Recently, several iteratively re-weighted least squares (IRLS) algorithms, as in [17,18], have been proposed to approximately solve Schatten quasi-norm minimization problems. Lu et al. [13] proposed a family of iteratively re-weighted nuclear norm (IRNN) algorithms to solve various non-convex surrogate (including Schatten quasi-norm) minimization problems. In [13,14,19,20], the Schatten quasi-norm has been shown to be empirically superior to the nuclear norm for many real-world problems. Moreover, [21] theoretically proved that SQNM requires significantly fewer measurements than nuclear norm minimization as in [1,9]. However, all of these algorithms are iterative and involve a singular value decomposition (SVD) in each iteration. Thus, they suffer from a high computational complexity of $O(\min(m,n)\,mn)$ and are not applicable to large-scale matrix recovery problems [22].
It is well known that the nuclear norm has a scalable equivalent formulation (also called the bilinear spectral penalty [9,23,24]), which has been successfully applied in many applications, such as collaborative filtering [15,25,26]. Zuo et al. [27] proposed a generalized shrinkage-thresholding operator to iteratively solve $\ell_p$ quasi-norm minimization with $0 \le p < 1$. Since the Schatten $p$-quasi-norm of a matrix equals the $\ell_p$ quasi-norm of its vector of singular values, we may naturally ask the following question: can we design a unified scalable equivalent formulation of the Schatten $p$-quasi-norm for arbitrary $p$ (i.e., $0 < p < 1$)?
In this paper (note that the first version [28] of this paper appeared on arXiv, and in the current version we have included experimental results and fixed a few typos), we first present and prove the equivalence between the Schatten $p$-(quasi-)norm of any matrix and the minimization of the product of the Schatten $p_1$-(quasi-)norm and Schatten $p_2$-(quasi-)norm of its two smaller factor matrices, for any $p, p_1, p_2 > 0$ satisfying $1/p = 1/p_1 + 1/p_2$. Moreover, we prove the equivalence between the product formulation of the Schatten (quasi-)norm and its weighted sum formulation for the two cases of $p_1$ and $p_2$: $p_1 = p_2$ and $p_1 \neq p_2$. When $p > 1/2$ and by setting the same value for $p_1$ and $p_2$, there is an equivalence between the Schatten $p$-(quasi-)norm of any matrix and the Schatten $2p$-norms of its two factor matrices, where a representative example is the equivalent formulation of the nuclear norm, i.e.,
$$\|X\|_{*} = \min_{U,V:\, X = UV^{T}} \frac{\|U\|_F^{2} + \|V\|_F^{2}}{2}.$$
That is, SQNM problems with $p > 1/2$ can be transformed into problems only involving the smooth, convex norms of two factor matrices, which can lead to simpler and more efficient algorithms (e.g., [12,22,29,30]). Clearly, the resulting algorithms reduce the per-iteration cost from $O(\min(m,n)\,mn)$ to $O(mnd)$, where $d \ll m, n$ in general.
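As a quick numerical sanity check of this formulation (our own illustrative snippet, not part of the original paper; the matrix sizes and rank are arbitrary), the balanced factorization $U = L_X\Sigma_X^{1/2}$, $V = R_X\Sigma_X^{1/2}$ attains the nuclear norm, while any other feasible factorization can only increase the penalty:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, d = 60, 40, 5, 10                       # arbitrary sizes, rank r <= d
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

nuc = np.linalg.norm(X, ord='nuc')               # ||X||_* = sum of singular values

# Balanced factorization from the top-d SVD: U = L_X Sigma^{1/2}, V = R_X Sigma^{1/2}.
L, s, Rt = np.linalg.svd(X, full_matrices=False)
U = L[:, :d] * np.sqrt(s[:d])
V = Rt[:d, :].T * np.sqrt(s[:d])
balanced = 0.5 * (np.linalg.norm(U, 'fro')**2 + np.linalg.norm(V, 'fro')**2)

# Any other feasible factorization X = U2 V2^T only increases the penalty.
G = rng.standard_normal((d, d)) + 3 * np.eye(d)  # an invertible "gauge" matrix
U2, V2 = U @ G, V @ np.linalg.inv(G).T           # U2 V2^T = U V^T = X still holds
other = 0.5 * (np.linalg.norm(U2, 'fro')**2 + np.linalg.norm(V2, 'fro')**2)

print(nuc, balanced, other)                      # nuc ~ balanced, and other >= nuc
```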
We further extend the theoretical results of two factor matrices to the cases of three and more factor matrices, from which we can see that for any $0 < p < 1$, the Schatten $p$-quasi-norm of any matrix is equivalent to the minimization of the mean of the Schatten $(\lfloor 1/p \rfloor + 1)p$-norms of $\lfloor 1/p \rfloor + 1$ factor matrices, where $\lfloor 1/p \rfloor$ denotes the largest integer not exceeding $1/p$. Note that the norms of all the smaller factor matrices are convex and smooth. Besides the theoretical results, we also present several representative examples for two and three factor matrices. In fact, the bi-nuclear and Frobenius/nuclear quasi-norms defined in our previous work [22] and the tri-nuclear quasi-norm defined in our previous work [29] are three important special cases of our unified scalable formulations for Schatten quasi-norms.

2. Notations and Background

In this section, we present the definition of the Schatten p-norm and discuss some related work about Schatten p-norm minimization. We first briefly recall the definitions of both the norm and the quasi-norm.
Definition 1.
A quasi-norm $\|\cdot\|$ on a vector space $\mathcal{X}$ is a real-valued map $\|\cdot\|: \mathcal{X} \to [0, \infty)$ satisfying the following conditions:
  • $\|x\| \ge 0$ for any vector $x \in \mathcal{X}$, and $\|x\| = 0$ if and only if $x = 0$.
  • $\|\alpha x\| = |\alpha|\,\|x\|$ for any scalar $\alpha \in \mathbb{R}$ (or $\mathbb{C}$) and any $x \in \mathcal{X}$.
  • There is a constant $c \ge 1$ such that for any $x_1, x_2 \in \mathcal{X}$ we have $\|x_1 + x_2\| \le c\,(\|x_1\| + \|x_2\|)$ (relaxed triangle inequality).
If the triangle inequality is replaced by $\|x_1 + x_2\| \le \|x_1\| + \|x_2\|$, then $\|\cdot\|$ becomes a norm.
Definition 2.
The Schatten $p$-norm ($0 < p < \infty$) of a matrix $X \in \mathbb{R}^{m\times n}$ (without loss of generality, we assume that $m \ge n$) is defined as
$$\|X\|_{S_p} = \left( \sum_{i=1}^{n} \sigma_i^{p}(X) \right)^{1/p},$$
where $\sigma_i(X)$ denotes the $i$-th singular value of $X$.
When $p \ge 1$, Definition 2 defines a natural norm, whereas for $0 < p < 1$ it defines a quasi-norm. For instance, the Schatten 1-norm is the well-known nuclear norm, $\|X\|_*$, and the Schatten 2-norm is the Frobenius norm, $\|X\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}^{2}}$. As a non-convex surrogate for the rank function, the Schatten $p$-quasi-norm is a better approximation than the nuclear norm [21], analogous to the superiority of the $\ell_p$ quasi-norm over the $\ell_1$-norm [18,31].
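To make the definition concrete, the following small NumPy helper (our own illustrative code; the function name is not from the paper) evaluates the Schatten $p$-(quasi-)norm directly from the singular values:

```python
import numpy as np

def schatten_norm(X, p):
    """Schatten p-(quasi-)norm: (sum_i sigma_i(X)^p)^(1/p) for 0 < p < inf."""
    s = np.linalg.svd(X, compute_uv=False)        # singular values of X
    return (s ** p).sum() ** (1.0 / p)

X = np.random.randn(8, 5)
print(schatten_norm(X, 1.0))                                          # nuclear norm
print(np.isclose(schatten_norm(X, 2.0), np.linalg.norm(X, 'fro')))    # Frobenius norm
print(schatten_norm(X, 0.5))                                          # Schatten 1/2 quasi-norm
```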
To recover a low-rank matrix from a small set of linear observations $b \in \mathbb{R}^{l}$, the general SQNM problem is formulated as follows:
$$\min_{X \in \mathbb{R}^{m\times n}} \ \|X\|_{S_p}^{p}, \quad \text{subject to } \mathcal{A}(X) = b, \tag{2}$$
where $\mathcal{A}: \mathbb{R}^{m\times n} \to \mathbb{R}^{l}$ is a general linear operator. Alternatively, the Lagrangian version of (2) is
$$\min_{X \in \mathbb{R}^{m\times n}} \ \lambda \|X\|_{S_p}^{p} + f\big(\mathcal{A}(X) - b\big), \tag{3}$$
where $\lambda > 0$ is a regularization parameter and $f(\cdot): \mathbb{R}^{l} \to \mathbb{R}$ is a loss function measuring the residual $\mathcal{A}(X) - b$. For instance, in matrix completion [13,17,20,32], $\mathcal{A}$ is the projection operator $\mathcal{P}_\Omega$ and $f(\cdot) = \|\cdot\|_2^{2}$, where $\mathcal{P}_\Omega$ is the orthogonal projection onto the linear subspace of matrices supported on $\Omega := \{(i,j)\,|\, D_{ij} \text{ is observed}\}$, that is, $\mathcal{P}_\Omega(D)_{ij} = D_{ij}$ if $(i,j) \in \Omega$, and $\mathcal{P}_\Omega(D)_{ij} = 0$ otherwise. For RPCA problems [2,33,34,35,36], $\mathcal{A}$ is the identity operator and $f(\cdot) = \|\cdot\|_1$. In the multivariate regression problem in [37], $\mathcal{A}(X) = AX$ with $A$ a given matrix, and $f(\cdot) = \|\cdot\|_F^{2}$. In addition, $f(\cdot)$ may be chosen as the hinge loss in [23] or the $\ell_p$ quasi-norm in [14].
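For the matrix completion case, the projection $\mathcal{P}_\Omega$ simply zeroes out the unobserved entries, so the data-fitting term can be evaluated with an element-wise mask. The short sketch below (our own code; the mask and data are synthetic) illustrates this:

```python
import numpy as np

def P_Omega(M, mask):
    """Orthogonal projection onto matrices supported on Omega (mask == True)."""
    return np.where(mask, M, 0.0)

rng = np.random.default_rng(1)
D = rng.standard_normal((6, 4))       # synthetic "data" matrix
mask = rng.random((6, 4)) < 0.5       # Omega: roughly 50% of the entries observed
X = rng.standard_normal((6, 4))       # a candidate estimate

loss = 0.5 * np.linalg.norm(P_Omega(X - D, mask), 'fro') ** 2
print(loss)
```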
Generally, SQNM problems (e.g., (2) and (3)) are non-convex, non-smooth, and even non-Lipschitz [38]. So far, only a few algorithms (e.g., IRLS [17,18] and IRNN [13]) have been developed to solve such challenging problems. However, since these SQNM algorithms involve the SVD of large matrices in each iteration, they suffer from a high computational cost of $O(mn^{2})$, which severely limits their applicability to large-scale problems. While there have been many efforts towards fast SVD computation, such as partial SVD [39], the performance of these methods is still unsatisfactory for many real applications [40]. As an alternative to reduce the computational complexity of the SVD of a large matrix, one can factorize $X$ into two smaller factor matrices (i.e., $X = UV^{T}$). According to the unitary invariance of these norms, (2) and (3) can be reformulated into two much smaller matrix optimization problems as in [23,41,42,43,44], which are still non-convex and non-smooth. Therefore, it is very important to determine how to transform challenging problems such as (2) and (3) into more tractable ones, which can be solved by simpler and more efficient algorithms as in [30,43].

3. A Unified Formulation for Schatten Quasi-Norms

In this section, we first present and prove an equivalence relation between the Schatten $p$-(quasi-)norm of any matrix and the Schatten $p_1$-(quasi-)norm and Schatten $p_2$-(quasi-)norm of its two factor matrices, where $1/p = 1/p_1 + 1/p_2$ with any $p_1 > 0$ and $p_2 > 0$. Moreover, we prove the equivalence between the product formulation of the Schatten (quasi-)norm and its weighted sum formulation for the two cases $p_1 = p_2$ and $p_1 \neq p_2$, respectively. For any $1/2 < p \le 1$, the Schatten $p$-(quasi-)norm of any matrix is equivalent to minimizing the mean of the Schatten $2p$-norms of both factor matrices, which can lead to simpler and more efficient algorithms. Finally, we extend the theoretical results of two factor matrices to the cases of three and more factor matrices. One can see that for any $0 < p < 1$, the Schatten $p$-(quasi-)norm of any matrix is the minimization of the mean of the Schatten $(\lfloor 1/p \rfloor + 1)p$-norms of $\lfloor 1/p \rfloor + 1$ factor matrices.

3.1. Unified Schatten Quasi-Norm Formulations of Two Factor Matrices

In this subsection, we present some unified Schatten quasi-norm scalable formulations for the case of two factor matrices.
Theorem 1.
Each matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$ can be decomposed into the product of two much smaller matrices $U \in \mathbb{R}^{m\times d}$ and $V \in \mathbb{R}^{n\times d}$, i.e., $X = UV^{T}$. For any $0 < p \le 1$, $p_1 > 0$ and $p_2 > 0$ satisfying $1/p_1 + 1/p_2 = 1/p$, we have
$$\|X\|_{S_p} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{p_1}} \|V\|_{S_{p_2}}. \tag{4}$$
The proof of Theorem 1 is provided in Section 5.1. From Theorem 1, it is clear that for any $0 < p \le 1$ and $p_1, p_2 > 0$ satisfying $1/p = 1/p_1 + 1/p_2$, the Schatten $p$-(quasi-)norm of any matrix $X$ is equivalent to minimizing the product of the Schatten $p_1$-(quasi-)norm and Schatten $p_2$-(quasi-)norm of its two factor matrices.
In fact, $p_1$ and $p_2$ may take the same value (i.e., $p_1 = p_2 = 2p$) or different values (i.e., $p_1 \neq p_2$). Next, we discuss the two cases for $p_1$ and $p_2$, that is, $p_1 = p_2$ and $p_1 \neq p_2$.

3.1.1. The Case of $p_1 = p_2$

We first discuss the case of $p_1 = p_2$. In fact, for any given $0 < p \le 1$, there exist infinitely many pairs of positive numbers $p_1$ and $p_2$ satisfying $1/p_1 + 1/p_2 = 1/p$ such that the equality (4) holds. By setting the same value for $p_1$ and $p_2$ (i.e., $p_1 = p_2 = 2p$), we give a unified scalable equivalent formulation for the Schatten $p$-(quasi-)norm as follows.
Theorem 2.
Given any matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$, the following equalities hold:
$$\|X\|_{S_p} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{2p}} \|V\|_{S_{2p}} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \left( \frac{\|U\|_{S_{2p}}^{2p} + \|V\|_{S_{2p}}^{2p}}{2} \right)^{1/p}. \tag{5}$$
Remark 1.
The proof of Theorem 2 is provided in Section 5.2. From the equality in (5), we know that for any $0 < p \le 1$, the Schatten $p$-(quasi-)norm minimization problems arising in many low-rank matrix completion and recovery applications can be transformed into problems of minimizing the mean of the Schatten $2p$-(quasi-)norms of two smaller factor matrices. It should be noted that when $1/2 < p \le 1$, the norms of the two smaller factor matrices are convex and smooth (see Example 2 below) because $2p > 1$, which can lead to simpler and more efficient algorithms as in [43].
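As a simple numerical illustration of Theorem 2 (our own check, not part of the paper; the value of $p$ and all sizes are arbitrary), the factorization $U = L_X\Sigma_X^{1/2}$, $V = R_X\Sigma_X^{1/2}$ used in the proof attains both expressions in (5):

```python
import numpy as np

def schatten(M, q):
    """Schatten q-(quasi-)norm of M."""
    s = np.linalg.svd(M, compute_uv=False)
    return (s ** q).sum() ** (1.0 / q)

rng = np.random.default_rng(2)
p, r, d = 0.75, 4, 6                               # 1/2 < p <= 1, rank r <= d
X = rng.standard_normal((30, r)) @ rng.standard_normal((r, 20))

L, s, Rt = np.linalg.svd(X, full_matrices=False)
U = L[:, :d] * s[:d] ** 0.5                        # U = L_X Sigma_X^{1/2}
V = Rt[:d, :].T * s[:d] ** 0.5                     # V = R_X Sigma_X^{1/2}

lhs = schatten(X, p)
prod = schatten(U, 2 * p) * schatten(V, 2 * p)
mean = ((schatten(U, 2 * p) ** (2 * p) + schatten(V, 2 * p) ** (2 * p)) / 2) ** (1 / p)
print(lhs, prod, mean)                             # all three values coincide
```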
When $p = 1$ and $p_1 = p_2 = 2$, the equalities in Theorem 2 take the following form [23].
Corollary 1.
Given any matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$, the following equalities hold:
$$\|X\|_{*} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_F \|V\|_F = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \frac{1}{2}\left( \|U\|_F^{2} + \|V\|_F^{2} \right). \tag{6}$$
The bilinear spectral penalty in (6) has been widely used in many low-rank matrix completion and recovery problems, such as collaborative filtering [9,23], RPCA [45], online RPCA [46], and image recovery [47]. Note that the well-known equivalent formulation of the nuclear norm in Corollary 1 is just a special case of Theorem 2 with $p = 1$ and $p_1 = p_2 = 2$. Moreover, we give two more representative examples for the case of $p_1 = p_2$.
Example 1.
When $p = 1/2$, by setting $p_1 = p_2 = 1$ and using Theorem 1, we have the following property, as in our previous work [22,29].
Property 1.
For any matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$, the following equalities hold:
$$\|X\|_{S_{1/2}} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{*} \|V\|_{*} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \left( \frac{\|U\|_{*} + \|V\|_{*}}{2} \right)^{2}.$$
In [22,29], the scalable formulations in the above equalities are known as the bi-nuclear quasi-norm. In other words, the bi-nuclear quasi-norm is also a special case of Theorem 2 with $p = 1/2$ and $p_1 = p_2 = 1$.
Example 2.
When $p = 2/3$, by setting $p_1 = p_2 = 4/3$ and using Theorem 1, we have the following property.
Property 2.
For any matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$, the following equalities hold:
$$\|X\|_{S_{2/3}} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{4/3}} \|V\|_{S_{4/3}} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \left( \frac{\|U\|_{S_{4/3}}^{4/3} + \|V\|_{S_{4/3}}^{4/3}}{2} \right)^{3/2}.$$

3.1.2. The Case of $p_1 \neq p_2$

In this part, we discuss the case of $p_1 \neq p_2$. Unlike the case of $p_1 = p_2$, we may set infinitely many different pairs of values for $p_1$ and $p_2$. For any given $0 < p \le 1$, there must exist $p_1, p_2 > 0$, at least one of which is no less than 1, such that $1/p = 1/p_1 + 1/p_2$. Indeed, for any $0 < p \le 1$, the values of $p_1$ and $p_2$ may be different (e.g., $p_1 = 1$ and $p_2 = 2$ for $p = 2/3$), and thus we give the following unified scalable equivalent formulations for the Schatten $p$-(quasi-)norm. Note that after our work [28] appeared on arXiv, [12] also presented the same result as in Theorem 3 independently.
Theorem 3.
Given any matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$, and any $0 < p \le 1$, $p_1 > 0$ and $p_2 > 0$ satisfying $1/p_1 + 1/p_2 = 1/p$, the following equalities hold:
$$\begin{aligned}\|X\|_{S_p} &= \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{p_1}} \|V\|_{S_{p_2}} = \min_{U,V:\, X = UV^{T}} \left( \frac{p_2 \|U\|_{S_{p_1}}^{p_1} + p_1 \|V\|_{S_{p_2}}^{p_2}}{p_1 + p_2} \right)^{1/p}\\ &= \min_{U,V:\, X = UV^{T}} \left( \frac{\|U\|_{S_{p_1}}^{p_1}/p_1 + \|V\|_{S_{p_2}}^{p_2}/p_2}{1/p} \right)^{1/p}. \end{aligned} \tag{8}$$
Remark 2.
The proof of Theorem 3 is given in Section 5.3. From Theorem 3, we know that Theorem 2 and Corollary 1 can be viewed as two special cases of Theorem 3 with $p_1 = p_2 = 2p$ and $p_1 = p_2 = 2$, respectively. That is, Theorem 3 is the more general form of Theorem 2 and Corollary 1. From the equality in (8), we can see that for any $0 < p \le 1$, the Schatten $p$-(quasi-)norm minimization problem can be transformed into one of minimizing the weighted sum of the Schatten $p_1$-(quasi-)norm and Schatten $p_2$-(quasi-)norm of two smaller factor matrices (see Examples 3 and 4 below), where the weights of the two terms are $p_2/(p_1+p_2)$ and $p_1/(p_1+p_2)$, respectively.
Below we give two representative examples for the case of p 1 p 2 .
Example 3.
When $p = 2/3$, by setting $p_1 = 1$ and $p_2 = 2$, we have the following property, as in our previous work [22].
Property 3.
Given any matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$, the following equalities hold:
$$\|X\|_{S_{2/3}} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{*} \|V\|_F = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \left( \frac{2\|U\|_{*} + \|V\|_F^{2}}{3} \right)^{3/2}. \tag{9}$$
The scalable formulation in (9) is known as the Frobenius/nuclear hybrid quasi-norm defined in our previous work [22]. It is clear that the Frobenius/nuclear hybrid quasi-norm is also a special case of Theorem 3 with $p = 2/3$, $p_1 = 1$, and $p_2 = 2$. As shown in [22,29], we can thus design more efficient algorithms to solve Schatten $p$-quasi-norm minimization with $1/2 \le p < 1$.
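Analogously, a brief numerical check of Property 3 (our own snippet; sizes are arbitrary): with $p = 2/3$, $p_1 = 1$, $p_2 = 2$, the factorization $U = L_X\Sigma_X^{2/3}$, $V = R_X\Sigma_X^{1/3}$ from the proof of Theorem 1 attains both expressions in (9):

```python
import numpy as np

def schatten(M, q):
    s = np.linalg.svd(M, compute_uv=False)
    return (s ** q).sum() ** (1.0 / q)

rng = np.random.default_rng(3)
r, d = 4, 6
X = rng.standard_normal((25, r)) @ rng.standard_normal((r, 15))

L, s, Rt = np.linalg.svd(X, full_matrices=False)
U = L[:, :d] * s[:d] ** (2 / 3)                    # U = L_X Sigma_X^{p/p1} = L_X Sigma_X^{2/3}
V = Rt[:d, :].T * s[:d] ** (1 / 3)                 # V = R_X Sigma_X^{p/p2} = R_X Sigma_X^{1/3}

lhs = schatten(X, 2 / 3)
prod = np.linalg.norm(U, ord='nuc') * np.linalg.norm(V, 'fro')
wsum = ((2 * np.linalg.norm(U, ord='nuc') + np.linalg.norm(V, 'fro') ** 2) / 3) ** 1.5
print(lhs, prod, wsum)                             # the three values coincide
```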
Example 4.
When $p = 2/5$, by setting $p_1 = 1/2$ and $p_2 = 2$, we have the following property.
Property 4.
Given any matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$, the following equalities hold:
$$\|X\|_{S_{2/5}} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{1/2}} \|V\|_F = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \left( \frac{4\|U\|_{S_{1/2}}^{1/2} + \|V\|_F^{2}}{5} \right)^{5/2}.$$

3.2. Extensions to Multiple Factor Matrices

In this subsection, we extend the unified Schatten quasi-norm formulations of two factor matrices to the case of multiple factor matrices.
Theorem 4.
Each matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$ can be decomposed into the product of three much smaller matrices $U \in \mathbb{R}^{m\times d}$, $V \in \mathbb{R}^{d\times d}$ and $W \in \mathbb{R}^{n\times d}$, that is, $X = UVW^{T}$. For any $0 < p \le 1$ and $p_i > 0$ for all $i = 1, 2, 3$ satisfying $1/p_1 + 1/p_2 + 1/p_3 = 1/p$, the following equalities hold:
$$\begin{aligned}\|X\|_{S_p} &= \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{d\times d},\, W \in \mathbb{R}^{n\times d}:\, X = UVW^{T}} \|U\|_{S_{p_1}} \|V\|_{S_{p_2}} \|W\|_{S_{p_3}}\\ &= \min_{U,V,W:\, X = UVW^{T}} \left( \frac{\|U\|_{S_{p_1}}^{p_1}/p_1 + \|V\|_{S_{p_2}}^{p_2}/p_2 + \|W\|_{S_{p_3}}^{p_3}/p_3}{1/p} \right)^{1/p}. \end{aligned} \tag{10}$$
The proof of Theorem 4 is provided in Section 5.4. From Theorem 4, one can see that for any $0 < p \le 1$ and $p_1, p_2, p_3 > 0$ satisfying $1/p_1 + 1/p_2 + 1/p_3 = 1/p$, the Schatten $p$-(quasi-)norm of any matrix is equivalent to minimizing the weighted sum of the Schatten $p_1$-(quasi-)norm, Schatten $p_2$-(quasi-)norm, and Schatten $p_3$-(quasi-)norm of the three smaller factor matrices, where the weights of the three terms are $p/p_1$, $p/p_2$, and $p/p_3$, respectively. Similarly, we extend the results in Theorem 4 to the case of more factor matrices as follows.
Theorem 5.
Each matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$ can be decomposed into the product of multiple much smaller matrices $U_i$, $i = 1, 2, \ldots, M$ (i.e., $X = \prod_{i=1}^{M} U_i$). For any $0 < p \le 1$ and $p_i > 0$ for all $i = 1, 2, \ldots, M$ satisfying $\sum_{i=1}^{M} 1/p_i = 1/p$, the following equalities hold:
$$\|X\|_{S_p} = \min_{\{U_i\}:\, X = \prod_{i=1}^{M} U_i} \ \prod_{i=1}^{M} \|U_i\|_{S_{p_i}} = \min_{\{U_i\}:\, X = \prod_{i=1}^{M} U_i} \left( \frac{\sum_{i=1}^{M} \|U_i\|_{S_{p_i}}^{p_i}/p_i}{1/p} \right)^{1/p}.$$
The proof of Theorem 5 is very similar to that of Theorem 4 and is thus omitted. From Theorem 5, we know that for any $0 < p \le 1$ and $p_i > 0$ for all $i = 1, 2, \ldots, M$ satisfying $\sum_{i=1}^{M} 1/p_i = 1/p$, the Schatten $p$-(quasi-)norm of any matrix is equivalent to minimizing the weighted sum of the Schatten $p_i$-(quasi-)norms of the smaller factor matrices, where the weights of these terms are $p/p_i$ for all $i = 1, 2, \ldots, M$.
Similar to the case of two factor matrices, for any given $0 < p \le 1$, there exist infinitely many positive numbers $p_1$, $p_2$, and $p_3$ such that $1/p_1 + 1/p_2 + 1/p_3 = 1/p$ and the equality (10) holds. By setting the same value for $p_1$, $p_2$, and $p_3$ (i.e., $p_1 = p_2 = p_3 = 3p$), we give the following unified scalable equivalent formulations for the Schatten $p$-(quasi-)norm.
Corollary 2.
Given any matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$, the following equalities hold:
$$\begin{aligned}\|X\|_{S_p} &= \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{d\times d},\, W \in \mathbb{R}^{n\times d}:\, X = UVW^{T}} \|U\|_{S_{3p}} \|V\|_{S_{3p}} \|W\|_{S_{3p}}\\ &= \min_{U,V,W:\, X = UVW^{T}} \left( \frac{\|U\|_{S_{3p}}^{3p} + \|V\|_{S_{3p}}^{3p} + \|W\|_{S_{3p}}^{3p}}{3} \right)^{1/p}. \end{aligned} \tag{12}$$
Remark 3.
The proof of Corollary 2 is provided in Section 5.5. From the equality in (12), we know that for any $0 < p < 1$, Schatten $p$-quasi-norm minimization can be transformed into the problem of minimizing the mean of the Schatten $3p$-norms (or quasi-norms) of three much smaller factor matrices. We also note that when $1/3 < p \le 1$, the norms of the three factor matrices are convex and smooth because $3p > 1$, which can also lead to simpler and more efficient algorithms.
Example 5.
Below we give a representative example. When $p = 1/3$ and $p_1 = p_2 = p_3 = 1$, the equalities in Corollary 2 take the following form, as in our previous work [29].
Property 5.
For any matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$, the following equalities hold:
$$\|X\|_{S_{1/3}} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{d\times d},\, W \in \mathbb{R}^{n\times d}:\, X = UVW^{T}} \|U\|_{*} \|V\|_{*} \|W\|_{*} = \min_{U,V,W:\, X = UVW^{T}} \left( \frac{\|U\|_{*} + \|V\|_{*} + \|W\|_{*}}{3} \right)^{3}.$$
From Property 5, we can see that the tri-nuclear quasi-norm defined in our previous work [29] is also a special case of Corollary 2.
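A brief numerical illustration of Property 5 (our own check; sizes are arbitrary): with $U = L_X\Sigma_X^{1/3}$, $V = \Sigma_X^{1/3}$, $W = R_X\Sigma_X^{1/3}$ as in the proof of Corollary 2, both expressions above evaluate to $\|X\|_{S_{1/3}}$:

```python
import numpy as np

def schatten(M, q):
    s = np.linalg.svd(M, compute_uv=False)
    return (s ** q).sum() ** (1.0 / q)

rng = np.random.default_rng(4)
r, d = 3, 5
X = rng.standard_normal((20, r)) @ rng.standard_normal((r, 12))

L, s, Rt = np.linalg.svd(X, full_matrices=False)
U = L[:, :d] * s[:d] ** (1 / 3)          # U = L_X Sigma_X^{1/3}
V = np.diag(s[:d] ** (1 / 3))            # V = Sigma_X^{1/3}
W = Rt[:d, :].T * s[:d] ** (1 / 3)       # W = R_X Sigma_X^{1/3}
assert np.allclose(U @ V @ W.T, X)       # X = U V W^T

lhs = schatten(X, 1 / 3)
prod = np.prod([np.linalg.norm(M, ord='nuc') for M in (U, V, W)])
mean = (sum(np.linalg.norm(M, ord='nuc') for M in (U, V, W)) / 3) ** 3
print(lhs, prod, mean)                   # the three values coincide
```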
Theorem 2 shows that for any $p$ satisfying $1/2 < p \le 1$, the Schatten $p$-(quasi-)norm of any matrix is equivalent to minimizing the mean of the Schatten $2p$-norms of its two much smaller factor matrices, and Corollary 2 gives the analogous result with three factor matrices for any $p$ satisfying $1/3 < p \le 1$. In other words, if $1/2 < p \le 1$ or $1/3 < p \le 1$, Schatten $p$-(quasi-)norm minimization can be transformed into a simpler problem only involving the convex and smooth norms of two or three smaller factor matrices. Moreover, we extend the results of Theorem 2 and Corollary 2 to the case of more factor matrices, as shown in Corollary 3 below. The proof of Corollary 3 is very similar to that of Corollary 2 and is thus omitted. That is, for any $0 < p < 1$, the Schatten $p$-quasi-norm of any matrix is equivalent to the minimization of the mean of the Schatten $Mp$-norms of all $M$ factor matrices, where $M = \lfloor 1/p \rfloor + 1$. It should be strongly emphasized that the norms of all factor matrices are convex and smooth because $Mp > 1$, which helps us to design simpler and more efficient algorithms.
Corollary 3.
Given any matrix $X \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(X) = r \le d$, the following equalities hold:
$$\|X\|_{S_p} = \min_{\{U_i\}:\, X = \prod_{i=1}^{M} U_i} \ \prod_{i=1}^{M} \|U_i\|_{S_{Mp}} = \min_{\{U_i\}:\, X = \prod_{i=1}^{M} U_i} \left( \frac{\sum_{i=1}^{M} \|U_i\|_{S_{Mp}}^{Mp}}{M} \right)^{1/p}.$$

4. Numerical Experiments

We conducted numerical experiments to evaluate the performance of our scalable Schatten quasi-norm formulations. For simplicity, we only consider the following bi-nuclear (BN) quasi-norm and Frobenius/nuclear (FN) quasi-norm regularized least-squares problems:
$$\min_{U,V} \ \frac{\lambda}{2}\big(\|U\|_{*} + \|V\|_{*}\big) + \frac{1}{2}\big\|\mathcal{P}_\Omega(UV^{T} - D)\big\|_F^{2}, \qquad \min_{U,V} \ \frac{\lambda}{3}\big(2\|U\|_{*} + \|V\|_F^{2}\big) + \frac{1}{2}\big\|\mathcal{P}_\Omega(UV^{T} - D)\big\|_F^{2}.$$
We use the alternating direction method of multipliers (ADMM) to solve the above two scalable Schatten quasi-norm minimization models, as in [43]. We compared our BN and FN methods with existing state-of-the-art methods such as weighted nuclear norm minimization (WNNM) [48] (http://www4.comp.polyu.edu.hk/~cslzhang/) and the iteratively re-weighted nuclear norm method (IRNN) [49], which usually perform better than traditional nuclear norm solvers such as [50]. As suggested in [13], we chose the $\ell_q$-norm penalty for IRNN, where $q$ was selected from the range $\{0.1, 0.2, \ldots, 1\}$; that is, IRNN is a Schatten $q$-quasi-norm minimization method. The source code of IRNN can be downloaded at the link: https://sites.google.com/site/canyilu/. We selected the same values of the parameters $d$, $\rho$, and $\mu_0$ for both our methods (e.g., $d = 9$ as in [51]), and all the experiments were performed on a PC with an Intel i7-7700 CPU and 32 GB RAM.
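The ADMM solver of [43] is not reproduced here. Instead, the following self-contained NumPy sketch illustrates the general structure of such factored solvers with a simple alternating proximal-gradient scheme for the BN model; the function names, step-size rule, and all parameter values are our own illustrative choices and not the settings used in the reported experiments:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    L, s, Rt = np.linalg.svd(M, full_matrices=False)
    return (L * np.maximum(s - tau, 0.0)) @ Rt

def bn_completion(D, mask, d=9, lam=0.1, iters=300, seed=0):
    """Alternating proximal-gradient sketch for the BN model
       min_{U,V} (lam/2)(||U||_* + ||V||_*) + 0.5*||P_Omega(U V^T - D)||_F^2."""
    m, n = D.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, d)) / np.sqrt(d)
    V = rng.standard_normal((n, d)) / np.sqrt(d)
    for _ in range(iters):
        R = np.where(mask, U @ V.T - D, 0.0)             # P_Omega(U V^T - D)
        tU = 1.0 / (np.linalg.norm(V, 2) ** 2 + 1e-12)   # step <= 1/Lipschitz const.
        U = svt(U - tU * (R @ V), tU * lam / 2)          # proximal step on U
        R = np.where(mask, U @ V.T - D, 0.0)
        tV = 1.0 / (np.linalg.norm(U, 2) ** 2 + 1e-12)
        V = svt(V - tV * (R.T @ U), tV * lam / 2)        # proximal step on V
    return U, V

# Toy usage: complete a rank-3 matrix from 50% of its entries.
rng = np.random.default_rng(1)
D_full = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
mask = rng.random(D_full.shape) < 0.5
U, V = bn_completion(np.where(mask, D_full, 0.0), mask)
print(np.linalg.norm(U @ V.T - D_full) / np.linalg.norm(D_full))   # relative error
```

Note that every SVD in this sketch is computed on an $m\times d$ or $n\times d$ factor rather than on the full $m\times n$ matrix, which is where the $O(mnd)$ per-iteration cost mentioned in the introduction comes from.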
We report the inpainting results of all the methods on the lenna and pepper images with 50% random missing pixels, as shown in Figure 1 and Figure 2. The results show that both our methods produced significantly better PSNR results than the other methods. As analyzed in [22,29], our quasi-norm formulations require significantly fewer observations than traditional nuclear norm solvers (e.g., [50]) and also lead to scalable optimization problems. In particular, the resulting Schatten quasi-norm formulations may be Lipschitz, while the original Schatten quasi-norm minimization is non-Lipschitz and generally NP-hard [38]. Table 1 shows the running time of all the methods. It is clear that both our methods were more than 50 times faster than the well-known methods, especially IRNN, which suffers from high computational costs due to performing an SVD in each iteration.

5. Proofs of Main Results

In this section, we give the proofs for some theorems and corollaries. We first introduce several key lemmas, such as Jensen’s inequality, Hölder’s inequality, and Young’s inequality, which are used throughout our proofs.
Lemma 1 (Jensen’s inequality).
Assume that the function $g(\cdot): \mathbb{R}_+ \to \mathbb{R}_+$ is a continuous concave function on $[0, +\infty)$. For all $t_i \ge 0$ satisfying $\sum_i t_i = 1$, and any $x_i \in \mathbb{R}_+$ for $i = 1, \ldots, n$,
$$g\!\left( \sum_{i=1}^{n} t_i x_i \right) \ge \sum_{i=1}^{n} t_i\, g(x_i).$$
Lemma 2 (Hölder’s inequality).
For any $p, q > 1$ satisfying $1/p + 1/q = 1$, and for any $x_i$ and $y_i$, $i = 1, \ldots, n$,
$$\sum_{i=1}^{n} |x_i y_i| \le \left( \sum_{i=1}^{n} |x_i|^{p} \right)^{1/p} \left( \sum_{i=1}^{n} |y_i|^{q} \right)^{1/q},$$
with equality iff there is a constant $c \ge 0$ such that $|x_i|^{p} = c\,|y_i|^{q}$ for each $i$.
Lemma 3 (Young’s inequality).
Let $a, b \ge 0$ and $1 < p, q < \infty$ be such that $1/p + 1/q = 1$. Then
$$\frac{a^{p}}{p} + \frac{b^{q}}{q} \ge ab,$$
with equality iff $a^{p} = b^{q}$.

5.1. Proof of Theorem 1

Before giving a complete proof for Theorem 1, we first present and prove the following lemma.
Lemma 4.
Suppose that $Z \in \mathbb{R}^{m\times n}$ is a matrix of rank $r \le \min(m, n)$, and denote its thin SVD by $Z = L_Z\Sigma_Z R_Z^{T}$, where $L_Z \in \mathbb{R}^{m\times r}$, $R_Z \in \mathbb{R}^{n\times r}$, and $\Sigma_Z \in \mathbb{R}^{r\times r}$. For any $A \in \mathbb{R}^{r\times r}$ satisfying $AA^{T} = A^{T}A = I_{r\times r}$, and given $p$ ($0 < p \le 1$), we have $(A\Sigma_Z A^{T})_{k,k} \ge 0$ for all $k = 1, \ldots, r$, and
$$\mathrm{Tr}^{p}(A\Sigma_Z A^{T}) \ge \mathrm{Tr}^{p}(\Sigma_Z) = \|Z\|_{S_p}^{p},$$
where $\mathrm{Tr}^{p}(B) = \sum_i B_{ii}^{p}$.
Proof. 
For any $k \in \{1, \ldots, r\}$, we have $(A\Sigma_Z A^{T})_{k,k} = \sum_i a_{ki}^{2}\sigma_i \ge 0$, where $\sigma_i \ge 0$ is the $i$-th singular value of $Z$. Then
$$\mathrm{Tr}^{p}(A\Sigma_Z A^{T}) = \sum_{k}\left( \sum_{i} a_{ki}^{2}\sigma_i \right)^{p}. \tag{18}$$
Recall that $g(x) = x^{p}$ with $0 < p < 1$ is a concave function on $\mathbb{R}_+$. By Jensen's inequality [52], as stated in Lemma 1, and since $\sum_i a_{ki}^{2} = 1$ for any $k \in \{1, \ldots, r\}$, we have
$$\left( \sum_{i} a_{ki}^{2}\sigma_i \right)^{p} \ge \sum_{i} a_{ki}^{2}\sigma_i^{p}. \tag{19}$$
Using the above inequality and $\sum_k a_{ki}^{2} = 1$ for any $i \in \{1, \ldots, r\}$, (18) can be rewritten as
$$\mathrm{Tr}^{p}(A\Sigma_Z A^{T}) = \sum_{k}\left( \sum_{i} a_{ki}^{2}\sigma_i \right)^{p} \ge \sum_{k}\sum_{i} a_{ki}^{2}\sigma_i^{p} = \sum_{i}\sigma_i^{p} = \mathrm{Tr}^{p}(\Sigma_Z) = \|Z\|_{S_p}^{p}.$$
In addition, when $g(x) = x$ (i.e., $p = 1$), we obtain
$$\left( \sum_{i} a_{ki}^{2}\sigma_i \right)^{p} = \sum_{i} a_{ki}^{2}\sigma_i,$$
which means that the inequality (19) is still satisfied. □
Proof of Theorem 1.
Let $U = L_U\Sigma_U R_U^{T}$ and $V = L_V\Sigma_V R_V^{T}$ be the thin SVDs of $U$ and $V$, respectively, where $L_U \in \mathbb{R}^{m\times d}$, $L_V \in \mathbb{R}^{n\times d}$, and $R_U, \Sigma_U, R_V, \Sigma_V \in \mathbb{R}^{d\times d}$. Let $X = L_X\Sigma_X R_X^{T}$, where the columns of $L_X \in \mathbb{R}^{m\times d}$ and $R_X \in \mathbb{R}^{n\times d}$ are the left and right singular vectors associated with the top $d$ singular values of $X$, whose rank is at most $r$ ($r \le d$), and $\Sigma_X = \mathrm{diag}([\sigma_1(X), \ldots, \sigma_r(X), 0, \ldots, 0]) \in \mathbb{R}^{d\times d}$.
Recall that $X = UV^{T}$ (i.e., $L_X\Sigma_X R_X^{T} = L_U\Sigma_U R_U^{T}R_V\Sigma_V L_V^{T}$). Then there exist $O_1, \widehat{O}_1 \in \mathbb{R}^{d\times d}$ satisfying $L_X = L_U O_1$ and $L_U = L_X\widehat{O}_1$, which implies that $O_1 = L_U^{T}L_X$ and $\widehat{O}_1 = L_X^{T}L_U$. Thus, $O_1 = \widehat{O}_1^{T}$. Since $L_X = L_U O_1 = L_X\widehat{O}_1 O_1$, we have $\widehat{O}_1 O_1 = O_1^{T}O_1 = I_d$. Similarly, we have $O_1\widehat{O}_1 = O_1 O_1^{T} = I_d$. In addition, there exists $O_2 \in \mathbb{R}^{d\times d}$ satisfying $R_X = L_V O_2$ with $O_2 O_2^{T} = O_2^{T}O_2 = I_d$. Let $O_3 = O_2 O_1^{T} \in \mathbb{R}^{d\times d}$; then $O_3 O_3^{T} = O_3^{T}O_3 = I_d$, that is, $\sum_i (O_3)_{ij}^{2} = \sum_j (O_3)_{ij}^{2} = 1$ for $i, j \in \{1, 2, \ldots, d\}$, where $A_{ij}$ denotes the element in the $i$-th row and the $j$-th column of a matrix $A$. In addition, letting $O_4 = R_U^{T}R_V$, we have $\sum_i (O_4)_{ij}^{2} \le 1$ and $\sum_j (O_4)_{ij}^{2} \le 1$ for $i, j \in \{1, 2, \ldots, d\}$.
According to the above analysis, we have $O_2\Sigma_X O_2^{T} = O_2 O_1^{T}\Sigma_U O_4\Sigma_V = O_3\Sigma_U O_4\Sigma_V$. Let $\varrho_i$ and $\tau_j$ denote the $i$-th and $j$-th diagonal elements of $\Sigma_U$ and $\Sigma_V$, respectively. In the following, we consider the two cases of $p_1$ and $p_2$: at least one of $p_1$ and $p_2$ is no less than 1, or both of them are smaller than 1. It is clear that for any $1/2 \le p \le 1$ and $p_1, p_2 > 0$ satisfying $1/p_1 + 1/p_2 = 1/p$, at least one of $p_1$ and $p_2$ must be no less than 1. On the other hand, only if $0 < p < 1/2$ can there exist $0 < p_1 < 1$ and $0 < p_2 < 1$ such that $1/p_1 + 1/p_2 = 1/p$, that is, both of them are smaller than 1.
Case 1. For any $0 < p \le 1$, there exist $p_1 > 0$ and $p_2 > 0$, at least one of which is no less than 1, such that $1/p_1 + 1/p_2 = 1/p$. Without loss of generality, we assume that $p_2 \ge 1$. Here, we set $k_1 = p_1/p$ and $k_2 = p_2/p$; clearly, $k_1, k_2 > 1$ and $1/k_1 + 1/k_2 = 1$. From Lemma 4, we have
$$\begin{aligned}
\|X\|_{S_p} &\le \left( \mathrm{Tr}^{p}(O_2\Sigma_X O_2^{T}) \right)^{1/p} = \left( \mathrm{Tr}^{p}(O_2 O_1^{T}\Sigma_U O_4\Sigma_V) \right)^{1/p} = \left( \mathrm{Tr}^{p}(O_3\Sigma_U O_4\Sigma_V) \right)^{1/p}\\
&= \left( \sum_{i=1}^{d}\Big|\sum_{j=1}^{d}\tau_j (O_3)_{ij}(O_4)_{ji}\Big|^{p}\varrho_i^{p} \right)^{1/p} = \left( \sum_{i=1}^{d}\varrho_i^{p}\Big|\sum_{j=1}^{d}\tau_j (O_3)_{ij}(O_4)_{ji}\Big|^{p} \right)^{1/p}\\
&\overset{(a)}{\le} \left( \Big( \sum_{i=1}^{d}(\varrho_i^{p})^{k_1} \Big)^{1/k_1}\Big( \sum_{i=1}^{d}\Big|\sum_{j=1}^{d}\tau_j (O_3)_{ij}(O_4)_{ji}\Big|^{p k_2} \Big)^{1/k_2} \right)^{1/p}\\
&= \left( \sum_{i=1}^{d}\varrho_i^{p_1} \right)^{1/p_1}\left( \sum_{i=1}^{d}\Big|\sum_{j=1}^{d}\tau_j (O_3)_{ij}(O_4)_{ji}\Big|^{p_2} \right)^{1/p_2}\\
&\overset{(b)}{\le} \left( \sum_{i=1}^{d}\varrho_i^{p_1} \right)^{1/p_1}\left( \sum_{i=1}^{d}\Big( \sum_{j=1}^{d}\tau_j\,\frac{(O_3)_{ij}^{2}+(O_4)_{ji}^{2}}{2} \Big)^{p_2} \right)^{1/p_2}\\
&\overset{(c)}{\le} \left( \sum_{i=1}^{d}\varrho_i^{p_1} \right)^{1/p_1}\left( \sum_{j=1}^{d}\tau_j^{p_2} \right)^{1/p_2},
\end{aligned}$$
where inequality (a) holds due to Hölder's inequality [52], as stated in Lemma 2. Inequality (b) follows from the basic inequality $xy \le \frac{x^{2}+y^{2}}{2}$ for any real numbers $x$ and $y$, and inequality (c) relies on the facts that $\sum_i (O_3)_{ij}^{2} = 1$ and $\sum_i (O_4)_{ji}^{2} \le 1$, together with Jensen's inequality (see Lemma 1) applied to the convex function $h(x) = x^{p_2}$ with $p_2 \ge 1$.
For any matrices $U \in \mathbb{R}^{m\times d}$ and $V \in \mathbb{R}^{n\times d}$ satisfying $X = UV^{T}$, we therefore have
$$\|X\|_{S_p} \le \|U\|_{S_{p_1}}\|V\|_{S_{p_2}}. \tag{20}$$
On the other hand, let $U = L_X\Sigma_X^{p/p_1}$ and $V = R_X\Sigma_X^{p/p_2}$, where $\Sigma_X^{p}$ denotes the entry-wise $p$-th power of $\Sigma_X$; then we obtain
$$X = UV^{T}, \qquad \|U\|_{S_{p_1}}^{p_1} = \|V\|_{S_{p_2}}^{p_2} = \|X\|_{S_p}^{p} \ \ \text{with} \ \ 1/p = 1/p_1 + 1/p_2,$$
and
$$\|X\|_{S_p} = \left( \mathrm{Tr}^{p}(\Sigma_X) \right)^{1/p} = \|U\|_{S_{p_1}}\|V\|_{S_{p_2}}.$$
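To spell out why this choice attains equality (a short verification added here for readability; it follows directly from the definitions above): the singular values of $U = L_X\Sigma_X^{p/p_1}$ and $V = R_X\Sigma_X^{p/p_2}$ are $\sigma_i^{p/p_1}(X)$ and $\sigma_i^{p/p_2}(X)$, respectively, so
$$\|U\|_{S_{p_1}}^{p_1} = \sum_i \sigma_i^{p}(X) = \|X\|_{S_p}^{p}, \qquad \|V\|_{S_{p_2}}^{p_2} = \sum_i \sigma_i^{p}(X) = \|X\|_{S_p}^{p},$$
and therefore $\|U\|_{S_{p_1}}\|V\|_{S_{p_2}} = \|X\|_{S_p}^{p/p_1}\,\|X\|_{S_p}^{p/p_2} = \|X\|_{S_p}^{p(1/p_1+1/p_2)} = \|X\|_{S_p}$.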
Therefore, under the constraint $X = UV^{T}$, we have
$$\|X\|_{S_p} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{p_1}}\|V\|_{S_{p_2}}.$$
Case 2. For any $0 < p < 1/2$, there exist $0 < \hat{p}_1 < 1$ and $0 < \hat{p}_2 < 1$ such that $1/\hat{p}_1 + 1/\hat{p}_2 = 1/p$. Next, we prove that the result of Theorem 1 also holds in this case. In fact, for any given $p$, there must exist $p_1 > 0$ and $p_2 \ge 1$ such that $1/p_1 + 1/p_2 = 1/p$ and $1/p_1 = 1/\hat{p}_1 + 1/q$ with $q \ge 1$. Clearly, $1/\hat{p}_1 < 1/p_1$. Let $U' = L_X\Sigma_X^{p/p_1}$, $V' = R_X\Sigma_X^{p/p_2}$, $U_1' = L_X\Sigma_X^{p/\hat{p}_1}$ and $V_1' = R_X\Sigma_X^{p/\hat{p}_2}$; then we have $X = U'(V')^{T} = U_1'(V_1')^{T}$, which implies that
$$\|X\|_{S_p} = \|U'\|_{S_{p_1}}\|V'\|_{S_{p_2}} = \|U_1'\|_{S_{\hat{p}_1}}\|V_1'\|_{S_{\hat{p}_2}}. \tag{21}$$
Since $1/p = 1/p_1 + 1/p_2 = 1/\hat{p}_1 + 1/\hat{p}_2$ and $1/p_1 = 1/\hat{p}_1 + 1/q$, we have $1/\hat{p}_2 = 1/q + 1/p_2$. Consider any factor matrices $U$ and $V$ satisfying $X = UV^{T}$, and let $V = L_V\Sigma_V R_V^{T}$ be the thin SVD of $V$. Let $U_1 = UU_2^{T}$ and $V_1 = L_V\Sigma_V^{\hat{p}_2/p_2}$, where $U_2^{T} = R_V\Sigma_V^{\hat{p}_2/q}$; then it is not difficult to verify that
$$V = V_1 U_2, \quad X = U_1 V_1^{T}, \quad \|V\|_{S_{\hat{p}_2}} = \|U_2\|_{S_q}\|V_1\|_{S_{p_2}}, \quad \|U_1\|_{S_{p_1}} \le \|U\|_{S_{\hat{p}_1}}\|U_2\|_{S_q}, \tag{22}$$
where the above inequality follows from (20) with $q \ge 1$. Combining (21) and (22), for any $U$ and $V$ satisfying $X = UV^{T}$ we have
$$\|X\|_{S_p} = \|U'\|_{S_{p_1}}\|V'\|_{S_{p_2}} \le \|U_1\|_{S_{p_1}}\|V_1\|_{S_{p_2}} \le \|U\|_{S_{\hat{p}_1}}\|U_2\|_{S_q}\|V_1\|_{S_{p_2}} = \|U\|_{S_{\hat{p}_1}}\|V\|_{S_{\hat{p}_2}}, \tag{23}$$
where the first inequality follows from (20). Recall that
$$\|X\|_{S_p} = \|U_1'\|_{S_{\hat{p}_1}}\|V_1'\|_{S_{\hat{p}_2}}. \tag{24}$$
Therefore, for any $0 < \hat{p}_1 < 1$ and $0 < \hat{p}_2 < 1$ satisfying $1/p = 1/\hat{p}_1 + 1/\hat{p}_2$, by (23) and (24) we also have
$$\|X\|_{S_p} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{\hat{p}_1}}\|V\|_{S_{\hat{p}_2}}.$$
In summary, for any $0 < p \le 1$, $p_1 > 0$ and $p_2 > 0$ satisfying $1/p = 1/p_1 + 1/p_2$, we have
$$\|X\|_{S_p} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{p_1}}\|V\|_{S_{p_2}}.$$
This completes the proof. □

5.2. Proof of Theorem 2

Proof. 
Since $p_1 = p_2 = 2p > 0$ and $1/p_1 + 1/p_2 = 1/p$, using Theorem 1 we have
$$\|X\|_{S_p} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{2p}}\|V\|_{S_{2p}}.$$
Due to the basic inequality $xy \le \frac{x^{2}+y^{2}}{2}$ for any real numbers $x$ and $y$, we have
$$\|X\|_{S_p} = \min_{U,V:\, X=UV^{T}} \|U\|_{S_{2p}}\|V\|_{S_{2p}} = \min_{U,V:\, X=UV^{T}} \left( \|U\|_{S_{2p}}^{p}\|V\|_{S_{2p}}^{p} \right)^{1/p} \le \min_{U,V:\, X=UV^{T}} \left( \frac{\|U\|_{S_{2p}}^{2p}+\|V\|_{S_{2p}}^{2p}}{2} \right)^{1/p}.$$
Let $U = L_X\Sigma_X^{1/2}$ and $V = R_X\Sigma_X^{1/2}$, where $\Sigma_X^{1/2}$ denotes the entry-wise square root of $\Sigma_X$. Then we obtain
$$X = UV^{T}, \qquad \|U\|_{S_{2p}}^{2p} = \|V\|_{S_{2p}}^{2p} = \|X\|_{S_p}^{p},$$
which implies that
$$\|X\|_{S_p} = \|U\|_{S_{2p}}\|V\|_{S_{2p}} = \left( \frac{\|U\|_{S_{2p}}^{2p}+\|V\|_{S_{2p}}^{2p}}{2} \right)^{1/p}.$$
The theorem now follows because
$$\min_{U,V:\, X=UV^{T}} \|U\|_{S_{2p}}\|V\|_{S_{2p}} = \min_{U,V:\, X=UV^{T}} \left( \frac{\|U\|_{S_{2p}}^{2p}+\|V\|_{S_{2p}}^{2p}}{2} \right)^{1/p}.$$
This completes the proof. □

5.3. Proof of Theorem 3

Proof. 
For any $0 < p \le 1$, $p_1 > 0$ and $p_2 > 0$ satisfying $1/p_1 + 1/p_2 = 1/p$, using Theorem 1 we have
$$\|X\|_{S_p} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{p_1}}\|V\|_{S_{p_2}}.$$
Let $k_1 = (p_1+p_2)/p_2$ and $k_2 = (p_1+p_2)/p_1$; it is easy to verify that $1/k_1 + 1/k_2 = 1$. Then
$$\begin{aligned}\|X\|_{S_p} &= \min_{U,V:\, X=UV^{T}} \|U\|_{S_{p_1}}\|V\|_{S_{p_2}} = \min_{U,V:\, X=UV^{T}} \left( \|U\|_{S_{p_1}}^{p}\|V\|_{S_{p_2}}^{p} \right)^{1/p}\\ &\le \min_{U,V:\, X=UV^{T}} \left( \frac{\|U\|_{S_{p_1}}^{p k_1}}{k_1} + \frac{\|V\|_{S_{p_2}}^{p k_2}}{k_2} \right)^{1/p} = \min_{U,V:\, X=UV^{T}} \left( \frac{p_2\|U\|_{S_{p_1}}^{p_1}+p_1\|V\|_{S_{p_2}}^{p_2}}{p_1+p_2} \right)^{1/p},\end{aligned}$$
where the above inequality follows from the well-known Young inequality (Lemma 3) and the monotonically increasing property of the function $g(x) = x^{1/p}$.
Let $U = L_X\Sigma_X^{p/p_1}$ and $V = R_X\Sigma_X^{p/p_2}$; then $X = UV^{T}$ and
$$\|X\|_{S_p} = \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{p_1}}\|V\|_{S_{p_2}} = \|U\|_{S_{p_1}}\|V\|_{S_{p_2}} = \left( \frac{p_2\|U\|_{S_{p_1}}^{p_1}+p_1\|V\|_{S_{p_2}}^{p_2}}{p_1+p_2} \right)^{1/p},$$
which implies that
$$\begin{aligned}\|X\|_{S_p} &= \min_{U \in \mathbb{R}^{m\times d},\, V \in \mathbb{R}^{n\times d}:\, X = UV^{T}} \|U\|_{S_{p_1}}\|V\|_{S_{p_2}} = \min_{U,V:\, X=UV^{T}} \left( \frac{p_2\|U\|_{S_{p_1}}^{p_1}+p_1\|V\|_{S_{p_2}}^{p_2}}{p_1+p_2} \right)^{1/p}\\ &= \min_{U,V:\, X=UV^{T}} \left( \frac{\|U\|_{S_{p_1}}^{p_1}/p_1 + \|V\|_{S_{p_2}}^{p_2}/p_2}{1/p} \right)^{1/p}.\end{aligned}$$
This completes the proof. □

5.4. Proof of Theorem 4

Proof. 
Let $U \in \mathbb{R}^{m\times d}$ and $\widehat{V} \in \mathbb{R}^{n\times d}$ be any factor matrices such that $X = U\widehat{V}^{T}$, and let $\hat{p}_1 = p_1 > 0$ and $\hat{p}_2 = p_2 p_3/(p_2+p_3) > 0$, which means that $1/\hat{p}_1 + 1/\hat{p}_2 = 1/p$. According to Theorem 1, we obtain
$$\|X\|_{S_p} = \min_{U \in \mathbb{R}^{m\times d},\, \widehat{V} \in \mathbb{R}^{n\times d}:\, X = U\widehat{V}^{T}} \|U\|_{S_{\hat{p}_1}}\|\widehat{V}\|_{S_{\hat{p}_2}}. \tag{26}$$
Let $V \in \mathbb{R}^{d\times d}$ and $W \in \mathbb{R}^{n\times d}$ be factor matrices of $\widehat{V}$ (i.e., $VW^{T} = \widehat{V}^{T}$). Since $\hat{p}_2 = p_2 p_3/(p_2+p_3)$, we have $1/\hat{p}_2 = 1/p_2 + 1/p_3$ and
$$\|\widehat{V}\|_{S_{\hat{p}_2}} = \min_{V \in \mathbb{R}^{d\times d},\, W \in \mathbb{R}^{n\times d}:\, \widehat{V} = (VW^{T})^{T}} \|V\|_{S_{p_2}}\|W\|_{S_{p_3}}. \tag{27}$$
Combining (26) and (27), we obtain
$$\|X\|_{S_p} = \min_{U,V,W:\, X = UVW^{T}} \|U\|_{S_{p_1}}\|V\|_{S_{p_2}}\|W\|_{S_{p_3}}.$$
Using the above result, we have
$$\begin{aligned}\|X\|_{S_p} &= \min_{U,V,W:\, X=UVW^{T}} \|U\|_{S_{p_1}}\|V\|_{S_{p_2}}\|W\|_{S_{p_3}} = \min_{U,V,W:\, X=UVW^{T}} \left( \|U\|_{S_{p_1}}^{p}\|V\|_{S_{p_2}}^{p}\|W\|_{S_{p_3}}^{p} \right)^{1/p}\\ &\le \min_{U,V,W:\, X=UVW^{T}} \left( \frac{p_2p_3\|U\|_{S_{p_1}}^{p_1}+p_1p_3\|V\|_{S_{p_2}}^{p_2}+p_1p_2\|W\|_{S_{p_3}}^{p_3}}{p_2p_3+p_1p_3+p_1p_2} \right)^{1/p},\end{aligned}$$
where the above inequality follows from the well-known Young inequality applied to three terms.
Let $U = L_X\Sigma_X^{p/p_1}$, $V = \Sigma_X^{p/p_2}$, and $W = R_X\Sigma_X^{p/p_3}$. It is easy to verify that $X = UVW^{T}$. Then we have
$$\|X\|_{S_p} = \min_{U,V,W:\, X=UVW^{T}} \|U\|_{S_{p_1}}\|V\|_{S_{p_2}}\|W\|_{S_{p_3}} = \|U\|_{S_{p_1}}\|V\|_{S_{p_2}}\|W\|_{S_{p_3}} = \left( \frac{p_2p_3\|U\|_{S_{p_1}}^{p_1}+p_1p_3\|V\|_{S_{p_2}}^{p_2}+p_1p_2\|W\|_{S_{p_3}}^{p_3}}{p_2p_3+p_1p_3+p_1p_2} \right)^{1/p}.$$
Therefore, we have
$$\begin{aligned}\|X\|_{S_p} &= \min_{U,V,W:\, X=UVW^{T}} \|U\|_{S_{p_1}}\|V\|_{S_{p_2}}\|W\|_{S_{p_3}} = \min_{U,V,W:\, X=UVW^{T}} \left( \frac{p_2p_3\|U\|_{S_{p_1}}^{p_1}+p_1p_3\|V\|_{S_{p_2}}^{p_2}+p_1p_2\|W\|_{S_{p_3}}^{p_3}}{p_2p_3+p_1p_3+p_1p_2} \right)^{1/p}\\ &= \min_{U,V,W:\, X=UVW^{T}} \left( \frac{\|U\|_{S_{p_1}}^{p_1}/p_1 + \|V\|_{S_{p_2}}^{p_2}/p_2 + \|W\|_{S_{p_3}}^{p_3}/p_3}{1/p} \right)^{1/p}.\end{aligned}$$
This completes the proof. □

5.5. Proof of Corollary 2

Proof. 
Since $p_1 = p_2 = p_3 = 3p > 0$ and $1/p_1 + 1/p_2 + 1/p_3 = 1/p$, using Theorem 4 we have
$$\|X\|_{S_p} = \min_{U,V,W:\, X=UVW^{T}} \|U\|_{S_{3p}}\|V\|_{S_{3p}}\|W\|_{S_{3p}}.$$
From the basic inequality $xyz \le \frac{x^{3}+y^{3}+z^{3}}{3}$ for arbitrary non-negative numbers $x$, $y$, and $z$, we obtain
$$\|X\|_{S_p} = \min_{U,V,W:\, X=UVW^{T}} \|U\|_{S_{3p}}\|V\|_{S_{3p}}\|W\|_{S_{3p}} = \min_{U,V,W:\, X=UVW^{T}} \left( \|U\|_{S_{3p}}^{p}\|V\|_{S_{3p}}^{p}\|W\|_{S_{3p}}^{p} \right)^{1/p} \le \min_{U,V,W:\, X=UVW^{T}} \left( \frac{\|U\|_{S_{3p}}^{3p}+\|V\|_{S_{3p}}^{3p}+\|W\|_{S_{3p}}^{3p}}{3} \right)^{1/p}.$$
Let $U = L_X\Sigma_X^{1/3}$, $V = \Sigma_X^{1/3}$, and $W = R_X\Sigma_X^{1/3}$, where $\Sigma_X^{1/3}$ denotes the entry-wise cube root of $\Sigma_X$; then we have
$$X = UVW^{T}, \qquad \|U\|_{S_{3p}}^{3p} = \|V\|_{S_{3p}}^{3p} = \|W\|_{S_{3p}}^{3p} = \|X\|_{S_p}^{p},$$
which implies that
$$\|X\|_{S_p} = \|U\|_{S_{3p}}\|V\|_{S_{3p}}\|W\|_{S_{3p}} = \left( \frac{\|U\|_{S_{3p}}^{3p}+\|V\|_{S_{3p}}^{3p}+\|W\|_{S_{3p}}^{3p}}{3} \right)^{1/p}.$$
The corollary now follows because
$$\min_{U,V,W:\, X=UVW^{T}} \|U\|_{S_{3p}}\|V\|_{S_{3p}}\|W\|_{S_{3p}} = \min_{U,V,W:\, X=UVW^{T}} \left( \frac{\|U\|_{S_{3p}}^{3p}+\|V\|_{S_{3p}}^{3p}+\|W\|_{S_{3p}}^{3p}}{3} \right)^{1/p}.$$
This completes the proof. □

6. Conclusions

Generally, the Schatten quasi-norm minimization problem is non-convex, non-smooth, and even non-Lipschitz, and thus most existing algorithms are too slow or even impractical for large-scale matrix recovery and completion problems. Therefore, it is very important to know how to transform such challenging problems into simpler ones. In this paper, we first presented and rigorously proved that for any $p, p_1, p_2 > 0$ satisfying $1/p = 1/p_1 + 1/p_2$, the Schatten $p$-quasi-norm of any matrix is equivalent to the minimization of the product (or weighted sum) of the Schatten $p_1$-(quasi-)norm and Schatten $p_2$-(quasi-)norm of two much smaller factor matrices. In particular, when $p > 1/2$, there is an equivalence between the Schatten $p$-(quasi-)norm of any matrix and the Schatten $2p$-norms of its two factor matrices (i.e., Property 1). That is, the Schatten quasi-norm minimization problem with $p > 1/2$ can be transformed into a simpler one only involving the smooth norms of smaller factor matrices, which can naturally lead to simpler and more efficient algorithms (e.g., [12,22,29,30]).
We further extended the equivalence formulation of two factor matrices to the cases of three and more factor matrices, from which we can see that for any $0 < p < 1$, the Schatten $p$-quasi-norm of any matrix is the minimization of the mean of the Schatten $(\lfloor 1/p \rfloor + 1)p$-norms of $\lfloor 1/p \rfloor + 1$ factor matrices. In other words, for any $0 < p < 1$, the Schatten quasi-norm minimization problem can be transformed into an optimization problem only involving the smooth norms of multiple factor matrices. Finally, we provided some representative examples for two and three factor matrices. It is clear that the bi-nuclear and Frobenius/nuclear quasi-norms in our previous work [22] and the tri-nuclear quasi-norm in our previous work [29] are three important special cases. Our future work is the theoretical analysis of the properties of the proposed unified Schatten $p$-norm formulations compared to the nuclear norm and the Schatten quasi-norm as in [53,54] (e.g., how many observations are sufficient for our models to reliably recover low-rank matrices).

Author Contributions

Methodology and Formal analysis, F.S. (Fanhua Shang); Formal analysis, Y.L.; Investigation, L.K.; Software, F.S. (Fanjie Shang); Visualization, H.L.; Review & editing, L.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 61876220, 61876221, 61976164, 61836009, U1701267, and 61871310), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (No. 61621005), the Program for Cheung Kong Scholars and Innovative Research Team in University (No. IRT_15R53), the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) (No. B07048), the Science Foundation of Xidian University (Nos. 10251180018 and 10251180019), the National Science Basic Research Plan in Shaanxi Province of China (Nos. 2019JQ-657 and 2020JM-194), and the Key Special Project of China High Resolution Earth Observation System-Young Scholar Innovation Fund.

Acknowledgments

We thank all the reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Candès, E.; Recht, B. Exact Matrix Completion via Convex Optimization. Found. Comput. Math. 2009, 9, 717–772.
  2. Candès, E.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM 2011, 58, 1–37.
  3. Liu, G.; Lin, Z.; Yu, Y. Robust Subspace Segmentation by Low-Rank Representation. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 663–670.
  4. Yuan, M.; Ekici, A.; Lu, Z.; Monteiro, R. Dimension reduction and coefficient estimation in multivariate linear regression. J. R. Stat. Soc. B 2007, 69, 329–346.
  5. Argyriou, A.; Micchelli, C.A.; Pontil, M.; Ying, Y. A Spectral Regularization Framework for Multi-Task Structure Learning. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 25–32.
  6. Liu, Z.; Vandenberghe, L. Interior-Point Method for Nuclear Norm Approximation with Application to System Identification. SIAM J. Matrix Anal. Appl. 2009, 31, 1235–1256.
  7. Fazel, M.; Hindi, H.; Boyd, S.P. A Rank Minimization Heuristic with Application to Minimum Order System Approximation. In Proceedings of the 2001 American Control Conference, Arlington, VA, USA, 25–27 June 2001; pp. 4734–4739.
  8. Candès, E.; Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory 2010, 56, 2053–2080.
  9. Recht, B.; Fazel, M.; Parrilo, P.A. Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization. SIAM Rev. 2010, 52, 471–501.
  10. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its Oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1361.
  11. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
  12. Xu, C.; Lin, Z.; Zha, H. A Unified Convex Surrogate for the Schatten-p Norm. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 926–932.
  13. Lu, C.; Tang, J.; Yan, S.; Lin, Z. Generalized Nonconvex Nonsmooth Low-Rank Minimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 4130–4137.
  14. Nie, F.; Wang, H.; Cai, X.; Huang, H.; Ding, C. Robust Matrix Completion via Joint Schatten p-Norm and Lp-Norm Minimization. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10 December 2012; pp. 566–574.
  15. Aravkin, A.; Kumar, R.; Mansour, H.; Recht, B.; Herrmann, F.J. Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation. SIAM J. Sci. Comput. 2014, 36, S237–S266.
  16. Majumdar, A.; Ward, R.K. An algorithm for sparse MRI reconstruction by Schatten p-norm minimization. Magn. Reson. Imaging 2011, 29, 408–417.
  17. Mohan, K.; Fazel, M. Iterative Reweighted Algorithms for Matrix Rank Minimization. J. Mach. Learn. Res. 2012, 13, 3441–3473.
  18. Lai, M.; Xu, Y.; Yin, W. Improved iteratively reweighted least squares for unconstrained smoothed ℓq minimization. SIAM J. Numer. Anal. 2013, 51, 927–957.
  19. Nie, F.; Huang, H.; Ding, C. Low-Rank Matrix Recovery via Efficient Schatten p-Norm Minimization. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, AB, Canada, 22–26 July 2012; pp. 655–661.
  20. Marjanovic, G.; Solo, V. On ℓq Optimization and Matrix Completion. IEEE Trans. Signal Process. 2012, 60, 5714–5724.
  21. Zhang, M.; Huang, Z.; Zhang, Y. Restricted p-Isometry Properties of Nonconvex Matrix Recovery. IEEE Trans. Inform. Theory 2013, 59, 4316–4323.
  22. Shang, F.; Liu, Y.; Cheng, J. Scalable Algorithms for Tractable Schatten Quasi-Norm Minimization. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2016–2022.
  23. Srebro, N.; Rennie, J.; Jaakkola, T. Maximum-Margin Matrix Factorization. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 13–18 December 2004; pp. 1329–1336.
  24. Mazumder, R.; Hastie, T.; Tibshirani, R. Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 2010, 11, 2287–2322.
  25. Mitra, K.; Sheorey, S.; Chellappa, R. Large-Scale Matrix Factorization with Missing Data under Additional Constraints. In Proceedings of the 24th Annual Conference on Neural Information Processing Systems 2010, Vancouver, BC, Canada, 6–9 December 2010; pp. 1651–1659.
  26. Recht, B.; Ré, C. Parallel stochastic gradient algorithms for large-scale matrix completion. Math. Prog. Comp. 2013, 5, 201–226.
  27. Zuo, W.; Meng, D.; Zhang, L.; Feng, X.; Zhang, D. A Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 217–224.
  28. Shang, F.; Liu, Y.; Cheng, J. Unified Scalable Equivalent Formulations for Schatten Quasi-Norms. arXiv 2016, arXiv:1606.00668.
  29. Shang, F.; Liu, Y.; Cheng, J. Tractable and Scalable Schatten Quasi-Norm Approximations for Rank Minimization. In Proceedings of the Artificial Intelligence and Statistics, Cadiz, Spain, 9–11 May 2016; pp. 620–629.
  30. Yang, L.; Pong, T.K.; Chen, X. A Nonmonotone Alternating Updating Method for a Class of Matrix Factorization Problems. SIAM J. Optim. 2018, 28, 3402–3430.
  31. Foucart, S.; Lai, M. Sparsest solutions of underdetermined linear systems via ℓq-minimization for 0 < q ≤ 1. Appl. Comput. Harmon. Anal. 2009, 26, 397–407.
  32. Liu, Y.; Shang, F.; Cheng, H.; Cheng, J. A Grassmannian Manifold Algorithm for Nuclear Norm Regularized Least Squares Problems. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, Quebec City, QC, Canada, 23–27 July 2014; pp. 515–524.
  33. Shang, F.; Liu, Y.; Cheng, J.; Cheng, H. Robust Principal Component Analysis with Missing Data. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, Shanghai, China, 3–7 November 2014; pp. 1149–1158.
  34. Xu, H.; Caramanis, C.; Sanghavi, S. Robust PCA via outlier pursuit. In Proceedings of the 24th Annual Conference on Neural Information Processing Systems 2010, Vancouver, BC, Canada, 6–9 December 2010; pp. 2496–2504.
  35. Chen, Y.; Jalali, A.; Sanghavi, S.; Caramanis, C. Low-rank matrix recovery from errors and erasures. IEEE Trans. Inform. Theory 2013, 59, 4324–4337.
  36. Shang, F.; Liu, Y.; Tong, H.; Cheng, J.; Cheng, H. Robust bilinear factorization with missing and grossly corrupted observations. Inform. Sci. 2015, 307, 53–72.
  37. Hsieh, C.; Olsen, P.A. Nuclear Norm Minimization via Active Subspace Selection. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 575–583.
  38. Bian, W.; Chen, X.; Ye, Y. Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization. Math. Program. 2015, 149, 301–327.
  39. Larsen, R. PROPACK—Software for Large and Sparse SVD Calculations. 2015. Available online: http://sun.stanford.edu/srmunk/PROPACK/ (accessed on 28 March 2020).
  40. Cai, J.F.; Osher, S. Fast Singular Value Thresholding without Singular Value Decomposition. Methods Anal. Appl. 2013, 20, 335–352.
  41. Liu, G.; Yan, S. Active subspace: Toward scalable low-rank learning. Neural Comp. 2012, 24, 3371–3394.
  42. Shang, F.; Liu, Y.; Cheng, J. Generalized Higher-Order Tensor Decomposition via Parallel ADMM. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 1279–1285.
  43. Shang, F.; Cheng, J.; Liu, Y.; Luo, Z.Q.; Lin, Z. Bilinear Factor Matrix Norm Minimization for Robust PCA: Algorithms and Applications. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 2066–2080.
  44. Krechetov, M.; Marecek, J.; Maximov, Y.; Takac, M. Entropy-Penalized Semidefinite Programming. arXiv 2019, arXiv:1802.04332.
  45. Cabral, R.; Torre, F.; Costeira, J.; Bernardino, A. Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-rank Matrix Decomposition. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2488–2495.
  46. Feng, J.; Xu, H.; Yan, S. Online robust PCA via stochastic optimization. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 404–412.
  47. Jin, K.H.; Ye, J.C. Annihilating Filter-Based Low-Rank Hankel Matrix Approach for Image Inpainting. IEEE Trans. Image Process. 2015, 24, 3498–3511.
  48. Gu, S.; Xie, Q.; Meng, D.; Zuo, W.; Feng, X.; Zhang, L. Weighted Nuclear Norm Minimization and Its Applications to Low Level Vision. Int. J. Comput. Vis. 2017, 121, 183–208.
  49. Lu, C.; Tang, J.; Yan, S.; Lin, Z. Nonconvex Nonsmooth Low Rank Minimization via Iteratively Reweighted Nuclear Norm. IEEE Trans. Image Process. 2016, 25, 829–839.
  50. Toh, K.C.; Yun, S. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Pac. J. Optim. 2010, 6, 615–640.
  51. Hu, Y.; Zhang, D.; Ye, J.; Li, X.; He, X. Fast and Accurate Matrix Completion via Truncated Nuclear Norm Regularization. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2117–2130.
  52. Mitrinović, D.S. Analytic Inequalities; Springer: Berlin/Heidelberg, Germany, 1970.
  53. Xu, Z. The minimal measurement number for low-rank matrix recovery. Appl. Comput. Harmon. Anal. 2018, 44, 497–508.
  54. Zhang, R.; Li, S. Optimal RIP bounds for sparse signals recovery via ℓp minimization. Appl. Comput. Harmon. Anal. 2019, 47, 566–584.
Figure 1. Image inpainting results of IRNN [49], WNNM [48], and our two methods on the lenna image with 50% random missing pixels (best viewed in color).
Figure 2. Image inpainting results of IRNN [49], WNNM [48], and our two methods on the pepper image with 50% random missing pixels (best viewed in color).
Table 1. Comparison of running time of the methods (in seconds).

          IRNN [49]    WNNM [48]    BN      FN
lenna     2381.04      389.53       7.06    5.27
pepper    2257.39      382.78       7.35    5.31
