Article

Robust Sparse Representation for Incomplete and Noisy Data

School of Science, Xi'an University of Architecture and Technology, Xi'an 710055, China
* Author to whom correspondence should be addressed.
Submission received: 12 June 2015 / Revised: 15 June 2015 / Accepted: 16 June 2015 / Published: 24 June 2015
(This article belongs to the Section Information Applications)

Abstract

Owing to its robustness to large sparse corruptions and its discrimination of class labels, sparse signal representation has become one of the most advanced techniques in pattern classification, computer vision, machine learning and related fields. This paper investigates the problem of robust face classification when a test sample has missing values. First, we propose a classification method based on incomplete sparse representation. This representation is formulated as an l1-minimization problem, and an alternating direction method of multipliers is employed to solve it. Then, we provide a convergence analysis and a model extension of the incomplete sparse representation. Finally, we conduct experiments on two real-world face datasets and compare the proposed method with the nearest neighbor classifier and sparse representation-based classification. The experimental results demonstrate that the proposed method is superior in classification accuracy, completion of missing entries and recovery of noise.

1. Introduction

As a parsimony principle, sparse signal representation seeks to represent a signal as a linear combination of a few basis elements from an over-complete dictionary. The emerging theory of sparse representation and compressed sensing [1,2] has made exciting breakthroughs and received a great deal of attention in the past decade. Nowadays, sparse representation has become a powerful technique for efficiently acquiring, compressing and reconstructing a signal. In addition, sparse representation has two notable properties: it is robust to large sparse corruptions and discriminative with respect to class labels. These two distinguishing properties have promoted its extensive and successful application in areas such as pattern classification [3,4,5], computer vision [6] and machine learning [7].
It is worth mentioning that Wright et al. [3] proposed a novel method for robust face classification. They applied the idea of sparse representation to pattern classification and demonstrated that this unorthodox method can achieve significant improvements in classification accuracy over traditional methods. Subsequently, Yin et al. [8] extended this classification method to a kernel version. Moreover, Huang et al. [9] and Qiao et al. [4] performed signal classification and face classification, respectively, by combining discriminative methods with sparse representation. In [7], Cheng et al. constructed a robust and datum-adaptive l1-graph on the basis of sparse representation; compared with the k-nearest-neighbor graph and the ε-ball graph, the l1-graph is more robust to large sparse noise and more discriminative of neighbors. Zhang et al. [10] presented a robust semi-nonnegative graph embedding framework, and Chen et al. [11] applied non-negative sparse coding to facial expression classification. Elhamifar et al. [12] proposed a sparse subspace clustering framework, which harnesses sparse representation to cluster data drawn from multiple low-dimensional subspaces.
In the pattern classification and machine learning communities, attention is mainly restricted to the situation in which no samples have missing entries. However, datasets with missing values are ubiquitous in many practical applications such as image in-painting, video encoding and collaborative filtering. A commonly used modeling assumption in data analysis is that the investigated dataset is (approximately) low-rank. Based on this assumption, Candès et al. [13] proposed a technique of matrix completion via convex optimization and showed that most low-rank matrices can be exactly completed under certain conditions. For further clustering analysis of such datasets, Shi et al. [14] proposed the method of incomplete low-rank representation, which has been validated to be very robust to missing values.
For the task of pattern classification, we usually stipulate that all samples from the same class lie in a low-dimensional subspace. If the training samples have missing entries, matrix completion or incomplete low-rank representation can be employed to complete or recover the missing values. This paper considers the pattern classification problem in which the test samples have missing values while the training samples are complete. To address it, we propose a method of incomplete sparse representation. This method treats each incomplete test sample as a linear combination of all training samples and searches for the sparsest representation.
The remainder of this paper is organized as follows. Section 2 reviews the classification problem based on sparse representation. In Section 3, we propose a model of incomplete sparse representation and develop an alternating direction method of multipliers (ADMM) [15] to solve it. Convergence analysis and a model extension are given in Section 4. In Section 5, we carry out experiments on two well-known face datasets and validate the superiority of the proposed method by comparing it with other techniques. The last section draws the conclusions.

2. Sparse Representation for Classification

A fundamental problem in pattern classification is how to determine the class label of a test sample from labeled training samples of distinct classes. Given a training set collected from $C$ classes, we express all samples from the $i$-th class as $\{a_{ij} \in \mathbb{R}^d\}_{j=1}^{N_i}$, where $d$ is the dimensionality of each sample and $N_i$ is the number of samples in the $i$-th class, $i = 1, 2, \ldots, C$. Denote $N = \sum_{i=1}^{C} N_i$ and $A_i = (a_{i1}, a_{i2}, \ldots, a_{iN_i})$. Thus, all training samples can be concatenated into a $d \times N$ matrix $A = (A_1, A_2, \ldots, A_C)$.
We assume that the samples from the same class lie in a low-dimensional linear subspace and that there are sufficient training samples for each class. Given a new test sample $y = (y_1, y_2, \ldots, y_d)^T \in \mathbb{R}^d$ with label $i$, we can express it as a linear combination of the training samples from the $i$-th class:
$$ y = w_{i1}a_{i1} + w_{i2}a_{i2} + \cdots + w_{iN_i}a_{iN_i} \qquad (1) $$
for some real scalars $w_{ij}$, $j = 1, 2, \ldots, N_i$. Set $w = (0, \ldots, 0, w_{i1}, w_{i2}, \ldots, w_{iN_i}, 0, \ldots, 0)^T \in \mathbb{R}^N$. Then the linear representation of $y$ can be re-expressed in terms of all $N$ training samples as
$$ y = Aw. \qquad (2) $$
Since the training samples are assumed to be sufficient, the coefficient vector $w$ satisfying Equation (2) is not unique. Moreover, the vector $w$ is sparse if $N_i/N$ is very small.
If the class label of $y$ is unknown, we still wish to determine it from the linear representation coefficients $w$. Sparse representation is essentially discriminative for classification and robust to large sparse noise and outliers. Considering these advantages, we construct a sparse representation model to perform classification. For the given dictionary matrix $A$, the sparse representation of $y$ can be obtained by solving the $l_1$-minimization problem
$$ \min_{w} \|w\|_1, \quad \text{s.t.} \quad y = Aw \qquad (3) $$
or its stable version
$$ \min_{w} \|w\|_1, \quad \text{s.t.} \quad \|y - Aw\|_2 \le \delta \qquad (4) $$
where the error bound $\delta > 0$, and $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the $l_1$-norm and the $l_2$-norm of vectors, respectively. Sparse representation is a global method, and it is superior to local methods such as nearest neighbor (NN) and nearest subspace (NS) in determining the class label [3].
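For illustration only, the equality-constrained Problem (3) can be recast as a linear program by splitting $w$ into nonnegative parts, $w = p - q$ with $p, q \ge 0$. The following minimal sketch (not part of the original work; the function name is ours) uses NumPy and SciPy's general-purpose LP solver.

```python
import numpy as np
from scipy.optimize import linprog

def l1_sparse_code(A, y):
    """Solve min ||w||_1 s.t. y = A w via the LP reformulation w = p - q, p, q >= 0."""
    d, N = A.shape
    c = np.ones(2 * N)                        # sum(p) + sum(q) equals ||w||_1
    A_eq = np.hstack([A, -A])                 # A p - A q = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    p, q = res.x[:N], res.x[N:]
    return p - q
```

For large dictionaries, specialized $l_1$ solvers or the ADMM developed in Section 3 are generally preferable to a generic LP solver.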
Denote the optimal solution of Problem (3) or (4) by $\hat w$. Sparse representation-based classification (SRC) [3] utilizes $\hat w$ to judge which class $y$ belongs to. The detailed implementation is as follows. We first introduce $C$ characteristic functions $\delta_i: \mathbb{R}^N \to \mathbb{R}^N$ defined by
$$ (\delta_i(x))_j = \begin{cases} x_j, & \text{if } \sum_{k=0}^{i-1} N_k < j \le \sum_{k=0}^{i} N_k \\ 0, & \text{otherwise} \end{cases} \qquad (5) $$
for arbitrary $x = (x_1, x_2, \ldots, x_N)^T \in \mathbb{R}^N$, where $N_0 = 0$ and $i = 1, 2, \ldots, C$. Then we compute the $C$ residuals $r_i(y) = \|y - A\delta_i(\hat w)\|_2$, $i = 1, 2, \ldots, C$. Finally, the class label of $y$ is assigned as $\arg\min_i r_i(y)$.
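As an illustration of this decision rule, the following sketch (our own, with hypothetical names such as class_sizes for the list $(N_1, \ldots, N_C)$) computes the class-wise residuals $r_i(y)$ and returns the predicted label.

```python
import numpy as np

def src_label(A, class_sizes, w_hat, y):
    """SRC decision rule: keep only the coefficients of one class at a time
    (the characteristic function delta_i) and pick the class with the
    smallest reconstruction residual r_i(y) = ||y - A delta_i(w_hat)||_2."""
    residuals, start = [], 0
    for n_i in class_sizes:                   # class_sizes = [N_1, ..., N_C]
        delta_i = np.zeros_like(w_hat)
        delta_i[start:start + n_i] = w_hat[start:start + n_i]
        residuals.append(np.linalg.norm(y - A @ delta_i))
        start += n_i
    return int(np.argmin(residuals))          # 0-based class index
```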

3. Incomplete Sparse Representation for Classification

As a second-order generalization of compressed sensing and sparse representation theory, low-rank matrix completion is a technique for filling in all missing or unknown entries of a matrix by exploiting its low-rank structure. If all training samples are complete but the test sample has missing entries, we cannot effectively recover the missing values by using the low-rank property alone; in other words, the available matrix completion methods become invalid. To solve this problem, we propose a method of incomplete sparse representation for classification. The proposed method not only completes the missing entries effectively but also achieves better classification performance.

3.1. Model of Incomplete Sparse Representation

Considering the existence of noise, we decompose a given test sample $y$ into the sum of two terms, $y = Aw + e$, where $e \in \mathbb{R}^d$ is the noise vector. Let $\Omega \subseteq \{1, 2, \ldots, d\}$ be an index set; then $y_k$ is missing if and only if $k \notin \Omega$. For convenience of description, we define an orthogonal projection operator $P_\Omega(\cdot): \mathbb{R}^d \to \mathbb{R}^d$ as follows:
$$ (P_\Omega(y))_k = \begin{cases} y_k, & \text{if } k \in \Omega, \\ 0, & \text{otherwise}. \end{cases} \qquad (6) $$
Thus, the known entries of $y$ can be written as $y_0 = P_\Omega(y)$.
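A minimal sketch of the projection operator, assuming $\Omega$ is represented as an integer array of observed positions (the function name is ours):

```python
import numpy as np

def project_omega(y, omega_idx):
    """P_Omega: keep the entries of y indexed by Omega and set the rest to zero."""
    y0 = np.zeros_like(y)
    y0[omega_idx] = y[omega_idx]
    return y0
```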
For the incomplete test sample $y_0$, we hope to complete all missing entries and obtain the sparsest linear representation on the basis of $A$ and $\Omega$. To this end, we construct the $l_1$-minimization problem
$$ \min_{w, e, y}\ \|w\|_1 + \tau\|e\|_2^2 \quad \text{s.t.} \quad y = Aw + e,\ P_\Omega(y) = y_0 \qquad (7) $$
where the tradeoff factor $\tau > 0$. As a matter of fact, the objective function in the above problem can be replaced by $\|w\|_1 + \tau\|P_\Omega(e)\|_2^2$, which implies that it is impossible to recover the noise at the missing positions. Problem (7) is a convex, non-smooth minimization with equality constraints. We employ the alternating direction method of multipliers (ADMM) to solve it.

3.2. Generic Formulation of ADMM

The ADMM [15] is a simple and easily implemented optimization method proposed in the 1970s. It is well suited to distributed convex optimization and, in particular, to large-scale optimization problems with multiple non-smooth terms in the objective function. Hence, the method has received a lot of attention in recent years.
Generally, ADMM solves a constrained optimization problem of the following generic form:
$$ \min_{x \in \mathbb{R}^m,\ y \in \mathbb{R}^n}\ f(x) + g(y), \quad \text{s.t.} \quad Bx + Cy = d \qquad (8) $$
where $f: \mathbb{R}^m \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}$ are proper and convex. The augmented Lagrange function of Problem (8) is defined as
$$ L_\mu(x, y, \lambda) = f(x) + g(y) + \langle \lambda, Bx + Cy - d \rangle + \frac{\mu}{2}\|Bx + Cy - d\|_2^2 \qquad (9) $$
where $\mu$ is a positive scalar, $\lambda$ is the Lagrange multiplier vector and $\langle \cdot, \cdot \rangle$ is the inner product of vectors.
ADMM updates each block of variables alternately. The iterations are outlined as follows:
$$ \begin{cases} x := \arg\min_x L_\mu(x, y, \lambda) \\ y := \arg\min_y L_\mu(x, y, \lambda) \\ \lambda := \lambda + \mu(Bx + Cy - d). \end{cases} \qquad (10) $$
Moreover, the value of $\mu$ may be increased during the iterations.

3.3. Algorithm for Incomplete Sparse Representation

We adopt ADMM with multiple blocks of variables to solve the problem of incomplete sparse representation. By introducing two auxiliary variables $z \in \mathbb{R}^d$ and $u \in \mathbb{R}^N$, we reformulate Problem (7) as
$$ \min_{w, e, y, z, u}\ \|w\|_1 + \tau\|e\|_2^2, \quad \text{s.t.} \quad z = Au + e,\ w = u,\ z = y,\ P_\Omega(y) = y_0. \qquad (11) $$
Ignoring the constraint $P_\Omega(y) = y_0$ for the moment, the augmented Lagrange function of the above optimization problem is
$$ L_\rho(w, e, y, z, u, \lambda_1, \lambda_2, \lambda_3) = \|w\|_1 + \tau\|e\|_2^2 + \frac{\rho}{2}\left( \|z - Au - e\|_2^2 + \|w - u\|_2^2 + \|z - y\|_2^2 \right) + \langle \lambda_1, z - Au - e \rangle + \langle \lambda_2, w - u \rangle + \langle \lambda_3, z - y \rangle \qquad (12) $$
where the penalty coefficient $\rho > 0$ and $\lambda_1 \in \mathbb{R}^d$, $\lambda_2 \in \mathbb{R}^N$, $\lambda_3 \in \mathbb{R}^d$ are three Lagrange multiplier vectors. Let $\lambda = \{\lambda_1, \lambda_2, \lambda_3\}$; then Equation (12) is equivalent, up to terms independent of the primal variables, to
$$ L_\rho(w, e, y, z, u, \lambda) = \|w\|_1 + \tau\|e\|_2^2 + \frac{\rho}{2}\left( \|z - Au - e + \lambda_1/\rho\|_2^2 + \|w - u + \lambda_2/\rho\|_2^2 + \|z - y + \lambda_3/\rho\|_2^2 \right). \qquad (13) $$
ADMM updates each block of variables alternately by minimizing or maximizing $L_\rho$. We next give the detailed iterative procedure for Problem (11).
Computing $w$. When $w$ is unknown and the other blocks of variables are fixed, $w$ is updated as
$$ w := \arg\min_w L_\rho = \arg\min_w \frac{1}{\rho}\|w\|_1 + \frac{1}{2}\left\|w - (u - \lambda_2/\rho)\right\|_2^2 = S_{1/\rho}(u - \lambda_2/\rho) \qquad (14) $$
where $S_{1/\rho}(\cdot): \mathbb{R}^N \to \mathbb{R}^N$ is the absolute shrinkage (soft-thresholding) operator [16] defined by
$$ (S_{1/\rho}(x))_j = \max(|x_j| - 1/\rho,\ 0)\,\mathrm{sgn}(x_j) \qquad (15) $$
for arbitrary $x \in \mathbb{R}^N$.
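In code, the shrinkage operator reduces to a single vectorized expression; the sketch below (ours) assumes NumPy arrays.

```python
import numpy as np

def shrink(x, thr):
    """Absolute shrinkage: (S_thr(x))_j = max(|x_j| - thr, 0) * sgn(x_j), Eq. (15)."""
    return np.maximum(np.abs(x) - thr, 0.0) * np.sign(x)
```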
Computing $e$. If $e$ is unknown and the other variables are given, $e$ is updated by minimizing $L_\rho$:
$$ e := \arg\min_e L_\rho = \arg\min_e\ \tau\|e\|_2^2 + \frac{\rho}{2}\left\|(z - Au + \lambda_1/\rho) - e\right\|_2^2 = \frac{\rho}{2\tau + \rho}(z - Au + \lambda_1/\rho). \qquad (16) $$
Computing $u$. The update of $u$ is
$$ u := \arg\min_u L_\rho = \arg\min_u\ f(u) \qquad (17) $$
where $f(u) = \|(z - e + \lambda_1/\rho) - Au\|_2^2 + \|(w + \lambda_2/\rho) - u\|_2^2$. Setting the derivative of $f(u)$ to zero yields
$$ A^TAu - A^T(z - e + \lambda_1/\rho) + u - (w + \lambda_2/\rho) = 0 \qquad (18) $$
or, equivalently,
$$ u = (A^TA + I_N)^{-1}\left( A^T(z - e + \lambda_1/\rho) + (w + \lambda_2/\rho) \right) \qquad (19) $$
where $I_N$ is the $N \times N$ identity matrix.
Computing $z$. Fix $w$, $e$, $y$, $u$ and $\lambda$, and minimize $L_\rho$ with respect to $z$:
$$ z := \arg\min_z L_\rho = \arg\min_z \left\|z - (Au + e - \lambda_1/\rho)\right\|_2^2 + \left\|z - (y - \lambda_3/\rho)\right\|_2^2 = \frac{1}{2}\left( Au + e + y - (\lambda_1 + \lambda_3)/\rho \right). \qquad (20) $$
Computing $y$. Given $w$, $e$, $z$, $u$ and $\lambda$, we calculate $y$ as
$$ y := \arg\min_y L_\rho = \arg\min_y \left\|(z + \lambda_3/\rho) - y\right\|_2^2 = z + \lambda_3/\rho. \qquad (21) $$
Taking the constraint $P_\Omega(y) = y_0$ into account, we further obtain the iterative formulation of $y$:
$$ y := y_0 + P_{\bar\Omega}(z + \lambda_3/\rho) \qquad (22) $$
where $\bar\Omega$ is the complement of $\Omega$.
Computing $\lambda$. Given $w$, $e$, $y$, $z$ and $u$, we update $\lambda$ as follows:
$$ \begin{cases} \lambda_1 := \lambda_1 + \rho(z - Au - e) \\ \lambda_2 := \lambda_2 + \rho(w - u) \\ \lambda_3 := \lambda_3 + \rho(z - y). \end{cases} \qquad (23) $$
The whole iterative process for solving Problem (11) is outlined in Algorithm 1. In the initialization step, the blocks of variables can be chosen as $e = 0$, $y = y_0$, $z = 0$, $u = 0$, $\lambda_1 = 0$, $\lambda_2 = 0$, $\lambda_3 = 0$. We set the stopping condition of Algorithm 1 to be
$$ \max\left( \|z - Au - e\|_2,\ \|w - u\|_2,\ \|z - y\|_2 \right) < \varepsilon \qquad (24) $$
where $\varepsilon$ is a sufficiently small positive number. The inverse matrix $(A^TA + I_N)^{-1}$ is computed only once, with computational complexity $O(N^2 d + N^3)$. In addition, the complexity of Algorithm 1 is also $O(N^2 d + N^3)$ for each outer loop.
Algorithm 1. Solving Problem (11) via ADMM.
Input: the dictionary matrix $A$ constructed from all training samples, an incomplete test sample $y_0$ and the sampling index set $\Omega$.
Output: $y$, $w$ and $e$.
Initialize: $e$, $y$, $z$, $u$, $\lambda_1$, $\lambda_2$, $\lambda_3$, $\rho$, $\bar\rho$, $\tau$, $\mu > 1$.
While not converged do
  1: Update w according to (14).
  2: Update e according to (16).
  3: Update u according to (19).
  4: Update z according to (20).
  5: Update y according to (22).
  6: Update λ according to (23).
  7: Update ρ as min ( μ ρ , ρ ¯ ) .
End while
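For concreteness, a compact NumPy sketch of Algorithm 1 is given below. It follows the updates (14)-(23) and the stopping condition (24) literally; the function name, the boolean-mask representation of $\Omega$ and the default parameter values (loosely based on the settings reported in Section 5) are our own choices, not part of the original implementation.

```python
import numpy as np

def soft_threshold(x, thr):
    """Absolute shrinkage operator S_thr, Equation (15)."""
    return np.maximum(np.abs(x) - thr, 0.0) * np.sign(x)

def isrc_admm(A, y0, omega, tau=1e-3, rho=1e-8, rho_bar=1e10, mu=1.1,
              eps=1e-8, max_iter=2000):
    """Sketch of Algorithm 1: ADMM for incomplete sparse representation.

    A     : d x N dictionary of training samples (columns are samples)
    y0    : length-d test sample with its missing entries set to zero
    omega : boolean mask of length d, True at the observed positions
    Returns the completed sample y, the sparse code w and the noise e.
    """
    d, N = A.shape
    w, u = np.zeros(N), np.zeros(N)
    e, z, y = np.zeros(d), np.zeros(d), y0.copy()
    lam1, lam2, lam3 = np.zeros(d), np.zeros(N), np.zeros(d)
    inv_gram = np.linalg.inv(A.T @ A + np.eye(N))   # computed only once

    for _ in range(max_iter):
        w = soft_threshold(u - lam2 / rho, 1.0 / rho)                  # Eq. (14)
        e = rho / (2.0 * tau + rho) * (z - A @ u + lam1 / rho)         # Eq. (16)
        u = inv_gram @ (A.T @ (z - e + lam1 / rho) + w + lam2 / rho)   # Eq. (19)
        z = 0.5 * (A @ u + e + y - (lam1 + lam3) / rho)                # Eq. (20)
        y = np.where(omega, y0, z + lam3 / rho)                        # Eq. (22)
        lam1 = lam1 + rho * (z - A @ u - e)                            # Eq. (23)
        lam2 = lam2 + rho * (w - u)
        lam3 = lam3 + rho * (z - y)
        rho = min(mu * rho, rho_bar)                                   # step 7
        if max(np.linalg.norm(z - A @ u - e),
               np.linalg.norm(w - u),
               np.linalg.norm(z - y)) < eps:                           # Eq. (24)
            break
    return y, w, e
```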
Let $\hat y$, $\hat w$ and $\hat e$ be the output variables of Algorithm 1. The vector $\hat y$ denotes the completed version of $y_0$, and $\hat w$ is the sparse representation of $\hat y$ over the basis matrix $A$. In view of the discriminative power of $\hat w$, this sparse vector can be employed to obtain the class label of $y_0$. More specifically, we first compute the $C$ residuals $r_i(y_0) = \|y_0 - P_\Omega(A\delta_i(\hat w))\|_2$, $i = 1, 2, \ldots, C$, and then assign $y_0$ to class $\arg\min_i r_i(y_0)$. The above method is called incomplete SRC (ISRC), a variant of SRC.
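Continuing the sketch above and reusing the hypothetical isrc_admm function, the ISRC decision rule restricted to the observed entries can be written as follows.

```python
import numpy as np

def isrc_label(A, class_sizes, y0, omega):
    """Classify an incomplete test sample: run Algorithm 1, then compare the
    class-wise residuals r_i(y0) = ||y0 - P_Omega(A delta_i(w_hat))||_2."""
    _, w_hat, _ = isrc_admm(A, y0, omega)
    residuals, start = [], 0
    for n_i in class_sizes:
        delta_i = np.zeros_like(w_hat)
        delta_i[start:start + n_i] = w_hat[start:start + n_i]
        recon = A @ delta_i
        recon[~omega] = 0.0                   # restrict to the observed positions
        residuals.append(np.linalg.norm(y0 - recon))
        start += n_i
    return int(np.argmin(residuals))          # 0-based class index
```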

4. Convergence Analysis and Model Extension

Although the minimization Problem (11) is convex and continuous, it is still difficult to prove the convergence of Algorithm 1 directly. The main reason is that the number of blocks of variables is more than two. If there are no missing values, however, we can design an exact ADMM for solving Problem (11). The following theorem shows the convergence of the modified method.
Theorem 1. If $\Omega = \{1, 2, \ldots, d\}$ and $L_0(w, e, y_0, z, u, \lambda)$ has a saddle point, then the iterates generated by the exact ADMM
$$ \begin{cases} (w^{k+1}, z^{k+1}) := \arg\min_{z, w}\ L_\rho(w, e^k, y_0, z, u^k, \lambda^k) \\ (u^{k+1}, e^{k+1}) := \arg\min_{u, e}\ L_\rho(w^{k+1}, e, y_0, z^{k+1}, u, \lambda^k) \\ \lambda_1^{k+1} := \lambda_1^k + \rho(z^{k+1} - Au^{k+1} - e^{k+1}) \\ \lambda_2^{k+1} := \lambda_2^k + \rho(w^{k+1} - u^{k+1}) \\ \lambda_3^{k+1} := \lambda_3^k + \rho(z^{k+1} - y_0) \end{cases} \qquad (25) $$
are such that $L_\rho(w, e, y_0, z, u, \lambda)$ converges to the optimal value.
Proof. The objective function of Problem (11) can be rewritten as $f(z, w) + g(u, e)$, where $f(z, w) = \|w\|_1$ and $g(u, e) = \tau\|e\|_2^2$. Obviously, $f(z, w)$ and $g(u, e)$ are closed, proper and convex functions.
Since $\Omega = \{1, 2, \ldots, d\}$, we have $y = y_0$, so there is no need to update $y$. Under this circumstance, the constraints in Problem (11) are equivalent to
$$ \begin{pmatrix} I_d & O_{d\times N} \\ I_d & O_{d\times N} \\ O_{N\times d} & I_N \end{pmatrix} \begin{pmatrix} z \\ w \end{pmatrix} - \begin{pmatrix} A & I_d \\ O_{d\times N} & O_{d\times d} \\ I_N & O_{N\times d} \end{pmatrix} \begin{pmatrix} u \\ e \end{pmatrix} = \begin{pmatrix} 0 \\ y_0 \\ 0 \end{pmatrix} \qquad (26) $$
where $O_{d\times N}$ denotes the $d \times N$ zero matrix.
Because the objective separates into the two blocks $(z, w)$ and $(u, e)$ with the linear constraints (26), the standard two-block ADMM convergence theory [15] applies to the iterations (25): $\lim_{k\to\infty}(z^k - Au^k - e^k) = 0$, $\lim_{k\to\infty}(w^k - u^k) = 0$, $\lim_{k\to\infty}(z^k - y_0) = 0$, and $L_\rho(w, e, y_0, z, u, \lambda_1, \lambda_2, \lambda_3)$ converges to the optimal value. This completes the proof. □
For the aforementioned ISRC, we have considered only one incomplete test sample. In the following, we extend it to the case of a batch of test samples with missing values. Given a set of $m$ incomplete test samples $\{y_{0i} \in \mathbb{R}^d\}_{i=1}^m$, we construct a matrix $Y_0 = (y_{01}, y_{02}, \ldots, y_{0m}) \in \mathbb{R}^{d \times m}$ and a two-dimensional index set $\Omega \subseteq \{1, 2, \ldots, d\} \times \{1, 2, \ldots, m\}$, where $\Omega$ indicates the positions of the observed entries of $Y_0$.
We establish the batch learning model of incomplete sparse representation:
$$ \min_{W, E, Y, Z, U}\ \|W\|_1 + \tau\|E\|_F^2, \quad \text{s.t.} \quad Z = AU + E,\ W = U,\ Z = Y,\ P_\Omega(Y) = Y_0 \qquad (27) $$
where $W \in \mathbb{R}^{N\times m}$, $U \in \mathbb{R}^{N\times m}$, $E \in \mathbb{R}^{d\times m}$, $Y \in \mathbb{R}^{d\times m}$, $Z \in \mathbb{R}^{d\times m}$, $P_\Omega(\cdot): \mathbb{R}^{d\times m} \to \mathbb{R}^{d\times m}$ is the two-dimensional generalization of $P_\Omega(\cdot)$, and $\|\cdot\|_1$ and $\|\cdot\|_F$ denote the component-wise $l_1$-norm and the Frobenius norm of matrices, respectively. Without the constraint $P_\Omega(Y) = Y_0$, the augmented Lagrange function of Problem (27) is
$$ L_\rho(W, E, Y, Z, U, \Lambda) = \|W\|_1 + \tau\|E\|_F^2 + \frac{\rho}{2}\left( \|Z - AU - E + \Lambda_1/\rho\|_F^2 + \|W - U + \Lambda_2/\rho\|_F^2 + \|Z - Y + \Lambda_3/\rho\|_F^2 \right) \qquad (28) $$
where $\rho > 0$, $\Lambda_1 \in \mathbb{R}^{d\times m}$, $\Lambda_2 \in \mathbb{R}^{N\times m}$, $\Lambda_3 \in \mathbb{R}^{d\times m}$ and $\Lambda = \{\Lambda_1, \Lambda_2, \Lambda_3\}$. Problem (27) is solved by an iterative procedure similar to that for Problem (11), alternately minimizing or maximizing $L_\rho(W, E, Y, Z, U, \Lambda)$.
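The batch iterations mirror those of Section 3.3, with vectors replaced by matrices and all test samples processed simultaneously. As a brief illustration (a sketch under the same assumptions as before, with our own function names), the W-update becomes an element-wise shrinkage of a matrix and the U-update solves a single linear system with m right-hand sides:

```python
import numpy as np

def batch_w_update(U, Lam2, rho):
    """Matrix analogue of Eq. (14): element-wise shrinkage of U - Lam2/rho."""
    X = U - Lam2 / rho
    return np.maximum(np.abs(X) - 1.0 / rho, 0.0) * np.sign(X)

def batch_u_update(A, Z, E, W, Lam1, Lam2, rho):
    """Matrix analogue of Eq. (19): one solve with m right-hand sides."""
    N = A.shape[1]
    rhs = A.T @ (Z - E + Lam1 / rho) + W + Lam2 / rho
    return np.linalg.solve(A.T @ A + np.eye(N), rhs)
```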

5. Experiments

This section demonstrates the effectiveness and efficiency of ISRC through experiments on the Olivetti Research Laboratory (ORL) and Yale face datasets. We compare the results of the proposed method with those of NN and SRC.

5.1. Datasets Description and Experimental Setting

The ORL dataset contains 10 different face images of each of 40 individuals [17]. These 400 images were captured at different times, under different illumination conditions and with varying facial details. The Yale face dataset consists of 165 images of 15 persons, with 11 images per person [18]; these images were also taken under different illuminations and with varying facial expressions and details. All images in both datasets are grayscale and resized to 64 × 64 for computational convenience, so the dimensionality of each sample is d = 4096. Moreover, each sample is normalized to a unit vector in the sense of the $l_2$-norm to mitigate the effects of variable illumination conditions and poses.
In the ORL dataset, five images per person are randomly selected for training and the remaining five images are used for testing. In the Yale dataset, we randomly choose six images per person as training samples and the rest as testing samples. For each sample $y$ from the testing set, we randomly generate an index set $\Omega$ according to the Bernoulli distribution, i.e., the probability that $i \in \Omega$ is $p$ for arbitrary $i \in \{1, 2, \ldots, d\}$, where $p \in (0, 1]$. The probability $p$ is called the sampling probability, and $p = 1$ means that no entry is missing. Thus, an incomplete sample of $y$ is expressed as $y_0 = P_\Omega(y)$. This generating procedure implies that the number of observed entries is approximately $pd$.
In Algorithm 1, the parameters are set as $\tau = 10^{-3}$, $\rho = 10^{-8}$, $\bar\rho = 10^{10}$, $\mu = 1.1$ and $\varepsilon = 10^{-8}$. For each dataset, we consider different values of $p$. For each fixed $p$, the experiments are repeated 10 times and the average classification accuracies are reported. When carrying out NN, we compute the distance between $y_0$ and $a_{ij}$ as $\|y_0 - P_\Omega(a_{ij})\|_2$. In addition, all missing values are replaced with zeros when implementing SRC.
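The incomplete test samples used in the experiments can be simulated as follows; this is a sketch with our own function name, assuming the image has already been vectorized and $l_2$-normalized as described in Section 5.1.

```python
import numpy as np

def make_incomplete(y, p, rng=None):
    """Observe each entry of y independently with probability p (Bernoulli sampling)
    and return the incomplete sample y0 = P_Omega(y) together with the mask."""
    rng = np.random.default_rng() if rng is None else rng
    omega = rng.random(y.shape[0]) < p        # True where the entry is observed
    y0 = np.where(omega, y, 0.0)              # missing entries replaced by zero
    return y0, omega
```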

5.2. Experimental Analysis

We first compare the sparsity of the coefficient vectors obtained by SRC and ISRC. Two sampling probabilities are considered, namely p = 0.1 and p = 0.3, and the comparison results are partially shown in Figure 1. From this figure, we can see that each linear representation vector produced by ISRC has only a few components with relatively large absolute values, while the remaining components are close to zero. Compared with ISRC, SRC exhibits weaker sparsity, and the amplitudes of its coefficients are relatively small. These observations show that ISRC is superior to SRC in obtaining sparse representations.
Figure 1. Sparsity comparisons of the linear representations between SRC and ISRC.
Next, we compare the classification performance of ISRC with that of SRC and NN on the two datasets. To this end, we vary p from 0.1 to 1 with a step of 0.1; when p = 1, ISRC reduces to SRC. Figure 2 shows the comparison of classification accuracies, where (a) and (b) present the results on ORL and Yale respectively. It can be seen from this figure that ISRC achieves the best classification accuracy among the three methods and remains relatively stable across different values of p. SRC is very sensitive to the choice of p, and its classification accuracy degrades sharply as p decreases. NN is stable but yields lower classification accuracy. In summary, ISRC is the most robust of the three methods and has the best classification performance.
Figure 2. Classification performance comparisons among NN, SRC and ISRC.
For a test sample with missing values, both SRC and ISRC can recover the missing entries and the noise to some extent. Finally, we compare their performance in completing missing entries and recovering sparse noise. Here, we again consider only two sampling probabilities, p = 0.1 and p = 0.3. For these two probabilities, the completed images and the recovered noise images obtained by SRC and ISRC are partially shown in Figure 3 and Figure 4.
Figure 3. Completion and recovery performance comparisons on ORL.
Figure 4. Completion and recovery performance comparisons on Yale.
In these two figures, the sampling probability is 0.1 for the first two rows of images and 0.3 for the last two rows. In each figure, the first two columns display the original and the incomplete images respectively, where the positions of the missing entries are shown in white. The third and fifth columns give the images completed by SRC and ISRC respectively, and the fourth and last columns show the noise recovered by SRC and ISRC respectively. Comparing the completed images with the original ones, we can see that ISRC not only achieves better completion but also automatically corrects corruptions to a certain extent. Moreover, ISRC recovers the noise more effectively than SRC. In summary, ISRC has better recovery performance than SRC.

6. Conclusions

This paper studies the problem of robust face classification with incomplete test samples. To address this problem, we propose a classification model based on incomplete sparse representation, which can be regarded as a generalization of sparse representation-based classification. First, the incomplete sparse representation is formulated as an l1-minimization problem and the alternating direction method of multipliers is employed to solve it. Then, we analyze the convergence of the proposed algorithm and extend the model to the case of a batch of test samples. Finally, experimental results on two well-known face datasets demonstrate that the proposed classification method is very effective in improving classification performance and in recovering the missing entries and the noise. The model and algorithm of incomplete sparse representation still require further study. In future work, we will consider the sparse representation-based classification problem in which both training and testing samples have missing values.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (No. 61403298 and No. 11401457), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2014JQ8323), and the Shaanxi Provincial Education Department (No. 2013JK0587).

Author Contributions

Jiarong Shi constructed the model, developed the algorithm and wrote the manuscript. Xiuyun Zheng designed the experiments. Wei Yang implemented all experiments. All three authors were involved in organizing and refining the manuscript. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Donoho, D. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306. [Google Scholar] [CrossRef]
  2. Candès, E.J.; Wakin, M.B. An introduction to compressive sampling. IEEE Signal Process. Mag. 2008, 25, 21–30. [Google Scholar] [CrossRef]
  3. Wright, J.; Yang, A.Y.; Ganesh, A.; Sastry, S.S.; Ma, Y. Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 210–227. [Google Scholar] [CrossRef] [PubMed]
  4. Qiao, L.; Chen, S.; Tan, X. Sparsity preserving discriminant analysis for single training image face recognition. Pattern Recogn. Lett. 2010, 31, 422–429. [Google Scholar] [CrossRef]
  5. Zhang, S.; Zhao, X.; Lei, B. Robust facial expression recognition via compressive sensing. Sensors 2012, 12, 3747–3761. [Google Scholar] [CrossRef] [PubMed]
  6. Wright, J.; Ma, Y.; Mairal, J.; Sapiro, G.; Huang, T.; Yan, S. Sparse representation for computer vision and pattern recognition. Proc. IEEE 2010, 98, 1031–1044. [Google Scholar] [CrossRef]
  7. Cheng, B.; Yang, J.; Yan, S.; Fu, Y.; Huang, T. Learning with L1-graph for image analysis. IEEE Trans. Image Process. 2010, 19, 858–866. [Google Scholar] [CrossRef] [PubMed]
  8. Yin, J.; Liu, Z.; Jin, Z.; Yang, W. Kernel sparse representation based classification. Neurocomputing 2012, 77, 120–128. [Google Scholar] [CrossRef]
  9. Huang, K.; Aviyente, S. Sparse representation for signal classification. Neural Inf. Proc. Syst. 2006, 19, 609–616. [Google Scholar]
  10. Zhang, H.; Zha, Z.J.; Yang, Y.; Yan, S.; Chua, T.S. Robust (semi) nonnegative graph embedding. IEEE Trans. Image Process. 2014, 23, 2996–3012. [Google Scholar] [CrossRef] [PubMed]
  11. Chen, Y.; Zhang, S.; Zhao, X. Facial expression recognition via non-negative least-squares sparse coding. Information 2014, 5, 305–318. [Google Scholar] [CrossRef]
  12. Elhamifar, E.; Vidal, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2765–2781. [Google Scholar] [CrossRef] [PubMed]
  13. Candès, E.J.; Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math. 2009, 9, 717–772. [Google Scholar] [CrossRef]
  14. Shi, J.; Yang, W.; Yong, L.; Zheng, X. Low-rank representation for incomplete data. Math. Probl. Eng. 2014, 2014. [Google Scholar] [CrossRef]
  15. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar] [CrossRef]
  16. Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM 2011, 58. [Google Scholar] [CrossRef]
  17. ORL Database of Faces. Available online: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html (accessed on 23 June 2015).
  18. Yale Face Database. Available online: http://vision.ucsd.edu/content/yale-face-database (accessed on 23 June 2015).
