HOSVD-Based Algorithm for Weighted Tensor Completion

Chao, Zehan; Huang, Longxiu; Needell, Deanna

doi:10.3390/jimaging7070110

Open AccessArticle

HOSVD-Based Algorithm for Weighted Tensor Completion

by

Zehan Chao

^†,

Longxiu Huang

^*,† and

Deanna Needell

^†

Department of Mathematics, University of California, Los Angeles, CA 90095, USA

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

J. Imaging 2021, 7(7), 110; https://0-doi-org.brum.beds.ac.uk/10.3390/jimaging7070110

Submission received: 15 May 2021 / Revised: 17 June 2021 / Accepted: 2 July 2021 / Published: 7 July 2021

Download

Browse Figures

Versions Notes

Abstract

:

Matrix completion, the problem of completing missing entries in a data matrix with low-dimensional structure (such as rank), has seen many fruitful approaches and analyses. Tensor completion is the tensor analog that attempts to impute missing tensor entries from similar low-rank type assumptions. In this paper, we study the tensor completion problem when the sampling pattern is deterministic and possibly non-uniform. We first propose an efficient weighted Higher Order Singular Value Decomposition (HOSVD) algorithm for the recovery of the underlying low-rank tensor from noisy observations and then derive the error bounds under a properly weighted metric. Additionally, the efficiency and accuracy of our algorithm are both tested using synthetic and real datasets in numerical simulations.

Keywords:

HOSVD decomposition; tensor completion; weighted tensor

1. Introduction

In many data-rich domains such as computer vision, neuroscience, and social networks, tensors have emerged as a powerful paradigm for handling the data deluge. In recent years, tensor analysis has gained more and more attention. To a certain degree, tensors can be viewed as the generalization of matrices to higher dimensions, and thus multiple questions from matrix analysis extend naturally to tensors. Similar to matrix decomposition, the problem of tensor decomposition (decomposing an input tensor into several less complex components) has been widely studied both in theory and application (see e.g., [1,2,3]). Thus far, the problem of low-rank tensor completion, which aims to complete missing or unobserved entries of a low-rank tensor, is one of the most actively studied problems (see e.g., [4,5,6,7]). It is noteworthy that, as caused by various unpredictable or unavoidable reasons, multidimensional datasets are commonly raw and incomplete, and thus often only a small subset of entries of tensors are available. It is, therefore, natural to address the above issue using tensor completion in modern data-driven applications, in which data are naturally represented as a tensor, such as image/video inpainting [5,8], link-prediction [9], and recommendation systems [10], to name a few.

In the past few decades, the matrix completion problem, which is a special case of tensor completion, has been extensively studied. In matrix completion, there are mature algorithms [11], theoretical foundations [12,13,14] and various applications [15,16,17,18] that pave the way for solving the tensor completion problem in high-order tensors. Recently, Foucart et al. [19] proposed a simple algorithm for matrix completion for general deterministic sampling patterns, and raised the following questions: given a deterministic sampling pattern

Ω

and corresponding (possibly noisy) observations of the matrix entries, what type of recovery error can we expect? In what metric? How can we efficiently implement recovery? These were investigated in [19] by introducing an appropriate weighted error metric for matrix recovery of the form

∥ H ⊡ (\hat{M} - M) ∥_{F}

, where M is the true underlying low-rank matrix,

\hat{M}

refers to the recovered matrix, and H is a best rank-1 matrix approximation for the sampling pattern

Ω

. In this regard, similar questions arise for the problem of tensor completion with deterministic sampling patterns. Unfortunately, as is often the case, moving from the matrix setting to the tensor setting presents non-trivial challenges, and notions such as rank and SVD need to be re-defined and re-evaluated. We address these extensions for the completion problem here.

Motivated by the matrix case, we propose an appropriate weighted error metric for tensor recovery of the form

∥ H ⊡ (\hat{T} - T) ∥_{F}

, where

T

is the true underlying low-rank tensor,

\hat{T}

is the recovered tensor, and

H

is an appropriate weight tensor. For the existing work, the error is only limited to the form

∥ \hat{T} {- T ∥}_{F}

, which corresponds to the case that all the entries of

H

are 1, where

H

can be considered to be a CP rank-1 tensor. It motivates us to rephrase the questions mentioned above as follows.

Main questions. Given a sampling pattern

Ω

, and noisy observations

T + Z

on

Ω

, for what rank-one weight tensor

H

can we efficiently find a tensor

\hat{T}

so that

∥ H ⊡ (\hat{T} - T) ∥_{F}

is small compared to

{∥H∥}_{F}

? And how can we efficiently find such weight tensor

H

, or determine that a fixed

H

has this property?

1.1. Contributions

Our main goal is to provide an algorithmic tool, theoretical analysis, and numerical results that address the above questions. In this paper, we propose a simple weighted Higher Order Singular Value Decomposition (HOSVD) method. Before we implement the weighted HOSVD algorithm, we first appropriately approximate the sampling pattern

Ω

with a rank one tensor

H

. We can achieve high accuracy if

∥ H - H^{(- 1)} ⊡ 1_{Ω} ∥_{F}

is small, where

H^{(- 1)}

denotes the element-wise inverse. Finally, we present empirical results on synthetic and real datasets. The simulation results show that when the sampling pattern is non-uniform, the use of weights in the weighted HOSVD algorithm is essential, and the results of the weighted HOSVD algorithm can provide a very good initialization for the total variation minimization algorithm which can dramatically reduce the iterative steps without lose the accuracy. In doing so, we extend the weighted matrix completion results of [19] to the tensor setting.

1.2. Organization

The paper is organized as follows. In Section 2, we give a brief review of related work and concepts for tensor analysis, instantiate notations, and state the tensor completion problem under study. Our main results are stated in Section 3 and the proofs are provided in Appendix A and Appendix B. The numerical results are provided and discussed in Section 4.

2. Related Work, Background, and Problem Statement

In this section, we give a brief overview of the works that are related to ours, introduce some necessary background information about tensors, and finally give a formal statement of tensor completion problem under study. The related work can be divided into two lines: that based on matrix completion problems, which leads to a discussion of weighted matrix completion and related work, and that based on tensor analysis, in which we focus on CP and Tucker decompositions.

2.1. Matrix Completion

The matrix completion problem is to determine a complete

d_{1} \times d_{2}

matrix M from its partial entries on a subset

Ω \subseteq [d_{1}] \times [d_{2}]

. We use

1

_{Ω}

to denote the matrix whose entries are 1 on

Ω

and 0 elsewhere so that the entries of

M_{Ω} = 1_{Ω} ⊡ M

are equal to those of the matrix M on

Ω

, and are equal to 0 elsewhere, where ⊡ denotes the Hadamard product. There are various works that aim to understand matrix completion with respect to the sampling pattern

Ω

. For example, the works in [20,21,22] relate the sampling pattern

Ω

to a graph whose adjacency matrix is given by

1_{Ω}

and show that as long as the sampling pattern

Ω

is suitably close to an expander, efficient recovery is possible when the given matrix M is sufficiently incoherent. Mathematically, the task of understanding when there exists a unique low-rank matrix M that can complete

M_{Ω}

as a function of the sampling pattern

Ω

is very important. In [23], the authors give conditions on

Ω

under which there are only finitely many low-rank matrices that agree with

M_{Ω}

, and the work of [24] gives a condition under which the matrix can be locally uniquely completed. The work in [25] generalized the results of [23,24] to the setting where there is sparse noise added to the matrix. The works [26,27] study when rank estimation is possible as a function of a deterministic pattern

Ω

. Recently, [28] gave a combinatorial condition on

Ω

that characterizes when a low-rank matrix can be recovered up to a small error in the Frobenius norm from observations in

Ω

and showed that nuclear minimization will approximately recover M whenever it is possible, where the nuclear norm of M is defined as

{∥ M ∥}_{*} : = \sum_{i = 1}^{r} σ_{i}

with

σ_{1}, \dots, σ_{r}

the non-zero singular values of M.

All the works mentioned above are in the setting where recovery of the entire matrix is possible, but in many cases full recovery is impossible. Ref. [29] uses an algebraic approach to answer the question of when an individual entry can be completed. There are many works (see e.g., [30,31]) that introduce a weight matrix for capturing the recovery results of the desired entries. The work [21] shows that, for any weight matrix, H, there is a deterministic sampling pattern

Ω

and an algorithm that returns

\hat{M}

using the observation

M_{Ω}

such that

∥ H ⊡ (\hat{M} - M) ∥_{F}

is small. The work [32] generalizes the algorithm in [21] to find the “simplest” matrix that is correct on the observed entries. Succinctly, their works give a way of measuring which deterministic sampling patterns,

Ω

, are “good” with respect to a weight matrix H. In contrast to these two works, [19] is interested in the problem of whether one can find a weight matrix H and create an efficient algorithm to find an estimate

\hat{M}

for an underlying low-rank matrix M from a sampling pattern

Ω

and noisy samples

M_{Ω} + Z_{Ω}

such that

∥ H ⊡ (\hat{M} - M) ∥_{F}

is small.

In particular, one of our theoretical results is that we generalize the upper bounds for weighted recovery of low-rank matrices from deterministic sampling patterns in [19] to the upper bound of tensor weighted recovery. The details of the connection between our result and the matrix setting result in [19] is discussed in Section 3.

2.2. Tensor Completion Problem

Tensor completion is the problem of filling in the missing elements of partially observed tensors. Similar to the matrix completion problem, low rankness is often a necessary hypothesis to restrict the degrees of freedom of the missing entries for the tensor completion problem. Since there are multiple definitions of the rank of a tensor, this completion problem has several variations.

The most common tensor completion problems [5,33] may be summarized as follows (we will define the different ranks subsequently, see further on in this section).

Definition 1

(Low-rank tensor completion (LRTC) [7]). Given a low-rank (CP rank, Tucker rank, or other ranks) tensor

T

and sampling pattern Ω, the low-rank completion of

T

is given by the solution of the following optimization problem:

\begin{matrix} \min_{X} {rank}_{*} (X) \\ subject to X_{Ω} = T_{Ω}, \end{matrix}

(1)

where

{rank}_{*}

denotes the specific tensor rank assumed at the beginning.

In the literature, there are many variants of LRTC but most of them are based on the following questions:

(1): What type of the rank should one use (see e.g., [34,35,36])?
(2): Are there any other restrictions based on the observations that one can assume (see e.g., [5,37,38])?
(3): Under what conditions can one expect to achieve a unique and exact completion (see e.g., [34])?

In the rest of this section, we instantiate some notations and review basic operations and definitions related to tensors. Then some tensor decomposition-based algorithms for tensor completion are stated. Finally, a formal problem statement under study will be presented.

2.2.1. Preliminaries and Notations

Tensors, matrices, vectors, and scalars are denoted in different typeface for clarity below. In the sequel, calligraphic boldface capital letters are used for tensors, capital letters are used for matrices, lower boldface letters for vectors, and regular letters for scalars. The set of the first d natural numbers is denoted by

[d] : = {1, \dots, d}

. Let

X \in R^{d_{1} \times \dots \times d_{n}}

and

α \in R

,

X^{(α)}

represents the element-wise power operator, i.e.,

{(X^{(α)})}_{i_{1} \dots i_{n}} = X_{i_{1} \dots i_{n}}^{α}

.

1_{Ω} \in R^{d_{1} \times \dots \times d_{n}}

denotes the tensor with 1 on

Ω

and 0 otherwise. We use

X ≻ 0

to denote the tensor with

X_{i_{1} \dots i_{n}} > 0

for all

i_{1}, \dots, i_{n}

. Moreover, we say that

Ω \sim W

if the entries of

X

are sampled randomly with the sampling set

Ω

such that

(i_{1}, \dots, i_{n}) \in Ω

with probability

W_{i_{1} \dots i_{n}}

. We include here some basic notions relating to tensors, and refer the reader to e.g., [2] for a more thorough survey.

Definition 2

(Tensor). A tensor is a multidimensional array. The dimension of a tensor is called the order (also called the mode). The space of real tensors of order n and size

d_{1} \times \dots \times d_{n}

is denoted as

R^{d_{1} \times \dots \times d_{n}}

. The elements of a tensor

X \in R^{d_{1} \times \dots \times d_{n}}

are denoted by

X_{i_{1} \dots i_{n}}

.

An n-order tensor

X

can be matricized in n ways by unfolding it along each of the n modes. The definition for the matricization of a given tensor is stated below.

Definition 3

(Matricization/unfolding of a tensor). The mode-k matricization/unfolding of tensor

X \in R^{d_{1} \times \dots \times d_{n}}

is the matrix, which is denoted as

X_{(k)} \in R^{d_{k} \times \prod_{j \neq k} d_{j}}

, whose columns are composed of all the vectors obtained from

X

by fixing all indices except for the k-th dimension. The mapping

X \mapsto X_{(k)}

is called the mode-k unfolding operator.

Example 1.

Let

X \in R^{3 \times 4 \times 2}

with the following frontal slices:

X_{1} = [\begin{matrix} 1 & 4 & 7 & 10 \\ 2 & 5 & 8 & 11 \\ 3 & 6 & 9 & 12 \end{matrix}] X_{2} = [\begin{matrix} 13 & 16 & 19 & 22 \\ 14 & 17 & 20 & 23 \\ 15 & 18 & 21 & 24 \end{matrix}],

then the three mode-n matricizations are

\begin{matrix} X_{(1)} & = & [\begin{matrix} 1 & 4 & 7 & 10 & 13 & 16 & 19 & 22 \\ 2 & 5 & 8 & 11 & 14 & 17 & 20 & 23 \\ 3 & 6 & 9 & 12 & 15 & 18 & 21 & 24 \end{matrix}], \\ X_{(2)} & = & [\begin{matrix} 1 & 2 & 3 & 13 & 14 & 15 \\ 4 & 5 & 6 & 16 & 17 & 18 \\ 7 & 8 & 9 & 19 & 20 & 21 \\ 10 & 11 & 12 & 22 & 23 & 24 \end{matrix}], \\ X_{(3)} & = & [\begin{matrix} 1 & 2 & 3 & \dots & 10 & 11 & 12 \\ 13 & 14 & 15 & \dots & 22 & 23 & 24 \end{matrix}] . \end{matrix}

Definition 4

(Folding operator). Suppose that

X

is a tensor. The mode-k folding operator of a matrix

M = X_{(k)}

, denoted as

{fold}_{k} (M)

, is the inverse operator of the unfolding operator.

Definition 5

(∞-norm). Given

X \in R^{d_{1} \times \dots \times d_{n}}

, the norm

{∥X∥}_{\infty}

is defined as

{∥X∥}_{\infty} = max_{i_{1}, \dots, i_{n}} | X_{i_{1} \dots i_{n}} | .

The unit ball under the ∞-norm is denoted by

B

_{\infty}

.

Definition 6

(Frobenius norm). The Frobenius norm for a tensor

X \in R^{d_{1} \times \dots \times d_{n}}

is defined as

{∥X∥}_{F} = \sqrt{\sum_{i_{1}, \dots, i_{n}} X_{i_{1} \dots i_{n}}^{2}} .

Definition 7

(Max-norm for matrix). Given

X \in R^{d_{1} \times d_{2}}

, the max-norm for X is defined as

{∥X∥}_{m a x} = min_{X = U V^{T}} {∥U∥}_{2, \infty} {∥V∥}_{2, \infty} .

Definition 8

(Product operations).

Outer product: Let $a_{1} \in R^{d_{1}}, \dots, a_{n} \in R^{d_{n}}$ . The outer product among these n vectors is a tensor $X \in R^{d_{1} \times \dots \times d_{n}}$ defined as:

$X = a_{1} \otimes \dots \otimes a_{n}, X_{i_{1}, \dots, i_{n}} = \prod_{k = 1}^{n} a_{k} (i_{k}) .$

The tensor $X \in R^{d_{1} \times \dots \times d_{n}}$ is of rank one if it can be written as the outer product of n vectors.
Kronecker product of matrices: The Kronecker product of $A \in R^{I \times J}$ and $B \in R^{K \times L}$ is denoted by $A \otimes B$ . The result is a matrix of size $(K I) \times (J L)$ defined by

$\begin{matrix} A \otimes B & = & [\begin{matrix} A_{11} B & A_{12} B & \dots & A_{1 J} B \\ A_{21} B & A_{22} B & \dots & A_{2 J} B \\ ⋮ & ⋮ & ⋱ & ⋮ \\ A_{I 1} B & A_{I 2} B & \dots & A_{I J} B \end{matrix}] . \end{matrix}$
Khatri-Rao product: Given matrices $A \in R^{d_{1} \times r}$ and $B \in R^{d_{2} \times r}$ , their Khatri-Rao product is denoted by $A ⊙ B$ . The result is a matrix of size $(d_{1} d_{2}) \times r$ defined by

$A ⊙ B = [\begin{matrix} a_{1} \otimes b_{1} & \dots & a_{r} \otimes b_{r} \end{matrix}],$

where $a_{i}$ and $b_{i}$ stand for the i-th column of A and B respectively.
Hadamard product: Given $X, Y \in R^{d_{1} \times \dots \times d_{n}}$ , their Hadamard product $X ⊡ Y \in R^{d_{1} \times \dots \times d_{n}}$ is defined by element-wise multiplication, i.e.,

${(X ⊡ Y)}_{i_{1} \dots i_{n}} = X_{i_{1} \dots i_{n}} Y_{i_{1} \dots i_{n}} .$
Mode-k product: Let $X \in R^{d_{1} \times \dots \times d_{n}}$ and $U \in R^{d_{k} \times J}$ , the multiplication between $X$ on its mode-k with U is denoted as $Y = X \times_{k} U$ with

$Y_{i_{1}, \dots, i_{k - 1}, j, i_{k + 1}, \dots, i_{n}} = \sum_{s = 1}^{d_{k}} X_{i_{1}, \dots, i_{k - 1}, s, i_{k + 1}, \dots, i_{n}} U_{s, j} .$

Definition 9

(Tensor (CP) rank [1,39]). The (CP) rank of a tensor

X

, denoted

rank (X)

, is defined as the smallest number of rank-1 tensors that generate

X

as their sum. We use

K_{r}

to denote the cone of rank-r tensors.

Given

^{k} M \in R^{d_{k} \times r}

, we use

〚^{1} M, \dots,^{n} M 〛

to denote the CP representation of tensor

X

, i.e.,

X = \sum_{j = 1}^{r} (^{1} M (:, j) \otimes \dots \otimes^{n} M (:, j)),

where

M (:, j)

means the j-th column of the matrix M.

Different from the case of matrices, the rank of a tensor is not presently well understood. Additionally, the task of computing the rank of a tensor is an NP-hard problem [40]. Next we introduce an alternative definition of the rank of a tensor, which is easy to compute.

Definition 10

(Tensor Tucker rank [39]). Let

X \in R^{d_{1} \times \dots \times d_{n}}

. The tuple

(r_{1}, \dots, r_{n}) \in N^{n}

is called the Tucker rank of the tensor

X

, where

r_{k} = rank (X_{(k)})

. We use

K_{r}

to denote the cone of tensors with Tucker rank

r

.

Tensor decompositions are powerful tools for extracting meaningful, latent structures in heterogeneous, multidimensional data (see e.g., [2]). In this paper, we focus on two most widely used decomposition methods: CP and HOSVD. For more comprehensive introduction, readers are referred to [2,41,42].

2.2.2. CP-Based Method for Tensor Completion

The CP decomposition was first proposed by Hitchcock [1] and further discussed in [43]. The formal definition of the CP decomposition is the following.

Definition 11

(CP decomposition). Given a tensor

X \in R^{d_{1} \times \dots \times d_{n}}

, its CP decomposition is an approximation of n loading matrices

A_{k} \in R^{d_{k} \times r}

,

k = 1, \dots, n

, such that

X \approx 〚 A_{1}, \dots, A_{n} 〛 = \sum_{i = 1}^{r} A_{1} (:, i) \otimes \dots \otimes A_{n} (:, i),

where r is a positive integer denoting an upper bound of the rank of

X

and

A_{k} (:, i)

is the i-th column of matrix

A_{k}

. If we unfold

X

along its k-th mode, we have

X_{(k)} \approx A_{k} {(A_{1} ⊙ \dots ⊙ A_{k - 1} ⊙ A_{k + 1} ⊙ \dots ⊙ A_{n})}^{T} .

Here the ≈ sign means that the algorithm should find an optimal

\hat{X}

with the given rank such that the distance between the low-rank approximation and the original tensor,

∥ X - \hat{X} ∥_{F}

, is minimized.

Given an observation set

Ω

, the main idea to implement tensor completion for a low-rank tensor

T

is to conduct imputation based on the equation

X = T_{Ω} + {\hat{X}}_{Ω^{c}},

where

\hat{X} = 〚 A_{1}, \dots, A_{n} 〛

is the interim low-rank approximation based on the CP decomposition,

X

is the recovered tensor used in next iteration for decomposition, and

Ω^{c} = \{(i_{1}, \dots, i_{n}) : 1 \leq i_{k} \leq d_{k}\} ∖ Ω

. For each iteration, we usually estimate the matrices

A_{k}

using the alternating least squares optimization method (see e.g., [44,45,46]).

2.2.3. HOSVD-Based Method for Tensor Completion

The Tucker decomposition was proposed by Tucker [47] and further developed in [48,49].

Definition 12

(Tucker decomposition). Given an n-order tensor

X

, its Tucker decomposition is defined as an approximation of a core tensor

C \in R^{r_{1} \times \dots \times r_{n}}

multiplied by n factor matrices

A_{k} \in R^{d_{k} \times r_{k}}

,

k = 1, \dots, n

along each mode, such that

X \approx C \times_{1} A_{1} \times_{2} \dots \times_{n} A_{n} = 〚 C; A_{1}, \dots, A_{n} 〛,

where

r_{k}

is a positive integer denoting an upper bound of the rank of the matrix

X_{(k)}

.

If we unfold

X

along its k-th mode, we have

X_{(k)} \approx A_{k} C_{(k)} {(A_{1} \otimes \dots \otimes A_{k - 1} \otimes A_{k + 1} \otimes \dots \otimes A_{n})}^{T}

Tucker decomposition is a widely used tool for tensor completion. To implement Tucker decomposition, one popular method is called the higher-order SVD (HOSVD) [47]. The main idea of HOSVD is:

Unfold $X$ along mode k to obtain matrix $X_{(k)}$ ;
Find the economic SVD decomposition of $X_{(k)} =^{k} U^{k} Σ^{k} V^{T}$ ;
Set $A_{k}$ to be the first $r_{k}$ columns of $^{k} U$ ;
$C = X \times_{1} A_{1}^{T} \times_{2} \dots \times_{n} A_{n}^{T}$ .

If we want to find a Tucker rank

r = [r_{1}, \dots, r_{n}]

approximation for the tensor

X

via HOSVD process, we just replace

A_{k}

by the first

r_{k}

columns of

U_{k}

.

2.2.4. Tensor Completion Problem under Study

In our setting, it is supposed that

T

is an unknown tensor in

K_{r} \cap β B_{\infty}

or

K_{r} \cap β B_{\infty}

. Fix a sampling pattern

Ω \subseteq [d_{1}] \times \dots \times [d_{n}]

and the weight tensor

W

. Our goal is to design an algorithm that gives provable guarantees for a worst-case

T

, even if it is adapted to

Ω

.

In our algorithm, the observed data are

T_{Ω} + Z_{Ω} = 1_{Ω} ⊡ (T + Z)

, where

Z_{i_{1} \dots i_{n}} \sim N (0, σ^{2})

are i.i.d. Gaussian random variables. From the observations, the goal is to learn something about

T

. In this paper, instead of measuring our recovered results with the underlying true tensor in a standard Frobenius norm

∥ T - \hat{T} ∥_{F}

, we are interested in learning

T

using a weighted Frobenius norm, i.e., to develop an efficient algorithm to find

\hat{T}

so that

{∥W^{(1 / 2)} ⊡ (T - \hat{T})∥}_{F}

is as small as possible for some weight tensor

W

. When measuring the weighted error, it is important to normalize appropriately to understand the meaning of the error bounds. In our results, we always normalize the error bounds by

{∥W^{(1 / 2)}∥}_{F}

. It is noteworthy that

\begin{matrix} \frac{{∥W^{(1 / 2)} ⊡ (T - \hat{T})∥}_{F}}{{∥W^{(1 / 2)}∥}_{F}} & = & {(\sum_{i_{1}, \dots, i_{n}} \frac{W_{i_{1} \dots i_{n}}}{\sum_{i_{1}, \dots, i_{n}} W_{i_{1}, \dots, i_{n}}} {(T_{i_{1} \dots i_{n}} - {\hat{T}}_{i_{1} \dots i_{n}})}^{2})}^{1 / 2}, \end{matrix}

which gives a weighted average of the per entry squared error. Generally, our problem can be formally stated below.

Problem: Weighted Universal Tensor Completion
Parameters:

Dimensions $d_{1}, \dots, d_{n}$ ;
A sampling pattern $Ω \subseteq [d_{1}] \times \dots \times [d_{n}]$ ;
Parameters $σ, β > 0$ , r or $r = [r_{1} \dots r_{n}]$ ;
A rank-1 weight tensor $W \in R^{d_{1} \times \dots \times d_{n}}$ so that $W_{i_{1} \dots i_{n}} > 0$ for all $i_{1}, \dots, i_{n}$ ;
A set K (e.g., $K_{r} \cap β B_{\infty}$ or $K_{r} \cap β B_{\infty}$ ).

Goal: Design an efficient algorithm

A

with the following guarantees:

$A$ takes as input entries $T_{Ω} + Z_{Ω}$ so that $Z_{i_{1} \dots i_{n}} \sim N (0, σ^{2})$ are i.i.d.;
$A$ runs in polynomial time;
With high probability over the choice of $Z$ , $A$ returns an estimate $\hat{T}$ of $T$ so that

$\frac{{∥W^{(1 / 2)} ⊡ (T - \hat{T})∥}_{F}}{{∥W^{(1 / 2)}∥}_{F}} \leq δ$

for all $T \in K$ , where $δ$ depends on the problem parameters.

Remark 1

(Strictly positive

W

). The requirement that

W_{i_{1} \dots i_{n}}

is strictly greater than zero is a generic condition. In fact, if

W_{i_{1} \dots i_{n}} = 0

for some

(i_{1}, \dots, i_{n})

, some mode k with index

i_{k}

of

W

is zero, then we can reduce the problem to a smaller one by ignoring that mode k with index

i_{k}

.

3. Main Results

In this section, we state informal versions of our main results. With fixed sampling pattern

Ω

and weight tensor

W

, we can find

\hat{T}

by solving the following optimization problem:

\hat{T} = W^{(- 1 / 2)} ⊡ \underset{rank (X) = r}{argmin} {∥X - W^{(- 1 / 2)} ⊡ Y_{Ω}∥}_{F},

(2)

or

\hat{T} = W^{(- 1 / 2)} ⊡ \underset{Tucker - rank (X) = r}{argmin} {∥X - W^{(- 1 / 2)} ⊡ Y_{Ω}∥}_{F},

(3)

where

Y_{Ω} \in R^{d_{1} \times \dots \times d_{n}}

with

Y_{Ω} (i_{1}, \dots, i_{n}) = \{\begin{matrix} T_{i_{1} \dots i_{n}} + Z_{i_{1} \dots i_{n}} & if (i_{1}, \dots, i_{n}) \in Ω \\ 0 & if (i_{1}, \dots, i_{n}) \notin Ω \end{matrix} .

It is known that solving (2) is NP-hard [40]. However, there are some polynomial time algorithms to find approximate solutions for (2) such that the approximation is (empirically) close to the actual solution of (2) in terms of the Frobenius norm. In our numerical experiments, we solve (2) via the CP-ALS algorithm [43]. To solve (3), we use the HOSVD process [48]. Assume that

T

has Tucker rank

r = [r_{1}, \dots, r_{n}]

. Let

{\hat{A}}_{i} = \underset{rank (A) = r_{i}}{argmin} {∥A - {(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(i)}∥}_{2}

and set

{\hat{U}}_{i}

to be the left singular vector matrix of

{\hat{A}}_{i}

. Then the estimated tensor is of the form

\hat{T} = W^{(- 1 / 2)} ⊡ ((W^{(- 1 / 2)} ⊡ Y_{Ω}) \times_{1} {\hat{U}}_{1} {\hat{U}}_{1}^{T} \times_{2} \dots \times_{n} {\hat{U}}_{n} {\hat{U}}_{n}^{T} .

In the following, we call this the weighted HOSVD algorithm.

3.1. General Upper Bound

Suppose that the optimal solution

\hat{T}

for (2) or (3)

\hat{T}

can be found, we would like to give an upper bound estimations for

∥ W^{(1 / 2)} ⊡ (T - \hat{T}) ∥_{F}

with some proper weight tensor

W

.

Theorem 1.

Let

W = w_{1} \otimes \dots \otimes w_{n} \in R^{d_{1} \times \dots \times d_{n}}

have strictly positive entries, and fix

Ω \subseteq [d_{1}] \times \dots \times [d_{n}]

. Suppose that

T \in R^{d_{1} \times \dots \times d_{n}}

has rank r for problem (2) or Tucker rank

r = [r_{1}, \dots, r_{n}]

for problem (3), and let

\hat{T}

be the optimal solutions for (2) or (3). Suppose that

Z_{i_{1} \dots i_{n}} \sim N (0, σ^{2})

. Then with probability at least

1 - 2^{- | Ω | / 2}

over the choice of

Z

,

{∥W^{(1 / 2)} ⊡ (T - \hat{T})∥}_{F} \leq 2 {∥T∥}_{\infty} {∥W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω}∥}_{F} + 4 σ μ \sqrt{| Ω | log (2)},

Recall here,

{(W^{(1 / 2)})}_{i_{1} \dots i_{n}} = W_{i_{1} \dots i_{n}}^{(1 / 2)}

and

{(W^{(- 1 / 2)})}_{i_{1} \dots i_{n}} = W_{i_{1} \dots i_{n}}^{(- 1 / 2)}

as defined in Section 2.2.1 and

μ^{2} = {max}_{(i_{1}, \dots, i_{n}) \in Ω} \frac{1}{W_{i_{1} \dots i_{n}}}

.

Notice that the upper bound in Theorem 1 is for the optimal output

\hat{T}

for problems (2) and (3), which is general. However, the upper bound in Theorem 1 contains no rank information of the underlying tensor

T

. To introduce the rank information of the underlying tensor

T

, we restrict our analysis for Problem (3) by considering the HOSVD process in the sequel.

3.2. Results for Weighted HOSVD Algorithm

In this section, we begin by giving a general upper bound for the weighted HOSVD algorithm.

3.2.1. General Upper Bound for Weighted HOSVD

Theorem 2

(Informal, see Theorem A1). Let

W = w_{1} \otimes \dots \otimes w_{n} \in R^{d_{1} \times \dots \times d_{n}}

have strictly positive entries, and fix

Ω \subseteq [d_{1}] \times \dots \times [d_{n}]

. Suppose that

T \in R^{d_{1} \times \dots \times d_{n}}

has Tucker rank

r = [r_{1}, \dots, r_{n}]

. Suppose that

Z_{i_{1} \dots i_{n}} \sim N (0, σ^{2})

and let

\hat{T}

be the estimate of the solution of (3) via the HOSVD process. Then

\begin{matrix} {∥W^{(1 / 2)} ⊡ (T - \hat{T})∥}_{F} ≲ (\sum_{k = 1}^{n} \sqrt{r_{k} log (d_{k} + \prod_{j \neq k} d_{j})} μ_{k}) σ \\ + (\sum_{k = 1}^{n} r_{k} {∥{(W^{(- 1 / 2)} ⊡ 1_{Ω} - W^{(1 / 2)})}_{(k)}∥}_{2}) {∥T∥}_{\infty}, \end{matrix}

with high probability over the choice of

Z

, where

μ_{k}^{2} = max \{max_{i_{k}} (\sum_{i_{1}, \dots, i_{k - 1}, i_{k + 1}, \dots, i_{n}} \frac{1_{(i_{1}, i_{2}, \dots, i_{n}) \in Ω}}{W_{i_{1} i_{2} \dots i_{n}}}), max_{i_{1}, \dots, i_{k - 1}, i_{k + 1}, \dots, i_{n}} (\sum_{i_{k}} \frac{1_{(i_{1}, i_{2}, \dots, i_{n}) \in Ω}}{W_{i_{1} i_{2} \dots i_{n}}})\} .

and

a ≲ b

means that

a \leq c b

for some universal constant

c > 0

.

Remark 2.

The upper bound in [19] suggests

∥ W^{(1 / 2)} ⊡ (M - \hat{M}) ∥_{F} \leq 2 \sqrt{2} r λ {∥ M ∥}_{\infty}

+ 4 \sqrt{2} σ μ_{1} \sqrt{r log (d_{1} + d_{2})}

, where

λ = ∥ W^{(1 / 2)} - W^{(- 1 / 2)} \circ 1_{Ω} ∥

and

μ_{1}^{2} = m a x_{(i, j) \in Ω} \frac{1}{W_{i j}}

, where

\hat{M}

is obtained by considering the truncated SVD decompositions. Notice that in our result, when

n = 2

, the upper bound becomes

2 \sqrt{r log (d_{1} + d_{2})} μ σ + 2 r ∥ W^{(1 / 2)} - W^{(- 1 / 2)} \circ 1_{Ω} {∥ ∥ M ∥}_{\infty}

with

μ^{2} = max {∥ 1_{Ω} \circ W^{(- 1)} ∥_{\infty}, ∥ 1_{Ω} \circ W^{(- 1)} ∥_{1}}

. Since

μ

in our work is much bigger than the

μ_{1}

in [19], the bound in our work is weaker than the one in [19]. The reason is that in order to obtain a general bound for all tensor, the fact that the optimal approximations

\hat{M}

for a given matrix in the spectral norm and Frobenious norm are the same cannot be applied.

3.2.2. Case Study: When $Ω \sim W$

To understand the bounds mentioned above, we also study the case when

Ω \sim W

such that

∥ {(W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω})}_{(k)} ∥_{2}

is small for

k = 1, \dots, n

. Even though the samples are taken randomly in this case, our goal is to understand our upper bounds for deterministic sampling pattern

Ω

. To make sure that

∥ {(W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω})}_{(k)} ∥_{2}

is small, we need to assume that each entry of

W

is not too small. For this case, we have the following main results.

Theorem 3

(Informal, see Theorems A2 and A7). Let

W = w_{1} \otimes \dots \otimes w_{n} \in R^{d_{1} \times \dots \times d_{n}}

be a CP rank-1 tensor so that for all

(i_{1}, \dots, i_{n}) \in [d_{1}] \times \dots \times [d_{n}]

we have

W_{i_{1} \dots i_{n}} \in [\frac{1}{\sqrt{d_{1} \dots d_{n}}}, 1]

. Suppose that

Ω \sim W

.

Upper bound: Then the following holds with high probability.
For our weighted HOSVD algorithm $A$ , for any Tucker $rank- r$ tensor $T$ with ${∥T∥}_{\infty} \leq β$ , $A$ returns $\hat{T} = A (T_{Ω} + Z_{Ω})$ so that with high probability over the choice of $Z$ ,

$\begin{matrix} \frac{{∥W^{(1 / 2)} ⊡ (T - \hat{T})∥}_{F}}{{∥W^{(1 / 2)}∥}_{F}} & ≲ \frac{1}{\sqrt{| Ω |}} (β n^{2} r d^{\frac{n - 1}{2}} log (d) + σ n^{2} r^{1 / 2} d^{\frac{n - 1}{2}}), \end{matrix}$

where $r = {max}_{k} {r_{k}}$ and $d = {max}_{k} {d_{k}}$ .
Lower bound: If additionally, $W$ is flat (the entries of $W$ are close), then for our weighted HOSVD algorithm $A$ , there exists some $T \in K_{r} \cap β B_{\infty}$ so that with probability at least $\frac{1}{2}$ over the choice of $Z$ ,

$\begin{matrix} \frac{{∥W^{(1 / 2)} ⊡ (A (T_{Ω} + Z_{Ω}) - T)∥}_{F}}{{∥W^{(1 / 2)}∥}_{F}} \\ ≳ & min \{\frac{σ}{\sqrt{| Ω |}} {(\frac{\tilde{r} \tilde{d}}{\tilde{d} + 2 C^{' 2} \tilde{r}})}^{\frac{n}{2}}, \frac{σ}{\sqrt{| Ω |}} {(\frac{\tilde{r} \tilde{d}}{{(\sqrt{\tilde{d}} + \sqrt{2 \tilde{r} log (\tilde{r})} C^{'})}^{2}})}^{\frac{n}{2}}, \frac{β}{\sqrt{n log (\tilde{d})}}\}, \end{matrix}$

where $\tilde{r} = {min}_{k} {r_{k}}$ , $\tilde{d} = {min}_{k} {d_{k}}$ , and $C^{'}$ is some constant to measure the “flatness" of $W$ .

Remark 3.

The formal statements in Theorems A2 and A7 are more general than the statements in Theorem 3.

4. Experiments

4.1. Simulations for Uniform Sampling Pattern

In this section, we test the performance of our weighted HOSVD algorithm when the sampling pattern arises from uniform random sampling. Consider a tensor

T

of the form

T = C \times_{1} U_{1} \times_{2} \dots \times_{n} U_{n}

, where

U_{i} \in R^{d_{i} \times r_{i}}

and

C \in R^{r_{1} \times \dots \times r_{n}}

. Let

Z

be a Gaussian random tensor with

Z_{i_{1} \dots i_{n}} \in N (0, σ)

and

Ω

be the sampling pattern set according to uniform sampling. In this simulation, we compare the results of numerical experiments for using the HOSVD algorithm to solve

\hat{T} = \underset{Tucker_rank (X) = r}{argmin} {∥X - Y_{Ω}∥}_{F},

(4)

\hat{T} = \underset{Tucker_rank (X) = r}{argmin} {∥X - \frac{1}{p} Y_{Ω}∥}_{F},

(5)

and

\hat{T} = W^{(- 1 / 2)} ⊡ \underset{Tucker_rank (X) = r}{argmin} {∥X - W^{(- 1 / 2)} ⊡ Y_{Ω}∥}_{F},

(6)

where

p = \frac{| Ω |}{\prod_{k = 1}^{n} d_{k}}

and

Y_{Ω} = T_{Ω} + Z_{Ω}

.

First, we generate a synthetic sampling set

Ω

with sampling rate SR

: = \frac{| Ω |}{\prod_{k = 1}^{n} d_{k}} = 30 %

and find a weight tensor

W

by solving

W = \underset{X ≻ 0, rank (X) = 1}{argmin} {∥ X - 1_{Ω} ∥}_{F}

(7)

via the alternating least squares method for the non-negative CP decomposition. Next, we generate synthetic tensors

T \in R^{d_{1} \times \dots \times d_{n}}

of the form

C \times_{1} U_{1} \times_{2} \dots \times_{n} U_{n}

with

n = 3, 4

with

rank (T_{(i)}) = r

, where

i = 1, \dots, n

, and r varies from 2 to 10. Then we add mean zero Gaussion random noise

Z

with variance

σ = 10^{- 2}

so that a new tensor is generated, which is denoted by

Y = T + Z

. Then we solve the tensor completion problems (4), (5) and (6) by the HOSVD procedure. For each fixed low-rank tensor, we average over 20 tests. We measure error using the weighted Frobenius norm. The simulation results are reported in Figure 1 and Figure 2. Figure 1 shows the results for the tensor of size

100 \times 100 \times 100

and Figure 2 shows the results for the tensor of size

50 \times 50 \times 30 \times 30

, where the weighted error is of the form

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥}

. These figures demonstrate that using our weighted samples performs more efficiently than using the original samples. For the uniform sampling case, the p weighted samples and

W

weighted samples exhibit similar performance.

4.2. Simulation for Non-Uniform Sampling Pattern

To generate a non-uniform sampling pattern with sampling rate

30 %

, we first generate a CP rank 1 tensor of the form

H = 〚 1; h_{1}, \dots, h_{n} 〛

, where

h_{i} = (u_{i} 1_{⌈ d_{i} / 2 ⌉}, v_{i} 1_{⌊ d_{i} / 2 ⌋})

0 < u_{i}, v_{i} \leq 1

. Let

Ω \sim H

. Then we repeat the process as in Section 4.1. The simulation results are shown in Figure 3 and Figure 4. As shown in figures, the results using our proposed weighted samples perform more efficiently than using the p weighted samples.

Remark 4.

When we use the HOSVD procedure to solve (4), (5), and (6), we need (an estimate of) the Tucker rank as input. Instead of inputting the real rank of the true tensor, we could also use the rank that is estimated by considering the decay of the singular values for the unfolded matrices of the sampled tensor

Y_{Ω}

as the input rank, which we call SV-rank. The simulation results for the non-uniform sampling pattern with SV-rank as input are reported in Figure 5. The simulation shows that the weighted HOSVD algorithm performs more efficiently than using the p weighted samples or the original samples. Comparing Figure 5 with Figure 3, we could observe that using the estimated rank as input for HOSVD procedure performs even better than using the real rank as input. This observation motivates a way to find a “good" rank as input for HOSVD procedure.

Remark 5.

We only provide guarantees on the performance in the weighted Frobenius norm, (as we report the weighted error

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}}

), our procedures exhibit good empirical performance even in the usual relative error

\frac{∥ \hat{T} {- T ∥}_{F}}{{∥ T ∥}_{F}}

when the Tucker rank of the tensor is relatively low. However, we observe that the advantages of weighted HOSVD scheme tend to be diminished in terms of relative error when the Tucker rank increases. This result is not surprising since the entries are treated unequally in scheme (6). Therefore we leave the investigation on relative error and the tensor rank for future work.

4.3. Test for Real Data

In this section, we test our weighted HOSVD algorithm for tensor completion on three videos, see [50]. The dataset is the tennis-serve data from an Olympic Sports Dataset [51]. One can download the dataset from http://vision.stanford.edu/Datasets (accessed date 10 May 2021). There are a lot of videos in the zip file and we only choose three of them: “d2P_zx_JeoQ_00120_00515.seq” (video 1), “gs3sPDfbeg4_00082_00229.seq”(video 2), and “VADoc-AsyXk_00061_ 0019.seq” (video 3). The three videos are color video. In our simulation, we use the same setup as the one in [50], and choose 30 frames evenly from each video. For each frame, the size is scaled to

360 \times 480 \times 3

, so each video is transformed into a 4-D tensor data of size

360 \times 480 \times 3 \times 30

. The first frame of each video after preprocessing is illustrated in Figure 6.

We implement the experiments for different sampling rates of

10 %

,

30 %

,

50 %

, and

80 %

to generate uniform and non-uniform sampling patterns

Ω

. In our implementation, we use the SV-rank of

T_{Ω}

as the input rank. According to the generated sampling pattern, we find a weight tensor

W

and find estimates

{\hat{T}}_{1}

and

{\hat{T}}_{2}

by considering (4) and (6) respectively, using the input Tucker rank

r

. The entries on

T_{1}

and

T_{2}

are forced to be the observed data. The Signal to Noise Ratio (SNR)

S N R (\hat{T}) = - 20 {log}_{10} (\frac{∥ \hat{T} {- T ∥}_{F}}{{∥ T ∥}_{F}})

are computed and the simulation results are reported in Table 1 and Table 2. As shown in the tables, applying HOSVD process to (6) can give a better result than applying HOSVD process to (4) directly regardless of the uniformity of the sampling pattern.

Finally, we test the proposed weighted HOSVD algorithm on real candle video data named “candle_4_A” [52] (The dataset can be downloaded from the Dynamic Texture Toolbox in http://0-www-vision-jhu-edu.brum.beds.ac.uk/code/ (accessed date 10 May 2021). We have tested the relation between the relative errors and the sampling rates using

r = (5, 5, 5)

as the input rank for HOSVD algorithm. The relative errors are presented in Figure 7. The simulation results also show that the proposed weighted HOSVD algorithm can implement tensor completion efficiently.

4.4. The Application of Weighted HOSVD on Total Variation Minimization

As shown in the previous simulations, the weighted HOSVD decomposition can provide better results for tensor completion by comparing with HOSVD. There are a bunch of algorithms that are Sensitive to initialization. Additionally, real applications may have higher requirements for accuracy. Therefore, it is meaningful to combine our weighted HOSVD with other algorithms in order to further improve the performance. In this section, we would consider the application of weighted HOSVD decomposition on the total variation minimization algorithm. As a traditional approach, the total variation minimization (TVM), is broadly applied in studies about image recovery and denoising. While the earliest research could trace back to 1992 [53]. The later studies combined TVM and other low rank approximation algorithms such as Nuclear Norm Minimization (see e.g., [54,55,56]) and HOSVD (e.g., [57,58,59]) in order to achieve better performance in image and video completion tasks.

Motivated by the matrix TV minimization, we proposed the tensor TV minimization which is summarized in Algorithm 1. In Algorithm 1, the Laplacian operator computes the divergence of all-dimension gradients for each entry of the tensor. The shrink operator simply moves the input towards 0 with distance

λ

, or formally defined as:

shrink (x, λ) = sign (x) \cdot max (| x | - λ, 0)

For the initialization of

X^{0}

in Algorithm 1, we assign

X^{0}

to be the output of the result from HOSVD-w.

Applying the same experiment setting as in Section 4.3, we evaluate the performance of the cocktail approach as well as the regular HOSVD approach. We report the simulation results in Table 1 and we measure the performances by considering the signal to noise ratio(SNR). As shown in Table 1, the total variation minimization could be applied to further improve the result of (6). Specifically, the TVM with

0

as initialization performs similar to TVM with HOSVD-w as initialization when the observed rate is high, but the HOSVD-w initialization could improve the performance of TVM when the observed rate is very low (e.g., 10%). Additionally, we compared the decay of relative error for using the weighted HOSVD output as initialization and the default initialization (

X^{0} = 0

). The iterative results are shown in Figure 8, and it shows that using the result from weighted HOSVD as an initialization could notably reduce the iterations of TV-minimization for achieving the convergence threshold (

∥ X^{k} - X^{k - 1} ∥_{F} < 10^{- 4}

).

Algorithm 1: TV Minimization for Tensor.

5. Conclusions

In this paper, we propose a simple but efficient algorithm named the weighted HOSVD algorithm for recovering an underlying low-rank tensor from noisy observations. For this algorithm, we provide upper and lower error bounds that measure the difference between the estimates and the true underlying low-rank tensor. The efficiency of our proposed weighted HOSVD algorithm is also shown by numerical simulations. Additionally, the result of our weighted HOSVD algorithm can be used as an initialization for the total variation minimization algorithm, which shows that using our method as an initialization for the total variation minimization algorithm can increasingly reduce the iterative steps leading to improved overall performance in reconstruction (see our conference paper [60]). It would be interesting for future work to combine the weighted HOSVD algorithm with other algorithms to achieve more accurate results for tensor completion in many settings.

Author Contributions

Conceptualization, L.H. and D.N.; Formal analysis, L.H.; Funding acquisition, D.N.; Methodology, L.H.; Project administration, Z.C., L.H. and D.N.; Supervision, L.H. and D.N.; Validation, Z.C., L.H. and D.N.; Visualization, Z.C. and L.H.; Writing—original draft, Z.C. and L.H.; Writing—review & editing, Z.C., L.H. and D.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by NSF DMS

# 2011140

and NSF BIGDATA

# 1740325

.

Data Availability Statement

In this work, the following the pre-existing reference databases have been used for our evaluations: tennis-seve dataset ([50,51]) and candel video data ([52]).They are publicly available at: http://vision.stanford.edu/Datasets/OlympicSports/ (accessed date 10 May 2021); http://0-www-vision-jhu-edu.brum.beds.ac.uk/code/ (accessed date 10 May 2021).

Acknowledgments

The authors take pleasure in thanking Hanqin Cai, Keaton Hamm, Armenak Petrosyan, Bin Sun, and Tao Wang for comments and suggestions on the manuscript.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Proof for Theorem 1

In this appendix, we provide the proof for Theorem 1.

Proof of Theorem

Let

Y_{Ω} = T_{Ω} + Z_{Ω}

.

\begin{matrix} {∥W^{(1 / 2)} ⊡ (T - \hat{T})∥}_{F} \\ = & {∥W^{(1 / 2)} ⊡ T - W^{(- 1 / 2)} ⊡ Y_{Ω} + W^{(- 1 / 2)} ⊡ Y_{Ω} - W^{(1 / 2)} ⊡ \hat{T}∥}_{F} \\ \leq & {∥W^{(1 / 2)} ⊡ T - W^{(- 1 / 2)} ⊡ Y_{Ω}∥}_{F} + {∥W^{(- 1 / 2)} ⊡ Y_{Ω} - W^{(1 / 2)} ⊡ \hat{T}∥}_{F} \\ \leq & 2 {∥W^{(1 / 2)} ⊡ T - W^{(- 1 / 2)} ⊡ Y_{Ω}∥}_{F} \\ = & 2 {∥W^{(1 / 2)} ⊡ T - W^{(- 1 / 2)} ⊡ (T_{Ω} + Z_{Ω})∥}_{F} \\ \leq & 2 {∥W^{(1 / 2)} ⊡ T - W^{(- 1 / 2)} ⊡ 1_{Ω} ⊡ T∥}_{F} + 2 {∥W^{(- 1 / 2)} ⊡ Z_{Ω}∥}_{F} \\ \leq & 2 {∥T ⊡ (W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω})∥}_{F} + 2 {∥W^{(- 1 / 2)} ⊡ Z_{Ω}∥}_{F} \\ \leq & 2 {∥T∥}_{\infty} {∥W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω}∥}_{F} + 2 {∥W^{(- 1 / 2)} ⊡ Z_{Ω}∥}_{F} . \end{matrix}

Thus, we have that

{∥W^{(1 / 2)} ⊡ (T - \hat{T})∥}_{F} \leq 2 {∥T∥}_{\infty} {∥W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω}∥}_{F} + 2 {∥W^{(- 1 / 2)} ⊡ Z_{Ω}∥}_{F} .

(A1)

Next, let’s estimate

{∥W^{(- 1 / 2)} ⊡ Z_{Ω}∥}_{F}

. Notice that

{∥W^{- (1 / 2)} ⊡ Z_{Ω}∥}_{F}^{2} = \sum_{(i_{1}, \dots, i_{n}) \in Ω} \frac{Z_{i_{1} \dots i_{n}}^{2}}{W_{i_{1} \dots i_{n}}}

\begin{matrix} P \{{∥W^{(- 1 / 2)} ⊡ Z_{Ω}∥}_{F} \geq t\} & = & P \{e^{s {∥W^{(- 1 / 2)} ⊡ Z_{Ω}∥}_{F}^{2}} \geq e^{s t^{2}}\} \\ \leq & e^{- s t^{2}} E (exp (s {∥W^{(- 1 / 2)} ⊡ Z_{Ω}∥}_{F}^{2})) \\ \leq & e^{- s t^{2}} \prod_{(i_{1}, \dots, i_{n}) \in Ω} E (exp (\frac{s Z_{i_{1} \dots i_{n}}^{2}}{W_{i_{1} \dots i_{n}}})) \\ = & e^{- s t^{2}} \prod_{(i_{1}, \dots, i_{n}) \in Ω} (\frac{1}{\sqrt{1 - 2 σ^{2} s / W_{i_{1} \dots i_{n}}}}) \end{matrix}

Recall that

μ^{2} = {max}_{(i_{1}, \dots, i_{n}) \in Ω} \frac{1}{W_{i_{1}, \dots, i_{n}}}

. By choosing

s = \frac{1}{4 σ^{2} μ^{2}}

, we have that

\begin{matrix} P \{{∥W^{- (1 / 2)} \circ Z_{Ω}∥}_{F} \geq t\} \leq exp (- \frac{t^{2}}{4 σ^{2} μ^{2}}) 2^{| Ω | / 2} . \end{matrix}

We conclude that with probability at least

1 - 2^{- | Ω | / 2}

,

{∥W^{(- 1 / 2)} \circ Z_{Ω}∥}_{F} \leq 2 σ μ \sqrt{| Ω | log (2)} .

Plugging this into (A1) proves the theorem. □

Appendix B. Proof of Theorems 2 and 3

In this appendix, we provide the proofs for the results related with the weighted HOSVD algorithm. The general upper bound for weighted HOSVD in Theorem 2 is restated in Appendix B.1 and its proof is also presented there. If the sampling pattern

Ω

is generated according to the weight tensor

W

, the related results in Theorem 3 are illustrated in Appendix B.2.

Appendix B.1. General Upper Bound for Weighted HOSVD Algorithm

Theorem A1.

Let

W = w_{1} \otimes \dots \otimes w_{n} \in R^{d_{1} \times \dots \times d_{n}}

have strictly positive entries, and fix

Ω \subseteq [d_{1}] \times \dots \times [d_{n}]

. Suppose that

T \in R^{d_{1} \times \dots \times d_{n}}

has Tucker rank

r = [r_{1} \dots r_{n}]

. Suppose that

Z_{i_{1} \dots i_{n}} \sim N (0, σ^{2})

and let

\hat{T} = W^{(- 1 / 2)} ⊡ ((W^{(- 1 / 2)} ⊡ Y_{Ω}) \times_{1} {\hat{U}}_{1} {\hat{U}}_{1}^{T} \times_{2} \dots \times_{n} {\hat{U}}_{n} {\hat{U}}_{n}^{T})

where

{\hat{U}}_{1}, \dots, {\hat{U}}_{n}

are obtained by HOSVD approximation process, where

Y_{Ω} = 1_{Ω} ⊡ (T + Z)

. Then with probability at least

1 - \sum_{i = 1}^{n} \frac{1}{d_{i} + \prod_{j \neq i} d_{j}}

over the choice of

Z

,

\begin{matrix} {∥W^{(1 / 2)} ⊡ (T - \hat{T})∥}_{F} \\ \leq & (\sum_{k = 1}^{n} 6 \sqrt{r_{k} log (d_{k} + \prod_{j \neq k} d_{j})} μ_{k}) σ + (\sum_{k = 1}^{n} 3 r_{k} {∥{(W^{(- 1 / 2)} ⊡ 1_{Ω} - W^{(1 / 2)})}_{(k)}∥}_{2}) {∥T∥}_{\infty} . \end{matrix}

where

μ_{k}^{2} = max \{max_{i_{k}} (\sum_{i_{1}, \dots, i_{k - 1}, i_{k + 1}, \dots, i_{n}} \frac{1_{(i_{1}, i_{2}, \dots, i_{n}) \in Ω}}{W_{i_{1} i_{2} \dots i_{n}}}), max_{i_{1}, \dots, i_{k - 1}, i_{k + 1}, \dots, i_{n}} (\sum_{i_{k}} \frac{1_{(i_{1}, i_{2}, \dots, i_{n}) \in Ω}}{W_{i_{1} i_{2} \dots i_{n}}})\} .

Proof.

Recall that

T_{Ω} = 1_{Ω} ⊡ T

and

Z_{Ω} = 1_{Ω} ⊡ Z

. First we have the following estimations.

\begin{matrix} {∥W^{(1 / 2)} ⊡ (\hat{T} - T)∥}_{F} \\ = & {∥(W^{(- 1 / 2)} ⊡ Y_{Ω}) \times_{1} {\hat{U}}_{1} {\hat{U}}_{1}^{T} \times_{2} \dots \times_{n} {\hat{U}}_{n} {\hat{U}}_{n}^{T} - (W^{(1 / 2)} ⊡ T) \times_{1} U_{1} U_{1}^{T} \times_{2} \dots \times_{n} U_{n} U_{n}^{T}∥}_{F} \\ \leq & {∥((W^{(- 1 / 2)} ⊡ Y_{Ω}) \times_{1} {\hat{U}}_{1} {\hat{U}}_{1}^{T} - (W^{(1 / 2)} ⊡ T) \times_{1} U_{1} U_{1}^{T}) \times_{2} {\hat{U}}_{2} {\hat{U}}_{2}^{T} \times_{3} \dots \times_{n} {\hat{U}}_{n} {\hat{U}}_{n}^{T}∥}_{F} \\ + {∥(W^{(1 / 2)} ⊡ T) (\times_{2} U_{2} U_{2}^{T} \times_{3} \dots \times_{n} U_{n} U_{n}^{T} - \times_{2} {\hat{U}}_{2} {\hat{U}}_{2}^{T} \times_{3} \dots \times_{n} {\hat{U}}_{n} {\hat{U}}_{n}^{T})∥}_{F} \\ \leq & \sqrt{2 r_{1}} {∥{\hat{U}}_{1} {\hat{U}}_{1}^{T} {(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(1)} - U_{1} U_{1}^{T} {(W^{(1 / 2)} ⊡ T)}_{(1)}∥}_{2} + \sum_{k = 2}^{n} ∥(W^{(1 / 2)} ⊡ T) \\ {\times_{2} {\hat{U}}_{2} {\hat{U}}_{2}^{T} \times_{3} \dots \times_{k - 1} {\hat{U}}_{k - 1} {\hat{U}}_{k - 1}^{T} \times_{k} (U_{k} U_{k}^{T} - {\hat{U}}_{k} {\hat{U}}_{k}^{T}) \times_{k + 1} U_{k + 1} U_{k + 1}^{T} \times_{k + 2} \dots \times_{n} U_{n} U_{n}^{T}∥}_{F} \\ \leq & \sqrt{2 r_{1}} {∥{\hat{U}}_{1} {\hat{U}}_{1}^{T} {(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(1)} - {(W^{(1 / 2)} ⊡ T)}_{(1)}∥}_{2} + \sum_{k = 2}^{n} \sqrt{r_{k}} {∥(U_{k} U_{k}^{T} - {\hat{U}}_{k} {\hat{U}}_{k}^{T}) {(W^{(1 / 2)} ⊡ T)}_{(k)}∥}_{2} \\ \leq & \sqrt{2 r_{1}} ({∥{\hat{U}}_{1} {\hat{U}}_{1}^{T} {(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(1)} - {(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(1)}∥}_{2} + {∥{(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(1)} - {(W^{(1 / 2)} ⊡ T)}_{(1)}∥}_{2}) \\ + \sum_{k = 2}^{n} \sqrt{r_{k}} {∥(U_{k} U_{k}^{T} - {\hat{U}}_{k} {\hat{U}}_{k}^{T}) {(W^{(1 / 2)} ⊡ T)}_{(k)}∥}_{2} \\ \leq & 2 \sqrt{2 r_{1}} {∥{(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(1)} - {(W^{(1 / 2)} ⊡ T)}_{(1)}∥}_{2} + \sum_{k = 2}^{n} \sqrt{r_{k}} {∥(U_{k} U_{k}^{T} - {\hat{U}}_{k} {\hat{U}}_{k}^{T}) {(W^{(1 / 2)} ⊡ T)}_{(k)}∥}_{2} . \end{matrix}

Notice that

\begin{matrix} {∥(U_{k} U_{k}^{T} - {\hat{U}}_{k} {\hat{U}}_{k}^{T}) {(W^{(1 / 2)} ⊡ T)}_{(k)}∥}_{2} \\ = & {∥{(W^{(1 / 2)} ⊡ T)}_{(k)} - {\hat{U}}_{k} {\hat{U}}_{k}^{T} {(W^{(1 / 2)} ⊡ T)}_{(k)}∥}_{2} \\ \leq & {∥{(W^{(1 / 2)} ⊡ T)}_{(k)} - {(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(k)}∥}_{2} + {∥{\hat{U}}_{k} {\hat{U}}_{k}^{T} {(W^{(1 / 2)} ⊡ T - W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(k)}∥}_{2} + \\ {∥{(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(k)} - {\hat{U}}_{k} {\hat{U}}_{k}^{T} {(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(k)}∥}_{2} \\ \leq & 3 {∥{(W^{(1 / 2)} ⊡ T)}_{(k)} - {(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(k)}∥}_{2} . \end{matrix}

Therefore, we have

\begin{matrix} {∥W^{(1 / 2)} ⊡ (\hat{T} - T)∥}_{F} \leq \sum_{k = 1}^{n} 3 \sqrt{r_{k}} {∥{(W^{(1 / 2)} ⊡ T)}_{(k)} - {(W^{(- 1 / 2)} ⊡ Y_{Ω})}_{(k)}∥}_{2} . \end{matrix}

(A2)

Next, to estimate

{∥{(W^{(- 1 / 2)} ⊡ Y_{Ω} - W^{(1 / 2)} ⊡ T)}_{(k)}∥}_{2}

for

k = 1, \dots, n

.

Let us consider the case when

k = 1

. Other cases can be derived similarly. Using the fact that

T_{(1)}

has rank

r_{1}

and

{∥T_{(1)}∥}_{max} \leq \sqrt{r_{1}} {∥T_{(1)}∥}_{\infty} = \sqrt{r_{1}} {∥T∥}_{\infty}

, we conclude that

\begin{matrix} {∥{(W^{(- 1 / 2)} ⊡ Y_{Ω} - W^{(1 / 2)} ⊡ T)}_{(1)}∥}_{2} \\ = & {∥{(W^{(- 1 / 2)} ⊡ T_{Ω} - W^{(1 / 2)} ⊡ T)}_{(1)} + {(W^{(- 1 / 2)} ⊡ Z_{Ω})}_{(1)}∥}_{2} \\ \leq & {∥{(W^{(- 1 / 2)} ⊡ T_{Ω} - W^{(1 / 2)} ⊡ T)}_{(1)}∥}_{2} + {∥{(W^{(- 1 / 2)} ⊡ Z_{Ω})}_{(1)}∥}_{2} \\ = & {∥{(W^{(- 1 / 2)} ⊡ 1_{Ω} - W^{(1 / 2)})}_{(1)} ⊡ T_{(1)}∥}_{2} + {∥{(W^{(- 1 / 2)} ⊡ Z_{Ω})}_{(1)}∥}_{2} \\ \leq & {∥T_{(1)}∥}_{max} {∥{(W^{(- 1 / 2)} ⊡ 1_{Ω} - W^{(1 / 2)})}_{(1)}∥}_{2} + {∥{(W^{(- 1 / 2)} ⊡ Z_{Ω})}_{(1)}∥}_{2} \\ \leq & \sqrt{r_{1}} {∥T∥}_{\infty} {∥{(W^{(- 1 / 2)} ⊡ 1_{Ω} - W^{(1 / 2)})}_{(1)}∥}_{2} + {∥{(W^{(- 1 / 2)} ⊡ Z_{Ω})}_{(1)}∥}_{2} . \end{matrix}

To bound

{∥{(W^{(- 1 / 2)} ⊡ Z_{Ω})}_{(1)}∥}_{2}

, we consider

{(W^{(- 1 / 2)} ⊡ Z_{Ω})}_{(1)} = \sum_{i_{1}, \dots, i_{n}} \frac{1_{(i_{1}, \dots, i_{n}) \in Ω} Z_{i_{1} \dots i_{n}}}{\sqrt{W_{i_{1} \dots i_{n}}}} e_{i_{1}} {(e_{i_{2}} \otimes \dots \otimes e_{i_{n}})}^{T},

where

e_{i_{k}}

is the

i_{k}

-th standard basis vector of

R^{d_{k}}

.

Please note that

\begin{matrix} \sum_{i_{1}, \dots, i_{n}} \frac{1_{(i_{1}, \dots, i_{n}) \in Ω}}{W_{i_{1} \dots i_{n}}} e_{i_{1}} {(e_{i_{2}} \otimes \dots \otimes e_{i_{n}})}^{T} (e_{i_{2}} \otimes \dots \otimes e_{i_{n}}) e_{i_{1}}^{T} \\ = & \sum_{i_{1}, \dots, i_{n}} \frac{1_{(i_{1}, \dots, i_{n}) \in Ω}}{W_{i_{1} \dots i_{n}}} e_{i_{1}} e_{i_{1}}^{T} . \end{matrix}

Therefore,

\begin{matrix} {∥\sum_{i_{1}, \dots, i_{n}} \frac{1_{(i_{1}, \dots, i_{n}) \in Ω}}{W_{i_{1} \dots i_{n}}} e_{i_{1}} {(e_{i_{2}} \otimes \dots \otimes e_{i_{n}})}^{T} (e_{i_{2}} \otimes \dots \otimes e_{i_{n}}) e_{i_{1}}^{T}∥}_{2} \\ = & max_{i_{1}} \sum_{i_{2}, \dots, i_{n}} \frac{1_{(i_{1}, i_{2}, \dots, i_{n}) \in Ω}}{W_{i_{1} i_{2} \dots i_{n}}} \leq μ_{1}^{2} . \end{matrix}

Similarly,

\begin{matrix} {∥\sum_{i_{1}, \dots, i_{n}} \frac{1_{(i_{1}, \dots, i_{n}) \in Ω}}{W_{i_{1} \dots i_{n}}} (e_{i_{2}} \otimes \dots \otimes e_{i_{n}}) e_{i_{1}}^{T} e_{i_{1}} {(e_{i_{2}} \otimes \dots \otimes e_{i_{n}})}^{T}∥}_{2} \\ = & max_{i_{2}, \dots, i_{n}} \sum_{i_{1}} \frac{1_{(i_{1}, i_{2}, \dots, i_{n}) \in Ω}}{W_{i_{1} i_{2} \dots i_{n}}} \leq μ_{1}^{2} . \end{matrix}

By ([61] Theorem 1.5), for any

t > 0

,

\begin{matrix} P \{∥{(W^{(- 1 / 2)} ⊡ Z_{Ω})}_{(1)}∥ \geq t\} \leq (d_{1} + \prod_{j \neq 1} d_{j}) exp (- \frac{t^{2}}{2 σ^{2} μ_{1}^{2}}) . \end{matrix}

We conclude that with probability at least

1 - \frac{1}{d_{1} + \prod_{j \neq 1} d_{j}}

, we have

∥{(W^{(- 1 / 2)} ⊡ Z_{Ω})}_{(1)}∥ \leq 2 σ μ_{1} \sqrt{log (d_{1} + \prod_{j \neq 1} d_{j})} .

Similarly, we have

\begin{matrix} {∥{(W^{(- 1 / 2)} ⊡ Y_{Ω} - W^{(1 / 2)} ⊡ T)}_{(k)}∥}_{2} \\ \leq & \sqrt{r_{k}} {∥T∥}_{\infty} {∥{(W^{(- 1 / 2)} ⊡ 1_{Ω} - W^{(1 / 2)})}_{(k)}∥}_{2} + {∥{(W^{(- 1 / 2)} ⊡ Z_{Ω})}_{(k)}∥}_{2}, \end{matrix}

with

{∥{(W^{(- 1 / 2)} ⊡ Z_{Ω})}_{(k)}∥}_{2} \leq 2 σ μ_{k} \sqrt{log (d_{k} + \prod_{j \neq k} d_{j})}

with probability at least

1 - \frac{1}{d_{k} + \prod_{j \neq k} d_{j}}

, for

k = 2, \dots, n

.

Plugging all these into (A2), we can obtain the bound in our theorem. □

Next we are going to study the special case when the sampling set

Ω \sim W

.

Appendix B.2. Case Study: Ω∼W

In this section, we would provide upper and lower bounds for the weighted HOSVD algorithm.

Appendix B.2.1. Upper Bound

First, let us understand the bounds

λ_{ℓ}

and

μ_{ℓ}

in the case when

Ω \sim W

for

ℓ = 1, \dots, n

.

Lemma A1.

Let

W = w_{1} \otimes \dots \otimes w_{n} \in R^{d_{1} \times \dots \times d_{n}}

be a CP rank-1 tensor so that all

(i_{1}, \dots, i_{n}) \in [d_{1}] \times \dots \times [d_{n}]

with

W_{i_{1} \dots i_{n}} \in [\frac{1}{\sqrt{\prod_{j = 1}^{n} d_{j}}}, 1]

. Suppose that

Ω \subseteq [d_{1}] \times \dots \times [d_{n}]

so that for each

i_{1} \in [d_{1}], \dots, i_{n} \in [d_{n}]

,

(i_{1}, \dots, i_{n}) \in Ω

with probability

W_{i_{1} \dots i_{n}}

, independently for each

(i_{1}, \dots, i_{n})

. Then with probability at least

1 - \sum_{ℓ = 1}^{n} \frac{2}{d_{ℓ} + \prod_{j \neq ℓ} d_{j}}

over the choice of Ω, we have for

ℓ = 1, \dots, n

λ_{ℓ} = {∥{(W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω})}_{(ℓ)}∥}_{2} \leq 2 \sqrt{d_{ℓ} + \prod_{k \neq ℓ} d_{k}} log (d_{ℓ} + \prod_{k \neq ℓ} d_{k}),

(A3)

and

μ_{ℓ} \leq 2 \sqrt{(d_{ℓ} + \prod_{k \neq ℓ} d_{k}) log (d_{ℓ} + \prod_{k \neq ℓ} d_{k})} .

(A4)

Proof.

Fix

i_{1} \in [d_{1}]

. Bernstein’s inequality yields

\begin{matrix} P \{\sum_{i_{2}, \dots, i_{n}} \frac{1_{(i_{1}, \dots, i_{n}) \in Ω}}{w_{1} (i_{1}) \dots w_{n} (i_{n})} - \prod_{k \neq 1} d_{k} \geq t\} \\ \leq & exp (\frac{- t^{2} / 2}{\sum_{i_{2}, \dots, i_{n}} (\frac{1}{w_{1} (i_{1}) \dots w_{n} (i_{n})} - 1) + \frac{1}{3} \sqrt{\prod_{k = 1}^{n} d_{k}} t}) . \end{matrix}

and

\begin{matrix} P \{\sum_{i_{1}} \frac{1_{(i_{1}, \dots, i_{n}) \in Ω}}{w_{1} (i_{1}) \dots w_{n} (i_{n})} - d_{1} \geq t\} \\ \leq & exp (\frac{- t^{2} / 2}{\sum_{i_{1}} (1 / (w_{1} (i_{1}) \dots w_{n} (i_{n})) - 1) + \frac{1}{3} \sqrt{\prod_{k = 1}^{n} d_{k}} t}) . \end{matrix}

Set

t = 2 \sqrt{2} (d_{1} + \prod_{j \neq 1} d_{j}) log (d_{1} + \prod_{j \neq 1} d_{j})

, then we have

\begin{matrix} P \{\sum_{i_{2}, \dots, i_{n}} \frac{1_{(i_{1}, i_{2}, \dots, i_{n}) \in Ω}}{w_{1} (i_{1}) \dots w_{n} (i_{n})} - \prod_{k \neq 1} d_{k} \geq 2 \sqrt{2} (d_{1} + \prod_{j \neq 1} d_{j}) log (d_{1} + \prod_{j \neq 1} d_{j})\} \\ \leq & 1 / {(d_{1} + \prod_{j \neq 1} d_{j})}^{2} \end{matrix}

and

\begin{matrix} P \{\sum_{i_{1}} \frac{1_{(i_{1}, i_{2}, \dots, i_{n}) \in Ω}}{w_{1} (i_{1}) w_{2} (i_{2}) \dots w_{n} (i_{n})} - d_{1} \geq 2 \sqrt{2} (d_{1} + \prod_{j \neq 1} d_{j}) log (d_{1} + \prod_{j \neq 1} d_{j})\} \\ \leq & 1 / {(d_{1} + \prod_{j \neq 1} d_{j})}^{2} . \end{matrix}

Hence, by taking a union bound,

\begin{matrix} P \{max \{max_{i_{1}} \sum_{i_{2}, \dots, i_{n}} \frac{1_{(i_{1}, i_{2}, \dots, i_{n}) \in Ω}}{w_{1} (i_{1}) w_{2} (i_{2}) \dots w_{n} (i_{n})}, max_{i_{2}, \dots, i_{n}} \sum_{i_{1}} \frac{1_{(i_{1}, i_{2}, \dots, i_{n}) \in Ω}}{w_{1} (i_{1}) w_{2} (i_{2}) \dots w_{n} (i_{n})}\} \\ \geq 4 (d_{1} + \prod_{j \neq 1} d_{j}) log (d_{1} + \prod_{j \neq 1} d_{j})\} \leq \frac{1}{d_{1} + \prod_{j \neq 1} d_{j}} . \end{matrix}

Similarly, we have

\begin{matrix} P \{μ_{k}^{2} \geq 4 (d_{k} + \prod_{j \neq k} d_{j}) log (d_{k} + \prod_{j \neq k} d_{j})\} & \leq & \frac{1}{d_{k} + \prod_{j \neq k} d_{j}}, for all k = 2, \dots, n . \end{matrix}

Combining all these inequalities above, with probability at least

1 - \sum_{ℓ = 1}^{n} \frac{1}{d_{ℓ} + \prod_{j \neq ℓ} d_{j}}

, we have

μ_{ℓ} \leq 2 \sqrt{(d_{ℓ} + \prod_{k \neq ℓ} d_{k}) log (d_{ℓ} + \prod_{k \neq ℓ} d_{k})}, for all ℓ = 1, \dots, n .

Next we would bound

λ_{ℓ}

in (A3). First of all, let’s consider

∥ {(W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω})}_{(1)} ∥_{2}

. Set

γ_{i_{1} \dots i_{n}} = \frac{W_{i_{1} \dots i_{n}} - 1_{(i_{1}, \dots, i_{n}) \in Ω}}{\sqrt{W_{i_{1} \dots i_{n}}}}

. Then

{(W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω})}_{(1)} = \sum_{i_{1}, \dots, i_{n}} γ_{i_{1} \dots i_{n}} e_{i_{1}} {(e_{i_{2}} \otimes \dots \otimes e_{i_{n}})}^{T} .

Notice that

\begin{matrix} \sum_{i_{1}, \dots, i_{n}} E (γ_{i_{1} \dots i_{n}}^{2} e_{i_{1}} {(e_{i_{2}} \otimes \dots \otimes e_{i_{n}})}^{T} (e_{i_{2}} \otimes \dots \otimes e_{i_{n}}) e_{i_{1}}^{T}) \\ = & \sum_{i_{1}} (\sum_{i_{2}, \dots, i_{n}} E (γ_{i_{1} \dots i_{n}}^{2})) e_{i_{1}} e_{i_{1}}^{T} . \end{matrix}

Since

E (γ_{i_{1} \dots i_{n}}^{2}) = 1 - W_{i_{1} \dots i_{n}} \leq 1 - \frac{1}{\sqrt{d_{1} \dots d_{n}}} \leq 1

, then

{∥\sum_{i_{1}, \dots, i_{n}} E (γ_{i_{1} \dots i_{n}}^{2} e_{i_{1}} {(e_{i_{2}} \otimes \dots \otimes e_{i_{n}})}^{T} (e_{i_{2}} \otimes \dots \otimes e_{i_{n}}) e_{i_{1}}^{T})∥}_{2} \leq \prod_{j \neq 1} d_{j} .

Similarly,

{∥\sum_{i_{1}, \dots, i_{n}} E (γ_{i_{1} \dots i_{n}}^{2} (e_{i_{2}} \otimes \dots \otimes e_{i_{n}}) e_{i_{1}}^{T} e_{i_{1}} {(e_{i_{2}} \otimes \dots \otimes e_{i_{n}})}^{T})∥}_{2} \leq d_{1} .

In addition,

{∥γ_{i_{1} \dots i_{n}} e_{i_{1}} {(e_{i_{2}} \otimes \dots \otimes e_{i_{n}})}^{T}∥}_{2} \leq {(\prod_{j = 1}^{n} d_{j})}^{1 / 4} \leq \sqrt{\frac{d_{1} + \prod_{j \neq 1} d_{j}}{2}} .

Then, the matrix Bernstein Inequality ([61] Theorem 1.4) gives

\begin{matrix} P \{{∥{(W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω})}_{(1)}∥}_{2} \geq t\} \\ \leq & (d_{1} + \prod_{j \neq 1} d_{j}) exp (- \frac{t^{2} / 2}{(d_{1} + \prod_{j \neq 1} d_{j}) + \frac{t}{3} \sqrt{(d_{1} + \prod_{j \neq 1} d_{j}) / 2}}) . \end{matrix}

Let

t = 2 \sqrt{d_{1} + \prod_{j \neq 1} d_{j}} log (d_{1} + \prod_{j \neq 1} d_{j})

, then we have

\begin{matrix} P \{{∥{(W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω})}_{(1)}∥}_{2} \geq 2 \sqrt{d_{1} + \prod_{j \neq 1} d_{j}} log (d_{1} + \prod_{j \neq 1} d_{j})\} \leq \frac{1}{d_{1} + \prod_{j \neq 1} d_{j}} . \end{matrix}

Similarly,

\begin{matrix} P \{{∥{(W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω})}_{(k)}∥}_{2} \geq 2 \sqrt{d_{k} + \prod_{j \neq k} d_{k}} log (d_{k} + \prod_{j \neq k} d_{j})\} \leq \frac{1}{d_{k} + \prod_{j \neq k} d_{j}}, \end{matrix}

for all

k = 2, \dots, n

.

Thus, with probability at least

1 - \sum_{ℓ = 1}^{n} \frac{1}{d_{ℓ} + \prod_{j \neq ℓ} d_{j}}

, we have

{∥{(W^{(1 / 2)} - W^{(- 1 / 2)} ⊡ 1_{Ω})}_{(ℓ)}∥}_{2} \leq 2 \sqrt{d_{ℓ} + \prod_{k \neq ℓ} d_{k}} log (d_{ℓ} + \prod_{k \neq ℓ} d_{k}), for all ℓ = 1, \dots, n .

By a union of bounds in (A4) and (A3), we could establish the lemma. □

Lemma A2.

Let

m = {∥W^{(1 / 2)}∥}_{F}^{2}

. Then with probability at least

1 - 2 exp (- 3 m / 104)

, over the choice of Ω

| | Ω | - m | \leq \frac{m}{4} .

Proof.

Please note that

| | Ω | - m | = |\sum_{i_{1}, \dots, i_{n}} (1_{(i_{1}, \dots, i_{n}) \in Ω} - W_{i_{1} \dots i_{n}})| = |\sum_{i_{1}, \dots, i_{n}} (1_{(i_{1}, \dots, i_{n}) \in Ω} - E (1_{(i_{1}, \dots, i_{n}) \in Ω})|,

which is the sum of zero-mean independent random variables. Observe that

| 1_{(i_{1}, \dots, i_{n}) \in Ω} - E (1_{(i_{1}, \dots, i_{n}) \in Ω}) | = | 1_{(i_{1}, \dots, i_{n}) \in Ω} - W_{i_{1} \dots i_{n}} | \leq 1

and

\sum_{i_{1}, \dots, i_{n}} E {(1_{(i_{1}, \dots, i_{n}) \in Ω} - W_{i_{1} \dots i_{n}})}^{2} = \sum_{i_{1}, \dots, i_{n}} (W_{i_{1} \dots i_{n}} - W_{i_{1} \dots i_{n}}^{2}) \leq m .

By Bernstein’s inequality,

\begin{matrix} P (| | Ω | - m | \geq t) & \leq & 2 exp (- \frac{t^{2} / 2}{m + t / 3}) . \end{matrix}

Set

t = m / 4

, then we have

\begin{matrix} P (| | Ω | - m | \geq m / 4) & \leq & 2 exp (- \frac{m^{2} / 32}{m + m / 12}) = 2 exp (- 3 m / 104) . \end{matrix}

□

Next let us give the formal statement for the upper bounds in Theorem 3.

Theorem A2.

Let

W = w_{1} \otimes \dots \otimes w_{n} \in R^{d_{1} \times \dots \times d_{n}}

be a CP rank-1 tensor so that for all

(i_{1}, \dots, i_{n}) \in [d_{1}] \times \dots \times [d_{n}]

we have

W_{i_{1} \dots i_{n}} \in [\frac{1}{\sqrt{d_{1} \dots d_{n}}}, 1]

. Suppose that we choose each

(i_{1}, \dots, i_{n}) \in [d_{1}] \times \dots \times [d_{n}]

independently with probability

W_{i_{1} \dots i_{n}}

to form a set

Ω \subseteq [d_{1}] \times \dots \times [d_{n}]

. Then with probability at least

1 - 2 exp (- \frac{3}{104} \sqrt{\prod_{j = 1}^{n} d_{j}}) - \sum_{k = 1}^{n} \frac{2}{d_{k} + \prod_{j \neq k} d_{j}}

For the weighted HOSVD Algorithm named

A

,

A

returns

\hat{T} = A (T_{Ω} + Z_{Ω})

for any Tucker rank

r

tensor

T

with

{∥T∥}_{\infty} \leq β

so that with probability at least

1 - \sum_{k = 1}^{n} \frac{1}{d_{k} + \prod_{j \neq k} d_{j}}

over the choice of

Z

,

\begin{matrix} \frac{{∥W^{(1 / 2)} ⊡ (T - \hat{T})∥}_{F}}{{∥W^{(1 / 2)}∥}_{F}} & \leq & \frac{\sqrt{5} β}{\sqrt{| Ω |}} (\sum_{k = 1}^{n} 3 r_{k} \sqrt{d_{k} + \prod_{j \neq k} d_{j}} log (d_{k} + \prod_{j \neq k} d_{j})) \\ + \frac{\sqrt{5} σ}{| Ω |} (\sum_{k = 1}^{n} 6 \sqrt{r_{k} (d_{k} + \prod_{j \neq k} d_{j})} log (d_{k} + \prod_{j \neq k} d_{j})) \end{matrix}

Proof.

This is directly from Theorem A1, Lemmas A1 and A2. □

Appendix B.2.2. Lower Bound

To deduce the lower bound, we have to construct a finite subset S in the cone

K_{r}

so that we can approximate the minimal distance between two different elements in S. Before we prove the lower bound, we need the following theorems and lemmas.

Theorem A3

(Hanson-Wright inequality). There is some constant

c > 0

so that the following holds. Let

ξ \in {0, \pm 1}^{d}

be a vector with mean-zero, independent entries, and let F be any matrix which has zero diagonal. Then

P \{| ξ^{T} F ξ | > t\} \leq 2 exp (- c \cdot min \{\frac{t^{2}}{{∥ F ∥}_{F}^{2}}, \frac{t}{{∥ F ∥}_{2}}\}) .

Theorem A4

(Fano’s Inequality). Let

F = {f_{0}, \dots, f_{n}}

be a collection of densities on

K

, and suppose that

A : K \to {0, \dots, n}

. Suppose there is some

β > 0

such that for any

i \neq j

,

D_{K L} (f_{i} ∥ f_{j}) \leq β

. Then

max_{i} P_{K \sim f_{i}} \{A (K) \neq i\} \geq 1 - \frac{β + log (2)}{log (n)} .

The following lemma specializes Fano’s Inequality to our setting, which is a generalization of ([19] Lemma 19). In the following lemma, we show that for any reconstruction algorithm on a set

K \subseteq R^{d_{1} \times \dots \times d_{n}}

, with probability no less than

\frac{1}{2}

, there exists some elements in K such that the weighted reconstruction error is bounded below by some quantity, where the quantity is independent of the algorithm.

Lemma A3.

Let

K \subseteq R^{d_{1} \times \dots \times d_{n}}

, and let

S \subseteq K

be a finite subset of K so that

| S | > 16

. Let

Ω \subseteq [d_{1}] \times \dots \times [d_{n}]

be a sampling pattern. Let

σ > 0

and choose

κ \leq \frac{σ \sqrt{log | S |}}{4 {max}_{T \in S} {∥T_{Ω}∥}_{F}},

and suppose that

κ S \subseteq K .

Let

Z \in R^{d_{1} \times \dots \times d_{n}}

be a tensor whose entries

Z_{i_{1} \dots i_{n}}

are i.i.d.,

Z_{i_{1} \dots i_{n}} \sim N (0, σ^{2})

. Let

H \subseteq R^{d_{1} \times \dots \times d_{n}}

be any weight tensor.

Then for any algorithm

A : R^{Ω} \to R^{d_{1} \times \dots \times d_{n}}

that takes as input

T_{Ω} + Z_{Ω}

for

T \in K

and outputs an estimate

\hat{T}

to

T

, there is some

X \in K

so that

{∥H ⊡ (A (X_{Ω} + Z_{Ω}) - X)∥}_{F} \geq \frac{κ}{2} min_{T \neq T^{'} \in S} {∥H ⊡ (T - T^{'})∥}_{F}

(A5)

with probability at least

\frac{1}{2}

.

Proof.

Consider the set

S^{'} = κ S = {κ T : T \in S}

which is a scaled version of S. By our assumption,

S^{'} \subseteq K

.

Recall that the Kullback–Leibler (KL) divergence between two multivariate Gaussians is given by

\begin{matrix} D_{K L} (N (μ_{1}, Σ_{1}) ∥ N (μ_{2}, Σ_{2})) \\ = & \frac{1}{2} (log (\frac{det (Σ_{2})}{det (Σ_{1})}) - n + t r (Σ_{2}^{- 1} Σ_{1}) + 〈 Σ_{2}^{- 1} (μ_{2} - μ_{1}), μ_{2} - μ_{1} 〉), \end{matrix}

where

μ

_{1}

,

μ

_{2} \in R^{n}

.

Specializing to

U, V \in S^{'}

, with

I = I_{Ω \times Ω}

\begin{matrix} D_{K L} (U_{Ω} + Z_{Ω} ∥ V_{Ω} + Z_{Ω}) & = & D_{K L} (N (U_{Ω}, σ^{2} I) ∥ N (V_{Ω}, σ^{2} I)) \\ = & \frac{{∥U_{Ω} - V_{Ω}∥}_{F}^{2}}{2 σ^{2}} \\ \leq & max_{T \in S^{'}} \frac{2 {∥T_{Ω}∥}_{F}^{2}}{σ^{2}} = \frac{2 κ^{2}}{σ^{2}} max_{T \in S} {∥T_{Ω}∥}_{F}^{2} . \end{matrix}

Suppose that

A

is as in the statement of the lemma. Define an algorithm

\bar{A} : R^{Ω} \to R^{d_{1} \times \dots \times d_{n}}

so that for any

Y \in R^{Ω}

if there exists

T \in S^{'}

such that

{∥H ⊡ (T - A (Y))∥}_{F} < \frac{1}{2} min_{T \neq T^{'} \in S^{'}} {∥ H ⊡ (T - T^{'}) ∥}_{F} : = \frac{ρ}{2},

then set

\bar{A} (Y) = T

(notice that if such

T

exists, then it is unique), otherwise, set

\bar{A} (Y) = A (Y)

.

Then by the Fano’s inequality, there is some

T \in S^{'}

so that

\begin{matrix} P \{\bar{A} (T_{Ω} + Z_{Ω}) \neq T\} & \geq & 1 - \frac{2 {max}_{T \in S^{'}} {∥ T_{Ω} ∥}_{F}^{2}}{σ^{2} log (| S | - 1)} - \frac{log (2)}{log (| S | - 1)} \\ = & 1 - \frac{2 κ^{2} {max}_{T \in S} {∥ T_{Ω} ∥}_{F}^{2}}{σ^{2} log (| S | - 1)} - \frac{log (2)}{log (| S | - 1)} \\ \geq & 1 - \frac{1}{4} - \frac{1}{4} = \frac{1}{2} . \end{matrix}

If

\bar{A} (T_{Ω} + Z_{Ω}) \neq T

, then

∥ H ⊡ (A (T_{Ω} + Z_{Ω}) - T) ∥_{F} > ρ / 2

, and so

P \{∥ H ⊡ (A (T_{Ω} + Z_{Ω}) - T) ∥_{F} \geq ρ / 2\} \geq P \{\bar{A} (T_{Ω} + Z_{Ω}) \neq T\} \geq 1 / 2 .

Finally, we observe that

\frac{ρ}{2} = \frac{1}{2} min_{T \neq T^{'} \in S^{'}} ∥ H ⊡ (T - T^{'}) ∥_{F} = \frac{κ}{2} min_{T \neq T^{'} \in S} {∥ H ⊡ (T - T^{'}) ∥}_{F},

which completes the proof. □

To understand the lower bound

\frac{κ}{2} {min}_{T \neq T \in S} {∥ H ⊡ (T - T^{'}) ∥}_{F}

in (A5), we construct a specific finite subset S for the cone of Tucker rank

r

tensors in the following lemma.

Lemma A4.

There is some constant c so that the following holds. Let

d_{1}, \dots, d_{n} > 0

and

r_{1}, \dots, r_{n} > 0

be sufficiently large. Let K be the cone of Tucker rank

r

tensors with

r = [r_{1} \dots r_{n}]

,

H

be any CP rank-1 weight tensor, and

B

be any CP rank-1 tensor with

{∥ B ∥}_{\infty} \leq 1

. Write

H = h_{1} \otimes \dots \otimes h_{n}

and

B = b_{1} \otimes \dots \otimes b_{n}

, and

w_{1} = {(h_{1} ⊡ b_{1})}^{(2)}, \dots, w_{n} = {(h_{n} ⊡ b_{n})}^{(2)} .

Let

γ = \sqrt{\frac{1}{2} (\prod_{k = 1}^{n} r_{k}) log (8 \prod_{k = 1}^{n} d_{k})} .

There is a set

S \subseteq K \cap γ B_{\infty}

so that

The set has size $| S | \geq N$ , for

$\begin{matrix} N = C exp (c \cdot min \{\frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 r_{k} (∥ w_{k} ∥_{2} / ∥ w_{k} ∥_{1})^{2} + 1)) - 1}, \prod_{k = 1}^{n} r_{k}, \\ \frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 ∥ w_{k} ∥_{2} / ∥ w_{k} ∥_{1} \sqrt{r_{k} log (r_{k})} + 2 ∥ w_{k} ∥_{\infty} / ∥ w_{k} ∥_{1} r_{k} log (r_{k}) + 1)) - 1}\}) . \end{matrix}$
$∥ T_{Ω} ∥_{F} \leq 2 \sqrt{\prod_{k = 1}^{n} r_{k}} {∥ B_{Ω} ∥}_{F}$ for all $T \in S$ .
${∥H ⊡ (T - \tilde{T})∥}_{F} \geq \sqrt{\prod_{k = 1}^{n} r_{k}} {∥H ⊡ B∥}_{F}$ for all $T \neq \tilde{T} \in S$ .

Proof.

Let

Ψ \subseteq {\pm 1}^{r_{1} \times \dots \times r_{n}}

be a set of random

\pm 1

-valued tensors chosen uniformly at random with replacement, of size

4 N

. Choose

^{i} U \in {\pm 1}^{d_{i} \times r_{i}}

to be determined below for all

i = 1, \dots, n

.

Let

S = \{B ⊡ (C \times_{1}^{1} U \times_{2} \dots \times_{n}^{n} U) : C \in Ψ\} .

First of all, we would estimate

{∥T_{Ω}∥}_{F}

and

{∥T∥}_{\infty}

. Please note that

\begin{matrix} E {∥T_{Ω}∥}_{F}^{2} & = & E \sum_{(i_{1}, \dots, i_{n}) \in Ω} B_{i_{1} \dots i_{n}}^{2} {(\sum_{j_{1}, \dots, j_{n}} C_{j_{1} \dots j_{n}}^{1} U (i_{1}, j_{1}) \dots^{n} U (i_{n}, j_{n}))}^{2} = (\prod_{i = 1}^{n} r_{i}) {∥ B_{Ω} ∥}_{F}^{2}, \end{matrix}

where the expectation is over the random choice of

C

. Then by Markov’s inequality,

P \{∥ T_{Ω} ∥_{F}^{2} \geq (4 \prod_{i = 1}^{n} r_{i}) {∥ B_{Ω} ∥}_{F}^{2}\} \leq \frac{1}{4} .

We also have

{∥ T ∥}_{\infty} = max_{i_{1}, \dots, i_{n}} | B_{i_{1} \dots i_{n}} | |\sum_{j_{1}, \dots, j_{n}} C_{j_{1} \dots j_{n}}^{1} U (i_{1}, j_{1}) \dots^{n} U (i_{n}, j_{n})| .

By Hoeffding’s inequality, we have

\begin{matrix} P \{|\sum_{j_{1}, \dots, j_{n}} C_{j_{1} \dots j_{n}}^{1} U (i_{1}, j_{1}) \dots^{n} U (i_{n}, j_{n})| \geq t\} & \leq & 2 exp (- \frac{2 t^{2}}{\prod_{k = 1}^{n} r_{k}}) . \end{matrix}

Using the fact that

| B_{i_{1} \dots i_{n}} | \leq 1

and a union bound over all

\prod_{k = 1}^{n} d_{k}

values of

i_{1}, \dots, i_{n}

, we conclude that

\begin{matrix} P \{{∥ T ∥}_{\infty} \geq \sqrt{\frac{1}{2} (\prod_{k = 1}^{n} r_{k}) log (8 \prod_{k = 1}^{n} d_{k})}\} \\ \leq & (\prod_{k = 1}^{n} d_{k}) P \{|\sum_{j_{1}, \dots, j_{n}} C_{j_{1} \dots j_{n}}^{1} U (i_{1}, j_{1}) \dots^{n} U (i_{n}, j_{n})| \geq \sqrt{\frac{1}{2} (\prod_{k = 1}^{n} r_{k}) log (8 \prod_{k = 1}^{n} d_{k})}\} \\ \leq & \frac{1}{4} . \end{matrix}

Thus, for a tensor

T \in S

, the probability that both of

{∥ T ∥}_{\infty} \leq \sqrt{\frac{1}{2} (\prod_{k = 1}^{n} r_{k}) log (8 \prod_{k = 1}^{n} d_{k})}

and

∥ T_{Ω} ∥_{F} \leq 2 \sqrt{\prod_{k = 1}^{n} r_{k}} {∥ B_{Ω} ∥}_{F}

hold is at least

\frac{1}{2}

. Thus, by a Chernoff bound it follows that with probability at least

1 - exp (- C N)

for some constant C, there are at least

\frac{| S |}{4}

tensors

T \in S

such that all of these hold. Let

\tilde{S} \subseteq S

be the set of such

T

’s. The set guaranteed in the statement of the lemma will be

\tilde{S}

, which satisfies both item 1 and 2 in the lemma and is also contained in

K \cap γ B_{\infty}

.

Thus, we consider item 3: we are going to show that this holds for S with high probability, thus in particularly it will hold for

\tilde{S}

, and this will complete the proof of the lemma.

Fix

T \neq \tilde{T} \in S

, and write

\begin{matrix} {∥H ⊡ (T - \tilde{T})∥}_{F}^{2} \\ = & {∥H ⊡ B ⊡ ((C - \tilde{C}) \times_{1}^{1} U \times_{2} \dots \times_{n}^{n} U)∥}_{F}^{2} \\ = & \sum_{i_{1}, \dots, i_{n}} H_{i_{1} \dots i_{n}}^{2} B_{i_{1} \dots i_{n}}^{2} {(\sum_{j_{1}, \dots, j_{n}} (C_{j_{1} \dots j_{n}} - {\tilde{C}}_{j_{1} \dots j_{n}})^{1} U (i_{1}, j_{1}) \dots^{n} U (i_{n}, j_{n}))}^{2} \\ = & 4 \sum_{i_{1}, \dots, i_{n}} H_{i_{1} \dots i_{n}}^{2} B_{i_{1} \dots i_{n}}^{2} {〈ξ,^{1} U (i_{1}, :) \otimes \dots \otimes^{n} U (i_{n}, :)〉}^{2}, \end{matrix}

where

ξ

is the vectorization of

\frac{1}{2} (C - \tilde{C})

. Thus, each entry of

ξ

is independently 0 with probability

\frac{1}{2}

or

\pm 1

with probability

\frac{1}{4}

each. Rearranging the terms, we have

\begin{matrix} {∥H ⊡ (T - \tilde{T})∥}_{F}^{2} & = & 4 ξ^{T} {(^{1} U \otimes \dots \otimes^{n} U)}^{T} (D_{1} \otimes \dots \otimes D_{n}) (^{1} U \otimes \dots \otimes^{n} U) ξ \\ = & 4 ξ^{T} ((^{1} U^{T} D_{1}^{1} U) \otimes \dots \otimes (^{n} U^{T} D_{n}^{n} U)) ξ \\ = & 4 ξ^{T} (\otimes_{k = 1}^{n} (^{k} U^{T} D_{k}^{k} U)) ξ, \end{matrix}

(A6)

where

D_{k}

denotes the

d_{k} \times d_{k}

diagonal matrix with

w_{k}

on the diagonal.

To understand (A6), we need to understand the matrix

\otimes_{k = 1}^{n} (^{k} U^{T} D_{k}^{k} U) \in R^{\prod_{k = 1}^{n} r_{k} \times \prod_{k = 1}^{n} r_{k}}

. The diagonal of this matrix is

(\prod_{k = 1}^{n} {∥w_{k}∥}_{1}) I

. We will choose the matrix

^{k} U

for

k = 1, \dots, n

so that the off-diagonal terms are small. □

Theorem A5.

There are matrices

^{k} U \in {\pm 1}^{d_{k} \times r_{k}}

for

k = 1, \dots, n

such that:

(a): $\begin{matrix} {∥(\otimes_{k = 1}^{n} (^{k} U^{T} D_{k}^{k} U)) - (\prod_{j = 1}^{n} {∥w_{j}∥}_{1}) I∥}_{F}^{2} \leq (\prod_{k = 1}^{n} (2 r_{k}^{2} ∥ w_{k} ∥_{2}^{2} + r_{k} {∥ w_{k} ∥}_{1}^{2})) - \prod_{k = 1}^{n} (r_{k} {∥ w_{k} ∥}_{1}^{2}) . \end{matrix}$
(b): $\begin{matrix} {∥(\otimes_{k = 1}^{n} (^{k} U^{T} D_{k}^{k} U)) - (\prod_{j = 1}^{n} {∥ w_{j} ∥}_{1}) I∥}_{2} \\ \leq & max \{\prod_{k = 1}^{n} (2 {∥w_{k}∥}_{2} \sqrt{r_{k} log (r_{k})} + 2 {∥w_{k}∥}_{\infty} r_{k} log (r_{k}) + {∥w_{k}∥}_{1}) - \prod_{k = 1}^{n} {∥w_{k}∥}_{1}, \prod_{k = 1}^{n} {∥w_{k}∥}_{1}\} . \end{matrix}$

Proof.

By ([19] Claim 22), there exist matrices

^{k} U \in {\pm 1}^{d_{k} \times r_{k}}

such that:

(a): ${∥^{k} U^{T} D_{k}^{k} U∥}_{F}^{2} \leq 2 r_{k}^{2} {∥w_{k}∥}_{2}^{2} + r_{k} {∥w_{k}∥}_{1}^{2}$ and
(b): ${∥^{k} U^{T} D_{k}^{k} U∥}_{2} \leq 2 {∥w_{k}∥}_{2} \sqrt{r_{k} log (r_{k})} + 2 {∥w_{k}∥}_{\infty} r_{k} log (r_{k}) + {∥ w_{k} ∥}_{1}$ .

Using (a) and the fact that

{∥\otimes_{k = 1}^{n} (^{k} U^{T} D_{k}^{k} U)∥}_{F}^{2} = \prod_{k = 1}^{n} {∥^{k} U^{T} D_{k}^{k} U∥}_{F}^{2}

, we have

\begin{matrix} {∥(\otimes_{k = 1}^{n} (^{k} U^{T} D_{k}^{k} U)) - (\prod_{k = 1}^{n} {∥ w_{j} ∥}_{1}) I∥}_{F}^{2} \\ = & {∥\otimes_{k = 1}^{n} (^{k} U^{T} D_{k}^{k} U)∥}_{F}^{2} - {∥(\prod_{k = 1}^{n} {∥ w_{j} ∥}_{1}) I∥}_{F}^{2} \\ \leq & (\prod_{k = 1}^{n} (2 r_{k}^{2} ∥ w_{k} ∥_{2}^{2} + r_{k} {∥ w_{k} ∥}_{1}^{2})) - \prod_{k = 1}^{n} (r_{k} {∥ w_{k} ∥}_{1}^{2}) . \end{matrix}

By (b) and the fact that

{∥\otimes_{k = 1}^{n} (^{k} U^{T} D_{k}^{k} U)∥}_{2} = \prod_{k = 1}^{n} {∥^{k} U^{T} D_{k}^{k} U∥}_{2}

(see [62]), we have

\begin{matrix} {∥(\otimes_{k = 1}^{n} (^{k} U^{T} D_{k}^{k} U)) - (\prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}) I∥}_{2} \\ \leq & max \{\prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}, (\prod_{k = 1}^{n} (2 ∥ w_{k} ∥_{2} \sqrt{r_{k} log (r_{k})} + 2 ∥ w_{k} ∥_{\infty} r_{k} log (r_{k}) + {∥ w_{k} ∥}_{1})) - \prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}\} . \end{matrix}

□

Having chosen matrices

^{k} U

for

k = 1, \dots, n

, we can now analyze the expression (A6).

Theorem A6.

There are constants

c, c^{'}

so that with probability at least

\begin{matrix} 1 - 2 exp (- c^{''} \prod_{k = 1}^{n} r_{k}) - 2 exp (- c^{'} \cdot min \{\frac{\prod_{k = 1}^{n} (r_{k} ∥ w_{k} ∥_{1}^{2})}{\prod_{k = 1}^{n} (2 r_{k} ∥ w_{k} ∥_{2}^{2} + ∥ w_{k} ∥_{1}^{2}) - \prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}^{2}}, \\ \prod_{k = 1}^{n} r_{k}, \frac{\prod_{k = 1}^{n} (r_{k} ∥ w_{k} ∥_{1})}{(\prod_{k = 1}^{n} (2 ∥ w_{k} ∥_{2} \sqrt{r_{k} log (r_{k})} + 2 ∥ w_{k} ∥_{\infty} r_{k} log (r_{k}) + ∥ w_{k} ∥_{1})) - \prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}}\}), \end{matrix}

we have

{∥H ⊡ (T - \tilde{T}) \prod_{k = 1}^{n} {∥w_{k}∥}_{1}∥}_{F}^{2} \geq \prod_{k = 1}^{n} (r_{k} {∥ w_{k} ∥}_{1}) .

Proof.

We break

{∥H ⊡ (T - \tilde{T})∥}_{F}^{2}

into two terms:

\begin{matrix} {∥H ⊡ (T - \tilde{T})∥}_{F}^{2} \\ = & 4 ξ^{T} (\otimes_{k = 1}^{n}^{k} U^{T} D_{k}^{k} U) ξ \\ = & 4 ξ^{T} (\otimes_{k = 1}^{n} (^{k} U^{T} D_{k}^{k} U) - (\prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}) I) ξ + 4 (\prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}) ξ^{T} ξ \\ : = & (I) + (I I) . \end{matrix}

For the first term (I), we will use the Hanson-Wright Inequality (see Theorem A3). In our case, the matrix

F = 4 (\otimes_{k = 1}^{n} (^{k} U^{T} D_{k}^{k} U) - (\prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}) I)

. The Frobenius norm of this matrix is bounded by

\begin{matrix} {∥ F ∥}_{F}^{2} & \leq & 16 (\prod_{k = 1}^{n} (2 r_{k}^{2} ∥ w_{k} ∥_{2}^{2} + r_{k} {∥ w_{k} ∥}_{1}^{2}) - \prod_{k = 1}^{n} (r_{k} {∥ w_{k} ∥}_{1}^{2})) . \end{matrix}

The operator norm of F is bounded by

\begin{matrix} {∥ F ∥}_{2} \\ \leq & 4 max \{\prod_{k = 1}^{n} (2 ∥ w_{k} ∥_{2} \sqrt{r_{k} log (r_{k})} + 2 ∥ w_{k} ∥_{\infty} r_{k} log (r_{k}) + ∥ w_{k} ∥_{1}) - \prod_{k = 1}^{n} ∥ w_{k} ∥_{1}, \prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}\} . \end{matrix}

Thus, the Hanson-Wright inequality implies that

\begin{matrix} P \{(I) \geq t\} \\ \leq & 2 exp (- c \cdot min \{\frac{t^{2}}{16 \prod_{k = 1}^{n} (2 r_{k}^{2} ∥ w_{k} ∥_{2}^{2} + r_{k} {∥ w_{k} ∥}_{1}^{2}) - 16 \prod_{k = 1}^{n} (r_{k} {∥ w_{k} ∥}_{1}^{2})}, \frac{t}{4 \prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}}, \\ \frac{t}{4 (\prod_{k = 1}^{n} (2 ∥ w_{k} ∥_{2} \sqrt{r_{k} log (r_{k})} + 2 ∥ w_{k} ∥_{\infty} r_{k} log (r_{k}) + ∥ w_{k} ∥_{1}) - \prod_{k = 1}^{n} {∥ w_{k} ∥}_{1})}\}) . \end{matrix}

Plugging in

t = \frac{1}{2} \prod_{k = 1}^{n} r_{k} {∥ w_{k} ∥}_{1}

, and replacing the constant c with a different constant

c^{'}

, we have

\begin{matrix} P \{(I) \geq \frac{1}{2} \prod_{k = 1}^{n} r_{k} {∥ w_{k} ∥}_{1}\} \\ \leq & 2 exp (- c^{'} \cdot min \{\frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 r_{k} (∥ w_{k} ∥_{2} / ∥ w_{k} ∥_{1})^{2} + 1)) - 1}, \prod_{k = 1}^{n} r_{k}, \\ \frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 ∥ w_{k} ∥_{2} / ∥ w_{k} ∥_{1} \sqrt{r_{k} log (r_{k})} + 2 ∥ w_{k} ∥_{\infty} / ∥ w_{k} ∥_{1} r_{k} log (r_{k}) + 1)) - 1}\}) . \end{matrix}

(A7)

Next we turn to the second term

(I I)

. We write

(I I) = 4 (\prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}) ξ^{T} ξ = 2 \prod_{k = 1}^{n} (r_{k} ∥ w_{k} ∥_{1}) + 4 (\prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}) ({∥ ξ ∥}_{2}^{2} - \frac{1}{2} \prod_{k = 1}^{n} r_{k})

and bound the error term

4 (\prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}) ({∥ ξ ∥}_{2}^{2} - \frac{1}{2} \prod_{k = 1}^{n} r_{k})

with high probability. Observe that

{∥ ξ ∥}_{2}^{2} - \frac{1}{2} \prod_{k = 1}^{n} r_{k}

is a zero-mean subgaussian random variable, and thus satisfies for all t > 0 that

P \{|{∥ ξ ∥}_{2}^{2} - \frac{1}{2} \prod_{k = 1}^{n} r_{k}| \geq t\} \leq 2 exp (\frac{- c^{''} t^{2}}{\prod_{k = 1}^{n} r_{k}})

for some constant

c^{″}

. Thus, for any

t > 0

we have

P \{|4 (\prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}) ({∥ ξ ∥}_{2}^{2} - \frac{1}{2} \prod_{k = 1}^{n} r_{k})| \geq t\} \leq 2 exp (\frac{- c^{^{''}} t^{2}}{16 \prod_{k = 1}^{n} (r_{k} ∥ w_{k} ∥_{1}^{2})}) .

Thus,

\begin{matrix} P \{|(I I) - 2 \prod_{k = 1}^{n} (r_{k} ∥ w_{k} ∥_{1})| \geq \frac{1}{2} \prod_{k = 1}^{n} r_{k} {∥ w_{k} ∥}_{1}\} & \leq & 2 exp (\frac{- c^{^{''}}}{64} \prod_{k = 1}^{n} r_{k}) . \end{matrix}

(A8)

Combing (A7) and (A8), we can conclude that with probability at least

\begin{matrix} 1 - 2 exp (- c^{″} \prod_{k = 1}^{n} r_{k}) - 2 exp (- c^{'} \cdot min \{\frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 r_{k} (∥ w_{k} ∥_{2} / ∥ w_{k} ∥_{1})^{2} + 1)) - 1}, \\ \prod_{k = 1}^{n} r_{k}, \frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 ∥ w_{k} ∥_{2} / ∥ w_{k} ∥_{1} \sqrt{r_{k} log (r_{k})} + 2 ∥ w_{k} ∥_{\infty} / ∥ w_{k} ∥_{1} r_{k} log (r_{k}) + 1)) - 1}\}), \end{matrix}

the following holds

\begin{matrix} {∥H ⊡ (T - \tilde{T})∥}_{F}^{2} & = & (I) + (I I) \\ \geq & 2 \prod_{k = 1}^{n} (r_{k} ∥ w_{k} ∥_{1}) - | I I - 2 \prod_{k = 1}^{n} (r_{k} ∥ w_{k} ∥_{1}) | - (I) \\ \geq & \prod_{k = 1}^{n} (r_{k} ∥ w_{k} ∥_{1}) = (\prod_{k = 1}^{n} r_{k}) {∥ H ⊡ B ∥}_{F}^{2} . \end{matrix}

By a union of bound over all of the points in S, we establish items 1 and 3 of the lemma. □

Now we are ready to prove the lower bound in Theorem 3. First we give a formal statement for the lower bound in Theorem 3 by introducing the constant

C^{'}

to characterize the “flatness” of

W

.

Theorem A7

(Lower bound for low-rank tensor when

W

is flat and

Ω \sim W

). Let

W = w_{1} \otimes \dots \otimes w_{n} \in R^{d_{1} \times \dots \times d_{n}}

be a CP rank-1 tensor so that all

(i_{1}, \dots, i_{n}) \in [d_{1}] \times \dots \times [d_{n}]

with

{∥ W ∥}_{\infty} \leq 1

, so that

max_{i_{k}} | w_{k} (i_{k}) | \leq C^{'} min_{i_{k}} | w_{k} (i_{k}) |, for all k = 1, \dots, n .

Suppose that we choose each

(i_{1}, \dots, i_{n}) \in [d_{1}] \times \dots \times [d_{n}]

independently with probability

W_{i_{1} \dots i_{n}}

to form a set

Ω \subseteq [d_{1}] \times \dots \times [d_{n}]

. Then with probability at least

1 - exp (- C \cdot m)

over the choice of Ω, the following holds:

Let

σ, β > 0

and let

K_{r} \subseteq R^{d_{1} \times \dots \times d_{n}}

be the cone of the tensor with Tucker rank

r = [r_{1} \dots r_{n}]

. For any algorithm

A : R^{Ω} \to R^{d_{1} \times \dots \times d_{n}}

that takes as input

T_{Ω} + Z_{Ω}

and outputs a guess

\hat{T}

for

T

, for

T \in K_{r} \cap β B_{\infty}

and

Z_{i_{1} \dots i_{n}} \sim N (0, σ^{2})

, then there is some

T \in K_{r} \cap β B_{\infty}

so that

\begin{matrix} \frac{∥ W^{(1 / 2)} ⊡ (A (T_{Ω} + Z_{Ω}) - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}} \\ \geq & c \cdot min \{\frac{β}{\sqrt{log (8 \prod_{k = 1}^{n} d_{k})}}, \frac{σ}{\sqrt{| Ω |}} \sqrt{\prod_{k = 1}^{n} r_{k}} \cdot min \{\sqrt{\frac{1}{(\prod_{k = 1}^{n} (1 + 2 C^{' 2} r_{k} / d_{k})) - 1}}, \\ 1, \sqrt{\frac{1}{(\prod_{k = 1}^{n} (2 C^{'} \sqrt{r_{k} / d_{k} log (r_{k})} + 2 C^{'} r_{k} / d_{k} log (r_{k}) + 1)) - 1}}\}\}, \end{matrix}

with probability at least

\frac{1}{2}

over the randomness of

A

and the choice of

Z

. Above c, C are constants which depend only on

C^{'}

.

Proof.

Let

m = ∥ W^{(1 / 2)} ∥_{F}^{2} = \prod_{k = 1}^{n} {∥ w_{k} ∥}_{1}

, so that

E | Ω | = m

.

We instantiate Lemma A4 with

H = W^{(1 / 2)}

and

B

being the tensor whose entries are all 1. Let S be the set guaranteed by Lemma A4. We have

max_{T \in S} {∥ T ∥}_{\infty} \leq \sqrt{\frac{1}{2} log (8 \prod_{k = 1}^{n} d_{k}) \prod_{k = 1}^{n} r_{k}} .

and

max_{T \in S} ∥ T_{Ω} ∥_{F} \leq 2 \sqrt{\prod_{k = 1}^{n} r_{k}} {∥ B_{Ω} ∥}_{F} = 2 \sqrt{| Ω | \prod_{k = 1}^{n} r_{k}} .

We also have

∥ W^{(1 / 2)} ⊡ (T - T^{'}) ∥_{F} \geq \sqrt{\prod_{k = 1}^{n} r_{k}} {∥ W^{(1 / 2)} ∥}_{F} = \sqrt{m \prod_{k = 1}^{n} r_{k}}

for

T \neq T^{'} \in S

. Using the assumption that

w_{k}

are flat, the size of the set S is bigger than or equal to

\begin{matrix} N & = & C exp (c \cdot min \{\frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 r_{k} (∥ w_{k} ∥_{2} / ∥ w_{k} ∥_{1})^{2} + 1)) - 1}, \prod_{k = 1}^{n} r_{k}, \\ \frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 ∥ w_{k} ∥_{2} / ∥ w_{k} ∥_{1} \sqrt{r_{k} log (r_{k})} + 2 ∥ w_{k} ∥_{\infty} / ∥ w_{k} ∥_{1} r_{k} log (r_{k}) + 1)) - 1}\}) \\ \geq & C exp (c \cdot min \{\frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 C^{' 2} r_{k} / d_{k} + 1)) - 1}, \prod_{k = 1}^{n} r_{k}, \\ \frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 C^{'} \sqrt{r_{k} log (r_{k}) / d_{k}} + 2 C^{'} r_{k} log (r_{k}) / d_{k} + 1)) - 1}\}) \\ \geq & exp (C^{''} \cdot min \{\frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 C^{' 2} r_{k} / d_{k} + 1)) - 1}, \prod_{k = 1}^{n} r_{k}, \\ \frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 C^{'} \sqrt{r_{k} log (r_{k}) / d_{k}} + 2 C^{'} r_{k} log (r_{k}) / d_{k} + 1)) - 1}\}), \end{matrix}

where

C^{″}

depends on c and C. Set

\begin{matrix} κ = min \{\frac{β}{\sqrt{\frac{1}{2} log (8 \prod_{k = 1}^{n} d_{k}) \prod_{k = 1}^{n} r_{k}}}, \frac{σ \sqrt{C^{''}}}{8 \sqrt{| Ω |}} \sqrt{\frac{\prod_{k = 1}^{n} d_{k}}{(\prod_{k = 1}^{n} (d_{k} + 2 C^{' 2} r_{k})) - \prod_{k = 1}^{n} d_{k}}}, \frac{σ \sqrt{C^{''}}}{8 \sqrt{| Ω |}}, \\ \frac{σ \sqrt{C^{''}}}{8 \sqrt{| Ω |}} \sqrt{\frac{\prod_{k = 1}^{n} d_{k}}{(\prod_{k = 1}^{n} (2 C^{'} \sqrt{d_{k} r_{k} log (r_{k})} + 2 C^{'} r_{k} log (r_{k}) + d_{k})) - \prod_{k = 1}^{n} d_{k}}}\} . \end{matrix}

Observe that

\frac{σ \sqrt{log | S |}}{4 {max}_{T \in S} {∥ T_{Ω} ∥}_{F}} \geq \frac{σ \sqrt{log (N)}}{4 {max}_{T \in S} {∥ T_{Ω} ∥}_{F}}

and

\begin{matrix} \frac{σ \sqrt{log (N)}}{4 {max}_{T \in S} {∥ T_{Ω} ∥}_{F}} \\ \geq & \frac{σ \sqrt{C^{''}}}{8 \sqrt{| Ω | \prod_{k = 1}^{n} r_{k}}} \cdot min \{\frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 C^{' 2} r_{k} / d_{k} + 1)) - 1}, \prod_{k = 1}^{n} r_{k}, \\ \frac{\prod_{k = 1}^{n} r_{k}}{(\prod_{k = 1}^{n} (2 C^{'} \sqrt{r_{k} log (r_{k}) / d_{k}} + 2 C^{'} r_{k} log (r_{k}) / d_{k} + 1)) - 1}\} \\ = & \frac{σ \sqrt{C^{''}}}{8 \sqrt{| Ω |}} \cdot min \{\sqrt{\frac{\prod_{k = 1}^{n} d_{k}}{(\prod_{k = 1}^{n} (d_{k} + 2 C^{' 2} r_{k})) - \prod_{k = 1}^{n} d_{k}}}, 1, \\ \sqrt{\frac{\prod_{k = 1}^{n} d_{k}}{(\prod_{k = 1}^{n} (2 C^{'} \sqrt{d_{k} r_{k} log (r_{k})} + 2 C^{'} r_{k} log (r_{k}) + d_{k})) - \prod_{k = 1}^{n} d_{k}}}\} \geq κ, \end{matrix}

so this is a legitimate choice of

κ

in Lemma A3. Next, we verify that

κ S \subseteq K \cap β B_{\infty}

. Indeed, we have

κ max_{S} {∥ T ∥}_{\infty} \leq κ \sqrt{\frac{1}{2} log (8 \prod_{k = 1}^{n} d_{k}) \prod_{k = 1}^{n} r_{k}} \leq β,

so

κ S \subseteq β B_{\infty}

, and every element of

S

has Tucker rank

r

by construction.

Then Lemma A3 concludes that if

A

works on

K_{r} \cap β B_{\infty}

, then there is a tensor

T \in K_{r} \cap β B_{\infty}

so that

\begin{matrix} ∥ W^{(1 / 2)} ⊡ (A (T_{Ω} + Z_{Ω}) - T) ∥_{F} \\ \geq & \frac{κ}{2} min_{T \neq T^{'} \in S} {∥ W^{(1 / 2)} ⊡ (T - T^{'}) ∥}_{F} \\ \geq & \frac{1}{2} min \{\frac{β}{\sqrt{\frac{1}{2} log (8 \prod_{k = 1}^{n} d_{k}) \prod_{k = 1}^{n} r_{k}}}, \frac{σ \sqrt{C^{''}}}{8 \sqrt{| Ω |}} \sqrt{\frac{\prod_{k = 1}^{n} d_{k}}{(\prod_{k = 1}^{n} (d_{k} + 2 C^{' 2} r_{k})) - \prod_{k = 1}^{n} d_{k}}}, \frac{σ \sqrt{C^{''}}}{8 \sqrt{| Ω |}}, \\ \frac{σ \sqrt{C^{''}}}{8 \sqrt{| Ω |}} \sqrt{\frac{\prod_{k = 1}^{n} d_{k}}{(\prod_{k = 1}^{n} (2 C^{'} \sqrt{d_{k} r_{k} log (r_{k})} + 2 C^{'} r_{k} log (r_{k}) + d_{k})) - \prod_{k = 1}^{n} d_{k}}}\} \sqrt{m \prod_{k = 1}^{n} r_{k}} \\ = & min \{\frac{β \sqrt{m}}{\sqrt{2 log (8 \prod_{k = 1}^{n} d_{k})}}, \frac{σ \sqrt{C^{''} m}}{16 \sqrt{| Ω |}} \sqrt{\prod_{k = 1}^{n} r_{k}} \cdot min \{\frac{1}{\sqrt{(\prod_{k = 1}^{n} (1 + 2 C^{' 2} r_{k} / d_{k})) - 1}}, \\ 1, \frac{1}{\sqrt{(\prod_{k = 1}^{n} (2 C^{'} \sqrt{r_{k} / d_{k} log (r_{k})} + 2 C^{'} r_{k} / d_{k} log (r_{k}) + 1)) - 1}}\}\} . \end{matrix}

Additionally, by Lemma A2, we conclude that

\begin{matrix} \frac{∥ W^{(1 / 2)} ⊡ (A (T_{Ω} + Z_{Ω}) - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}} \\ \geq & \tilde{c} \cdot min \{\frac{β}{\sqrt{log (8 \prod_{k = 1}^{n} d_{k})}}, \frac{σ}{\sqrt{| Ω |}} \sqrt{\prod_{k = 1}^{n} r_{k}} \cdot min \{\frac{1}{\sqrt{(\prod_{k = 1}^{n} (1 + 2 C^{' 2} r_{k} / d_{k})) - 1}}, \\ 1, \frac{1}{\sqrt{(\prod_{k = 1}^{n} (2 C^{'} \sqrt{r_{k} / d_{k} log (r_{k})} + 2 C^{'} r_{k} / d_{k} log (r_{k}) + 1)) - 1}}\}\}, \end{matrix}

where

\tilde{c}

depends on the above constants. □

Remark A1.

Consider the special case when

T \in R^{d_{1} \times d_{2}}

with

d_{1} \leq d_{2}

. Then we can consider the reconstruction of S in Lemma A4 with

H = W^{(1 / 2)}

,

B

being the tensor whose entries are all 1,

C \in {\pm 1}^{r \times d_{2}}

,

^{1} U \in {\pm 1}^{d_{1} \times r}

and

^{2} U \in {\pm 1}^{d_{2} \times d_{2}}

which implies that

r_{1} = r

and

r_{2} = d_{2}

. Thus, we have

\begin{matrix} \frac{∥ W^{(1 / 2)} ⊡ (A (T_{Ω} + Z_{Ω}) - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}} \geq \tilde{c} \cdot min \{\frac{σ}{\sqrt{| Ω |}} \sqrt{r d_{2}}, \frac{β}{\sqrt{log (8 d_{1} d_{2})}}\}, \end{matrix}

which has the same bound as the one in ([19] Lemma 28).

References

Hitchcock, F.L. The expression of a tensor or a polyadic as a sum of products. J. Math. Phys. 1927, 6, 164–189. [Google Scholar] [CrossRef]
Kolda, T.G.; Bader, B.W. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455–500. [Google Scholar] [CrossRef]
Zare, A.; Ozdemir, A.; Iwen, M.A.; Aviyente, S. Extension of PCA to higher order data structures: An introduction to tensors, tensor decompositions, and tensor PCA. Proc. IEEE 2018, 106, 1341–1358. [Google Scholar] [CrossRef] [Green Version]
Ge, H.; Caverlee, J.; Zhang, N.; Squicciarini, A. Uncovering the spatio-temporal dynamics of memes in the presence of incomplete information. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Indianapolis, IN, USA, 24–28 October 2016; pp. 1493–1502. [Google Scholar]
Liu, J.; Musialski, P.; Wonka, P.; Ye, J. Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 208–220. [Google Scholar] [CrossRef] [PubMed]
Liu, Y.; Shang, F.; Cheng, H.; Cheng, J.; Tong, H. Factor matrix trace norm minimization for low-rank tensor completion. In Proceedings of the 2014 SIAM International Conference on Data Mining, Philadelphia, PA, USA, 24–26 April 2014; pp. 866–874. [Google Scholar]
Song, Q.; Ge, H.; Caverlee, J.; Hu, X. Tensor Completion Algorithms in Big Data Analytics. ACM Trans. Knowl. Discov. Data 2019, 13, 1–48. [Google Scholar] [CrossRef]
Kressner, D.; Steinlechner, M.; Vandereycken, B. Low-rank tensor completion by Riemannian optimization. BIT Numer. Math. 2014, 54, 447–468. [Google Scholar] [CrossRef] [Green Version]
Ermiş, B.; Acar, E.; Cemgil, A.T. Link prediction in heterogeneous data via generalized coupled tensor factorization. Data Min. Knowl. Discov. 2015, 29, 203–236. [Google Scholar] [CrossRef]
Symeonidis, P.; Nanopoulos, A.; Manolopoulos, Y. Tag recommendations based on tensor dimensionality reduction. In Proceedings of the 2008 ACM Conference on Recommender Systems, Lausanne, Switzerland, 23–25 October 2008; pp. 43–50. [Google Scholar]
Cai, J.F.; Candès, E.J.; Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM J. Optim 2010, 20, 1956–1982. [Google Scholar] [CrossRef]
Cai, T.T.; Zhou, W.X. Matrix completion via max-norm constrained optimization. Electron. J. Stat. 2016, 10, 1493–1525. [Google Scholar] [CrossRef]
Candes, E.J.; Plan, Y. Matrix completion with noise. Proc. IEEE 2010, 98, 925–936. [Google Scholar] [CrossRef] [Green Version]
Candès, E.J.; Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math. 2009, 9, 717. [Google Scholar] [CrossRef] [Green Version]
Amit, Y.; Fink, M.; Srebro, N.; Ullman, S. Uncovering shared structures in multiclass classification. In Proceedings of the 24th International Conference on Machine Learning, Corvalis, OR, USA, 20–24 June 2007; pp. 17–24. [Google Scholar]
Cai, H.; Cai, J.F.; Wang, T.; Yin, G. Accelerated Structured Alternating Projections for Robust Spectrally Sparse Signal Recovery. IEEE Trans. Signal Process. 2021, 69, 809–821. [Google Scholar] [CrossRef]
Gleich, D.F.; Lim, L.H. Rank aggregation via nuclear norm minimization. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 60–68. [Google Scholar]
Liu, Z.; Vandenberghe, L. Interior-point method for nuclear norm approximation with application to system identification. SIAM J. Matrix Anal. Appl. 2009, 31, 1235–1256. [Google Scholar] [CrossRef]
Foucart, S.; Needell, D.; Pathak, R.; Plan, Y.; Wootters, M. Weighted matrix completion from non-random, non-uniform sampling patterns. IEEE Trans. Inf. Theory 2020, 67, 1264–1290. [Google Scholar] [CrossRef]
Bhojanapalli, S.; Jain, P. Universal matrix completion. arXiv 2014, arXiv:1402.2324. [Google Scholar]
Heiman, E.; Schechtman, G.; Shraibman, A. Deterministic algorithms for matrix completion. Random Struct. Algorithms 2014, 45, 306–317. [Google Scholar] [CrossRef]
Li, Y.; Liang, Y.; Risteski, A. Recovery guarantee of weighted low-rank approximation via alternating minimization. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2358–2367. [Google Scholar]
Pimentel-Alarcón, D.L.; Boston, N.; Nowak, R.D. A characterization of deterministic sampling patterns for low-rank matrix completion. IEEE J. Sel. Top. Signal Process. 2016, 10, 623–636. [Google Scholar] [CrossRef] [Green Version]
Shapiro, A.; Xie, Y.; Zhang, R. Matrix completion with deterministic pattern: A geometric perspective. IEEE Trans. Signal Process. 2018, 67, 1088–1103. [Google Scholar] [CrossRef] [Green Version]
Ashraphijuo, M.; Aggarwal, V.; Wang, X. On deterministic sampling patterns for robust low-rank matrix completion. IEEE Signal Process. Lett. 2017, 25, 343–347. [Google Scholar] [CrossRef] [Green Version]
Ashraphijuo, M.; Wang, X.; Aggarwal, V. Rank determination for low-rank data completion. J. Mach. Learn. Res. 2017, 18, 3422–3450. [Google Scholar]
Pimentel-Alarcón, D.L.; Nowak, R.D. A converse to low-rank matrix completion. In Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 96–100. [Google Scholar]
Chatterjee, S. A deterministic theory of low rank matrix completion. arXiv 2019, arXiv:1910.01079. [Google Scholar]
Király, F.J.; Theran, L.; Tomioka, R. The algebraic combinatorial approach for low-rank matrix completion. arXiv 2012, arXiv:1211.4116. [Google Scholar]
Eftekhari, A.; Yang, D.; Wakin, M.B. Weighted matrix completion and recovery with prior subspace information. IEEE Trans. Inf. Theory 2018, 64, 4044–4071. [Google Scholar] [CrossRef] [Green Version]
Negahban, S.; Wainwright, M.J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. J. Mach. Learn. Res. 2012, 13, 1665–1697. [Google Scholar]
Lee, T.; Shraibman, A. Matrix completion from any given set of observations. In Proceedings of the Advances in Neural Information Processing Systems, Red Hook, NY, USA, 5–10 December 2013; pp. 1781–1787. [Google Scholar]
Gandy, S.; Recht, B.; Yamada, I. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Probl. 2011, 27, 025010. [Google Scholar] [CrossRef] [Green Version]
Ashraphijuo, M.; Wang, X. Fundamental conditions for low-CP-rank tensor completion. J. Mach. Learn. Res. 2017, 18, 2116–2145. [Google Scholar]
Barak, B.; Moitra, A. Noisy tensor completion via the sum-of-squares hierarchy. In Proceedings of the Conference on Learning Theory, New York, NY, USA, 23–26 June 2016; pp. 417–445. [Google Scholar]
Jain, P.; Oh, S. Provable tensor factorization with missing data. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 1431–1439. [Google Scholar]
Goldfarb, D.; Qin, Z. Robust low-rank tensor recovery: Models and algorithms. SIAM J. Matrix Anal. Appl. 2014, 35, 225–253. [Google Scholar] [CrossRef] [Green Version]
Mu, C.; Huang, B.; Wright, J.; Goldfarb, D. Square deal: Lower bounds and improved relaxations for tensor recovery. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 73–81. [Google Scholar]
Hitchcock, F.L. Multiple invariants and generalized rank of a p-way matrix or tensor. J. Math. Phys. 1928, 7, 39–79. [Google Scholar] [CrossRef]
Kruskal, J.B. Rank, decomposition, and uniqueness for 3-way and N-way arrays. In Multiway Data Analysis; North-Holland Publishing Co.: Amsterdam, The Netherlands, 1989; pp. 7–18. [Google Scholar]
Acar, E.; Yener, B. Unsupervised multiway data analysis: A literature survey. IEEE Trans. Knowl. Data Eng. 2008, 21, 6–20. [Google Scholar] [CrossRef]
Sidiropoulos, N.D.; De Lathauwer, L.; Fu, X.; Huang, K.; Papalexakis, E.E.; Faloutsos, C. Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process. 2017, 65, 3551–3582. [Google Scholar] [CrossRef]
Carroll, J.D.; Chang, J.J. Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition. Psychometrika 1970, 35, 283–319. [Google Scholar] [CrossRef]
Bro, R. PARAFAC. Tutorial and applications. Chemom. Intell. Lab. Syst. 1997, 38, 149–172. [Google Scholar] [CrossRef]
Kiers, H.A.; Ten Berge, J.M.; Bro, R. PARAFAC2—Part I. A direct fitting algorithm for the PARAFAC2 model. J. Chemometr. 1999, 13, 275–294. [Google Scholar] [CrossRef]
Tomasi, G.; Bro, R. PARAFAC and missing values. Chemom. Intell. Lab. Syst. 2005, 75, 163–180. [Google Scholar] [CrossRef]
Tucker, L.R. Some mathematical notes on three-mode factor analysis. Psychometrika 1966, 31, 279–311. [Google Scholar] [CrossRef] [PubMed]
De Lathauwer, L.; De Moor, B.; Vandewalle, J. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 2000, 21, 1253–1278. [Google Scholar] [CrossRef] [Green Version]
Kroonenberg, P.M.; De Leeuw, J. Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika 1980, 45, 69–97. [Google Scholar] [CrossRef]
Fang, Z.; Yang, X.; Han, L.; Liu, X. A sequentially truncated higher order singular value decomposition-based algorithm for tensor completion. IEEE Trans. Cybern. 2018, 49, 1956–1967. [Google Scholar] [CrossRef] [PubMed]
Niebles, J.C.; Chen, C.W.; Li, F.F. Modeling temporal structure of decomposable motion segments for activity classification. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2010; pp. 392–405. [Google Scholar]
Ravichandran, A.; Chaudhry, R.; Vidal, R. Dynamic Texture Toolbox. 2011. Available online: http://www.vision.jhu.edu (accessed on 10 May 2021).
Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 1992, 60, 259–268. [Google Scholar] [CrossRef]
Wu, Z.; Wang, Q.; Jin, J.; Shen, Y. Structure tensor total variation-regularized weighted nuclear norm minimization for hyperspectral image mixed denoising. Signal Process. 2017, 131, 202–219. [Google Scholar] [CrossRef]
Madathil, B.; George, S.N. Twist tensor total variation regularized-reweighted nuclear norm based tensor completion for video missing area recovery. Inf. Sci. 2018, 423, 376–397. [Google Scholar] [CrossRef]
Yao, J.; Xu, Z.; Huang, X.; Huang, J. Accelerated dynamic MRI reconstruction with total variation and nuclear norm regularization. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 635–642. [Google Scholar]
Wang, Y.; Peng, J.; Zhao, Q.; Leung, Y.; Zhao, X.L.; Meng, D. Hyperspectral image restoration via total variation regularized low-rank tensor decomposition. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 11, 1227–1243. [Google Scholar] [CrossRef] [Green Version]
Ji, T.Y.; Huang, T.Z.; Zhao, X.L.; Ma, T.H.; Liu, G. Tensor completion using total variation and low-rank matrix factorization. Inf. Sci. 2016, 326, 243–257. [Google Scholar] [CrossRef]
Li, X.; Ye, Y.; Xu, X. Low-rank tensor completion with total variation for visual data inpainting. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
Chao, Z.; Huang, L.; Needell, D. Tensor Completion through Total Variation with Initialization from Weighted HOSVD. In Proceedings of the Information Theory and Applications, San Diego, CA, USA, 2–7 February 2020. [Google Scholar]
Tropp, J.A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 2012, 12, 389–434. [Google Scholar] [CrossRef] [Green Version]
Lancaster, P.; Farahat, H. Norms on direct sums and tensor products. Math. Comp. 1972, 26, 401–414. [Google Scholar] [CrossRef]

Figure 1. Tensor of size

100 \times 100 \times 100

using the uniform sampling pattern: plots the errors of the form

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}}

. The lines labeled as HOSVD, HOSVD-p and HOSVD-w represent the results for solving (4), (5) and (6), respectively.

Figure 1. Tensor of size

100 \times 100 \times 100

using the uniform sampling pattern: plots the errors of the form

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}}

. The lines labeled as HOSVD, HOSVD-p and HOSVD-w represent the results for solving (4), (5) and (6), respectively.

Figure 2. Tensor of size

50 \times 50 \times 30 \times 30

using the uniform sampling pattern: plots the errors of the form

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}}

. The lines labeled as HOSVD, HOSVD-p and HOSVD-w represent the results for solving (4), (5) and (6), respectively.

Figure 2. Tensor of size

50 \times 50 \times 30 \times 30

using the uniform sampling pattern: plots the errors of the form

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}}

. The lines labeled as HOSVD, HOSVD-p and HOSVD-w represent the results for solving (4), (5) and (6), respectively.

Figure 3. Tensor of size

100 \times 100 \times 100

using the non-uniform sampling pattern: plots the errors of the form

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}}

. The lines labeled as HOSVD, HOSVD-p and HOSVD-w represent the results for solving (4), (5) and (6), respectively.

Figure 3. Tensor of size

100 \times 100 \times 100

using the non-uniform sampling pattern: plots the errors of the form

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}}

. The lines labeled as HOSVD, HOSVD-p and HOSVD-w represent the results for solving (4), (5) and (6), respectively.

Figure 4. Tensor of size

50 \times 50 \times 30 \times 30

using the non-uniform sampling pattern: plots the errors of the form

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}}

. The lines labeled as HOSVD, HOSVD-p and HOSVD-w represent the results for solving (4), (5) and (6), respectively.

Figure 4. Tensor of size

50 \times 50 \times 30 \times 30

using the non-uniform sampling pattern: plots the errors of the form

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}}

. The lines labeled as HOSVD, HOSVD-p and HOSVD-w represent the results for solving (4), (5) and (6), respectively.

Figure 5. Tensor of size

100 \times 100 \times 100

using the non-uniform sampling pattern and with the SV-rank as the input rank: plots the errors of the form

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}}

.

Figure 5. Tensor of size

100 \times 100 \times 100

using the non-uniform sampling pattern and with the SV-rank as the input rank: plots the errors of the form

\frac{∥ W^{(1 / 2)} ⊡ (\hat{T} - T) ∥_{F}}{∥ W^{(1 / 2)} ∥_{F}}

.

Figure 6. The first frame of videos [50].

Figure 7. Relation between relative error and sampling rate for the dataset “candle_4_A” using

[5, 5, 5]

as the input rank for HOSVD process. The left figure records the relative error for the uniform sampling pattern and the right figure for the non-uniform sampling pattern. The sampling error stands for the relative error between the original video and the video with masked entries estimated to be zeros, hence should approximately equal to

\sqrt{1 - S R}

, where SR is the sampling rate.

Figure 7. Relation between relative error and sampling rate for the dataset “candle_4_A” using

[5, 5, 5]

as the input rank for HOSVD process. The left figure records the relative error for the uniform sampling pattern and the right figure for the non-uniform sampling pattern. The sampling error stands for the relative error between the original video and the video with masked entries estimated to be zeros, hence should approximately equal to

\sqrt{1 - S R}

, where SR is the sampling rate.

Figure 8. Convergence comparison between total variation minimization (TVM) with HOSVD-w,

0

, and HOSVD as initialization on video 1 with SR = 50%: (a) the relative error

\frac{∥ \hat{T} {- T ∥}_{F}}{{∥ T ∥}_{F}}

vs. number of iterations. (b) the relative error v.s. total computational CPU time(initialization + completion).

Figure 8. Convergence comparison between total variation minimization (TVM) with HOSVD-w,

0

, and HOSVD as initialization on video 1 with SR = 50%: (a) the relative error

\frac{∥ \hat{T} {- T ∥}_{F}}{{∥ T ∥}_{F}}

vs. number of iterations. (b) the relative error v.s. total computational CPU time(initialization + completion).

Table 1. Signal to noise ratio (SNR) and elapsed time (in second) for Higher Order Singular Value Decomposition (HOSVD) and HOSVD-w on video data with uniform sampling pattern. The HOSVD-w and HOSVD-p behave very similar for uniform sampling hence we integrate the results into one column.

SR	Input Rank	HOSVD-w+TV	HOSVD	HOSVD-w/HOSVD-p	TVM
10%	$[7 17 3 5]$	13.29 (16.3 s)	1.27 (3.74 s)	10.15 (11.4 s)	13.04 (41.3 s)
30%	$[18 10 3 6]$	16.96 (14.0 s)	4.26 (4.01 s)	12.05 (7.23 s)	17.05 (29.7 s)
50%	$[26 4 3 11]$	19.60 (12.2 s)	8.21 (2.99 s)	14.59 (7.03 s)	19.68 (23.8 s)
80%	$[47 47 3 22]$	24.90 (11.5 s)	17.29 (6.55 s)	19.75 (8.08 s)	25.01 (18.1 s)
10%	$[28 6 3 7]$	10.98 (13.1 s)	1.19 (4.20 s)	7.88 (8.76 s)	10.89 (42.2 s)
30%	$[34 18 3 15]$	14.44 (16.1 s)	4.11 (3.80 s)	10.40 (7.51 s)	14.50 (31.4 s)
50%	$[35 33 3 9]$	16.95 (15.3 s)	7.85 (5.86 s)	12.84 (7.64 s)	16.96 (26.6 s)
80%	$[56 50 3 21]$	22.21 (15.1 s)	16.51 (7.24 s)	18.64 (8.45 s)	22.19 (18.4 s)
10%	$[12 9 3 10]$	12.34 (16.1 s)	1.22 (2.73 s)	8.46 (9.88 s)	12.23 (45.7 s)
30%	$[20 24 3 11]$	17.10 (15.3 s)	4.24 (3.17 s)	11.62 (7.62 s)	17.19 (35.3 s)
50%	$[25 32 3 14]$	20.44 (12.3 s)	8.20 (3.92 s)	14.54 (5.85 s)	20.49 (28.9 s)
80%	$[50 72 3 30]$	26.80 (12.4 s)	18.03 (8.40 s)	21.38 (8.93s)	26.71 (20.9 s)

Table 2. Signal to noise ratio (SNR) for HOSVD and HOSVD-w on video data with non-uniform sampling pattern.

SR	Input Rank	HOSVD	HOSVD-w	HOSVD-p
10%	$[6 13 3 3]$	1.09	10.07	5.56
30%	$[10 28 3 16]$	3.74	11.81	7.53
50%	$[21 41 3 14]$	7.05	13.22	10.73
80%	$[44 57 3 26]$	15.76	19.60	17.39
10%	$[38 11 3 2]$	1.13	8.04	4.33
30%	$[26 19 3 16]$	3.79	10.13	6.80
50%	$[30 27 3 10]$	7.15	12.57	10.14
80%	$[53 50 3 23]$	14.81	18.55	16.31
10%	$[16 11 3 2]$	1.09	8.31	4.73
30%	$[17 23 3 17]$	3.76	11.05	6.87
50%	$[24 38 3 14]$	7.18	13.78	9.99
80%	$[47 69 3 22]$	15.88	20.82	16.02

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chao, Z.; Huang, L.; Needell, D. HOSVD-Based Algorithm for Weighted Tensor Completion. J. Imaging 2021, 7, 110. https://0-doi-org.brum.beds.ac.uk/10.3390/jimaging7070110

AMA Style

Chao Z, Huang L, Needell D. HOSVD-Based Algorithm for Weighted Tensor Completion. Journal of Imaging. 2021; 7(7):110. https://0-doi-org.brum.beds.ac.uk/10.3390/jimaging7070110

Chicago/Turabian Style

Chao, Zehan, Longxiu Huang, and Deanna Needell. 2021. "HOSVD-Based Algorithm for Weighted Tensor Completion" Journal of Imaging 7, no. 7: 110. https://0-doi-org.brum.beds.ac.uk/10.3390/jimaging7070110

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

HOSVD-Based Algorithm for Weighted Tensor Completion

Abstract

1. Introduction

1.1. Contributions

1.2. Organization

2. Related Work, Background, and Problem Statement

2.1. Matrix Completion

2.2. Tensor Completion Problem

2.2.1. Preliminaries and Notations

2.2.2. CP-Based Method for Tensor Completion

2.2.3. HOSVD-Based Method for Tensor Completion

2.2.4. Tensor Completion Problem under Study

3. Main Results

3.1. General Upper Bound

3.2. Results for Weighted HOSVD Algorithm

3.2.1. General Upper Bound for Weighted HOSVD

3.2.2. Case Study: When Ω ∼ W

4. Experiments

4.1. Simulations for Uniform Sampling Pattern

4.2. Simulation for Non-Uniform Sampling Pattern

4.3. Test for Real Data

4.4. The Application of Weighted HOSVD on Total Variation Minimization

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Proof for Theorem 1

Appendix B. Proof of Theorems 2 and 3

Appendix B.1. General Upper Bound for Weighted HOSVD Algorithm

Appendix B.2. Case Study: Ω∼W

Appendix B.2.1. Upper Bound

Appendix B.2.2. Lower Bound

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.2.2. Case Study: When $Ω \sim W$