Article

Gradient-Based Optimization Algorithm for Solving Sylvester Matrix Equation

Juan Zhang and Xiao Luo
1 Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
2 Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan 411105, China
* Author to whom correspondence should be addressed.
Submission received: 14 February 2022 / Revised: 17 March 2022 / Accepted: 22 March 2022 / Published: 24 March 2022
(This article belongs to the Special Issue Numerical Methods for Solving Nonlinear Equations)

Abstract

In this paper, we first transform the problem of solving the Sylvester matrix equation into an optimization problem through the Kronecker product. We utilize the adaptive accelerated proximal gradient and Newton accelerated proximal gradient methods to solve the constrained non-convex minimization problem. Their convergence properties are analyzed. Finally, we offer numerical examples to illustrate the effectiveness of the derived algorithms.

1. Introduction

Matrix equations are ubiquitous in signal processing [1], control theory [2], and linear systems [3]. Most time-dependent models accounting for the prediction, simulation, and control of real-world phenomena may be represented as linear or nonlinear dynamical systems. Therefore, the relevance of matrix equations within engineering applications largely explains the great effort put forth by the scientific community into their numerical solution. Linear matrix equations play an important role in the stability analysis of linear dynamical systems and in the theoretical development of nonlinear systems. The Sylvester matrix equation was first proposed by Sylvester and arose from research in related fields of applied mathematical cybernetics. It is a famous matrix equation that occurs in linear and generalized eigenvalue problems for the computation of invariant subspaces using Riccati equations [4,5,6]. The Sylvester matrix equation also appears in linear algebra [7,8,9], image processing [10], model reduction [11], and numerical methods for differential equations [12,13].
We consider the Sylvester matrix equation of the form
AX + XB = C,   (1)
where A ∈ R^{m×m}, B ∈ R^{n×n}, and C ∈ R^{m×n} are given matrices, and X ∈ R^{m×n} is the unknown matrix to be determined. We discuss a special form of the Sylvester matrix equation, in which A and B are symmetric positive definite.
Recently, there has been much discussion on the solution and numerical computation of the Sylvester matrix equation. The standard methods for solving this equation are the Bartels–Stewart method [14] and the Hessenberg–Schur method [15], which are efficient for small and dense system matrices. When the system matrices are small, block Krylov subspace methods [16,17] and global Krylov subspace methods [18] have been proposed. These methods use the global Arnoldi process, the block Arnoldi process, or the nonsymmetric block Lanczos process to produce low-dimensional Sylvester matrix equations. More practical methods for large and sparse problems are iterative methods. When the system matrices are large, effective methods include the alternating direction implicit (ADI) method [19], the global full orthogonalization method, the global generalized minimum residual method [20], the gradient-based iterative method [21], and the global Hessenberg and changing minimal residual with Hessenberg process method [22]. When the system matrices are of low rank, the ADI method [23], the block Arnoldi method [17], the preconditioned block Arnoldi method [24], and the extended block Arnoldi method [25] and its variants [26,27], including the global Arnoldi method [28,29] and the extended global Arnoldi method [25], have been proposed to obtain the low-rank solution.
The adaptive accelerated proximal gradient (A-APG) method [30] is an efficient numerical method for computing the steady states of a minimization problem, motivated by the accelerated proximal gradient (APG) method [31], which has wide applications in image processing and machine learning. In each iteration, the A-APG method chooses the step size by a line search initialized with the Barzilai–Borwein (BB) step [32] to accelerate convergence. Moreover, since the traditional APG method was proposed for convex problems and its oscillation phenomenon slows down convergence, a restart scheme has been used to speed up the convergence. For more details, one can refer to [30] and the references therein.
The main contribution of this paper is to study gradient-based optimization methods, namely the A-APG and Newton-APG methods, for solving the Sylvester matrix equation by transforming it into an optimization problem using the Kronecker product. The A-APG and Newton-APG methods are theoretically guaranteed to converge to a global solution from an arbitrary initial point and achieve high precision. These methods are especially efficient for large and sparse coefficient matrices.
The rest of this paper is organized as follows. In Section 2, we transform the equation into an optimization problem by using the Kronecker product. In Section 3, we apply the A-APG and Newton-APG algorithms to solve the optimization problem, compare them with other methods, and analyze the computational complexity of each algorithm. In Section 4, we focus on the convergence analysis of the A-APG method. In Section 5, we offer corresponding numerical examples to illustrate the effectiveness of the derived methods. Section 6 gives some concluding remarks.
Throughout this paper, let R^{n×m} be the set of all n × m real matrices, and let I_n be the identity matrix of order n. For A ∈ R^{n×n}, the symbols A^T, A^{-1}, ‖A‖, and tr(A) denote the transpose, the inverse, the 2-norm, and the trace of A, respectively. The inner product on a matrix space E is ⟨x, y⟩ = tr(x^T y) for x, y ∈ E.

2. The Variant of an Optimization Problem

In this section, we transform the Sylvester equation into an optimization problem. We recall some definitions and lemmas.
Definition 1.
Let Y = (y_{ij}) ∈ R^{m×n} and Z ∈ R^{p×q}. The Kronecker product of Y and Z is defined by
Y ⊗ Z = [ y_{11}Z   y_{12}Z   ⋯   y_{1n}Z
          y_{21}Z   y_{22}Z   ⋯   y_{2n}Z
            ⋮         ⋮       ⋱     ⋮
          y_{m1}Z   y_{m2}Z   ⋯   y_{mn}Z ].
Definition 2.
If Y ∈ R^{m×n} has columns y_1, y_2, …, y_n, then the straightening (vectorization) operator vec: R^{m×n} → R^{mn} is defined by
vec(Y) = (y_1^T, y_2^T, …, y_n^T)^T.
Lemma 1.
Let Y ∈ R^{l×m}, Z ∈ R^{m×n}, and W ∈ R^{n×k}. Then
vec(YZW) = (W^T ⊗ Y) vec(Z).
From Lemma 1, the Sylvester Equation (1) can be rewritten as
(I_n ⊗ A + B^T ⊗ I_m) vec(X) = vec(C).   (2)
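As a quick sanity check of Equation (2), the identity can be verified numerically. The following sketch assumes NumPy; the matrix sizes, the vec helper, and the random test data are illustrative choices and not part of the original paper.

```python
# A minimal numerical check of Equation (2), assuming NumPy is available.
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
X = rng.standard_normal((m, n))

# vec() stacks the columns of a matrix, matching Definition 2.
vec = lambda M: M.reshape(-1, order="F")

lhs = (np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))) @ vec(X)
rhs = vec(A @ X + X @ B)
print(np.allclose(lhs, rhs))  # True: (I_n ⊗ A + B^T ⊗ I_m) vec(X) = vec(AX + XB)
```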
Lemma 2.
Let A be a symmetric positive definite matrix. Solving the equation Ax = b is equivalent to finding the minimizer of φ(x) = x^T A x − 2 b^T x.
According to Lemma 2 and Equation (2), define
Ā := I_n ⊗ A + B^T ⊗ I_m,   x̄ := vec(X),   b̄ := vec(C).
Therefore, Equation (2) can be written as Ā x̄ = b̄. Obviously, if A and B are symmetric positive definite, then Ā is symmetric positive definite. The variant of the Sylvester Equation (2) then reduces to the optimization problem:
min φ(x̄) = min { x̄^T Ā x̄ − 2 b̄^T x̄ }
          = min { vec(X)^T (I_n ⊗ A + B^T ⊗ I_m) vec(X) − 2 vec(X)^T vec(C) }
          = min { vec(X)^T · vec(AX) + vec(X)^T · vec(XB) − 2 vec(X)^T · vec(C) }
          = min { tr(X^T A X) + tr(X^T X B) − 2 tr(X^T C) }.   (3)
Using the calculation of the matrix differential from [33], we have the following propositions immediately.
Proposition 1.
If A = (a_{ij}) ∈ R^{m×n} and X = (x_{ij}) ∈ R^{m×n}, then ∂tr(A^T X)/∂X = ∂tr(X^T A)/∂X = A.
Proposition 2.
If A = (a_{ij}) ∈ R^{m×m} and X = (x_{ij}) ∈ R^{m×n}, then ∂tr(X^T A X)/∂X = AX + A^T X.
Proposition 3.
If B = (b_{ij}) ∈ R^{n×n} and X = (x_{ij}) ∈ R^{m×n}, then ∂tr(X^T X B)/∂X = XB + XB^T.
Using Propositions 2 and 3, the gradient of the objective function (3) is
∇φ(X) = AX + XB + A^T X + XB^T − 2C.   (4)
By (4), the Hessian matrix is
∇²φ(X) = A + A^T + B + B^T.   (5)
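The objective (3) and its gradient (4) are straightforward to code in matrix form. The sketch below assumes NumPy and SciPy (scipy.linalg.solve_sylvester is used only to produce a reference solution for the check); the function names phi and grad_phi and the random test data are our own illustrative choices.

```python
# A small sketch of the objective (3) and its gradient (4), assuming NumPy/SciPy.
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(1)
m, n = 5, 4
A = rng.standard_normal((m, m)); A = A @ A.T + m * np.eye(m)   # symmetric positive definite
B = rng.standard_normal((n, n)); B = B @ B.T + n * np.eye(n)   # symmetric positive definite
C = rng.standard_normal((m, n))

def phi(X):
    # phi(X) = tr(X^T A X) + tr(X^T X B) - 2 tr(X^T C)
    return np.trace(X.T @ A @ X) + np.trace(X.T @ X @ B) - 2.0 * np.trace(X.T @ C)

def grad_phi(X):
    # grad phi(X) = A X + X B + A^T X + X B^T - 2 C
    return A @ X + X @ B + A.T @ X + X @ B.T - 2.0 * C

X_star = solve_sylvester(A, B, C)            # reference solution of A X + X B = C
print(np.linalg.norm(grad_phi(X_star)))      # ~0: the gradient vanishes at the solution
```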

3. Iterative Methods

In this section, we will introduce the adaptive accelerated proximal gradient (A-APG) method and the Newton-APG method to solve the Sylvester equation. Moreover, we compare the A-APG and Newton-APG methods with other existing methods.

3.1. APG Method

The traditional APG method [31] is designed for solving the composite convex problem:
min_{x∈H} F(x) = g(x) + f(x),
where H is a finite-dimensional Hilbert space equipped with the inner product ⟨·,·⟩, g and f are both convex and continuous, and ∇f is Lipschitz continuous with constant L. Given the initializations x_1 = x_0 and t_0 = 1, the APG method is
t_k = ( 1 + √(4 t_{k−1}² + 1) ) / 2,
Y_k = X_k + ((t_{k−1} − 1)/t_k) (X_k − X_{k−1}),
X_{k+1} = Prox_g^α( Y_k − α ∇f(Y_k) ),
where α ∈ (0, 1/L] and the mapping Prox_g^α(·): R^n → R^n is defined as
Prox_g^α(x) = argmin_y { g(y) + (1/(2α)) ‖y − x‖² }.
Since the gradient of our objective is linear, we choose the explicit scheme, a simple but effective approach for this minimization problem. Given an initial value Y_0 and the step size α_k, the explicit scheme is
Y_{k+1} = Y_k − α_k ∇φ(Y_k),   (6)
where Y_k is the approximate solution. Used as a gradient descent (GD) step, the explicit scheme satisfies the sufficient decrease property.
Let X_k and X_{k−1} be the current and previous states and let w_k be the extrapolation weight. Using the explicit method (6), the APG iterative scheme is
w_k = (k − 2)/(k + 1),
Y_k = (1 + w_k) X_k − w_k X_{k−1},
Y_{k+1} = Y_k − α_k ∇φ(Y_k).   (7)
Together with the standard backtracking, we adopt the step size α_k when the following condition holds:
φ(Y_k) − φ(Y_{k+1}) ≥ η ‖Y_{k+1} − Y_k‖²,   (8)
for some η > 0 .
Combining (7) and (8), the APG algorithm is summarized in Algorithm 1.
Algorithm 1 APG algorithm.
Require: X_0, tol, α_0, η > 0, β ∈ (0, 1), and k = 1.
1: while the stop condition is not satisfied do
2:     Update Y_k via Equation (7);
3:     if Equation (8) holds then
4:         break
5:     else
6:         α_k = β α_k;
7:     Calculate Y_{k+1} via (7);
8:     k = k + 1.
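For concreteness, a minimal Python sketch of the APG iteration (7) with the backtracking test (8) is given below. It reuses phi and grad_phi from the earlier snippet; the parameter values and the gradient-norm stopping rule are illustrative assumptions rather than the authors' exact settings.

```python
# A sketch of Algorithm 1 (APG) for the quadratic objective, assuming phi and
# grad_phi from the previous snippet are in scope.
def apg(X0, alpha0=0.01, eta=0.25, beta=0.5, tol=1e-12, max_iter=5000):
    X_prev, X = X0.copy(), X0.copy()
    for k in range(1, max_iter + 1):
        w = (k - 2) / (k + 1)                    # extrapolation weight
        Y = (1 + w) * X - w * X_prev             # extrapolated point, Equation (7)
        alpha = alpha0
        while True:                              # backtracking on the step size
            Y_new = Y - alpha * grad_phi(Y)
            if phi(Y) - phi(Y_new) >= eta * np.linalg.norm(Y_new - Y) ** 2:
                break                            # sufficient decrease (8) holds
            alpha *= beta
        X_prev, X = X, Y_new
        if np.linalg.norm(grad_phi(X)) < tol:    # simple stopping rule (assumed)
            break
    return X, k
```

For instance, apg(np.zeros((m, n))) can be compared against the reference solution X_star from the previous snippet.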

3.2. Restart APG Method

Recently, an efficient and convergent numerical algorithm has been developed for solving a discretized phase-field model by combining the APG method with a restart technique [30]. Unlike the plain APG method, the restart scheme chooses X_{k+1} = Y_{k+1} whenever the following condition holds:
φ(X_k) − φ(Y_{k+1}) ≥ γ ‖X_k − Y_{k+1}‖²,   (9)
for some γ > 0. If the condition is not met, we restart the APG iteration by setting X_{k+1} = X_k and w_{k+1} = 0.
The restart APG method (RAPG) is summarized in Algorithm 2.
Algorithm 2 RAPG algorithm.
Require: X_0, tol, α_0, η > 0, γ > 0, β ∈ (0, 1), and k = 1.
1: while the stop condition is not satisfied do
2:     Calculate Y_{k+1} by APG Algorithm 1;
3:     if Equation (9) holds then
4:         X_{k+1} = Y_{k+1} and update w_{k+1};
5:     else
6:         X_{k+1} = X_k and reset w_{k+1} = 0;
7:     k = k + 1.

3.3. A-APG Method

In RAPG Algorithm 2, we can adaptively estimate the step size α_k by using the line search technique. Define
s_k := X_k − X_{k−1},   g_k := ∇φ(X_k) − ∇φ(X_{k−1}).
We initialize the search step by the Barzilai–Borwein (BB) method, i.e.,
α_k = tr(s_k^T s_k) / tr(s_k^T g_k)   or   α_k = tr(g_k^T s_k) / tr(g_k^T g_k).   (10)
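A small helper implementing this initialization might look as follows; this is a sketch assuming the grad_phi function from the earlier snippet, and the choice of the first BB formula together with the fallback step size are our own assumptions.

```python
# A hedged sketch of the BB initialization (10).
def bb_step(X, X_prev, alpha_fallback=0.01):
    s = X - X_prev                                # s_k = X_k - X_{k-1}
    g = grad_phi(X) - grad_phi(X_prev)            # g_k = grad phi(X_k) - grad phi(X_{k-1})
    denom = np.trace(s.T @ g)                     # positive for our SPD quadratic
    if abs(denom) < 1e-16:
        return alpha_fallback                     # avoid division by (near-)zero
    return np.trace(s.T @ s) / denom              # first BB formula in (10)
```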
Therefore, we obtain the A-APG algorithm summarized in Algorithm 3.
Algorithm 3 A-APG algorithm.
Require: X_0, tol, α_0, η > 0, γ > 0, β ∈ (0, 1), and k = 1.
1: while the stop condition is not satisfied do
2:     Initialize α_k by the BB step, Equation (10);
3:     Update X_{k+1} by RAPG Algorithm 2.
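The following sketch combines the pieces above into an A-APG-style iteration: BB-initialized backtracking, the extrapolation step (7), and the restart test (9). It reuses phi, grad_phi, and bb_step from the earlier snippets; the parameter values and the stopping rule are illustrative assumptions, not necessarily the authors' settings.

```python
# A sketch of Algorithm 3 (A-APG), assuming phi, grad_phi, and bb_step are in scope.
def a_apg(X0, eta=0.25, gamma=0.2, beta=0.5, alpha0=0.01, tol=1e-12, max_iter=5000):
    X_prev, X = X0.copy(), X0.copy()
    w = 0.0
    for k in range(1, max_iter + 1):
        Y = (1 + w) * X - w * X_prev                      # extrapolation, Equation (7)
        alpha = bb_step(X, X_prev, alpha0) if k > 1 else alpha0   # BB step, Equation (10)
        while True:                                       # backtracking line search, (8)
            Y_new = Y - alpha * grad_phi(Y)
            if phi(Y) - phi(Y_new) >= eta * np.linalg.norm(Y_new - Y) ** 2:
                break
            alpha *= beta
        if phi(X) - phi(Y_new) >= gamma * np.linalg.norm(X - Y_new) ** 2:
            X_prev, X = X, Y_new                          # accept: non-restart test (9) holds
            w = (k - 2) / (k + 1)
        else:
            X_prev, X = X, X                              # restart: X_{k+1} = X_k, weight reset
            w = 0.0
        if np.linalg.norm(grad_phi(X)) < tol:
            break
    return X, k
```

Calling a_apg(np.zeros((m, n))) and comparing the result against X_star from the earlier snippet gives a quick end-to-end check of the sketch.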

3.4. Newton-APG Method

Despite the fast initial convergence of gradient-based methods, their tail convergence becomes slow. Therefore, we use a practical Newton method to solve the minimization problem. We obtain the initial value from A-APG Algorithm 3 and then use the Newton direction in place of the gradient in the explicit scheme of RAPG Algorithm 2. Then we have the Newton-APG method shown in Algorithm 4.
Algorithm 4 Newton-APG algorithm.
Require: X_0, α_0, γ > 0, η > 0, β ∈ (0, 1), ϵ, tol, and k = 1.
1: Obtain the initial value from A-APG Algorithm 3 with precision ϵ;
2: while the stop condition is not satisfied do
3:     Initialize α_k by the BB step, Equation (10);
4:     Update X_{k+1} by RAPG Algorithm 2 using the Newton direction.
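As an illustration of the linear algebra behind a Newton direction for the quadratic objective (3), note that the Hessian acts on a matrix D as (A + A^T)D + D(B + B^T), so a direction can be obtained by solving this system. The dense Kronecker-form solve below is only a sketch of one possible realization (viable for modest m and n) and is not necessarily how the authors implement the Newton direction; it reuses grad_phi, A, and B from the earlier snippet.

```python
# A hedged sketch of forming a Newton direction for the quadratic objective (3).
def newton_direction(X):
    G = grad_phi(X)                                            # current gradient (4)
    # Hessian of phi with respect to vec(X): I ⊗ (A + A^T) + (B + B^T) ⊗ I
    H = np.kron(np.eye(B.shape[0]), A + A.T) + np.kron(B + B.T, np.eye(A.shape[0]))
    d = np.linalg.solve(H, G.reshape(-1, order="F"))           # solve H vec(D) = vec(G)
    return d.reshape(G.shape, order="F")                       # Newton step: X - D
```

Starting from X = np.zeros((m, n)), the update X - newton_direction(X) should recover X_star from the earlier snippet up to rounding, which serves as a quick consistency check of the sketch.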

3.5. Gradient Descent (GD) and Line Search (LGD) Methods

Moreover, we present the gradient descent (GD) and line search gradient descent (LGD) methods for comparison with the A-APG and Newton-APG methods. The GD and LGD methods are summarized in Algorithm 5.
Algorithm 5 GD and LGD algorithms.
Require: X_0, tol, α_0, η > 0, β ∈ (0, 1), and k = 1.
1: while the stop condition is not satisfied do
2:     if the step size is fixed then
3:         Calculate X_{k+1} via X_{k+1} = X_k − α ∇φ(X_k) using GD;
4:     else
5:         Initialize α_k by the BB step, Equation (10);
6:         if Equation (8) holds then
7:             break
8:         else
9:             α_k = β α_k;
10:        Calculate X_{k+1} via X_{k+1} = X_k − α_k ∇φ(X_k) using LGD;
11:    k = k + 1.
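A compact sketch of the two updates in Algorithm 5 is given below, reusing phi, grad_phi, and bb_step from the earlier snippets; the step sizes and the stopping rule are illustrative assumptions.

```python
# Sketches of the GD and LGD updates in Algorithm 5.
def gd(X0, alpha=0.01, tol=1e-12, max_iter=10000):
    X = X0.copy()
    for k in range(1, max_iter + 1):
        X = X - alpha * grad_phi(X)                               # fixed-step gradient descent
        if np.linalg.norm(grad_phi(X)) < tol:
            break
    return X, k

def lgd(X0, eta=0.25, beta=0.5, alpha0=0.01, tol=1e-12, max_iter=10000):
    X_prev, X = X0.copy(), X0.copy()
    for k in range(1, max_iter + 1):
        alpha = bb_step(X, X_prev, alpha0) if k > 1 else alpha0   # BB initialization (10)
        while True:                                               # backtracking via (8)
            X_new = X - alpha * grad_phi(X)
            if phi(X) - phi(X_new) >= eta * np.linalg.norm(X_new - X) ** 2:
                break
            alpha *= beta
        X_prev, X = X, X_new
        if np.linalg.norm(grad_phi(X)) < tol:
            break
    return X, k
```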

3.6. Computational Complexity Analysis

Further, we analyze the computational complexity of each iteration of the derived algorithms.
The computation of APG is mainly controlled by matrix multiplication and addition operations in three main parts. The iterative scheme needs 4m²n + 4mn² + O(mn) operations. The backtracking line search defined by Equation (8) needs 14m²n + 20mn² + 6n³ + O(mn) + O(n²) operations. The extrapolation defined by Equation (7) needs O(mn) operations. The total computational complexity of Algorithm 1 is 18m²n + 24mn² + 6n³ + O(mn) + O(n²).
The computation of RAPG is mainly controlled by matrix multiplication and addition operations in four main parts. The iterative scheme needs 4m²n + 4mn² + O(mn) operations. The backtracking line search defined by Equation (8) needs 14m²n + 20mn² + 6n³ + O(mn) + O(n²) operations. The extrapolation defined by Equation (7) needs O(mn) operations. The restart test defined by Equation (9) needs 4m²n + 14mn² + 4n³ + O(mn) + O(n²) operations. The total computational complexity of Algorithm 2 is 22m²n + 38mn² + 10n³ + O(mn) + O(n²).
The computation of A-APG is mainly controlled by matrix multiplication and addition operations in four main parts. The iterative scheme needs 4m²n + 4mn² + O(mn) operations. The BB step and the backtracking line search defined by Equations (8) and (10) need mn, 4m²n + 4mn² + 6mn, 2n²(2m − 1) + 2n, and 14m²n + 20mn² + 6n³ + O(mn) + O(n²) operations, respectively. The extrapolation defined by Equation (7) needs O(mn) operations. The restart test defined by Equation (9) needs 4m²n + 14mn² + 4n³ + O(mn) + O(n²) operations. The total computational complexity of Algorithm 3 is 26m²n + 46mn² + 10n³ + O(mn) + O(n²).
The computation of Newton-APG is mainly controlled by matrix multiplication and addition operations in four main parts, which differ from those of the A-APG method. The iterative scheme needs 8n³ + 3n² + O(n²) + O(n³) operations. The BB step and the backtracking line search defined by Equations (8) and (10) need n², 8n³ + 6n², 2n²(2n − 1) + 2n, and 10n²(2n − 1) + 8n³ + 3n² + O(n³) + O(n²) operations, respectively. The extrapolation defined by Equation (7) needs O(n²) operations. The restart test defined by Equation (9) needs 5n²(2n − 1) + n² + O(n³) operations. The total computational complexity of Algorithm 4 is 50n³ − 10n² + 2n + O(n²) + O(n³).
The computation of GD is mainly controlled by the matrix multiplication and addition operations in Equations (4) and (6). It requires mn(2m − 1), mn(2n − 1), mn(2m − 1), and mn(2n − 1) operations to compute AX, XB, A^T X, and XB^T, respectively. The total computational complexity of Algorithm 5 using GD is 4m²n + 4mn² + O(mn).
The computation of LGD is mainly controlled by the matrix multiplication and addition operations in the calculation of s_k and g_k defined by Equations (8) and (10) and the GD update, which require mn, 4m²n + 4mn² + 6mn, 2n²(2m − 1) + 2n, 14m²n + 20mn² + 6n³ + O(mn) + O(n²), and 4m²n + 4mn² + O(mn) operations, respectively. The total computational complexity of Algorithm 5 using LGD is 22m²n + 32mn² + 6n³ + O(mn) + O(n²).
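To make the leading-order counts above concrete, the short snippet below evaluates them for one illustrative size (lower-order O(mn) and O(n²) terms are dropped; the choice M = N = 1024 is only an example).

```python
# A quick numeric comparison of the leading per-iteration operation counts above.
M = N = 1024
leading_counts = {
    "APG":        18 * M**2 * N + 24 * M * N**2 + 6 * N**3,
    "RAPG":       22 * M**2 * N + 38 * M * N**2 + 10 * N**3,
    "A-APG":      26 * M**2 * N + 46 * M * N**2 + 10 * N**3,
    "Newton-APG": 50 * N**3 - 10 * N**2 + 2 * N,
    "GD":          4 * M**2 * N + 4 * M * N**2,
    "LGD":        22 * M**2 * N + 32 * M * N**2 + 6 * N**3,
}
for name, count in leading_counts.items():
    print(f"{name:10s} ~ {count:.2e} operations per iteration")
```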

4. Convergence Analysis

In this section, we focus on the convergence analysis of A-APG Algorithm 3. The following proposition is required.
Proposition 4.
Let M be a bounded region in R^{m×n} that contains the level set {X : φ(X) ≤ φ(X_0)}. Then ∇φ(X) satisfies the Lipschitz condition on M, i.e., there exists L_M > 0 such that
‖∇φ(X) − ∇φ(Y)‖ ≤ L_M ‖X − Y‖   for X, Y ∈ M.
Proof. 
Using the continuity of ∇φ(X), note that
∇²φ(X) = (A + A^T) + (B + B^T),
defined by (5), is bounded. Then ∇φ(X) satisfies the Lipschitz condition on M. □
In recent years, the proximal method based on the Bregman distance has been applied to solve optimization problems. The proximal operator is
Prox_φ^α(X_k) := argmin_X { φ(X) + (1/(2α)) ‖X − X_k‖² }.
Basically, given the current estimate X_k and step size α_k > 0, update X_{k+1} via
X_{k+1} = Prox_0^{α_k}( X_k − α_k ∇φ(X_k) ) = argmin_X { (1/(2α_k)) ‖X − (X_k − α_k ∇φ(X_k))‖² }.   (11)
Thus we obtain
(1/α_k) ( X_{k+1} − (X_k − α_k ∇φ(X_k)) ) = 0,
which implies that
X_{k+1} = X_k − α_k ∇φ(X_k).
This is exactly the explicit scheme in our algorithm.

4.1. The Line Search Is Well-Defined

Using the minimization problem in Equation (11), it is evident that
X_{k+1} = argmin_X { (1/(2α_k)) ‖X − (X_k − α_k ∇φ(X_k))‖² }
        = argmin_X { (1/(2α_k)) ‖X − X_k‖² + ⟨X − X_k, ∇φ(X_k)⟩ }
        = argmin_X { (1/(2α_k)) ‖X − X_k‖² + ⟨X − X_k, ∇φ(X_k)⟩ + φ(X_k) }.
Then we obtain
φ(X_k) ≥ (1/(2α_k)) ‖X_{k+1} − X_k‖² + ⟨X_{k+1} − X_k, ∇φ(X_k)⟩ + φ(X_k)
        ≥ φ(X_{k+1}) + (1/(2α_k)) ‖X_k − X_{k+1}‖² − (‖∇²φ(X)‖/2) ‖X_k − X_{k+1}‖²
        ≥ φ(X_{k+1}) + ( 1/(2α_k) − L_M/2 ) ‖X_k − X_{k+1}‖²,   (12)
where the second inequality follows from the Taylor expansion of φ(X_{k+1}). By Equation (12), if we set
0 < α_k < ᾱ := min{ 1/(L_M + 2η), 1/(L_M + 2γ) },   (13)
then the conditions in the line search Equation (8) and the non-restart condition Equation (9) are both satisfied. Therefore, the backtracking line search is well-defined.

4.2. Sufficient Decrease Property

In this section, we show the sufficient decrease property of the sequence generated by A-APG Algorithm 3. If α_k satisfies condition (13), then
φ(X_k) − φ(Y_{k+1}) ≥ ρ_1 ‖X_k − Y_{k+1}‖²,
where ρ_1 = min{η, γ} > 0. Since φ is bounded from below, there exists φ* such that φ(X_k) ≥ φ* and φ(X_k) → φ* as k → +∞. This implies
ρ_1 Σ_{k=0}^{∞} ‖X_{k+1} − X_k‖² ≤ φ(X_0) − φ* < +∞,
which shows that
lim_{k→+∞} ‖X_{k+1} − X_k‖ = 0.

4.3. Bounded Gradient

Define two sets Ω_2 = {k : w_k = 0} and Ω_1 = ℕ \ Ω_2, where w_k = (k − 2)/(k + 1); for any k ∈ Ω_2, we have X_{k+1} = Y_{k+1} since w_k = 0. There exists w̄ = (k_max − 2)/(k_max + 1) ∈ [0, 1) such that w_k ≤ w̄ as k increases. If k ∈ Ω_1, since
Y_{k+1} = argmin_X { (1/(2α_k)) ‖X − (Y_k − α_k ∇φ(Y_k))‖² },
we have
0 = ∇φ(Y_k) + (1/α_k) (Y_{k+1} − Y_k).
Thus, recalling that X_{k+1} = Y_{k+1} for k ∈ Ω_1,
∇φ(Y_k) = (1/α_k) (Y_k − X_{k+1}).
Note that Y_k = (1 + w_k) X_k − w_k X_{k−1}; then
‖∇φ(Y_k)‖ = (1/α_k) ‖(1 + w_k) X_k − w_k X_{k−1} − X_{k+1}‖
          = (1/α_k) ‖w_k (X_k − X_{k−1}) + (X_k − X_{k+1})‖
          ≤ (1/α_min) ( w̄ ‖X_k − X_{k−1}‖ + ‖X_k − X_{k+1}‖ )
          = c_1 ( ‖X_{k+1} − X_k‖ + w̄ ‖X_k − X_{k−1}‖ ),   (14)
where c_1 = 1/α_min > 0.
If k ∈ Ω_2, then
X_{k+1} = argmin_X { (1/(2α_k)) ‖X − (X_k − α_k ∇φ(X_k))‖² },
which implies that
0 = ∇φ(X_k) + (1/α_k) (X_{k+1} − X_k).
Thus,
‖∇φ(X_k)‖ = (1/α_k) ‖X_k − X_{k+1}‖ ≤ (1/α_min) ‖X_k − X_{k+1}‖ = c_1 ‖X_k − X_{k+1}‖.   (15)
Combining Equations (14) and (15), it follows that
‖∇φ(X_k)‖ ≤ c_1 ( ‖X_{k+1} − X_k‖ + w̄ ‖X_k − X_{k−1}‖ ).   (16)

4.4. Subsequence Convergence

As {X_k} ⊂ M and M is compact, there exists a subsequence {X_{k_j}} ⊂ M and a point X* ∈ M such that lim_{j→+∞} X_{k_j} = X*. Moreover, φ is bounded from below, i.e., φ(X) > −∞, and φ(X_k) keeps decreasing. Hence, there exists φ* such that lim_{k→+∞} φ(X_k) = φ*. Note that
φ(X_k) − φ(X_{k+1}) ≥ c_0 ‖X_k − X_{k+1}‖²,   k = 1, 2, …
Summation over k yields
c_0 Σ_{k=0}^{∞} ‖X_k − X_{k+1}‖² ≤ φ(X_0) − φ* < +∞.
Therefore,
lim_{k→+∞} ‖X_k − X_{k+1}‖ = 0.
By the gradient bound (16), we thus have
lim_{j→+∞} ‖∇φ(X_{k_j})‖ = 0.
Considering the continuity of φ and ∇φ, we have
lim_{j→+∞} φ(X_{k_j}) = φ(X*),   lim_{j→+∞} ∇φ(X_{k_j}) = ∇φ(X*) = 0,
which implies that ∇φ(X*) = 0.

4.5. Sequence Convergence

In this section, the subsequence convergence can be strengthened by using the Kurdyka–Lojasiewicz property.
Proposition 5.
For x̄ ∈ dom ∂φ := {x : ∂φ(x) ≠ ∅}, suppose there exist η > 0, an ε-neighborhood U(x̄, ε) of x̄, and a function ψ ∈ Ψ_η = {ψ ∈ C[0, η) ∩ C¹(0, η) : ψ is concave, ψ(0) = 0, ψ′ > 0 on (0, η)} such that for all x ∈ Γ_η(x̄, ε) := U(x̄, ε) ∩ {x : φ(x̄) < φ(x) < φ(x̄) + η}, we have
ψ′( φ(x) − φ(x̄) ) ‖∇φ(x)‖ ≥ 1.
Then we say φ ( x ) satisfies the Kurdyka–Lojasiewicz property.
Theorem 1.
Assume that Propositions 4 and 5 are met. Let {X_k} be the sequence generated by A-APG Algorithm 3. Then, there exists a point X* ∈ M such that lim_{k→+∞} X_k = X* and ∇φ(X*) = 0.
Proof. 
Let ω(X_0) be the set of limit points of the sequence {X_k}. Based on the boundedness of {X_k} and the fact that ω(X_0) = ∩_{q∈ℕ} cl( ∪_{k>q} {X_k} ), it follows that ω(X_0) is a non-empty and compact set. In addition, by Equation (16), we know that φ(X) is constant on ω(X_0), denoted by φ*. If there exists some k_0 such that φ(X_{k_0}) = φ*, then for k > k_0 we have φ(X_k) = φ*. Next, we assume that φ(X_k) > φ* for all k. Therefore, for any ε, η > 0 there exists l > 0 such that for k > l we have dist(X_k, ω(X_0)) < ε and φ* < φ(X_k) < φ* + η, i.e., for some X* ∈ ω(X_0), X_k ∈ Γ_η(X*, ε). Applying Proposition 5, for k > l, we have
ψ′( φ(X_k) − φ* ) ‖∇φ(X_k)‖ ≥ 1.
Then
ψ′( φ(X_k) − φ* ) ≥ 1 / ( c_1 ( ‖X_k − X_{k−1}‖ + w̄ ‖X_{k−1} − X_{k−2}‖ ) ).   (17)
By the concavity of ψ, it is obvious that
ψ( φ(X_k) − φ* ) − ψ( φ(X_{k+1}) − φ* ) ≥ ψ′( φ(X_k) − φ* ) ( φ(X_k) − φ(X_{k+1}) ).   (18)
Define
Δ_{p,q} := ψ( φ(X_p) − φ* ) − ψ( φ(X_q) − φ* ),   c := (1 + w̄) c_1 / c_0 > 0.
Combining Equations (16)–(18), for k > l, we obtain
Δ_{k,k+1} ≥ c_0 ‖X_{k+1} − X_k‖² / ( c_1 ( ‖X_k − X_{k−1}‖ + w̄ ‖X_{k−1} − X_{k−2}‖ ) ),
which gives
‖X_{k+1} − X_k‖² ≤ c Δ_{k,k+1} ( ‖X_k − X_{k−1}‖ + ‖X_{k−1} − X_{k−2}‖ ).   (19)
Applying the arithmetic–geometric mean inequality to Equation (19), we obtain
2 ‖X_{k+1} − X_k‖ ≤ (1/2) ( ‖X_k − X_{k−1}‖ + ‖X_{k−1} − X_{k−2}‖ ) + 2 c Δ_{k,k+1}.
Therefore, for k > l, summing the above inequality over i = l + 1, …, k, we obtain
2 Σ_{i=l+1}^{k} ‖X_{i+1} − X_i‖ ≤ (1/2) Σ_{i=l+1}^{k} ( ‖X_i − X_{i−1}‖ + ‖X_{i−1} − X_{i−2}‖ ) + 2 c Σ_{i=l+1}^{k} Δ_{i,i+1}
    ≤ Σ_{i=l+1}^{k} ‖X_{i+1} − X_i‖ + ‖X_{l+1} − X_l‖ + (1/2) ‖X_l − X_{l−1}‖ + 2 c Δ_{l+1,k+1}.
Since ψ ≥ 0, for k > l it is evident that
Σ_{i=l+1}^{k} ‖X_{i+1} − X_i‖ ≤ ‖X_{l+1} − X_l‖ + (1/2) ‖X_l − X_{l−1}‖ + 2 c ψ( φ(X_l) − φ* ),
which implies that
Σ_{k=1}^{∞} ‖X_{k+1} − X_k‖ < ∞.
Hence {X_k} is a Cauchy sequence, and in the end we have lim_{k→+∞} X_k = X*. □

5. Numerical Results

In this section, we offer two numerical examples to illustrate the efficiency of the derived algorithms. All code is written in Python. In the tables, "Iteration" denotes the number of iteration steps and "Error" denotes the error of the objective function. We take the matrix order n to be 128, 1024, 2048, and 4096.
Example 1.
Let
A_1 = [ 2    1
        1    2    1
             ⋱    ⋱    ⋱
                  1    2    1
                       1    2 ],
B_1 = [ 1    0.5
        0.5  1    0.5
             ⋱    ⋱    ⋱
                  0.5  1    0.5
                       0.5  1 ]
be tridiagonal matrices in the Sylvester Equation (1). Set the matrix C_1 as the identity matrix. The initial step size is 0.01, which is small enough for the iteration to proceed. The parameters η_1 = 0.25 and ω_1 = 0.2 are chosen randomly from (0, 1). Table 1 and Figure 1 show the numerical results of Algorithms 1–5. It can be seen that the LGD, A-APG, and Newton-APG algorithms are more efficient than the other methods. Moreover, the iteration count does not increase as the matrix order increases, since the same initial value is used. The A-APG method achieves higher accuracy than the other methods. The Newton-APG method takes more CPU time but fewer iteration steps than the A-APG method; the Newton method needs to compute the inverse of a matrix, while it has quadratic convergence. From Figure 1, the error curves of the LGD, A-APG, and Newton-APG algorithms are hard to distinguish, so we offer another example below.
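For reference, the Example 1 data can be assembled as follows. The tridiagonal entries follow the reconstruction above, and scipy.linalg.solve_sylvester is used only to produce a reference solution, so the snippet is a sketch of the setup rather than the authors' exact script.

```python
# A sketch of the Example 1 setup (entries as reconstructed above; size illustrative).
import numpy as np
from scipy.linalg import solve_sylvester

n = 128
A1 = 2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)               # tridiag(1, 2, 1)
B1 = 1.0 * np.eye(n) + 0.5 * np.eye(n, k=1) + 0.5 * np.eye(n, k=-1)   # tridiag(0.5, 1, 0.5)
C1 = np.eye(n)

X_ref = solve_sylvester(A1, B1, C1)                      # reference solution of A1 X + X B1 = C1
print(np.linalg.norm(A1 @ X_ref + X_ref @ B1 - C1))      # residual of the reference solve
```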
Example 2.
Let A_2 = A_1 A_1^T and B_2 = B_1 B_1^T be positive semi-definite matrices in the Sylvester Equation (1). Set the matrix C_2 as the identity matrix. The initial step size is 0.009. The parameters η_2 = 0.28 and ω_2 = 0.25 are chosen randomly from (0, 1). Table 2 and Figure 2 show the numerical results of Algorithms 1–5. It can be seen that the LGD, A-APG, and Newton-APG algorithms take less CPU time than the other methods. Additionally, the error curves of the LGD, A-APG, and Newton-APG algorithms can be distinguished in Figure 2.
Remark 1.
The difference in iteration counts between Examples 1 and 2 arises from the different initial values. It can be seen that the LGD, A-APG, and Newton-APG algorithms require fewer iteration steps. Whether the A-APG method or the Newton-APG method yields fewer iteration steps varies from problem to problem. From Examples 1 and 2, we observe that the A-APG method has higher accuracy, although it takes more time and more iteration steps than the LGD method.
Remark 2.
Moreover, we compare the performance of our methods with other methods such as the conjugate gradient (CG) method in Table 1 and Table 2. We take the same initial values and set the error tolerance to 1 × 10^-14. From Table 1 and Table 2, it can be seen that the LGD and A-APG methods are more efficient for solving the Sylvester matrix equation when the order n is small. When n is large, the LGD and A-APG methods have nearly the same convergence rate as the CG method.

6. Conclusions

In this paper, we have introduced the A-APG and Newton-APG methods for solving the Sylvester matrix equation. The key idea is to transform the Sylvester matrix equation into an optimization problem by using the Kronecker product. Moreover, we have analyzed the computational complexity and proved the convergence of the A-APG method. The convergence results and preliminary numerical examples show that the proposed schemes are promising for solving the Sylvester matrix equation.

Author Contributions

J.Z. (methodology, review, and editing); X.L. (software, visualization, data curation). All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported in part by the National Natural Science Foundation of China (12171412, 11771370), Natural Science Foundation for Distinguished Young Scholars of Hunan Province (2021JJ10037), Hunan Youth Science and Technology Innovation Talents Project (2021RC3110), the Key Project of the Education Department of Hunan Province (19A500, 21A0116).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dooren, P.M.V. Structured Linear Algebra Problems in Digital Signal Processing; Springer: Berlin/Heidelberg, Germany, 1991. [Google Scholar]
  2. Gajic, Z.; Qureshi, M.T.J. Lyapunov Matrix Equation in System Stability and Control; Courier Corporation: Chicago, IL, USA, 2008. [Google Scholar]
  3. Corless, M.J.; Frazho, A. Linear Systems and Control: An Operator Perspective; CRC Press: Boca Raton, FL, USA, 2003. [Google Scholar]
  4. Stewart, G.W.; Sun, J. Matrix Perturbation Theory; Academic Press: London, UK, 1990. [Google Scholar]
  5. Simoncini, V.; Sadkane, M. Arnoldi-Riccati method for large eigenvalue problems. BIT Numer. Math. 1996, 36, 579–594. [Google Scholar]
  6. Demmel, J.W. Three methods for refining estimates of invariant subspaces. Computing 1987, 38, 43–57. [Google Scholar]
  7. Chen, T.W.; Francis, B.A. Optimal Sampled-Data Control Systems; Springer: London, UK, 1995. [Google Scholar]
  8. Datta, B. Numerical Methods for Linear Control Systems; Elsevier Inc.: Amsterdam, The Netherlands, 2004. [Google Scholar]
  9. Lord, N. Matrix computations. Math. Gaz. 1999, 83, 556–557. [Google Scholar]
  10. Zhao, X.L.; Wang, F.; Huang, T.Z. Deblurring and sparse unmixing for hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2013, 51, 4045–4058. [Google Scholar]
  11. Obinata, G.; Anderson, B.D.O. Model Reduction for Control System Design; Springer Science & Business Media: London, UK, 2001. [Google Scholar]
  12. Bouhamidi, A.; Jbilou, K. A note on the numerical approximate solutions for generalized Sylvester matrix equations with applications. Appl. Math. Comput. 2008, 206, 687–694. [Google Scholar]
  13. Bai, Z.Z.; Benzi, M.; Chen, F. Modified HSS iteration methods for a class of complex symmetric linear systems. Computing 2010, 87, 93–111. [Google Scholar]
  14. Bartels, R.H.; Stewart, G.W. Solution of the matrix equation AX + XB = C. Commun. ACM 1972, 15, 820–826. [Google Scholar]
  15. Golub, G.H. A Hessenberg-Schur Method for the Problem AX + XB = C; Cornell University: Ithaca, NY, USA, 1978. [Google Scholar]
  16. Robbé, M.; Sadkane, M. A convergence analysis of GMRES and FOM methods for Sylvester equations. Numer. Algorithms 2002, 30, 71–89. [Google Scholar]
  17. Guennouni, A.E.; Jbilou, K.; Riquet, A.J. Block Krylov subspace methods for solving large Sylvester equations. Numer. Algorithms 2002, 29, 75–96. [Google Scholar]
  18. Salkuyeh, D.K.; Toutounian, F. New approaches for solving large Sylvester equations. Appl. Math. Comput. 2005, 173, 9–18. [Google Scholar]
  19. Wachspress, E.L. Iterative solution of the Lyapunov matrix equation. Appl. Math. Lett. 1988, 1, 87–90. [Google Scholar]
  20. Jbilou, K.; Messaoudi, A.; Sadok, H. Global FOM and GMRES algorithms for matrix equations. Appl. Numer. Math. 1999, 31, 49–63. [Google Scholar]
  21. Feng, D.; Chen, T. Gradient Based Iterative Algorithms for Solving a Class of Matrix Equations. IEEE Trans. Autom. Control 2005, 50, 1216–1221. [Google Scholar]
  22. Heyouni, M.; Movahed, F.S.; Tajaddini, A. On global Hessenberg based methods for solving Sylvester matrix equations. Comput. Math. Appl. 2019, 77, 77–92. [Google Scholar]
  23. Benner, P.; Kürschner, P. Computing real low-rank solutions of Sylvester equations by the factored ADI method. Comput. Math. Appl. 2014, 67, 1656–1672. [Google Scholar]
  24. Bouhamidi, A.; Hached, M.; Heyouni, M.J.; Bilou, K. A preconditioned block Arnoldi method for large Sylvester matrix equations. Numer. Linear Algebra Appl. 2013, 20, 208–219. [Google Scholar]
  25. Heyouni, M. Extended Arnoldi methods for large low-rank Sylvester matrix equations. Appl. Numer. Math. 2010, 60, 1171–1182. [Google Scholar]
  26. Agoujil, S.; Bentbib, A.H.; Jbilou, K.; Sadek, E.M. A minimal residual norm method for large-scale Sylvester matrix equations. Electron. Trans. Numer. Anal. Etna 2014, 43, 45–59. [Google Scholar]
  27. Abdaoui, I.; Elbouyahyaoui, L.; Heyouni, M. An alternative extended block Arnoldi method for solving low-rank Sylvester equations. Comput. Math. Appl. 2019, 78, 2817–2830. [Google Scholar]
  28. Jbilou, K. Low rank approximate solutions to large Sylvester matrix equations. Appl. Math. Comput. 2005, 177, 365–376. [Google Scholar]
  29. Liang, B.; Lin, Y.Q.; Wei, Y.M. A new projection method for solving large Sylvester equations. Appl. Numer. Math. 2006, 57, 521–532. [Google Scholar]
  30. Jiang, K.; Si, W.; Chen, C.; Bao, C. Efficient numerical methods for computing the stationary states of phase-field crystal models. SIAM J. Sci. Comput. 2020, 42, B1350–B1377. [Google Scholar]
  31. Beck, A.; Teboulle, M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar]
  32. Bao, C.; Barbastathis, G.; Ji, H.; Shen, Z.; Zhang, Z. Coherence retrieval using trace regularization. SIAM J. Imaging Sci. 2018, 11, 679–706. [Google Scholar]
  33. Magnus, J.R.; Neudecker, H. Matrix Differential Calculus; John Willey & Sons: Hoboken, NJ, USA, 1999. [Google Scholar]
Figure 1. The error curves when n = 128, 1024, 2048, 4096 for Example 1.
Figure 2. The error curves when n = 128, 1024, 2048, 4096 for Example 2.
Table 1. Numerical results for Example 1.

Algorithm | n | Iteration | Error | Time (s)
GD | 128 | 356 | 1.13687 × 10^-13 | 3.30
LGD | 128 | 15 | 1.26477 × 10^-12 | 0.27
APG | 128 | 374 | 1.4353 × 10^-12 | 4.31
RAPG | 128 | 69 | 1.4353 × 10^-12 | 1.45
A-APG | 128 | 19 | 3.55271 × 10^-14 | 0.38
Newton-APG | 128 | 18 | 9.47438 × 10^-11 | 0.48
CG | 128 | 19 | 3.49364 × 10^-14 | 0.42
GD | 1024 | 356 | 1.02318 × 10^-12 | 806
LGD | 1024 | 15 | 1.06866 × 10^-11 | 69
APG | 1024 | 374 | 1.18803 × 10^-11 | 1261
RAPG | 1024 | 69 | 2.59774 × 10^-11 | 367
A-APG | 1024 | 19 | 2.84217 × 10^-13 | 113
Newton-APG | 1024 | 18 | 8.95682 × 10^-10 | 144
CG | 1024 | 19 | 3.37046 × 10^-14 | 71
GD | 2048 | 356 | 2.04636 × 10^-12 | 6315
LGD | 2048 | 15 | 2.13731 × 10^-11 | 569
APG | 2048 | 374 | 2.38742 × 10^-11 | 9752
RAPG | 2048 | 69 | 5.20686 × 10^-11 | 2994
A-APG | 2048 | 19 | 6.82121 × 10^-13 | 926
Newton-APG | 2048 | 18 | 8.95682 × 10^-10 | 1015
CG | 2048 | 19 | 3.34616 × 10^-14 | 521
GD | 4096 | 356 | 4.09273 × 10^-12 | 66,155
LGD | 4096 | 15 | 4.27463 × 10^-11 | 4199
APG | 4096 | 374 | 4.77485 × 10^-11 | 71,636
RAPG | 4096 | 69 | 1.04365 × 10^-10 | 21,596
A-APG | 4096 | 19 | 1.81899 × 10^-12 | 6829
Newton-APG | 4096 | 18 | 3.64571 × 10^-9 | 7037
CG | 4096 | 19 | 3.33322 × 10^-14 | 3553
Table 2. Numerical results for Example 2.

Algorithm | n | Iteration | Error | Time (s)
GD | 128 | 243 | 1.63425 × 10^-13 | 2.38
LGD | 128 | 20 | 2.45137 × 10^-12 | 0.47
APG | 128 | 260 | 1.58096 × 10^-12 | 4.51
RAPG | 128 | 53 | 1.90781 × 10^-12 | 1.46
A-APG | 128 | 32 | 3.55271 × 10^-15 | 0.78
Newton-APG | 128 | 36 | 2.30926 × 10^-13 | 1.26
CG | 128 | 34 | 4.13025 × 10^-14 | 0.79
GD | 1024 | 243 | 1.3074 × 10^-12 | 516
LGD | 1024 | 20 | 1.89573 × 10^-11 | 95
APG | 1024 | 260 | 1.25056 × 10^-11 | 835
RAPG | 1024 | 53 | 1.51772 × 10^-11 | 267
A-APG | 1024 | 32 | 4.61569 × 10^-14 | 181
Newton-APG | 1024 | 36 | 4.20641 × 10^-12 | 214
CG | 1024 | 34 | 4.29936 × 10^-14 | 92
GD | 2048 | 243 | 2.6148 × 10^-12 | 4129
LGD | 2048 | 20 | 3.78577 × 10^-11 | 814
APG | 2048 | 260 | 2.48974 × 10^-11 | 6507
RAPG | 2048 | 53 | 3.03544 × 10^-11 | 2193
A-APG | 2048 | 32 | 2.27374 × 10^-13 | 1622
Newton-APG | 2048 | 36 | 8.52651 × 10^-12 | 2125
CG | 2048 | 34 | 4.22694 × 10^-14 | 797
GD | 4096 | 243 | 5.22959 × 10^-12 | 29,859
LGD | 4096 | 20 | 7.54881 × 10^-11 | 6023
APG | 4096 | 260 | 4.97948 × 10^-11 | 48,238
RAPG | 4096 | 53 | 6.07088 × 10^-11 | 16,482
A-APG | 4096 | 32 | 2.27374 × 10^-13 | 12,896
Newton-APG | 4096 | 36 | 7.95808 × 10^-12 | 14,901
CG | 4096 | 34 | 4.18275 × 10^-14 | 5337

