1. Introduction
Matrix equations are ubiquitous in signal processing [1], control theory [2], and linear systems [3]. Most time-dependent models for the prediction, simulation, and control of real-world phenomena can be represented as linear or nonlinear dynamical systems. The relevance of matrix equations to engineering applications therefore largely explains the great effort the scientific community has put into their numerical solution. Linear matrix equations play an important role in the stability analysis of linear dynamical systems and in the theoretical development of nonlinear systems. The Sylvester matrix equation was first proposed by Sylvester and arose from research in applied mathematical cybernetics. It is a well-known matrix equation that occurs in linear and generalized eigenvalue problems, where invariant subspaces are computed by means of Riccati equations [4,5,6]. The Sylvester matrix equation also appears in linear algebra [7,8,9], image processing [10], model reduction [11], and numerical methods for differential equations [12,13].
We consider the Sylvester matrix equation of the form
$$AX + XB = C, \qquad (1)$$
where $A \in \mathbb{R}^{m\times m}$, $B \in \mathbb{R}^{n\times n}$, and $C \in \mathbb{R}^{m\times n}$ are given matrices, and $X \in \mathbb{R}^{m\times n}$ is an unknown matrix to be solved. We discuss a special form of the Sylvester matrix equation, in which $A$ and $B$ are symmetric positive definite.
Recently, there has been much discussion of the solution and numerical computation of the Sylvester matrix equation. The standard methods for solving this equation are the Bartels–Stewart method [14] and the Hessenberg–Schur method [15], which are efficient for small and dense system matrices. For larger system matrices, block Krylov subspace methods [16,17] and global Krylov subspace methods [18] have been proposed. These methods use the global Arnoldi process, block Arnoldi process, or nonsymmetric block Lanczos process to produce low-dimensional Sylvester matrix equations. Iterative methods are more feasible for solving large and sparse problems. When system matrices are large, effective methods include the alternating direction implicit (ADI) method [19], the global full orthogonalization method, the global generalized minimum residual method [20], the gradient-based iterative method [21], and the global Hessenberg and changing minimal residual with Hessenberg process method [22]. In the low-rank case, the ADI method [23], the block Arnoldi method [17], the preconditioned block Arnoldi method [24], and the extended block Arnoldi method [25] and its variants [26,27], including the global Arnoldi method [28,29] and the extended global Arnoldi method [25], have been proposed to obtain a low-rank solution.
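As a concrete point of reference, the following minimal sketch calls SciPy's solve_sylvester, which implements the Bartels–Stewart algorithm [14] mentioned above; the matrix sizes and the random symmetric positive definite construction are illustrative assumptions, not taken from this paper.

```python
# Baseline dense direct solve of AX + XB = C via Bartels-Stewart.
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)
m, n = 5, 4

# Build symmetric positive definite A and B (the case studied in this paper).
A = rng.standard_normal((m, m)); A = A @ A.T + m * np.eye(m)
B = rng.standard_normal((n, n)); B = B @ B.T + n * np.eye(n)
C = rng.standard_normal((m, n))

X = solve_sylvester(A, B, C)                  # dense direct solve
print(np.linalg.norm(A @ X + X @ B - C))      # residual near machine precision
```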
The adaptive accelerated proximal gradient (A-APG) method [30] is an efficient numerical method for computing the steady states of a minimization problem. It is motivated by the accelerated proximal gradient (APG) method [31], which has wide applications in image processing and machine learning. In each iteration, the A-APG method chooses the step size by a line search initialized with the Barzilai–Borwein (BB) step [32] to accelerate convergence. Moreover, since the traditional APG method was proposed for convex problems and its oscillation phenomenon slows down convergence, a restart scheme is used to speed up convergence. For more details, one can refer to [30] and the references therein.
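To make these ingredients concrete, here is a schematic sketch, not the authors' exact A-APG [30]: a Nesterov-accelerated gradient iteration for a smooth objective whose step size is set by the BB step [32] and which restarts the momentum when the objective increases. The name apg_bb_restart is ours, and the safeguarding line search is omitted for brevity.

```python
import numpy as np

def apg_bb_restart(grad, f, x0, tol=1e-10, max_iter=5000):
    x = x0.copy()                 # current iterate
    y = x0.copy()                 # extrapolated point
    t = 1.0                       # momentum parameter
    step = 1e-3                   # fallback step before a BB step is available
    y_prev, g_prev = y.copy(), grad(y)
    for _ in range(max_iter):
        g = grad(y)
        if np.linalg.norm(g) < tol:
            return y
        s, r = y - y_prev, g - g_prev
        if r @ r > 0:
            step = abs(s @ r) / (r @ r)       # BB step size (BB2 variant)
        y_prev, g_prev = y.copy(), g
        x_new = y - step * g                  # gradient step from extrapolated point
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)
        if f(x_new) > f(x):                   # restart: drop momentum if f increases
            t_new, y = 1.0, x_new.copy()
        x, t = x_new, t_new
    return x
```

With the objective and gradient of problem (3) below, this routine can be applied directly to x0 = vec(X0).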
Our main contribution is to study gradient-based optimization methods, namely the A-APG and Newton-APG methods, for solving the Sylvester matrix equation by transforming it into an optimization problem via the Kronecker product. The A-APG and Newton-APG methods are theoretically guaranteed to converge to a global solution from an arbitrary initial point and achieve high precision. These methods are especially efficient for large and sparse coefficient matrices.
The rest of this paper is organized as follows. In Section 2, we transform this equation into an optimization problem by using the Kronecker product. In Section 3, we apply the A-APG and Newton-APG algorithms to solve the optimization problem and compare them with other methods. In Section 4, we focus on the convergence analysis of the A-APG method. In Section 5, the computational complexity of these algorithms is analyzed exhaustively. In Section 6, we offer numerical examples to illustrate the effectiveness of the derived methods.
Throughout this paper, let $\mathbb{R}^{m\times n}$ be the set of all $m\times n$ real matrices. $I_n$ is the identity matrix of order $n$. If $A \in \mathbb{R}^{n\times n}$, the symbols $A^T$, $A^{-1}$, $\|A\|$, and $\mathrm{tr}(A)$ express the transpose, the inverse, the 2-norm, and the trace of $A$, respectively. The inner product in matrix space is $\langle A, B \rangle = \mathrm{tr}(A^T B)$.
2. The Variant of an Optimization Problem
In this section, we transform the Sylvester equation into an optimization problem. We recall some definitions and lemmas.
Definition 1. Let $Y \in \mathbb{R}^{m\times n}$ and $Z \in \mathbb{R}^{p\times q}$; the Kronecker product of $Y$ and $Z$ is defined by
$$Y \otimes Z = (y_{ij} Z) \in \mathbb{R}^{mp\times nq}.$$
Definition 2. If $Y = (y_1, y_2, \ldots, y_n) \in \mathbb{R}^{m\times n}$, then the straightening operator of $Y$ is
$$\mathrm{vec}(Y) = (y_1^T, y_2^T, \ldots, y_n^T)^T \in \mathbb{R}^{mn}.$$
Lemma 1. Let $A \in \mathbb{R}^{p\times m}$, $X \in \mathbb{R}^{m\times n}$, and $B \in \mathbb{R}^{n\times q}$; then
$$\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X).$$
From Lemma 1, the Sylvester Equation (1) can be rewritten as
$$(I_n \otimes A + B^T \otimes I_m)\,\mathrm{vec}(X) = \mathrm{vec}(C). \qquad (2)$$
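Lemma 1 can be checked numerically in a few lines. The helper vec below, column stacking via Fortran-order flattening, is an illustrative implementation of the straightening operator.

```python
# Numerical check of Lemma 1: vec(AXB) = (B^T kron A) vec(X).
import numpy as np

rng = np.random.default_rng(1)
A, X, B = rng.random((3, 3)), rng.random((3, 4)), rng.random((4, 4))

vec = lambda M: M.flatten(order="F")     # straightening operator (column stacking)
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))             # True
```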
Lemma 2. Let $A$ be a symmetric positive definite matrix; solving the equation $Ax = b$ is equivalent to obtaining the minimum of $f(x) = \frac{1}{2}x^T A x - b^T x$.
According to Lemma 2 and Equation (2), define
$$M = I_n \otimes A + B^T \otimes I_m, \qquad x = \mathrm{vec}(X), \qquad c = \mathrm{vec}(C).$$
Therefore, Equation (2) becomes $Mx = c$. Obviously, if $A$ and $B$ are symmetric positive definite, then $M$ is symmetric positive definite: $M^T = M$ since $B^T = B$, and the eigenvalues of $M$ are the sums $\lambda_i(A) + \mu_j(B) > 0$ of eigenvalues of $A$ and $B$. The variant of the Sylvester Equation (2) thus reduces to the optimization problem
$$\min_{x \in \mathbb{R}^{mn}} f(x) = \frac{1}{2}x^T M x - c^T x. \qquad (3)$$
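For small $m$ and $n$, one can form $M$ explicitly and verify both claims: that $M$ is symmetric positive definite, and that the minimizer of (3), i.e., the solution of $Mx = c$, solves (1). The sizes below are illustrative assumptions.

```python
# Form M = kron(I_n, A) + kron(B^T, I_m) and check SPD + the solution of (1).
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 3
A = rng.standard_normal((m, m)); A = A @ A.T + m * np.eye(m)   # SPD
B = rng.standard_normal((n, n)); B = B @ B.T + n * np.eye(n)   # SPD
C = rng.standard_normal((m, n))

M = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
print(np.allclose(M, M.T), np.linalg.eigvalsh(M).min() > 0)    # True True

x = np.linalg.solve(M, C.flatten(order="F"))                   # minimizer of (3)
X = x.reshape(m, n, order="F")
print(np.linalg.norm(A @ X + X @ B - C))                       # ~ 1e-14
```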
Using the calculation of the matrix differential from [33], we have the following propositions immediately.

Proposition 1. If $f(x) = x^T x$, then $\nabla f(x) = 2x$.

Proposition 2. If $f(x) = b^T x$, then $\nabla f(x) = b$.

Proposition 3. If $f(x) = x^T A x$, then $\nabla f(x) = (A + A^T)x$.
Using Propositions 2 and 3, the gradient of the objective function (3) is
$$\nabla f(x) = \frac{1}{2}(M + M^T)x - c = Mx - c. \qquad (4)$$
By (4), the Hessian matrix is
$$\nabla^2 f(x) = M. \qquad (5)$$
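Since $M\,\mathrm{vec}(X) = \mathrm{vec}(AX + XB)$, the gradient (4) can be evaluated without ever forming $M$, which is what makes gradient methods attractive for large and sparse $A$ and $B$. The helper grad_f below is an illustrative sketch under this observation.

```python
# Matrix-free evaluation of the gradient (4): M vec(X) - c = vec(AX + XB - C).
import numpy as np

def grad_f(X, A, B, C):
    """Return vec(AX + XB - C), equal to M vec(X) - c, without building M."""
    return (A @ X + X @ B - C).flatten(order="F")
```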