Article

A New Accelerated Algorithm Based on Fixed Point Method for Convex Bilevel Optimization Problems with Applications

1 PhD Degree Program in Mathematics, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
2 Research Group in Mathematics and Applied Mathematics, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
3 Data Science Research Center, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
* Author to whom correspondence should be addressed.
Submission received: 21 December 2022 / Revised: 26 January 2023 / Accepted: 28 January 2023 / Published: 30 January 2023
(This article belongs to the Special Issue Fixed Point, Optimization, and Applications II)

Abstract: A new accelerated common fixed point algorithm for a countable family of nonexpansive mappings is introduced and analyzed, and it is then applied to solve some convex bilevel optimization problems. Under suitable conditions, we prove a strong convergence result for the proposed algorithm. As an application, we employ it for regression and classification problems and compare its performance with that of existing algorithms. The numerical experiments show that our algorithm outperforms the others.

1. Introduction

Let $H$ be a real Hilbert space and $f, g : H \to \mathbb{R}$ be real-valued functions. The convex bilevel minimization problem is a special kind of optimization problem in which one problem is embedded within another. The outer level is the following constrained minimization problem:
$$\min_{x \in X^*} \omega(x), \qquad (1)$$
where $\omega : \mathbb{R}^n \to \mathbb{R}$ is strongly convex with parameter $\sigma > 0$ and continuously differentiable such that $\nabla\omega$ is Lipschitz continuous with constant $L_\omega$, while $X^*$ is the nonempty set of minimizers of the inner level problem, given by
$$\min_{x \in \mathbb{R}^n} \{ f(x) + g(x) \}, \qquad (2)$$
and we sometimes write $\operatorname{argmin}(f+g)$ for $X^*$. The following assumptions are imposed for solving Problem (2):
(i) $g : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is a proper, convex and lower semi-continuous function;
(ii) $f : \mathbb{R}^n \to \mathbb{R}$ is a convex and differentiable function such that $\nabla f$ is Lipschitz continuous with constant $L_f > 0$, that is,
$$\|\nabla f(x) - \nabla f(y)\| \le L_f\|x - y\| \quad \text{for all } x, y \in \mathbb{R}^n.$$
The solution of Problem (2) can be characterized by Theorem 16.3 of Bauschke and Combettes [1] as follows:
$$\breve{q} \text{ is a minimizer of Problem (2) if and only if } 0 \in \partial g(\breve{q}) + \nabla f(\breve{q}),$$
where $\partial g$ is the subdifferential of $g$ and $\nabla f$ is the gradient of $f$. Moreover, Problem (2) is also characterized by the following fixed point problem:
$$\breve{q} \text{ is a minimizer of } (f+g) \text{ if and only if } \breve{q} = \operatorname{prox}_{cg}(I - c\nabla f)(\breve{q}),$$
where $c > 0$ and $\operatorname{prox}_{cg}(x) = \operatorname{argmin}_{y \in H}\big(g(y) + \frac{1}{2c}\|x - y\|^2\big)$. It is also known that $\operatorname{prox}_{cg}(I - c\nabla f)$ is a nonexpansive operator when $c \in (0, 2/L_f)$. The operator $\operatorname{prox}_{cg}(I - c\nabla f)$ is called the forward-backward operator of $f$ and $g$ with respect to $c$. Moreover, it is known that $\breve{q} \in X^*$ is a minimizer of Problem (1) if and only if
$$\langle \nabla\omega(\breve{q}), x - \breve{q}\rangle \ge 0 \quad \text{for all } x \in X^*.$$
In the past decade, many researchers have proposed methods for finding optimal solutions of Problem (2). Lions and Mercier [2] introduced a simple algorithm, called the Forward-Backward Splitting (FBS) algorithm, for solving Problem (2). Their algorithm is given by
$$x_{n+1} = \operatorname{prox}_{c_n g}(I - c_n\nabla f)(x_n),$$
where $c_n \in (0, 2/L_f)$ is the step size.
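As an illustration (not the authors' MATLAB code), the following Python sketch implements the FBS iteration for the common special case $g(x) = \lambda\|x\|_1$, whose proximal operator is componentwise soft-thresholding; the callable grad_f, the constant L_f and the step-size choice c = 1/L_f are assumptions supplied by the user.

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1: componentwise soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def fbs(grad_f, lam, L_f, x0, n_iter=500):
    # Forward-backward splitting x_{n+1} = prox_{c g}(x_n - c grad_f(x_n))
    # for g = lam*||.||_1; c = 1/L_f is one admissible step size in (0, 2/L_f).
    c = 1.0 / L_f
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x = soft_threshold(x - c * grad_f(x), c * lam)
    return x
```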
In 1964, Polyak [3] first introduced the inertial technique for accelerating the convergence rate of iterative algorithms. Since then, this technique has been widely used for this purpose.
In [4], Beck and Teboulle employed the inertial technique to introduce the fast iterative shrinkage-thresholding algorithm (FISTA) for solving Problem (2):
$$x_1 = y_0 \in C, \quad t_1 = 1,$$
$$y_n = \operatorname{prox}_{\alpha g}(I - \alpha\nabla f)(x_n), \quad \alpha > 0,$$
$$t_{n+1} = \frac{1 + \sqrt{4t_n^2 + 1}}{2}, \quad \theta_n = \frac{t_n - 1}{t_{n+1}},$$
$$x_{n+1} = y_n + \theta_n(y_n - y_{n-1}).$$
They showed that the convergence behavior of FISTA is better than that of the other methods.
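Under the same $\ell_1$-regularized setting, and reusing soft_threshold from the previous sketch, a hedged Python sketch of the FISTA iteration reads as follows; the fixed step $\alpha = 1/L_f$ is an assumed choice.

```python
import numpy as np

def fista(grad_f, lam, L_f, x0, n_iter=500):
    # FISTA: forward-backward step followed by inertial extrapolation driven by t_n.
    alpha = 1.0 / L_f                       # assumed step size alpha > 0
    x = np.asarray(x0, dtype=float).copy()  # x_1 = y_0
    y_prev = x.copy()
    t = 1.0                                 # t_1 = 1
    for _ in range(n_iter):
        y = soft_threshold(x - alpha * grad_f(x), alpha * lam)
        t_next = (1.0 + np.sqrt(4.0 * t * t + 1.0)) / 2.0
        theta = (t - 1.0) / t_next
        x = y + theta * (y - y_prev)        # x_{n+1} = y_n + theta_n (y_n - y_{n-1})
        y_prev, t = y, t_next
    return y_prev                           # last forward-backward iterate
```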
Recently, some authors, for instance, Bussaban et al. [5], Puangpee and Suantai [6] and Jailoka et al. [7], employed the inertial technique to introduce common fixed point algorithms for countable families of nonexpansive operators and established convergence results under the NST-condition (I), the NST$^*$-condition, and the condition (Z). They also applied their algorithms to solve some convex minimization problems.
In 2017, Sabach and Shtern [8] introduced a new method, called Sequential Averaging Method (SAM), for solving a bilevel optimization problem. Such an algorithm was developed from [9] for solving a certain class of fixed point problems. To solve the bilevel optimization Problems (1) and (2), the Bilevel Gradient sequential Averaging Method (BiG-SAM) was proposed in [8]. Their algorithm was defined by Algorithm 1.
Algorithm 1 Bilevel Gradient sequential Averaging Method (BiG-SAM)
(1) Input: $s \in (0, 2/(\sigma + L_\omega))$, $c \in (0, 1/L_f)$, and $\{\alpha_k\}_{k\in\mathbb{N}} \subset (0, 1]$.
(2) Initialization: $x_1 \in \mathbb{R}^n$.
(3) General step ($k = 1, 2, \ldots$):
$$y_k = \operatorname{prox}_{cg}(x_k - c\nabla f(x_k)),$$
$$z_k = x_k - s\nabla\omega(x_k),$$
$$x_{k+1} = \alpha_k z_k + (1 - \alpha_k)y_k,$$
where $\nabla\omega$ is the gradient of $\omega$.
They proved a strong convergence theorem for the sequence $\{x_k\}_{k\in\mathbb{N}}$ generated by BiG-SAM under some control conditions.
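A minimal Python sketch of BiG-SAM for the $\ell_1$-regularized case may look as follows; soft_threshold is the prox sketch given earlier, and grad_f, grad_omega, the constants c, s and the sequence alphas are assumed inputs satisfying the stated conditions.

```python
import numpy as np

def big_sam(grad_f, grad_omega, lam, c, s, alphas, x0, n_iter=500):
    # BiG-SAM (Algorithm 1) for g = lam*||.||_1: inner proximal-gradient step y_k,
    # outer gradient step z_k on omega, averaged with weight alpha_k.
    # Assumed inputs: c in (0, 1/L_f), s in (0, 2/(sigma + L_omega)), alphas(k) in (0, 1].
    x = np.asarray(x0, dtype=float).copy()
    for k in range(1, n_iter + 1):
        y = soft_threshold(x - c * grad_f(x), c * lam)
        z = x - s * grad_omega(x)
        a = alphas(k)
        x = a * z + (1.0 - a) * y
    return x
```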
After that, Shehu et al. [10] used the inertial technique for improving the convergence behavior of BiG-SAM. Their algorithm is known as the inertial Bilevel Gradient Sequential Averaging Method (iBiG-SAM). It was defined as follows (Algorithm 2):
Algorithm 2 Inertial Bilevel Gradient sequential Averaging Method (iBiG-SAM)
(1) Input: $\alpha \ge 3$, $s \in (0, 2/(\sigma + L_\omega))$, and $c \in (0, 1/L_f)$.
(2) Initialization: $x_0, x_1 \in \mathbb{R}^n$.
(3) Step 1. For $k = 1, 2, \ldots$, compute
$$\theta_k = \begin{cases} \min\left\{\dfrac{k}{k+\alpha-1}, \dfrac{\gamma_k}{\|x_k - x_{k-1}\|}\right\}, & \text{if } x_k \ne x_{k-1},\\[1mm] \dfrac{k}{k+\alpha-1}, & \text{otherwise},\end{cases}$$
and choose $\mu_k$ such that $0 \le |\mu_k| \le \theta_k$.
(4) Step 2. Compute
$$w_k = x_k + \mu_k(x_k - x_{k-1}),$$
$$y_k = \operatorname{prox}_{cg}(w_k - c\nabla f(w_k)),$$
$$z_k = w_k - s\nabla\omega(w_k),$$
$$x_{k+1} = \alpha_k z_k + (1 - \alpha_k)y_k,$$
where $\nabla\omega$ is the gradient of $\omega$.
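The following sketch illustrates iBiG-SAM under the same assumptions, with soft_threshold as before; choosing $\mu_k = \theta_k$ is one admissible option, since the method only requires $0 \le |\mu_k| \le \theta_k$.

```python
import numpy as np

def ibig_sam(grad_f, grad_omega, lam, c, s, alphas, gammas, x0, x1,
             alpha=3.0, n_iter=500):
    # iBiG-SAM (Algorithm 2): BiG-SAM preceded by an inertial extrapolation w_k.
    # alphas(k), gammas(k) and the constants c, s, alpha >= 3 are assumed inputs.
    x_prev = np.asarray(x0, dtype=float).copy()
    x = np.asarray(x1, dtype=float).copy()
    for k in range(1, n_iter + 1):
        theta = k / (k + alpha - 1.0)
        diff = np.linalg.norm(x - x_prev)
        if diff > 0.0:
            theta = min(theta, gammas(k) / diff)
        w = x + theta * (x - x_prev)            # inertial step with mu_k = theta_k
        y = soft_threshold(w - c * grad_f(w), c * lam)
        z = w - s * grad_omega(w)
        a = alphas(k)
        x_prev, x = x, a * z + (1.0 - a) * y
    return x
```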
In 2022, Duan and Zhang [11] introduced a new algorithm based on the proximal gradient algorithm for solving a bilevel optimization problem. This algorithm is known as the alternated inertial Bilevel Gradient Sequential Averaging Method (aiBiG-SAM). It was defined as follows (Algorithm 3):
Algorithm 3 The alternated inertial Bilevel Gradient Sequential Averaging Method (aiBiG-SAM)
(1) Input: $\alpha \ge 3$, $s \in (0, 2/(\sigma + L_\omega))$ and $c \in (0, 1/L_f)$. Set $\lambda > 0$.
(2) Initialization: $x_0, x_1 \in \mathbb{R}^n$.
(3) Step 1. For $k = 1, 2, \ldots$, compute
$$w_k = \begin{cases} x_k + \mu_k(x_k - x_{k-1}), & \text{if } k \text{ is odd},\\ x_k, & \text{if } k \text{ is even},\end{cases}$$
where, for odd $k$, $\mu_k$ is chosen such that $0 \le |\mu_k| \le \theta_k$ with $\theta_k$ defined by
$$\theta_k = \begin{cases} \min\left\{\dfrac{k}{k+\alpha-1}, \dfrac{\gamma_k}{\|x_k - x_{k-1}\|}\right\}, & \text{if } x_k \ne x_{k-1},\\[1mm] \dfrac{k}{k+\alpha-1}, & \text{otherwise}.\end{cases}$$
(4) Step 2. Compute
$$y_k = \operatorname{prox}_{cg}(w_k - c\nabla f(w_k)),$$
$$z_k = w_k - s\nabla\omega(w_k),$$
$$x_{k+1} = \alpha_k z_k + (1 - \alpha_k)y_k.$$
(5) Step 3. If $\|x_k - x_{k-1}\| < \lambda$, then stop.
They also proved a strong convergence result of the proposed method.
Motivated by these works, we are interested in proposing a new efficient algorithm for the convex bilevel optimization problems (1) and (2). We establish and prove a convergence theorem for the proposed algorithm under suitable conditions, and we employ it for data prediction and classification. The paper is organized as follows. In Section 2, we collect some notations and useful lemmas for the later sections. In Section 3, we introduce our algorithm and analyze its convergence. In Section 4, we apply the fixed-point results obtained in Section 3 to regression and classification problems and present some numerical experiments. Finally, we give the conclusions of the paper in Section 5.

2. Preliminaries

Let $H$ be a real Hilbert space with norm $\|\cdot\|$ and inner product $\langle\cdot,\cdot\rangle$, and let $C$ be a nonempty closed convex subset of $H$. Let $L \in (0, \infty)$. An operator $T : C \to C$ is said to be $L$-Lipschitz if $\|Tx - Ty\| \le L\|x - y\|$ for all $x, y \in C$. If $T$ is Lipschitz continuous with a coefficient $L \in (0, 1)$, then $T$ is called a contraction, and $T$ is said to be nonexpansive if $L = 1$. We use $F(T)$ to denote the set of fixed points of $T$, that is, $F(T) := \{x \in C : x = Tx\}$. The set of all common fixed points of a sequence $\{T_n\}$ of nonexpansive operators of $C$ into itself is $\bigcap_{n=1}^{\infty} F(T_n)$. For finding a common fixed point of $\{T_n\}$, Nakajo, Shimoji and Takahashi [12] introduced the NST-condition as follows. Let $\{T_n\}$ and $\Psi$ be families of nonexpansive operators of $C$ into itself with $F(\Psi) \subset \bigcap_{n=1}^{\infty} F(T_n)$, where $F(\Psi) = \bigcap_{T\in\Psi} F(T)$. A sequence $\{T_n\}$ satisfies the NST-condition (I) with $\Psi$ if, for any bounded sequence $\{v_n\}$ in $C$,
$$\lim_{n\to\infty}\|v_n - T_n v_n\| = 0 \quad\text{implies}\quad \lim_{n\to\infty}\|v_n - T v_n\| = 0$$
for all $T \in \Psi$. The sequence $\{T_n\}$ satisfies the NST-condition (I) with $T$ if $\Psi = \{T\}$.
The following Lemma is useful for proving our main result.
Lemma 1 
([5]). Let $f$ be a convex and differentiable function from $H$ into $\mathbb{R}$ such that $\nabla f$ is Lipschitz continuous with constant $L > 0$, and let $g$ be a proper, convex and lower semi-continuous function from $H$ into $\mathbb{R} \cup \{\infty\}$. Let $T_n := \operatorname{prox}_{c_n g}(I - c_n\nabla f)$ and $T := \operatorname{prox}_{cg}(I - c\nabla f)$, where $c_n, c \in (0, 2/L)$ with $c_n \to c$ as $n \to \infty$. Then $\{T_n\}$ satisfies the NST-condition (I) with $T$.
Definition 1 
([13,14]). A sequence $\{T_n : H \to H\}$ with a nonempty common fixed point set is said to satisfy the condition (Z) if, whenever $\{x_n\}$ is a bounded sequence in $H$ such that
$$\lim_{n\to\infty}\|x_n - T_n x_n\| = 0,$$
every weak cluster point of $\{x_n\}$ belongs to $\bigcap_{n=1}^{\infty} F(T_n)$.
The following remark follows from the demiclosedness of $I - T$, where $T : H \to H$ is a nonexpansive operator.
Remark 1. 
If $\{T_n\}$ is a sequence of nonexpansive operators satisfying the NST-condition (I) with a nonexpansive operator $T$, then $\{T_n\}$ satisfies the condition (Z).
Note that if $g : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is a proper, lower semi-continuous and convex function, then $\operatorname{prox}_g(x)$ exists and is unique for every $x \in \mathbb{R}^n$; see [15]. We end this part with the following useful lemmas, which will be used in the later sections.
Lemma 2 
([16,17]). For any $m, n \in H$ and $\mu \in [0, 1]$, the following statements hold:
(1) $\|m \pm n\|^2 = \|m\|^2 \pm 2\langle m, n\rangle + \|n\|^2$;
(2) $\|m + n\|^2 \le \|m\|^2 + 2\langle n, m + n\rangle$;
(3) $\|\mu m + (1-\mu)n\|^2 = \mu\|m\|^2 + (1-\mu)\|n\|^2 - \mu(1-\mu)\|m - n\|^2$.
The identity in Lemma 2(3) implies that the following equality holds:
$$\|\kappa m + \alpha n + \eta z\|^2 = \kappa\|m\|^2 + \alpha\|n\|^2 + \eta\|z\|^2 - \kappa\alpha\|m - n\|^2 - \alpha\eta\|n - z\|^2 - \kappa\eta\|m - z\|^2$$
for all $m, n, z \in H$ and $\kappa, \alpha, \eta \in [0, 1]$ with $\kappa + \alpha + \eta = 1$.
Lemma 3 
([18]). Let $\{a_n\}, \{\rho_n\} \subset \mathbb{R}^+$, $\{s_n\} \subset \mathbb{R}$ and $\{\zeta_n\} \subset [0, 1]$ be sequences such that
$$a_{n+1} \le (1 - \zeta_n)a_n + \zeta_n s_n + \rho_n$$
for all $n \in \mathbb{N}$. If the following conditions hold:
(i) $\sum_{n=1}^{\infty} \zeta_n = \infty$;
(ii) $\sum_{n=1}^{\infty} \rho_n < \infty$;
(iii) $\limsup_{n\to\infty} s_n \le 0$,
then $\lim_{n\to\infty} a_n = 0$.
Lemma 4 
([19]). Let $\{\Phi_n\}$ be a sequence of real numbers that does not decrease at infinity, in the sense that there exists a subsequence $\{\Phi_{n_i}\}$ of $\{\Phi_n\}$ satisfying $\Phi_{n_i} < \Phi_{n_i+1}$ for all $i \in \mathbb{N}$. Define the sequence $\{\lambda(n)\}_{n \ge n_0}$ of integers by
$$\lambda(n) := \max\{k \le n : \Phi_k < \Phi_{k+1}\},$$
where $n_0 \in \mathbb{N}$ is such that $\{k \le n_0 : \Phi_k < \Phi_{k+1}\} \ne \emptyset$. Then the following statements hold:
(i) $\lambda(n_0) \le \lambda(n_0 + 1) \le \cdots$ and $\lambda(n) \to \infty$;
(ii) $\Phi_{\lambda(n)} \le \Phi_{\lambda(n)+1}$ and $\Phi_n \le \Phi_{\lambda(n)+1}$ for all $n \ge n_0$.
Let $C$ be a nonempty closed convex subset of a Hilbert space $H$. The metric projection onto $C$, denoted by $P_C$, assigns to each $x \in H$ the unique element $P_C x \in C$ such that
$$\|x - P_C x\| = \inf_{y \in C}\|x - y\|.$$
It is known that
$$\breve{q} = P_C x \quad\Longleftrightarrow\quad \langle x - \breve{q}, y - \breve{q}\rangle \le 0$$
for all $y \in C$; see [16].

3. Results

Throughout this section, we let $\{T_n\}$ and $\Psi$ be families of nonexpansive operators on a real Hilbert space $H$ such that $F(\Psi) \subset \Gamma := \bigcap_{n=1}^{\infty} F(T_n)$, and let $S : H \to H$ be a contraction mapping with constant $k \in (0, 1)$.
To find a common fixed point of a countable family of nonexpansive operators in a real Hilbert space, we first propose a new accelerated algorithm and then, under certain conditions, prove a strong convergence theorem for it. We are now ready to introduce our accelerated algorithm as follows:
Theorem 1. 
Suppose that $\{T_n\}$ satisfies the condition (Z). Let $\{x_n\}$ be a sequence generated by Algorithm 4 which satisfies the following conditions:
(i) $0 < a \le \kappa_n \le b < 1$ for some $a, b \in \mathbb{R}$;
(ii) $\lim_{n\to\infty}\alpha_n = 0$ and $\sum_{n=1}^{\infty}\alpha_n = +\infty$;
(iii) $\lim_{n\to\infty}\gamma_n = 0$.
Then $\{x_n\}$ converges strongly to an element $\breve{q} \in \Gamma$, where $\breve{q} = P_\Gamma S(\breve{q})$.
Algorithm 4 An Inertial Viscosity Modified Picard (IVMP)
Initialization. Take $x_0, x_1 \in H$ arbitrarily and set $t_1 = 0$.
For $n \ge 1$, set
$$\theta_n = \begin{cases}\min\left\{\dfrac{t_n - 1}{t_{n+1}}, \dfrac{\gamma_n\alpha_n}{\|x_n - x_{n-1}\|}\right\}, & \text{if } x_n \ne x_{n-1},\\[1mm] \dfrac{t_n - 1}{t_{n+1}}, & \text{otherwise},\end{cases}$$
where $t_{n+1} = \dfrac{1 + \sqrt{1 + 4t_n^2}}{2}$.
Step 1. Compute $y_n$, $z_n$ and $x_{n+1}$ using:
$$y_n = x_n + \theta_n(x_n - x_{n-1}),$$
$$z_n = (1 - \alpha_n - \kappa_n)y_n + \alpha_n S(y_n) + \kappa_n T_n y_n,$$
$$x_{n+1} = T_n z_n,$$
where $\{\alpha_n\}, \{\kappa_n\} \subset [0, 1]$ with $\alpha_n + \kappa_n \le 1$.
Then, update n : = n + 1 and return to Step 1.
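For illustration only, a Python sketch of Algorithm 4 is given below; S is the contraction, T(n, v) evaluates $T_n$ at v, and the control sequences are passed as callables. The inertial safeguard follows our reading of the $\theta_n$ formula above and is an assumption rather than a verbatim transcription.

```python
import numpy as np

def ivmp(S, T, alphas, kappas, gammas, x0, x1, n_iter=500):
    # Sketch of Algorithm 4 (IVMP). S is the contraction, T(n, v) evaluates T_n at v,
    # and alphas, kappas, gammas are the control sequences of Theorem 1
    # (all assumed to be callables of the iteration index n).
    x_prev = np.asarray(x0, dtype=float).copy()
    x = np.asarray(x1, dtype=float).copy()
    t = 0.0                                    # t_1 = 0
    for n in range(1, n_iter + 1):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        theta = (t - 1.0) / t_next
        diff = np.linalg.norm(x - x_prev)
        if diff > 0.0:
            theta = min(theta, gammas(n) * alphas(n) / diff)
        y = x + theta * (x - x_prev)
        a, kap = alphas(n), kappas(n)
        z = (1.0 - a - kap) * y + a * S(y) + kap * T(n, y)
        x_prev, x = x, T(n, z)
        t = t_next
    return x
```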
Proof. 
Let $\breve{q} \in \Gamma$ be such that $\breve{q} = P_\Gamma S(\breve{q})$. By the definition of $y_n$ and $z_n$ in Algorithm 4, for each $n \in \mathbb{N}$, we have
$$\|y_n - \breve{q}\| = \|x_n + \theta_n(x_n - x_{n-1}) - \breve{q}\| \le \|x_n - \breve{q}\| + \theta_n\|x_n - x_{n-1}\|$$
and
$$\begin{aligned}
\|z_n - \breve{q}\| &= \|(1 - \alpha_n - \kappa_n)y_n + \alpha_n S(y_n) + \kappa_n T_n y_n - \breve{q}\|\\
&= \|(1 - \alpha_n - \kappa_n)(y_n - \breve{q}) + \alpha_n(S(y_n) - \breve{q}) + \kappa_n(T_n y_n - \breve{q})\|\\
&\le (1 - \alpha_n - \kappa_n)\|y_n - \breve{q}\| + \alpha_n\|S(y_n) - \breve{q}\| + \kappa_n\|T_n y_n - \breve{q}\|\\
&\le (1 - \alpha_n - \kappa_n)\|y_n - \breve{q}\| + \alpha_n k\|y_n - \breve{q}\| + \alpha_n\|S(\breve{q}) - \breve{q}\| + \kappa_n\|y_n - \breve{q}\|\\
&= (1 - (1 - k)\alpha_n)\|y_n - \breve{q}\| + \alpha_n\|S(\breve{q}) - \breve{q}\|\\
&\le (1 - (1 - k)\alpha_n)\big(\|x_n - \breve{q}\| + \theta_n\|x_n - x_{n-1}\|\big) + \alpha_n\|S(\breve{q}) - \breve{q}\|\\
&\le (1 - (1 - k)\alpha_n)\|x_n - \breve{q}\| + \alpha_n\left(\frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\| + \|S(\breve{q}) - \breve{q}\|\right).
\end{aligned}$$
Combining the two estimates above, we obtain
$$\begin{aligned}
\|x_{n+1} - \breve{q}\| &= \|T_n z_n - \breve{q}\| \le \|z_n - \breve{q}\|\\
&\le (1 - (1 - k)\alpha_n)\|x_n - \breve{q}\| + \alpha_n\left(\frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\| + \|S(\breve{q}) - \breve{q}\|\right)\\
&= (1 - (1 - k)\alpha_n)\|x_n - \breve{q}\| + (1 - k)\alpha_n\,\frac{\frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\| + \|S(\breve{q}) - \breve{q}\|}{1 - k}\\
&\le \max\left\{\|x_n - \breve{q}\|,\ \frac{\frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\| + \|S(\breve{q}) - \breve{q}\|}{1 - k}\right\}.
\end{aligned}$$
Since $\lim_{n\to\infty}\gamma_n = 0$, the definition of $\theta_n$ in Algorithm 4 gives $\frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\| \to 0$ as $n \to \infty$. Thus, there is a constant $M_1 > 0$ such that $\frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\| \le M_1$ for all $n \ge 1$. This implies
$$\|x_{n+1} - \breve{q}\| \le \max\left\{\|x_n - \breve{q}\|,\ \frac{M_1 + \|S(\breve{q}) - \breve{q}\|}{1 - k}\right\} \quad\text{for all } n \ge 1.$$
Let $M = \max\left\{\|x_1 - \breve{q}\|,\ \frac{M_1 + \|S(\breve{q}) - \breve{q}\|}{1 - k}\right\}$. We show that $\|x_n - \breve{q}\| \le M$ for all $n \in \mathbb{N}$. For $n = 1$, we get
$$\|x_2 - \breve{q}\| \le \max\left\{\|x_1 - \breve{q}\|,\ \frac{M_1 + \|S(\breve{q}) - \breve{q}\|}{1 - k}\right\} = M.$$
Suppose that $\|x_n - \breve{q}\| \le M$ for some $n \in \mathbb{N}$. It follows from the inequality above that
$$\|x_{n+1} - \breve{q}\| \le \max\left\{\|x_n - \breve{q}\|,\ \frac{M_1 + \|S(\breve{q}) - \breve{q}\|}{1 - k}\right\} \le \max\left\{M,\ \frac{M_1 + \|S(\breve{q}) - \breve{q}\|}{1 - k}\right\}.$$
Since $M = \max\left\{\|x_1 - \breve{q}\|,\ \frac{M_1 + \|S(\breve{q}) - \breve{q}\|}{1 - k}\right\}$, we have $\frac{M_1 + \|S(\breve{q}) - \breve{q}\|}{1 - k} \le M$, which implies
$$\|x_{n+1} - \breve{q}\| \le \max\left\{M,\ \frac{M_1 + \|S(\breve{q}) - \breve{q}\|}{1 - k}\right\} = M.$$
By mathematical induction, we conclude that $\|x_n - \breve{q}\| \le M$ for all $n \in \mathbb{N}$. Thus, $\|x_n\| \le \|x_n - \breve{q}\| + \|\breve{q}\| \le M + \|\breve{q}\|$ for all $n \in \mathbb{N}$. It follows that $\{x_n\}$ is bounded, and so are $\{y_n\}$, $\{z_n\}$, $\{T_n y_n\}$, $\{T_n z_n\}$ and $\{S(y_n)\}$.
For each $n \ge 1$, we have
$$\|y_n - \breve{q}\|^2 = \|x_n + \theta_n(x_n - x_{n-1}) - \breve{q}\|^2 = \|x_n - \breve{q}\|^2 + 2\langle x_n - \breve{q}, \theta_n(x_n - x_{n-1})\rangle + \theta_n^2\|x_n - x_{n-1}\|^2.$$
By Lemma 2, we get
$$\begin{aligned}
\|z_n - \breve{q}\|^2 &= \|(1 - \alpha_n - \kappa_n)y_n + \alpha_n S(y_n) + \kappa_n T_n y_n - \breve{q}\|^2\\
&= \|(1 - \alpha_n - \kappa_n)(y_n - \breve{q}) + \alpha_n(S(y_n) - \breve{q}) + \kappa_n(T_n y_n - \breve{q})\|^2\\
&\le \|(1 - \alpha_n - \kappa_n)(y_n - \breve{q}) + \alpha_n(S(y_n) - S(\breve{q})) + \kappa_n(T_n y_n - \breve{q})\|^2 + 2\alpha_n\langle S(\breve{q}) - \breve{q}, z_n - \breve{q}\rangle\\
&\le (1 - \alpha_n - \kappa_n)\|y_n - \breve{q}\|^2 + \alpha_n\|S(y_n) - S(\breve{q})\|^2 + \kappa_n\|T_n y_n - \breve{q}\|^2 + 2\alpha_n\langle S(\breve{q}) - \breve{q}, z_n - \breve{q}\rangle\\
&\le (1 - \alpha_n)\|y_n - \breve{q}\|^2 + \alpha_n k^2\|y_n - \breve{q}\|^2 + 2\alpha_n\langle S(\breve{q}) - \breve{q}, z_n - \breve{q}\rangle \quad\text{for all } n \ge 1.
\end{aligned}$$
Since $0 < k < 1$, combining this with the expansion of $\|y_n - \breve{q}\|^2$ above yields
$$\begin{aligned}
\|z_n - \breve{q}\|^2 &\le (1 - \alpha_n)\|y_n - \breve{q}\|^2 + \alpha_n k\|y_n - \breve{q}\|^2 + 2\alpha_n\langle S(\breve{q}) - \breve{q}, z_n - \breve{q}\rangle\\
&\le (1 - \alpha_n + \alpha_n k)\big(\|x_n - \breve{q}\|^2 + 2\langle x_n - \breve{q}, \theta_n(x_n - x_{n-1})\rangle + \theta_n^2\|x_n - x_{n-1}\|^2\big)\\
&\quad + 2\alpha_n\langle S(\breve{q}) - \breve{q}, z_n - \breve{q}\rangle \quad\text{for all } n \ge 1.
\end{aligned}$$
Using this estimate, we obtain
$$\begin{aligned}
\|x_{n+1} - \breve{q}\|^2 &= \|T_n z_n - \breve{q}\|^2 \le \|z_n - \breve{q}\|^2\\
&\le (1 - (1 - k)\alpha_n)\|x_n - \breve{q}\|^2 + (1 - k)\alpha_n\bigg[\frac{2\theta_n}{(1 - k)\alpha_n}\|x_n - \breve{q}\|\,\|x_n - x_{n-1}\|\\
&\quad + \frac{\theta_n^2}{(1 - k)\alpha_n}\|x_n - x_{n-1}\|^2 + \frac{2}{1 - k}\langle S(\breve{q}) - \breve{q}, z_n - \breve{q}\rangle\bigg].
\end{aligned}$$
In view of the above inequality, set
$$a_n := \|x_n - \breve{q}\|^2, \qquad \zeta_n := (1 - k)\alpha_n,$$
and
$$s_n := \frac{2\theta_n}{(1 - k)\alpha_n}\|x_n - \breve{q}\|\,\|x_n - x_{n-1}\| + \frac{\theta_n^2}{(1 - k)\alpha_n}\|x_n - x_{n-1}\|^2 + \frac{2}{1 - k}\langle S(\breve{q}) - \breve{q}, z_n - \breve{q}\rangle.$$
Then we obtain
$$a_{n+1} \le (1 - \zeta_n)a_n + \zeta_n s_n.$$
Now, we consider two cases for the convergence of the sequence $\{x_n\}$ generated by Algorithm 4.
Case 1. 
There exists $n_0 \in \mathbb{N}$ such that the sequence $\{\|x_n - \breve{q}\|\}_{n \ge n_0}$ is nonincreasing. Since $\{\|x_n - \breve{q}\|\}$ is bounded from below by zero, $\lim_{n\to\infty}\|x_n - \breve{q}\|$ exists. Using the assumptions $\lim_{n\to\infty}\alpha_n = 0$ and $\sum_{n=1}^{\infty}\alpha_n = +\infty$, we get that $\sum_{n=1}^{\infty}(1 - k)\alpha_n = (1 - k)\sum_{n=1}^{\infty}\alpha_n = +\infty$. In order to apply Lemma 3, we need to show that
$$\limsup_{n\to\infty}\langle S(\breve{q}) - \breve{q}, z_n - \breve{q}\rangle \le 0.$$
Using the three-term identity following Lemma 2, we get
$$\begin{aligned}
\|x_{n+1} - \breve{q}\|^2 &= \|T_n z_n - \breve{q}\|^2 \le \|z_n - \breve{q}\|^2\\
&= \|(1 - \alpha_n - \kappa_n)y_n + \alpha_n S(y_n) + \kappa_n T_n y_n - \breve{q}\|^2\\
&= \|(1 - \alpha_n - \kappa_n)(y_n - \breve{q}) + \alpha_n(S(y_n) - \breve{q}) + \kappa_n(T_n y_n - \breve{q})\|^2\\
&= (1 - \alpha_n - \kappa_n)\|y_n - \breve{q}\|^2 + \alpha_n\|S(y_n) - \breve{q}\|^2 + \kappa_n\|T_n y_n - \breve{q}\|^2\\
&\quad - (1 - \alpha_n - \kappa_n)\alpha_n\|y_n - S(y_n)\|^2 - \alpha_n\kappa_n\|S(y_n) - T_n y_n\|^2 - (1 - \alpha_n - \kappa_n)\kappa_n\|y_n - T_n y_n\|^2\\
&\le (1 - \alpha_n - \kappa_n)\|y_n - \breve{q}\|^2 + \alpha_n\|S(y_n) - \breve{q}\|^2 + \kappa_n\|y_n - \breve{q}\|^2 - (1 - \alpha_n - \kappa_n)\kappa_n\|y_n - T_n y_n\|^2.
\end{aligned}$$
This implies that
$$\begin{aligned}
(1 - \alpha_n - \kappa_n)\kappa_n\|y_n - T_n y_n\|^2 &\le (1 - \alpha_n)\|y_n - \breve{q}\|^2 + \alpha_n\|S(y_n) - \breve{q}\|^2 - \|x_{n+1} - \breve{q}\|^2\\
&\le (1 - \alpha_n)\big(\|x_n - \breve{q}\|^2 + 2\langle x_n - \breve{q}, \theta_n(x_n - x_{n-1})\rangle + \theta_n^2\|x_n - x_{n-1}\|^2\big)\\
&\quad + \alpha_n\|S(y_n) - \breve{q}\|^2 - \|x_{n+1} - \breve{q}\|^2\\
&\le (1 - \alpha_n)\big(\|x_n - \breve{q}\|^2 + 2\theta_n\|x_n - \breve{q}\|\,\|x_n - x_{n-1}\| + \theta_n^2\|x_n - x_{n-1}\|^2\big)\\
&\quad + \alpha_n\|S(y_n) - \breve{q}\|^2 - \|x_{n+1} - \breve{q}\|^2.
\end{aligned}$$
It follows from the assumptions, the convergence of $\{\|x_n - \breve{q}\|\}$ and the fact that $\theta_n\|x_n - x_{n-1}\| \to 0$ that $\lim_{n\to\infty}\|y_n - T_n y_n\| = 0$. For each $n \ge 1$, we have
$$\begin{aligned}
\|z_n - T_n z_n\| &= \|z_n - y_n + y_n - T_n y_n + T_n y_n - T_n z_n\|\\
&\le \|z_n - y_n\| + \|y_n - T_n y_n\| + \|T_n y_n - T_n z_n\|\\
&\le \|z_n - y_n\| + \|y_n - T_n y_n\| + \|y_n - z_n\|\\
&= 2\|z_n - y_n\| + \|y_n - T_n y_n\|\\
&= 2\|(1 - \alpha_n - \kappa_n)y_n + \alpha_n S(y_n) + \kappa_n T_n y_n - y_n\| + \|y_n - T_n y_n\|\\
&= 2\|\alpha_n(S(y_n) - y_n) + \kappa_n(T_n y_n - y_n)\| + \|y_n - T_n y_n\|\\
&\le 2\alpha_n\|S(y_n) - y_n\| + (1 + 2\kappa_n)\|y_n - T_n y_n\|.
\end{aligned}$$
This implies that $\lim_{n\to\infty}\|z_n - T_n z_n\| = 0$. Let
$$v = \limsup_{n\to\infty}\langle S(\breve{q}) - \breve{q}, z_n - \breve{q}\rangle.$$
Since $\{z_n\}$ is bounded, we can choose a subsequence $\{z_{n_k}\}$ of $\{z_n\}$ such that
$$v = \lim_{k\to\infty}\langle S(\breve{q}) - \breve{q}, z_{n_k} - \breve{q}\rangle$$
and $z_{n_k} \rightharpoonup w$ for some $w \in H$. It follows from the condition (Z) of $\{T_n\}$ that $w \in \Gamma$.
Moreover, using $\breve{q} = P_\Gamma S(\breve{q})$, we obtain
$$v = \lim_{k\to\infty}\langle S(\breve{q}) - \breve{q}, z_{n_k} - \breve{q}\rangle = \langle S(\breve{q}) - \breve{q}, w - \breve{q}\rangle = \langle S(\breve{q}) - P_\Gamma S(\breve{q}), w - \breve{q}\rangle \le 0.$$
Thus,
$$\limsup_{n\to\infty}\langle S(\breve{q}) - \breve{q}, z_n - \breve{q}\rangle \le 0.$$
This, together with the fact that $\frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\| \to 0$, implies that $\limsup_{n\to\infty} s_n \le 0$. Applying Lemma 3 to the inequality $a_{n+1} \le (1 - \zeta_n)a_n + \zeta_n s_n$, we conclude that $x_n \to \breve{q}$.
Case 2. 
Suppose that the sequence $\{\|x_n - \breve{q}\|\}_{n \ge n_0}$ is not monotonically decreasing for any $n_0$ large enough. Set
$$\Phi_n := \|x_n - \breve{q}\|^2.$$
Then there exists a subsequence $\{\Phi_{n_j}\}$ of $\{\Phi_n\}$ such that $\Phi_{n_j} < \Phi_{n_j+1}$ for all $j \in \mathbb{N}$. In this case, we define $\lambda : \{n : n \ge n_0\} \to \mathbb{N}$ by
$$\lambda(n) := \max\{k \in \mathbb{N} : k \le n,\ \Phi_k < \Phi_{k+1}\}.$$
By Lemma 4, we obtain $\Phi_{\lambda(n)} \le \Phi_{\lambda(n)+1}$ for all $n \ge n_0$. Then
$$\|x_{\lambda(n)} - \breve{q}\| \le \|x_{\lambda(n)+1} - \breve{q}\| \quad\text{for all } n \ge n_0.$$
By the same argument as in Case 1, we obtain
$$\|z_{\lambda(n)} - T_{\lambda(n)} z_{\lambda(n)}\| \le 2\alpha_{\lambda(n)}\|S(y_{\lambda(n)}) - y_{\lambda(n)}\| + (1 + 2\kappa_{\lambda(n)})\|y_{\lambda(n)} - T_{\lambda(n)} y_{\lambda(n)}\|$$
for all $n \ge n_0$. Hence $\|z_{\lambda(n)} - T_{\lambda(n)} z_{\lambda(n)}\| \to 0$ as $n \to \infty$.
Similarly, we have $\limsup_{n\to\infty}\langle S(\breve{q}) - \breve{q}, z_{\lambda(n)} - \breve{q}\rangle \le 0$.
Since $\Phi_{\lambda(n)} \le \Phi_{\lambda(n)+1}$ and $(1 - k)\alpha_{\lambda(n)} > 0$, we obtain
$$\begin{aligned}
\|x_{\lambda(n)} - \breve{q}\|^2 &\le \|x_{\lambda(n)+1} - \breve{q}\|^2\\
&\le (1 - (1 - k)\alpha_{\lambda(n)})\|x_{\lambda(n)} - \breve{q}\|^2 + (1 - k)\alpha_{\lambda(n)}\bigg[\frac{2\theta_{\lambda(n)}}{(1 - k)\alpha_{\lambda(n)}}\|x_{\lambda(n)} - \breve{q}\|\,\|x_{\lambda(n)} - x_{\lambda(n)-1}\|\\
&\quad + \frac{\theta_{\lambda(n)}^2}{(1 - k)\alpha_{\lambda(n)}}\|x_{\lambda(n)} - x_{\lambda(n)-1}\|^2 + \frac{2}{1 - k}\langle S(\breve{q}) - \breve{q}, z_{\lambda(n)} - \breve{q}\rangle\bigg],
\end{aligned}$$
which implies
$$\|x_{\lambda(n)} - \breve{q}\|^2 \le \frac{2\theta_{\lambda(n)}}{(1 - k)\alpha_{\lambda(n)}}\|x_{\lambda(n)} - \breve{q}\|\,\|x_{\lambda(n)} - x_{\lambda(n)-1}\| + \frac{\theta_{\lambda(n)}^2}{(1 - k)\alpha_{\lambda(n)}}\|x_{\lambda(n)} - x_{\lambda(n)-1}\|^2 + \frac{2}{1 - k}\langle S(\breve{q}) - \breve{q}, z_{\lambda(n)} - \breve{q}\rangle.$$
Since $\frac{\theta_{\lambda(n)}}{\alpha_{\lambda(n)}}\|x_{\lambda(n)} - x_{\lambda(n)-1}\| \to 0$ and $\limsup_{n\to\infty}\langle S(\breve{q}) - \breve{q}, z_{\lambda(n)} - \breve{q}\rangle \le 0$, it follows that
$$\limsup_{n\to\infty}\|x_{\lambda(n)} - \breve{q}\|^2 \le 0,$$
and hence $\|x_{\lambda(n)} - \breve{q}\| \to 0$ as $n \to \infty$. It then follows from
$$\begin{aligned}
\|x_{\lambda(n)+1} - \breve{q}\|^2 &\le (1 - (1 - k)\alpha_{\lambda(n)})\|x_{\lambda(n)} - \breve{q}\|^2 + (1 - k)\alpha_{\lambda(n)}\bigg[\frac{2\theta_{\lambda(n)}}{(1 - k)\alpha_{\lambda(n)}}\|x_{\lambda(n)} - \breve{q}\|\,\|x_{\lambda(n)} - x_{\lambda(n)-1}\|\\
&\quad + \frac{\theta_{\lambda(n)}^2}{(1 - k)\alpha_{\lambda(n)}}\|x_{\lambda(n)} - x_{\lambda(n)-1}\|^2 + \frac{2}{1 - k}\langle S(\breve{q}) - \breve{q}, z_{\lambda(n)} - \breve{q}\rangle\bigg]
\end{aligned}$$
that $\|x_{\lambda(n)+1} - \breve{q}\| \to 0$ as $n \to \infty$. Using Lemma 4, we obtain $\|x_n - \breve{q}\| \le \|x_{\lambda(n)+1} - \breve{q}\| \to 0$ as $n \to \infty$. Hence $x_n \to \breve{q}$. The proof is complete.    □
Now, we employ Algorithm 4 for solving Problem (1). We obtain the following result as a consequence of Theorem 1.
Theorem 2. 
Let $\omega : \mathbb{R}^n \to \mathbb{R}$ be strongly convex with parameter $\sigma > 0$ and continuously differentiable such that $\nabla\omega$ is Lipschitz continuous with constant $L_\omega$. Suppose that $f$ and $g$ satisfy the assumptions of Problem (2). Let $\{c_n\}$ be a sequence of positive real numbers in $(0, 2/L_f)$ such that $c_n \to c$ as $n \to \infty$, where $c \in (0, 2/L_f)$, and let $\{x_n\}$ be a sequence generated by Algorithm 5. Then $\{x_n\}$ converges strongly to $\breve{q} \in \Gamma$.
Algorithm 5 An Inertial Bilevel Gradient Modified Picard (IBiG-MP)
Initialization. Take $x_0, x_1 \in H$ arbitrarily and set $t_1 = 0$.
For $n \ge 1$, set
$$\theta_n = \begin{cases}\min\left\{\dfrac{t_n - 1}{t_{n+1}}, \dfrac{\gamma_n\alpha_n}{\|x_n - x_{n-1}\|}\right\}, & \text{if } x_n \ne x_{n-1},\\[1mm] \dfrac{t_n - 1}{t_{n+1}}, & \text{otherwise},\end{cases}$$
where $t_{n+1} = \dfrac{1 + \sqrt{1 + 4t_n^2}}{2}$.
Step 1. Compute $y_n$, $z_n$ and $x_{n+1}$ as follows:
$$y_n = x_n + \theta_n(x_n - x_{n-1}),$$
$$z_n = (1 - \alpha_n - \kappa_n)y_n + \alpha_n(I - s\nabla\omega)(y_n) + \kappa_n\operatorname{prox}_{c_n g}(I - c_n\nabla f)(y_n),$$
$$x_{n+1} = \operatorname{prox}_{c_n g}(I - c_n\nabla f)(z_n),$$
where $\{\alpha_n\}, \{\kappa_n\} \subset [0, 1]$ with $\alpha_n + \kappa_n \le 1$.
Then, update n : = n + 1 and return to Step 1.
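A corresponding sketch of Algorithm 5 simply instantiates the earlier ivmp sketch with $T_n = \operatorname{prox}_{c_n g}(I - c_n\nabla f)$ for $g = \lambda\|\cdot\|_1$ and $S = I - s\nabla\omega$; soft_threshold and ivmp are reused from the previous sketches, and c_seq, lam and the control sequences are assumed inputs.

```python
def ibig_mp(grad_f, grad_omega, lam, c_seq, s, alphas, kappas, gammas,
            x0, x1, n_iter=500):
    # Sketch of Algorithm 5 (IBiG-MP): Algorithm 4 with
    # T_n = prox_{c_n g}(I - c_n grad_f) for g = lam*||.||_1 (soft-thresholding)
    # and S = I - s*grad_omega; c_seq(n) in (0, 2/L_f) is an assumed input.
    def T(n, v):
        c = c_seq(n)
        return soft_threshold(v - c * grad_f(v), c * lam)

    def S(v):
        return v - s * grad_omega(v)

    return ivmp(S, T, alphas, kappas, gammas, x0, x1, n_iter)
```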
Proof. 
Let $T_n = \operatorname{prox}_{c_n g}(I - c_n\nabla f)$ for each $n \in \mathbb{N}$ and $T = \operatorname{prox}_{cg}(I - c\nabla f)$. By Lemma 1 and Remark 1, $\{T_n\}$ satisfies the condition (Z). From Theorem 1, we get that $\{x_n\}$ converges to $\breve{q} \in \Gamma = \operatorname{argmin}_{x\in\mathbb{R}^n}(f(x) + g(x))$. Notice also that $S = I - s\nabla\omega$ is a $k$-contraction with $k = \sqrt{1 - \frac{2s\sigma L_\omega}{\sigma + L_\omega}}$ whenever $s \in (0, 2/(\sigma + L_\omega))$. It remains to show that the variational inequality characterizing Problem (1) holds. Using $\breve{q} = P_\Gamma S(\breve{q})$ and the characterization of the metric projection, for all $z \in \Gamma$ we obtain
$$\begin{aligned}
\breve{q} = P_\Gamma S(\breve{q}) &\Longleftrightarrow \langle S(\breve{q}) - \breve{q}, z - \breve{q}\rangle \le 0\\
&\Longleftrightarrow \langle \breve{q} - s\nabla\omega(\breve{q}) - \breve{q}, z - \breve{q}\rangle \le 0\\
&\Longleftrightarrow \langle -s\nabla\omega(\breve{q}), z - \breve{q}\rangle \le 0\\
&\Longleftrightarrow \langle s\nabla\omega(\breve{q}), z - \breve{q}\rangle \ge 0\\
&\Longleftrightarrow \langle \nabla\omega(\breve{q}), z - \breve{q}\rangle \ge 0 \quad\text{for all } z \in \Gamma = X^*.
\end{aligned}$$
Thus, $\breve{q}$ is an optimal solution of Problem (1). □

4. Application

In this section, we employ Algorithm 5 as a machine learning algorithm for regression (fitting the graph of the sine function) and for data classification, using a model of single hidden layer feedforward neural networks (SLFNs) trained by the extreme learning machine. All experiments were performed in MATLAB on an Intel Core i5 (8th generation) processor with 8 GB of RAM (Intel Corporation, Santa Clara, CA, USA).
We first recall some basic facts about the extreme learning machine for regression and classification problems. Moreover, we use the proposed algorithm to solve these problems and compare its performance with that of BiG-SAM, iBiG-SAM and aiBiG-SAM.
Extreme learning machine (ELM) [20] is described as follows. Let $D = \{(x_d, q_d) : x_d \in \mathbb{R}^n, q_d \in \mathbb{R}^m, d = 1, 2, \ldots, N\}$ be a training set of $N$ distinct samples, where $x_d$ is an input and $q_d$ is a target. A standard SLFN with $M$ hidden nodes and activation function $\varphi(x)$ is modeled as
$$\sum_{j=1}^{M}\xi_j\varphi(\langle p_j, x_d\rangle + c_j) = o_d, \quad d = 1, \ldots, N,$$
where $\xi_j$ is the weight vector connecting the $j$-th hidden node and the output nodes, $p_j$ is the weight vector connecting the input nodes and the $j$-th hidden node, and $c_j$ is the bias of the $j$-th hidden node. The aim of an SLFN is to predict the $N$ targets exactly, that is, $\sum_{d=1}^{N}\|o_d - q_d\| = 0$, or equivalently,
$$\sum_{j=1}^{M}\xi_j\varphi(\langle p_j, x_d\rangle + c_j) = q_d, \quad d = 1, \ldots, N.$$
This system of linear equations can be rewritten in the matrix form
$$R\xi = Q,$$
where
$$R = \begin{bmatrix}\varphi(\langle p_1, x_1\rangle + c_1) & \cdots & \varphi(\langle p_M, x_1\rangle + c_M)\\ \vdots & \ddots & \vdots\\ \varphi(\langle p_1, x_N\rangle + c_1) & \cdots & \varphi(\langle p_M, x_N\rangle + c_M)\end{bmatrix}_{N\times M},\quad \xi = [\xi_1^T, \ldots, \xi_M^T]^T_{M\times m},\quad Q = [q_1^T, \ldots, q_N^T]^T_{N\times m}.$$
The objective of an SLFN is to estimate $\xi_j$, $p_j$ and $c_j$ so as to solve the above system, whereas ELM estimates only the output weights $\xi_j$, with $p_j$ and $c_j$ chosen randomly. To find $\xi$, the linear system $R\xi = Q$ can be considered via the following convex minimization problem:
$$\min_{\xi}\ \|R\xi - Q\|_2^2 + \lambda\|\xi\|_1,$$
where $\lambda > 0$ is the regularization parameter. In Algorithm 5, we set $f(\xi) = \|R\xi - Q\|_2^2$ and $g(\xi) = \lambda\|\xi\|_1$. We employ Algorithm 5 to solve the convex bilevel optimization problems (1) and (2), where the outer level function is $\omega(\xi) = \frac{1}{2}\|\xi\|_2^2$.
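The following hypothetical helper shows how the pieces of this minimization problem can be assembled from a training set; the function name elm_setup, the random seed and the use of NumPy are assumptions for illustration, not part of the original experiments.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def elm_setup(X, Q, M, seed=0):
    # Hypothetical helper: builds the hidden-layer output matrix R with random
    # input weights p_j and biases c_j (only the output weights xi are learned),
    # and returns the gradient and Lipschitz constant of f(xi) = ||R xi - Q||_2^2.
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((X.shape[1], M))      # input weights p_j as columns
    c = rng.standard_normal(M)                    # biases c_j
    R = sigmoid(X @ P + c)                        # N x M hidden-layer matrix
    grad_f = lambda xi: 2.0 * R.T @ (R @ xi - Q)  # gradient of ||R xi - Q||_2^2
    L_f = 2.0 * np.linalg.norm(R, 2) ** 2         # Lipschitz constant 2*||R||^2
    return R, grad_f, L_f
```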

4.1. Regression of a Sine Function

In our experiment on regression of the sine function, we construct a training set by randomly selecting 10 distinct data points and use the sigmoid activation function. We set the number of hidden nodes $M = 100$ and the regularization parameter $\lambda = 1\times10^{-5}$. In Algorithm 5, we set $T_n = \operatorname{prox}_{c_n g}(I - c_n\nabla f)$, and the Lipschitz constant of $\nabla f$ is computed as $L_f = 2\|R\|^2$. The parameter values listed in Table 1 are used for all methods. We evaluate the results by the mean squared error
$$\mathrm{MSE} = \frac{1}{N}\sum_{d=1}^{N}\|o_d - q_d\|^2.$$
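A hypothetical end-to-end run of the sine experiment, built from the sketches above, might look as follows; the sampled data and the control sequences are illustrative choices consistent with the conditions of Theorem 2, not necessarily the authors' exact settings.

```python
import numpy as np

# Ten random training points of the sine function (an assumed sample, not the authors' data).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0 * np.pi, size=(10, 1))
Q = np.sin(X)

lam = 1e-5
R, grad_f, L_f = elm_setup(X, Q, M=100)

# Outer-level objective omega(xi) = 0.5*||xi||_2^2, so grad omega(xi) = xi.
xi = ibig_mp(grad_f, lambda v: v, lam,
             c_seq=lambda n: 1.0 / L_f, s=0.01,
             alphas=lambda n: 1.0 / (55.0 * n),
             kappas=lambda n: 0.499 * n / (n + 1.0),   # assumed choice with alpha_n + kappa_n <= 1
             gammas=lambda n: 55.0 * 1e20 / n,
             x0=np.zeros((100, 1)), x1=np.zeros((100, 1)), n_iter=500)

mse = np.mean((R @ xi - Q) ** 2)
print("training MSE:", mse)
```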
We then compare the results with those of BiG-SAM, iBiG-SAM and aiBiG-SAM in Figure 1 and Table 2.
Table 2 and Figure 1 show that Algorithm 5 predicts the sine function more accurately than the other methods, while the processing times differ only slightly.

4.2. Data Classification

In order to classify datasets, we use four datasets from https://www.kaggle.com/ (accessed on 20 June 2020) and https://archive.ics.uci.edu/ (accessed on 20 June 2020), as follows:
Breast Cancer dataset [21]: the dataset contains 11 attributes; we classify two classes of data.
Heart Disease UCI dataset [22]: the dataset contains 14 attributes; there are two classes in this dataset.
Diabetes dataset [23]: the dataset contains 9 attributes; we classify two classes of data.
Iris dataset [24]: the dataset contains 4 attributes and 3 classes of iris plant; we aim to classify each type of iris plant (Iris versicolour, Iris virginica and Iris setosa).
Table 3 shows the number of attributes of each dataset and the sizes of the training set (about 70% of the data) and testing set (the remaining 30%).
We set all control parameters as in Table 1 of Section 4.1, with $M = 100$ hidden nodes and the sigmoid activation function. Given the training set of each dataset as described in Table 3, the accuracy of the output data is calculated by
$$\text{accuracy} = \frac{\text{correctly predicted data}}{\text{all data}} \times 100.$$
We compare the number of iterations, training accuracy and testing accuracy of Algorithm 5 with those of the other methods on each dataset in Table 4.
From Table 4, Algorithm 5 achieves better accuracy than BiG-SAM, iBiG-SAM and aiBiG-SAM in all experiments conducted.

5. Conclusions

We propose a new common fixed point algorithm for a countable family of nonexpansive operators and apply it to solve some convex bilevel optimization problems. We prove a strong convergence theorem for the proposed algorithm under suitable conditions. Moreover, we apply our algorithm to classification and regression problems and report numerical experiments comparing its performance with that of existing algorithms; the results show that the proposed algorithm is more efficient than the existing algorithms in the literature.

Author Contributions

Conceptualization, S.S.; Formal analysis, P.T. and S.S.; Investigation, P.T.; Methodology, S.S.; Supervision, S.S.; Validation, S.S. and B.P.; Writing—original draft, P.T.; Writing—review and editing, S.S. and B.P. All authors have read and agreed to the published version of the manuscript.

Funding

NSRF program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F640183].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Acknowledgments

The authors would like to thank the referees for valuable comments and suggestions for improving this work. This research has received funding support from the NSRF program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F640183] and Chiang Mai University. The first author would like to thank Science Achievement Scholarship of Thailand (SAST) for the financial support. The second author was partially supported by Chiang Mai University under Fundamental Fund 2023.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: New York, NY, USA, 2011.
  2. Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979.
  3. Polyak, B. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17.
  4. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202.
  5. Bussaban, L.; Suantai, S.; Kaewkhao, A. A parallel inertial S-iteration forward-backward algorithm for regression and classification problems. Carpathian J. Math. 2020, 36, 35–44.
  6. Puangpee, J.; Suantai, S. A new accelerated viscosity iterative method for an infinite family of nonexpansive mappings with applications to image restoration problems. Mathematics 2020, 8, 615.
  7. Jailoka, P.; Suantai, S.; Hanjing, A. A fast viscosity forward-backward algorithm for convex minimization problems with an application in image recovery. Carpathian J. Math. 2021, 37, 449–461.
  8. Sabach, S.; Shtern, S. A first order method for solving convex bilevel optimization problems. SIAM J. Optim. 2017, 27, 640–660.
  9. Xu, H.K. Viscosity approximation methods for nonexpansive mappings. J. Math. Anal. Appl. 2004, 298, 279–291.
  10. Shehu, Y.; Vuong, P.T.; Zemkoho, A. An inertial extrapolation method for convex simple bilevel optimization. Optim. Methods Softw. 2019, 1–20.
  11. Duan, P.; Zhang, Y. Alternated and multi-step inertial approximation methods for solving convex bilevel optimization problems. Optimization 2022, 1–29.
  12. Nakajo, K.; Shimoji, K.; Takahashi, W. Strong convergence to common fixed points of families of nonexpansive mappings in Banach spaces. J. Nonlinear Convex Anal. 2007, 8, 11–34.
  13. Aoyama, K.; Kimura, Y. Strong convergence theorems for strongly nonexpansive sequences. Appl. Math. Comput. 2011, 217, 7537–7545.
  14. Aoyama, K.; Kohsaka, F.; Takahashi, W. Strong convergence theorems by shrinking and hybrid projection methods for relatively nonexpansive mappings in Banach spaces. In Nonlinear Analysis and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009; pp. 2–7.
  15. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: New York, NY, USA, 2004.
  16. Takahashi, W. Introduction to Nonlinear and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009.
  17. Takahashi, W. Nonlinear Functional Analysis; Yokohama Publishers: Yokohama, Japan, 2000.
  18. Xu, H.K. Another control condition in an iterative method for nonexpansive mappings. Bull. Aust. Math. Soc. 2002, 65, 109–113.
  19. Maingé, P.E. Strong convergence of projected subgradient methods for nonsmooth and nonstrictly convex minimization. Set-Valued Anal. 2008, 16, 899–912.
  20. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
  21. Wolberg, W.H.; Mangasarian, O.L. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Natl. Acad. Sci. USA 1990, 87, 9193–9196.
  22. Detrano, R.; Janosi, A.; Steinbrunn, W.; Pfisterer, M.; Schmid, J.J.; Sandhu, S.; Guppy, K.H.; Lee, S.; Froelicher, V. International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 1989, 64, 304–310.
  23. Smith, J.W.; Everhart, J.E.; Dickson, W.C.; Knowler, W.C.; Johannes, R.S. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. Proc. Symp. Comput. Appl. Med. Care 1988, 261–265.
  24. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188.
Figure 1. (a) A regression of the sine function at 100th step; (b) A regression of the sine function at 500th step.
Table 1. Setting parameters of each algorithm.
Algorithm 5: $s = 0.01$, $c_n = 1/L_f$, $t_1 = 0$, $\alpha_n = \frac{1}{55n}$, $\kappa_n = \frac{0.499(n+1)}{n}$, $\gamma_n = \frac{55\cdot 10^{20}}{n}$
BIG-SAM:     $s = 0.01$, $\alpha = 3$, $c_n = 1/L_f$, $\alpha_n = 2(0.1)^{\frac{1}{2} + \frac{c_n L_f}{4}}$, $\gamma_n = \frac{\alpha_n}{n^{0.01}}$
iBIG-SAM:    $s = 0.01$, $\alpha = 3$, $c_n = 1/L_f$, $\alpha_n = 2(0.1)^{\frac{1}{2} + \frac{c_n L_f}{4}}$, $\gamma_n = \frac{\alpha_n}{n^{0.01}}$
aiBIG-SAM:   $s = 0.01$, $\alpha = 3$, $c_n = 1/L_f$, $\alpha_n = \frac{1}{k+2}$, $\gamma_n = \frac{\alpha_n}{n^{0.01}}$
Table 2. Numerical results for regression of a sine function with 500 iterations.
Methods       MSE        Computational Time
Algorithm 5   0.0011032  0.0216
BIG-SAM       0.3737940  0.0162
iBIG-SAM      0.3738094  0.0182
aiBIG-SAM     0.3685550  0.0172
Table 3. Training and testing sets of each dataset.
Dataset        Attributes  Train  Test
Breast Cancer  11          489    210
Heart Disease  14          213    90
Diabetes       9           538    230
Iris           4           105    45
Table 4. The iteration number of each algorithm with the best accuracy on each dataset.
Dataset        Algorithm    Iteration No.  Accuracy Train  Accuracy Test
Breast Cancer  Algorithm 5  577            96.55           99.50
               BIG-SAM      1500           96.55           98.49
               iBIG-SAM     1500           96.55           98.49
               aiBIG-SAM    1500           96.55           98.49
Heart Disease  Algorithm 5  185            87.14           82.80
               BIG-SAM      1797           86.19           82.80
               iBIG-SAM     1756           86.19           82.80
               aiBIG-SAM    2501           86.67           82.80
Diabetes       Algorithm 5  100            77.11           81.53
               BIG-SAM      690            76.01           81.08
               iBIG-SAM     695            76.37           81.08
               aiBIG-SAM    1280           76.92           81.18
Iris           Algorithm 5  190            98.10           100.00
               BIG-SAM      781            94.29           95.56
               iBIG-SAM     777            94.29           95.56
               aiBIG-SAM    771            94.29           95.56
