Parallel Variants of Broyden’s Method

Bistran, Ioan; Maruster, Stefan; Mafteiu-Scai, Liviu Octavian

doi:10.3390/a8030774

Open AccessArticle

Parallel Variants of Broyden’s Method

by

Ioan Bistran

,

Stefan Maruster

^* and

Liviu Octavian Mafteiu-Scai

West University of Timisoara, B-dul V. Parvan No.4, Timisoara 300223, Romania

^*

Author to whom correspondence should be addressed.

Algorithms 2015, 8(3), 774-785; https://0-doi-org.brum.beds.ac.uk/10.3390/a8030774

Submission received: 23 June 2015 / Revised: 27 August 2015 / Accepted: 1 September 2015 / Published: 15 September 2015

(This article belongs to the Special Issue Numerical Algorithms for Solving Nonlinear Equations and Systems)

Download

Browse Figures

Versions Notes

Abstract

:

In this paper we investigate some parallel variants of Broyden’s method and, for the basic variant, we present its convergence properties. The main result is that the behavior of the considered parallel Broyden’s variants is comparable with the classical parallel Newton method, and significantly better than the parallel Cimmino method, both for linear and nonlinear cases. The considered variants are also compared with two more recently proposed parallel Broyden’s method. Some numerical experiments are presented to illustrate the advantages and limits of the proposed algorithms.

Keywords:

systems of equations; Broyden’s method; parallel algorithms

1. Introduction

Let

F : R^{m} \to R^{m}

be a (nonlinear) mapping and let

F (x) = 0

be the corresponding system of equations. In [1], Xu proposed the following partially asynchronous block quasi-Newton method for such a nonlinear system. Let

P

be a partition of F and x in p partitions,

F = (F_{1}, . . ., F_{p})

,

x = (x_{1}, . . ., x_{p})

, where

F_{i}

and

x_{i}

are block functions and block variables, respectively, of the same dimensions

m_{i}

. The variable x is the common memory of the multiprocessor system, every processor i can read the content of x and can write its new update of

x_{i}

. Let

B_{i}

,

i = 1, . . ., p

, be square matrices of dimension

m_{i}

. The main steps of the process i are :

Algorithm 1.

Input:

x, B_{i}

Output: x

1. Compute

x_{i}^{+} : = x_{i} - B_{i}^{- 1} F_{i} (x);

2. Get

x^{+}

from x by replacing

x_{i}

with

x_{i}^{+};

3. Compute

s = x_{i}^{+} - x_{i}

and

y : = F_{i} (x^{+}) - F_{i} (x);

4. Update

B_{i}, B_{i} : = B_{i} +

\frac{(y - B_{i} s) s^{T}}{s^{T} s}

;

5.

x_{i} : = x_{i}^{+}, x : = x^{+};

Applying the Sherman–Morrison lemma (Step 4, Algorithm 1) to inverse the matrix

B_{i}

, the polynomial complexity of order three is reduced to polynomial complexity of order two.

More recently, in [2,3] Jiang and Yang proposed several preconditioners for the block Broyden’s method and discussed the implementation process. The main steps of parallel variant of Broyden’s method considered in [2,3] (Algorithm 2 below) are (

B_{0}

is a block diagonal matrix,

B_{k}^{i}

is the

i^{t h}

diagonal block of

B_{k}

):

Algorithm 2.

Input:

x_{0}, B_{0}, k m a x

Output:

x_{k}

1. For

k = 0, 1, . .

Until

c o n v e r g e n c e

Or

k > k m a x

Do

2.

x_{k + 1} = x_{k} - B_{k}^{- 1} F (x_{k})

;

3.

s = x_{k + 1} - x_{k}

;

4. For

i = 1, . . ., p

5.

B_{k}^{i} = B_{k}^{i} +

\frac{F_{i} (x_{k + 1}) s_{i}^{T}}{s^{T} s}

.

6. End

7. End

In the case of a linear system, the sequential Broyden’s method has global convergence provided that the system matrix is invertible [4], i.e., the sequence generated by this method converges to the solution of the system for any starting point

x_{0}

and for any starting matrix

B_{0}

(or

H_{0}

). Moreover, in this case the convergence is finite [5], and the solution is obtained in at most

2 n

steps; conditions in which this method requires a full

2 n

steps to converge are also given. In the same paper [5], it is shown that Broyden’s method has

2 n

-step Q-quadratic local convergence for the nonlinear case (the initial starting point

x_{0}

and the initial starting matrix

B_{0}

must be close to the solution point of the system and to the Jacobian matrix in the solution point, respectively). Note that

2 n

-step Q-quadratic convergence means that the subsequence

{x_{2 n}}

of

{x_{n}}

has Q-quadratic convergence.

In this paper we propose some new variants of parallel Broyden’s method that are suitable both for linear and nonlinear systems. We prove the convergence of our basic algorithm in the linear case. Numerical experiments are presented for linear and nonlinear cases, and some remarks are made concerning the behavior of the generated sequence.

2. The Parallel Variants and Preliminaries

In the case of a linear mapping,

F (x) = A x - b

, the Broyden’s “good formula” [6,7],

B^{+} = B + θ (y - B s) s^{T} / s^{T} s

(B and

B^{+}

are the current and the next iterates, respectively, and θ is a positive number chosen such that

B^{+}

is invertible [4]) can be written in the following form:

B^{+} = B - θ (B - A) s s^{T} / s^{T} s

. Using the notation

E = B - A

, the Broyden’s method becomes

x^{+} = x - {(E + A)}^{- 1} F (x), E^{+} = E - θ E s s^{T} / s^{T} s

. The main idea of the proposed variants is to use instead of A the block diagonal matrix of A, denoted by D throughout the paper, partitioned according to

P

. The computer routine

d g (A)

will produce such a block diagonal matrix, i.e.,

D : = d g (A)

.

Based on this idea we can consider several parallel variants of Broyden’s algorithm. The first variant is just the above mentioned algorithm, to which we added a supplementary step to successively block-diagonalize the matrix E (the Step 5, Algorithm 3 below). We consider also the following slight modification of update formula

{\tilde{E}}^{+} : = E - θ \frac{(E + D) s_{n} s_{n}^{T}}{∥ s_{n} ∥^{2}} .

By adding D to both sides of this formula, the algorithm becomes more homogeneous (the same matrix is used in both Steps 2 and 4, Algorithm 3 below). Moreover (and not in the least) for this variant, a simple and elegant proof can be given for the convergence of the generated sequence (Theorem 1). Therefore the first our variant, which will be considered as the basic algorithm, is (

E_{0}

is a block diagonal matrix):

Algorithm 3.

Input:

x_{0}, E_{0}, n m a x

Output:

x_{n}

1: For

n = 0, 1, . .

Until

c o n v e r g e n c e

Or

n > n m a x

Do

2:

x_{n + 1} : = x_{n} - {(E_{n} + D)}^{- 1} F (x_{n});

3:

s_{n} : = x_{n + 1} - x_{n};

4:

{\tilde{E}}_{n + 1} : = E_{n} - θ

\frac{(E_{n} + D) s_{n} s_{n}^{T}}{∥ s_{n} ∥^{2}};

5:

E_{n + 1} : = d g ({\tilde{E}}_{n + 1}) .

6. End

A variant of Algorithm 3 can be obtained by applying the Sherman–Morrison lemma to avoid the inversion of a matrix in Step 2. It results the following Algorithm 4 (

B_{0}

is a block diagonal matrix):

Algorithm 4.

Input:

x_{0}, B_{0}, n m a x

Output:

x_{n}

1: For

n = 0, 1, . .

Until

c o n v e r g e n c e

Or

n > n m a x

Do

2:

x_{n + 1} : = x_{n} - B_{n} F (x_{n});

3:

s_{n} : = x_{n + 1} - x_{n};

4:

{\tilde{B}}_{n + 1} : = (I +

\frac{θ}{1 - θ} \frac{s_{n} s_{n}^{T}}{∥ s_{n} ∥^{2}}

)

B_{n}

:

5:

B_{n + 1} : = d g ({\tilde{B}}_{n + 1}) .

6. End

The simplest way to design a similar parallel algorithm for the nonlinear case is to replace D in the basic algorithm with the diagonal block of the Jacobian of F,

D = D (x) = d g (J (x))

. It results the following Algorithm 5 for the nonlinear case:

Algorithm 5.

Input:

x_{0}, E_{0}, n m a x

Output:

x_{n}

1: For

n = 0, 1, . .

Until

c o n v e r g e n c e

Or

n > n m a x

Do

2:

x_{n + 1} : = x_{n} - {(E_{n} + D (x_{n}))}^{- 1} F (x_{n});

3:

s_{n} : = x_{n + 1} - x_{n};

4:

{\tilde{E}}_{n + 1} : = E_{n} - θ

\frac{(E_{n} + D (x_{n})) s_{n} s_{n}^{T}}{∥ s_{n} ∥^{2}};

5:

E_{n + 1} : = d g ({\tilde{E}}_{n + 1}) .

6. End

Remark 1. The formal replacement of D with

D (x_{n})

is based on the mean value theorem

F (x_{n + 1}) \approx F (x_{n}) + J (x_{n}) (x_{n + 1} - x_{n})

. However, the convergence properties of Algorithm 5, including its specific conditions, rate of convergence, etc., must be further analyzed as well. The numerical experiments (Section 5) performed so far for certain nonlinear systems show a comparable behavior of the two Algorithms (3 and 5) for the linear and nonlinear case.

The main processing of the Algorithms 3, 5, is the computation of

{(E_{n} + D)}^{- 1}

or the solution of a corresponding linear system. Taking into account that

E_{n} + D

is a block diagonal matrix,

E_{n} + D = d i a g ({(E_{n} + D)}_{1}, . . ., {(E_{n} + D)}_{p})

, where

{(E_{n} + D)}_{i}

,

i = 1, . . ., p

, are matrices of dimension

m_{i}

, we have to perform p subtasks, whose main processing is

{(E_{n} + D)}_{i}^{- 1}

. It is obvious that these subtasks can be done in parallel. The successive updates of the matrix E, Steps 4 and 5, can also be executed in parallel. The concrete parallel implementation of this algorithm is an interesting problem and is left for a future research.

As usual

R^{m \times m}

denotes the set of square

m \times m

real matrices. We use

∥ \cdot ∥

and

{∥ \cdot ∥}_{F}

to denote the spectral and Frobenius norm respectively. The properties of Algorithm 3 are based on the following two lemmas.

Lemma 1. Let

D \in R^{m \times m}

be a block diagonal matrix (conforming with

P

). Then, for any matrix

E \in R^{m \times m}

, there holds

{∥ d g (E) + D ∥}_{p} \leq {∥ E + D ∥}_{p},

where

{∥ \cdot ∥}_{p}

is the spectral or Frobenius norm. If

θ \in [0, 2]

then the sequence

{∥ E_{n} ∥}

as defined in Algorithm 3 is bounded,

∥ E_{n} ∥ \leq a

.

Proof. Using the inequality

{∥ d g (A) ∥}_{p} \leq {∥ A ∥}_{p}, A \in R^{m \times m}

(which is true for both spectral and Frobenius norm), we have

{∥ d g (E) + D ∥}_{p} = {∥ d g (E + D) ∥}_{p} \leq {∥ E + D ∥}_{p} .

To prove the second part of Lemma 1, observe that if

θ \in [0, 2]

then

∥ I - θ \frac{s s^{T}}{{∥ s ∥}^{2}} ∥ = 1

for any

s \in R^{m}

. Therefore

∥ E_{n + 1} + D ∥ \leq ∥ {\tilde{E}}_{n + 1} + D ∥ = ∥ E_{n} + D - θ \frac{(E_{n} + D) s_{n} s_{n}^{T}}{∥ s_{n} ∥^{2}} ∥ \leq ∥ E_{n} + D ∥ ∥ I - θ \frac{s_{n} s_{n}^{T}}{∥ s_{n} ∥^{2}} ∥ = ∥ E_{n} + D ∥ .

Thus

∥ E_{n} + D ∥ \leq ∥ E_{0} + D ∥, \forall n

and

{∥ E_{n} ∥}

is bounded. ☐

Lemma 2. For any matrix

M \in R^{m \times m}

, any

s \in R^{m}

, any

θ \in R

, and

\tilde{M} : = M - θ M s s^{T} / {∥ s ∥}^{2}

we have

∥ \tilde{M} ∥_{F}^{2} = {∥ M ∥}_{F}^{2} - θ (2 - θ) \frac{{∥ M s ∥}^{2}}{{∥ s ∥}^{2}} .

(1)

(The Formula (1) appears also in [4]; a short proof is given here for completeness).

Proof. By trivial computation we obtain

\begin{array}{l} {∥ \tilde{M} ∥}_{F}^{2} = ∥ M - θ \frac{M s s^{T}}{{∥ s ∥}^{2}} ∥_{F}^{2} \\ = {∥ M ∥}_{F}^{2} - 2 \frac{θ}{{∥ s ∥}^{2}} {〈 M, M s s^{T} 〉}_{F} + \frac{θ^{2}}{{∥ s ∥}^{4}} ∥ M s s^{T} ∥_{F}^{2} \end{array}

.

Use

{〈 M, M x x^{T} 〉}_{F} = {∥ M x ∥}^{2}

,

∥ M x x^{T} ∥_{F} = ∥ M x ∥ ∥ x ∥

, and the desired equality results. ☐

3. The Convergence of Algorithm 3 in the Linear Case

In the following, we suppose that the sequence

{x_{n}}

given by Algorithm 3, with starting point

x_{0}

, starting matrix

E_{0}

and certain θ, satisfies the condition:

∥ x_{n + 1} - x_{n} ∥ \leq β, \forall n \in N .

(2)

Algorithm 3 defines the next iteration as a function of the current iteration,

x_{n + 1} = G (x_{n})

, G being the iteration function or generation function. It is clear that the condition (2) is weaker than the condition of asymptotic regularity of G or asymptotic regularity of

x_{0}

under G, (

∥ G^{n + 1} x_{0} - G^{n} x_{0} ∥ \to 0, n \to \infty

). The fulfillment of condition (2) depends not only on the characteristics of A (like the condition number) but also on

x_{0}, E_{0}

and θ. Note also that usually, the sequence

{E_{n}}

is bounded, and if F is also bounded on

R^{m}

then the condition (2) is satisfied; in Section 5 an example of mapping is given which is bounded on

R^{m}

and

∥ D {(x_{n})}^{- 1} ∥ ∥ E_{n} ∥ < 1

.

Theorem 1. Let

A \in R^{m \times m}

be a nonsingular matrix,

D = d g (A)

, and let θ be a real number,

0 < θ < 2

. Suppose that D is invertible and that

a ∥ D^{- 1} ∥ < 1

, where a is defined in Lemma 1. Suppose also that condition (2) is satisfied and that

E_{0} + D

is invertible. Then the parallel variant of Broyden method (Algorithm 3) converges to the solution of the equation

A x - b = 0

.

Proof. Since

∥ E_{n} ∥ ∥ D^{- 1} ∥ \leq ∥ D^{- 1} ∥ a < 1

, from perturbation Lemma, it results that

{(E_{n} + D)}^{- 1}

exists for all n and the sequence

{x_{n}}

is well defined by Algorithm 3. By taking

M = E_{k} + D, s = s_{k}

and applying Lemmas 1 and 2, we obtain

∥ E_{k + 1} {+ D ∥}_{F}^{2} \leq {∥ E_{k} + D ∥}_{F}^{2} - θ (2 - θ) \frac{∥ (E_{k} + D) s_{k} ∥^{2}}{∥ s_{k} ∥^{2}} .

By summing on k,

k = 0, 1, . . ., n

, we have

0 \leq ∥ E_{n + 1} {+ D ∥}_{F}^{2} \leq {∥ E_{0} + D ∥}_{F}^{2} - θ (2 - θ) \sum_{k = 0}^{\infty} \frac{∥ (E_{k} + D) s_{k} ∥^{2}}{∥ s_{k} ∥^{2}} .

This implies that

\frac{∥ (E_{n} + D) s_{n} ∥}{∥ s_{n} ∥} \to 0, n \to \infty .

Now, because

(E_{n} + D) s_{n} = - F (x_{n})

, we have

\frac{∥ F (x_{n}) ∥}{β} \leq \frac{∥ F (x_{n}) ∥}{∥ s_{n} ∥} = \frac{∥ (E_{n} + D) s_{n} ∥}{∥ s_{n} ∥} \to 0,

and

∥ A^{- 1} ∥^{- 1} ∥ x_{n} - x^{*} ∥ \leq ∥ A (x_{n} - x^{*}) ∥ = ∥ F (x_{n}) ∥ \to 0 .

☐

4. Remarks on the Parallel Implementation

The parallelization of the product

{(E_{n} + D)}^{- 1} F (x_{n})

(Step 2 in Algorithm 3) is obvious, because

E_{n} + D

is block diagonal. Concerning the product

(E_{n} + D) s_{n} s_{n}^{T}

, observe first that the element

p_{i j}

of the product

s s^{T}

is

p_{i j} = s_{i} s_{j}

and

s_{n} s_{n}^{T}

can be computed by a very simple algorithm. Then, because

E_{n} + D

is block diagonal,

E_{n} + D = d i a g ({(E_{n} + D)}_{1}, . . ., {(E_{n} + D)}_{p})

with dimensions

m_{1}, . . ., m_{p}

, the product

{(E_{n} + D)}_{i} s_{n} s_{n}^{T}

gives the

m_{i}

lines of

(E_{n} + D) s_{n} s_{n}^{T}

. These products are independent from each other and therefore can be computed in parallel. The same observations can be made for Algorithms 4 and 5.

In order to implement this method, let us suppose that the multiprocessor system has p processors. A simple idea is to set the system into p partitions and assign a partition to every processor. In this case the following issue arises: How large should the partitions

m_{i}, i = 1, . . ., p

be in order to obtain a low cumulative computation cost per iteration? In the case of Algorithms 3 and 5, the main processing of processor i is to inverse a matrix of dimension

m_{i}

. Therefore every processor has to perform a computation that has a polynomial complexity of order

m_{i}^{3}

. The cumulative computation cost for all p processors is

m_{1}^{3} + m_{2}^{3} + . . . + m_{p}^{3}

, and, if there is not overlapping,

m_{1} + m_{2} + . . . + m_{p} = m

. Further, we use the following elementary inequality

\sum_{i = 1}^{p} x_{i}^{q} \leq (p - 1) μ^{q} + {(\sum_{i = 1}^{p} x_{i} - (p - 1) μ)}^{q},

where

x_{i}

are positive numbers and

μ = min {x_{1}, . . ., x_{p}}

. If

q = 3

and

x_{i} = m_{i}

, then

\sum_{i = 1}^{p} m_{i}^{3} \leq (p - 1) μ^{3} + {(m - (p - 1) μ)}^{3} .

We will prove now that if

\frac{m}{{(p^{2} - p + 1)}^{1 / 3} + p - 1} \leq μ \leq \frac{m}{p},

(3)

then

(p - 1) μ^{3} + {(m - (p - 1) μ)}^{3} \leq \frac{m^{3}}{p} .

Because

p μ \leq m

, it is sufficient to show that

(p - 1) μ^{3} + {(m - (p - 1) μ)}^{3} \leq p^{2} μ^{3}

, and this is equivalent with the first part of (3). We obtain the following

Propositon 1. Suppose that the smallest size of partitions,

μ = min {m_{1}, . . ., m_{p}}

, satisfies the condition (3). Then the cumulative computation cost in Algorithms 3 and 5 is less than

m^{3} / p

.

A similar result can be established for Algorithm 4. The condition (3) becomes

\frac{m}{{(p + 1)}^{1 / 2} + p - 1} \leq μ \leq \frac{m}{p},

and the cumulative computation cost satisfies

\sum_{i = 1}^{p} m_{i}^{2} \leq 2 m^{2} / p

.

5. Numerical Experiments and Conclusions

The main purpose of this section is to highlight the convergence properties of the proposed parallel Broyden’s algorithms in the case of a synchronous model. The communication costs between processors, scheduling and loads balancing, optimal processors assignment, and so on, both for synchronous and asynchronous models, are important issues in themselves. However they are not the subject of the present study.

The behavior of the sequence

{x_{n}}

obtained by the proposed algorithms and for various linear or nonlinear cases is shown in our experiments by the graph of

l n ε_{n}

, where

ε_{n}

is the error at step n,

ε_{n} = ∥ F (x_{n}) ∥

or

ε_{n} = ∥ x_{n} - x^{*} ∥

(n on horizontal axis and

l n ε_{n}

on vertical axis). The convergence of the process entails a decreasing graph until negative values (for example,

l n ε_{n} = - 30

means

ε_{n} \approx 10^{- 15}

). If the convergence is linear (

r = 1

) then

l n ε_{n} \approx n l n ϱ + l n ε_{0}

;

l n ε_{n}

depends linearly on n and the graph of

l n ε_{n}

is close to a straight line. For comparison reasons, the proposed parallel Broyden’s method is compared with the parallel Newton method and the parallel Cimmino method. We consider the following parallel Newton method [8]:

x_{n + 1} = x_{n} - D {(x_{n})}^{- 1} F (x_{n}), D (x_{n}) = d g (J (x_{n})),

where

J (x_{n})

is the Jacobian of F. Note that in the case of a linear system, by applying this form of Newton method, the direct computation of

x^{*}

(in one iteration) is lost and the convergence is linear. Consequently, the graph of

l n (ε_{n})

is a straight line.

The block Cimmino method is considered only for the linear case,

F (x) = A x + b

, and it can be described as follows (see, for example, [9]). The system is partitioned into p strips of rows,

A^{i}

and

b^{i}

,

1 \leq i \leq p

. Supposing that

A^{i}

has full row rank, the Moore–Penrose pseudo inverse of

A^{i}

is defined by

A^{i^{+}} = A^{i^{T}} {(A^{i} A^{i^{T}})}^{- 1}

. The block Cimmino algorithm is

Step 1: Compute in parallel $Q_{n}^{i} = A^{i^{+}} (A^{i} x_{n} + b^{i});$
Step 2: $x_{n + 1} = x_{n} + ω \sum_{i = 1}^{p} Q_{n}^{i},$

where ω is a relaxation parameter.

Experiment 1.

We applied Algorithm 3 to linear systems of medium dimensions,

m = 20, 50, 100, 500, 600

, with different values of condition numbers. A program “genA” generates a square, sparse matrix A of dimension m with random elements

a_{i j} \in (- 1, 1), i \neq j

and

a_{i i} \in (d 1, d 2)

, where

d 1, d 2

are given (positive) numbers. The percent of nonzero elements is an entry data of “genA” and in this experiment we considered it to be about 50%. The free term b and the starting point

x_{0}

were also randomly generated with elements between some limits. Thus the function F is randomly defined by

F (x) = A x - b

.

Figure 1. The graphs of

l n (ε_{n})

generated by Algorithm 3, parallel Newton method and parallel Cimmino method in the linear case; (a) system well conditioned; (b) system medium conditioned; (c) system poorly conditioned.

Figure 1. The graphs of

l n (ε_{n})

generated by Algorithm 3, parallel Newton method and parallel Cimmino method in the linear case; (a) system well conditioned; (b) system medium conditioned; (c) system poorly conditioned.

For every considered system, we applied Algorithm 3, the parallel Newton method and the parallel Cimmino method, and we drew the resulting graphs in the same plot; for the graph of Algorithm 3 we used a continuous line, for the parallel Newton method a dashed line, and for the parallel Cimmino method a dotted line. The number of partitions and the dimensions of every partition are entry data of the routine “diag(A)”. The test systems were automatically generated by routine “genA"; the three graphs of Figure 1 are examples from a large number of cases, corresponding to certain systems of dimensions 50, 100, 200 respectively; other parameters like c (the condition number), θ (the factor in Step 4),

p (n_{1}, . . ., n_{p})

(the number of partitions and their dimensions), were chosen as follows:

(a): $m = 50, c = 3.569, θ = 0.02, p (. . .) = 5 (11, 9, 13, 11, 6), E_{0} = I$ ,
(b): $m = 100, c = 71.69, θ = 0.02, p (. . .) = 5 (21, 19, 23, 21, 16), E_{0} = I$ ,
(c): $m = 200, c = 785, θ = 0.02, p (. . .) = 5 (41, 39, 43, 41, 36), E_{0} = I$ .

The following conclusions can be drawn. (1) The proposed parallel Broyden’s algorithm works well if the system is relatively well conditioned (the condition number should be around ten), and in this case the behavior of Algorithm 3 is very similar to the parallel Newton method and significantly better than the parallel Cimmino method. (Figure 1a); (2) For medium conditioned systems, the behavior of Algorithm 3 is unpredictable, sometimes it works better than the Newton method (Figure 1b); (3) If the system is poorly conditioned (the condition number is greater than 300) the proposed parallel Broyden’s algorithm fails to converge (the Newton method has the same behavior) (Figure 1c).

Figure 2a presents the behavior of the sequence generated by the Algorithm 3 for a linear system of dimension

m = 500

,

c = 4.366, p (. . .) = 5 (101, 51, 151, 101, 96), θ = 0.2, E_{0} = I

. The condition (2) of Theorem 1 is not satisfied and the sequence generated by this algorithm (in this case) does not converge. However, the behavior is interesting, the generated sequence becomes very close to the solution, the error decreases until a very small value (in our example until

10^{- 12}

), and then the good behavior is broken. The graph corresponding to the sequence generated by Algorithm 4 for a system of dimension

m = 600

and

c = 3.942, p (. . .) = 6 (101, 51, 151, 101, 101, 95), θ = 0.03, E_{0} = {(I + D)}^{- 1}

is presented in Figure 2b. We can observe the similarities between this sequence and the sequence obtained by the parallel Newton method.

Figure 2. The behavior of the sequence generated by Algorithm 3, Parallel Newton method and Parallel Cimmino method (the graphs (a)) and by Algorithm 4, Parallel Newton method and Parallel Cimmino method (the graphs (b)); (a) system which does not satisfy the condition (2); (b) system well conditioned.

Remark 2. The numerical experiments presented here, for the most part, have been performed with the basic Algorithm 3; for Algorithm 4 similar conclusions can be drawn. A special characteristics of Algorithm 4, in comparison with Algorithm 3, is that Algorithm 4 is less sensitive to θ; this parameter can take its values on a much larger interval,

θ \in (0, 2)

.

Experiment 2.

This experiment is devoted to show the behavior of the sequences generated by Algorithm 5 in the case of nonlinear systems. We considered several nonlinear systems of low dimensions and of certain sparsity (every nonlinear equation depends on a few numbers of unknowns). Figure 3 presents the results for the following three nonlinear systems:

(1) \{\begin{matrix} 4 x_{1}^{2} - x_{2} x_{6} - 3 = 0, \\ x_{1} - 5 x_{2} + x_{3}^{2} + x_{7} = 0, \\ - x_{1} + 5 x_{3} - x_{4}^{2} - 3 = 0, \\ x_{1} x_{2} - 4 x_{4} + x_{5}^{2} + 2 = 0, \\ x_{4} - 5 x_{5} + 2 x_{6} = 0, \\ x_{1} - 6 x_{6} + 4 = 0, \\ x_{2} + x_{3}^{2} - 4 x_{7} + 2 = 0 . \end{matrix} (2) \{\begin{matrix} 3 x_{1} + x_{2}^{2} x_{3} = 0, \\ 2 x_{2} + x_{3} - 2 x_{7}^{2} = 0, \\ x_{1} - 0.2 x_{2} x_{4} + 3 x_{3} = 0, \\ 3 x_{4} - x_{6}^{2} = 0, \\ x_{2} + 2 x_{5} = 0, \\ x_{3} x_{7} + 5 x_{6} = 0, \\ x_{1}^{2} + x_{2} + x_{5} + 4 x_{7} = 0 . \end{matrix} (3) \{\begin{matrix} 5 \frac{x_{1}^{2} + x_{2}^{2}}{x_{1}^{2} + x_{2}^{2} + 1} - \frac{10}{3} = 0, \\ 4 \frac{x_{3}^{2}}{x_{3}^{2} + 1} - 2 = 0, \\ 3 \frac{x_{2}^{2} + x_{3}^{2}}{x_{2}^{2} + x_{3}^{2} + 1} = 0 . \end{matrix}

Figure 3. The graphs of

l n (ε_{n})

generated by Algorithm 5 and parallel Newton method; (a) the graph corresponding to systems (1); (b) the graph corresponding to systems (2) given by Algorithm 5 modified in accordance with Sherman-Morrison lemma; (c) example of a system verifying the condition

∥ D {(x_{n})}^{- 1} ∥ ∥ E_{n} ∥ < 1

.

Figure 3. The graphs of

l n (ε_{n})

generated by Algorithm 5 and parallel Newton method; (a) the graph corresponding to systems (1); (b) the graph corresponding to systems (2) given by Algorithm 5 modified in accordance with Sherman-Morrison lemma; (c) example of a system verifying the condition

∥ D {(x_{n})}^{- 1} ∥ ∥ E_{n} ∥ < 1

.

The proposed parallel Broyden’s method is compared with the parallel Newton method described above. The conclusion is that, generally, the parallel Broyden’s method has similar characteristics with the considered parallel Newton method, convergence properties, attraction basin, etc. This similarity can be seen in the case of the first system (Figure 3a). The third system is an example for which F is bounded on

R^{3}

, as it is required in Section 3. Note that the condition

∥ D {(x_{n})}^{- 1} ∥ ∥ E_{n} ∥ < 1

is also satisfied.

Experiment 3.

This experiment is devoted to compare Algorithm 3 with the parallel variants proposed in [1] and [2,3]. Three linear systems of low dimensions (m = 5) with condition numbers equal to

8.598, 4.624, 3.701

were considered as test systems. Every system was set into two partitions of dimensions 3 and 2 respectively.

Figure 4. The comparison of Algorithms 3, 1 and 2 in the care of linear systems of low dimensions (m = 5); (a) condition number = 8.598; (b) condition number=4.624; (c) condition number = 3.624.

The results are presented in Figure 4. We can conclude that the behavior of Algorithm 3 is satisfactory, as in all cases the generated sequence converges to the solution of the system and the convergence is linear. It is worthwhile to note the very good behavior of Algorithm 3 in comparison with the other two variants for cases (b) and (c): in case (b) the Algorithms 1 and 2 appear to have a very slow convergence, and in case (c) these algorithms fail to converge.

Acknowledgment

The authors are very grateful to the anonymous referees for valuable remarks and comments, which significantly contributed to the quality of the paper.

This research was supported of West University of Timisoara, Romania, Agreement of Academic and Research Activities, No. 12910/21.05.2015.

Author Contributions

Stefan Maruster theoretical approach; Ioan Bistran and Liviu Octavian Mafteiu-Scai designed and performed the experiments.

Conflicts of Interest

The authors declare no conflict of interest.

References

Xu, J.J. Convergence of partially asynchronous block quasi-Newton methods for nonlinear systems of equations. J. Comput. Appl. Math. 1999, 103, 307–321. [Google Scholar] [CrossRef]
Jiang, P.; Yang, G. Performance analysis of preconditioners based on Broyden method. Appl. Math. Comput. 2006, 178, 295–308. [Google Scholar] [CrossRef]
Yang, G.; Jiang, P. SSOR and ASSOR preconditioners for block Broyden method. Appl. Math. Comput. 2007, 188, 194–205. [Google Scholar] [CrossRef]
More, J.J.; Trangenstein, J.A. On the global convergence of Broyden’s method. Math. Comput. 1976, 30, 523–540. [Google Scholar] [CrossRef]
Gay, D.M. Some convergence properties of Broyden’s method. SIAM J. Numer. Anal. 1979, 16, 623–630. [Google Scholar] [CrossRef]
Dennis, J.E., Jr.; Schnabel, R.B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1983. [Google Scholar]
Martinez, J.M. Algorithms for Solving Nonlinear Systems of Equations; Springer: Heidelberg, Germany, 1994. [Google Scholar]
Lazar, I. On the convergence of asynchronous block Newton method for nonlinear systems of equations. Informatica 2002, 47, 75–84. [Google Scholar]
Balsa, C.; Guiverch, R.; Raimundo, J.; Ruiz, D. MUMPS Based Approach to Parallelize the Block Cimmino Algorithm. In Proceedings of the 8th International Meeting High Performance Computing for Computational Science, Toulouse, France, 1 Junuary 2008.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bistran, I.; Maruster, S.; Mafteiu-Scai, L.O. Parallel Variants of Broyden’s Method. Algorithms 2015, 8, 774-785. https://0-doi-org.brum.beds.ac.uk/10.3390/a8030774

AMA Style

Bistran I, Maruster S, Mafteiu-Scai LO. Parallel Variants of Broyden’s Method. Algorithms. 2015; 8(3):774-785. https://0-doi-org.brum.beds.ac.uk/10.3390/a8030774

Chicago/Turabian Style

Bistran, Ioan, Stefan Maruster, and Liviu Octavian Mafteiu-Scai. 2015. "Parallel Variants of Broyden’s Method" Algorithms 8, no. 3: 774-785. https://0-doi-org.brum.beds.ac.uk/10.3390/a8030774

Article Menu

Parallel Variants of Broyden’s Method

Abstract

1. Introduction

2. The Parallel Variants and Preliminaries

3. The Convergence of Algorithm 3 in the Linear Case

4. Remarks on the Parallel Implementation

5. Numerical Experiments and Conclusions

Acknowledgment

Author Contributions

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI