# Parallel Variants of Broyden’s Method

^{*}

*Keywords:*systems of equations; Broyden’s method; parallel algorithms

Next Article in Journal / Special Issue

Previous Article in Journal

Previous Article in Special Issue

Previous Article in Special Issue

West University of Timisoara, B-dul V. Parvan No.4, Timisoara 300223, Romania

Author to whom correspondence should be addressed.

Academic Editors: Alicia Cordero, Juan R. Torregrosa and Francisco I. Chicharro

Received: 23 June 2015 / Revised: 27 August 2015 / Accepted: 1 September 2015 / Published: 15 September 2015

(This article belongs to the Special Issue Numerical Algorithms for Solving Nonlinear Equations and Systems)

In this paper we investigate some parallel variants of Broyden’s method and, for the basic variant, we present its convergence properties. The main result is that the behavior of the considered parallel Broyden’s variants is comparable with the classical parallel Newton method, and significantly better than the parallel Cimmino method, both for linear and nonlinear cases. The considered variants are also compared with two more recently proposed parallel Broyden’s method. Some numerical experiments are presented to illustrate the advantages and limits of the proposed algorithms.

Let $F:{\mathcal{R}}^{m}\to {\mathcal{R}}^{m}$ be a (nonlinear) mapping and let $F\left(x\right)=0$ be the corresponding system of equations. In [1], Xu proposed the following partially asynchronous block quasi-Newton method for such a nonlinear system. Let $\mathcal{P}$ be a partition of F and x in p partitions, $F=({F}_{1},...,{F}_{p})$, $x=\phantom{\rule{3.33333pt}{0ex}}({x}_{1},...,{x}_{p})$, where ${F}_{i}$ and ${x}_{i}$ are block functions and block variables, respectively, of the same dimensions ${m}_{i}$. The variable x is the common memory of the multiprocessor system, every processor i can read the content of x and can write its new update of ${x}_{i}$. Let ${B}_{i}$, $i=1,...,p$, be square matrices of dimension ${m}_{i}$. The main steps of the process i are :

Algorithm 1. |

Input: $x,\phantom{\rule{4pt}{0ex}}{B}_{i}$ |

Output: x |

1. Compute ${x}_{i}^{+}:={x}_{i}-{B}_{i}^{-1}{F}_{i}\left(x\right);$ |

2. Get ${x}^{+}$ from x by replacing ${x}_{i}$ with ${x}_{i}^{+};$ |

3. Compute $s={x}_{i}^{+}-{x}_{i}\phantom{\rule{4pt}{0ex}}$ and $y:={F}_{i}\left({x}^{+}\right)-{F}_{i}\left(x\right);$ |

4. Update ${B}_{i},\phantom{\rule{4pt}{0ex}}{B}_{i}:={B}_{i}+$$\frac{(y-{B}_{i}s){s}^{T}}{{s}^{T}s}$; |

5. ${x}_{i}:={x}_{i}^{+},\phantom{\rule{4pt}{0ex}}x:={x}^{+};$ |

Applying the Sherman–Morrison lemma (Step 4, Algorithm 1) to inverse the matrix ${B}_{i}$, the polynomial complexity of order three is reduced to polynomial complexity of order two.

More recently, in [2,3] Jiang and Yang proposed several preconditioners for the block Broyden’s method and discussed the implementation process. The main steps of parallel variant of Broyden’s method considered in [2,3] (Algorithm 2 below) are (${B}_{0}$ is a block diagonal matrix, ${B}_{k}^{i}$ is the ${i}^{th}$ diagonal block of ${B}_{k}$):

Algorithm 2. |

Input: ${x}_{0},\phantom{\rule{4pt}{0ex}}{B}_{0},\phantom{\rule{4pt}{0ex}}kmax$ |

Output: ${x}_{k}$ |

1. For $k=0,1,..$ Until $convergence$ Or $k>kmax$ Do |

2. ${x}_{k+1}={x}_{k}-{B}_{k}^{-1}F\left({x}_{k}\right)$; |

3. $s={x}_{k+1}-{x}_{k}$; |

4. For $i=1,...,p$ |

5. ${B}_{k}^{i}={B}_{k}^{i}+$$\frac{{F}_{i}\left({x}_{k+1}\right){s}_{i}^{T}}{{s}^{T}s}$. |

6. End |

7. End |

In the case of a linear system, the sequential Broyden’s method has global convergence provided that the system matrix is invertible [4], i.e., the sequence generated by this method converges to the solution of the system for any starting point ${x}_{0}$ and for any starting matrix ${B}_{0}$ (or ${H}_{0}$). Moreover, in this case the convergence is finite [5], and the solution is obtained in at most $2n$ steps; conditions in which this method requires a full $2n$ steps to converge are also given. In the same paper [5], it is shown that Broyden’s method has $2n$-step Q-quadratic local convergence for the nonlinear case (the initial starting point ${x}_{0}$ and the initial starting matrix ${B}_{0}$ must be close to the solution point of the system and to the Jacobian matrix in the solution point, respectively). Note that $2n$-step Q-quadratic convergence means that the subsequence $\left\{{x}_{2n}\right\}$ of $\left\{{x}_{n}\right\}$ has Q-quadratic convergence.

In this paper we propose some new variants of parallel Broyden’s method that are suitable both for linear and nonlinear systems. We prove the convergence of our basic algorithm in the linear case. Numerical experiments are presented for linear and nonlinear cases, and some remarks are made concerning the behavior of the generated sequence.

In the case of a linear mapping, $F\left(x\right)=Ax-b$, the Broyden’s “good formula” [6,7], ${B}^{+}=B+\theta (y-Bs){s}^{T}/{s}^{T}s$ (B and ${B}^{+}$ are the current and the next iterates, respectively, and θ is a positive number chosen such that ${B}^{+}$ is invertible [4]) can be written in the following form: ${B}^{+}=B-\theta (B-A)s{s}^{T}/{s}^{T}s$. Using the notation $E=B-A$, the Broyden’s method becomes ${x}^{+}=x-{(E+A)}^{-1}F\left(x\right),\phantom{\rule{4pt}{0ex}}{E}^{+}=E-\theta Es{s}^{T}/{s}^{T}s$. The main idea of the proposed variants is to use instead of A the block diagonal matrix of A, denoted by D throughout the paper, partitioned according to $\mathcal{P}$. The computer routine $dg\left(A\right)$ will produce such a block diagonal matrix, i.e., $D:=dg\left(A\right)$.

Based on this idea we can consider several parallel variants of Broyden’s algorithm. The first variant is just the above mentioned algorithm, to which we added a supplementary step to successively block-diagonalize the matrix E (the Step 5, Algorithm 3 below). We consider also the following slight modification of update formula
By adding D to both sides of this formula, the algorithm becomes more homogeneous (the same matrix is used in both Steps 2 and 4, Algorithm 3 below). Moreover (and not in the least) for this variant, a simple and elegant proof can be given for the convergence of the generated sequence (Theorem 1). Therefore the first our variant, which will be considered as the basic algorithm, is (${E}_{0}$ is a block diagonal matrix):

$${\tilde{E}}^{+}:=E-\theta \frac{(E+D){s}_{n}{s}_{n}^{T}}{\parallel {s}_{n}{\parallel}^{2}}.$$

Algorithm 3. |

Input: ${x}_{0},\phantom{\rule{4pt}{0ex}}{E}_{0},\phantom{\rule{4pt}{0ex}}nmax$ |

Output: ${x}_{n}$ |

1: For $n=0,1,..$ Until $convergence$ Or $n>nmax$ Do |

2: ${x}_{n+1}:={x}_{n}-{({E}_{n}+D)}^{-1}F\left({x}_{n}\right);$ |

3: ${s}_{n}:={x}_{n+1}-{x}_{n};$ |

4: ${\tilde{E}}_{n+1}:={E}_{n}-\theta $$\frac{({E}_{n}+D){s}_{n}{s}_{n}^{T}}{\parallel {s}_{n}{\parallel}^{2}};$ |

5: ${E}_{n+1}:=dg\left({\tilde{E}}_{n+1}\right).$ |

6. End |

A variant of Algorithm 3 can be obtained by applying the Sherman–Morrison lemma to avoid the inversion of a matrix in Step 2. It results the following Algorithm 4 (${B}_{0}$ is a block diagonal matrix):

Algorithm 4. |

Input: ${x}_{0},\phantom{\rule{4pt}{0ex}}{B}_{0},\phantom{\rule{4pt}{0ex}}nmax$ |

Output: ${x}_{n}$ |

1: For $n=0,1,..$ Until $convergence$ Or $n>nmax$ Do |

2: ${x}_{n+1}:={x}_{n}-{B}_{n}F\left({x}_{n}\right);$ |

3: ${s}_{n}:={x}_{n+1}-{x}_{n};$ |

4: $\phantom{\rule{4pt}{0ex}}{\tilde{B}}_{n+1}:=(I+$$\frac{\theta}{1-\theta}\frac{{s}_{n}{s}_{n}^{T}}{\parallel {s}_{n}{\parallel}^{2}}$)${B}_{n}$: |

5: ${B}_{n+1}:=dg\left({\tilde{B}}_{n+1}\right).$ |

6. End |

The simplest way to design a similar parallel algorithm for the nonlinear case is to replace D in the basic algorithm with the diagonal block of the Jacobian of F, $D\phantom{\rule{3.33333pt}{0ex}}=\phantom{\rule{3.33333pt}{0ex}}D\left(x\right)=\phantom{\rule{3.33333pt}{0ex}}dg\left(J\right(x\left)\right)$. It results the following Algorithm 5 for the nonlinear case:

Algorithm 5. |

Input: ${x}_{0},\phantom{\rule{4pt}{0ex}}{E}_{0},\phantom{\rule{4pt}{0ex}}nmax$ |

Output: ${x}_{n}$ |

1: For $n=0,1,..$ Until $convergence$ Or $n>nmax$ Do |

2: ${x}_{n+1}:={x}_{n}-{({E}_{n}+D\left({x}_{n}\right))}^{-1}F\left({x}_{n}\right);$ |

3: ${s}_{n}:={x}_{n+1}-{x}_{n};$ |

4: ${\tilde{E}}_{n+1}:={E}_{n}-\theta $$\frac{({E}_{n}+D\left({x}_{n}\right)){s}_{n}{s}_{n}^{T}}{\parallel {s}_{n}{\parallel}^{2}};$ |

5: ${E}_{n+1}:=dg\left({\tilde{E}}_{n+1}\right).$ |

6. End |

The main processing of the Algorithms 3, 5, is the computation of ${({E}_{n}+D)}^{-1}$ or the solution of a corresponding linear system. Taking into account that ${E}_{n}+D$ is a block diagonal matrix, ${E}_{n}+D=diag({({E}_{n}+D)}_{1},...,{({E}_{n}+D)}_{p})$, where ${({E}_{n}+D)}_{i}$, $i=1,...,p$, are matrices of dimension ${m}_{i}$, we have to perform p subtasks, whose main processing is ${({E}_{n}+D)}_{i}^{-1}$. It is obvious that these subtasks can be done in parallel. The successive updates of the matrix E, Steps 4 and 5, can also be executed in parallel. The concrete parallel implementation of this algorithm is an interesting problem and is left for a future research.

As usual ${\mathcal{R}}^{m\times m}$ denotes the set of square $m\times m$ real matrices. We use $\parallel \xb7\parallel $ and ${\parallel \xb7\parallel}_{F}$ to denote the spectral and Frobenius norm respectively. The properties of Algorithm 3 are based on the following two lemmas.

$${\parallel dg\left(E\right)+D\parallel}_{p}\le {\parallel E+D\parallel}_{p},$$

$${\parallel dg\left(E\right)+D\parallel}_{p}={\parallel dg(E+D)\parallel}_{p}\le {\parallel E+D\parallel}_{p}.$$

To prove the second part of Lemma 1, observe that if $\theta \in [0,2]$ then $\parallel I-\theta \frac{s{s}^{T}}{{\parallel s\parallel}^{2}}\parallel =1$ for any $s\in {\mathcal{R}}^{m}$. Therefore
Thus $\parallel {E}_{n}+D\parallel \le \parallel {E}_{0}+D\parallel ,\phantom{\rule{4pt}{0ex}}\forall n$ and $\{\parallel {E}_{n}\parallel \}$ is bounded. ☐

$$\parallel {E}_{n+1}+D\parallel \le \parallel {\tilde{E}}_{n+1}+D\parallel =\parallel {E}_{n}+D-\theta \frac{({E}_{n}+D){s}_{n}{s}_{n}^{T}}{\parallel {s}_{n}{\parallel}^{2}}\parallel \le \parallel {E}_{n}+D\parallel \parallel I-\theta \frac{{s}_{n}{s}_{n}^{T}}{\parallel {s}_{n}{\parallel}^{2}}\parallel =\parallel {E}_{n}+D\parallel .$$

$$\parallel \tilde{M}{\parallel}_{F}^{2}={\parallel M\parallel}_{F}^{2}-\theta (2-\theta )\frac{{\parallel Ms\parallel}^{2}}{{\parallel s\parallel}^{2}}.$$

(The Formula (1) appears also in [4]; a short proof is given here for completeness).

$$\begin{array}{l}{\parallel \tilde{M}\parallel}_{F}^{2}=\parallel M-\theta \frac{Ms{s}^{T}}{{\parallel s\parallel}^{2}}{\parallel}_{F}^{2}\\ \hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}={\parallel M\parallel}_{F}^{2}-2\frac{\theta}{{\parallel s\parallel}^{2}}{\langle M,Ms{s}^{T}\rangle}_{F}+\frac{{\theta}^{2}}{{\parallel s\parallel}^{4}}\parallel Ms{s}^{T}{\parallel}_{F}^{2}\end{array}$$

Use ${\langle M,Mx{x}^{T}\rangle}_{F}={\parallel Mx\parallel}^{2}$, $\parallel Mx{x}^{T}{\parallel}_{F}=\parallel Mx\parallel \parallel x\parallel $, and the desired equality results. ☐

In the following, we suppose that the sequence $\left\{{x}_{n}\right\}$ given by Algorithm 3, with starting point ${x}_{0}$, starting matrix ${E}_{0}$ and certain θ, satisfies the condition:
Algorithm 3 defines the next iteration as a function of the current iteration, ${x}_{n+1}=G\left({x}_{n}\right)$, G being the iteration function or generation function. It is clear that the condition (2) is weaker than the condition of asymptotic regularity of G or asymptotic regularity of ${x}_{0}$ under G, ($\parallel {G}^{n+1}{x}_{0}-{G}^{n}{x}_{0}\parallel \to 0,\phantom{\rule{4pt}{0ex}}n\to \infty $). The fulfillment of condition (2) depends not only on the characteristics of A (like the condition number) but also on ${x}_{0},{E}_{0}$ and θ. Note also that usually, the sequence $\left\{{E}_{n}\right\}$ is bounded, and if F is also bounded on ${\mathcal{R}}^{m}$ then the condition (2) is satisfied; in Section 5 an example of mapping is given which is bounded on ${\mathcal{R}}^{m}$ and $\parallel D{\left({x}_{n}\right)}^{-1}\parallel \parallel {E}_{n}\parallel <1$.

$$\parallel {x}_{n+1}-{x}_{n}\parallel \le \beta ,\phantom{\rule{4pt}{0ex}}\forall n\in \mathcal{N}.$$

$$\parallel {E}_{k+1}{+D\parallel}_{F}^{2}\le {\parallel {E}_{k}+D\parallel}_{F}^{2}-\theta (2-\theta )\frac{\parallel ({E}_{k}+D){s}_{k}{\parallel}^{2}}{\parallel {s}_{k}{\parallel}^{2}}.$$

By summing on k, $k=0,1,...,n$, we have

$$0\le \parallel {E}_{n+1}{+D\parallel}_{F}^{2}\le {\parallel {E}_{0}+D\parallel}_{F}^{2}-\theta (2-\theta )\sum _{k=0}^{\infty}\frac{\parallel ({E}_{k}+D){s}_{k}{\parallel}^{2}}{\parallel {s}_{k}{\parallel}^{2}}.$$

This implies that

$$\frac{\parallel ({E}_{n}+D){s}_{n}\parallel}{\parallel {s}_{n}\parallel}\to 0,\phantom{\rule{4pt}{0ex}}n\to \infty .$$

Now, because $({E}_{n}+D){s}_{n}=-F\left({x}_{n}\right)$, we have
and

$$\frac{\parallel F\left({x}_{n}\right)\parallel}{\beta}\le \frac{\parallel F\left({x}_{n}\right)\parallel}{\parallel {s}_{n}\parallel}=\frac{\parallel ({E}_{n}+D){s}_{n}\parallel}{\parallel {s}_{n}\parallel}\to 0,$$

$$\parallel {A}^{-1}{\parallel}^{-1}\parallel {x}_{n}-{x}^{*}\parallel \le \parallel A({x}_{n}-{x}^{*})\parallel =\parallel F\left({x}_{n}\right)\parallel \to 0.$$

☐

The parallelization of the product ${({E}_{n}+D)}^{-1}F\left({x}_{n}\right)$ (Step 2 in Algorithm 3) is obvious, because ${E}_{n}+D$ is block diagonal. Concerning the product $({E}_{n}+D){s}_{n}{s}_{n}^{T}$, observe first that the element ${p}_{ij}$ of the product $s{s}^{T}$ is ${p}_{ij}={s}_{i}{s}_{j}$ and ${s}_{n}{s}_{n}^{T}$ can be computed by a very simple algorithm. Then, because ${E}_{n}+D$ is block diagonal, ${E}_{n}+D=diag({({E}_{n}+D)}_{1},...,{({E}_{n}+D)}_{p})$ with dimensions ${m}_{1},...,{m}_{p}$, the product ${({E}_{n}+D)}_{i}{s}_{n}{s}_{n}^{T}$ gives the ${m}_{i}$ lines of $({E}_{n}+D){s}_{n}{s}_{n}^{T}$. These products are independent from each other and therefore can be computed in parallel. The same observations can be made for Algorithms 4 and 5.

In order to implement this method, let us suppose that the multiprocessor system has p processors. A simple idea is to set the system into p partitions and assign a partition to every processor. In this case the following issue arises: How large should the partitions ${m}_{i},\phantom{\rule{4pt}{0ex}}i=1,...,p$ be in order to obtain a low cumulative computation cost per iteration? In the case of Algorithms 3 and 5, the main processing of processor i is to inverse a matrix of dimension ${m}_{i}$. Therefore every processor has to perform a computation that has a polynomial complexity of order ${m}_{i}^{3}$. The cumulative computation cost for all p processors is ${m}_{1}^{3}+{m}_{2}^{3}+...+{m}_{p}^{3}$, and, if there is not overlapping, ${m}_{1}+{m}_{2}+...+{m}_{p}=m$. Further, we use the following elementary inequality
where ${x}_{i}$ are positive numbers and $\mu =min\{{x}_{1},...,{x}_{p}\}$. If $q=3$ and ${x}_{i}={m}_{i}$, then

$$\sum _{i=1}^{p}{x}_{i}^{q}\le (p-1){\mu}^{q}+{(\sum _{i=1}^{p}{x}_{i}-(p-1)\mu )}^{q},$$

$$\sum _{i=1}^{p}{m}_{i}^{3}\le (p-1){\mu}^{3}+{(m-(p-1)\mu )}^{3}.$$

We will prove now that if
then

$$\frac{m}{{({p}^{2}-p+1)}^{1/3}+p-1}\le \mu \le \frac{m}{p},$$

$$(p-1){\mu}^{3}+{(m-(p-1)\mu )}^{3}\le \frac{{m}^{3}}{p}.$$

Because $p\mu \le m$, it is sufficient to show that $(p-1){\mu}^{3}+{(m-(p-1)\mu )}^{3}\le {p}^{2}{\mu}^{3}$, and this is equivalent with the first part of (3). We obtain the following

A similar result can be established for Algorithm 4. The condition (3) becomes
and the cumulative computation cost satisfies ${\sum}_{i=1}^{p}{m}_{i}^{2}\le 2{m}^{2}/p$.

$$\frac{m}{{(p+1)}^{1/2}+p-1}\le \mu \le \frac{m}{p},$$

The main purpose of this section is to highlight the convergence properties of the proposed parallel Broyden’s algorithms in the case of a synchronous model. The communication costs between processors, scheduling and loads balancing, optimal processors assignment, and so on, both for synchronous and asynchronous models, are important issues in themselves. However they are not the subject of the present study.

The behavior of the sequence $\left\{{x}_{n}\right\}$ obtained by the proposed algorithms and for various linear or nonlinear cases is shown in our experiments by the graph of $ln{\epsilon}_{n}$, where ${\epsilon}_{n}$ is the error at step n, ${\epsilon}_{n}=\parallel F\left({x}_{n}\right)\parallel $ or ${\epsilon}_{n}=\parallel {x}_{n}-{x}^{*}\parallel $ (n on horizontal axis and $ln{\epsilon}_{n}$ on vertical axis). The convergence of the process entails a decreasing graph until negative values (for example, $ln{\epsilon}_{n}=-30$ means ${\epsilon}_{n}\approx {10}^{-15}$). If the convergence is linear ($r=1$) then $ln{\epsilon}_{n}\approx nln\varrho +ln{\epsilon}_{0}$; $ln{\epsilon}_{n}$ depends linearly on n and the graph of $ln{\epsilon}_{n}$ is close to a straight line. For comparison reasons, the proposed parallel Broyden’s method is compared with the parallel Newton method and the parallel Cimmino method. We consider the following parallel Newton method [8]:
where $J\left({x}_{n}\right)$ is the Jacobian of F. Note that in the case of a linear system, by applying this form of Newton method, the direct computation of ${x}^{*}$ (in one iteration) is lost and the convergence is linear. Consequently, the graph of $ln\left({\epsilon}_{n}\right)$ is a straight line.

$${x}_{n+1}={x}_{n}-D{\left({x}_{n}\right)}^{-1}F\left({x}_{n}\right),\phantom{\rule{4pt}{0ex}}D\left({x}_{n}\right)=dg\left(J\left({x}_{n}\right)\right),$$

The block Cimmino method is considered only for the linear case, $F\left(x\right)=Ax+b$, and it can be described as follows (see, for example, [9]). The system is partitioned into p strips of rows, ${A}^{i}$ and ${b}^{i}$, $1\le i\le p$. Supposing that ${A}^{i}$ has full row rank, the Moore–Penrose pseudo inverse of ${A}^{i}$ is defined by ${A}^{{i}^{+}}={A}^{{i}^{T}}{\left({A}^{i}{A}^{{i}^{T}}\right)}^{-1}$. The block Cimmino algorithm is

- Step 1: Compute in parallel ${Q}_{n}^{i}={A}^{{i}^{+}}({A}^{i}{x}_{n}+{b}^{i});$
- Step 2: ${x}_{n+1}={x}_{n}+\omega {\sum}_{i=1}^{p}{Q}_{n}^{i},$

We applied Algorithm 3 to linear systems of medium dimensions, $m=20,50,100,500,600$, with different values of condition numbers. A program “genA” generates a square, sparse matrix A of dimension m with random elements ${a}_{ij}\in (-1,1),\phantom{\rule{4pt}{0ex}}i\ne j$ and ${a}_{ii}\in (d1,d2)$, where $d1,d2$ are given (positive) numbers. The percent of nonzero elements is an entry data of “genA” and in this experiment we considered it to be about 50%. The free term b and the starting point ${x}_{0}$ were also randomly generated with elements between some limits. Thus the function F is randomly defined by $F\left(x\right)=Ax-b$.

For every considered system, we applied Algorithm 3, the parallel Newton method and the parallel Cimmino method, and we drew the resulting graphs in the same plot; for the graph of Algorithm 3 we used a continuous line, for the parallel Newton method a dashed line, and for the parallel Cimmino method a dotted line. The number of partitions and the dimensions of every partition are entry data of the routine “diag(A)”. The test systems were automatically generated by routine “genA"; the three graphs of Figure 1 are examples from a large number of cases, corresponding to certain systems of dimensions 50, 100, 200 respectively; other parameters like c (the condition number), θ (the factor in Step 4), $p({n}_{1},...,{n}_{p})$ (the number of partitions and their dimensions), were chosen as follows:

- (a)
- $m=50,\phantom{\rule{4pt}{0ex}}c=3.569,\phantom{\rule{4pt}{0ex}}\theta =0.02,\phantom{\rule{4pt}{0ex}}p(...)=5(11,9,13,11,6),\phantom{\rule{4pt}{0ex}}{E}_{0}=I$,
- (b)
- $m=100,\phantom{\rule{4pt}{0ex}}c=71.69,\phantom{\rule{4pt}{0ex}}\theta =0.02,\phantom{\rule{4pt}{0ex}}p(...)=5(21,19,23,21,16),\phantom{\rule{4pt}{0ex}}{E}_{0}=I$,
- (c)
- $m=200,\phantom{\rule{4pt}{0ex}}c=785,\phantom{\rule{4pt}{0ex}}\theta =0.02,\phantom{\rule{4pt}{0ex}}p(...)=5(41,39,43,41,36),\phantom{\rule{4pt}{0ex}}{E}_{0}=I$.

The following conclusions can be drawn. (1) The proposed parallel Broyden’s algorithm works well if the system is relatively well conditioned (the condition number should be around ten), and in this case the behavior of Algorithm 3 is very similar to the parallel Newton method and significantly better than the parallel Cimmino method. (Figure 1a); (2) For medium conditioned systems, the behavior of Algorithm 3 is unpredictable, sometimes it works better than the Newton method (Figure 1b); (3) If the system is poorly conditioned (the condition number is greater than 300) the proposed parallel Broyden’s algorithm fails to converge (the Newton method has the same behavior) (Figure 1c).

Figure 2a presents the behavior of the sequence generated by the Algorithm 3 for a linear system of dimension $m=500$, $c=4.366,\phantom{\rule{4pt}{0ex}}p(...)=5(101,51,151,101,96),\phantom{\rule{4pt}{0ex}}\theta =0.2,\phantom{\rule{4pt}{0ex}}{E}_{0}=I$. The condition (2) of Theorem 1 is not satisfied and the sequence generated by this algorithm (in this case) does not converge. However, the behavior is interesting, the generated sequence becomes very close to the solution, the error decreases until a very small value (in our example until ${10}^{-12}$), and then the good behavior is broken. The graph corresponding to the sequence generated by Algorithm 4 for a system of dimension $m=600$ and $c=3.942,\phantom{\rule{4pt}{0ex}}p(...)=6(101,51,151,101,101,95),\phantom{\rule{4pt}{0ex}}\theta =0.03,\phantom{\rule{4pt}{0ex}}{E}_{0}={(I+D)}^{-1}$ is presented in Figure 2b. We can observe the similarities between this sequence and the sequence obtained by the parallel Newton method.

This experiment is devoted to show the behavior of the sequences generated by Algorithm 5 in the case of nonlinear systems. We considered several nonlinear systems of low dimensions and of certain sparsity (every nonlinear equation depends on a few numbers of unknowns). Figure 3 presents the results for the following three nonlinear systems:

$$\left(1\right)\left\{\begin{array}{cc}& 4{x}_{1}^{2}-{x}_{2}{x}_{6}-3=0,\hfill \\ & {x}_{1}-5{x}_{2}+{x}_{3}^{2}+{x}_{7}=0,\hfill \\ & -{x}_{1}+5{x}_{3}-{x}_{4}^{2}-3=0,\hfill \\ & {x}_{1}{x}_{2}-4{x}_{4}+{x}_{5}^{2}+2=0,\hfill \\ & {x}_{4}-5{x}_{5}+2{x}_{6}=0,\hfill \\ & {x}_{1}-6{x}_{6}+4=0,\hfill \\ & {x}_{2}+{x}_{3}^{2}-4{x}_{7}+2=0.\hfill \end{array}\right.\phantom{\rule{4pt}{0ex}}\left(2\right)\left\{\begin{array}{cc}& 3{x}_{1}+{x}_{2}^{2}{x}_{3}=0,\hfill \\ & 2{x}_{2}+{x}_{3}-2{x}_{7}^{2}=0,\hfill \\ & {x}_{1}-0.2{x}_{2}{x}_{4}+3{x}_{3}=0,\hfill \\ & 3{x}_{4}-{x}_{6}^{2}=0,\hfill \\ & {x}_{2}+2{x}_{5}=0,\hfill \\ & {x}_{3}{x}_{7}+5{x}_{6}=0,\hfill \\ & {x}_{1}^{2}+{x}_{2}+{x}_{5}+4{x}_{7}=0.\hfill \end{array}\right.\phantom{\rule{4pt}{0ex}}\left(3\right)\left\{\begin{array}{cc}& 5\frac{{x}_{1}^{2}+{x}_{2}^{2}}{{x}_{1}^{2}+{x}_{2}^{2}+1}-\frac{10}{3}=0,\hfill \\ & 4\frac{{x}_{3}^{2}}{{x}_{3}^{2}+1}-2=0,\hfill \\ & 3\frac{{x}_{2}^{2}+{x}_{3}^{2}}{{x}_{2}^{2}+{x}_{3}^{2}+1}=0.\hfill \end{array}\right.$$

The proposed parallel Broyden’s method is compared with the parallel Newton method described above. The conclusion is that, generally, the parallel Broyden’s method has similar characteristics with the considered parallel Newton method, convergence properties, attraction basin, etc. This similarity can be seen in the case of the first system (Figure 3a). The third system is an example for which F is bounded on ${\mathcal{R}}^{3}$, as it is required in Section 3. Note that the condition $\parallel D{\left({x}_{n}\right)}^{-1}\parallel \parallel {E}_{n}\parallel <1$ is also satisfied.

This experiment is devoted to compare Algorithm 3 with the parallel variants proposed in [1] and [2,3]. Three linear systems of low dimensions (m = 5) with condition numbers equal to $8.598,\phantom{\rule{4pt}{0ex}}4.624,\phantom{\rule{4pt}{0ex}}3.701$ were considered as test systems. Every system was set into two partitions of dimensions 3 and 2 respectively.

The results are presented in Figure 4. We can conclude that the behavior of Algorithm 3 is satisfactory, as in all cases the generated sequence converges to the solution of the system and the convergence is linear. It is worthwhile to note the very good behavior of Algorithm 3 in comparison with the other two variants for cases (b) and (c): in case (b) the Algorithms 1 and 2 appear to have a very slow convergence, and in case (c) these algorithms fail to converge.

The authors are very grateful to the anonymous referees for valuable remarks and comments, which significantly contributed to the quality of the paper.

This research was supported of West University of Timisoara, Romania, Agreement of Academic and Research Activities, No. 12910/21.05.2015.

Stefan Maruster theoretical approach; Ioan Bistran and Liviu Octavian Mafteiu-Scai designed and performed the experiments.

The authors declare no conflict of interest.

- Xu, J.J. Convergence of partially asynchronous block quasi-Newton methods for nonlinear systems of equations. J. Comput. Appl. Math.
**1999**, 103, 307–321. [Google Scholar] [CrossRef] - Jiang, P.; Yang, G. Performance analysis of preconditioners based on Broyden method. Appl. Math. Comput.
**2006**, 178, 295–308. [Google Scholar] [CrossRef] - Yang, G.; Jiang, P. SSOR and ASSOR preconditioners for block Broyden method. Appl. Math. Comput.
**2007**, 188, 194–205. [Google Scholar] [CrossRef] - More, J.J.; Trangenstein, J.A. On the global convergence of Broyden’s method. Math. Comput.
**1976**, 30, 523–540. [Google Scholar] [CrossRef] - Gay, D.M. Some convergence properties of Broyden’s method. SIAM J. Numer. Anal.
**1979**, 16, 623–630. [Google Scholar] [CrossRef] - Dennis, J.E., Jr.; Schnabel, R.B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1983. [Google Scholar]
- Martinez, J.M. Algorithms for Solving Nonlinear Systems of Equations; Springer: Heidelberg, Germany, 1994. [Google Scholar]
- Lazar, I. On the convergence of asynchronous block Newton method for nonlinear systems of equations. Informatica
**2002**, 47, 75–84. [Google Scholar] - Balsa, C.; Guiverch, R.; Raimundo, J.; Ruiz, D. MUMPS Based Approach to Parallelize the Block Cimmino Algorithm. In Proceedings of the 8th International Meeting High Performance Computing for Computational Science, Toulouse, France, 1 Junuary 2008.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).