# 8.2 Newton's Method and Variants

• Replace $f$ by a simple function $g$ that approximates $f$ near $x^{(0)}$ and whose zero is to be determined.

• Find $x^{(1)}$ as the solution of $g(x) = 0$.

Newton's method requires the differentiability of $f$, and one chooses the approximating affine-linear function given by $Df(x^{(0)})$, i.e.,

$$g(x) = f(x^{(0)}) + Df(x^{(0)})\,(x - x^{(0)})\,.$$

Under the assumption that $Df(x^{(0)})$ is nonsingular, the new iterate $x^{(1)}$ is determined by solving the system of linear equations

$$Df(x^{(0)})\,(x^{(1)} - x^{(0)}) = -f(x^{(0)})\,, \qquad (8.10)$$

or formally by

$$x^{(1)} := x^{(0)} - Df(x^{(0)})^{-1} f(x^{(0)})\,.$$

This suggests the following definition:

$$\Phi(f)(x) = x - Df(x)^{-1} f(x)\,. \qquad (8.11)$$

Here $\Phi$ is well-defined only if $Df(x)$ is nonsingular. Then $x \in \mathbb{R}^m$ is a zero of $f$ if and only if $x$ is a fixed point of $\Phi$. When executing the iteration, we do not compute $Df(x^{(k)})^{-1}$ explicitly, but only solve a system of equations of the form (8.10).

Thus, the $k$th iteration of Newton's method reads as follows: Solve

$$Df(x^{(k)})\,\delta^{(k)} = -f(x^{(k)}) \qquad (8.12)$$

and set

$$x^{(k+1)} := x^{(k)} + \delta^{(k)}\,. \qquad (8.13)$$
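As a minimal illustration, the iteration (8.12), (8.13) can be sketched as follows; the solver interface and the test problem (the unit circle intersected with the diagonal) are chosen for illustration only:

```python
import numpy as np

def newton(f, Df, x0, tol=1e-12, maxit=50):
    """Newton's method: solve (8.12) for delta, then update via (8.13)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        fx = f(x)
        if np.linalg.norm(fx) <= tol:
            break
        delta = np.linalg.solve(Df(x), -fx)   # Df(x^(k)) delta^(k) = -f(x^(k))
        x = x + delta                         # x^(k+1) = x^(k) + delta^(k)
    return x

# Illustrative system: unit circle intersected with the line x_1 = x_2.
f  = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
Df = lambda x: np.array([[2.0*x[0], 2.0*x[1]], [1.0, -1.0]])
root = newton(f, Df, [1.0, 0.5])   # approaches (1/sqrt(2), 1/sqrt(2))
```

Note that the Jacobian is never inverted explicitly; each step solves one linear system, in line with the remark after (8.11).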

Equation (8.12) has the same form as (5.10) with $W = Df(x^{(k)})$, with the residual at $x^{(k)}$

$$d^{(k)} := f(x^{(k)})\,.$$

Thus the subproblem of the $k$th iteration is easier in the sense that it consists of a system of linear equations (with the same structure of dependence as $f$; see Exercise 8.6). In the same sense the system of equations (5.10) in the case of linear stationary methods is "easier" to solve than the original problem of the same type. Furthermore, $W$ is in general different for different $k$.

An application of (8.12), (8.13) to $Ax = b$, i.e., $Df(x) = A$ for all $x \in \mathbb{R}^m$, results in (5.10) with $W = A$, a method converging in one step, which just reformulates the original problem:

$$A\,(x - x^{(0)}) = -(Ax^{(0)} - b)\,.$$

8. Iterative Methods for Nonlinear Equations

The region of convergence of the iteration may be very small, as one-dimensional examples already show. But in this neighbourhood of the solution we have, e.g., for $m = 1$, the following:

Corollary 8.9 Let $f \in C^3(\mathbb{R})$ and let $\bar{x}$ be a simple zero of $f$ (i.e., $f'(\bar{x}) \neq 0$). Then Newton's method converges locally to $\bar{x}$, of order at least 2.

Proof: There exists an open neighbourhood $V$ of $\bar{x}$ such that $f'(x) \neq 0$ for all $x \in V$; i.e., $\Phi$ is well-defined by (8.11), continuous on $V$, and $\bar{x}$ is a fixed point of $\Phi$. According to Corollary 8.8 it suffices to show that $\Phi'(\bar{x}) = 0$:

$$\Phi'(x) = 1 - \frac{f'(x)^2 - f(x)f''(x)}{f'(x)^2} = \frac{f(x)f''(x)}{f'(x)^2} = 0 \quad \text{for } x = \bar{x}\,,$$

and $\Phi''$ exists continuously, because $f \in C^3(\mathbb{R})$.
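The quadratic order asserted in Corollary 8.9 can also be observed numerically; a small check with the illustrative example $f(x) = x^2 - 2$, whose simple zero is $\sqrt{2}$:

```python
import math

# Newton's method for f(x) = x^2 - 2; the simple zero is sqrt(2).
f, df = lambda x: x*x - 2.0, lambda x: 2.0*x
x, root = 1.5, math.sqrt(2.0)
errors = []
for _ in range(4):
    errors.append(abs(x - root))
    x = x - f(x) / df(x)

# Order 2: e_{k+1} / e_k^2 stays near |f''(xbar)| / (2|f'(xbar)|) = 1/(2*sqrt(2)).
ratios = [errors[k+1] / errors[k]**2 for k in range(3)]
```

The ratios stabilize near $1/(2\sqrt{2}) \approx 0.354$, the constant predicted by the Taylor expansion of $\Phi$ around $\bar{x}$.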

In the following we want to develop a general local convergence theorem for Newton's method (according to L.V. Kantorovich). It requires only the Lipschitz continuity of $Df$ and also ensures the existence of a zero. Here we always suppose a fixed norm on $\mathbb{R}^m$ and consider a compatible norm on $\mathbb{R}^{m,m}$. As a prerequisite we need the following lemma:

Lemma 8.10 Let $C_0 \subset \mathbb{R}^m$ be convex and open, $f : C_0 \to \mathbb{R}^m$ differentiable, and suppose there exists $\gamma > 0$ such that

$$\|Df(x) - Df(y)\| \le \gamma\,\|x - y\| \quad \text{for all } x, y \in C_0\,. \qquad (8.14)$$

Then for all $x, y \in C_0$,

$$\|f(x) - f(y) - Df(y)(x - y)\| \le \frac{\gamma}{2}\,\|x - y\|^2\,.$$

Proof: Let $\varphi : [0,1] \to \mathbb{R}^m$ be defined by $\varphi(t) := f(y + t(x - y))$, for arbitrary, fixed $x, y \in C_0$. Then $\varphi$ is differentiable on $[0,1]$ and

$$\varphi'(t) = Df(y + t(x - y))(x - y)\,.$$

Thus for $t \in [0,1]$ we have

$$\|\varphi'(t) - \varphi'(0)\| = \|(Df(y + t(x - y)) - Df(y))(x - y)\| \le \|Df(y + t(x - y)) - Df(y)\|\,\|x - y\| \le \gamma t\,\|x - y\|^2\,.$$

For

$$\Delta := f(x) - f(y) - Df(y)(x - y) = \varphi(1) - \varphi(0) - \varphi'(0) = \int_0^1 (\varphi'(t) - \varphi'(0))\,dt$$

we also get

$$\|\Delta\| \le \int_0^1 \|\varphi'(t) - \varphi'(0)\|\,dt \le \gamma\,\|x - y\|^2 \int_0^1 t\,dt = \frac{1}{2}\,\gamma\,\|x - y\|^2\,.$$


Now we are able to conclude local, quadratic convergence:

Theorem 8.11 Let $C \subset \mathbb{R}^m$ be convex and open, and $f : C \to \mathbb{R}^m$ differentiable. For $x^{(0)} \in C$ let there exist $\alpha, \beta, \gamma > 0$ such that

$$h := \alpha\beta\gamma/2 < 1\,, \qquad r := \alpha/(1 - h)\,, \qquad \bar{B}_r(x^{(0)}) \subset C\,.$$

Furthermore, we require:

(i) $Df$ is Lipschitz continuous on $C_0 = B_{r+\varepsilon}(x^{(0)})$ for some $\varepsilon > 0$ with constant $\gamma$ in the sense of (8.14).

(ii) For all $x \in B_r(x^{(0)})$, $Df(x)^{-1}$ exists and $\|Df(x)^{-1}\| \le \beta$.

(iii) $\|Df(x^{(0)})^{-1} f(x^{(0)})\| \le \alpha$.

Then:

(1) The Newton iteration

$$x^{(k+1)} := x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)})$$

is well-defined and $x^{(k)} \in B_r(x^{(0)})$ for all $k \in \mathbb{N}$.

(2) $x^{(k)} \to \bar{x}$ for $k \to \infty$ and $f(\bar{x}) = 0$.

(3) For $k \in \mathbb{N}$,

$$\|x^{(k+1)} - \bar{x}\| \le \frac{\beta\gamma}{2}\,\|x^{(k)} - \bar{x}\|^2 \quad \text{and} \quad \|x^{(k)} - \bar{x}\| \le \alpha\,\frac{h^{2^k - 1}}{1 - h^{2^k}}\,.$$

Proof: (1): To show that $x^{(k+1)}$ is well-defined it is sufficient to verify

$$x^{(k)} \in B_r(x^{(0)}) \ (\subset C) \quad \text{for all } k \in \mathbb{N}\,.$$

By induction we prove the extended proposition

$$x^{(k)} \in B_r(x^{(0)}) \quad \text{and} \quad \|x^{(k)} - x^{(k-1)}\| \le \alpha\,h^{2^{k-1} - 1} \quad \text{for all } k \in \mathbb{N}\,. \qquad (8.15)$$

The proposition (8.15) holds for $k = 1$, because according to (iii),

$$\|x^{(1)} - x^{(0)}\| = \|Df(x^{(0)})^{-1} f(x^{(0)})\| \le \alpha < r\,.$$

Let (8.15) be valid for $l = 1, \ldots, k$. Then $x^{(k+1)}$ is well-defined, and by the application of the Newton iteration for $k - 1$ we get

$$\|x^{(k+1)} - x^{(k)}\| = \|Df(x^{(k)})^{-1} f(x^{(k)})\| \le \beta\,\|f(x^{(k)})\| = \beta\,\|f(x^{(k)}) - f(x^{(k-1)}) - Df(x^{(k-1)})(x^{(k)} - x^{(k-1)})\| \le \frac{\beta\gamma}{2}\,\|x^{(k)} - x^{(k-1)}\|^2$$

according to Lemma 8.10 with $C_0 = B_r(x^{(0)})$, and

$$\frac{\beta\gamma}{2}\,\|x^{(k)} - x^{(k-1)}\|^2 \le \frac{\beta\gamma}{2}\,\alpha^2 h^{2^k - 2} = \alpha\,h^{2^k - 1}\,.$$

Thus the second part of (8.15) holds for $k + 1$, and also

$$\|x^{(k+1)} - x^{(0)}\| \le \|x^{(k+1)} - x^{(k)}\| + \|x^{(k)} - x^{(k-1)}\| + \cdots + \|x^{(1)} - x^{(0)}\| \le \alpha\,\bigl(h^{2^k - 1} + h^{2^{k-1} - 1} + \cdots + h^7 + h^3 + h + 1\bigr) < \alpha/(1 - h) = r\,.$$

Hence (8.15) holds for $k + 1$.

(2): Using (8.15) we are able to verify that $(x^{(k)})_k$ is a Cauchy sequence, because for $l \ge k$ we have

$$\|x^{(l+1)} - x^{(k)}\| \le \|x^{(l+1)} - x^{(l)}\| + \|x^{(l)} - x^{(l-1)}\| + \cdots + \|x^{(k+1)} - x^{(k)}\| \le \alpha\,h^{2^k - 1}\bigl(1 + h^{2^k} + h^{3\cdot 2^k} + \cdots\bigr) < \alpha\,\frac{h^{2^k - 1}}{1 - h^{2^k}} \to 0 \qquad (8.16)$$

for $k \to \infty$, since $h < 1$. Hence there exists $\bar{x} = \lim_{k\to\infty} x^{(k)}$ and $\bar{x} \in \bar{B}_r(x^{(0)})$. Furthermore, $f(\bar{x}) = 0$, because we can conclude from $x^{(k)} \in B_r(x^{(0)})$ that

$$\|Df(x^{(k)}) - Df(x^{(0)})\| \le \gamma\,\|x^{(k)} - x^{(0)}\| < \gamma r\,;$$

thus

$$\|Df(x^{(k)})\| \le \gamma r + \|Df(x^{(0)})\| =: K\,,$$

and from $f(x^{(k)}) = -Df(x^{(k)})\,(x^{(k+1)} - x^{(k)})$ we obtain

$$\|f(x^{(k)})\| \le K\,\|x^{(k+1)} - x^{(k)}\| \to 0$$

for $k \to \infty$. Thus we also have

$$f(\bar{x}) = \lim_{k\to\infty} f(x^{(k)}) = 0\,.$$

(3): With $l \to \infty$ in (8.16) we can prove the second part of (3); the first part follows from

$$x^{(k+1)} - \bar{x} = x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)}) - \bar{x} = x^{(k)} - \bar{x} - Df(x^{(k)})^{-1}\bigl(f(x^{(k)}) - f(\bar{x})\bigr) = Df(x^{(k)})^{-1}\bigl(f(\bar{x}) - f(x^{(k)}) - Df(x^{(k)})(\bar{x} - x^{(k)})\bigr)\,,$$

which implies, according to Lemma 8.10 with $C_0 = B_{r+\varepsilon}(x^{(0)}) \subset C$,

$$\|x^{(k+1)} - \bar{x}\| \le \beta\,\frac{\gamma}{2}\,\|x^{(k)} - \bar{x}\|^2\,.$$
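The quantities in Theorem 8.11 can be checked concretely. For the scalar example $f(x) = x^2 - 2$ with $x^{(0)} = 1.5$, $\gamma = 2$ is the global Lipschitz constant of $f'(x) = 2x$, and $\beta = 0.36$ is a bound for $|f'(x)^{-1}|$ on $B_r(x^{(0)})$ chosen for this illustration (the code verifies it):

```python
import math

f, df = lambda x: x*x - 2.0, lambda x: 2.0*x
x0, gamma, beta = 1.5, 2.0, 0.36
alpha = abs(f(x0) / df(x0))             # condition (iii)
h = alpha * beta * gamma / 2.0
r = alpha / (1.0 - h)
assert h < 1.0                          # hypothesis of the theorem
assert 1.0 / (2.0 * (x0 - r)) <= beta   # condition (ii) holds on B_r(x0)

# a priori bound (3): |x^(k) - xbar| <= alpha h^(2^k - 1) / (1 - h^(2^k))
x, root = x0, math.sqrt(2.0)
for k in range(4):
    bound = alpha * h**(2**k - 1) / (1.0 - h**(2**k))
    assert abs(x - root) <= bound
    x = x - f(x) / df(x)
```

The a priori bound is fairly tight here: for $k = 0$ it gives $r \approx 0.0859$ against the actual error $\approx 0.0858$.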


The termination criterion (5.15), which is oriented at the residual, may also be used for the nonlinear problem (and not just for the Newton iteration). This can be deduced in analogy to (5.16):

Theorem 8.12 Let the following be valid:

There exists a zero $\bar{x}$ of $f$ such that $Df(\bar{x})$ is nonsingular and $Df$ is Lipschitz continuous in an open neighbourhood $C$ of $\bar{x}$. $\qquad$ (8.17)

Then for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for $x, y \in B_\delta(\bar{x})$,

$$\|x - \bar{x}\| \le (1 + \varepsilon)\,\kappa(Df(\bar{x}))\,\frac{\|f(x)\|}{\|f(y)\|}\,\|y - \bar{x}\|\,.$$

Proof: See [22, p. 69, p. 72] and Exercise 8.4.

Here $\kappa$ is the condition number in a matrix norm that is consistent with the chosen vector norm. For $x = x^{(k)}$ and $y = x^{(0)}$ we get (locally) the generalization of (5.16).

## 8.2.2 Modifications of Newton's Method

Modifications of Newton's method aim in two directions:

• Reduction of the cost of assembling and solving the system of equations (8.12) (without a significant deterioration of the convergence properties).

• Enlargement of the range of convergence.

We can account for the first aspect by simplifying the matrix in (8.12) (modified or simplified Newton's method). The extreme case is the replacement of $Df(x^{(k)})$ by the identity matrix; this leads us to the fixed-point iteration (8.9). If the mapping $f$ consists of a nonlinear and a linear part,

$$f(x) := Ax + g(x) = 0\,, \qquad (8.18)$$

then the system of equations (8.12) of the Newton iteration reads

$$\bigl(A + Dg(x^{(k)})\bigr)\,\delta^{(k)} = -f(x^{(k)})\,.$$

A straightforward simplification in this case is the fixed-point iteration

$$A\,\delta^{(k)} = -f(x^{(k)})\,. \qquad (8.19)$$

It may be interpreted as the fixed-point iteration (8.9) of the system that is preconditioned with $A$, i.e., of

$$A^{-1} f(x) = 0\,.$$

In (8.19) the matrix is identical in every iteration step; therefore, it has to be assembled only once, and if we use a direct method (cf. Section 2.5), the LU factorization has to be carried out only once. Each step then requires only a forward and a backward substitution, which are comparatively cheap. For iterative methods we cannot rely on this advantage, but we can expect that $x^{(k+1)}$ is close to $x^{(k)}$, and consequently $\delta^{(k,0)} = 0$ constitutes a good initial guess. Accordingly, the assembling of the matrix becomes more important with respect to the overall computational cost, and savings during the assembling become relevant.

We get a system of equations similar to (8.19) by applying the chord method (see Exercise 8.3), where the linear approximation of the initial iterate is maintained, i.e.,

$$Df(x^{(0)})\,\delta^{(k)} = -f(x^{(k)})\,. \qquad (8.20)$$
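A sketch of the chord iteration (8.20): the matrix $Df(x^{(0)})$ is assembled once and reused in every step (in practice one would keep its LU factorization and perform only forward/backward substitutions per step); the solver interface and the test problem are illustrative:

```python
import numpy as np

def chord(f, Df, x0, tol=1e-12, maxit=200):
    """Chord method (8.20): Df(x^(0)) delta^(k) = -f(x^(k)), fixed matrix."""
    x = np.asarray(x0, dtype=float)
    J0 = Df(x)   # assembled once; an LU factorization of J0 could be reused
    for _ in range(maxit):
        fx = f(x)
        if np.linalg.norm(fx) <= tol:
            break
        x = x + np.linalg.solve(J0, -fx)
    return x

# Illustrative system: unit circle intersected with the diagonal.
f  = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
Df = lambda x: np.array([[2.0*x[0], 2.0*x[1]], [1.0, -1.0]])
root = chord(f, Df, [1.0, 0.8])   # converges linearly, not quadratically
```

Compared with Newton's method, the linear rate of convergence shows up as a noticeably larger number of (much cheaper) steps.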

If the matrix $B(x^{(k)})$, which approximates $Df(x^{(k)})$, changes in each iteration step, i.e.,

$$B(x^{(k)})\,\delta^{(k)} = -f(x^{(k)})\,, \qquad (8.21)$$

then the only advantage can be a possibly easier assembling or solvability of the system of equations. If the partial derivatives $\partial_j f_i(x)$ are more difficult to evaluate than the function $f_i$ itself (or possibly not evaluable at all), then the approximation of $Df(x^{(k)})$ by difference quotients can be taken into consideration. This corresponds to

$$B(x^{(k)})\,e_j = \frac{1}{h}\bigl(f(x^{(k)} + h e_j) - f(x^{(k)})\bigr) \qquad (8.22)$$

for column $j$ of $B(x^{(k)})$, with a fixed $h > 0$. The number of computations for the assembling of the matrix remains the same: $m^2$ for the full matrix, and analogously for the sparse matrix (see Exercise 8.6). Observe that numerical differentiation is an ill-posed problem, which means that we should ideally choose $h \sim \delta^{1/2}$, where $\delta > 0$ is the error level in the evaluation of $f$. Even then we can merely expect

$$\|Df(x^{(k)}) - B(x^{(k)})\| \le C\,\delta^{1/2}$$

(see [22, pp. 80 f.]). Thus in the best case we can expect only half of the significant digits of the machine precision. The second aspect, facilitated solvability of (8.21), occurs if there appear "small" entries in the Jacobian, due to a problem-dependent weak coupling of the components, and these entries may be neglected. Take, for example, a $Df(x^{(k)})$ with a block structure as in (5.39):

$$Df(x^{(k)}) = (A_{ij})_{ij}\,, \qquad A_{ij} \in \mathbb{R}^{m_i, m_j}\,,$$

such that the blocks $A_{ij}$ may be neglected for $j > i$. Then there results a nested system of equations of the dimensions $m_1, m_2, \ldots, m_p$.
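A sketch of the difference-quotient approximation (8.22), with the step width tied to the square root of the machine precision in line with the $h \sim \delta^{1/2}$ rule; the helper name and the test problem are illustrative:

```python
import numpy as np

def fd_jacobian(f, x, h=None):
    """Approximate Df(x) column by column via forward differences, cf. (8.22)."""
    x = np.asarray(x, dtype=float)
    if h is None:
        h = np.sqrt(np.finfo(float).eps)   # h ~ delta^(1/2), delta = machine eps
    fx = f(x)
    B = np.empty((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        B[:, j] = (f(x + e) - fx) / h      # column j: (f(x + h e_j) - f(x)) / h
    return B

f  = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
Df = lambda x: np.array([[2.0*x[0], 2.0*x[1]], [1.0, -1.0]])
err = np.abs(fd_jacobian(f, np.array([1.0, 0.5])) - Df(np.array([1.0, 0.5]))).max()
```

Consistent with the estimate above, `err` is of the order $\delta^{1/2} \approx 10^{-8}$, i.e., roughly half the significant digits of the machine precision.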

The possible advantages of such simplified Newton's methods have to be weighed against the disadvantage of a deterioration in the order of convergence: Instead of an estimate like that in Theorem 8.11, (3), we have to expect an additional term

$$\|B(x^{(k)}) - Df(x^{(k)})\|\,\|x^{(k)} - \bar{x}\|\,.$$

This means only linear or, by successive improvement of the approximation, superlinear convergence (see [22, pp. 75 ff.]). If we have a good initial iterate, it may often be advantageous to perform a small number of steps of Newton's method. So in the following we will again treat Newton's method, although the subsequent considerations can also be transferred to its modifications.

If the linear problems (8.12) are solved with an iterative scheme, we have the possibility to adjust the accuracy of the algorithm in order to reduce the number of inner iterations, without a (severe) deterioration of the convergence of the outer Newton iteration. Dealing with such inexact Newton's methods, we determine instead of $\delta^{(k)}$ from (8.12) only $\tilde{\delta}^{(k)}$, which fulfils (8.12) only up to an inner residual $r^{(k)}$, i.e.,

$$Df(x^{(k)})\,\tilde{\delta}^{(k)} = -f(x^{(k)}) + r^{(k)}\,.$$

The new iterate is given by

$$x^{(k+1)} := x^{(k)} + \tilde{\delta}^{(k)}\,.$$

The accuracy of $\tilde{\delta}^{(k)}$ is controlled by the requirement

$$\|r^{(k)}\| \le \eta_k\,\|f(x^{(k)})\| \qquad (8.23)$$

with certain properties for the sequence $(\eta_k)_k$ that still have to be determined. Since the natural choice of the initial iterate for solving (8.12) is $\delta^{(k,0)} = 0$, (8.23) corresponds to the termination criterion (5.15).

Conditions for $\eta_k$ can be deduced from the following theorem:

Theorem 8.13 Let (8.17) hold and consider compatible matrix and vector norms. Then for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for $x^{(k)} \in B_\delta(\bar{x})$,

$$\|x^{(k+1)} - \bar{x}\| \le \|x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)}) - \bar{x}\| + (1 + \varepsilon)\,\kappa\bigl(Df(\bar{x})\bigr)\,\eta_k\,\|x^{(k)} - \bar{x}\|\,. \qquad (8.24)$$

Proof: By the choice of $\delta$ we can ensure the nonsingularity of $Df(x^{(k)})$. From

$$\tilde{\delta}^{(k)} = -Df(x^{(k)})^{-1} f(x^{(k)}) + Df(x^{(k)})^{-1} r^{(k)}$$

it follows that

$$\|x^{(k+1)} - \bar{x}\| = \|x^{(k)} - \bar{x} + \tilde{\delta}^{(k)}\| \le \|x^{(k)} - \bar{x} - Df(x^{(k)})^{-1} f(x^{(k)})\| + \|Df(x^{(k)})^{-1} r^{(k)}\|\,.$$

The assertion can be deduced from the estimate

$$\|Df(x^{(k)})^{-1} r^{(k)}\| \le (1 + \varepsilon)^{1/2}\,\|Df(\bar{x})^{-1}\|\,\|r^{(k)}\| \le (1 + \varepsilon)^{1/2}\,\|Df(\bar{x})^{-1}\|\,\eta_k\,(1 + \varepsilon)^{1/2}\,\|Df(\bar{x})\|\,\|x^{(k)} - \bar{x}\|\,.$$

Here we used Exercise 8.4 (2), (3) and (8.23).

The first part of the estimate corresponds to the error of the exact Newton step, which can be estimated using the same argument as in Theorem 8.11, (3) (with Exercise 8.4, (2)) by

$$\|x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)}) - \bar{x}\| \le (1 + \varepsilon)^{1/2}\,\|Df(\bar{x})^{-1}\|\,\frac{\gamma}{2}\,\|x^{(k)} - \bar{x}\|^2\,.$$

This implies the following result:

Corollary 8.14 Let the assumptions of Theorem 8.13 be satisfied. Then there exist $\delta > 0$ and $\bar{\eta} > 0$ such that for $x^{(0)} \in B_\delta(\bar{x})$ and $\eta_k \le \bar{\eta}$ for all $k \in \mathbb{N}$ the following hold for the inexact Newton's method:

(1) The sequence $(x^{(k)})_k$ converges linearly to $\bar{x}$.

(2) If $\eta_k \to 0$ for $k \to \infty$, then $(x^{(k)})_k$ converges superlinearly.

(3) If $\eta_k \le K\,\|f(x^{(k)})\|$ for a $K > 0$, then $(x^{(k)})_k$ converges quadratically.

Proof: Exercise 8.5.

The estimate (8.24) suggests that we have to choose a very fine level of accuracy $\bar{\eta}$ for the inner iteration to guarantee the above convergence statements. This is particularly true for an ill-conditioned $Df(\bar{x})$ (which is common for discretization matrices: see (5.34)). In fact, the analysis in the weighted norm $\|Df(\bar{x})\,\cdot\,\|$ shows that only $\eta_k \le \bar{\eta} < 1$ has to be ensured (cf. [22, pp. 97 ff.]). With this and on the basis of

$$\tilde{\eta}_k = \alpha\,\|f(x^{(k)})\|^2 / \|f(x^{(k-1)})\|^2$$

for some $\alpha \le 1$ we can construct $\eta_k$ in an adaptive way (see [22, p. 105]).

Most of the iterative methods introduced in Chapter 5 do not require the explicit knowledge of the matrix $Df(x^{(k)})$. It suffices that the operation $Df(x^{(k)})\,y$ be feasible for vectors $y$, in general for fewer than $m$ of them; i.e., the directional derivative of $f$ at $x^{(k)}$ in direction $y$ is needed. Thus, in case a difference scheme for the derivatives of $f$ should be necessary or reasonable, it is more convenient to choose directly a difference scheme for the directional derivative.
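The product $Df(x^{(k)})y$ needed by such iterative solvers can be approximated by a single additional evaluation of $f$; a sketch, where the scaling of $h$ by $\|y\|$ is a common heuristic and all names are illustrative:

```python
import numpy as np

def jac_vec(f, x, y, h=None):
    """Approximate the directional derivative Df(x)y by a forward difference."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    ny = np.linalg.norm(y)
    if ny == 0.0:
        return np.zeros_like(f(x))
    if h is None:
        h = np.sqrt(np.finfo(float).eps) / ny   # step scaled by the length of y
    return (f(x + h * y) - f(x)) / h

f = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
x, y = np.array([1.0, 0.5]), np.array([0.3, -0.2])
approx = jac_vec(f, x, y)
exact = np.array([[2.0, 1.0], [1.0, -1.0]]) @ y   # Df(x) y at x = (1, 0.5)
```

This is the basis of so-called Jacobian-free variants, in which the matrix $Df(x^{(k)})$ is never formed at all.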

Since we cannot expect convergence of Newton's method in general, we require indicators for the convergence behaviour of the iteration. The solution $\bar{x}$ is in particular also the solution of

$$\text{Minimize } \|f(x)\|^2 \quad \text{for } x \in \mathbb{R}^m\,.$$


Let $x^{(0)}$, $\tau > 0$, $\eta_0$, $\bar{\Theta} \in (0, 1)$, $k = 0$, $i = 0$ be given.

(1) $\tilde{\delta}^{(k,0)} := 0$, $i := 1$.

(2) Determine the $i$th iterate $\tilde{\delta}^{(k,i)}$ for $Df(x^{(k)})\,\tilde{\delta}^{(k)} = -f(x^{(k)})$ and calculate $r^{(i)} := Df(x^{(k)})\,\tilde{\delta}^{(k,i)} + f(x^{(k)})$.

(3) If $\|r^{(i)}\| \le \eta_k\,\|f(x^{(k)})\|$, then go to (4); else set $i := i + 1$ and go to (2).

(4) $\tilde{\delta}^{(k)} := \tilde{\delta}^{(k,i)}$.

(5) $x^{(k+1)} := x^{(k)} + \tilde{\delta}^{(k)}$.

(6) If $\|f(x^{(k+1)})\| > \bar{\Theta}\,\|f(x^{(k)})\|$, interrupt.

(7) If $\|f(x^{(k+1)})\| \le \tau\,\|f(x^{(0)})\|$, end. Else calculate $\eta_{k+1}$, set $k := k + 1$, and go to (1).

Table 8.1. Inexact Newton's method with monotonicity test.
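The scheme of Table 8.1 can be sketched as follows, with a Jacobi sweep playing the role of the inner iterative solver and the adaptive choice $\tilde{\eta}_k$ described above used to update the forcing term; the diagonally dominant test system and all parameter values are illustrative:

```python
import numpy as np

def inexact_newton(f, Df, x0, tau=1e-10, eta0=0.5, theta=0.9, kmax=50):
    """Inexact Newton's method with monotonicity test, following Table 8.1.
    Inner solver: Jacobi sweeps, stopped by the residual test (8.23)."""
    x = np.asarray(x0, dtype=float)
    f0 = np.linalg.norm(f(x))
    eta = eta0
    for _ in range(kmax):
        fx = f(x)
        nfx = np.linalg.norm(fx)
        if nfx <= tau * f0:                       # step (7): converged
            break
        J = Df(x)
        d = np.zeros_like(x)                      # step (1): delta^(k,0) = 0
        for _ in range(200):                      # steps (2)-(3)
            r = J @ d + fx                        # inner residual
            if np.linalg.norm(r) <= eta * nfx:    # test (8.23)
                break
            d = d - r / np.diag(J)                # one Jacobi sweep
        x_new = x + d                             # step (5)
        nf_new = np.linalg.norm(f(x_new))
        if nf_new > theta * nfx:                  # step (6): monotonicity test
            break
        eta = min(eta0, nf_new**2 / nfx**2)       # adaptive forcing term
        x = x_new
    return x

# Illustrative, diagonally dominant system (Jacobi converges for its Jacobian).
f  = lambda x: np.array([4*x[0] + x[1] + x[0]**3 - 1.0,
                         x[0] + 3*x[1] + x[1]**3 - 2.0])
Df = lambda x: np.array([[4 + 3*x[0]**2, 1.0], [1.0, 3 + 3*x[1]**2]])
root = inexact_newton(f, Df, [0.0, 0.0])
```

In practice the inner solver would be a Krylov method as in Chapter 5; the Jacobi sweep is used here only to keep the sketch self-contained.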

Thus we could expect a descent of the sequence of iterates $(x^{(k)})_k$ in this functional, i.e.,

$$\|f(x^{(k+1)})\| \le \bar{\Theta}\,\|f(x^{(k)})\| \quad \text{for a } \bar{\Theta} < 1\,.$$

If this monotonicity test is not fulfilled, the iteration is terminated. An example of such an inexact Newton's method is given in Table 8.1.

In order to avoid the termination of the method due to divergence, continuation methods have been developed. They embed the problem $f(x) = 0$ in a family of problems in order to provide successively good initial iterates. The approach presented at the end of Section 8.3 for time-dependent problems is similar to the continuation methods. Another approach (which can be combined with the latter) modifies the (inexact) Newton's method so that the range of convergence is enlarged: Applying the damped (inexact) Newton's method means reducing the step length from $x^{(k)}$ to $x^{(k+1)}$ until we observe a decrease conformable to the monotonicity test. One strategy of damping, termed Armijo's rule, is described in Table 8.2 and replaces the steps (1), (5), and (6) in Table 8.1.

Thus damping Newton's method also amounts to a relaxation similar to (5.30), where $\omega = \lambda_k$ is adjusted to the iteration step as in (5.41). In the formulation of Table 8.2 the iteration may not terminate if $\lambda_k$ in (5) is reduced indefinitely. This must be avoided in a practical implementation of the method. But except for situations where divergence is obvious, this situation will not appear, because we have the following theorem:


Let additionally $\alpha, \beta \in (0, 1)$ be given.

(1) $\tilde{\delta}^{(k,0)} := 0$, $i := 1$, $\lambda_k := 1$.

(5) If $\|f(x^{(k)} + \lambda_k \tilde{\delta}^{(k)})\| \ge (1 - \alpha\lambda_k)\,\|f(x^{(k)})\|$, set $\lambda_k := \beta\lambda_k$ and go to (5).

(6) $x^{(k+1)} := x^{(k)} + \lambda_k \tilde{\delta}^{(k)}$.

Table 8.2. Damped inexact Newton step according to Armijo's rule.
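The Armijo steps (5), (6) of Table 8.2 can be sketched as follows; the safeguard `lam_min`, the parameter values, and the classical test function $f(x) = \arctan(x)$ (for which the undamped Newton's method diverges from $x^{(0)} = 3$) are illustrative additions:

```python
import numpy as np

def armijo_step(f, x, delta, alpha=1e-4, beta=0.5, lam_min=1e-10):
    """Steps (5), (6) of Table 8.2: reduce lambda until
    |f(x + lam*delta)| < (1 - alpha*lam) * |f(x)|."""
    nfx = np.linalg.norm(f(x))
    lam = 1.0
    while np.linalg.norm(f(x + lam * delta)) >= (1.0 - alpha * lam) * nfx:
        lam *= beta
        if lam < lam_min:   # practical safeguard against non-termination
            break
    return x + lam * delta, lam

# Damped Newton for f(x) = arctan(x): full steps overshoot from x = 3,
# but the damped iteration converges to the zero x = 0.
f  = lambda x: np.arctan(x)
Df = lambda x: 1.0 / (1.0 + x * x)
x = np.array([3.0])
for _ in range(30):
    delta = -f(x) / Df(x)          # Newton correction
    x, lam = armijo_step(f, x, delta)
```

Once the iterates are close to the zero, the full step $\lambda_k = 1$ is accepted, as predicted by Theorem 8.15 below.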

Theorem 8.15 Let $\alpha, \beta, \gamma > 0$ exist such that conditions (i), (ii) of Theorem 8.11 hold on $\bigcup_{k\in\mathbb{N}} B_r(x^{(k)})$ for the sequence $(x^{(k)})_k$ defined according to Table 8.2. Let $\eta_k \le \bar{\eta}$ for an $\bar{\eta} < 1 - \alpha$.

Then if $f(x^{(0)}) \neq 0$, there exists a $\bar{\lambda} > 0$ such that $\lambda_k \ge \bar{\lambda}$ for all $k \in \mathbb{N}$. If furthermore $(x^{(k)})_k$ is bounded, then there exists a zero $\bar{x}$ satisfying (8.17), and

$$x^{(k)} \to \bar{x} \quad \text{for } k \to \infty\,.$$

There exists a $k_0 \in \mathbb{N}$ such that for $k \ge k_0$ the relation $\lambda_k = 1$ holds.

Proof: See [22, pp. 139 ff.].

We see that in the final stage of the iteration we again deal with the (inexact) Newton's method, with the convergence behaviour described previously.

Finally, the following should be mentioned: The problem $f(x) = 0$ and Newton's method are affine-invariant in the sense that a transition to $Af(x) = 0$ with a nonsingular $A \in \mathbb{R}^{m,m}$ changes neither the problem nor the iteration method, since

$$D(Af)(x)^{-1}\,(Af)(x) = Df(x)^{-1} f(x)\,.$$

Among the assumptions of Theorem 8.11, (8.14) is not affine-invariant. A possible alternative would be

$$\|Df(y)^{-1}(Df(x) - Df(y))\| \le \gamma\,\|x - y\|\,,$$

which fulfils the requirement. As in the proof of Lemma 8.10 it follows that

$$\|Df(y)^{-1}(f(x) - f(y) - Df(y)(x - y))\| \le \frac{\gamma}{2}\,\|x - y\|^2\,.$$

With this argument a similar variant of Theorem 8.11 can be proven.


The monotonicity test is not affine-invariant, so possibly the natural monotonicity test

$$\|Df(x^{(k)})^{-1} f(x^{(k+1)})\| \le \bar{\Theta}\,\|Df(x^{(k)})^{-1} f(x^{(k)})\|$$

has to be preferred. The vector on the right-hand side has already been calculated, being, except for the sign, the Newton correction $\delta^{(k)}$. But for the vector on the left-hand side, $-\bar{\delta}^{(k+1)}$, the system of equations

$$Df(x^{(k)})\,\bar{\delta}^{(k+1)} = -f(x^{(k+1)})$$

has to be solved additionally.

Exercises

8.3 Consider the chord method as described in (8.20). Prove the convergence of this method to the solution $\bar{x}$ under the following assumptions:

(1) Let (8.17) with $\bar{B}_r(\bar{x}) \subset C$ hold,

(2) $\|Df(x^{(0)})^{-1}\| \le \beta$,

(3) $2\beta\gamma r < 1$,

(4) $x^{(0)} \in \bar{B}_r(\bar{x})$.

8.4 Let assumption (8.17) hold. Prove for compatible matrix and vector norms that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for every $x \in B_\delta(\bar{x})$,

(1) $\|Df(x)\| \le (1 + \varepsilon)^{1/2}\,\|Df(\bar{x})\|$,

(2) $\|Df(x)^{-1}\| \le (1 + \varepsilon)^{1/2}\,\|Df(\bar{x})^{-1}\|$ (employ $\|(I - M)^{-1}\| \le 1/(1 - \|M\|)$ for $\|M\| < 1$),

(3) $(1 + \varepsilon)^{-1/2}\,\|Df(\bar{x})^{-1}\|^{-1}\,\|x - \bar{x}\| \le \|f(x)\| \le (1 + \varepsilon)^{1/2}\,\|Df(\bar{x})\|\,\|x - \bar{x}\|$,

(4) Theorem 8.12.

8.5 Prove Corollary 8.14.

8.6 Let $U \subset \mathbb{R}^m$ be open and convex. Consider problem (8.2) with continuously differentiable $f : U \to \mathbb{R}^m$. For $i = 1, \ldots, m$ let $J_i \subset \{1, \ldots, m\}$ be defined by

$$\partial_j f_i(x) = 0 \quad \text{for } j \notin J_i \text{ and every } x \in U\,.$$
