8.2. Newton's Method and Variants



• Replace $f$ by a simple function $g$ that approximates $f$ near $x^{(0)}$ and whose zero is to be determined.

• Find $x^{(1)}$ as the solution of $g(x) = 0$.

Newton's method needs the differentiability of $f$, and one chooses the approximating affine-linear function given by $Df(x^{(0)})$, i.e.,
\[ g(x) = f(x^{(0)}) + Df(x^{(0)})\,(x - x^{(0)}) . \]
Under the assumption that $Df(x^{(0)})$ is nonsingular, the new iterate $x^{(1)}$ is determined by solving the system of linear equations
\[ Df(x^{(0)})\,(x^{(1)} - x^{(0)}) = -f(x^{(0)}) , \tag{8.10} \]
or formally by
\[ x^{(1)} := x^{(0)} - Df(x^{(0)})^{-1} f(x^{(0)}) . \]



This suggests the following definition:
\[ \Phi(f)(x) = x - Df(x)^{-1} f(x) . \tag{8.11} \]



Here $\Phi$ is well-defined only if $Df(x)$ is nonsingular. Then $x \in \mathbb{R}^m$ is a zero of $f$ if and only if $x$ is a fixed point of $\Phi$. When executing the iteration, we do not calculate $Df(x^{(k)})^{-1}$ explicitly, but only solve a system of equations similar to (8.10).

Thus, the $k$th iteration of Newton's method reads as follows: Solve
\[ Df(x^{(k)})\,\delta^{(k)} = -f(x^{(k)}) \tag{8.12} \]
and set
\[ x^{(k+1)} := x^{(k)} + \delta^{(k)} . \tag{8.13} \]
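In code, one Newton step is a linear solve followed by an update. The following minimal sketch (our own, assuming callables `f` and `Df` for the residual and the Jacobian) performs (8.12), (8.13) until the residual is small:

```python
import numpy as np

def newton(f, Df, x0, tol=1e-10, maxit=20):
    """Newton's method: solve Df(x_k) delta = -f(x_k), then x_{k+1} = x_k + delta.

    f  : callable, R^m -> R^m, the residual
    Df : callable, R^m -> R^{m x m}, the Jacobian
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        fx = f(x)
        if np.linalg.norm(fx) <= tol:
            break
        delta = np.linalg.solve(Df(x), -fx)   # system (8.12)
        x = x + delta                         # update (8.13)
    return x
```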



Equation (8.13) has the same form as (5.10) with $W = Df(x^{(k)})$, with the residual at $x^{(k)}$
\[ d^{(k)} := f(x^{(k)}) . \]
Thus the subproblem of the $k$th iteration is easier in the sense that it consists of a system of linear equations (with the same structure of dependence as $f$; see Exercise 8.6). In the same sense the system of equations (5.10) in the case of linear stationary methods is "easier" to solve than the original problem of the same type. Furthermore, $W$ is in general different for different $k$.

An application of (8.12), (8.13) to $Ax = b$, i.e., $Df(x) = A$ for all $x \in \mathbb{R}^m$, results in (5.10) with $W = A$, a method converging in one step, which just reformulates the original problem:
\[ A\,(x - x^{(0)}) = -(Ax^{(0)} - b) . \]






The range of convergence of the iteration may be very small, as can be shown already by one-dimensional examples. But in a neighbourhood of the solution we have, e.g., for $m = 1$, the following:

Corollary 8.9 Let $f \in C^3(\mathbb{R})$ and let $\bar{x}$ be a simple zero of $f$ (i.e., $f'(\bar{x}) \neq 0$). Then Newton's method converges locally to $\bar{x}$, of order at least 2.

Proof: There exists an open neighbourhood $V$ of $\bar{x}$ such that $f'(x) \neq 0$ for all $x \in V$; i.e., $\Phi$ is well-defined by (8.11), continuous on $V$, and $\bar{x}$ is a fixed point of $\Phi$. According to Corollary 8.8 it suffices to show that $\Phi'(\bar{x}) = 0$:
\[ \Phi'(x) = 1 - \frac{f'(x)^2 - f(x)f''(x)}{f'(x)^2} = \frac{f(x)\,f''(x)}{f'(x)^2} = 0 \quad \text{for } x = \bar{x} , \]
and $\Phi''$ exists continuously, because $f \in C^3(\mathbb{R})$. $\Box$
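As a small numerical illustration (our own example, not from the text), for $f(x) = x^2 - 2$ with the simple zero $\bar{x} = \sqrt{2}$ the printed error is roughly squared in each step, in line with Corollary 8.9:

```python
import math

f  = lambda x: x * x - 2.0
df = lambda x: 2.0 * x

x = 1.0                          # initial iterate
for k in range(5):
    x = x - f(x) / df(x)         # scalar Newton step
    print(k + 1, abs(x - math.sqrt(2.0)))
# the error is roughly squared in each step: order at least 2
```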







In the following we want to develop a general local convergence theorem for Newton's method (according to L.V. Kantorovich). It necessitates only the Lipschitz continuity of $Df$ and ensures the existence of a zero, too. Here we always suppose a fixed norm on $\mathbb{R}^m$ and consider a compatible norm on $\mathbb{R}^{m,m}$. As a prerequisite we need the following lemma:

Lemma 8.10 Let $C_0 \subset \mathbb{R}^m$ be convex, open, $f : C_0 \to \mathbb{R}^m$ differentiable, and suppose there exists $\gamma > 0$ such that
\[ \|Df(x) - Df(y)\| \le \gamma\,\|x - y\| \quad \text{for all } x, y \in C_0 . \tag{8.14} \]
Then for all $x, y \in C_0$,
\[ \|f(x) - f(y) - Df(y)(x - y)\| \le \frac{\gamma}{2}\,\|x - y\|^2 . \]



Proof: Let $\varphi : [0,1] \to \mathbb{R}^m$ be defined by $\varphi(t) := f(y + t(x - y))$, for arbitrary, fixed $x, y \in C_0$. Then $\varphi$ is differentiable on $[0,1]$ and
\[ \varphi'(t) = Df(y + t(x - y))\,(x - y) . \]
Thus for $t \in [0,1]$ we have
\[ \|\varphi'(t) - \varphi'(0)\| = \|(Df(y + t(x - y)) - Df(y))\,(x - y)\| \le \|Df(y + t(x - y)) - Df(y)\|\,\|x - y\| \le \gamma t\,\|x - y\|^2 . \]
For
\[ \Delta := f(x) - f(y) - Df(y)(x - y) = \varphi(1) - \varphi(0) - \varphi'(0) = \int_0^1 (\varphi'(t) - \varphi'(0))\,dt \]
we also get
\[ \|\Delta\| \le \int_0^1 \|\varphi'(t) - \varphi'(0)\|\,dt \le \gamma\,\|x - y\|^2 \int_0^1 t\,dt = \frac{\gamma}{2}\,\|x - y\|^2 . \quad \Box \]








Now we are able to conclude local, quadratic convergence:

Theorem 8.11 Let $C \subset \mathbb{R}^m$ be convex, open and $f : C \to \mathbb{R}^m$ differentiable. For $x^{(0)} \in C$ let there exist $\alpha, \beta, \gamma > 0$ such that
\[ h := \alpha\beta\gamma/2 < 1 , \quad r := \alpha/(1 - h) , \quad \bar{B}_r(x^{(0)}) \subset C . \]
Furthermore, we require:

(i) $Df$ is Lipschitz continuous on $C_0 = B_{r+\varepsilon}(x^{(0)})$ for some $\varepsilon > 0$ with constant $\gamma$ in the sense of (8.14).

(ii) For all $x \in B_r(x^{(0)})$ there exists $Df(x)^{-1}$ and $\|Df(x)^{-1}\| \le \beta$.

(iii) $\|Df(x^{(0)})^{-1} f(x^{(0)})\| \le \alpha$.

Then:

(1) The Newton iteration
\[ x^{(k+1)} := x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)}) \]
is well-defined and $x^{(k)} \in B_r(x^{(0)})$ for all $k \in \mathbb{N}$.

(2) $x^{(k)} \to \bar{x}$ for $k \to \infty$ and $f(\bar{x}) = 0$.

(3) For $k \in \mathbb{N}$,
\[ \|x^{(k+1)} - \bar{x}\| \le \frac{\beta\gamma}{2}\,\|x^{(k)} - \bar{x}\|^2 \quad \text{and} \quad \|x^{(k)} - \bar{x}\| \le \alpha\,\frac{h^{2^k - 1}}{1 - h^{2^k}} . \]



Proof: (1): To show that $x^{(k+1)}$ is well-defined it is sufficient to verify
\[ x^{(k)} \in B_r(x^{(0)}) \ (\subset C) \quad \text{for all } k \in \mathbb{N} . \]
By induction we prove the extended proposition
\[ x^{(k)} \in B_r(x^{(0)}) \quad \text{and} \quad \|x^{(k)} - x^{(k-1)}\| \le \alpha\,h^{2^{k-1} - 1} \quad \text{for all } k \in \mathbb{N} . \tag{8.15} \]
The proposition (8.15) holds for $k = 1$, because according to (iii),
\[ \|x^{(1)} - x^{(0)}\| = \|Df(x^{(0)})^{-1} f(x^{(0)})\| \le \alpha < r . \]
Let (8.15) be valid for $l = 1, \ldots, k$. Then $x^{(k+1)}$ is well-defined, and by the application of the Newton iteration for $k - 1$ we get
\[ \|x^{(k+1)} - x^{(k)}\| = \|Df(x^{(k)})^{-1} f(x^{(k)})\| \le \beta\,\|f(x^{(k)})\| = \beta\,\|f(x^{(k)}) - f(x^{(k-1)}) - Df(x^{(k-1)})(x^{(k)} - x^{(k-1)})\| \le \frac{\beta\gamma}{2}\,\|x^{(k)} - x^{(k-1)}\|^2 \]
according to Lemma 8.10 with $C_0 = B_r(x^{(0)})$, and
\[ \frac{\beta\gamma}{2}\,\|x^{(k)} - x^{(k-1)}\|^2 \le \frac{\beta\gamma}{2}\,\alpha^2 h^{2^k - 2} = \alpha\,h^{2^k - 1} . \]
Thus the second part of (8.15) holds for $k + 1$, and also
\[ \|x^{(k+1)} - x^{(0)}\| \le \|x^{(k+1)} - x^{(k)}\| + \|x^{(k)} - x^{(k-1)}\| + \cdots + \|x^{(1)} - x^{(0)}\| \le \alpha\left( h^{2^k - 1} + h^{2^{k-1} - 1} + \cdots + h^7 + h^3 + h + 1 \right) < \alpha/(1 - h) = r . \]
Hence (8.15) holds for $k + 1$.

(2): Using (8.15) we are able to verify that $(x^{(k)})_k$ is a Cauchy sequence, because for $l \ge k$ we have
\[ \|x^{(l+1)} - x^{(k)}\| \le \|x^{(l+1)} - x^{(l)}\| + \|x^{(l)} - x^{(l-1)}\| + \cdots + \|x^{(k+1)} - x^{(k)}\| \le \alpha\,h^{2^k - 1}\left(1 + h^{2^k} + h^{3 \cdot 2^k} + \cdots\right) < \alpha\,\frac{h^{2^k - 1}}{1 - h^{2^k}} \to 0 \quad \text{for } k \to \infty , \tag{8.16} \]
since $h < 1$. Hence there exists $\bar{x} = \lim_{k\to\infty} x^{(k)}$ and $\bar{x} \in \bar{B}_r(x^{(0)})$. Furthermore, $f(\bar{x}) = 0$, because we can conclude from $x^{(k)} \in B_r(x^{(0)})$ that
\[ \|Df(x^{(k)}) - Df(x^{(0)})\| \le \gamma\,\|x^{(k)} - x^{(0)}\| < \gamma r ; \]
thus
\[ \|Df(x^{(k)})\| \le \gamma r + \|Df(x^{(0)})\| =: K , \]
and from $f(x^{(k)}) = -Df(x^{(k)})(x^{(k+1)} - x^{(k)})$, we obtain
\[ \|f(x^{(k)})\| \le K\,\|x^{(k+1)} - x^{(k)}\| \to 0 \]
for $k \to \infty$. Thus we also have
\[ f(\bar{x}) = \lim_{k\to\infty} f(x^{(k)}) = 0 . \]



(3): With $l \to \infty$ in (8.16) we can prove the second part of (3); the first part follows from
\[ x^{(k+1)} - \bar{x} = x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)}) - \bar{x} = x^{(k)} - \bar{x} - Df(x^{(k)})^{-1}\left(f(x^{(k)}) - f(\bar{x})\right) = Df(x^{(k)})^{-1}\left(f(\bar{x}) - f(x^{(k)}) - Df(x^{(k)})(\bar{x} - x^{(k)})\right) , \]
which implies, according to Lemma 8.10 with $C_0 = B_{r+\varepsilon}(x^{(0)}) \subset C$,
\[ \|x^{(k+1)} - \bar{x}\| \le \beta\,\frac{\gamma}{2}\,\|x^{(k)} - \bar{x}\|^2 . \quad \Box \]










The termination criterion (5.15), which is oriented at the residual, may also be used for the nonlinear problem (and not just for the Newton iteration). This can be deduced in analogy to (5.16):

Theorem 8.12 Let the following be valid:
\[ \text{There exists a zero } \bar{x} \text{ of } f \text{ such that } Df(\bar{x}) \text{ is nonsingular and } Df \text{ is Lipschitz continuous in an open neighbourhood } C \text{ of } \bar{x} . \tag{8.17} \]
Then for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for $x, y \in B_\delta(\bar{x})$,
\[ \|x - \bar{x}\| \le (1 + \varepsilon)\,\kappa(Df(\bar{x}))\,\frac{\|f(x)\|}{\|f(y)\|}\,\|y - \bar{x}\| . \]

Proof: See [22, p. 69, p. 72] and Exercise 8.4. $\Box$

Here $\kappa$ is the condition number in a matrix norm that is compatible with the chosen vector norm. For $x = x^{(k)}$ and $y = x^{(0)}$ we get (locally) the generalization of (5.16).



8.2.2 Modifications of Newton's Method

Modifications of Newton's method aim in two directions:

• Reduction of the cost of the assembling and the solution of the system of equations (8.12) (without a significant deterioration of the convergence properties).

• Enlargement of the range of convergence.

We can account for the first aspect by simplifying the matrix in (8.12) (modified or simplified Newton's method). The extreme case is the replacement of $Df(x^{(k)})$ by the identity matrix; this leads us to the fixed-point iteration (8.9). If the mapping $f$ consists of a nonlinear and a linear part,
\[ f(x) := Ax + g(x) = 0 , \tag{8.18} \]
then the system of equations (8.12) of the Newton iteration reads as
\[ \left( A + Dg(x^{(k)}) \right) \delta^{(k)} = -f(x^{(k)}) . \]
A straightforward simplification in this case is the fixed-point iteration
\[ A\,\delta^{(k)} = -f(x^{(k)}) . \tag{8.19} \]
It may be interpreted as the fixed-point iteration (8.9) of the system that is preconditioned with $A$, i.e., of
\[ A^{-1} f(x) = 0 . \]

In (8.19) the matrix is identical in every iteration step; therefore, it has to be assembled only once, and if we use a direct method (cf. Section 2.5), the LU factorization has to be carried out only once. Thus in each subsequent step only the forward and backward substitutions, which have a lower computational cost, remain to be performed. For iterative methods we cannot rely on this advantage, but we can expect that $x^{(k+1)}$ is close to $x^{(k)}$, and consequently $\delta^{(k,0)} = 0$ constitutes a good initial guess. Accordingly, the assembling of the matrix becomes more important with respect to the overall computational cost, and savings during the assembling become relevant.
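As an illustration of this reuse (a sketch assuming SciPy's dense LU routines and the splitting (8.18); for the chord method (8.20) below, $A$ would be replaced by $Df(x^{(0)})$):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def simplified_newton(A, g, x0, tol=1e-10, maxit=100):
    """Fixed-point iteration (8.19): A delta = -f(x_k) with f(x) = A x + g(x).

    The LU factorization of A is computed once; afterwards each step costs
    only one evaluation of f plus forward and backward substitution.
    """
    lu_piv = lu_factor(A)                # factor once
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        fx = A @ x + g(x)
        if np.linalg.norm(fx) <= tol:
            break
        x = x + lu_solve(lu_piv, -fx)    # two triangular solves only
    return x
```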

We get a system of equations similar to (8.19) by applying the chord method (see Exercise 8.3), where the linear approximation at the initial iterate is maintained, i.e.,
\[ Df(x^{(0)})\,\delta^{(k)} = -f(x^{(k)}) . \tag{8.20} \]



If the matrix $B(x^{(k)})$, which approximates $Df(x^{(k)})$, changes in each iteration step, i.e.,
\[ B(x^{(k)})\,\delta^{(k)} = -f(x^{(k)}) , \tag{8.21} \]
then the only advantage can be a possibly easier assembling or solvability of the system of equations. If the partial derivatives $\partial_j f_i(x)$ are more difficult to evaluate than the function $f_i$ itself (or possibly not evaluable at all), then the approximation of $Df(x^{(k)})$ by difference quotients can be taken into consideration. This corresponds to
\[ B(x^{(k)})\,e_j = \frac{1}{h}\left( f(x^{(k)} + h e_j) - f(x^{(k)}) \right) \tag{8.22} \]
for column $j$ of $B(x^{(k)})$ with a fixed $h > 0$.

The number of computations for the assembling of the matrix remains the same: $m^2$ for the full matrix and analogously for the sparse matrix (see Exercise 8.6). Observe that numerical differentiation is an ill-posed problem, which means that we should ideally choose $h \sim \delta^{1/2}$, where $\delta > 0$ is the error level in the evaluation of $f$. Even then we can merely expect
\[ \left\| Df(x^{(k)}) - B(x^{(k)}) \right\| \le C\,\delta^{1/2} \]
(see [22, pp. 80 f.]). Thus in the best case we can expect only half of the significant digits of the machine precision.
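A column-by-column realization of (8.22) might look as follows (our own sketch; the default step size follows the rule $h \sim \delta^{1/2}$ discussed above, with $\delta$ taken as the machine epsilon, i.e. assuming $f$ is evaluated to machine precision):

```python
import numpy as np

def fd_jacobian(f, x, h=None):
    """Approximate Df(x) column by column via (8.22):
    B e_j = (f(x + h*e_j) - f(x)) / h."""
    x = np.asarray(x, dtype=float)
    if h is None:
        # h ~ delta^(1/2), with delta = machine epsilon, assuming f is
        # evaluated to full machine precision
        h = np.sqrt(np.finfo(float).eps)
    fx = f(x)
    B = np.empty((fx.size, x.size))
    for j in range(x.size):
        xh = x.copy()
        xh[j] += h                    # perturb the j-th coordinate
        B[:, j] = (f(xh) - fx) / h    # column j of B
    return B
```

This costs $m + 1$ evaluations of $f$, i.e. of the order of $m^2$ scalar function evaluations for a full Jacobian, in accordance with the count above.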

The second aspect of facilitated solvability of (8.21) occurs if there appear "small" entries in the Jacobian, due to a problem-dependent weak coupling of the components, and these entries may be skipped. Take, for example, a $Df(x^{(k)})$ with a block structure as in (5.39):

\[ Df(x^{(k)}) = \left( A_{ij} \right)_{ij} , \quad A_{ij} \in \mathbb{R}^{m_i, m_j} , \]
such that the blocks $A_{ij}$ may be neglected for $j > i$. Then there results a nested system of equations of the dimensions $m_1, m_2, \ldots, m_p$.

The possible advantages of such simplified Newton's methods have to be weighed against the disadvantage of a deterioration in the order of convergence: Instead of an estimate like that in Theorem 8.11, (3), we have to expect an additional term
\[ \left\| B(x^{(k)}) - Df(x^{(k)}) \right\| \, \left\| x^{(k)} - \bar{x} \right\| . \]



This means only linear or — by successive improvement of the approximation — superlinear convergence (see [22, pp. 75 ff.]). If we have a good

initial iterate, it may often be advantageous to perform a small number of

steps of Newton’s method. So in the following we will treat again Newton’s

method, although the subsequent considerations can also be transferred to

its modifications.

If the linear problems (8.12) are solved with an iterative scheme, we have the possibility to adjust the accuracy of the algorithm in order to reduce the number of inner iterations, without a (severe) deterioration of the convergence of the outer Newton iteration. Dealing with such inexact Newton's methods, we determine instead of $\delta^{(k)}$ from (8.12) only $\tilde{\delta}^{(k)}$, which fulfils (8.12) only up to an inner residual $r^{(k)}$, i.e.,
\[ Df(x^{(k)})\,\tilde{\delta}^{(k)} = -f(x^{(k)}) + r^{(k)} . \]
The new iterate is given by
\[ x^{(k+1)} := x^{(k)} + \tilde{\delta}^{(k)} . \]
The accuracy of $\tilde{\delta}^{(k)}$ is controlled by the requirement
\[ \|r^{(k)}\| \le \eta_k\,\|f(x^{(k)})\| \tag{8.23} \]
with certain properties for the sequence $(\eta_k)_k$ that still have to be determined. Since the natural choice of the initial iterate for solving (8.12) is $\delta^{(k,0)} = 0$, (8.23) corresponds to the termination criterion (5.15).

Conditions for $\eta_k$ can be deduced from the following theorem:

Theorem 8.13 Let (8.17) hold and consider compatible matrix and vector norms. Then for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for $x^{(k)} \in B_\delta(\bar{x})$,
\[ \|x^{(k+1)} - \bar{x}\| \le \left\| x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)}) - \bar{x} \right\| + (1 + \varepsilon)\,\kappa(Df(\bar{x}))\,\eta_k\,\|x^{(k)} - \bar{x}\| . \tag{8.24} \]



Proof: By the choice of $\delta$ we can ensure the nonsingularity of $Df(x^{(k)})$. From
\[ \tilde{\delta}^{(k)} = -Df(x^{(k)})^{-1} f(x^{(k)}) + Df(x^{(k)})^{-1} r^{(k)} \]
it follows that
\[ \|x^{(k+1)} - \bar{x}\| = \|x^{(k)} - \bar{x} + \tilde{\delta}^{(k)}\| \le \left\| x^{(k)} - \bar{x} - Df(x^{(k)})^{-1} f(x^{(k)}) \right\| + \left\| Df(x^{(k)})^{-1} r^{(k)} \right\| . \]
The assertion can be deduced from the estimate
\[ \left\| Df(x^{(k)})^{-1} r^{(k)} \right\| \le (1 + \varepsilon)^{1/2} \left\| Df(\bar{x})^{-1} \right\| \|r^{(k)}\| \le (1 + \varepsilon)^{1/2} \left\| Df(\bar{x})^{-1} \right\| \eta_k\,(1 + \varepsilon)^{1/2} \left\| Df(\bar{x}) \right\| \|x^{(k)} - \bar{x}\| . \]
Here we used Exercise 8.4 (2), (3), and (8.23). $\Box$



The first term in (8.24) corresponds to the error of the exact Newton step, which can be estimated, using the same argument as in Theorem 8.11, (3) (with Exercise 8.4, (2)), by
\[ \left\| x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)}) - \bar{x} \right\| \le (1 + \varepsilon)^{1/2} \left\| Df(\bar{x})^{-1} \right\| \frac{\gamma}{2}\,\|x^{(k)} - \bar{x}\|^2 . \]



This implies the following result:

Corollary 8.14 Let the assumptions of Theorem 8.13 be satisfied. Then there exist $\delta > 0$ and $\bar{\eta} > 0$ such that for $x^{(0)} \in B_\delta(\bar{x})$ and $\eta_k \le \bar{\eta}$ for all $k \in \mathbb{N}$ the following hold for the inexact Newton's method:

(1) The sequence $(x^{(k)})_k$ converges linearly to $\bar{x}$.

(2) If $\eta_k \to 0$ for $k \to \infty$, then $(x^{(k)})_k$ converges superlinearly.

(3) If $\eta_k \le K \|f(x^{(k)})\|$ for a $K > 0$, then $(x^{(k)})_k$ converges quadratically.

Proof: Exercise 8.5. $\Box$



The estimate (8.24) suggests that we have to choose a very fine level of accuracy $\bar{\eta}$ for the inner iteration in order to guarantee the above convergence statements. This is particularly true for an ill-conditioned $Df(\bar{x})$ (which is common for discretization matrices: see (5.34)). In fact, the analysis in the weighted norm $\|x\|_* := \|Df(\bar{x})\,x\|$ shows that only $\eta_k \le \bar{\eta} < 1$ has to be ensured (cf. [22, pp. 97 ff.]). With this and on the basis of
\[ \tilde{\eta}_k = \alpha\,\|f(x^{(k)})\|^2 / \|f(x^{(k-1)})\|^2 \]
for some $\alpha \le 1$ we can construct $\eta_k$ in an adaptive way (see [22, p. 105]).

Most of the iterative methods introduced in Chapter 5 do not require the explicit knowledge of the matrix $Df(x^{(k)})$. It suffices that the operation $Df(x^{(k)})\,y$ be feasible for vectors $y$, in general for fewer than $m$ of them; i.e., the directional derivative of $f$ at $x^{(k)}$ in direction $y$ is needed. Thus in case a difference scheme for the derivatives of $f$ should be necessary or reasonable, it is more convenient to choose directly a difference scheme for the directional derivative.
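As a sketch (with the same caveats on the step size $h$ as for (8.22)), a matrix-free approximation of $Df(x)\,y$ takes a single extra evaluation of $f$ per direction $y$:

```python
import numpy as np

def jacvec(f, x, y, h=None):
    """Approximate the directional derivative Df(x) y by one difference
    quotient, without ever assembling the Jacobian."""
    if h is None:
        h = np.sqrt(np.finfo(float).eps)   # cf. the choice h ~ delta^(1/2) above
    return (f(x + h * y) - f(x)) / h
```

Wrapped as a linear operator, such a routine can be handed directly to a Krylov solver, which only needs matrix-vector products.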

Since we cannot expect convergence of Newton's method in general, we require indicators for the convergence behaviour of the iteration. The solution $\bar{x}$ is in particular also the solution of
\[ \text{Minimize} \quad \|f(x)\|^2 \quad \text{for } x \in \mathbb{R}^m . \]






Let $x^{(0)}$, $\tau > 0$, $\eta_0$, $\bar{\Theta} \in (0, 1)$, $k = 0$, $i = 0$ be given.

(1) $\tilde{\delta}^{(k,0)} := 0$, $i := 1$.

(2) Determine the $i$th iterate $\tilde{\delta}^{(k,i)}$ for $Df(x^{(k)})\,\tilde{\delta}^{(k)} = -f(x^{(k)})$ and calculate
    $r^{(i)} := Df(x^{(k)})\,\tilde{\delta}^{(k,i)} + f(x^{(k)})$.

(3) If $\|r^{(i)}\| \le \eta_k \|f(x^{(k)})\|$, then go to (4); else set $i := i + 1$ and go to (2).

(4) $\tilde{\delta}^{(k)} := \tilde{\delta}^{(k,i)}$.

(5) $x^{(k+1)} := x^{(k)} + \tilde{\delta}^{(k)}$.

(6) If $\|f(x^{(k+1)})\| > \bar{\Theta} \|f(x^{(k)})\|$, interrupt.

(7) If $\|f(x^{(k+1)})\| \le \tau \|f(x^{(0)})\|$, end. Else calculate $\eta_{k+1}$, set $k := k + 1$, and go to (1).

Table 8.1. Inexact Newton's method with monotonicity test.



Thus we could expect a descent of the sequence of iterates $(x^{(k)})_k$ in this functional, i.e.,
\[ \|f(x^{(k+1)})\| \le \bar{\Theta}\,\|f(x^{(k)})\| \quad \text{for a } \bar{\Theta} < 1 . \]
If this monotonicity test is not fulfilled, the iteration is terminated. Such an example of an inexact Newton's method is given in Table 8.1.
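Table 8.1 translates almost line by line into code. The following is a sketch, not a definitive implementation: `solve_inexact` is a hypothetical stand-in for the inner iterative solver of steps (1)-(4) (e.g. a Krylov method started from $\tilde{\delta}^{(k,0)} = 0$ and run until the relative residual is at most $\eta_k$), and $\eta_k$ is kept fixed for simplicity instead of being updated adaptively:

```python
import numpy as np

def inexact_newton(f, Df, solve_inexact, x0, tau=1e-8, eta=1e-2,
                   theta=0.9, maxit=50):
    """Inexact Newton's method with monotonicity test, following Table 8.1.

    solve_inexact(J, b, eta): placeholder for an inner iterative solver
    started from 0 that returns delta with ||J @ delta - b|| <= eta * ||b||,
    i.e. steps (1)-(4); eta plays the role of eta_k, here held constant.
    """
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    f0 = np.linalg.norm(fx)
    if f0 == 0.0:
        return x
    for _ in range(maxit):
        delta = solve_inexact(Df(x), -fx, eta)              # cf. (8.23)
        x_next = x + delta                                  # step (5)
        fx_next = f(x_next)
        if np.linalg.norm(fx_next) > theta * np.linalg.norm(fx):
            raise RuntimeError("monotonicity test failed")  # step (6): interrupt
        x, fx = x_next, fx_next
        if np.linalg.norm(fx) <= tau * f0:                  # step (7): converged
            return x
    return x
```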

In order to avoid the termination of the method due to divergence, continuation methods have been developed. They embed the problem $f(x) = 0$ into a family of problems in order to provide successively good initial iterates. The approach presented at the end of Section 8.3 for time-dependent problems is similar to the continuation methods. Another approach (which can be combined with the continuation approach) modifies the (inexact) Newton's method so that the range of convergence is enlarged: Applying the damped (inexact) Newton's method means reducing the step length from $x^{(k)}$ to $x^{(k+1)}$ as long as we observe a decrease conformable to the monotonicity test. One strategy of damping, termed Armijo's rule, is described in Table 8.2 and replaces the steps (1), (5), and (6) in Table 8.1.

Thus damping Newton's method also amounts to a relaxation similar to (5.30), where $\omega = \lambda_k$ is adjusted to the iteration step as in (5.41). In the formulation of Table 8.2 the iteration may fail to terminate if $\lambda_k$ is reduced indefinitely in (5). This must be avoided in a practical implementation of the method. But except for situations where divergence is obvious, this situation will not appear, because we have the following theorem:






Let additionally $\alpha, \beta \in (0, 1)$ be given.

(1) $\tilde{\delta}^{(k,0)} := 0$, $i := 1$, $\lambda_k := 1$.

(5) If $\|f(x^{(k)} + \lambda_k \tilde{\delta}^{(k)})\| \ge (1 - \alpha\lambda_k)\,\|f(x^{(k)})\|$, set $\lambda_k := \beta\lambda_k$ and go to (5).

(6) $x^{(k+1)} := x^{(k)} + \lambda_k \tilde{\delta}^{(k)}$.

Table 8.2. Damped inexact Newton step according to Armijo's rule.
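The damping loop can be sketched as follows (our own transcription of steps (5), (6); the cap `max_reductions` is the practical safeguard mentioned above, since the loop in step (5) may otherwise not terminate):

```python
import numpy as np

def armijo_step(f, x, delta, alpha=1e-4, beta=0.5, max_reductions=30):
    """One damped update per Armijo's rule (Table 8.2, steps (5), (6)):
    reduce lam until ||f(x + lam*delta)|| < (1 - alpha*lam) * ||f(x)||."""
    norm_fx = np.linalg.norm(f(x))
    lam = 1.0
    for _ in range(max_reductions):
        x_next = x + lam * delta
        if np.linalg.norm(f(x_next)) < (1.0 - alpha * lam) * norm_fx:
            return x_next, lam      # step (6): accept the damped step
        lam *= beta                 # step (5): damp and test again
    raise RuntimeError("step length reduced too often; stopping the iteration")
```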



Theorem 8.15 Let $\alpha, \beta, \gamma > 0$ exist such that conditions (i), (ii) of Theorem 8.11 hold on $\bigcup_{k\in\mathbb{N}} B_r(x^{(k)})$ for the sequence $(x^{(k)})_k$ defined according to Table 8.2. Let $\eta_k \le \bar{\eta}$ for an $\bar{\eta} < 1 - \alpha$.
Then if $f(x^{(0)}) \neq 0$, there exists a $\bar{\lambda} > 0$ such that $\lambda_k \ge \bar{\lambda}$ for all $k \in \mathbb{N}$. If furthermore $(x^{(k)})_k$ is bounded, then there exists a zero $\bar{x}$ satisfying (8.17), and
\[ x^{(k)} \to \bar{x} \quad \text{for } k \to \infty . \]
There exists a $k_0 \in \mathbb{N}$ such that for $k \ge k_0$ the relation $\lambda_k = 1$ holds.

Proof: See [22, pp. 139 ff.]. $\Box$







We see that in the final stage of the iteration we again deal with the

(inexact) Newton’s method with the previously described behaviour of

convergence.

Finally, the following should be mentioned: The problem $f(x) = 0$ and Newton's method are affine-invariant in the sense that a transition to $Af(x) = 0$ with a nonsingular $A \in \mathbb{R}^{m,m}$ changes neither the problem nor the iteration method, since
\[ D(Af)(x)^{-1} (Af)(x) = Df(x)^{-1} f(x) . \]
Among the assumptions of Theorem 8.11, (8.14) is not affine-invariant. A possible alternative would be
\[ \left\| Df(y)^{-1} \left( Df(x) - Df(y) \right) \right\| \le \gamma\,\|x - y\| , \]
which fulfils the requirement. With the proof of Lemma 8.10 it follows that
\[ \left\| Df(y)^{-1} \left( f(x) - f(y) - Df(y)(x - y) \right) \right\| \le \frac{\gamma}{2}\,\|x - y\|^2 . \]
With this argument a similar variant of Theorem 8.11 can be proven.






The monotonicity test is not affine-invariant either, so probably the natural monotonicity test
\[ \left\| Df(x^{(k)})^{-1} f(x^{(k+1)}) \right\| \le \bar{\Theta} \left\| Df(x^{(k)})^{-1} f(x^{(k)}) \right\| \]
has to be preferred. The vector on the right-hand side has already been calculated, being, except for the sign, the Newton correction $\delta^{(k)}$. But for the vector on the left-hand side, $-\bar{\delta}^{(k+1)}$, the system of equations
\[ Df(x^{(k)})\,\bar{\delta}^{(k+1)} = -f(x^{(k+1)}) \]
additionally has to be solved.
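If the LU factorization of $Df(x^{(k)})$ from the Newton step is kept, this additional solve costs only one more pair of triangular substitutions. A sketch, assuming SciPy's LU routines and our own (hypothetical) helper name:

```python
import numpy as np
from scipy.linalg import lu_solve

def natural_monotonicity_ok(lu_piv, delta, f_next, theta=0.9):
    """Affine-invariant (natural) monotonicity test:
    ||Df(x_k)^{-1} f(x_{k+1})|| <= theta * ||delta||, where
    delta = -Df(x_k)^{-1} f(x_k) is the Newton correction already at hand
    and lu_piv is the LU factorization of Df(x_k) kept from the Newton step."""
    delta_bar = lu_solve(lu_piv, -f_next)   # solve Df(x_k) dbar = -f(x_{k+1})
    return np.linalg.norm(delta_bar) <= theta * np.linalg.norm(delta)
```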



Exercises

8.3 Consider the chord method as described in (8.20). Prove the convergence of this method to the solution $\bar{x}$ under the following assumptions:

(1) Let (8.17) with $\bar{B}_r(\bar{x}) \subset C$ hold,

(2) $\|Df(x^{(0)})^{-1}\| \le \beta$,

(3) $2\beta\gamma r < 1$,

(4) $x^{(0)} \in \bar{B}_r(\bar{x})$.

8.4 Let assumption (8.17) hold. Prove for compatible matrix and vector norms that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for every $x \in B_\delta(\bar{x})$:

(1) $\|Df(x)\| \le (1 + \varepsilon)^{1/2} \|Df(\bar{x})\|$,

(2) $\|Df(x)^{-1}\| \le (1 + \varepsilon)^{1/2} \|Df(\bar{x})^{-1}\|$ (employ $\|(I - M)^{-1}\| \le 1/(1 - \|M\|)$ for $\|M\| < 1$),

(3) $(1 + \varepsilon)^{-1/2} \|Df(\bar{x})^{-1}\|^{-1} \|x - \bar{x}\| \le \|f(x)\| \le (1 + \varepsilon)^{1/2} \|Df(\bar{x})\| \, \|x - \bar{x}\|$,

(4) Theorem 8.12.

8.5 Prove Corollary 8.14.

8.6 Let $U \subset \mathbb{R}^m$ be open and convex. Consider problem (8.2) with continuously differentiable $f : U \to \mathbb{R}^m$. For $i = 1, \ldots, m$ let $J_i \subset \{1, \ldots, m\}$ be defined by
\[ \partial_j f_i(x) = 0 \quad \text{for } j \notin J_i \text{ and every } x \in U . \]



Tài liệu bạn tìm kiếm đã sẵn sàng tải về

2 Newton's Method and Its Variants

Tải bản đầy đủ ngay(0 tr)

×