8.2 Newton's Method and Its Variants
• Replace f by a simple function g that approximates f near x(0) and
whose zero is to be determined.
• Find x(1) as the solution of g(x) = 0.
Newton’s method needs the diﬀerentiability of f , and one chooses the
approximating aﬃne-linear function given by Df (x(0) ), i.e.,
g(x) = f x(0) + Df x(0) x − x(0) .
Under the assumption that Df (x(0) ) is nonsingular, the new iterate x(1) is
determined by solving the system of linear equations
Df x(0) x(1) − x(0) = −f x(0) ,
(8.10)
or formally by
x(1) := x(0) − Df x(0)
−1
f x(0) .
This suggests the following definition:

Φ(f)(x) = x − Df(x)^(−1) f(x).   (8.11)

Here Φ is well-defined only if Df(x) is nonsingular. Then x ∈ R^m is a zero of f if and only if x is a fixed point of Φ. When executing the iteration, we do not calculate Df(x^(k))^(−1) explicitly, but only solve a system of equations analogous to (8.10).
Thus, the kth iteration of Newton's method reads as follows: Solve

Df(x^(k)) δ^(k) = −f(x^(k))   (8.12)

and set

x^(k+1) := x^(k) + δ^(k).   (8.13)
Equation (8.13) has the same form as (5.10) with W = Df(x^(k)) and with the residual at x^(k),

d^(k) := f(x^(k)).
Thus the subproblem of the kth iteration is easier in the sense that it consists of a system of linear equations (with the same structure of dependence
as f ; see Exercise 8.6). In the same sense the system of equations (5.10)
in the case of linear stationary methods is “easier” to solve than the original problem of the same type. Furthermore, W is in general diﬀerent for
diﬀerent k.
An application of (8.12), (8.13) to Ax = b, i.e., Df(x) = A for all x ∈ R^m, results in (5.10) with W = A, a method converging in one step, which just reformulates the original problem:

A (x − x^(0)) = −(Ax^(0) − b).
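As an illustrative sketch (not part of the original text), steps (8.12) and (8.13) translate directly into code; the 2×2 test problem, tolerance, and iteration cap below are arbitrary choices:

```python
import numpy as np

def newton(f, Df, x0, tol=1e-12, maxit=50):
    """Newton's method: solve Df(x^(k)) delta = -f(x^(k)) as in (8.12),
    then update x^(k+1) = x^(k) + delta as in (8.13)."""
    x = np.asarray(x0, dtype=float)
    for k in range(maxit):
        fx = f(x)
        if np.linalg.norm(fx) <= tol:
            return x, k
        delta = np.linalg.solve(Df(x), -fx)  # linear subproblem (8.12)
        x = x + delta                        # update (8.13)
    return x, maxit

# Illustrative test problem with solution x = (1, 1):
f  = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
Df = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

xbar, iters = newton(f, Df, x0=[2.0, 0.5])
```

Note that the Jacobian is never inverted explicitly; each step solves one linear system, exactly as described above.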
8. Iterative Methods for Nonlinear Equations
The region of convergence of the iteration may be very small, as can be shown already by one-dimensional examples. But in this neighbourhood of the solution we have, e.g., for m = 1, the following:

Corollary 8.9 Let f ∈ C³(R) and let x̄ be a simple zero of f (i.e., f′(x̄) ≠ 0). Then Newton's method converges locally to x̄, of order at least 2.

Proof: There exists an open neighbourhood V of x̄ such that f′(x) ≠ 0 for all x ∈ V; i.e., Φ is well-defined by (8.11), continuous on V, and x̄ is a fixed point of Φ. According to Corollary 8.8 it suffices to show that Φ′(x̄) = 0:

Φ′(x) = 1 − ( f′(x)² − f(x) f″(x) ) / f′(x)² = f(x) f″(x) / f′(x)² = 0   for x = x̄,

and Φ″ exists and is continuous, because f ∈ C³(R).   ✷
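The quadratic order asserted in Corollary 8.9 is easy to observe numerically; in this sketch (f(x) = x² − 2 and the starting point are arbitrary choices) the ratios e_(k+1)/e_k² stay bounded:

```python
import math

f  = lambda x: x * x - 2.0     # simple zero at sqrt(2); f'(sqrt(2)) != 0
df = lambda x: 2.0 * x

x = 2.0
errors = []
for _ in range(5):
    x = x - f(x) / df(x)       # one step of Phi from (8.11)
    errors.append(abs(x - math.sqrt(2.0)))

# order at least 2: e_{k+1} / e_k^2 remains bounded
# (here it tends to |f''| / (2 |f'|) at the zero, about 0.354)
ratios = [errors[k + 1] / errors[k]**2 for k in range(3)]
```

After four steps the error is already near machine precision; the number of correct digits roughly doubles per step.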
In the following we want to develop a general local convergence theorem for Newton's method (according to L.V. Kantorovich). It requires only the Lipschitz continuity of Df and also ensures the existence of a zero. Here we always suppose a fixed norm on R^m and consider a compatible norm on R^{m,m}. As a prerequisite we need the following lemma:
Lemma 8.10 Let C₀ ⊂ R^m be convex, open, f : C₀ → R^m differentiable, and suppose there exists γ > 0 such that

‖Df(x) − Df(y)‖ ≤ γ ‖x − y‖   for all x, y ∈ C₀.   (8.14)

Then for all x, y ∈ C₀,

‖f(x) − f(y) − Df(y)(x − y)‖ ≤ (γ/2) ‖x − y‖².
Proof: Let φ : [0, 1] → R^m be defined by φ(t) := f(y + t(x − y)), for arbitrary, fixed x, y ∈ C₀. Then φ is differentiable on [0, 1] and

φ′(t) = Df(y + t(x − y)) (x − y).

Thus for t ∈ [0, 1] we have

‖φ′(t) − φ′(0)‖ = ‖(Df(y + t(x − y)) − Df(y)) (x − y)‖ ≤ ‖Df(y + t(x − y)) − Df(y)‖ ‖x − y‖ ≤ γ t ‖x − y‖².

For

Δ := f(x) − f(y) − Df(y)(x − y) = φ(1) − φ(0) − φ′(0) = ∫₀¹ (φ′(t) − φ′(0)) dt

we also get

‖Δ‖ ≤ ∫₀¹ ‖φ′(t) − φ′(0)‖ dt ≤ γ ‖x − y‖² ∫₀¹ t dt = (γ/2) ‖x − y‖².   ✷
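The bound of Lemma 8.10 can be spot-checked numerically; this sketch uses the arbitrary choice f(x) = (x₁², x₂²) on R², whose Jacobian is Lipschitz with γ = 2 in the Euclidean norm:

```python
import numpy as np

f  = lambda x: x**2               # componentwise square, f: R^2 -> R^2
Df = lambda x: np.diag(2.0 * x)   # Jacobian; Lipschitz with gamma = 2
gamma = 2.0

rng = np.random.default_rng(0)
worst = 0.0
for _ in range(1000):
    x, y = rng.uniform(-1.0, 1.0, 2), rng.uniform(-1.0, 1.0, 2)
    lhs = np.linalg.norm(f(x) - f(y) - Df(y) @ (x - y))
    rhs = 0.5 * gamma * np.linalg.norm(x - y)**2
    worst = max(worst, lhs / rhs)
```

For this f the linearization error equals ((x₁−y₁)², (x₂−y₂)²), so the ratio lhs/rhs never exceeds 1, in agreement with the lemma.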
Now we are able to conclude local, quadratic convergence:

Theorem 8.11 Let C ⊂ R^m be convex, open, and f : C → R^m differentiable. For x^(0) ∈ C let there exist α, β, γ > 0 such that

h := αβγ/2 < 1,   r := α/(1 − h),   B̄_r(x^(0)) ⊂ C.

Furthermore, we require:

(i) Df is Lipschitz continuous on C₀ = B_{r+ε}(x^(0)) for some ε > 0, with constant γ in the sense of (8.14).

(ii) For all x ∈ B_r(x^(0)), Df(x)^(−1) exists and ‖Df(x)^(−1)‖ ≤ β.

(iii) ‖Df(x^(0))^(−1) f(x^(0))‖ ≤ α.
Then:

(1) The Newton iteration

x^(k+1) := x^(k) − Df(x^(k))^(−1) f(x^(k))

is well-defined and x^(k) ∈ B_r(x^(0)) for all k ∈ N.

(2) x^(k) → x̄ for k → ∞ and f(x̄) = 0.

(3) ‖x^(k+1) − x̄‖ ≤ (βγ/2) ‖x^(k) − x̄‖²   and   ‖x^(k) − x̄‖ ≤ α h^(2^k − 1) / (1 − h^(2^k))   for k ∈ N.
Proof: (1): To show that x^(k+1) is well-defined it is sufficient to verify

x^(k) ∈ B_r(x^(0)) (⊂ C)   for all k ∈ N.

By induction we prove the extended proposition

x^(k) ∈ B_r(x^(0))   and   ‖x^(k) − x^(k−1)‖ ≤ α h^(2^(k−1) − 1)   for all k ∈ N.   (8.15)

The proposition (8.15) holds for k = 1, because according to (iii),

‖x^(1) − x^(0)‖ = ‖Df(x^(0))^(−1) f(x^(0))‖ ≤ α < r.

Let (8.15) be valid for l = 1, ..., k. Then x^(k+1) is well-defined, and by the application of the Newton iteration for k − 1 we get

‖x^(k+1) − x^(k)‖ = ‖Df(x^(k))^(−1) f(x^(k))‖ ≤ β ‖f(x^(k))‖
= β ‖f(x^(k)) − f(x^(k−1)) − Df(x^(k−1)) (x^(k) − x^(k−1))‖
≤ (βγ/2) ‖x^(k) − x^(k−1)‖²
according to Lemma 8.10 with C₀ = B_r(x^(0)), and

(βγ/2) ‖x^(k) − x^(k−1)‖² ≤ (βγ/2) α² h^(2^k − 2) = α h^(2^k − 1).

Thus the second part of (8.15) holds for k + 1, and also

‖x^(k+1) − x^(0)‖ ≤ ‖x^(k+1) − x^(k)‖ + ‖x^(k) − x^(k−1)‖ + ··· + ‖x^(1) − x^(0)‖
≤ α ( h^(2^k − 1) + h^(2^(k−1) − 1) + ··· + h⁷ + h³ + h + 1 ) < α/(1 − h) = r.

Hence (8.15) holds for k + 1.
(2): Using (8.15) we are able to verify that (x^(k))_k is a Cauchy sequence, because for l ≥ k we have

‖x^(l+1) − x^(k)‖ ≤ ‖x^(l+1) − x^(l)‖ + ‖x^(l) − x^(l−1)‖ + ··· + ‖x^(k+1) − x^(k)‖
≤ α h^(2^k − 1) ( 1 + h^(2^k) + h^(3·2^k) + ··· )   (8.16)
< α h^(2^k − 1) / (1 − h^(2^k)) → 0   for k → ∞,

since h < 1. Hence there exists x̄ = lim_{k→∞} x^(k) and x̄ ∈ B̄_r(x^(0)).
Furthermore, f(x̄) = 0, because we can conclude from x^(k) ∈ B_r(x^(0)) that

‖Df(x^(k)) − Df(x^(0))‖ ≤ γ ‖x^(k) − x^(0)‖ < γr;

thus

‖Df(x^(k))‖ ≤ γr + ‖Df(x^(0))‖ =: K,

and from f(x^(k)) = −Df(x^(k)) (x^(k+1) − x^(k)), we obtain

‖f(x^(k))‖ ≤ K ‖x^(k+1) − x^(k)‖ → 0

for k → ∞. Thus we also have

f(x̄) = lim_{k→∞} f(x^(k)) = 0.
(3): With l → ∞ in (8.16) we can prove the second part of (3); the first part follows from

x^(k+1) − x̄ = x^(k) − Df(x^(k))^(−1) f(x^(k)) − x̄
= x^(k) − x̄ − Df(x^(k))^(−1) ( f(x^(k)) − f(x̄) )
= Df(x^(k))^(−1) ( f(x̄) − f(x^(k)) − Df(x^(k)) (x̄ − x^(k)) ),

which implies, according to Lemma 8.10 with C₀ = B_{r+ε}(x^(0)) ⊂ C,

‖x^(k+1) − x̄‖ ≤ β (γ/2) ‖x^(k) − x̄‖².   ✷
The termination criterion (5.15), which is based on the residual, may also be used for the nonlinear problem (and not only for the Newton iteration). This can be deduced in analogy to (5.16):

Theorem 8.12 Let the following be valid:

There exists a zero x̄ of f such that Df(x̄) is nonsingular and Df is Lipschitz continuous in an open neighbourhood C of x̄.   (8.17)

Then for every ε > 0 there exists a δ > 0 such that for x, y ∈ B_δ(x̄),

‖x − x̄‖ ≤ (1 + ε) κ(Df(x̄)) ( ‖f(x)‖ / ‖f(y)‖ ) ‖y − x̄‖.

Proof: See [22, p. 69, p. 72] and Exercise 8.4.   ✷

Here κ is the condition number in a matrix norm that is consistent with the chosen vector norm. For x = x^(k) and y = x^(0) we get (locally) the generalization of (5.16).
8.2.2 Modiﬁcations of Newton’s Method
Modifications of Newton's method aim in two directions:

• Reduction of the cost of assembling and solving the system of equations (8.12) (without a significant deterioration of the convergence properties).

• Enlargement of the range of convergence.
We can account for the first aspect by simplifying the matrix in (8.12) (modified or simplified Newton's method). The extreme case is the replacement of Df(x^(k)) by the identity matrix; this leads us to the fixed-point iteration (8.9). If the mapping f consists of a nonlinear and a linear part,

f(x) := Ax + g(x) = 0,   (8.18)

then the system of equations (8.12) of the Newton iteration reads

(A + Dg(x^(k))) δ^(k) = −f(x^(k)).
A straightforward simplification in this case is the fixed-point iteration

A δ^(k) = −f(x^(k)).   (8.19)

It may be interpreted as the fixed-point iteration (8.9) of the system that is preconditioned with A, i.e., of

A^(−1) f(x) = 0.
In (8.19) the matrix is identical in every iteration step; therefore, it has to be assembled only once, and if we use a direct method (cf. Section 2.5), the LU factorization has to be carried out only once. Thus in each step only the forward and backward substitutions, with their lower computational cost, have to be performed. For iterative methods we cannot rely on this advantage, but we can expect that x^(k+1) is close to x^(k), and consequently δ^(k,0) = 0 constitutes a good initial guess. Accordingly, the assembling of the matrix becomes more important with respect to the overall computational cost, and savings during the assembling become relevant.
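The factorize-once structure of (8.19) can be sketched as follows, assuming SciPy is available; the particular A, nonlinearity g, and tolerance are illustrative choices:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# f(x) = A x + g(x) as in (8.18); A and g below are illustrative.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
g = lambda x: 0.1 * x**3 - np.array([1.0, 2.0])
f = lambda x: A @ x + g(x)

lu_piv = lu_factor(A)                # assemble and factorize A only once
x = np.zeros(2)
for k in range(100):
    delta = lu_solve(lu_piv, -f(x))  # iteration (8.19): substitutions only
    x = x + delta
    if np.linalg.norm(delta) < 1e-12:
        break
```

Each pass through the loop costs one evaluation of f plus one forward/backward substitution; no new matrix is assembled or factorized.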
We get a system of equations similar to (8.19) by applying the chord method (see Exercise 8.3), where the linear approximation of the initial iterate is maintained, i.e.,

Df(x^(0)) δ^(k) = −f(x^(k)).   (8.20)
If the matrix B(x^(k)), which approximates Df(x^(k)), is changing in each iteration step, i.e.,

B(x^(k)) δ^(k) = −f(x^(k)),   (8.21)
then the only advantage can be a possibly easier assembling or solvability of
the system of equations. If the partial derivatives ∂_j f_i(x) are more difficult to evaluate than the function f_i(x) itself (or possibly not evaluable at all), then the approximation of Df(x^(k)) by difference quotients can be taken into consideration. This corresponds to

B(x^(k)) e_j = (1/h) ( f(x^(k) + h e_j) − f(x^(k)) )   (8.22)

for column j of B(x^(k)), with a fixed h > 0. The number of function evaluations for the assembling of the matrix remains the same: m² for a full matrix, and analogously for a sparse matrix (see Exercise 8.6). Observe that numerical differentiation is an ill-posed problem, which means that we should ideally choose h ~ δ^(1/2), where δ > 0 is the error level in the evaluation of f. Even then we can merely expect

‖Df(x^(k)) − B(x^(k))‖ ≤ C δ^(1/2)

(see [22, pp. 80 f.]). Thus in the best case we can expect only half of the significant digits of the machine precision.

The second aspect, facilitated solvability of (8.21), occurs if "small" entries appear in the Jacobian, due to a problem-dependent weak coupling of the components, and these entries may be skipped. Take, for example, a Df(x^(k)) with a block structure as in (5.39):

Df(x^(k)) = (A_ij)_ij,   A_ij ∈ R^{m_i, m_j},

such that the blocks A_ij may be neglected for j > i. Then there results a nested system of equations of the dimensions m₁, m₂, ..., m_p.
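The columnwise difference approximation (8.22) can be sketched as follows, with h ~ δ^(1/2) taken from machine precision as recommended above (the test function is an arbitrary example):

```python
import numpy as np

def fd_jacobian(f, x, h=None):
    """Approximate Df(x) column by column via (8.22):
    B e_j = (f(x + h e_j) - f(x)) / h."""
    x = np.asarray(x, dtype=float)
    if h is None:
        h = np.sqrt(np.finfo(float).eps)  # h ~ delta^(1/2)
    fx = f(x)
    m = x.size
    B = np.empty((m, m))
    for j in range(m):                    # one extra f-evaluation per column
        e = np.zeros(m)
        e[j] = 1.0
        B[:, j] = (f(x + h * e) - fx) / h
    return B

# Arbitrary test function with known Jacobian for comparison:
f  = lambda x: np.array([x[0]**2 + x[1], np.sin(x[1])])
Df = lambda x: np.array([[2.0 * x[0], 1.0], [0.0, np.cos(x[1])]])

x0 = np.array([1.0, 0.5])
err = np.linalg.norm(fd_jacobian(f, x0) - Df(x0))
```

With h ≈ 1.5e-8 the approximation error is of the same order, i.e., roughly half the significant digits of double precision, consistent with the bound above.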
The possible advantages of such simplified Newton's methods have to be weighed against the disadvantage of a deterioration in the order of convergence: Instead of an estimate like that in Theorem 8.11, (3), we have to expect an additional term

‖B(x^(k)) − Df(x^(k))‖ ‖x^(k) − x̄‖.

This means only linear convergence, or superlinear convergence if the approximation is successively improved (see [22, pp. 75 ff.]). If we have a good initial iterate, it may often be advantageous to perform a small number of steps of Newton's method. So in the following we will again treat Newton's method, although the subsequent considerations can also be transferred to its modifications.
If the linear problems (8.12) are solved with an iterative scheme, we have the possibility to adjust the accuracy of the inner iteration in order to reduce the number of inner iteration steps, without a (severe) deterioration of the convergence of the outer Newton iteration. In such inexact Newton's methods, instead of δ^(k) from (8.12) we determine only a δ̃^(k), which fulfils (8.12) only up to an inner residual r^(k), i.e.,

Df(x^(k)) δ̃^(k) = −f(x^(k)) + r^(k).

The new iterate is given by

x^(k+1) := x^(k) + δ̃^(k).

The accuracy of δ̃^(k) is controlled by the requirement

‖r^(k)‖ ≤ η_k ‖f(x^(k))‖   (8.23)

with certain properties for the sequence (η_k)_k that still have to be determined. Since the natural choice of the initial iterate for solving (8.12) is δ^(k,0) = 0, (8.23) corresponds to the termination criterion (5.15).
Conditions for η_k can be deduced from the following theorem:

Theorem 8.13 Let (8.17) hold and consider compatible matrix and vector norms. Then for every ε > 0 there exists a δ > 0 such that for x^(k) ∈ B_δ(x̄),

‖x^(k+1) − x̄‖ ≤ ‖x^(k) − Df(x^(k))^(−1) f(x^(k)) − x̄‖ + (1 + ε) κ(Df(x̄)) η_k ‖x^(k) − x̄‖.   (8.24)
Proof: By the choice of δ we can ensure the nonsingularity of Df(x^(k)). From

δ̃^(k) = −Df(x^(k))^(−1) f(x^(k)) + Df(x^(k))^(−1) r^(k)

it follows that

‖x^(k+1) − x̄‖ = ‖x^(k) − x̄ + δ̃^(k)‖ ≤ ‖x^(k) − x̄ − Df(x^(k))^(−1) f(x^(k))‖ + ‖Df(x^(k))^(−1) r^(k)‖.

The assertion can be deduced from the estimate

‖Df(x^(k))^(−1) r^(k)‖ ≤ (1 + ε)^(1/2) ‖Df(x̄)^(−1)‖ ‖r^(k)‖ ≤ (1 + ε)^(1/2) ‖Df(x̄)^(−1)‖ η_k (1 + ε)^(1/2) ‖Df(x̄)‖ ‖x^(k) − x̄‖.

Here we used Exercise 8.4 (2), (3) and (8.23).   ✷
The first part of the estimate (8.24) corresponds to the error of the exact Newton step, which can be estimated using the same argument as in Theorem 8.11, (3) (with Exercise 8.4, (2)) by

‖x^(k) − Df(x^(k))^(−1) f(x^(k)) − x̄‖ ≤ (1 + ε)^(1/2) ‖Df(x̄)^(−1)‖ (γ/2) ‖x^(k) − x̄‖².
This implies the following result:

Corollary 8.14 Let the assumptions of Theorem 8.13 be satisfied. Then there exist δ > 0 and η̄ > 0 such that for x^(0) ∈ B_δ(x̄) and η_k ≤ η̄ for all k ∈ N, the inexact Newton's method satisfies the following:

(1) The sequence (x^(k))_k converges linearly to x̄.

(2) If η_k → 0 for k → ∞, then (x^(k))_k converges superlinearly.

(3) If η_k ≤ K ‖f(x^(k))‖ for a K > 0, then (x^(k))_k converges quadratically.

Proof: Exercise 8.5.   ✷
The estimate (8.24) suggests that a very fine level of accuracy η̄ of the inner iteration must be chosen to guarantee the above convergence statements. This is particularly true for ill-conditioned Df(x̄) (which is common for discretization matrices: see (5.34)). In fact, the analysis in the weighted norm ‖·‖ = ‖Df(x̄) · ‖ shows that only η_k ≤ η̄ < 1 has to be ensured (cf. [22, pp. 97 ff.]). With this, and on the basis of

η̃_k = α ‖f(x^(k))‖² / ‖f(x^(k−1))‖²

for some α ≤ 1, we can construct η_k in an adaptive way (see [22, p. 105]).
Most of the iterative methods introduced in Chapter 5 do not require explicit knowledge of the matrix Df(x^(k)). It suffices that the operation Df(x^(k)) y be feasible for vectors y, in general for fewer than m of them; i.e., the directional derivative of f at x^(k) in direction y is needed. Thus, in case a difference scheme for the derivatives of f is necessary or reasonable, it is more convenient to choose directly a difference scheme for the directional derivative.
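This observation can be sketched as follows: the product Df(x^(k)) y is approximated by one extra evaluation of f, without assembling the Jacobian (the step scaling and the test function are illustrative choices):

```python
import numpy as np

def jac_vec(f, x, y, h=None):
    """Difference approximation of the directional derivative Df(x) y,
    without assembling the Jacobian."""
    ny = np.linalg.norm(y)
    if ny == 0.0:
        return np.zeros_like(np.asarray(x, dtype=float))
    if h is None:
        h = np.sqrt(np.finfo(float).eps)
    t = h / ny                      # step scaled to the direction y
    return (f(x + t * y) - f(x)) / t

# Arbitrary test function with known Jacobian for comparison:
f  = lambda x: np.array([x[0] * x[1], x[0]**2 - x[1]])
Df = lambda x: np.array([[x[1], x[0]], [2.0 * x[0], -1.0]])

x0 = np.array([1.0, 2.0])
y  = np.array([0.3, -0.7])
err = np.linalg.norm(jac_vec(f, x0, y) - Df(x0) @ y)
```

A Krylov-type inner solver needs exactly such products, so this routine can replace the matrix in the linear subproblem (8.12).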
Since we cannot expect convergence of Newton's method in general, we require indicators for the convergence behaviour of the iteration. The solution x̄ is in particular also the solution of

Minimize ‖f(x)‖²   for x ∈ R^m.
Let x^(0), τ > 0, η₀, Θ̄ ∈ (0, 1), k = 0, i = 0 be given.

(1) δ̃^(k,0) := 0, i := 1.

(2) Determine the ith iterate δ̃^(k,i) for Df(x^(k)) δ̃^(k) = −f(x^(k)) and calculate
    r^(i) := Df(x^(k)) δ̃^(k,i) + f(x^(k)).

(3) If ‖r^(i)‖ ≤ η_k ‖f(x^(k))‖, then go to (4); else set i := i + 1 and go to (2).

(4) δ̃^(k) := δ̃^(k,i).

(5) x^(k+1) := x^(k) + δ̃^(k).

(6) If ‖f(x^(k+1))‖ > Θ̄ ‖f(x^(k))‖, interrupt.

(7) If ‖f(x^(k+1))‖ ≤ τ ‖f(x^(0))‖, end. Else calculate η_{k+1}, set k := k + 1, and go to (1).

Table 8.1. Inexact Newton's method with monotonicity test.
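A minimal sketch of Table 8.1, with a plain Jacobi sweep as the (arbitrarily chosen) inner iteration and illustrative values for τ, η_k ≡ η, and Θ̄; a production code would use an iterative solver from Chapter 5 instead:

```python
import numpy as np

def inexact_newton(f, Df, x0, tau=1e-10, eta=0.1, theta_bar=0.9, maxit=50):
    """Inexact Newton's method with monotonicity test (cf. Table 8.1);
    the inner solver is a Jacobi iteration started at delta^(k,0) = 0."""
    x = np.asarray(x0, dtype=float)
    f0_norm = np.linalg.norm(f(x))
    for k in range(maxit):
        fx = f(x)
        J = Df(x)
        D = np.diag(J)
        delta = np.zeros_like(x)                    # step (1)
        while True:
            r = J @ delta + fx                      # inner residual r^(i)
            if np.linalg.norm(r) <= eta * np.linalg.norm(fx):  # step (3)
                break
            delta = delta - r / D                   # step (2): Jacobi sweep
        x_new = x + delta                           # steps (4), (5)
        if np.linalg.norm(f(x_new)) > theta_bar * np.linalg.norm(fx):
            raise RuntimeError("monotonicity test failed")     # step (6)
        x = x_new
        if np.linalg.norm(f(x)) <= tau * f0_norm:   # step (7)
            return x, k
    return x, maxit

# Illustrative, diagonally dominant test problem (so Jacobi converges):
A = np.array([[4.0, 1.0], [1.0, 3.0]])
f  = lambda x: A @ x + 0.1 * np.sin(x) - np.array([1.0, 2.0])
Df = lambda x: A + 0.1 * np.diag(np.cos(x))

xbar, iters = inexact_newton(f, Df, np.zeros(2))
```

The inner residual test implements (8.23); the check against Θ̄ is the monotonicity test discussed next.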
Thus we could expect a descent of the sequence of iterates (x^(k))_k in this functional, i.e.,

‖f(x^(k+1))‖ ≤ Θ̄ ‖f(x^(k))‖   for a Θ̄ < 1.

If this monotonicity test is not fulfilled, the iteration is terminated. Such an example of an inexact Newton's method is given in Table 8.1.
In order to avoid the termination of the method due to divergence, continuation methods have been developed. They embed the problem f(x) = 0 into a family of problems in order to provide successively good initial iterates. The approach presented at the end of Section 8.3 for time-dependent problems is similar to the continuation methods. Another approach (which can be combined with the former) modifies the (inexact) Newton's method so that the range of convergence is enlarged: Applying the damped (inexact) Newton's method means reducing the step length from x^(k) to x^(k+1) as long as we observe a decrease conforming to the monotonicity test. One strategy of damping, termed Armijo's rule, is described in Table 8.2; it replaces the steps (1), (5), and (6) in Table 8.1.

Thus damping Newton's method also amounts to a relaxation similar to (5.30), where ω = λ_k is adjusted to the iteration step as in (5.41).
In the formulation of Table 8.2 the iteration may possibly fail to terminate if λ_k is reduced indefinitely in step (5). This must be avoided in a practical implementation of the method. But except for situations where divergence is obvious, this situation will not appear, because we have the following theorem:
Let additionally α, β ∈ (0, 1) be given.

(1) δ̃^(k,0) := 0, i := 1, λ_k := 1.

(5) If ‖f(x^(k) + λ_k δ̃^(k))‖ ≥ (1 − αλ_k) ‖f(x^(k))‖, set λ_k := βλ_k and go to (5).

(6) x^(k+1) := x^(k) + λ_k δ̃^(k).

Table 8.2. Damped inexact Newton step according to Armijo's rule.
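A sketch of the damped method: Armijo's rule with illustrative α, β, and (unlike Table 8.2) an exact inner solve; arctan is a standard example where the undamped iteration diverges from this starting point:

```python
import numpy as np

def damped_newton(f, Df, x0, alpha=1e-4, beta=0.5, tol=1e-12, maxit=100):
    """Damped Newton's method with Armijo's rule (cf. Table 8.2);
    alpha, beta in (0, 1) are illustrative choices."""
    x = np.asarray(x0, dtype=float)
    for k in range(maxit):
        fx = f(x)
        if np.linalg.norm(fx) <= tol:
            return x, k
        delta = np.linalg.solve(Df(x), -fx)
        lam = 1.0                                  # step (1): try full step
        while np.linalg.norm(f(x + lam * delta)) >= (1 - alpha * lam) * np.linalg.norm(fx):
            lam *= beta                            # step (5): reduce step length
            if lam < 1e-12:
                raise RuntimeError("line search failed")
        x = x + lam * delta                        # step (6)
    return x, maxit

# Undamped Newton diverges for arctan from x0 = 2; the damped method converges.
f  = lambda x: np.arctan(x)
Df = lambda x: np.atleast_2d(1.0 / (1.0 + x**2))

xbar, iters = damped_newton(f, Df, x0=np.array([2.0]))
```

Once the iterates are close to the zero, the full step λ_k = 1 is accepted, in agreement with Theorem 8.15 below.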
Theorem 8.15 Let α, β, γ > 0 exist such that conditions (i), (ii) of Theorem 8.11 hold on ∪_{k∈N} B_r(x^(k)) for the sequence (x^(k))_k defined according to Table 8.2. Let η_k ≤ η̄ for an η̄ < 1 − α.

Then if f(x^(0)) ≠ 0, there exists a λ̄ > 0 such that λ_k ≥ λ̄ for all k ∈ N. If furthermore (x^(k))_k is bounded, then there exists a zero x̄ satisfying (8.17), and

x^(k) → x̄   for k → ∞.

There exists a k₀ ∈ N such that for k ≥ k₀ the relation λ_k = 1 holds.

Proof: See [22, pp. 139 ff.].   ✷
We see that in the ﬁnal stage of the iteration we again deal with the
(inexact) Newton’s method with the previously described behaviour of
convergence.
Finally, the following should be mentioned: The problem f(x) = 0 and Newton's method are affine-invariant in the sense that a transition to Af(x) = 0 with a nonsingular A ∈ R^{m,m} changes neither the problem nor the iteration method, since

D(Af)(x)^(−1) Af(x) = Df(x)^(−1) f(x).

Among the assumptions of Theorem 8.11, (8.14) is not affine-invariant. A possible alternative would be

‖Df(y)^(−1) (Df(x) − Df(y))‖ ≤ γ ‖x − y‖,

which fulfils the requirement. With the proof of Lemma 8.10 it follows that

‖Df(y)^(−1) ( f(x) − f(y) − Df(y)(x − y) )‖ ≤ (γ/2) ‖x − y‖².

With this argument a similar variant of Theorem 8.11 can be proven.
The monotonicity test is not affine-invariant either, so the natural monotonicity test

‖Df(x^(k))^(−1) f(x^(k+1))‖ ≤ Θ̄ ‖Df(x^(k))^(−1) f(x^(k))‖

is probably to be preferred. The vector on the right-hand side has already been calculated, being, except for the sign, the Newton correction δ^(k). But for the vector on the left-hand side, −δ̄^(k+1), the system of equations

Df(x^(k)) δ̄^(k+1) = −f(x^(k+1))

additionally has to be solved.
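In code, the extra solve can reuse the factorization of Df(x^(k)) already computed for the Newton step, so it costs only one additional pair of substitutions; a sketch assuming SciPy is available (the test problem and Θ̄ = 0.9 are illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def natural_test(lu_piv, f_new, delta, theta_bar=0.9):
    """Affine-invariant (natural) monotonicity test
    ||Df(x^(k))^{-1} f(x^(k+1))|| <= theta_bar ||Df(x^(k))^{-1} f(x^(k))||;
    the right-hand side is, up to sign, the Newton correction delta^(k)."""
    delta_bar = lu_solve(lu_piv, -f_new)   # one extra forward/backward solve
    return np.linalg.norm(delta_bar) <= theta_bar * np.linalg.norm(delta)

# One Newton step on an arbitrary test problem:
f  = lambda x: np.array([x[0]**2 - 1.0, x[1] - x[0]])
Df = lambda x: np.array([[2.0 * x[0], 0.0], [-1.0, 1.0]])

x = np.array([2.0, 2.0])
lu_piv = lu_factor(Df(x))            # factorization from the Newton step
delta = lu_solve(lu_piv, -f(x))      # Newton correction delta^(k)
accepted = natural_test(lu_piv, f(x + delta), delta)
```

Because both solves share the same matrix, the test adds no new factorization, only substitutions.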
Exercises
8.3 Consider the chord method as described in (8.20). Prove the convergence of this method to the solution x̄ under the following assumptions:

(1) Let (8.17) with B̄_r(x̄) ⊂ C hold,

(2) ‖Df(x^(0))^(−1)‖ ≤ β,

(3) 2βγr < 1,

(4) x^(0) ∈ B̄_r(x̄).
8.4 Let assumption (8.17) hold. Prove for compatible matrix and vector norms that for every ε > 0 there exists a δ > 0 such that for every x ∈ B_δ(x̄):

(1) ‖Df(x)‖ ≤ (1 + ε)^(1/2) ‖Df(x̄)‖,

(2) ‖Df(x)^(−1)‖ ≤ (1 + ε)^(1/2) ‖Df(x̄)^(−1)‖ (employ ‖(I − M)^(−1)‖ ≤ 1/(1 − ‖M‖) for ‖M‖ < 1),

(3) (1 + ε)^(−1/2) ‖Df(x̄)^(−1)‖^(−1) ‖x − x̄‖ ≤ ‖f(x)‖ ≤ (1 + ε)^(1/2) ‖Df(x̄)‖ ‖x − x̄‖,

(4) Theorem 8.12.
8.5 Prove Corollary 8.14.
8.6 Let U ⊂ R^m be open and convex. Consider problem (8.2) with continuously differentiable f : U → R^m. For i = 1, ..., m let J_i ⊂ {1, ..., m} be defined by

∂_j f_i(x) = 0 for j ∉ J_i and every x ∈ U.