Newton-Raphson (NR) optimization

Many algorithms for geometry optimization are based on some variant of the Newton-Raphson (NR) scheme. The latter is a general method for finding the extrema (minima or maxima) of a given function f(x) in an iterative manner. At a minimum the first derivative f'(x) must be zero and the second derivative f''(x) must be positive, while at a maximum f'(x) is again zero and f''(x) is negative. The recipe for finding, starting from a point x_n, the next point x_{n+1} in the iterative series is:

x_{n+1} = x_n - f'(x_n) / f''(x_n)

It is clear from this recipe that the first and second derivatives (or at least good approximations thereof) are required in the process. As with any iterative procedure, a convergence criterion must be selected at which the iterative process is considered converged. Typical choices include the gradient f'(x_n), the difference between the function values of two consecutive iterations, df = f(x_{n+1}) - f(x_n), or the change in x itself between two consecutive iterations, dx = x_{n+1} - x_n.
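As an illustration, the recipe can be written out in a few lines of Python. The following is only a minimal sketch (the function name, tolerance, and iteration cap are illustrative choices), assuming the first and second derivatives are supplied as callables and convergence is judged by the step size dx:

    def newton_raphson(fp, fpp, x0, tol=1e-3, max_iter=50):
        """Iterate x_{n+1} = x_n - f'(x_n)/f''(x_n) until |dx| < tol."""
        x = x0
        for _ in range(max_iter):
            dx = -fp(x) / fpp(x)   # NR step from the current point
            x += dx
            if abs(dx) < tol:      # convergence criterion on the step size
                return x
        raise RuntimeError("NR iteration did not converge")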

As an example, the following function has been chosen:

f(x) = 3x^3 - 10x^2 - 56x + 5

The first and second derivatives of f(x) with respect to x are:

f'(x) = 9x^2 - 20x - 56

f''(x) = 18x - 20
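In Python, the example function and its two derivatives can be coded directly (the names f, fp, and fpp are illustrative):

    def f(x):
        return 3*x**3 - 10*x**2 - 56*x + 5   # f(x)

    def fp(x):
        return 9*x**2 - 20*x - 56            # f'(x)

    def fpp(x):
        return 18*x - 20                     # f''(x)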

In the range -4 < x < 6, these functions behave as shown in Scheme 1:

[Scheme 1: plot of f(x), f'(x), and f''(x) for -4 < x < 6, with the points start 1, end 1, start 2, and end 2 marked.]
As a first starting point we will choose x_0 = 0.0. This point is highlighted in Scheme 1 as start 1. Assuming a convergence criterion of |dx| < 0.001, the iterative procedure takes four steps to converge to the point designated in Scheme 1 as end 1:
 

iteration      x_n         f(x_n)      f'(x_n)     f''(x_n)        dx
    0         0.0         +5.0        -56.0       -20.0         -2.8
    1        -2.8        +17.5440     +70.560     -70.40        +1.0023
    2        -1.7977     +55.9249      +9.0395    -52.3586      +0.1726
    3        -1.6251     +56.7207      +0.2683    -49.2510      +0.0054
    4        -1.6197     +56.7214      +0.0049    -49.1546      +0.0001
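The trace above can be reproduced with a short loop. The sketch below repeats the function definitions so that it runs on its own; the last printed digit may differ from the hand-rounded table values:

    def f(x):   return 3*x**3 - 10*x**2 - 56*x + 5   # f(x)
    def fp(x):  return 9*x**2 - 20*x - 56            # f'(x)
    def fpp(x): return 18*x - 20                     # f''(x)

    x = 0.0                                          # start 1
    for n in range(10):
        dx = -fp(x) / fpp(x)                         # NR step
        print(f"{n}  x={x:+.4f}  f={f(x):+.4f}  "
              f"f'={fp(x):+.4f}  f''={fpp(x):+.4f}  dx={dx:+.4f}")
        if abs(dx) < 0.001:                          # |dx| < 0.001 criterion
            break
        x += dx

Changing the starting point to x = 2.0 reproduces the trace of the second example further below.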


In this case the NR algorithm locates a maximum of the function at x = -1.6197. The maximum is indicated by the large negative second derivative f''(x). The largest step dx is made at the very beginning, due to the fairly small value of f''(x) at the starting point. This is actually one of the weaknesses of the NR procedure, and the variants used in molecular geometry optimization therefore impose a limit on the maximum size of each NR step (the trust radius); a sketch of such a step cap is given after the second example below. Choosing a starting point somewhat closer to the minimum of the function, x_0 = 2.0 (start 2), gives a different result:
 

iteration      x_n         f(x_n)      f'(x_n)     f''(x_n)        dx
    0        +2.0        -123.0       -60.0       +16.0         +3.75
    1        +5.750       -77.2969   +126.5625    +83.50        -1.5157
    2        +4.2343     -183.6597    +20.6777    +56.2174      -0.3678
    3        +3.8665     -187.6118     +1.2184    +49.5967      -0.0246
    4        +3.8419     -187.6268     +0.0049    +49.1551      -0.0001


In this second case the NR procedure leads to a local minimum at x = +3.8419 (end 2). The second derivative at this point is strongly positive, indicating a true minimum. As in the first example, the first step is by far the largest and "overshoots" the minimum by a good margin, but convergence is again achieved within four cycles. Function f(x) has, of course, been chosen such that an analytical solution can also be found in a straightforward manner: setting f'(x) = 9x^2 - 20x - 56 = 0 and applying the quadratic formula gives x = (20 ± sqrt(20^2 + 4*9*56)) / (2*9) = (20 ± sqrt(2416)) / 18, that is, x = +3.8418 and x = -1.6196. We can thus conclude that the NR procedure locates the maxima/minima in a few iterations with good accuracy.
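As noted above, the variants of NR used in molecular geometry optimization limit the step length to a trust radius. A minimal one-dimensional sketch of such a cap follows; the trust radius of 0.5 is an arbitrary value chosen for illustration, not a recommended setting:

    def nr_step_capped(fp, fpp, x, trust_radius=0.5):
        """One NR step with its length capped at the trust radius."""
        dx = -fp(x) / fpp(x)
        if abs(dx) > trust_radius:
            # keep the direction of the step but limit its length
            dx = trust_radius if dx > 0.0 else -trust_radius
        return x + dx

With such a cap, the large first steps of both examples above (dx = -2.8 and dx = +3.75) would be shortened, at the cost of a few additional iterations.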
Additional consideration should, however, be given to the following points: comparison of the first and second examples shows that the final result of the NR procedure depends strongly on the starting point, so locating a particular minimum or maximum with NR-based methods requires a good starting point. A further point that makes application of the NR algorithm less appealing is the need for the second derivatives f''(x): these are expensive to compute for large molecular systems, and a variety of methods exist to deal with this point.
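One common way to avoid explicit second derivatives, sketched here for the one-dimensional case, is to replace f''(x) by a finite difference of the gradient between two consecutive points (the secant method); the quasi-Newton schemes used in geometry optimization build up analogous approximations in many dimensions. The sketch below is illustrative only and assumes that only f'(x) is available:

    def secant(fp, x0, x1, tol=1e-3, max_iter=50):
        """NR-like iteration in which f''(x) is approximated by
        the finite difference (f'(x1) - f'(x0)) / (x1 - x0)."""
        for _ in range(max_iter):
            fpp_approx = (fp(x1) - fp(x0)) / (x1 - x0)  # approximate f''
            x0, x1 = x1, x1 - fp(x1) / fpp_approx       # NR-like update
            if abs(x1 - x0) < tol:
                return x1
        raise RuntimeError("secant iteration did not converge")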