md"""
# Section 1.6: The Newton Method
"""
begin
    using Symbolics, ForwardDiff, Plots, PlutoUI, LaTeXStrings
    using ForwardDiff: jacobian
end

md"""
## Solving equations and finding inverse transformations using the Newton method
"""

md"""
In science and engineering we often need to *solve systems of equations*.

If the equations are *linear* then linear algebra gives us a general method for solving them; such methods are now routinely used to solve systems of millions of linear equations.

If the equations are *non*linear then things are less obvious. The main solution methods we know work by... reducing the nonlinear equations to a sequence of linear equations! They do this by *approximating* the function by a linear function and solving that to get a better solution, then repeating this operation as many times as necessary to get a *sequence* of increasingly better solutions. This is an example of an **iterative algorithm**.

A well-known and elegant method, which can be used in many different contexts, is the **Newton method**. It does, however, have the disadvantage that it requires derivatives of the function. This can be overcome using **automatic differentiation** techniques.

We will illustrate the Newton method using the `ForwardDiff.jl` package to carry out automatic differentiation, but we will also try to understand what's going on "under the hood".
"""

md"""
## The Newton method in 1D

We would like to solve equations like $f(x) = g(x)$.
We rewrite that by moving all the terms to one side of the equation so that we can write $h(x) = 0$, with $h(x) := f(x) - g(x)$.

A point $x^*$ such that $h(x^*) = 0$ is called a **root** or **zero** of $h$.

The Newton method finds zeros, and hence solves the original equation.
"""

md"""


The idea of the Newton method is to *follow the direction in which the function is pointing*! We do this by building a **tangent line** at the current position and following that instead, until it hits the $x$-axis.

Let's look at that visually first:

"""

md"""
n = $(@bind n2 Slider(0:10, show_value=true, default=0))
"""

md"""
x₀ = $(@bind x02 Slider(-10:10, show_value=true, default=6))
"""

let
    f(x) = x^2 - 2

    standard_Newton(f, n2, -1:0.01:10, x02, -10, 70)
end

md"""
n = $(@bind n Slider(0:10, show_value=true, default=0))
"""

md"""
x₀ = $(@bind x0 Slider(-10:10, show_value=true, default=6))
"""

let
    f(x) = 0.2x^3 - 4x + 1

    standard_Newton(f, n, -10:0.01:10, x0, -10, 70)
end

md"""
## Using symbolic calculations to understand derivatives and nonlinear maps
"""

md"""
We can use Julia's new symbolic capabilities to understand what's going on with a nonlinear (polynomial) function.

Let's see what happens if we perturb a function $f$ around a point $z$ by a small amount $\eta$.
"""
@variables z, η
f(x) = x^m - 2;
f′(x) = ForwardDiff.derivative(f, x);

md"""
m = $(@bind m Slider(1:6, show_value=true))
"""

f(z)

f(z + η)

expand( f(z + η) )

md"""
When $\eta$ is small, $\eta^2$ is *very* small, so we can ignore it. We are left with terms that either don't contain $\eta$ (constants), or multiply $\eta$ (linear). The part that multiplies $\eta$ is the derivative:
"""

f′(z)

f(z) + η*f′(z)

expand( f(z + η) ) - ( f(z) + η*f′(z) )

md"""
The derivative gives the "*linear* part" of the function. `ForwardDiff.jl`, and forward-mode automatic differentiation in general, effectively uses this (although not symbolically in this sense) to just propagate the linear part of each function through a calculation.
"""

md"""
## Mathematics of the Newton method
"""

md"""
We can convert the idea of "following the tangent line" into equations as follows.
(You can also do so by just looking at the geometry in 1D, but that does not help in 2D.)
"""

md"""
Suppose we have a guess $x_0$ for the root and we want to find a (hopefully) better guess $x_1$.

Let's set $x_1 = x_0 + \delta$, where $x_1$ and $\delta$ are still unknown.

We want $x_1$ to be a root, so
"""

md"""
$$f(x_1) = f(x_0 + \delta) \simeq 0$$
"""

md"""
If we are already "quite close" to the root then $\delta$ should be small, so we can approximate $f$ using the tangent line:

$$f(x_0) + \delta \, f'(x_0) \simeq 0$$

and hence

$$\delta \simeq \frac{-f(x_0)}{f'(x_0)}$$

so that

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

Now we can repeat so that

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$$

and in general

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$


This is the Newton method in 1D.
"""

md"""
## Implementation in 1D
"""
function newton1D(f, x0)
    f′(x) = ForwardDiff.derivative(f, x)   # type f′ as f\prime<TAB>

    x = x0   # start from the initial guess that was passed in

    for i in 1:10
        x -= f(x) / f′(x)   # Newton step
    end

    return x
end
newton1D(x -> x^2 - 2, 37.0)   # ≈ 1.414213562373095
sqrt(2)   # 1.4142135623730951, for comparison
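
md"""
To watch the convergence, here is a small variant of `newton1D` (a hypothetical helper, not part of the original notebook) that records every iterate:
"""

begin
    function newton1D_sequence(f, x0, n=10)
        f′(x) = ForwardDiff.derivative(f, x)

        sequence = [x0]
        x = x0

        for i in 1:n
            x -= f(x) / f′(x)   # Newton step
            push!(sequence, x)
        end

        return sequence
    end

    # near the root, the number of correct digits roughly doubles at every step
    newton1D_sequence(x -> x^2 - 2, 37.0)
end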

md"""
## Symbolic derivative in 2D

"""

md"""
Let's see what happens when we perturb by small amounts $\delta$ in the $x$ direction and $\epsilon$ in the $y$ direction around the point $(a, b)$:
"""

md"""
p = $(@bind p Slider(0:0.01:1, show_value=true))
"""
@variables a, b, δ, ϵ
image = expand.(T(p)( [ (a + δ), (b + ϵ) ] ))
jacobian(T(p), [a, b]) .|> Text   # .|> Text just makes the symbolic matrix display as plain text
jacobian(T(p), [a, b]) * [δ, ϵ]
image - T(p)([a, b])
simplify.(expand.(image - T(p)([a, b]) - jacobian(T(p), [a, b]) * [δ, ϵ]))

md"""
## Newton for transformations in 2 dimensions

$$T: \mathbb{R}^2 \to \mathbb{R}^2$$

"""

md"""
We want to find the inverse $T^{-1}(y)$, i.e. to solve the equation $T(x) = y$ for $x$.

We use the same idea as in 1D, but now in 2D:
"""

md"""
$$T(x_0 + \delta) \simeq 0$$

$$T(x_0) + J \cdot \delta \simeq 0,$$

where $J := DT_{x_0}$ is the Jacobian matrix of $T$ at $x_0$, i.e. the best linear approximation of $T$ near to $x_0$.
"""

md"""
Hence $\delta$ is the solution of the system of linear equations
"""

md"""
$$J \cdot \delta = -T(x_0)$$

Then we again construct the new approximation $x_1$ as $x_1 := x_0 + \delta$.
"""

md"""
In 2D we have an explicit formula for the inverse of the matrix.
"""

md"""
## Implementation in 2D
"""
function newton2D_step(T, x)
    J = ForwardDiff.jacobian(T, x)   # could use StaticArrays here for speed

    δ = J \ T(x)   # solves J*δ = T(x), i.e. δ = J⁻¹ * T(x)

    return x - δ
end

"Looks for x such that T(x) = 0"
function newton2D(T, x0, n=10)
x = x0

for i in 1:n
x = newton2D_step(T, x)
end
return x
end

md"""
Remember that Newton is designed to look for *roots*, i.e. places where $T(x) = 0$.
We want $T(x) = y$, so we need another layer:
"""

"Looks for x such that f(x) = y, i.e. f(x) - y = 0"
function inverse(f, y, x0=[0, 0])
return newton2D(x -> f(x) - y, x0)
end
inverse(f) = y -> inverse(f, y)   # curried version: inverse(f) is itself a function
straight(x0, y0, x, m) = y0 + m * (x - x0)   # line through (x0, y0) with slope m
function standard_Newton(f, n, x_range, x0, ymin=-10, ymax=10)
    f′ = x -> ForwardDiff.derivative(f, x)

    p = plot(f, x_range, lw=3, ylim=(ymin, ymax), legend=false, size=(400, 300))

    hline!([0.0], c="magenta", lw=3, ls=:dash)
    scatter!([x0], [0], c="green", ann=(x0, -5, L"x_0", 10))

    for i in 1:n
        plot!([x0, x0], [0, f(x0)], c=:gray, alpha=0.5)
        scatter!([x0], [f(x0)], c=:red)
        m = f′(x0)

        # tangent line at the current point
        plot!(x_range, [straight(x0, f(x0), x, m) for x in x_range],
              c=:blue, alpha=0.5, ls=:dash, lw=2)

        # where the tangent line hits the x-axis becomes the next iterate
        x1 = x0 - f(x0) / m

        scatter!([x1], [0], c="green", ann=(x1, -5, L"x_%$i", 10))
        x0 = x1
    end

    p |> as_svg
end
T(α) = ((x, y),) -> [x + α*y^2, y + α*x^2]   # a nonlinear map R² → R²; the identity when α = 0

md"""
α = $(@bind α Slider(0.0:0.01:1.0, show_value=true))
"""
T(α)( [0.3, 0.4] )
inverse(T(α))( [0.3, 0.4] )
( T(α) ∘ inverse(T(α)) )( [0.3, 0.4] )

md"""
## Appendix
"""
expand(ex) = simplify(ex, polynorm=true)   # multiply out products, e.g. (z + η)^2 → z^2 + 2zη + η^2