md"""
# Section 1.6: The Newton Method
"""
begin
    using Symbolics, ForwardDiff, Plots, PlutoUI, LaTeXStrings
    using ForwardDiff: jacobian
end

md"""
## Solving equations and finding inverse transformations using the Newton method
"""

md"""
In science and engineering we often need to *solve systems of equations*.

If the equations are *linear* then linear algebra gives us a general method for solving them; such methods are now routinely used to solve systems of millions of linear equations.

If the equations are *non*linear then things are less obvious. The main solution methods we know work by... reducing the nonlinear equations to a sequence of linear equations! They do this by *approximating* the function by a linear function and solving that to get a better solution, then repeating this operation as many times as necessary to get a *sequence* of increasingly better solutions. This is an example of an **iterative algorithm**.

A well-known and elegant method, which can be used in many different contexts, is the **Newton method**. It does, however, have the disadvantage that it requires derivatives of the function. This can be overcome using **automatic differentiation** techniques.

We will illustrate the Newton method using the `ForwardDiff.jl` package to carry out automatic differentiation, but we will also try to understand what's going on "under the hood".
"""

md"""
## The Newton method in 1D

We would like to solve equations like $f(x) = g(x)$.
We rewrite that by moving all the terms to one side of the equation so that we can write $h(x) = 0$, with $h(x) := f(x) - g(x)$.

A point $x^*$ such that $h(x^*) = 0$ is called a **root** or **zero** of $h$.

The Newton method finds zeros, and hence solves the original equation.
"""

md"""


The idea of the Newton method is to *follow the direction in which the function is pointing*! We do this by building a **tangent line** at the current position and following that instead, until it hits the $x$-axis.

Let's look at that visually first:

"""

md"""
n = $(@bind n2 Slider(0:10, show_value=true, default=0))
"""

md"""
x₀ = $(@bind x02 Slider(-10:10, show_value=true, default=6))
"""

let
    f(x) = x^2 - 2

    standard_Newton(f, n2, -1:0.01:10, x02, -10, 70)
end

md"""
n = $(@bind n Slider(0:10, show_value=true, default=0))
"""

md"""
x₀ = $(@bind x0 Slider(-10:10, show_value=true, default=6))
"""

let
    f(x) = 0.2x^3 - 4x + 1

    standard_Newton(f, n, -10:0.01:10, x0, -10, 70)
end

md"""
## Using symbolic calculations to understand derivatives and nonlinear maps
"""

md"""
We can use Julia's new symbolic capabilities to understand what's going on with a nonlinear (polynomial) function.

Let's see what happens if we perturb a function $f$ around a point $z$ by a small amount $\eta$.
"""
@variables z, η
f(x) = x^m - 2;
f′(x) = ForwardDiff.derivative(f, x);

md"""
m = $(@bind m Slider(1:6, show_value=true))
"""

f(z)

f(z + η)

expand( f(z + η) )

md"""
When $\eta$ is small, $\eta^2$ is *very* small, so we can ignore it. We are left with terms that either don't contain $\eta$ (constants), or multiply $\eta$ (linear). The part that multiplies $\eta$ is the derivative:
"""

f′(z)

f(z) + η*f′(z)

expand( f(z + η) ) - ( f(z) + η*f′(z) )

md"""
The derivative gives the "*linear* part" of the function. `ForwardDiff.jl`, and forward-mode automatic differentiation in general, effectively uses this (although not symbolically in this sense) to just propagate the linear part of each function through a calculation.
"""

md"""
## Mathematics of the Newton method
"""

md"""
We can convert the idea of "following the tangent line" into equations as follows.
(You can also do so by just looking at the geometry in 1D, but that does not help in 2D.)
"""

md"""
Suppose we have a guess $x_0$ for the root and we want to find a (hopefully) better guess $x_1$.

Let's set $x_1 = x_0 + \delta$, where $x_1$ and $\delta$ are still unknown.

We want $x_1$ to be a root, so
"""

md"""
$$f(x_1) = f(x_0 + \delta) \simeq 0$$
"""

md"""
If we are already "quite close" to the root then $\delta$ should be small, so we can approximate $f$ using the tangent line:

$$f(x_0) + \delta \, f'(x_0) \simeq 0$$

and hence

$$\delta \simeq \frac{-f(x_0)}{f'(x_0)}$$

so that

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

Now we can repeat so that

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$$

and in general

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$


This is the Newton method in 1D.
"""

md"""
## Implementation in 1D
"""
function newton1D(f, x0)
    f′(x) = ForwardDiff.derivative(f, x)   # type f′ as f\prime<TAB>

    x = x0   # start from the initial guess that was passed in

    for i in 1:10
        x -= f(x) / f′(x)   # Newton step
    end

    return x
end
newton1D(x -> x^2 - 2, 37.0)   # ≈ 1.414213562373095
sqrt(2)   # 1.4142135623730951, for comparison
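
md"""
To watch the convergence, here is a small variant of `newton1D` (a hypothetical helper, not part of the original notebook) that records every iterate:
"""

begin
    function newton1D_sequence(f, x0, n=10)
        f′(x) = ForwardDiff.derivative(f, x)

        sequence = [x0]
        x = x0

        for i in 1:n
            x -= f(x) / f′(x)   # Newton step
            push!(sequence, x)
        end

        return sequence
    end

    # near the root, the number of correct digits roughly doubles at every step
    newton1D_sequence(x -> x^2 - 2, 37.0)
end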

md"""
## Symbolic derivative in 2D

"""

md"""
Let's see what happens when we perturb by small amounts $\delta$ in the $x$ direction and $\epsilon$ in the $y$ direction around the point $(a, b)$:
"""

md"""
p = $(@bind p Slider(0:0.01:1, show_value=true))
"""
@variables a, b, δ, ϵ
image = expand.(T(p)( [ (a + δ), (b + ϵ) ] ))
jacobian(T(p), [a, b]) .|> Text   # .|> Text just makes the symbolic matrix display as plain text
jacobian(T(p), [a, b]) * [δ, ϵ]
image - T(p)([a, b])
simplify.(expand.(image - T(p)([a, b]) - jacobian(T(p), [a, b]) * [δ, ϵ]))

md"""
## Newton for transformations in 2 dimensions

$$T: \mathbb{R}^2 \to \mathbb{R}^2$$

"""

md"""
We want to find the inverse $T^{-1}(y)$, i.e. to solve the equation $T(x) = y$ for $x$.

We use the same idea as in 1D, but now in 2D:
"""

md"""
$$T(x_0 + \delta) \simeq 0$$

$$T(x_0) + J \cdot \delta \simeq 0,$$

where $J := DT_{x_0}$ is the Jacobian matrix of $T$ at $x_0$, i.e. the best linear approximation of $T$ near to $x_0$.
"""

md"""
Hence $\delta$ is the solution of the system of linear equations
"""

md"""
$$J \cdot \delta = -T(x_0)$$

Then we again construct the new approximation $x_1$ as $x_1 := x_0 + \delta$.
"""

md"""
In 2D we have an explicit formula for the inverse of the matrix.
"""

md"""
## Implementation in 2D
"""
function newton2D_step(T, x)
    J = ForwardDiff.jacobian(T, x)   # could use StaticArrays here for speed

    δ = J \ T(x)   # solves J*δ = T(x), i.e. δ = J⁻¹ * T(x)

    return x - δ
end

"Looks for x such that T(x) = 0"
function newton2D(T, x0, n=10)
x = x0

for i in 1:n
x = newton2D_step(T, x)
end
return x
end

md"""
Remember that Newton is designed to look for *roots*, i.e. places where $T(x) = 0$.
We want $T(x) = y$, so we need another layer:
"""

"Looks for x such that f(x) = y, i.e. f(x) - y = 0"
function inverse(f, y, x0=[0, 0])
return newton2D(x -> f(x) - y, x0)
end
inverse(f) = y -> inverse(f, y)   # curried version: inverse(f) is itself a function
straight(x0, y0, x, m) = y0 + m * (x - x0)   # line through (x0, y0) with slope m
function standard_Newton(f, n, x_range, x0, ymin=-10, ymax=10)
    f′ = x -> ForwardDiff.derivative(f, x)

    p = plot(f, x_range, lw=3, ylim=(ymin, ymax), legend=false, size=(400, 300))

    hline!([0.0], c="magenta", lw=3, ls=:dash)
    scatter!([x0], [0], c="green", ann=(x0, -5, L"x_0", 10))

    for i in 1:n
        plot!([x0, x0], [0, f(x0)], c=:gray, alpha=0.5)
        scatter!([x0], [f(x0)], c=:red)
        m = f′(x0)

        # tangent line at the current point
        plot!(x_range, [straight(x0, f(x0), x, m) for x in x_range],
              c=:blue, alpha=0.5, ls=:dash, lw=2)

        # where the tangent line hits the x-axis becomes the next iterate
        x1 = x0 - f(x0) / m

        scatter!([x1], [0], c="green", ann=(x1, -5, L"x_%$i", 10))
        x0 = x1
    end

    p |> as_svg
end
T(α) = ((x, y),) -> [x + α*y^2, y + α*x^2]   # a nonlinear map R² → R²; the identity when α = 0

md"""
α = $(@bind α Slider(0.0:0.01:1.0, show_value=true))
"""
T(α)( [0.3, 0.4] )
inverse(T(α))( [0.3, 0.4] )
( T(α) ∘ inverse(T(α)) )( [0.3, 0.4] )

md"""
## Appendix
"""
expand(ex) = simplify(ex, polynorm=true)   # multiply out products, e.g. (z + η)^2 → z^2 + 2zη + η^2