begin
    ENV["LANG"]="C"
    using Pkg
    Pkg.activate(mktempdir())
    Pkg.add(["PyPlot","PlutoUI","DualNumbers","ForwardDiff","DiffResults"])
    using PlutoUI
    using PyPlot
    using DualNumbers
    using LinearAlgebra
    using ForwardDiff
    using DiffResults
    PyPlot.svg(true)
end
Nonlinear systems of equations
Automatic differentiation
Dual numbers
We all know the field of complex numbers $\mathbb{C}$: they extend the real numbers $\mathbb{R}$ by a number $i$ with $i^2=-1$.
Dual numbers are defined by extending the real numbers by formally adding a number $\varepsilon$ with $\varepsilon^2=0$:
$$D=\{a+b\varepsilon\ |\ a,b\in\mathbb{R}\}$$
They form a ring, not a field: $\varepsilon$ has no multiplicative inverse.
Evaluating polynomials on dual numbers: Let $p(x)=\sum_{i=0}^n p_i x^i$. Then, due to $\varepsilon^2=0$,
$$p(a+b\varepsilon)=\sum_{i=0}^n p_i a^i + \sum_{i=1}^n i p_i a^{i-1} b\varepsilon = p(a)+bp'(a)\varepsilon$$
This can be generalized to any analytical function.
⇒ automatic evaluation of function and derivative at once ⇒ forward mode automatic differentiation
Multivariate dual numbers: generalization for partial derivatives.
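To see the mechanism at work, here is a minimal hand-rolled dual number type (a sketch for illustration only; the name MyDual is ours, and the notebook itself uses DualNumbers.jl below):
struct MyDual
    value::Float64
    epsilon::Float64
end
# ring operations: (a+bε)+(c+dε)=(a+c)+(b+d)ε and,
# since ε²=0, (a+bε)(c+dε)=ac+(ad+bc)ε
Base.:+(x::MyDual,y::MyDual)=MyDual(x.value+y.value,x.epsilon+y.epsilon)
Base.:*(x::MyDual,y::MyDual)=MyDual(x.value*y.value,x.value*y.epsilon+x.epsilon*y.value)
x=MyDual(2.0,1.0)
x*x*x    # MyDual(8.0, 12.0): p(2)=8 and p'(2)=12 for p(x)=x³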
Dual numbers in Julia
Constructing a dual number:
d=Dual(2,1)    # output: 2 + 1ɛ
Accessing its components:
d.value, d.epsilon    # output: (2, 1)
Comparison with known derivative:
function testdual(x,f,df)
    # evaluate f at the dual number x+1ε; its epsilon component carries f'(x)
    xdual=Dual(x,1)
    fdual=f(xdual)
    (f=f(x),f_dual=fdual.value),(df=df(x),df_dual=fdual.epsilon)
end
Polynomial expressions:
p(x)=x^3+2x+1
dp(x)=3x^2+2
testdual(3.0,p,dp)    # output: ((f = 34.0, f_dual = 34.0), (df = 29.0, df_dual = 29.0))
Standard functions:
testdual(10,sin,cos)    # output: ((f = -0.544021, f_dual = -0.544021), (df = -0.839072, df_dual = -0.839072))
testdual(13,log,x->1/x)    # output: ((f = 2.56495, f_dual = 2.56495), (df = 0.0769231, df_dual = 0.0769231))
Function composition:
testdual(10,x->sin(x^2),x->2x*cos(x^2))    # output: ((f = -0.506366, f_dual = -0.506366), (df = 17.2464, df_dual = 17.2464))
Conclusion: if we apply dual numbers in the right way, we can do calculations with derivatives of complicated nonlinear expressions without the need to write code to calculate derivatives.
The ForwardDiff.jl package provides these facilities.
function testdual1(x,f,df)
    # compare the known derivative df with the result of ForwardDiff
    (f=f(x),df=df(x),df_dual=ForwardDiff.derivative(f,x))
end
testdual1(13,sin,cos)    # output: (f = 0.420167, df = 0.907447, df_dual = 0.907447)
Let us plot some complicated function:
g(x)=sin(exp(0.2*x)+cos(3x))
X=-5:0.01:5
let
    clf()
    grid()
    plot(X,g.(X),label="g(x)")
    plot(X,ForwardDiff.derivative.(g,X),label="g'(x)")
    legend()
    gcf().set_size_inches(5,3)
    gcf()
end
Solving nonlinear systems of equations
Let $A: D\subset \mathbb{R}^n \to \mathbb{R}^n$ be a nonlinear operator and consider the problem: find $u$ such that $A(u)=f$.
There is no analogue of Gaussian elimination for nonlinear systems, so we need to solve iteratively.
Fixpoint iteration scheme:
Assume $A(u)=M(u)u$, where for each $u$, $M(u)$ is an $n\times n$ matrix.
Then we can define the iteration scheme: choose an initial value $u_0$ and, for $i=0,1,\dots$, solve the linear system
$$M(u_i)u_{i+1}=f$$
Terminate if the residual $||M(u_i)u_i-f||$ or the update $||u_{i+1}-u_i||$ becomes smaller than a given tolerance.
General properties:
Large domain of convergence
Convergence may be slow
Smooth coefficients are not necessary
function fixpoint!(u,M,f,imax,tol)
    history=Float64[]
    for i=1:imax
        res=norm(M(u)*u-f)     # residual of the current iterate
        push!(history,res)
        if res<tol
            return u,history
        end
        u=M(u)\f               # solve M(u_i) u_{i+1} = f
    end
    error("No convergence after $imax iterations")
end
Example problem
function M(u)
    [ 0.1+(u[1]^2+u[2]^2)   -(u[1]^2+u[2]^2);
      -(u[1]^2+u[2]^2)     0.1+(u[1]^2+u[2]^2)]
end
F=[1,3]
fixpt_result,fixpt_history=fixpoint!([0,0],M,F,100,1.0e-12)
# output: ([19.9994, 20.0006],
#          [3.16228, 28284.3, 0.282829, 4.95196e-10, 1.81899e-12, 0.0])
contraction(h)=h[2:end]./h[1:end-1]    # ratios of successive history entries
function plothistory(history)
    clf()
    semilogy(history)
    grid()
    gcf()
end
contraction(fixpt_history)    # output: [8944.27, 9.9995e-6, 1.75087e-9, 0.00367327, 0.0]
plothistory(fixpt_history)
M(fixpt_result)*fixpt_result-F    # output: [0.0, 0.0]
Newton iteration scheme
The fixpoint iteration scheme assumes a particular structure of the nonlinear system. Can we do better?
Let $A(u)=f$ with a differentiable nonlinear operator $A: \mathbb{R}^n\to\mathbb{R}^n$
with the Jacobian matrix $A'(u)=\left(\frac{\partial A_k}{\partial u_l}(u)\right)_{k,l=1}^n$.
Then one calculates in the $i$-th iteration step:
$$u_{i+1}=u_i-(A'(u_i))^{-1}(A(u_i)-f)$$
One can split this as follows:
Calculate residual: $r_i=A(u_i)-f$
Solve linear system for update: $A'(u_i)h_i=r_i$
Update solution: $u_{i+1}=u_i-h_i$
General properties are:
Potentially small domain of convergence - one needs a good initial value
Possibly slow initial convergence
Quadratic convergence close to the solution
Linear and quadratic convergence
Let $e_i=||u_i-\hat u||$ denote the error of the $i$-th iterate, where $\hat u$ is the exact solution.
Linear convergence: observed e.g. for iteratively solved linear systems: an asymptotically constant error contraction rate $\rho<1$ such that $e_{i+1}\leq\rho e_i$.
Quadratic convergence: there is a constant $C>0$ such that $e_{i+1}\leq Ce_i^2$. As $e_i$ decreases, the contraction rate $\frac{e_{i+1}}{e_i}\leq Ce_i$ decreases as well.
In practice, we do not know $\hat u$, so instead we watch the norm of the residual $||A(u_i)-f||$ or of the update $||h_i||$.
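As a quick illustration (a sketch not contained in the original notebook), scalar Newton for $f(x)=x^2-2$ shows the shrinking contraction rates typical of quadratic convergence:
let
    x=2.0
    updates=Float64[]
    for i=1:6
        h=(x^2-2)/(2x)      # Newton update h_i = f(x_i)/f'(x_i)
        x-=h
        push!(updates,abs(h))
    end
    # the contraction rates |h_{i+1}|/|h_i| shrink with each step
    updates[2:end]./updates[1:end-1]
end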
Automatic differentiation for Newton's method
This is the situation where we could apply automatic differentiation for vector functions of vectors.
A(u)=M(u)*u
Create a result buffer for the function value and the Jacobian:
dresult=DiffResults.JacobianResult(ones(2))
# output: a MutableDiffResult holding (still uninitialized) value and Jacobian buffers
Calculate function and derivative at once:
ForwardDiff.jacobian!(dresult,A,[2.0,2.0])
# output: MutableDiffResult([0.2, 0.2], ([8.1 -8.0; -8.0 8.1],))
DiffResults.value(dresult)    # output: [0.2, 0.2]
DiffResults.jacobian(dresult)
# output: 2×2 Array{Float64,2}:
#   8.1  -8.0
#  -8.0   8.1
A Newton solver with automatic differentiation
function newton(A,b,u0; tol=1.0e-12, maxit=100)
    result=DiffResults.JacobianResult(u0)
    history=Float64[]
    u=copy(u0)
    it=1
    while it<maxit
        # calculate residual A(u)-b and its Jacobian in one sweep
        ForwardDiff.jacobian!(result,(v)->A(v)-b,u)
        res=DiffResults.value(result)
        jac=DiffResults.jacobian(result)
        h=jac\res          # solve for the Newton update
        u-=h
        nm=norm(h)
        push!(history,nm)
        if nm<tol
            return u,history
        end
        it=it+1
    end
    error("convergence failed")
end
newton_result,newton_history=newton(A,F,[0,0.1],tol=1.e-13)
# output: ([19.9994, 20.0006],
#          [28.8467, 5.58664, 0.493295, 0.000301159, 3.69765e-13, 1.28622e-11, 1.60767e-15])
contraction(newton_history)
# output: [0.193667, 0.0882991, 0.000610505, 1.22781e-9, 34.7848, 0.000124992]
plothistory(newton_history)
A(newton_result)-F    # output: [1.81899e-12, -1.81899e-12]
Let us take a more complicated example:
A2(x)=[ x[1]+x[1]^5+3*x[2]*x[3],
        0.1*x[2]+x[2]^5-3*x[1]-x[3],
        x[3]^5+x[1]*x[2]*x[3]]
F2=[0.1,0.1,0.1]
U02=[1.0,1.0,1.0]
res2,hist2=newton(A2,F2,U02)
# output: res2 = [-0.248731, 0.175566, 0.663915]
# hist2 (display truncated): starts erratically with
#   0.796625, 4.90091, 27.5487, 5.62444, 4.49756, 3.59886, ...
# and only eventually drops fast:
#   ..., 0.136995, 0.0144148, 0.00042281, 4.77938e-7, 6.61053e-13
A2(res2)-F2    # output: [0.0, -1.38778e-16, -1.38778e-17]
length(hist2),contraction(hist2)
# output: (63, [6.15208, 5.62115, 0.204163, 0.799647, 0.80018, 0.800945, ...,
#               0.105221, 0.0293317, 0.00113039, 1.38313e-6])
# 63 iterations; the contraction rates hover around 1 for a long time
# before quadratic convergence finally sets in
plothistory(hist2)
Here, we observe that we need lots of iteration steps and see a rather erratic behaviour of the update norms. Only after many steps, once the iterate is close enough to the solution, does quadratic convergence set in.
Damped Newton iteration
There are many ways to improve the convergence behaviour and/or to increase the convergence radius in such a case. The simplest ones are:
find a good estimate of the initial value
damping: do not use the full update, but damp it by some factor which we increase during the iteration process
linesearch: automatic detection of a damping factor
function dnewton(A,b,u0; tol=1.0e-12,maxit=100,damp=0.01,damp_growth=1)
    result=DiffResults.JacobianResult(u0)
    history=Float64[]
    u=copy(u0)
    it=1
    while it<maxit
        ForwardDiff.jacobian!(result,(v)->A(v)-b,u)
        res=DiffResults.value(result)
        jac=DiffResults.jacobian(result)
        h=jac\res
        u-=damp*h          # apply only the damped update
        nm=norm(h)
        push!(history,nm)
        if nm<tol
            return u,history
        end
        it=it+1
        damp=min(damp*damp_growth,1.0)   # let the damping factor grow towards 1
    end
    error("convergence failed")
end
res3,hist3=dnewton(A2,F2,U02,damp=0.5,damp_growth=1.1)
# output: res3 = [-0.248731, 0.175566, 0.663915]
# hist3 = [0.796625, 1.62137, 0.572359, 0.326425, 0.178438, 0.0805738,
#          0.0268168, 0.00467128, 0.000174043, 8.16876e-8, 2.00594e-14]
length(hist3),contraction(hist3)
# output: (11, [2.0353, 0.35301, 0.570316, 0.546644, 0.45155,
#               0.332823, 0.174192, 0.037258, 0.000469354, 2.45563e-7])
plothistory(hist3)
A2(res3)-F2    # output: [-2.77556e-17, -2.77556e-17, -1.38778e-17]
The example shows: damping indeed helps to improve the convergence behaviour. However, as long as we keep the damping parameter less than 1, we lose the quadratic convergence behavior.
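The linesearch idea mentioned above detects a usable damping factor automatically. Here is a minimal backtracking sketch (our own illustration, not from the original notebook): try the full update first and halve it as long as the residual norm does not decrease, so that full Newton steps - and quadratic convergence - are recovered near the solution.
function lnewton(A,b,u0; tol=1.0e-12, maxit=100)
    result=DiffResults.JacobianResult(u0)
    history=Float64[]
    u=copy(u0)
    for it=1:maxit
        ForwardDiff.jacobian!(result,(v)->A(v)-b,u)
        res=DiffResults.value(result)
        jac=DiffResults.jacobian(result)
        h=jac\res
        # backtracking: halve the damping factor while the residual would grow
        damp=1.0
        while damp>1/32 && norm(A(u-damp*h)-b)>=norm(res)
            damp/=2
        end
        u-=damp*h
        push!(history,norm(h))
        if norm(h)<tol
            return u,history
        end
    end
    error("convergence failed")
end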
Parameter embedding
Another option is the use of parameter embedding for parameter dependent problems.
Problem: solve $A(u,\lambda)=f$ for $\lambda=1$.
Assume that the problem for $\lambda=0$, $A(u,0)=f$, can be easily solved. Choose a step size $\delta$ and proceed as follows:
1. Solve $A(u_0,0)=f$
2. Set $\lambda=\delta$
3. Solve $A(u_\lambda,\lambda)=f$ with initial value $u_{\lambda-\delta}$
4. Set $\lambda=\lambda+\delta$
5. If $\lambda\leq 1$, repeat with 3.
If $\delta$ is small enough, we can ensure that $u_{\lambda-\delta}$ is a good initial value for $u_\lambda$.
There is the possibility to adapt $\delta$ depending on the Newton convergence.
Parameter embedding + damping + update based convergence control go a long way to solve even strongly nonlinear problems!
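For our example, an embedding loop could look as follows (a sketch: the convex-combination homotopy $(1-\lambda)u+\lambda A2(u)$ is our choice for illustration and not from the original notebook; at $\lambda=0$ the problem is linear and trivial, at $\lambda=1$ we recover $A2(u)=F2$):
let
    δ=0.1
    u=copy(F2)      # solution of the λ=0 problem: u=F2
    for λ=δ:δ:1.0
        # each previous solution serves as initial value for the next λ
        u,_=newton(v->(1-λ)*v+λ*A2(v),F2,u)
    end
    u               # final solution of A2(u)=F2
end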
As we will see later, a similar approach can be used for time dependent problems.