2.1. Symbolic Differentiation
Differentiation is at the heart of many problems in scientific computing, from optimizing functions to solving differential equations. Computing derivatives is a mechanical process, but it is very error prone when done manually. In this chapter we will explore how to compute derivatives of expressions and programs automatically.
Let's recall what a derivative is. For a function f : ℝ → ℝ we define its derivative as
$$f'(x) = \lim_{h → 0} \frac{f(x+h) - f(x)}{h} $$
The mathematical library Mathlib defines the derivative fderiv for a general function f : X → Y between vector spaces X and Y. The derivative of f at a point x : X in the direction dx : X is
$$\texttt{fderiv ℝ f x dx} = \lim_{h → 0} \frac{f(x+h\cdot dx) - f(x)}{h} $$
SciLean provides the tactic fun_trans which can compute derivatives. The name fun_trans stands for 'function transformation', and differentiation is just one example of a function transformation. We will talk about general function transformations later in the book.
One of the simplest examples of a derivative is the derivative of \(x^n\), which is equal to \(n x^{n-1}\). Let's compute it using the fun_trans tactic
#check (fderiv ℝ (fun (x : ℝ) => x^n) x 1) rewrite_by fun_trans
↑n * x ^ (n - 1) : ℝ
2.1.1. Notation
Writing fderiv ℝ (fun x => f x) is somewhat tedious, so SciLean makes our life easier by introducing a nice notation ∂ x, f x for differentiating (f x) w.r.t. x.
Before we explore this notation further, we have to mention that fderiv can also compute complex derivatives with @fderiv ℂ instead of @fderiv ℝ. However, most of the time we work exclusively with the real derivative, so we can inform Lean that the default choice of scalar field is the real numbers using the following command
set_default_scalar ℝ
Now Lean knows that we want the real derivative when we write ∂ x, f x.
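For example, the complex derivative can still be requested explicitly; a quick check of our own (not one of the book's examples):
#check fderiv ℂ (fun z : ℂ => z^2)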
Using this notation we can compute the above derivative again
#check (∂ (x : ℝ), x^n) rewrite_by fun_trans
fun x => ↑n * x ^ (n - 1) : ℝ → ℝ
Because we did not specify the point where we want to compute the derivative, we obtained a function in x. We can specify the point where we want to compute the derivative with ∂ (x:=x₀), ...
#check (∂ (x:=x₀), x^n) rewrite_by fun_trans
↑n * x₀ ^ (n - 1) : ℝ
Writing rewrite_by fun_trans every time we want to differentiate an expression gets a bit tedious. We can add an exclamation mark after ∂ to indicate that we want to run the fun_trans tactic to compute the derivative.
#check (∂! (x : ℝ), x^n)
fun x => ↑n * x ^ (n - 1) : ℝ → ℝ
We can also differentiate w.r.t. a vector valued variable (x : ℝ×ℝ)
#check ∂! (x : ℝ×ℝ), ‖x‖₂²
fun x => fun dx =>L[ℝ] 2 * ⟪dx, x⟫ : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ
For derivatives w.r.t. a vector valued variable we also have to specify the direction in which we differentiate. Therefore in the above we obtained the derivative as a function of the position x and the direction dx. Furthermore, the notation fun dx =>L[ℝ] ... indicates that the function is linear in dx, and similarly X →L[ℝ] Y stands for the space of continuous ℝ-linear functions from X to Y.
If we want to specify both the position and the direction in which we want to compute the derivative, we use the notation ∂ (x:=x₀;dx₀), f x
#check ∂! (u:=u₀;du₀), ‖u‖₂²
2 * ⟪du₀, u₀⟫ : ℝ
To summarize all the different variants: for a function of a scalar valued argument
#check ∂ f
#check ∂ x, f x
#check ∂ (x:=x₀), f x
For a function of a vector valued argument
#check ∂ f
#check ∂ u, g u
#check ∂ (u:=u₀), g u
#check ∂ (u:=u₀;du₀), g u
There is nothing stopping us from applying the derivative multiple times to compute higher order derivatives
#check (∂! (∂! (x:ℝ), x^2))
fun x => 2 : ℝ → ℝ
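For instance, a third derivative works the same way; in this sketch of ours (output omitted), the result for x^3 should reduce to the constant function 6:
#check (∂! (∂! (∂! (x:ℝ), x^3)))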
2.1.1.1. Exercises
- Express the first derivative of f : ℝ → ℝ → ℝ in the first and the second argument. Also express the derivative in both arguments at the same time.
Solution
variable (f : ℝ → ℝ → ℝ) (x₀ y₀ : ℝ)
-- first argument
#check ∂ (x:=x₀), (f x y₀)
#check ∂ (f · y₀) x₀
-- second argument
#check ∂ (y:=y₀), (f x₀ y)
#check ∂ (f x₀ ·) y₀
-- both arguments
#check ∂ ((x,y):=(x₀,y₀)), f x y
- For (g : ℝ×ℝ → ℝ), express the derivative of g (x,y) in x.
Solution
variable (g : ℝ×ℝ → ℝ) (x₀ y₀ : ℝ)
#check ∂ (xy:=(x₀,y₀);(1,0)), g xy
#check ∂ g (x₀,y₀) (1,0)
#check ∂ (x:=x₀), g (x,y₀)
- Express the second derivatives of f : ℝ → ℝ → ℝ in the first and the second argument.
Solution
variable (f : ℝ → ℝ → ℝ) (x₀ y₀ : ℝ)
#check ∂ (x':= x₀), ∂ (x'':=x'), (f x'' y₀)
#check ∂ (∂ (f · y₀)) x₀
#check ∂ (y':=y₀), ∂ (y'':=y'), (f x₀ y'')
#check ∂ (∂ (f x₀ ·)) y₀
- Let \(L(t,x)\) be a function of time and space and y(t) a function of time. Express \( \frac{d}{dt} L(t, y(t)) \) and \( \frac{\partial}{\partial t} L(t, y(t)) \) in Lean. What is the difference between these two expressions?
Solution
variable (L : ℝ → ℝ → ℝ) (y : ℝ → ℝ) (t : ℝ)
-- d/dt L
#check ∂ (t':=t), L t' (y t')
-- ∂/∂t L
#check ∂ (t':=t), L t' (y t)
Because SciLean's notation forces you to be a bit more explicit, there is no need to distinguish between \( \frac{d}{dt} \) and \( \frac{\partial}{\partial t} \). Lots of pain and suffering has been inflicted on generations of physics students because of the ambiguity of partial derivative notation.
- Express the one dimensional Euler-Lagrange equation in Lean
$$\frac{d}{dt} \frac{\partial L}{\partial \dot x}(x(t),\dot x(t)) - \frac{\partial L}{\partial x}(x(t), \dot x(t)) $$
Solution
variable (L : ℝ → ℝ → ℝ) (x : ℝ → ℝ) (t : ℝ)
#check
let v := ∂ x
∂ (t':=t), (∂ (v':=v t'), L (x t') v') - ∂ (x':=x t), L x' (v t)
2.1.2. Examples
Let's put computing derivatives to some practical use. We will demonstrate how to use SciLean's symbolic differentiation to solve a few common tasks in scientific computing and physics.
2.1.2.1. Newton's solver
Solving a non-linear equation \( f(x) = 0 \) is a very common problem in scientific computing. Often this can be done only approximately, and a popular tool for doing so is Newton's method. It is an iterative process that starts with an initial guess \(x_0\) which is incrementally improved by the following rule
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} $$
We can use Newton's method to compute the square root of \(y\) by choosing \( f(x) = x^2 - y\).
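For example, with \( f(x) = x^2 - 2 \) and initial guess \( x_0 = 1 \), one step gives \( x_1 = 1 - \frac{1 - 2}{2} = 1.5 \) and the next \( x_2 = 1.5 - \frac{1.5^2 - 2}{3} \approx 1.4167 \), already close to \( \sqrt{2} \).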
def mySqrt (steps : Nat) (y : Float) : Float := Id.run do
  let f := fun x => x^2 - y
  let mut x := 1.0
  for _ in [0:steps] do
    -- Newton update: x ← x - f x / f' x, with f' computed symbolically by ∂!
    x := x - f x / (∂! x':=x, f x')
  return x
#eval mySqrt 10 2.0
1.414214
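The same pattern should work for other equations. As a small variation of our own (a sketch, not from the original examples), we can compute the cube root by choosing f x = x^3 - y:
def myCbrt (steps : Nat) (y : Float) : Float := Id.run do
  let f := fun x => x^3 - y
  let mut x := 1.0
  for _ in [0:steps] do
    x := x - f x / (∂! x':=x, f x')
  return x

#eval myCbrt 10 8.0
The result should be close to 2.0.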
You might feel a bit uncomfortable here as we are differentiating a function defined on floating point numbers. If you think that this can't be formally correct, then you are right. We will discuss this issue in the later chapter "Working with Real Numbers".
2.1.2.1.1. Exercises
- Try to solve different equations, for example exp x = y to obtain log, x*exp x = y to obtain the Lambert W function, or some polynomial.
- Measure the relative error, \(\left| \frac{f(x_n)}{x_n} \right| \), and the absolute error, \( \left| f(x_n) \right| \), and use them as stopping criteria.
- A difficult exercise is to define a general newtonSolve function that takes an arbitrary function f : Float → Float and during elaboration synthesizes its derivative. Add multiple hints: 1. use the infer_var trick, 2. state explicitly what the arguments should look like.
Solution
set_default_scalar Float

def newtonSolve (steps : Nat) (x₀ : Float)
    (f : Float → Float) {f' : Float → Float}
    (hf : f' = (∂ f) := by unfold deriv; fun_trans; infer_var) :
    Float := Id.run do
  let mut x := x₀
  for _ in [0:steps] do
    x := x - f x / f' x
  return x
#eval newtonSolve 10 1.0 (fun x => x^2 - 2.0)
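As a usage sketch of our own, the same call should also work for other functions whose derivative fun_trans can synthesize during elaboration, e.g.
#eval newtonSolve 10 1.0 (fun x => x^3 - 2.0)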
2.1.2.2. Kinematics
We can also use SciLean's symbolic differentiation to prove some basic theorems from physics. For example, we can state Newton's second law
def NewtonSecondLaw (m : ℝ)
(x : ℝ → ℝ) (F : ℝ → ℝ) : Prop :=
∀ t, m * deriv (∂ x) t = F t
saying that a particle with mass m under the influence of a force F has trajectory x if the mass times the acceleration deriv (∂ x) t, i.e. the second derivative of the trajectory, is equal to the force F t.
We can show that under a constant force f a particle with mass m has the trajectory (fun t => 1/2 * f/m * t^2)
example (m f : ℝ) (hm : m ≠ 0) :
    NewtonSecondLaw m (fun t => f/(2*m) * t^2) (fun _ => f) := by
  -- ⊢ ∀ (t : ℝ), m * ∂ (∂ t, f / (2 * m) * t ^ 2) t = (fun x => f) t
  -- compute derivatives
  -- ⊢ m * (f / (2 * m) * 2) = f
  -- finish algebraic simplifications
  -- ⊢ m * (f * 2) = f * (2 * m)
  -- All goals completed! 🐙
2.1.2.2.1. Exercises
- Show that the trajectory x := fun t => (cos t, sin t) satisfies the differential equation ∂ x t = (- (x t).2, (x t).1)
Solution
open SciLean Scalar
def ode (x : ℝ → ℝ×ℝ) :=
∀ t, deriv x t = (- (x t).2, (x t).1)
example : ode (fun t => (cos t, sin t)) := by
  -- ⊢ ∀ (t : ℝ), (∂ (t:=t), (cos t, sin t)) 1 = (-((fun t => (cos t, sin t)) t).2, ((fun t => (cos t, sin t)) t).1)
  -- All goals completed! 🐙
- Show that the trajectory \(x(t) = \sin(\omega t) \) corresponds to the force \(f(x) = - k x \) with \(\omega = \sqrt{k/m} \)
Hint
After differentiation you will have to show that \(m \sqrt{\frac{k}{m}}^2 = k\). Unfortunately, Lean is not yet a very powerful computer algebra system, so you can finish the proof with
ring_nf -- m * (sqrt (k / m) * (sqrt (k / m) * ...)) ==> m * sqrt (k * m⁻¹) ^ 2 * ...
have h : m * sqrt (k * m⁻¹) ^ 2 = k := sorry_proof
simp[h]
where we call ring_nf to clean up the expression, then we just assume that m * sqrt (k * m⁻¹) ^ 2 is equal to k, and finally we finish the proof by running simp
Solution
open SciLean Scalar
example (m k : ℝ) :
    let ω := sqrt (k/m)
    let x := fun t => sin (ω*t)
    NewtonSecondLaw m x (fun t => - k*x t) := by
  -- ⊢ ∀ (t : ℝ), m * (∂ (x_1:=t), (∂ x x_1) 1) 1 = (fun t => -k * x t) t
  -- compute the derivatives
  -- ⊢ ∀ (t : ℝ), m * (sqrt (k / m) * (sqrt (k / m) * sin (sqrt (k / m) * t))) = k * sin (sqrt (k / m) * t)
  -- after ring_nf
  -- ⊢ ∀ (t : ℝ), m * sqrt (k * m⁻¹) ^ 2 * sin (sqrt (k * m⁻¹) * t) = k * sin (sqrt (k * m⁻¹) * t)
  -- with h : m * sqrt (k * m⁻¹) ^ 2 = k, simp[h] closes the goal
  -- All goals completed! 🐙
- Show that \(u(t,x) = \sin(x-t)\) is a solution to the wave equation
$$\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2} $$
Solution
open SciLean Scalar
def WaveEquation (u : ℝ → ℝ → ℝ) :=
  ∀ x t, (∂ (∂ (u · x)) t) = (∂ (∂ (u t ·)) x)

example :
    WaveEquation (fun t x => sin (x - t)) := by
  -- ⊢ ∀ (x t : ℝ),
  --     (∂ (x_1:=t), (∂ (x_2:=x_1), (fun t x => sin (x - t)) x_2 x) 1) 1 =
  --     (∂ (x:=x), (∂ (x:=x), (fun t x => sin (x - t)) t x) 1) 1
  -- All goals completed! 🐙
- Do the same for a solution of the heat equation.
2.1.3. Gradient
In many practical applications we need to compute the gradient instead of a directional derivative. For a function \(f : \mathbb{R}^n \rightarrow \mathbb{R} \) the gradient of \(f\) is the vector of all its partial derivatives
$$\nabla f = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right) $$
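For example, for \( f(x_1, x_2) = x_1^2 + 3 x_2 \) the gradient is \( \nabla f = (2 x_1, 3) \).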
A more general way of defining the gradient is through linear map transposition/adjoint. The derivative of a function f at a point x is a linear map from X to ℝ
#check (∂ f x)
∂ f x : X →L[ℝ] ℝ
To obtain the gradient we take the adjoint and evaluate it at one. This is exactly how the gradient is defined.
example : (∇ f x) = adjoint ℝ (∂ f x) 1 := by
  -- ⊢ ∇ f x = adjoint ℝ (⇑(∂ f x)) 1
  -- All goals completed! 🐙
A few examples of computing gradients:
#check ∇! x:=(0 : ℝ×ℝ), x.1
(1, 0) : ℝ × ℝ
#check ∇! (x:=x₀), ‖x‖₂²
2 • x₀ : X
#check ∇! (x:=x₀), ⟪x,y⟫
y : X
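One more small example of our own (output omitted, as the exact simplified form produced by fun_trans may differ): by linearity of the gradient this should give 2 • x₀ + y.
#check ∇! (x:=x₀), (‖x‖₂² + ⟪x,y⟫)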
2.1.3.1. Exercises
- Compute the gradient of x[0], ‖x‖₂², ⟪x,y⟫ for x y : Float^[3], and the gradient of A[0,1], ‖A‖₂², ⟪A,B⟫ for A B : Float^[2,2]. Also evaluate those results for some concrete values.
Solution
set_default_scalar Float
#eval ∇! (x:=⊞[1.0,2.0,3.0]), x[0]
#eval ∇! (x:=⊞[1.0,2.0,3.0]), ‖x‖₂²
#eval ∇! (x:=⊞[1.0,2.0,3.0]), ⟪x, ⊞[0.0,1.0,0.0]⟫
def matrix1 := ⊞[1.0,2.0;3.0,4.0]
#eval ∇! (A:=matrix1), A[0,1]
#eval ∇! (A:=matrix1), ‖A‖₂²
#eval ∇! (A:=matrix1), ⟪A, ⊞[0.0,1.0;0.0,0.0]⟫
- Previously we computed \(\sqrt{y}\) using Newton's method in mySqrt. Similarly, we can compute sqrt y using gradient descent by minimizing the objective function f := fun x => (x^2 - y)^2.
- Linear regression via gradient descent. Find the matrix \( A \in \mathbb{R}^{2\times 2} \) that for given data \( x_i, y_i \in \mathbb{R}^2 \) minimizes
$$A = \text{argmin}_B \sum_i \| B x_i - y_i \|^2 $$
Solution
set_default_scalar Float
partial def linreg {n : ℕ} (x y : Float^[2]^[n]) :
    Float^[2,2] := Id.run do
  let loss := fun (A : Float^[2,2]) =>
    ∑ i, ‖A * x[i] - y[i]‖₂²
  let rate := 1e-1
  let mut A : Float^[2,2] := 0
  let mut err := 1.0
  while err > 1e-6 do
    -- symbolically compute the gradient of the loss w.r.t. the matrix A;
    -- the resulting code is expressed via MatrixType.gemv / MatrixType.outerprodAdd
    let ΔA := (rate • ∇ A':=A, loss A') rewrite_by fun_trans
    err := ‖ΔA‖₂
    A := A - ΔA
  return A
- Write down the Euler-Lagrange equation over an abstract vector space X and show that for the Lagrangian L x v := 1/2 * m * ‖v‖₂² - φ x the Euler-Lagrange equation is m * ∂ (∂ x) t = - ∇ φ x. Either define the Lagrangian over ℝ×ℝ, L : ℝ×ℝ → ℝ×ℝ → ℝ, or introduce an abstract vector space X using this variable command
variable {X} [NormedAddCommGroup X] [AdjointSpace ℝ X] [CompleteSpace X]
These typeclasses will be explained in the last section, "Abstract Vector Spaces".
Solution
set_default_scalar ℝ
noncomputable
def EulerLagrange (L : X → X → ℝ) (x : ℝ → X) (t : ℝ) :=
let v := ∂ x
∂ (t':=t), (∇ (v':=v t'), L (x t') v')
-
∇ (x':=x t), L x' (v t)
noncomputable
def NewtonsLaw (m : ℝ) (φ : X → ℝ) (x : ℝ → X) (t : ℝ) :=
m • (∂ (∂ x) t) + (∇ φ (x t))
example
(x : ℝ → X) (hx : ContDiff ℝ ⊤ x)
(φ : X → ℝ) (hφ : Differentiable ℝ φ) :
EulerLagrange (fun x v => m/2 * ‖v‖₂² - φ x) x t
=
NewtonsLaw m φ x t := by
  -- ⊢ EulerLagrange (fun x v => m / 2 * ‖v‖₂² - φ x) x t = NewtonsLaw m φ x t
  -- after unfolding the definitions and computing the gradients, the goal reduces to
  -- ⊢ (∂ (t':=t), m • (∂ x t') 1) 1 = m • (∂ (x_1:=t), (∂ x x_1) 1) 1
  -- All goals completed! 🐙
2.1.4. Derivative Rules
A common issue when fun_trans is not doing what we expect is that there is a missing differentiation theorem.
For example, if we define a function
def foo (x : ℝ) := x^2 + x
then nothing happens when we try to differentiate it
#check ∂! x, foo x
fun x => (∂ (x:=x), foo x) 1 : ℝ → ℝ
Turning on the fun_trans trace reveals useful information
set_option trace.Meta.Tactic.fun_trans true in
#check ∂! x, foo x
[Meta.Tactic.fun_trans] [❌] ∂ (x:=x), foo x
[Meta.Tactic.fun_trans] candidate theorems for foo: []
[Meta.Tactic.fun_trans] candidate local theorems for foo: []
[Meta.Tactic.fun_trans] candidate fvar theorems: [isContinuousLinearMap_fderiv]
[Meta.Tactic.fun_trans] [❌] applying: isContinuousLinearMap_fderiv
[Meta.Tactic.fun_trans] isContinuousLinearMap_fderiv, failed to discharge hypotheses
  SciLean.IsContinuousLinearMap ℝ fun x => foo x
The ❌ on the first line signifies that fun_trans failed to make progress on ∂ (x:=x), foo x. The next two lines
[Meta.Tactic.fun_trans] candidate theorems for foo: []
[Meta.Tactic.fun_trans] candidate local theorems for foo: []
state that there are no derivative theorems for foo. The next line
[Meta.Tactic.fun_trans] candidate fvar theorems: [isContinuousLinearMap_fderiv]
states that there is a potentially applicable theorem isContinuousLinearMap_fderiv which can differentiate linear functions. However, the next few lines report that applying this theorem failed, as fun_trans can't prove that foo is a (continuous) linear map.
To remedy this problem we can define the derivative of foo
def foo_deriv (x : ℝ) := 2*x + 1
and add a theorem that the derivative of foo is equal to foo_deriv
open SciLean
@[fun_trans]
theorem foo_deriv_rule :
fderiv ℝ foo
=
fun x => fun dx =>L[ℝ] dx • foo_deriv x := by
  -- ⊢ ∂ foo = fun x => fun dx =>L[ℝ] dx • foo_deriv x
  -- ⊢ ∂ x, (x ^ 2 + x) = fun x => fun dx =>L[ℝ] dx • (2 * x + 1)
  -- ⊢ (∂ (x:=x), (x ^ 2 + x)) 1 = (fun dx =>L[ℝ] dx • (2 * x + 1)) 1
  -- All goals completed! 🐙
Because foo_deriv_rule is marked with the fun_trans attribute, it will now be used when we try to differentiate foo
#check ∂! x, foo x
fun x => foo_deriv x : ℝ → ℝ
Unfortunately there is one more issue: fun_trans will do nothing when we try to compose foo with itself
#check ∂! x, foo (foo x)
fun x => (∂ (x:=x), foo (foo x)) 1 : ℝ → ℝ
set_option trace.Meta.Tactic.fun_trans true in
#check ∂! x, foo (foo x)
...
[Meta.Tactic.fun_trans] trying comp theorem SciLean.fderiv.comp_rule
[Meta.Tactic.fun_trans] [❌] applying: SciLean.fderiv.comp_rule
[Meta.Tactic.fun_trans] SciLean.fderiv.comp_rule, failed to discharge hypotheses
  Differentiable ℝ fun x0 => foo x0
...
The trace reveals that fun_trans tries to apply the composition (chain) rule SciLean.fderiv.comp_rule but fails, as it can't prove Differentiable ℝ fun x0 => foo x0. We need another theorem stating that foo is a differentiable function. Mathlib has a tactic fun_prop that can prove differentiability and many other function properties like linearity, continuity, measurability etc., and fun_trans uses this tactic to ensure it can apply the chain rule.
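To get a feel for fun_prop on its own, here is a small standalone example of ours:
example : Differentiable ℝ (fun x : ℝ => x^2 + x) := by fun_prop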
We need to add a fun_prop theorem for foo
@[fun_prop]
theorem foo_differentiable : Differentiable ℝ foo := by
  -- ⊢ Differentiable ℝ fun x => x ^ 2 + x
  -- All goals completed! 🐙
Now fun_trans knows that foo is a differentiable function and can safely apply the chain rule
#check (∂! x, foo (foo x))
fun x => foo_deriv x * foo_deriv (foo x) : ℝ → ℝ
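With the fun_prop theorem registered, fun_prop should now also be able to prove differentiability of compositions involving foo automatically; a small sketch of ours:
example : Differentiable ℝ (fun x : ℝ => foo (foo x) + x) := by fun_prop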
Writing these theorems by hand is quite tedious, so there are macros def_fun_prop and def_fun_trans to help you with writing them
def_fun_prop : Differentiable ℝ foo by
  -- ⊢ Differentiable ℝ fun x => x ^ 2 + x
  -- All goals completed! 🐙

def_fun_trans : fderiv ℝ foo by
  -- | ∂ x, (x ^ 2 + x)
  -- | fun x => fun x_1 =>L[ℝ] 2 * x_1 * x + x_1
These commands generate the following theorems and definition
#check foo.arg_x.Differentiable_rule
#print foo.arg_x.fderiv
#check foo.arg_x.fderiv_rule
Writing appropriate theorems for fun_trans and fun_prop is quite an involved problem and will be discussed in a future chapter.
2.1.5. Differentiating Division, Log, Sqrt, ...
So far we have worked with functions that are smooth. However, functions like division, log, sqrt or ‖·‖₂ are not differentiable everywhere. We have to be a bit careful with these functions because SciLean tries to perform computations that are, at least in principle, fully formalizable. Let's try to differentiate division
#check ∂! (x:ℝ), 1/x
fun x => (∂ (x:=x), x⁻¹) 1 : ℝ → ℝ
We did not get the expected -x⁻². When differentiation, or any tactic, is not doing what we expect we can turn on tracing. Let's try again with tracing on
set_option trace.Meta.Tactic.fun_trans true in
#check ∂! (x:ℝ), 1/x
The beginning of the trace says that fun_trans tried to apply the theorem HDiv.hDiv.arg_a0a1.fderiv_rule_at, however it failed to discharge x ≠ 0
[Meta.Tactic.fun_trans] [❌] ∂ (x:=x), 1 / x
[Meta.Tactic.fun_trans] candidate theorems for HDiv.hDiv: [HDiv.hDiv.arg_a0a1.fderiv_rule_at]
[Meta.Tactic.fun_trans] [❌] applying: HDiv.hDiv.arg_a0a1.fderiv_rule_at
[Meta.Tactic.fun_trans] [❌] discharging: x ≠ 0
[Meta.Tactic.fun_trans] HDiv.hDiv.arg_a0a1.fderiv_rule_at, failed to discharge hypotheses
  x ≠ 0
This makes sense, as the division 1/x is well defined and differentiable only away from zero. Therefore we have to differentiate it at a concrete point that is not equal to zero.
variable (x₀ : ℝ) (hx₀ : x₀ ≠ 0)
set_option trace.Meta.Tactic.fun_trans true in
#check (∂ (x:=x₀), 1/x) rewrite_by fun_trans (disch:=assumption)
-1 / x₀ ^ 2 : ℝ
We introduced a point x₀ and an assumption hx₀ that it is not equal to zero. By default fun_trans does not see this assumption, so we have to provide a discharger. A discharger is any tactic that tries to satisfy (discharge) the assumptions of the theorems fun_trans is using. In this simple case the assumption tactic is enough, as it just looks through the local context and tries to directly apply any existing assumption.
Using assumption is not enough for a more complicated expression
variable (x₀ : ℝ) (hx₀ : x₀ ≠ 0)
#check (∂ (x:=x₀), 1/x^2) rewrite_by fun_trans (disch:=assumption)
(∂ (x:=x₀), (x ^ 2)⁻¹) 1 : ℝ
and the trace shows
[Meta.Tactic.fun_trans] HDiv.hDiv.arg_a0a1.fderiv_rule_at, failed to discharge hypotheses x₀ ^ 2 ≠ 0
We need a tactic that is capable of inferring (x₀^2 ≠ 0) from (x₀ ≠ 0). A very general and powerful tactic is aesop
variable (x₀ : ℝ) (hx₀ : x₀ ≠ 0)
#check (∂ (x:=x₀), 1/x^2) rewrite_by fun_trans (disch:=aesop)
-(2 * x₀) / (x₀ ^ 2) ^ 2 : ℝ
In multiple dimensions we often want to differentiate the norm
open SciLean Scalar
variable (x₀ : ℝ×ℝ) (hx₀ : x₀ ≠ 0)
#check (∇ (x:=x₀), ‖x‖₂) rewrite_by fun_trans (disch:=assumption)
The fact that the norm is not differentiable at zero can cause issues. A common practice when writing numerical algorithms is to regularize the norm using a small positive ε, i.e. to replace ‖x‖₂ with sqrt (ε + ‖x‖₂²), which is smooth everywhere and converges to ‖x‖₂ as ε → 0.
open SciLean Scalar
variable (ε : ℝ) (hε : 0 < ε)
#check (∂ (x:ℝ×ℝ), sqrt (ε + ‖x‖₂²)) rewrite_by
  fun_trans (disch:=aesop) -- the discharger has to prove the side condition ε + ‖x‖₂² ≠ 0
fun w => fun x =>L[ℝ] 2 * ⟪x, w⟫ / (2 * sqrt (ε + ‖w‖₂²)) : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ
2.1.5.1. Exercises
- Gradient of energy for an n-body system, Newton's potential, Lennard-Jones potential
  - do it for two particles
  - do it for n particles
- Signed distance function
  - compute the gradient of the sphere SDF
  - compute the mean and Gaussian curvature of the sphere
  - pick an SDF from https://iquilezles.org/articles/distfunctions/
    - most of them involve functions that are not differentiable everywhere
    - compute the derivative in unsafe mode
    - figure out the minimal condition under which it is differentiable
2.1.6. Abstract Vector Spaces
In calculus we usually consider only functions \((f : \mathbb{R}^n \rightarrow \mathbb{R}^m) \). The issue is that on computers the type \( \mathbb{R}^n \) can have multiple different realizations. For example, \( \mathbb{R}^3 \) can be modeled by Float×Float×Float, Float^[3] or Float×Float^[2]. They are all equivalent, but in code we have to explicitly convert between these types. For this reason it is better to work with abstract vector spaces instead of \( \mathbb{R}^n \).
Fortunately, Mathlib's derivative fderiv is already defined for a function (f : X → Y) between two abstract vector spaces X and Y over a field 𝕜. Mathlib's way of introducing an abstract vector space is rather involved, so we need to spend some time talking about it. This presentation will be rather simplified; for the interested reader we provide references at the end of this section that go over Mathlib's algebraic structures in more detail.
A vector space X is a set with operations +,-,•,0 such that
∀ (x y z : X), x + (y + z) = (x + y) + z
∀ (x y : X), x + y = y + x
∀ (x : X), x + 0 = x
∀ (x : X), x + (-x) = 0
∀ (r s : 𝕜) (x : X), r • (s • x) = (r * s) • x
∀ (x : X), 1 • x = x
∀ (r : 𝕜) (x y : X), r • (x + y) = r • x + r • y
∀ (r s : 𝕜) (x : X), (r + s) • x = r • x + s • x
In Mathlib, the axioms talking about addition and negation are captured by the type class AddCommGroup, and the axioms talking about scalar multiplication are captured by the type class Module. Therefore, if we want to introduce a new abstract vector space over a field 𝕜 we have to introduce these variables
variable
{𝕜 : Type} [Field 𝕜]
{X : Type} [AddCommGroup X] [Module 𝕜 X]
example (s r : 𝕜) (x y : X) :
(s + r) • (x + y) = s • x + r • x + s • y + r • y := by
  -- All goals completed! 🐙
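As a further illustration of our own (assuming Mathlib's module tactic is available in this setup), identities that follow from these axioms can typically be discharged automatically:
example (r : 𝕜) (x y : X) : r • (x - y) + r • y = r • x := by module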
When we want to differentiate a function (f : X → Y) between two vector spaces, we also need X and Y to be equipped with a norm. For this purpose there is NormedAddCommGroup, which equips AddCommGroup with a norm and guarantees that it is compatible with addition and negation, and NormedSpace, which equips Module with a norm and guarantees that it is compatible with scalar multiplication. Furthermore, we have to restrict to a field 𝕜 that is either the real numbers ℝ or the complex numbers ℂ; the type class RCLike states exactly that. Therefore, when we work with the derivative in a general setting the code usually looks like this
variable
{𝕜 : Type} [RCLike 𝕜]
{X : Type} [NormedAddCommGroup X] [NormedSpace 𝕜 X]
{Y : Type} [NormedAddCommGroup Y] [NormedSpace 𝕜 Y]
set_default_scalar 𝕜
example (f g : X → Y) (hf : Differentiable 𝕜 f) (hg : Differentiable 𝕜 g) :
(∂ x, (f x + g x)) = (∂ f) + (∂ g) := by
  -- ⊢ (∂ (x:=x), (f x + g x)) dx = ((∂ f + ∂ g) x) dx
  -- All goals completed! 🐙
When working with gradients we also need an inner product, as the adjoint is defined through the inner product. Unfortunately, here we diverge from Mathlib a little bit. Mathlib defines InnerProductSpace, which equips NormedSpace with an inner product. Understandably, InnerProductSpace requires that ⟪x,x⟫ = ‖x‖², however Mathlib made the unfortunate decision of defining the norm on product spaces as ‖(x,y)‖ = max ‖x‖ ‖y‖, which is incompatible with the inner product structure. Therefore a type like ℝ×ℝ can't be equipped with InnerProductSpace. Because of these issues, SciLean introduces AdjointSpace, which is almost identical to InnerProductSpace but only requires that the norm induced by the inner product is equivalent to the existing norm, i.e. ∃ (c d : ℝ⁺), ∀ x, c * ⟪x,x⟫ ≤ ‖x‖^2 ≤ d * ⟪x,x⟫. On AdjointSpace SciLean introduces the L₂-norm ‖x‖₂ := sqrt ⟪x,x⟫, which you have already seen and which is the norm you most likely want to use instead of the default norm ‖x‖. Therefore, when we work with gradients in a general setting the code usually looks like this
open SciLean
variable
{𝕜 : Type} [RCLike 𝕜]
{X : Type} [NormedAddCommGroup X] [AdjointSpace 𝕜 X] [CompleteSpace X]
{Y : Type} [NormedAddCommGroup Y] [AdjointSpace 𝕜 Y] [CompleteSpace Y]
set_default_scalar 𝕜
example (f g : X → Y) (hf : Differentiable 𝕜 f) (hg : Differentiable 𝕜 g) :
(∇ x, (f x + g x)) = (∇ f) + (∇ g) := by
  -- ⊢ ∇ (x:=x;x✝), (f x + g x) = (∇ f + ∇ g) x x✝
  -- All goals completed! 🐙
For the interested reader we recommend the chapter Hierarchies from Mathematics in Lean, which explains how Mathlib approaches algebraic hierarchies like monoids, groups or modules. After that we recommend reading Differential Calculus in Normed Spaces, which explains how NormedSpace is structured.
is structured.