2.1. Symbolic Differentiation

Differentiation is at the heart of many problems in scientific computing, from optimizing functions to solving differential equations. Computing derivatives is a mechanical process but very error prone when done manually. In this chapter, we will explore how we can compute derivatives of expressions and programs automatically.

Let's recall what derivative is, for a function f : ℝ → ℝ we define it derivative as

$$f'(x) = \lim_{h → 0} \frac{f(x+h) - f(x)}{h} $$

The mathematical library Mathlib defines derivative fderiv for a general function f : X → Y from some vector spaces X and Y. The derivative of f at point x : X in direction dx : X is

$$\texttt{fderiv ℝ f x dx} = \lim_{h → 0} \frac{f(x+h\cdot dx) - f(x)}{h} $$

SciLean provides tactic fun_trans which can compute derivatives. The name fun_trans stands for 'function transformation' and differentiations is just one example of function transformation. We will talk about general function transformations later in the book.

One of the simplest examples of a derivative is the derivative of \(x^n\) which is equal to \(n x^{n-1}\). Let's compute it using fun_trans tactic

↑n * x ^ (n - 1) : ℝ#check (fderiv (fun (x : ) => x^n) x 1) rewrite_by n:x:| n * x ^ (n - 1)
↑n * x ^ (n - 1) : ℝ

2.1.1. Notation

Writing fderiv ℝ (fun x => f x) is somewhat tedious so SciLean makes our life easier by introducing a nice notation ∂ x, f x for differentiating (f x) w.r.t. x.

Before we explore this notation further we have to mention that fderiv can also compute complex derivatives with @fderiv instead of @fderiv . However, most of the time we work exclusively with real derivative so we can inform Lean that the default choce of the scalar field are real numbers using the following command

set_default_scalar

Now Lean knows that we want real derivative when we write ∂ x, f x.

Using this notation we can compute again the above derivative

fun x => ↑n * x ^ (n - 1) : ℝ → ℝ#check ( (x : ), x^n) rewrite_by n:x₀:u₀: × du₀: × | fun x => ↑n * x ^ (n - 1)
fun x => ↑n * x ^ (n - 1) : ℝ → ℝ

Because we did not specify the point where we want to compute the derivative we obtained a function in x. We can specify the point where we want to compute the derivative with ∂ (x:=x₀), ...

↑n * x₀ ^ (n - 1) : ℝ#check ( (x:=x₀), x^n) rewrite_by n:x₀:u₀: × du₀: × | n * x₀ ^ (n - 1)
↑n * x₀ ^ (n - 1) : ℝ

Writing rewrite_by fun_trans every time we want to diferentiate an expression gets a bit tedious. We can add an exclamation mark after to indicate that we want to run fun_trans tactic to compute the derivative.

fun x => ↑n * x ^ (n - 1) : ℝ → ℝ#check (∂! (x : ), x^n)
fun x => ↑n * x ^ (n - 1) : ℝ → ℝ

We can differentiate w.r.t to a vector valued variable (x : ℝ×ℝ)

fun x => fun dx =>L[ℝ] 2 * ⟪dx, x⟫ : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ#check ∂! (x : ×), x‖₂²
fun x => fun dx =>L[ℝ] 2 * ⟪dx, x⟫ : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ

For derivatives w.r.t. a vector valued variable we have to also specify the direction in which we differentiate. Therefore in the above we obtained derivative as a function of the position x and direction dx. Furthermore, the notation fun dx =>L[ℝ] ... indicates that the function is linear function in dx and similarly X →L[ℝ] Y stands for the space of ℝ-linear functions from X to Y.

If we want to specify the position and the direction in which we want to compute the derivatives we use the notation ∂ (x:=x₀;dx₀), f x

2 * ⟪du₀, u₀⟫ : ℝ#check ∂! (u:=u₀;du₀), u‖₂²
2 * ⟪du₀, u₀⟫ : ℝ

To summarize all the different variants. For function of a scalar valued argument

∂ f : ℝ → ℝ#check f ∂ x, f x : ℝ → ℝ#check x, f x ∂ (x:=x₀), f x : ℝ#check (x:=x₀), f x

For function of a vector valued argument

∂ f : ℝ → ℝ#check f ∂ u, g u : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ#check u, g u ∂ (u:=u₀), g u : ℝ × ℝ →L[ℝ] ℝ#check (u:=u₀), g u (∂ (u:=u₀), g u) du₀ : ℝ#check (u:=u₀;du₀), g u

There is nothing stopping us from applying derivative multiple times to compute higher order derivatives

fun x => 2 : ℝ → ℝ#check (∂! (∂! (x:), x^2))
fun x => 2 : ℝ → ℝ

2.1.1.1. Exercises

  1. Express first derivative of f : ℝ → ℝ → ℝ in the first and the second argument. Also express derivative in both arguments at the same time.

Solutionvariable (f : ) (x₀ y₀ : ) -- first argument ∂ (x:=x₀), f x y₀ : ℝ#check (x:=x₀), (f x y₀) ∂ (x:=x₀), f x y₀ : ℝ#check (f · y₀) x₀ -- second agument ∂ (y:=y₀), f x₀ y : ℝ#check (y:=y₀), (f x₀ y) ∂ (x:=y₀), f x₀ x : ℝ#check (f x₀ ·) y₀ -- both arguments ∂ (x:=(x₀, y₀)), match x with | (x, y) => f x y : ℝ × ℝ →L[ℝ] ℝ#check ((x,y):=(x₀,y₀)), f x y
  1. For (g : ℝ×ℝ → ℝ), express derivative of g (x,y) in x.

Solutionvariable (g : × ) (x₀ y₀ : ) (∂ (xy:=(x₀, y₀)), g xy) (1, 0) : ℝ#check (xy:=(x₀,y₀);(1,0)), g xy (∂ g (x₀, y₀)) (1, 0) : ℝ#check g (x₀,y₀) (1,0) ∂ (x:=x₀), g (x, y₀) : ℝ#check (x:=x₀), g (x,y₀)
  1. Express second derivatives of f : ℝ → ℝ → ℝ in the first and the second argument.

Solutionvariable (f : ) (x₀ y₀ : ) ∂ (x':=x₀), ∂ (x'':=x'), f x'' y₀ : ℝ#check (x':= x₀), (x'':=x'), (f x'' y₀) ∂ (∂ x, f x y₀) x₀ : ℝ#check ( (f · y₀)) x₀ ∂ (y':=y₀), ∂ (y'':=y'), f x₀ y'' : ℝ#check (y':=y₀), (y'':=y'), (f x₀ y'') ∂ (∂ x, f x₀ x) y₀ : ℝ#check ( (f x₀ ·)) y₀
  1. Let \(L(t,x)\) be a function of time and space and y(t) a function of time. Express \( \frac{d}{dt} L(t, y(t)) \) and \( \frac{\partial}{\partial t} L(t, y(t)) \) in Lean. What is the difference between these two expressions?

Solutionvariable (L : ) (y : ) (t : ) -- d/dt L ∂ (t':=t), L t' (y t') : ℝ#check (t':=t), L t' (y t') -- ∂/∂t L ∂ (t':=t), L t' (y t) : ℝ#check (t':=t), L t' (y t)

Because SciLean's notation forces you to be a bit more explicit, there is no need to distinguish between \( \frac{d}{dt} \) and \( \frac{\partial}{\partial t} \). Lots of pain and suffering has been infliced on generations of physics students because of the ambiguity of partial derivative notation.

  1. Express one dimensional Euler-Lagrange equation in Lean

$$\frac{d}{dt} \frac{\partial L}{\partial \dot x}(x(t),\dot x(t)) - \frac{\partial L}{\partial x}(x(t), \dot x(t)) $$

Solutionvariable (L : ) (x : ) (t : ) let v := ∂ x; ∂ (t':=t), ∂ (v':=v t'), L (x t') v' - ∂ (x':=x t), L x' (v t) : ℝ#check let v := x (t':=t), ( (v':=v t'), L (x t') v') - (x':=x t), L x' (v t)

2.1.2. Examples

Let's put computing derivatives to some practical use. We will demonstrate how to use SciLean symbolic differentiations to solve few common tasks in scientific computing and physics.

2.1.2.1. Newton's solver

Solving non-linear equation \( f(x) = 0 \) is a very common problem in scientific computing. Often this can be done only approximately and a popular tool to do so is Newton's method. It is an interative process that starts with an initial guess \(x₀\) which is incrementally improved by the following rule

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} $$

We can use Newton's method to compute sqruare root of \(y\) by choosing \( f(x) = x^2 - y\).

def mySqrt (steps : Nat) (y : Float) : Float := Id.run do let f := fun x => x^2 - y let mut x := 1.0 for _ in [0:steps] do x := x - f x / (∂! x':=x, f x') return x 1.414214#eval mySqrt 10 2.0
1.414214

You might feel a bit unconfortable here are we are differentiating a function defined on floating point numbers. If you think that can't be formally correct then you are right. We will discuss this issue in a later chapter "Working with Real Numbers".

2.1.2.1.1. Exercises

  1. try to solve different equations, for example exp x = y to obtain log, x*exp x = y to obtain Lambert W function or some polynomial.

  2. measure relative,\(\left| \frac{f(x_n)}{x_n} \right| \), and absolute error \( \left| f(x_n) \right| \) and use them for stopping criteria.

  3. A difficult exercise is to define a general newtonSolve function that takes an arbitrary function f : Float → Float and uring elaboration synthesizes its derivative. Add multiple hints, 1. use infer_var trick, 2. state explicitly how the arguments should look like

Solutionset_default_scalar Float def newtonSolve (steps : Nat) (x₀ : Float) (f : Float Float) {f' : Float Float} (hf : f' = ( f) := by unfold deriv; fun_trans; infer_var) : Float := Id.run do let mut x := x₀ for _ in [0:steps] do x := x - f x / f' x return x 1.414214#eval newtonSolve 10 1.0 (fun x => x^2 - 2.0)

2.1.2.2. Kinematics

We can also use SciLean's symbolic differentiation to prove some basic theorems from physics. For example we can state the second Newton's law

def NewtonSecondLaw (m : ) (x : ) (F : ) : Prop := t, m * deriv ( x) t = F t

saying that for a particle with mass m under the influence of force F has trajectory x if the mass times the acceleration deriv (∂ x) t, i.e. the second derivative of trajectory, is equal to the force F t.

We can show that under constant force f a particle with mass m has trajectory (fun t => 1/2 * f/m * t^2)

example (m f : ) (hm : m 0) : NewtonSecondLaw m (fun t => f/(2*m) * t^2) (fun _ => f) := m:f:hm:m0NewtonSecondLaw m (fun t => f / (2 * m) * t ^ 2) fun x => f m:f:hm:m0∀ (t : ), m * ∂ (∂ t, f / (2 * m) * t ^ 2) t = (fun x => f) t -- compute derivatives m:f:hm:m0m * (f / (2 * m) * 2) = f -- finish algebraic simplifications m:f:hm:m0m * (f * 2) = f * (2 * m); All goals completed! 🐙

2.1.2.2.1. Exercises

  1. show that trajectory x := fun t => (cos t, sin t) satisfies differential equation ∂ x t = (- (x t).2, (x t).1)

Solutionopen SciLean Scalar def ode (x : ×) := t, deriv x t = (- (x t).2, (x t).1) example : ode (fun t => (cos t, sin t)) := ode fun t => (cos t, sin t) ∀ (t : ), (∂ (t:=t), (cos t, sin t)) 1 = (-((fun t => (cos t, sin t)) t).2, ((fun t => (cos t, sin t)) t).1); All goals completed! 🐙
  1. Show that trajectory \(x(t) = \sin(\omega t) \) corresponds to the force \(f(x) = - k x \) with \(\omega = \sqrt{(k/m)} \)

Hint

After differentiation you will have to show that \(m \sqrt{\frac{k}{m}}^2 = k\). Unfortunatelly Lean is not yet very powerful computer algebra system. So you can finish the proof with

  ring_nf --  m * (sqrt (k / m) * (sqrt (k / m) ==> m * sqrt (k * m⁻¹) ^ 2
  have h : m * sqrt (k * m⁻¹) ^ 2 = k := sorry_proof
  simp[h]

where we call ring_nf to clean up the expression, then we just assume that m * sqrt (k * m⁻¹) ^ 2 is equal to k and finally we can finish the proof by running simp

Solutionopen SciLean Scalar example (m k : ) : let ω := sqrt (k/m) let x := fun t => sin (ω*t) NewtonSecondLaw m x (fun t => - k*x t) := m:k:let ω := sqrt (k / m); let x := fun t => sin (ω * t); NewtonSecondLaw m x fun t => -k * x t m:k:let ω := sqrt (k / m); let x := fun t => sin (ω * t); ∀ (t : ), m * (∂ (x_1:=t), (∂ x x_1) 1) 1 = (fun t => -k * x t) t m:k:∀ (t : ), m * (sqrt (k / m) * (sqrt (k / m) * sin (sqrt (k / m) * t))) = k * sin (sqrt (k / m) * t) m:k:∀ (t : ), m * sqrt (k * m⁻¹) ^ 2 * sin (sqrt (k * m⁻¹) * t) = k * sin (sqrt (k * m⁻¹) * t) m:k:h:m * sqrt (k * m⁻¹) ^ 2 = k∀ (t : ), m * sqrt (k * m⁻¹) ^ 2 * sin (sqrt (k * m⁻¹) * t) = k * sin (sqrt (k * m⁻¹) * t) All goals completed! 🐙
  1. Show that \(u(t,x) = sin(x-t)\) is a solution to the wave equation

$$\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2} $$

Solutionopen SciLean Scalar def WaveEquation (u : ) := x t, ( ( (u · x)) t) = ( ( (u t ·)) x) example : WaveEquation (fun t x => sin (x - t)) := WaveEquation fun t x => sin (x - t) ∀ (x t : ), (∂ (x_1:=t), (∂ (x_2:=x_1), (fun t x => sin (x - t)) x_2 x) 1) 1 = (∂ (x:=x), (∂ (x:=x), (fun t x => sin (x - t)) t x) 1) 1 All goals completed! 🐙
  1. solution to heat equation

2.1.3. Gradient

In many practical applications, we need to compute gradient instead of directional derivative. For a function \(f : \mathbb{R}^n \rightarrow \mathbb{R} \) the gradient of \(f\) is a vector of all its partial derivatives

$$\nabla f = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right) $$

A more general way of defining gradient is through linear map transposition/adjoint. The derivative of a function f at point x is a linear map from X to

∂ f x : X →L[ℝ] ℝ#check ( f x)
∂ f x : X →L[ℝ] ℝ

To obtain gradient we take an adjoint and evaluate it at one. This is exactly how gradient is defined.

example : ( f x) = adjoint ( f x) 1 := X:Type u_1inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace Xinst✝:CompleteSpace Xf:Xx:Xg: × x₀:Xy:Xf x = adjoint (⇑(∂ f x)) 1 All goals completed! 🐙

Few examples of computing gradients

(1, 0) : ℝ × ℝ#check ∇! x:=(0 : ×), x.1
(1, 0) : ℝ × ℝ
2 • x₀ : X#check ∇! (x:=x₀), x‖₂²
2 • x₀ : X
y : X#check ∇! (x:=x₀), x,y
y : X

2.1.3.1. Exercises

  1. Compute gradient of x[0], ‖x‖₂², ⟪x,y⟫ for x y : Float^[3] and gradient of A[0,1], ‖A‖₂², ⟪A,B⟫ for A B : Float^[2,2]. Also evaluate those results for some concrete values.

Solutionset_default_scalar Float ⊞[1.000000, 0.000000, 0.000000]#eval ∇! (x:=⊞[1.0,2.0,3.0]), x[0] ⊞[2.000000, 4.000000, 6.000000]#eval ∇! (x:=⊞[1.0,2.0,3.0]), x‖₂² ⊞[0.000000, 1.000000, 0.000000]#eval ∇! (x:=⊞[1.0,2.0,3.0]), x, ⊞[0.0,1.0,0.0] def matrix1 := ⊞[1.0,2.0;3.0,4.0] ⊞[0.000000, 1.000000, 0.000000, 0.000000]#eval ∇! (A:=matrix1), A[0,1] ⊞[2.000000, 4.000000, 6.000000, 8.000000]#eval ∇! (A:=matrix1), A‖₂² ⊞[0.000000, 1.000000, 0.000000, 0.000000]#eval ∇! (A:=matrix1), A, ⊞[0.0,1.0;0.0,0.0]
  1. Previously we computed \(\sqrt{y}\) using Newton's method. Similarly we can mySqrt Compute sqrt y using gradient descent by minimizing objective function f := fun x => (x^2 - y)^2

  1. Linear regression via gradient descent. Find matrix \( A \in \mathbb{R}^{2\times 2} \) that for given data \( x_i, y_i \in \mathbb{R}^2 \) minimizes

$$A = \text{argmin}_B \sum_i \| B x_i - y_i \|^2 $$

Solutionset_default_scalar Float partial def linreg {n : } (x y : Float^[2]^[n]) : Float^[2,2] := Id.run do let loss := fun (A : Float^[2,2]) => i, A * x[i] - y[i]‖₂² let rate := 1e-1 let mut A : Float^[2,2] := 0 let mut err := 1.0 while err > 1e-6 do let ΔA := (rate A':=A, loss A') rewrite_by X:Type ?u.782inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace Xinst✝:CompleteSpace Xf:Xx✝¹:Xg: × x₀:Xy✝:Xn:x:Float^[2]^[n]y:Float^[2]^[n]loss:Float^[2, 2] → Float := fun A => ∑ i, ‖A * x[i] - y[i]‖₂²rate:Float := 0.1A✝:Float^[2, 2] := 0err✝:Float := 1.0col✝:Lean.Loop := Lean.Loop.mkx✝:Unitr✝:MProd (Float^[2, 2]) FloatA:Float^[2, 2] := r✝.fsterr:Float := r✝.snd| rate∇ (A':=A), ∑ i, ‖MatrixType.gemv 1 0 A' x[i] 0 - y[i]‖₂²; X:Type ?u.782inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace Xinst✝:CompleteSpace Xf:Xx✝¹:Xg: × x₀:Xy✝:Xn:x:Float^[2]^[n]y:Float^[2]^[n]loss:Float^[2, 2] → Float := fun A => ∑ i, ‖A * x[i] - y[i]‖₂²rate:Float := 0.1A✝:Float^[2, 2] := 0err✝:Float := 1.0col✝:Lean.Loop := Lean.Loop.mkx✝:Unitr✝:MProd (Float^[2, 2]) FloatA:Float^[2, 2] := r✝.fsterr:Float := r✝.snd| ratei, let zdf := x[i]; let zdf_1 := MatrixType.gemv 1 0 A zdf 0; let ydg := y[i]; let ydg := zdf_1 - ydg; let dy := 2ydg; let dy := MatrixType.outerprodAdd 1 dy zdf 0; dy err := ΔA‖₂ A := A - ΔA return A
  1. Write down Euler-Lagrange equation over abstract vector space X and show that for lagrangian L x v := 1/2 * m * ‖v‖₂² - φ x the Euler-Langran equation is m * ∂ (∂ x) t = - ∇ φ x

Either define the Lagrangian over ℝ×ℝ, L : ℝ×ℝ → ℝ×ℝ → ℝ or you can introduce abstract vector space X using this variable command

variable {X} [NormedAddCommGroup X] [AdjointSpace ℝ X] [CompleteSpace X]

The explanation of these typeclasses will be discussed in the last section "Abstract Vector Spaces".

Solutionset_default_scalar noncomputable def EulerLagrange (L : X X ) (x : X) (t : ) := let v := x (t':=t), ( (v':=v t'), L (x t') v') - (x':=x t), L x' (v t) noncomputable def NewtonsLaw (m : ) (φ : X ) (x : X) (t : ) := m ( ( x) t) + ( φ (x t)) declaration uses 'sorry'example (x : X) (hx : ContDiff x) (φ : X ) ( : Differentiable φ) : EulerLagrange (fun x v => m/2 * v‖₂² - φ x) x t = NewtonsLaw m φ x t := X:Type u_1inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace Xinst✝:CompleteSpace Xf:Xx✝:Xg: × x₀:Xy:Xm:t:x:Xhx:ContDiff xφ:X:Differentiable φEulerLagrange (fun x v => m / 2 * v‖₂² - φ x) x t = NewtonsLaw m φ x t X:Type u_1inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace Xinst✝:CompleteSpace Xf:Xx✝:Xg: × x₀:Xy:Xm:t:x:Xhx:ContDiff xφ:X:Differentiable φ(let v := fun x_1 => (∂ x x_1) 1; (∂ (t':=t), (<∂ (v':=v t'), (fun x v => m / 2 * v‖₂² - φ x) (x t') v').2 1) 1 - (<∂ (x':=x t), (fun x v => m / 2 * v‖₂² - φ x) x' (v t)).2 1) = m(∂ (x_1:=t), (∂ x x_1) 1) 1 + (<∂ φ (x t)).2 1 X:Type u_1inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace Xinst✝:CompleteSpace Xf:Xx✝:Xg: × x₀:Xy:Xm:t:x:Xhx:ContDiff xφ:X:Differentiable φ(∂ (t':=t), m(∂ x t') 1) 1 = m(∂ (x_1:=t), (∂ x x_1) 1) 1 All goals completed! 🐙

2.1.4. Derivative Rules

A commong issue when fun_trans is not doing what we expect is that there is a missing differentiation theorem.

For example, if we define a function

def foo (x : ) := x^2 + x

then nothing happens when we try to differentiate it

fun x => (∂ (x:=x), foo x) 1 : ℝ → ℝ#check ∂! x, foo x
fun x => (∂ (x:=x), foo x) 1 : ℝ → ℝ

Turning on the fun_trans trace reveals useful information

set_option trace.Meta.Tactic.fun_trans true in fun x => (∂ (x:=x), foo x) 1 : ℝ → ℝ#check ∂! x, foo x
[Meta.Tactic.fun_trans] [❌] ∂ (x:=x), foo x
  [Meta.Tactic.fun_trans] candidate theorems for foo: []
  [Meta.Tactic.fun_trans] candidate local theorems for foo: []
  [Meta.Tactic.fun_trans] candidate fvar theorems: [isContinuousLinearMap_fderiv]
  [Meta.Tactic.fun_trans] [❌] applying: isContinuousLinearMap_fderiv
    [Meta.Tactic.fun_trans] isContinuousLinearMap_fderiv, failed to discharge hypotheses
          SciLean.IsContinuousLinearMap ℝ fun x => foo x

The on the first line signifies that fun_trans failed to make prograss on ∂ (x:=x), foo x. The next two lines

  [Meta.Tactic.fun_trans] candidate theorems for foo: []
  [Meta.Tactic.fun_trans] candidate local theorems for foo: []

states that there are no derivative theorems for foo. The next line

  [Meta.Tactic.fun_trans] candidate fvar theorems: [isContinuousLinearMap_fderiv]

states that there is a potentially applicable theorem isContinuousLinearMap_fderiv which can differentiate linear functions. However the next few lines report that applying this theorem failed as fun_trans can't prove that foo is (continuous) linear map.

To remedy this problem we can define derivative of foo

def foo_deriv (x : ) := 2*x + 1

and add a theorem that the derivative of foo is equal to foo_deriv

open SciLean @[fun_trans] theorem foo_deriv_rule : fderiv foo = fun x => fun dx =>L[] dx foo_deriv x := foo = fun x => fun dx =>L[] dxfoo_deriv x x, (x ^ 2 + x) = fun x => fun dx =>L[] dx(2 * x + 1); x:(∂ (x:=x), (x ^ 2 + x)) 1 = (fun dx =>L[] dx(2 * x + 1)) 1; All goals completed! 🐙

Because foo_deriv_rule is marked with fun_trans attribute it will be used when we try to differentiate foo now

fun x => foo_deriv x : ℝ → ℝ#check ∂! x, foo x
fun x => foo_deriv x : ℝ → ℝ

Unfortuantelly there is one more issue, fun_trans will do nothing when we try to compose foo together

fun x => (∂ (x:=x), foo (foo x)) 1 : ℝ → ℝ#check ∂! x, foo (foo x)
fun x => (∂ (x:=x), foo (foo x)) 1 : ℝ → ℝ
set_option trace.Meta.Tactic.fun_trans true in fun x => (∂ (x:=x), foo (foo x)) 1 : ℝ → ℝ#check ∂! x, foo (foo x)
...
  [Meta.Tactic.fun_trans] trying comp theorem SciLean.fderiv.comp_rule
  [Meta.Tactic.fun_trans] [❌] applying: SciLean.fderiv.comp_rule
    [Meta.Tactic.fun_trans] SciLean.fderiv.comp_rule, failed to discharge hypotheses
          Differentiable ℝ fun x0 => foo x0
...

The trace reveals that fun_trans tries to apply composition(chain) rule SciLean.fderiv.comp_rule but it fails as it can't prove Differentiable fun x0 => foo x0. We need another theorem stating that foo is differentiable function. Mathlib has a tactic fun_prop that can prove differentiability and many other function properties like linearity, continuity, measurability etc. and fun_trans uses this tactic to ensure it can apply chain rule.

We need to add fun_prop theorem for foo

@[fun_prop] theorem foo_differentiable : Differentiable foo := Differentiable foo Differentiable fun x => x ^ 2 + x; All goals completed! 🐙

Now fun_trans knows that foo is differentiable function and can safely apply chain rule

fun x => foo_deriv x * foo_deriv (foo x) : ℝ → ℝ#check (∂! x, foo (foo x)) rewrite_by | fun x => foo_deriv x * foo_deriv (foo x)
fun x => foo_deriv x * foo_deriv (foo x) : ℝ → ℝ

Writing these theorems by hand is quite tedious so there is a macro def_fun_prop and def_fun_trans to help you with writing these theorems

def_fun_prop : Differentiable foo by Differentiable fun x => x ^ 2 + x; All goals completed! 🐙 def_fun_trans : fderiv foo by | x, (x ^ 2 + x); | fun x => fun x_1 =>L[] 2 * x_1 * x + x_1

It generates these theorems and definition

foo.arg_x.Differentiable_rule : Differentiable ℝ foo#check foo.arg_x.Differentiable_rule def foo.arg_x.fderiv : ℝ → ℝ →L[ℝ] ℝ := fun x => fun x_1 =>L[ℝ] 2 * x_1 * x + x_1#print foo.arg_x.fderiv foo.arg_x.fderiv_rule : ∂ foo = foo.arg_x.fderiv#check foo.arg_x.fderiv_rule

The problem of writing appropriate theorems for fun_trans and fun_prop is quite involved problem and will be discussed in future chapter.

2.1.5. Differentiating Division, Log, Sqrt, ...

So far we have worked with functions that are smooth. However, functions like division, log, sqrt, ‖·‖₂ are not differentiable everywhere. We have to be a bit careful with those functions because SciLean tries to perform computations that are, at least in principle, fully formalizable. Let's try to differentiate division

fun x => (∂ (x:=x), x⁻¹) 1 : ℝ → ℝ#check ∂! (x:), 1/x
fun x => (∂ (x:=x), x⁻¹) 1 : ℝ → ℝ

We did not get expected -x⁻². When differentiation, or any tactic, is not doing what we expect we can turn on tracing. Let's try again with tracing on

set_option trace.Meta.Tactic.fun_trans true in fun x => (∂ (x:=x), x⁻¹) 1 : ℝ → ℝ#check ∂! (x:), 1/x

and the beggining of the trace is saying that fun_trans tried to apply theorem HDiv.hDiv.arg_a0a1.fderiv_rule_at however it failed to discharge x ≠ 0

[Meta.Tactic.fun_trans] [❌] ∂ (x:=x), 1 / x
  [Meta.Tactic.fun_trans] candidate theorems for HDiv.hDiv: [HDiv.hDiv.arg_a0a1.fderiv_rule_at]
  [Meta.Tactic.fun_trans] [❌] applying: HDiv.hDiv.arg_a0a1.fderiv_rule_at
    [Meta.Tactic.fun_trans] [❌] discharging: x ≠ 0
    [Meta.Tactic.fun_trans] HDiv.hDiv.arg_a0a1.fderiv_rule_at, failed to discharge hypotheses
          x ≠ 0

This makes sense as division 1/x is well defined and differentiable only away from zero. Therefore we have to differentiate it at a concrete point that is not equal to zero.

variable (x₀ : ) (hx₀ : x₀ 0) set_option trace.Meta.Tactic.fun_trans true in -1 / x₀ ^ 2 : ℝ#check ( (x:=x₀), 1/x) rewrite_by fun_trans (disch:=All goals completed! 🐙)
-1 / x₀ ^ 2 : ℝ

We introduced a point x₀ and assumption hx₀ that it is not equal to zero. By default fun_trans does not see this assumption and we have to provide discharger. A discharger is any tactic that tries to satisfy(discharge) any assumption of the theorems fun_trans is using. In this simple case assumption tactic is enough as it just looks through the local context and tries to directly apply any existing assumptions.

Using assumption is not enough for a more complicated expression

variable (x₀ : ) (hx₀ : x₀ 0) (∂ (x:=x₀), (x ^ 2)⁻¹) 1 : ℝ#check ( (x:=x₀), 1/x^2) rewrite_by fun_trans (disch:=x₀✝:hx₀✝:x₀✝0x₀:hx₀:x₀0∀ (x : ), x0)
(∂ (x:=x₀), (x ^ 2)⁻¹) 1 : ℝ

tracing shows

[Meta.Tactic.fun_trans] HDiv.hDiv.arg_a0a1.fderiv_rule_at,
  failed to discharge hypotheses x₀ ^ 2 ≠ 0

We need a tactic that is capable of infering (x₀^2 ≠ 0) from (x₀ ≠ 0). A very general and powerful tactic is aesop

variable (x₀ : ) (hx₀ : x₀ 0) -(2 * x₀) / (x₀ ^ 2) ^ 2 : ℝ#check ( (x:=x₀), 1/x^2) rewrite_by x₀✝¹:hx₀✝¹:x₀✝¹0x₀✝:hx₀✝:x₀✝0x₀:hx₀:x₀0| -(2 * x₀) / (x₀ ^ 2) ^ 2
-(2 * x₀) / (x₀ ^ 2) ^ 2 : ℝ

In multiple dimensions we often want to differentiate the norm

open SciLean Scalar variable (x₀ : ×) (hx₀ : x₀ 0) ‖x₀‖₂[ℝ]⁻¹ • x₀ : ℝ × ℝ#check ( (x:=x₀), x‖₂) rewrite_by x₀✝²:hx₀✝²:x₀✝²0x₀✝¹:hx₀✝¹:x₀✝¹0x₀✝:hx₀✝:x₀✝0x₀: × hx₀:x₀0| (<∂ (x:=x₀), x‖₂[]).2 1; fun_trans (disch:=All goals completed! 🐙)

The fact that norm is not differentiable can cause issues. The common practice when writing numerical algorithms is to regularize norm using a small positive ε.

open SciLean Scalar variable (ε : ) ( : 0 < ε) fun w => fun x =>L[ℝ] 2 * ⟪x, w⟫ / (2 * sqrt (ε + ‖w‖₂²)) : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ#check ( (x:×), sqrt (ε + x‖₂²)) rewrite_by x₀✝²:hx₀✝²:x₀✝²0x₀✝¹:hx₀✝¹:x₀✝¹0x₀✝:hx₀✝:x₀✝0x₀: × hx₀:x₀0ε::0 < ε| x, sqrt (ε + x‖₂²) fun_trans (disch:=x₀✝²:hx₀✝²:x₀✝²0x₀✝¹:hx₀✝¹:x₀✝¹0x₀✝:hx₀✝:x₀✝0x₀: × hx₀:x₀0ε::0 < εx: × ε + x‖₂²0; All goals completed! 🐙)
fun w => fun x =>L[ℝ] 2 * ⟪x, w⟫ / (2 * sqrt (ε + ‖w‖₂²)) : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ

2.1.5.1. Exercises

  1. gradient of energy for n-body system, newton's potential, lenard jones potential

    • do it for two particles

    • do it for n-particles

  2. signed distance function

    • compute gradient of sphere sdf

    • compute mean and gaussian curvature of sphere

    • pick SDF from https://iquilezles.org/articles/distfunctions/

      • most of them involve functions that are not differentiable everywhere

      • compute derivative in unsafe mode

      • figure out the minimal condition under which it is differentiable

2.1.6. Abstract Vector Spaces

In calculus we usually consider only functions \((f : \mathbb{R}^n \rightarrow \mathbb{R}^m) \). The issue is that on computers the type \( \mathbb{R}^n \) can have multiple different realizations. For example \( \mathbb{R}^3 \) can be modeled by Float×Float×Float, Float^[3] or Float×Float^[2]. They are all equivalent but in code we have to explicitely convert between these types. For this reason it is better to work with abstract vector spaces instead of with \( \mathbb{R}^n \).

Fortunately mathlib's derivative fderiv is already defined for a function (f : X → Y) between two abstract vector spaces X and Y over a field 𝕜. Mathlib's way of introducing an abstract vector space is rather involved and we need to spend some time talking about it. This presentation will be rather simplified. For interested reader we provide references at the end of this section that go over mathlib's algebraic stracutes in more detail.

A vector space X is a set with operations +,-,•,0 such that

  ∀ (x y z : X), x + (y + z) = (x + y) + z
  ∀ (x y : X), x + y = y + x
  ∀ (x : X), x + 0 = 0
  ∀ (x : X), x + (-x) = 0

  ∀ (r s : 𝕜) (x : X), r • (s • x) = (r * s) • x(
  ∀ (x : X), 1 • x = x
  ∀ (r : 𝕜) (x y : X), r • (x + y) = r • x + r • y
  ∀ (r s : 𝕜) (x : X), (r + s) • x = r • x + s • x

in mathlib the axioms talking about addition and negation are captured by the type class AddCommGroup and the aximps talking about scalar multiplication are captured by the type class Module. Therefore if we want to introduce a new abstract vector space over a field R we have to introduce these variables

variable {𝕜 : Type} [Field 𝕜] {X : Type} [AddCommGroup X] [Module 𝕜 X] example (s r : 𝕜) (x y : X) : (s + r) (x + y) = s x + r x + s y + r y := x₀✝²:hx₀✝²:x₀✝²0x₀✝¹:hx₀✝¹:x₀✝¹0x₀✝:hx₀✝:x₀✝0x₀: × hx₀:x₀0ε::0 < ε𝕜:Typeinst✝²:Field 𝕜X:Typeinst✝¹:AddCommGroup Xinst✝:Module 𝕜 Xs:𝕜r:𝕜x:Xy:X(s + r)(x + y) = sx + rx + sy + ry All goals completed! 🐙

When we want to differentiate a function (f : X → Y) between two vector spaces we also need that X and Y are equiped with a norm. For this purpose there is NormedAddCommGroup which equips AddCommGroup with a norm and guarantees that it compatible with addition and negation, and NormedSpace which equips Module with a norm and guarentees that it is compatible with scalar multiplication. Furthermore, we have to restric to a filed 𝕜 that is either real numbers or complex numbers . The type class RCLike states exactly that. Therefore when we work with derivative in general setting the code usually looks like this

variable {𝕜 : Type} [RCLike 𝕜] {X : Type} [NormedAddCommGroup X] [NormedSpace 𝕜 X] {Y : Type} [NormedAddCommGroup Y] [NormedSpace 𝕜 Y] set_default_scalar 𝕜 example (f g : X Y) (hf : Differentiable 𝕜 f) (hg : Differentiable 𝕜 g) : ( x, (f x + g x)) = ( f) + ( g) := x₀✝²:hx₀✝²:x₀✝²0x₀✝¹:hx₀✝¹:x₀✝¹0x₀✝:hx₀✝:x₀✝0x₀: × hx₀:x₀0ε::0 < ε𝕜:Typeinst✝⁴:RCLike 𝕜X:Typeinst✝³:NormedAddCommGroup Xinst✝²:NormedSpace 𝕜 XY:Typeinst✝¹:NormedAddCommGroup Yinst✝:NormedSpace 𝕜 Yf:XYg:XYhf:Differentiable 𝕜 fhg:Differentiable 𝕜 gx, (f x + g x) = f + g x₀✝²:hx₀✝²:x₀✝²0x₀✝¹:hx₀✝¹:x₀✝¹0x₀✝:hx₀✝:x₀✝0x₀: × hx₀:x₀0ε::0 < ε𝕜:Typeinst✝⁴:RCLike 𝕜X:Typeinst✝³:NormedAddCommGroup Xinst✝²:NormedSpace 𝕜 XY:Typeinst✝¹:NormedAddCommGroup Yinst✝:NormedSpace 𝕜 Yf:XYg:XYhf:Differentiable 𝕜 fhg:Differentiable 𝕜 gx:Xdx:X(∂ (x:=x), (f x + g x)) dx = ((∂ f + g) x) dx; All goals completed! 🐙

When working with gradients we also need inner product as adjoint is defined through inner product. Unfortunately, here we diverge from mathlib a little bit. Mathlib defines InnerProductSpace which equips NormedSpace with inner product. Understandably InnerProductSpace requires that the ⟪x,x⟫ = ‖x‖² however mathlib made the unfortunate decision by definin norm on produce spaces as ‖(x,y)‖ = max ‖x‖ ‖y‖ which is incompatible with the inner product structure. Therefore type like ℝ×ℝ can't be equiped with InnerProductSpace. Because of these issues, SciLean introduces AdjointSpace which is almost identical to InnerProductSpace but it only requires that the norm induced by inner product is equivalend to the existing norm i.e. ∃ (c d : ℝ⁺), ∀ x, c * ⟪x,x⟫ ≤ ‖x‖^2 ≤ d * ⟪x,x⟫. On AdjointSpace SciLean introduces L₂-norm ‖x‖₂ := sqrt ⟪x,x⟫ which you have seen already and it is the norm you most likely want to use instead of the default norm ‖x‖. Therfore when we work with gradient in general setting the code usually looks like this

open SciLean variable {𝕜 : Type} [RCLike 𝕜] {X : Type} [NormedAddCommGroup X] [AdjointSpace 𝕜 X] [CompleteSpace X] {Y : Type} [NormedAddCommGroup Y] [AdjointSpace 𝕜 Y] [CompleteSpace Y] set_default_scalar 𝕜 example (f g : X Y) (hf : Differentiable 𝕜 f) (hg : Differentiable 𝕜 g) : ( x, (f x + g x)) = ( f) + ( g) := x₀✝²:hx₀✝²:x₀✝²0x₀✝¹:hx₀✝¹:x₀✝¹0x₀✝:hx₀✝:x₀✝0x₀: × hx₀:x₀0ε::0 < ε𝕜:Typeinst✝⁶:RCLike 𝕜X:Typeinst✝⁵:NormedAddCommGroup Xinst✝⁴:AdjointSpace 𝕜 Xinst✝³:CompleteSpace XY:Typeinst✝²:NormedAddCommGroup Yinst✝¹:AdjointSpace 𝕜 Yinst✝:CompleteSpace Yf:XYg:XYhf:Differentiable 𝕜 fhg:Differentiable 𝕜 gx, (f x + g x) = f + g x₀✝²:hx₀✝²:x₀✝²0x₀✝¹:hx₀✝¹:x₀✝¹0x₀✝:hx₀✝:x₀✝0x₀: × hx₀:x₀0ε::0 < ε𝕜:Typeinst✝⁶:RCLike 𝕜X:Typeinst✝⁵:NormedAddCommGroup Xinst✝⁴:AdjointSpace 𝕜 Xinst✝³:CompleteSpace XY:Typeinst✝²:NormedAddCommGroup Yinst✝¹:AdjointSpace 𝕜 Yinst✝:CompleteSpace Yf:XYg:XYhf:Differentiable 𝕜 fhg:Differentiable 𝕜 gx:Xx✝:Y∇ (x:=x;x✝), (f x + g x) = (∇ f + g) x x✝; x₀✝²:hx₀✝²:x₀✝²0x₀✝¹:hx₀✝¹:x₀✝¹0x₀✝:hx₀✝:x₀✝0x₀: × hx₀:x₀0ε::0 < ε𝕜:Typeinst✝⁶:RCLike 𝕜X:Typeinst✝⁵:NormedAddCommGroup Xinst✝⁴:AdjointSpace 𝕜 Xinst✝³:CompleteSpace XY:Typeinst✝²:NormedAddCommGroup Yinst✝¹:AdjointSpace 𝕜 Yinst✝:CompleteSpace Yf:XYg:XYhf:Differentiable 𝕜 fhg:Differentiable 𝕜 gx:Xx✝:Ylet ydf := <∂ f x; let ydg := <∂ g x; ydf.2 x✝ + ydg.2 x✝ = (<∂ f x).2 x✝ + (<∂ g x).2 x✝; All goals completed! 🐙

For interested reader we recommend reading the chapter Hierachies from Mathematics in Lean which explains how mathlib approaches algebraic hierachies like monoids, groups or modules. After reading that we recommend reading Differential Calculus in Normed Spaces which how NormedSpace is structured.