2.1. Symbolic Differentiation

Differentiation is at the heart of many problems in scientific computing, from optimizing functions to solving differential equations. Computing derivatives is a mechanical process but very error prone when done manually. In this chapter, we will explore how we can compute derivatives of expressions and programs automatically.

Let's recall what derivative is, for a function f : ℝ → ℝ we define it derivative as

$$f'(x) = \lim_{h → 0} \frac{f(x+h) - f(x)}{h} $$

The mathematical library Mathlib defines derivative fderiv for a general function f : X → Y from some vector spaces X and Y. The derivative of f at point x : X in direction dx : X is

$$\texttt{fderiv ℝ f x dx} = \lim_{h → 0} \frac{f(x+h\cdot dx) - f(x)}{h} $$

SciLean provides tactic fun_trans which can compute derivatives. The name fun_trans stands for 'function transformation' and differentiations is just one example of function transformation. We will talk about general function transformations later in the book.

One of the simplest examples of a derivative is the derivative of $x^n$ which is equal to $n x^{n-1}$. Let's compute it using fun_trans tactic

↑n * x ^ (n - 1) : ℝ#check (fderiv ℝ (fun (x : ℝ) => x^n) x 1) rewrite_by fun_transn:ℕx:ℝ| ↑n * x ^ (n - 1)

↑n * x ^ (n - 1) : ℝ

2.1.1. Notation

Writing fderiv ℝ (fun x => f x) is somewhat tedious so SciLean makes our life easier by introducing a nice notation ∂ x, f x for differentiating (f x) w.r.t. x.

Before we explore this notation further we have to mention that fderiv can also compute complex derivatives with @fderiv ℂ instead of @fderiv ℝ. However, most of the time we work exclusively with real derivative so we can inform Lean that the default choce of the scalar field are real numbers using the following command

set_default_scalar ℝ

Now Lean knows that we want real derivative when we write ∂ x, f x.

Using this notation we can compute again the above derivative

fun x => ↑n * x ^ (n - 1) : ℝ → ℝ#check (∂ (x : ℝ), x^n) rewrite_by fun_transn:ℕx₀:ℝu₀:ℝ × ℝdu₀:ℝ × ℝ| fun x => ↑n * x ^ (n - 1)

fun x => ↑n * x ^ (n - 1) : ℝ → ℝ

Because we did not specify the point where we want to compute the derivative we obtained a function in x. We can specify the point where we want to compute the derivative with ∂ (x:=x₀), ...

↑n * x₀ ^ (n - 1) : ℝ#check (∂ (x:=x₀), x^n) rewrite_by fun_transn:ℕx₀:ℝu₀:ℝ × ℝdu₀:ℝ × ℝ| ↑n * x₀ ^ (n - 1)

↑n * x₀ ^ (n - 1) : ℝ

Writing rewrite_by fun_trans every time we want to diferentiate an expression gets a bit tedious. We can add an exclamation mark after ∂ to indicate that we want to run fun_trans tactic to compute the derivative.

fun x => ↑n * x ^ (n - 1) : ℝ → ℝ#check (∂! (x : ℝ), x^n)

fun x => ↑n * x ^ (n - 1) : ℝ → ℝ

We can differentiate w.r.t to a vector valued variable (x : ℝ×ℝ)

fun x => fun dx =>L[ℝ] 2 * ⟪dx, x⟫ : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ#check ∂! (x : ℝ×ℝ), ‖x‖₂²

fun x => fun dx =>L[ℝ] 2 * ⟪dx, x⟫ : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ

For derivatives w.r.t. a vector valued variable we have to also specify the direction in which we differentiate. Therefore in the above we obtained derivative as a function of the position x and direction dx. Furthermore, the notation fun dx =>L[ℝ] ... indicates that the function is linear function in dx and similarly X →L[ℝ] Y stands for the space of ℝ-linear functions from X to Y.

If we want to specify the position and the direction in which we want to compute the derivatives we use the notation ∂ (x:=x₀;dx₀), f x

2 * ⟪du₀, u₀⟫ : ℝ#check ∂! (u:=u₀;du₀), ‖u‖₂²

2 * ⟪du₀, u₀⟫ : ℝ

To summarize all the different variants. For function of a scalar valued argument

∂ f : ℝ → ℝ#check ∂ f
∂ x, f x : ℝ → ℝ#check ∂ x, f x
∂ (x:=x₀), f x : ℝ#check ∂ (x:=x₀), f x

For function of a vector valued argument

∂ f : ℝ → ℝ#check ∂ f
∂ u, g u : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ#check ∂ u, g u
∂ (u:=u₀), g u : ℝ × ℝ →L[ℝ] ℝ#check ∂ (u:=u₀), g u
(∂ (u:=u₀), g u) du₀ : ℝ#check ∂ (u:=u₀;du₀), g u

There is nothing stopping us from applying derivative multiple times to compute higher order derivatives

fun x => 2 : ℝ → ℝ#check (∂! (∂! (x:ℝ), x^2))

fun x => 2 : ℝ → ℝ

2.1.1.1. Exercises

Express first derivative of f : ℝ → ℝ → ℝ in the first and the second argument. Also express derivative in both arguments at the same time.

Solution

variable (f : ℝ → ℝ → ℝ) (x₀ y₀ : ℝ)

-- first argument
∂ (x:=x₀), f x y₀ : ℝ#check ∂ (x:=x₀), (f x y₀)
∂ (x:=x₀), f x y₀ : ℝ#check ∂ (f · y₀) x₀

-- second agument
∂ (y:=y₀), f x₀ y : ℝ#check ∂ (y:=y₀), (f x₀ y)
∂ (x:=y₀), f x₀ x : ℝ#check ∂ (f x₀ ·) y₀

-- both arguments
∂ (x:=(x₀, y₀)),
  match x with
  | (x, y) => f x y : ℝ × ℝ →L[ℝ] ℝ#check ∂ ((x,y):=(x₀,y₀)), f x y

For (g : ℝ×ℝ → ℝ), express derivative of g (x,y) in x.

Solution

variable (g : ℝ×ℝ → ℝ) (x₀ y₀ : ℝ)

(∂ (xy:=(x₀, y₀)), g xy) (1, 0) : ℝ#check ∂ (xy:=(x₀,y₀);(1,0)), g xy
(∂ g (x₀, y₀)) (1, 0) : ℝ#check ∂ g (x₀,y₀) (1,0)
∂ (x:=x₀), g (x, y₀) : ℝ#check ∂ (x:=x₀), g (x,y₀)

Express second derivatives of f : ℝ → ℝ → ℝ in the first and the second argument.

Solution

variable (f : ℝ → ℝ → ℝ) (x₀ y₀ : ℝ)
∂ (x':=x₀), ∂ (x'':=x'), f x'' y₀ : ℝ#check ∂ (x':= x₀), ∂ (x'':=x'), (f x'' y₀)
∂ (∂ x, f x y₀) x₀ : ℝ#check ∂ (∂ (f · y₀)) x₀

∂ (y':=y₀), ∂ (y'':=y'), f x₀ y'' : ℝ#check ∂ (y':=y₀), ∂ (y'':=y'), (f x₀ y'')
∂ (∂ x, f x₀ x) y₀ : ℝ#check ∂ (∂ (f x₀ ·)) y₀

Let $L(t,x)$ be a function of time and space and y(t) a function of time. Express $ \frac{d}{dt} L(t, y(t)) $ and $ \frac{\partial}{\partial t} L(t, y(t)) $ in Lean. What is the difference between these two expressions?

Solution

variable (L : ℝ → ℝ → ℝ) (y : ℝ → ℝ) (t : ℝ)

-- d/dt L
∂ (t':=t), L t' (y t') : ℝ#check ∂ (t':=t), L t' (y t')

-- ∂/∂t L
∂ (t':=t), L t' (y t) : ℝ#check ∂ (t':=t), L t' (y t)

Because SciLean's notation forces you to be a bit more explicit, there is no need to distinguish between $ \frac{d}{dt} $ and $ \frac{\partial}{\partial t} $. Lots of pain and suffering has been infliced on generations of physics students because of the ambiguity of partial derivative notation.

Express one dimensional Euler-Lagrange equation in Lean

$$\frac{d}{dt} \frac{\partial L}{\partial \dot x}(x(t),\dot x(t)) - \frac{\partial L}{\partial x}(x(t), \dot x(t)) $$

Solution

variable (L : ℝ → ℝ → ℝ) (x : ℝ → ℝ) (t : ℝ)

let v := ∂ x;
∂ (t':=t), ∂ (v':=v t'), L (x t') v' - ∂ (x':=x t), L x' (v t) : ℝ#check
  let v := ∂ x
  ∂ (t':=t), (∂ (v':=v t'), L (x t') v') - ∂ (x':=x t), L x' (v t)

2.1.2. Examples

Let's put computing derivatives to some practical use. We will demonstrate how to use SciLean symbolic differentiations to solve few common tasks in scientific computing and physics.

2.1.2.1. Newton's solver

Solving non-linear equation $ f(x) = 0 $ is a very common problem in scientific computing. Often this can be done only approximately and a popular tool to do so is Newton's method. It is an interative process that starts with an initial guess $x₀$ which is incrementally improved by the following rule

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} $$

We can use Newton's method to compute sqruare root of $y$ by choosing $ f(x) = x^2 - y$.

def mySqrt (steps : Nat) (y : Float) : Float := Id.run do
  let f := fun x => x^2 - y
  let mut x := 1.0
  for _ in [0:steps] do
    x := x - f x / (∂! x':=x, f x')
  return x

1.414214#eval mySqrt 10 2.0

1.414214

You might feel a bit unconfortable here are we are differentiating a function defined on floating point numbers. If you think that can't be formally correct then you are right. We will discuss this issue in a later chapter "Working with Real Numbers".

2.1.2.1.1. Exercises

try to solve different equations, for example exp x = y to obtain log, x*exp x = y to obtain Lambert W function or some polynomial.
measure relative,$\left| \frac{f(x_n)}{x_n} \right| $, and absolute error $ \left| f(x_n) \right| $ and use them for stopping criteria.
A difficult exercise is to define a general newtonSolve function that takes an arbitrary function f : Float → Float and uring elaboration synthesizes its derivative. Add multiple hints, 1. use infer_var trick, 2. state explicitly how the arguments should look like

Solution

set_default_scalar Float
def newtonSolve (steps : Nat) (x₀ : Float)
    (f : Float → Float) {f' : Float → Float}
    (hf : f' = (∂ f) := by
        unfold deriv; fun_trans; infer_var) :
    Float := Id.run do
  let mut x := x₀
  for _ in [0:steps] do
    x := x - f x / f' x
  return x

1.414214#eval newtonSolve 10 1.0 (fun x => x^2 - 2.0)

2.1.2.2. Kinematics

We can also use SciLean's symbolic differentiation to prove some basic theorems from physics. For example we can state the second Newton's law

def NewtonSecondLaw (m : ℝ)
    (x : ℝ → ℝ) (F : ℝ → ℝ) : Prop :=
  ∀ t, m * deriv (∂ x) t = F t

saying that for a particle with mass m under the influence of force F has trajectory x if the mass times the acceleration deriv (∂ x) t, i.e. the second derivative of trajectory, is equal to the force F t.

We can show that under constant force f a particle with mass m has trajectory (fun t => 1/2 * f/m * t^2)

example (m f : ℝ) (hm : m ≠ 0) :
    NewtonSecondLaw m (fun t => f/(2*m) * t^2) (fun _ => f)
    := bym:ℝf:ℝhm:m ≠ 0⊢ NewtonSecondLaw m (fun t => f / (2 * m) * t ^ 2) fun x => f

  unfold NewtonSecondLawm:ℝf:ℝhm:m ≠ 0⊢ ∀ (t : ℝ), m * ∂ (∂ t, f / (2 * m) * t ^ 2) t = (fun x => f) t
  -- compute derivatives
  fun_trans [deriv]m:ℝf:ℝhm:m ≠ 0⊢ m * (f / (2 * m) * 2) = f
  -- finish algebraic simplifications
  field_simpm:ℝf:ℝhm:m ≠ 0⊢ m * (f * 2) = f * (2 * m); ringAll goals completed! 🐙

2.1.2.2.1. Exercises

show that trajectory x := fun t => (cos t, sin t) satisfies differential equation ∂ x t = (- (x t).2, (x t).1)

Solution

open SciLean Scalar
def ode (x : ℝ → ℝ×ℝ) :=
  ∀ t, deriv x t = (- (x t).2, (x t).1)

example : ode (fun t => (cos t, sin t)) := by⊢ ode fun t => (cos t, sin t)
  unfold ode deriv⊢ ∀ (t : ℝ), (∂ (t:=t), (cos t, sin t)) 1 = (-((fun t => (cos t, sin t)) t).2, ((fun t => (cos t, sin t)) t).1); fun_transAll goals completed! 🐙

Show that trajectory $x(t) = \sin(\omega t) $ corresponds to the force $f(x) = - k x $ with $\omega = \sqrt{(k/m)} $

Hint

After differentiation you will have to show that $m \sqrt{\frac{k}{m}}^2 = k$. Unfortunatelly Lean is not yet very powerful computer algebra system. So you can finish the proof with

  ring_nf --  m * (sqrt (k / m) * (sqrt (k / m) ==> m * sqrt (k * m⁻¹) ^ 2
  have h : m * sqrt (k * m⁻¹) ^ 2 = k := sorry_proof
  simp[h]

where we call ring_nf to clean up the expression, then we just assume that m * sqrt (k * m⁻¹) ^ 2 is equal to k and finally we can finish the proof by running simp

Solution

open SciLean Scalar

example (m k : ℝ) :
    let ω := sqrt (k/m)
    let x := fun t => sin (ω*t)
    NewtonSecondLaw m x (fun t => - k*x t) := bym:ℝk:ℝ⊢ let ω := sqrt (k / m);
let x := fun t => sin (ω * t);
NewtonSecondLaw m x fun t => -k * x t

  unfold NewtonSecondLaw derivm:ℝk:ℝ⊢ let ω := sqrt (k / m);
let x := fun t => sin (ω * t);
∀ (t : ℝ), m * (∂ (x_1:=t), (∂ x x_1) 1) 1 = (fun t => -k * x t) t
  fun_transm:ℝk:ℝ⊢ ∀ (t : ℝ), m * (sqrt (k / m) * (sqrt (k / m) * sin (sqrt (k / m) * t))) = k * sin (sqrt (k / m) * t)
  ring_nfm:ℝk:ℝ⊢ ∀ (t : ℝ), m * sqrt (k * m⁻¹) ^ 2 * sin (sqrt (k * m⁻¹) * t) = k * sin (sqrt (k * m⁻¹) * t)
  have h : m * sqrt (k * m⁻¹) ^ 2 = k := sorry_proofm:ℝk:ℝh:m * sqrt (k * m⁻¹) ^ 2 = k⊢ ∀ (t : ℝ), m * sqrt (k * m⁻¹) ^ 2 * sin (sqrt (k * m⁻¹) * t) = k * sin (sqrt (k * m⁻¹) * t)
  simp[h]All goals completed! 🐙

Show that $u(t,x) = sin(x-t)$ is a solution to the wave equation

$$\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2} $$

Solution

open SciLean Scalar
def WaveEquation (u : ℝ → ℝ → ℝ) :=
  ∀ x t, (∂ (∂ (u · x)) t) = (∂ (∂ (u t ·)) x)

example :
    WaveEquation (fun t x => sin (x - t)) := by⊢ WaveEquation fun t x => sin (x - t)
  unfold WaveEquation deriv⊢ ∀ (x t : ℝ),
  (∂ (x_1:=t), (∂ (x_2:=x_1), (fun t x => sin (x - t)) x_2 x) 1) 1 =
    (∂ (x:=x), (∂ (x:=x), (fun t x => sin (x - t)) t x) 1) 1
  fun_transAll goals completed! 🐙

solution to heat equation

2.1.3. Gradient

In many practical applications, we need to compute gradient instead of directional derivative. For a function $f : \mathbb{R}^n \rightarrow \mathbb{R} $ the gradient of $f$ is a vector of all its partial derivatives

$$\nabla f = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right) $$

A more general way of defining gradient is through linear map transposition/adjoint. The derivative of a function f at point x is a linear map from X to ℝ

∂ f x : X →L[ℝ] ℝ#check (∂ f x)

∂ f x : X →L[ℝ] ℝ

To obtain gradient we take an adjoint and evaluate it at one. This is exactly how gradient is defined.

example : (∇ f x) = adjoint ℝ (∂ f x) 1 := byX:Type u_1inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace ℝ Xinst✝:CompleteSpace Xf:X → ℝx:Xg:ℝ × ℝ → ℝx₀:Xy:X⊢ ∇ f x = adjoint ℝ (⇑(∂ f x)) 1 rflAll goals completed! 🐙

Few examples of computing gradients

(1, 0) : ℝ × ℝ#check ∇! x:=(0 : ℝ×ℝ), x.1

(1, 0) : ℝ × ℝ

2 • x₀ : X#check ∇! (x:=x₀), ‖x‖₂²

2 • x₀ : X

y : X#check ∇! (x:=x₀), ⟪x,y⟫

y : X

2.1.3.1. Exercises

Compute gradient of x[0], ‖x‖₂², ⟪x,y⟫ for x y : Float^[3] and gradient of A[0,1], ‖A‖₂², ⟪A,B⟫ for A B : Float^[2,2]. Also evaluate those results for some concrete values.

Solution

set_default_scalar Float

⊞[1.000000, 0.000000, 0.000000]#eval ∇! (x:=⊞[1.0,2.0,3.0]), x[0]
⊞[2.000000, 4.000000, 6.000000]#eval ∇! (x:=⊞[1.0,2.0,3.0]), ‖x‖₂²
⊞[0.000000, 1.000000, 0.000000]#eval ∇! (x:=⊞[1.0,2.0,3.0]), ⟪x, ⊞[0.0,1.0,0.0]⟫

def matrix1 := ⊞[1.0,2.0;3.0,4.0]

⊞[0.000000, 1.000000, 0.000000, 0.000000]#eval ∇! (A:=matrix1), A[0,1]
⊞[2.000000, 4.000000, 6.000000, 8.000000]#eval ∇! (A:=matrix1), ‖A‖₂²
⊞[0.000000, 1.000000, 0.000000, 0.000000]#eval ∇! (A:=matrix1), ⟪A, ⊞[0.0,1.0;0.0,0.0]⟫

Previously we computed $\sqrt{y}$ using Newton's method. Similarly we can mySqrt Compute sqrt y using gradient descent by minimizing objective function f := fun x => (x^2 - y)^2

Linear regression via gradient descent. Find matrix $ A \in \mathbb{R}^{2\times 2} $ that for given data $ x_i, y_i \in \mathbb{R}^2 $ minimizes

$$A = \text{argmin}_B \sum_i \| B x_i - y_i \|^2 $$

Solution

set_default_scalar Float

partial def linreg {n : ℕ} (x y : Float^[2]^[n]) :
    Float^[2,2] := Id.run do
  let loss := fun (A : Float^[2,2]) =>
    ∑ i, ‖A * x[i] - y[i]‖₂²

  let rate := 1e-1
  let mut A : Float^[2,2] := 0
  let mut err := 1.0

  while err > 1e-6 do
    let ΔA := (rate • ∇ A':=A, loss A')
      rewrite_by simp[loss]X:Type ?u.782inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace ℝ Xinst✝:CompleteSpace Xf:X → ℝx✝¹:Xg:ℝ × ℝ → ℝx₀:Xy✝:Xn:ℕx:Float^[2]^[n]y:Float^[2]^[n]loss:Float^[2, 2] → Float := fun A =>  ∑ i, ‖A * x[i] - y[i]‖₂²rate:Float := 0.1A✝:Float^[2, 2] := 0err✝:Float := 1.0col✝:Lean.Loop := Lean.Loop.mkx✝:Unitr✝:MProd (Float^[2, 2]) FloatA:Float^[2, 2] := r✝.fsterr:Float := r✝.snd| rate • ∇ (A':=A), ∑ i, ‖MatrixType.gemv 1 0 A' x[i] 0 - y[i]‖₂²; autodiffX:Type ?u.782inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace ℝ Xinst✝:CompleteSpace Xf:X → ℝx✝¹:Xg:ℝ × ℝ → ℝx₀:Xy✝:Xn:ℕx:Float^[2]^[n]y:Float^[2]^[n]loss:Float^[2, 2] → Float := fun A =>  ∑ i, ‖A * x[i] - y[i]‖₂²rate:Float := 0.1A✝:Float^[2, 2] := 0err✝:Float := 1.0col✝:Lean.Loop := Lean.Loop.mkx✝:Unitr✝:MProd (Float^[2, 2]) FloatA:Float^[2, 2] := r✝.fsterr:Float := r✝.snd| rate • ∑ i,
    let zdf := x[i];
    let zdf_1 := MatrixType.gemv 1 0 A zdf 0;
    let ydg := y[i];
    let ydg := zdf_1 - ydg;
    let dy := 2 • ydg;
    let dy := MatrixType.outerprodAdd 1 dy zdf 0;
    dy
    err := ‖ΔA‖₂
    A := A - ΔA

  return A

Write down Euler-Lagrange equation over abstract vector space X and show that for lagrangian L x v := 1/2 * m * ‖v‖₂² - φ x the Euler-Langran equation is m * ∂ (∂ x) t = - ∇ φ x

Either define the Lagrangian over ℝ×ℝ, L : ℝ×ℝ → ℝ×ℝ → ℝ or you can introduce abstract vector space X using this variable command

variable {X} [NormedAddCommGroup X] [AdjointSpace ℝ X] [CompleteSpace X]

The explanation of these typeclasses will be discussed in the last section "Abstract Vector Spaces".

Solution

set_default_scalar ℝ

noncomputable
def EulerLagrange (L : X → X → ℝ) (x : ℝ → X) (t : ℝ) :=
  let v := ∂ x
  ∂ (t':=t), (∇ (v':=v t'), L (x t') v')
  -
  ∇ (x':=x t), L x' (v t)

noncomputable
def NewtonsLaw (m : ℝ) (φ : X → ℝ) (x : ℝ → X) (t : ℝ) :=
  m • (∂ (∂ x) t) + (∇ φ (x t))

declaration uses 'sorry'example
    (x : ℝ → X) (hx : ContDiff ℝ ⊤ x)
    (φ : X → ℝ) (hφ : Differentiable ℝ φ) :
    EulerLagrange (fun x v => m/2 * ‖v‖₂² - φ x) x t
    =
    NewtonsLaw m φ x t := byX:Type u_1inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace ℝ Xinst✝:CompleteSpace Xf:X → ℝx✝:Xg:ℝ × ℝ → ℝx₀:Xy:Xm:ℝt:ℝx:ℝ → Xhx:ContDiff ℝ ⊤ xφ:X → ℝhφ:Differentiable ℝ φ⊢ EulerLagrange (fun x v => m / 2 * ‖v‖₂² - φ x) x t = NewtonsLaw m φ x t
  unfold EulerLagrange NewtonsLaw deriv fgradientX:Type u_1inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace ℝ Xinst✝:CompleteSpace Xf:X → ℝx✝:Xg:ℝ × ℝ → ℝx₀:Xy:Xm:ℝt:ℝx:ℝ → Xhx:ContDiff ℝ ⊤ xφ:X → ℝhφ:Differentiable ℝ φ⊢ (let v := fun x_1 => (∂ x x_1) 1;
  (∂ (t':=t), (<∂ (v':=v t'), (fun x v => m / 2 * ‖v‖₂² - φ x) (x t') v').2 1) 1 -
    (<∂ (x':=x t), (fun x v => m / 2 * ‖v‖₂² - φ x) x' (v t)).2 1) =
  m • (∂ (x_1:=t), (∂ x x_1) 1) 1 + (<∂ φ (x t)).2 1
  fun_trans [smul_smul]X:Type u_1inst✝²:NormedAddCommGroup Xinst✝¹:AdjointSpace ℝ Xinst✝:CompleteSpace Xf:X → ℝx✝:Xg:ℝ × ℝ → ℝx₀:Xy:Xm:ℝt:ℝx:ℝ → Xhx:ContDiff ℝ ⊤ xφ:X → ℝhφ:Differentiable ℝ φ⊢ (∂ (t':=t), m • (∂ x t') 1) 1 = m • (∂ (x_1:=t), (∂ x x_1) 1) 1
  sorryAll goals completed! 🐙

2.1.4. Derivative Rules

A commong issue when fun_trans is not doing what we expect is that there is a missing differentiation theorem.

For example, if we define a function

def foo (x : ℝ) := x^2 + x

then nothing happens when we try to differentiate it

fun x => (∂ (x:=x), foo x) 1 : ℝ → ℝ#check ∂! x, foo x

fun x => (∂ (x:=x), foo x) 1 : ℝ → ℝ

Turning on the fun_trans trace reveals useful information

set_option trace.Meta.Tactic.fun_trans true in
fun x => (∂ (x:=x), foo x) 1 : ℝ → ℝ#check ∂! x, foo x

[Meta.Tactic.fun_trans] [❌] ∂ (x:=x), foo x
  [Meta.Tactic.fun_trans] candidate theorems for foo: []
  [Meta.Tactic.fun_trans] candidate local theorems for foo: []
  [Meta.Tactic.fun_trans] candidate fvar theorems: [isContinuousLinearMap_fderiv]
  [Meta.Tactic.fun_trans] [❌] applying: isContinuousLinearMap_fderiv
    [Meta.Tactic.fun_trans] isContinuousLinearMap_fderiv, failed to discharge hypotheses
          SciLean.IsContinuousLinearMap ℝ fun x => foo x

The ❌ on the first line signifies that fun_trans failed to make prograss on ∂ (x:=x), foo x. The next two lines

  [Meta.Tactic.fun_trans] candidate theorems for foo: []
  [Meta.Tactic.fun_trans] candidate local theorems for foo: []

states that there are no derivative theorems for foo. The next line

  [Meta.Tactic.fun_trans] candidate fvar theorems: [isContinuousLinearMap_fderiv]

states that there is a potentially applicable theorem isContinuousLinearMap_fderiv which can differentiate linear functions. However the next few lines report that applying this theorem failed as fun_trans can't prove that foo is (continuous) linear map.

To remedy this problem we can define derivative of foo

def foo_deriv (x : ℝ) := 2*x + 1

and add a theorem that the derivative of foo is equal to foo_deriv

open SciLean
@[fun_trans]
theorem foo_deriv_rule :
    fderiv ℝ foo
    =
    fun x => fun dx =>L[ℝ] dx • foo_deriv x := by⊢ ∂ foo = fun x => fun dx =>L[ℝ] dx • foo_deriv x
  unfold foo foo_deriv⊢ ∂ x, (x ^ 2 + x) = fun x => fun dx =>L[ℝ] dx • (2 * x + 1); ext xh.hx:ℝ⊢ (∂ (x:=x), (x ^ 2 + x)) 1 = (fun dx =>L[ℝ] dx • (2 * x + 1)) 1; fun_transAll goals completed! 🐙

Because foo_deriv_rule is marked with fun_trans attribute it will be used when we try to differentiate foo now

fun x => foo_deriv x : ℝ → ℝ#check ∂! x, foo x

fun x => foo_deriv x : ℝ → ℝ

Unfortuantelly there is one more issue, fun_trans will do nothing when we try to compose foo together

fun x => (∂ (x:=x), foo (foo x)) 1 : ℝ → ℝ#check ∂! x, foo (foo x)

fun x => (∂ (x:=x), foo (foo x)) 1 : ℝ → ℝ

set_option trace.Meta.Tactic.fun_trans true in
fun x => (∂ (x:=x), foo (foo x)) 1 : ℝ → ℝ#check ∂! x, foo (foo x)

...
  [Meta.Tactic.fun_trans] trying comp theorem SciLean.fderiv.comp_rule
  [Meta.Tactic.fun_trans] [❌] applying: SciLean.fderiv.comp_rule
    [Meta.Tactic.fun_trans] SciLean.fderiv.comp_rule, failed to discharge hypotheses
          Differentiable ℝ fun x0 => foo x0
...

The trace reveals that fun_trans tries to apply composition(chain) rule SciLean.fderiv.comp_rule but it fails as it can't prove Differentiable ℝ fun x0 => foo x0. We need another theorem stating that foo is differentiable function. Mathlib has a tactic fun_prop that can prove differentiability and many other function properties like linearity, continuity, measurability etc. and fun_trans uses this tactic to ensure it can apply chain rule.

We need to add fun_prop theorem for foo

@[fun_prop]
theorem foo_differentiable : Differentiable ℝ foo := by⊢ Differentiable ℝ foo unfold foo⊢ Differentiable ℝ fun x => x ^ 2 + x; fun_propAll goals completed! 🐙

Now fun_trans knows that foo is differentiable function and can safely apply chain rule

fun x => foo_deriv x * foo_deriv (foo x) : ℝ → ℝ#check (∂! x, foo (foo x)) rewrite_by simp| fun x => foo_deriv x * foo_deriv (foo x)

fun x => foo_deriv x * foo_deriv (foo x) : ℝ → ℝ

Writing these theorems by hand is quite tedious so there is a macro def_fun_prop and def_fun_trans to help you with writing these theorems

def_fun_prop : Differentiable ℝ foo by unfold foo⊢ Differentiable ℝ fun x => x ^ 2 + x; fun_propAll goals completed! 🐙
def_fun_trans : fderiv ℝ foo by unfold foo| ∂ x, (x ^ 2 + x); fun_trans| fun x => fun x_1 =>L[ℝ] 2 * x_1 * x + x_1

It generates these theorems and definition

foo.arg_x.Differentiable_rule : Differentiable ℝ foo#check foo.arg_x.Differentiable_rule
def foo.arg_x.fderiv : ℝ → ℝ →L[ℝ] ℝ :=
fun x => fun x_1 =>L[ℝ] 2 * x_1 * x + x_1#print foo.arg_x.fderiv
foo.arg_x.fderiv_rule : ∂ foo = foo.arg_x.fderiv#check foo.arg_x.fderiv_rule

The problem of writing appropriate theorems for fun_trans and fun_prop is quite involved problem and will be discussed in future chapter.

2.1.5. Differentiating Division, Log, Sqrt, ...

So far we have worked with functions that are smooth. However, functions like division, log, sqrt, ‖·‖₂ are not differentiable everywhere. We have to be a bit careful with those functions because SciLean tries to perform computations that are, at least in principle, fully formalizable. Let's try to differentiate division

fun x => (∂ (x:=x), x⁻¹) 1 : ℝ → ℝ#check ∂! (x:ℝ), 1/x

fun x => (∂ (x:=x), x⁻¹) 1 : ℝ → ℝ

We did not get expected -x⁻². When differentiation, or any tactic, is not doing what we expect we can turn on tracing. Let's try again with tracing on

set_option trace.Meta.Tactic.fun_trans true in
fun x => (∂ (x:=x), x⁻¹) 1 : ℝ → ℝ#check ∂! (x:ℝ), 1/x

and the beggining of the trace is saying that fun_trans tried to apply theorem HDiv.hDiv.arg_a0a1.fderiv_rule_at however it failed to discharge x ≠ 0

[Meta.Tactic.fun_trans] [❌] ∂ (x:=x), 1 / x
  [Meta.Tactic.fun_trans] candidate theorems for HDiv.hDiv: [HDiv.hDiv.arg_a0a1.fderiv_rule_at]
  [Meta.Tactic.fun_trans] [❌] applying: HDiv.hDiv.arg_a0a1.fderiv_rule_at
    [Meta.Tactic.fun_trans] [❌] discharging: x ≠ 0
    [Meta.Tactic.fun_trans] HDiv.hDiv.arg_a0a1.fderiv_rule_at, failed to discharge hypotheses
          x ≠ 0

This makes sense as division 1/x is well defined and differentiable only away from zero. Therefore we have to differentiate it at a concrete point that is not equal to zero.

variable (x₀ : ℝ) (hx₀ : x₀ ≠ 0)
set_option trace.Meta.Tactic.fun_trans true in
-1 / x₀ ^ 2 : ℝ#check (∂ (x:=x₀), 1/x) rewrite_by fun_trans (disch:=assumptionAll goals completed! 🐙)

-1 / x₀ ^ 2 : ℝ

We introduced a point x₀ and assumption hx₀ that it is not equal to zero. By default fun_trans does not see this assumption and we have to provide discharger. A discharger is any tactic that tries to satisfy(discharge) any assumption of the theorems fun_trans is using. In this simple case assumption tactic is enough as it just looks through the local context and tries to directly apply any existing assumptions.

Using assumption is not enough for a more complicated expression

variable (x₀ : ℝ) (hx₀ : x₀ ≠ 0)
(∂ (x:=x₀), (x ^ 2)⁻¹) 1 : ℝ#check (∂ (x:=x₀), 1/x^2) rewrite_by fun_trans (disch:=assumptionfunProp.dischargerx₀✝:ℝhx₀✝:x₀✝ ≠ 0x₀:ℝhx₀:x₀ ≠ 0⊢ ∀ (x : ℝ), x ≠ 0)

(∂ (x:=x₀), (x ^ 2)⁻¹) 1 : ℝ

tracing shows

[Meta.Tactic.fun_trans] HDiv.hDiv.arg_a0a1.fderiv_rule_at,
  failed to discharge hypotheses x₀ ^ 2 ≠ 0

We need a tactic that is capable of infering (x₀^2 ≠ 0) from (x₀ ≠ 0). A very general and powerful tactic is aesop

variable (x₀ : ℝ) (hx₀ : x₀ ≠ 0)
-(2 * x₀) / (x₀ ^ 2) ^ 2 : ℝ#check (∂ (x:=x₀), 1/x^2) rewrite_by fun_trans (disch:=aesop: maximum number of rule applications (200) reached. Set the 'maxRuleApplications' option to increase the limit.aesop)x₀✝¹:ℝhx₀✝¹:x₀✝¹ ≠ 0x₀✝:ℝhx₀✝:x₀✝ ≠ 0x₀:ℝhx₀:x₀ ≠ 0| -(2 * x₀) / (x₀ ^ 2) ^ 2

-(2 * x₀) / (x₀ ^ 2) ^ 2 : ℝ

In multiple dimensions we often want to differentiate the norm

open SciLean Scalar

variable (x₀ : ℝ×ℝ) (hx₀ : x₀ ≠ 0)

‖x₀‖₂[ℝ]⁻¹ • x₀ : ℝ × ℝ#check (∇ (x:=x₀), ‖x‖₂) rewrite_by unfold fgradientx₀✝²:ℝhx₀✝²:x₀✝² ≠ 0x₀✝¹:ℝhx₀✝¹:x₀✝¹ ≠ 0x₀✝:ℝhx₀✝:x₀✝ ≠ 0x₀:ℝ × ℝhx₀:x₀ ≠ 0| (<∂ (x:=x₀), ‖x‖₂[ℝ]).2 1; fun_trans (disch:=assumptionAll goals completed! 🐙)

The fact that norm is not differentiable can cause issues. The common practice when writing numerical algorithms is to regularize norm using a small positive ε.

open SciLean Scalar
variable (ε : ℝ) (hε : 0 < ε)

fun w => fun x =>L[ℝ] 2 * ⟪x, w⟫ / (2 * sqrt (ε + ‖w‖₂²)) : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ#check (∂ (x:ℝ×ℝ), sqrt (ε + ‖x‖₂²)) rewrite_by
  unfold derivx₀✝²:ℝhx₀✝²:x₀✝² ≠ 0x₀✝¹:ℝhx₀✝¹:x₀✝¹ ≠ 0x₀✝:ℝhx₀✝:x₀✝ ≠ 0x₀:ℝ × ℝhx₀:x₀ ≠ 0ε:ℝhε:0 < ε| ∂ x, sqrt (ε + ‖x‖₂²)
  fun_trans (disch:=intro xfunProp.dischargerx₀✝²:ℝhx₀✝²:x₀✝² ≠ 0x₀✝¹:ℝhx₀✝¹:x₀✝¹ ≠ 0x₀✝:ℝhx₀✝:x₀✝ ≠ 0x₀:ℝ × ℝhx₀:x₀ ≠ 0ε:ℝhε:0 < εx:ℝ × ℝ⊢ ε + ‖x‖₂² ≠ 0; nlinarith [norm2_nonneg ℝ x]All goals completed! 🐙)

fun w => fun x =>L[ℝ] 2 * ⟪x, w⟫ / (2 * sqrt (ε + ‖w‖₂²)) : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ

2.1.5.1. Exercises

gradient of energy for n-body system, newton's potential, lenard jones potential
- do it for two particles
- do it for n-particles
signed distance function
- compute gradient of sphere sdf
- compute mean and gaussian curvature of sphere
- pick SDF from https://iquilezles.org/articles/distfunctions/
  - most of them involve functions that are not differentiable everywhere
  - compute derivative in unsafe mode
  - figure out the minimal condition under which it is differentiable

2.1.6. Abstract Vector Spaces

In calculus we usually consider only functions $(f : \mathbb{R}^n \rightarrow \mathbb{R}^m) $. The issue is that on computers the type $ \mathbb{R}^n $ can have multiple different realizations. For example $ \mathbb{R}^3 $ can be modeled by Float×Float×Float, Float^[3] or Float×Float^[2]. They are all equivalent but in code we have to explicitely convert between these types. For this reason it is better to work with abstract vector spaces instead of with $ \mathbb{R}^n $.

Fortunately mathlib's derivative fderiv is already defined for a function (f : X → Y) between two abstract vector spaces X and Y over a field 𝕜. Mathlib's way of introducing an abstract vector space is rather involved and we need to spend some time talking about it. This presentation will be rather simplified. For interested reader we provide references at the end of this section that go over mathlib's algebraic stracutes in more detail.

A vector space X is a set with operations +,-,•,0 such that

  ∀ (x y z : X), x + (y + z) = (x + y) + z
  ∀ (x y : X), x + y = y + x
  ∀ (x : X), x + 0 = 0
  ∀ (x : X), x + (-x) = 0

  ∀ (r s : 𝕜) (x : X), r • (s • x) = (r * s) • x(
  ∀ (x : X), 1 • x = x
  ∀ (r : 𝕜) (x y : X), r • (x + y) = r • x + r • y
  ∀ (r s : 𝕜) (x : X), (r + s) • x = r • x + s • x

in mathlib the axioms talking about addition and negation are captured by the type class AddCommGroup and the aximps talking about scalar multiplication are captured by the type class Module. Therefore if we want to introduce a new abstract vector space over a field R we have to introduce these variables

variable
  {𝕜 : Type} [Field 𝕜]
  {X : Type} [AddCommGroup X] [Module 𝕜 X]

example (s r : 𝕜) (x y : X) :
    (s + r) • (x + y) = s • x + r • x + s • y + r • y := byx₀✝²:ℝhx₀✝²:x₀✝² ≠ 0x₀✝¹:ℝhx₀✝¹:x₀✝¹ ≠ 0x₀✝:ℝhx₀✝:x₀✝ ≠ 0x₀:ℝ × ℝhx₀:x₀ ≠ 0ε:ℝhε:0 < ε𝕜:Typeinst✝²:Field 𝕜X:Typeinst✝¹:AddCommGroup Xinst✝:Module 𝕜 Xs:𝕜r:𝕜x:Xy:X⊢ (s + r) • (x + y) = s • x + r • x + s • y + r • y
  simp only [add_smul,smul_add,add_assoc]All goals completed! 🐙

When we want to differentiate a function (f : X → Y) between two vector spaces we also need that X and Y are equiped with a norm. For this purpose there is NormedAddCommGroup which equips AddCommGroup with a norm and guarantees that it compatible with addition and negation, and NormedSpace which equips Module with a norm and guarentees that it is compatible with scalar multiplication. Furthermore, we have to restric to a filed 𝕜 that is either real numbers ℝ or complex numbers ℂ. The type class RCLike states exactly that. Therefore when we work with derivative in general setting the code usually looks like this

variable
  {𝕜 : Type} [RCLike 𝕜]
  {X : Type} [NormedAddCommGroup X] [NormedSpace 𝕜 X]
  {Y : Type} [NormedAddCommGroup Y] [NormedSpace 𝕜 Y]

set_default_scalar 𝕜

example (f g : X → Y) (hf : Differentiable 𝕜 f) (hg : Differentiable 𝕜 g) :
    (∂ x, (f x + g x)) = (∂ f) + (∂ g) := byx₀✝²:ℝhx₀✝²:x₀✝² ≠ 0x₀✝¹:ℝhx₀✝¹:x₀✝¹ ≠ 0x₀✝:ℝhx₀✝:x₀✝ ≠ 0x₀:ℝ × ℝhx₀:x₀ ≠ 0ε:ℝhε:0 < ε𝕜:Typeinst✝⁴:RCLike 𝕜X:Typeinst✝³:NormedAddCommGroup Xinst✝²:NormedSpace 𝕜 XY:Typeinst✝¹:NormedAddCommGroup Yinst✝:NormedSpace 𝕜 Yf:X → Yg:X → Yhf:Differentiable 𝕜 fhg:Differentiable 𝕜 g⊢ ∂ x, (f x + g x) = ∂ f + ∂ g ext x dxh.hx₀✝²:ℝhx₀✝²:x₀✝² ≠ 0x₀✝¹:ℝhx₀✝¹:x₀✝¹ ≠ 0x₀✝:ℝhx₀✝:x₀✝ ≠ 0x₀:ℝ × ℝhx₀:x₀ ≠ 0ε:ℝhε:0 < ε𝕜:Typeinst✝⁴:RCLike 𝕜X:Typeinst✝³:NormedAddCommGroup Xinst✝²:NormedSpace 𝕜 XY:Typeinst✝¹:NormedAddCommGroup Yinst✝:NormedSpace 𝕜 Yf:X → Yg:X → Yhf:Differentiable 𝕜 fhg:Differentiable 𝕜 gx:Xdx:X⊢ (∂ (x:=x), (f x + g x)) dx = ((∂ f + ∂ g) x) dx; fun_transAll goals completed! 🐙

When working with gradients we also need inner product as adjoint is defined through inner product. Unfortunately, here we diverge from mathlib a little bit. Mathlib defines InnerProductSpace which equips NormedSpace with inner product. Understandably InnerProductSpace requires that the ⟪x,x⟫ = ‖x‖² however mathlib made the unfortunate decision by definin norm on produce spaces as ‖(x,y)‖ = max ‖x‖ ‖y‖ which is incompatible with the inner product structure. Therefore type like ℝ×ℝ can't be equiped with InnerProductSpace. Because of these issues, SciLean introduces AdjointSpace which is almost identical to InnerProductSpace but it only requires that the norm induced by inner product is equivalend to the existing norm i.e. ∃ (c d : ℝ⁺), ∀ x, c * ⟪x,x⟫ ≤ ‖x‖^2 ≤ d * ⟪x,x⟫. On AdjointSpace SciLean introduces L₂-norm ‖x‖₂ := sqrt ⟪x,x⟫ which you have seen already and it is the norm you most likely want to use instead of the default norm ‖x‖. Therfore when we work with gradient in general setting the code usually looks like this

open SciLean
variable
  {𝕜 : Type} [RCLike 𝕜]
  {X : Type} [NormedAddCommGroup X] [AdjointSpace 𝕜 X] [CompleteSpace X]
  {Y : Type} [NormedAddCommGroup Y] [AdjointSpace 𝕜 Y] [CompleteSpace Y]

set_default_scalar 𝕜

example (f g : X → Y) (hf : Differentiable 𝕜 f) (hg : Differentiable 𝕜 g) :
    (∇ x, (f x + g x)) = (∇ f) + (∇ g) := byx₀✝²:ℝhx₀✝²:x₀✝² ≠ 0x₀✝¹:ℝhx₀✝¹:x₀✝¹ ≠ 0x₀✝:ℝhx₀✝:x₀✝ ≠ 0x₀:ℝ × ℝhx₀:x₀ ≠ 0ε:ℝhε:0 < ε𝕜:Typeinst✝⁶:RCLike 𝕜X:Typeinst✝⁵:NormedAddCommGroup Xinst✝⁴:AdjointSpace 𝕜 Xinst✝³:CompleteSpace XY:Typeinst✝²:NormedAddCommGroup Yinst✝¹:AdjointSpace 𝕜 Yinst✝:CompleteSpace Yf:X → Yg:X → Yhf:Differentiable 𝕜 fhg:Differentiable 𝕜 g⊢ ∇ x, (f x + g x) = ∇ f + ∇ g
  ext xh.hx₀✝²:ℝhx₀✝²:x₀✝² ≠ 0x₀✝¹:ℝhx₀✝¹:x₀✝¹ ≠ 0x₀✝:ℝhx₀✝:x₀✝ ≠ 0x₀:ℝ × ℝhx₀:x₀ ≠ 0ε:ℝhε:0 < ε𝕜:Typeinst✝⁶:RCLike 𝕜X:Typeinst✝⁵:NormedAddCommGroup Xinst✝⁴:AdjointSpace 𝕜 Xinst✝³:CompleteSpace XY:Typeinst✝²:NormedAddCommGroup Yinst✝¹:AdjointSpace 𝕜 Yinst✝:CompleteSpace Yf:X → Yg:X → Yhf:Differentiable 𝕜 fhg:Differentiable 𝕜 gx:Xx✝:Y⊢ ∇ (x:=x;x✝), (f x + g x) = (∇ f + ∇ g) x x✝; autodiffx₀✝²:ℝhx₀✝²:x₀✝² ≠ 0x₀✝¹:ℝhx₀✝¹:x₀✝¹ ≠ 0x₀✝:ℝhx₀✝:x₀✝ ≠ 0x₀:ℝ × ℝhx₀:x₀ ≠ 0ε:ℝhε:0 < ε𝕜:Typeinst✝⁶:RCLike 𝕜X:Typeinst✝⁵:NormedAddCommGroup Xinst✝⁴:AdjointSpace 𝕜 Xinst✝³:CompleteSpace XY:Typeinst✝²:NormedAddCommGroup Yinst✝¹:AdjointSpace 𝕜 Yinst✝:CompleteSpace Yf:X → Yg:X → Yhf:Differentiable 𝕜 fhg:Differentiable 𝕜 gx:Xx✝:Y⊢ let ydf := <∂ f x;
let ydg := <∂ g x;
ydf.2 x✝ + ydg.2 x✝ = (<∂ f x).2 x✝ + (<∂ g x).2 x✝; rflAll goals completed! 🐙

For interested reader we recommend reading the chapter Hierachies from Mathematics in Lean which explains how mathlib approaches algebraic hierachies like monoids, groups or modules. After reading that we recommend reading Differential Calculus in Normed Spaces which how NormedSpace is structured.