Symbolic Differentiation
Differentiation is at the heart of many problems in scientific computing, from optimizing functions to solving differential equations. Computing derivatives is a mechanical process but very error prone when done manually. In this chapter, we will explore how we can compute derivatives of expressions and programs automatically.
Let's recall what a derivative is: for a function f : ℝ → ℝ we define its derivative as
$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} $$
The mathematical library Mathlib defines the derivative fderiv for a general function f : X → Y between vector spaces X and Y. The derivative of f at a point x : X in a direction dx : X is
$$\texttt{fderiv ℝ f x dx} = \lim_{h \to 0} \frac{f(x+h\cdot dx) - f(x)}{h} $$
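For scalar functions this general derivative agrees with the ordinary one: in mathlib, deriv is by definition fderiv evaluated in the direction 1, so the following quick check should go through by rfl:

example (f : ℝ → ℝ) (x : ℝ) : deriv f x = fderiv ℝ f x 1 := rfl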
SciLean provides the tactic fun_trans, which can compute derivatives. The name fun_trans stands for 'function transformation', and differentiation is just one example of a function transformation. We will talk about general function transformations later in the book.
One of the simplest examples of a derivative is the derivative of \(x^n\), which is equal to \(n x^{n-1}\). Let's compute it using the fun_trans tactic
variable (n : Nat) (x : ℝ)

#check (fderiv ℝ (fun (x : ℝ) => x^n) x 1) rewrite_by fun_trans

↑n * x ^ (n - 1) : ℝ
Try this in an interactive code editor: differentiate more complicated expressions involving sin, cos, exp, and so on, but be careful with log or division, which are not differentiable everywhere; we will come back to them later in this chapter.
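For instance, a sketch like the following should simplify via the product and chain rules; exp and sin here are assumed to come from SciLean's Scalar namespace:

open SciLean Scalar in
#check (fderiv ℝ (fun (x : ℝ) => x^2 * exp (sin x)) x 1) rewrite_by fun_trans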
Notation
Writing fderiv ℝ (fun x => f x) is somewhat tedious, so SciLean makes our life easier by introducing the notation ∂ x, f x for differentiating (f x) w.r.t. x.
Before we explore this notation further we have to mention that fderiv can also compute complex derivatives with @fderiv ℂ instead of @fderiv ℝ. However, most of the time we work exclusively with the real derivative, so we can inform Lean that the default choice of the scalar field is the real numbers using the following command
set_default_scalar β
Now Lean knows that we want the real derivative when we write ∂ x, f x.
Using this notation we can compute the above derivative again
variable (n : Nat)

#check (∂ (x : ℝ), x^n) rewrite_by fun_trans

fun x => ↑n * x ^ (n - 1) : ℝ → ℝ
Because we did not specify the point at which we want to compute the derivative, we obtained a function in x. We can specify the point with ∂ (x:=x₀), ...
variable (n : Nat) (x₀ : ℝ)

#check (∂ (x:=x₀), x^n) rewrite_by fun_trans

↑n * x₀ ^ (n - 1) : ℝ
Writing rewrite_by fun_trans every time we want to differentiate an expression gets a bit tedious. We can add an exclamation mark after ∂ to indicate that we want to run the fun_trans tactic to compute the derivative.
variable (n : Nat)

#check (∂! (x : ℝ), x^n)

fun x => ↑n * x ^ (n - 1) : ℝ → ℝ
We can differentiate w.r.t. a vector-valued variable (x : ℝ×ℝ)

#check ∂! (x : ℝ×ℝ), ‖x‖₂²

fun x => fun dx =>L[ℝ] 2 * ⟪dx, x⟫_ℝ : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ
For derivatives w.r.t. a vector-valued variable we also have to specify the direction in which we differentiate. Therefore in the above we obtained the derivative as a function of the position x and the direction dx. Furthermore, the notation fun dx =>L[ℝ] ... indicates that the function is linear in dx, and similarly X →L[ℝ] Y stands for the space of ℝ-linear functions from X to Y.
If we want to specify both the position and the direction in which we compute the derivative, we use the notation ∂ (x:=x₀;dx₀), f x
variable (x₀ dx₀ : ℝ×ℝ)

#check ∂! (x:=x₀;dx₀), ‖x‖₂²

2 * ⟪dx₀, x₀⟫_ℝ : ℝ
To summarize all the different variants: for a function of a scalar-valued argument
variable (f : ℝ → ℝ) (x₀ : ℝ)

#check ∂ f
#check ∂ x, f x
#check ∂ (x:=x₀), f x
For a function of a vector-valued argument
variable (f : ℝ×ℝ → ℝ) (x₀ dx₀ : ℝ×ℝ)

#check ∂ f
#check ∂ x, f x
#check ∂ (x:=x₀), f x
#check ∂ (x:=x₀;dx₀), f x
There is nothing stopping us from applying the derivative multiple times to compute higher-order derivatives
#check (∂! (∂! (x:ℝ), x^2))

fun x => 2 : ℝ → ℝ
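As a sketch, nesting once more should also work; assuming fun_trans fully simplifies the numeral exponents, applying ∂! three times to x^3 should produce the constant function 6:

#check (∂! (∂! (∂! (x:ℝ), x^3)))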
Exercises
- Express the first derivative of f : ℝ → ℝ → ℝ in the first and the second argument. Also express the derivative in both arguments at the same time.
variable (f : ℝ → ℝ → ℝ) (x₀ y₀ : ℝ)

-- first argument
#check ∂ (x:=x₀), (f x y₀)
#check ∂ (f · y₀) x₀

-- second argument
#check ∂ (y:=y₀), (f x₀ y)
#check ∂ (f x₀ ·) y₀

-- both arguments
#check ∂ ((x,y):=(x₀,y₀)), f x y
- For (g : ℝ×ℝ → ℝ), express the derivative of g (x,y) in x.
variable (g : ℝ×ℝ → ℝ) (x₀ y₀ : ℝ)

#check ∂ (xy:=(x₀,y₀);(1,0)), g xy
#check ∂ g (x₀,y₀) (1,0)
#check ∂ (x:=x₀), g (x,y₀)
- Express the second derivatives of f : ℝ → ℝ → ℝ in the first and the second argument.
variable (f : ℝ → ℝ → ℝ) (x₀ y₀ : ℝ)

#check ∂ (x':=x₀), ∂ (x'':=x'), (f x'' y₀)
#check ∂ (∂ (f · y₀)) x₀
#check ∂ (y':=y₀), ∂ (y'':=y'), (f x₀ y'')
#check ∂ (∂ (f x₀ ·)) y₀
- Let \(L(t,x)\) be a function of time and space and y(t) a function of time. Express \( \frac{d}{dt} L(t, y(t)) \) and \( \frac{\partial}{\partial t} L(t, y(t)) \) in Lean. What is the difference between these two expressions?
variable (L : ℝ → ℝ → ℝ) (y : ℝ → ℝ) (t : ℝ)

-- d/dt L
#check ∂ (t':=t), L t' (y t')

-- ∂/∂t L
#check ∂ (t':=t), L t' (y t)
Because SciLean's notation forces you to be a bit more explicit, there is no need to distinguish between \( \frac{d}{dt} \) and \( \frac{\partial}{\partial t} \). Lots of pain and suffering has been inflicted on generations of physics students because of the ambiguity of partial derivative notation.
- Express the one-dimensional Euler-Lagrange equation in Lean
$$\frac{d}{dt} \frac{\partial L}{\partial \dot x}(x(t),\dot x(t)) - \frac{\partial L}{\partial x}(x(t), \dot x(t)) $$
variable (L : ℝ → ℝ → ℝ) (x : ℝ → ℝ) (t : ℝ)

#check
  let v := ∂ x
  ∂ (t':=t), (∂ (v':=v t'), L (x t') v') - ∂ (x':=x t), L x' (v t)
Examples
Let's put computing derivatives to some practical use. We will demonstrate how to use SciLean's symbolic differentiation to solve a few common tasks in scientific computing and physics.
Newton's solver
Solving a non-linear equation \( f(x) = 0 \) is a very common problem in scientific computing. Often this can be done only approximately, and a popular tool is Newton's method. It is an iterative process that starts with an initial guess \(x_0\), which is incrementally improved by the following rule
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} $$
We can use Newton's method to compute the square root of \(y\) by choosing \( f(x) = x^2 - y\).
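Plugging this choice of f into the update rule recovers the classical Babylonian (Heron's) iteration:
$$x_{n+1} = x_n - \frac{x_n^2 - y}{2 x_n} = \frac{1}{2}\left(x_n + \frac{y}{x_n}\right) $$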
def mySqrt (steps : Nat) (y : Float) : Float := Id.run do
  let f := fun x => x^2 - y
  let mut x := 1.0
  for _ in [0:steps] do
    x := x - f x / ((deriv f x) rewrite_by fun_trans)
  return x
#eval mySqrt 10 2.0
1.414214
In mySqrt we should use the (∂! f x) notation, but unfortunately it is currently broken for some reason.
You might feel a bit uncomfortable here, as we are differentiating a function defined on floating-point numbers. If you think that this can't be formally correct, you are right. We will discuss this issue in the later chapter "Working with Real Numbers".
Exercises
- Try to solve different equations, for example exp x = y to obtain log, x*exp x = y to obtain the Lambert W function, or some polynomial.
- Measure the relative error \(\left| \frac{f(x_n)}{x_n} \right| \) and the absolute error \( \left| f(x_n) \right| \) and use them as stopping criteria; a sketch appears after this list.
- A difficult exercise is to define a general newtonSolve function that takes an arbitrary function f : Float → Float and during elaboration synthesizes its derivative. Hints: 1. use the infer_var trick, 2. state explicitly what the arguments should look like
set_default_scalar Float

def newtonSolve (steps : Nat) (x₀ : Float)
    (f : Float → Float) {f' : Float → Float}
    (hf : f' = (∂ f) := by unfold deriv; fun_trans; infer_var) : Float := Id.run do
  let mut x := x₀
  for _ in [0:steps] do
    x := x - f x / f' x
  return x
#eval newtonSolve 10 1.0 (fun x => x^2 - 2.0)
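As a sketch for the stopping-criterion exercise, one can bail out of the loop once the absolute error drops below a tolerance. The name mySqrtTol, the hard-coded derivative 2*x (computed symbolically above), and the iteration cap 100 are all illustrative choices:

def mySqrtTol (tol : Float) (y : Float) : Float := Id.run do
  let f := fun x => x^2 - y
  let mut x := 1.0
  for _ in [0:100] do
    -- absolute error |f xₙ| as stopping criterion
    if Float.abs (f x) < tol then
      return x
    -- Newton step with the derivative 2*x
    x := x - f x / (2*x)
  return x

#eval mySqrtTol 1e-10 2.0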
Kinematics
We can also use SciLean's symbolic differentiation to prove some basic theorems from physics. For example, we can state Newton's second law
def NewtonSecondLaw (m : ℝ) (x : ℝ → ℝ) (F : ℝ → ℝ) : Prop :=
  ∀ t, m * deriv (∂ x) t = F t
saying that a particle with mass m under the influence of a force F has trajectory x if the mass times the acceleration deriv (∂ x) t, i.e. the second derivative of the trajectory, is equal to the force F t.
We can show that under a constant force f a particle with mass m has the trajectory (fun t => 1/2 * f/m * t^2)
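The computation behind the proof is elementary: differentiating the trajectory twice gives
$$\frac{d^2}{dt^2}\left(\frac{1}{2}\frac{f}{m}t^2\right) = \frac{f}{m}, \qquad m \cdot \frac{f}{m} = f $$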
example (m f : ℝ) (hm : m ≠ 0) :
    NewtonSecondLaw m (fun t => 1/2 * f/m * t^2) (fun _ => f) := by
  unfold NewtonSecondLaw
  -- ⊢ ∀ (t : ℝ), m * deriv (∂ t, 1 / 2 * f / m * t ^ 2) t = (fun x => f) t
  fun_trans    -- compute derivatives
  -- ⊢ m * (2⁻¹ * f / m * 2) = f
  field_simp   -- finish algebraic simplifications
  -- ⊢ m * (f * 2) = f * (2 * m)
  ring
Warning: currently the tactic fun_trans uses theorems that are not fully proven, therefore the above proof is not a completely formal proof. If you do not like this, you are welcome to improve SciLean by helping to prove its theorems. Lots of them should just be a matter of finding the right mathlib theorem.
Exercises
- Show that the trajectory x := fun t => (cos t, sin t) satisfies the differential equation deriv x t = (- (x t).2, (x t).1)
open SciLean Scalar

def ode (x : ℝ → ℝ×ℝ) := ∀ t, deriv x t = (- (x t).2, (x t).1)

example : ode (fun t => (cos t, sin t)) := by
  unfold ode deriv; fun_trans
- Show that the trajectory \(x(t) = \sin(\omega t) \) corresponds to the force \(f(x) = - k x \) with \(\omega = \sqrt{k/m} \).
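The check behind this exercise is the computation
$$m \frac{d^2}{dt^2}\sin(\omega t) = -m\,\omega^2 \sin(\omega t) = -k\,\sin(\omega t) \quad \text{for } \omega^2 = k/m $$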
After differentiation you will have to show that \(m \sqrt{\frac{k}{m}}^2 = k\). Unfortunately, Lean is not yet a very powerful computer algebra system, so you can finish the proof with
ring_nf -- m * (sqrt (k / m) * (sqrt (k / m) * ...)) ==> m * sqrt (k * m⁻¹) ^ 2 * ...
have h : m * sqrt (k * m⁻¹) ^ 2 = k := sorry_proof
simp[h]

where we call ring_nf to clean up the expression, then we just assume that m * sqrt (k * m⁻¹) ^ 2 is equal to k, and finally we finish the proof by running simp
open SciLean Scalar

example (m k : ℝ) :
    let ω := sqrt (k/m)
    let x := fun t => sin (ω*t)
    NewtonSecondLaw m x (fun t => - k*x t) := by
  unfold NewtonSecondLaw
  fun_trans
  -- ⊢ ∀ (t : ℝ), m * (sqrt (k / m) * (sqrt (k / m) * sin (sqrt (k / m) * t)))
  --            = k * sin (sqrt (k / m) * t)
  ring_nf
  -- ⊢ ∀ (t : ℝ), m * sqrt (k * m⁻¹) ^ 2 * sin (sqrt (k * m⁻¹) * t)
  --            = k * sin (sqrt (k * m⁻¹) * t)
  have h : m * sqrt (k * m⁻¹) ^ 2 = k := sorry_proof
  simp[h]
- Show that \(u(t,x) = \sin(x-t)\) is a solution to the wave equation
$$\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2} $$
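Both sides indeed agree, since for \(u(t,x)=\sin(x-t)\)
$$\frac{\partial^2 u}{\partial t^2} = -\sin(x-t) = \frac{\partial^2 u}{\partial x^2} $$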
open SciLean Scalar

def WaveEquation (u : ℝ → ℝ → ℝ) :=
  ∀ x t, (∂ (∂ (u · x)) t) = (∂ (∂ (u t ·)) x)
example :
    WaveEquation (fun t x => sin (x - t)) := by
  unfold WaveEquation
  fun_trans
- Similarly, find and verify a solution to the heat equation.
Gradient
In many practical applications we need to compute the gradient instead of the directional derivative. For a function \(f : \mathbb{R}^n \rightarrow \mathbb{R} \) the gradient of \(f\) is the vector of all its partial derivatives
$$\nabla f = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right) $$
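For example, for \(f(x_1,x_2) = x_1^2 + x_2^2\) this gives \(\nabla f = (2x_1, 2x_2)\).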
A more general way of defining the gradient is through linear map transposition/adjoint. The derivative of a function (f : X → ℝ) at a point x is a linear map from X to ℝ
open SciLean

variable {X} [NormedAddCommGroup X] [AdjointSpace ℝ X] [CompleteSpace X]
variable (f : X → ℝ) (x : X)

#check (∂ f x)

∂ f x : X →L[ℝ] ℝ
To obtain the gradient we take the adjoint and evaluate it at one. This is exactly how the gradient is defined.
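In symbols, the gradient is the adjoint of the derivative evaluated at one; equivalently, it is the unique vector representing the derivative through the inner product:
$$\nabla f(x) := \left(\partial f(x)\right)^* 1, \qquad \langle \nabla f(x), dx \rangle = \partial f(x)\, dx $$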
variable (f : X → ℝ) (x : X)

example : (∇ f x) = adjoint ℝ (∂ f x) 1 := by rfl
This coincides with the standard notion of the gradient as the vector of all partial derivatives. For example, for n=2 we have

variable {n : Nat} (f : ℝ×ℝ → ℝ) (hf : Differentiable ℝ f) (x y : ℝ)

example : (∇ f (x,y)) = (∂ (x':=x), f (x',y), ∂ (y':=y), f (x,y')) := sorry_proof
Warning for mathlib users: SciLean defines its own version of adjoint and gradient. The reason is that the product type ℝ×ℝ and the function type Fin n → ℝ are not InnerProductSpace, and therefore it is impossible to use mathlib's gradient on functions of type ℝ×ℝ → ℝ or (Fin n → ℝ) → ℝ. Mathlib's advice is to use WithLp 2 (ℝ×ℝ) or EuclideanSpace; however this is seriously inconvenient for people who just want to write some code.
SciLean's solution is to introduce a new typeclass AdjointSpace ℝ X that is almost the same as InnerProductSpace ℝ X but only requires that the norm induced by the inner product, ‖x‖₂ = √⟪x,x⟫, is topologically equivalent to the norm ‖x‖. This way we can provide instances of AdjointSpace ℝ (X×Y) and AdjointSpace ℝ (ι → X) without causing issues.
A few examples of computing gradients:
variable (x₀ : ℝ×ℝ)

#check ∇! (x:=x₀), x.1

(1, 0) : ℝ × ℝ

variable (x₀ : ℝ×ℝ)

#check ∇! (x:=x₀), ‖x‖₂²

2 • x₀ : ℝ × ℝ

variable (x₀ y : ℝ×ℝ)

#check ∇! (x:=x₀), ⟪x,y⟫

y : ℝ × ℝ
Exercises
- Compute the gradient of x[0], ‖x‖₂², ⟪x,y⟫ for x y : Float^[3] and the gradient of A[0,1], ‖A‖₂², ⟪A,B⟫ for A B : Float^[2,2]. Also evaluate those results for some concrete values.
set_default_scalar Float

#eval ∇! (x:=⊞[1.0,2.0,3.0]), x[0]
#eval ∇! (x:=⊞[1.0,2.0,3.0]), ‖x‖₂²
#eval ∇! (x:=⊞[1.0,2.0,3.0]), ⟪x, ⊞[0.0,1.0,0.0]⟫

def matrix1 := ⊞[1.0,2.0;3.0,4.0]

#eval ∇! (A:=matrix1), A[0,1]
#eval ∇! (A:=matrix1), ‖A‖₂²
#eval ∇! (A:=matrix1), ⟪A, ⊞[0.0,1.0;0.0,0.0]⟫
- Previously we computed \(\sqrt{y}\) using Newton's method. Similarly, modify mySqrt to compute sqrt y using gradient descent, minimizing the objective function f := fun x => (x^2 - y)^2.
- Linear regression via gradient descent. Find the matrix \( A \in \mathbb{R}^{2\times 2} \) that for given data \( x_i, y_i \in \mathbb{R}^2 \) minimizes
$$A = \text{argmin}_B \sum_i \| B x_i - y_i \|^2 $$
set_default_scalar Float

def linreg {n : ℕ} (x y : Float^[2]^[n]) : Float^[2,2] :=
  let loss := fun (A : Float^[2,2]) =>
    ∑ i, ‖(⊞ i' => ∑ j, A[i',j] * x[i][j]) - y[i]‖₂²
  sorry
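As a hint, differentiating the loss by hand gives the usual least-squares gradient, which the gradient-descent loop should follow:
$$\nabla_B \sum_i \| B x_i - y_i \|^2 = 2 \sum_i \left( B x_i - y_i \right) x_i^\top $$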
- Write down the Euler-Lagrange equation over an abstract vector space X and show that for the Lagrangian L x v := 1/2 * m * ‖v‖₂² - φ x the Euler-Lagrange equation is m * ∂ (∂ x) t = - ∇ φ x. Either define the Lagrangian over ℝ×ℝ, L : ℝ×ℝ → ℝ×ℝ → ℝ, or introduce an abstract vector space X using this variable command
variable {X} [NormedAddCommGroup X] [AdjointSpace ℝ X] [CompleteSpace X]
These typeclasses will be explained in the last section, "Abstract Vector Spaces".
set_default_scalar ℝ

noncomputable
def EulerLagrange (L : X → X → ℝ) (x : ℝ → X) (t : ℝ) :=
  let v := ∂ x
  ∂ (t':=t), (∇ (v':=v t'), L (x t') v') - ∇ (x':=x t), L x' (v t)

noncomputable
def NewtonsLaw (m : ℝ) (φ : X → ℝ) (x : ℝ → X) (t : ℝ) :=
  m • (∂ (∂ x) t) + (∇ φ (x t))

-- example
--     (x : ℝ → X) (hx : ContDiff ℝ ⊤ x)
--     (φ : X → ℝ) (hφ : Differentiable ℝ φ) :
--     EulerLagrange (fun x v => m/2 * ‖v‖₂² - φ x) x t
--     =
--     NewtonsLaw m φ x t := by
--   unfold EulerLagrange NewtonsLaw deriv fgradient; fun_trans [smul_smul]
--   sorry
Derivative Rules
A common issue when fun_trans is not doing what we expect is a missing differentiation theorem.
For example, if we define a function
def foo (x : ℝ) := x^2 + x
then nothing happens when we try to differentiate it
#check ∂! x, foo x

fun x => (∂ (x:=x), foo x) 1 : ℝ → ℝ
Turning on the fun_trans
trace reveals useful information
set_option trace.Meta.Tactic.fun_trans true in
#check ∂! x, foo x

[Meta.Tactic.fun_trans] [❌] ∂ (x:=x), foo x
[Meta.Tactic.fun_trans] candidate theorems for foo: []
[Meta.Tactic.fun_trans] candidate local theorems for foo: []
[Meta.Tactic.fun_trans] candidate fvar theorems: [isContinuousLinearMap_fderiv]
[Meta.Tactic.fun_trans] [❌] applying: isContinuousLinearMap_fderiv
[Meta.Tactic.fun_trans] isContinuousLinearMap_fderiv, failed to discharge hypotheses
      SciLean.IsContinuousLinearMap ℝ fun x => foo x
The ❌ on the first line signifies that fun_trans failed to make progress on ∂ (x:=x), foo x. The next two lines

[Meta.Tactic.fun_trans] candidate theorems for foo: []
[Meta.Tactic.fun_trans] candidate local theorems for foo: []

state that there are no derivative theorems for foo. The next line
[Meta.Tactic.fun_trans] candidate fvar theorems: [isContinuousLinearMap_fderiv]
states that there is one potentially applicable theorem, isContinuousLinearMap_fderiv, which can differentiate linear functions. However, the next few lines report that applying this theorem failed, as fun_trans can't prove that foo is a (continuous) linear map.
To remedy this problem we can define the derivative of foo

def foo_deriv (x : ℝ) := 2*x + 1
and add a theorem that the derivative of foo
is equal to foo_deriv
open SciLean

@[fun_trans]
theorem foo_deriv_rule :
    fderiv ℝ foo
    =
    fun x => fun dx =>L[ℝ] dx • foo_deriv x := by
  unfold foo foo_deriv; fun_trans
Because foo_deriv_rule is marked with the fun_trans attribute, it will now be used when we differentiate foo
#check ∂! x, foo x

fun x => foo_deriv x : ℝ → ℝ
Unfortunately there is one more issue: fun_trans will do nothing when we try to compose foo with itself

#check ∂! x, foo (foo x)

fun x => (∂ (x:=x), foo (foo x)) 1 : ℝ → ℝ
set_option trace.Meta.Tactic.fun_trans true in
#check ∂! x, foo (foo x)

...
[Meta.Tactic.fun_trans] trying comp theorem SciLean.fderiv.comp_rule
[Meta.Tactic.fun_trans] [❌] applying: SciLean.fderiv.comp_rule
[Meta.Tactic.fun_trans] SciLean.fderiv.comp_rule, failed to discharge hypotheses
      Differentiable ℝ fun x0 => foo x0
...
The trace reveals that fun_trans tries to apply the composition (chain) rule SciLean.fderiv.comp_rule but fails, as it can't prove Differentiable ℝ fun x0 => foo x0. We need another theorem stating that foo is a differentiable function. Mathlib has a tactic fun_prop that can prove differentiability and many other function properties like linearity, continuity, measurability etc., and fun_trans uses this tactic to ensure it can apply the chain rule.
We need to add a fun_prop theorem for foo

@[fun_prop]
theorem foo_differentiable : Differentiable ℝ foo := by
  unfold foo; fun_prop
Now fun_trans knows that foo is a differentiable function and can safely apply the chain rule
#check (∂! x, foo (foo x))

fun x => foo_deriv x * foo_deriv (foo x) : ℝ → ℝ
Writing these theorems by hand is quite tedious, so there are macros def_fun_prop and def_fun_trans to help you with writing them
def_fun_prop : Differentiable ℝ foo by
  unfold foo; fun_prop

def_fun_trans : fderiv ℝ foo by
  unfold foo; fun_trans
They generate the following theorems and definition
#check foo.arg_x.Differentiable_rule
#print foo.arg_x.fderiv
#check foo.arg_x.fderiv_rule
The problem of writing appropriate theorems for fun_trans and fun_prop is quite involved and will be discussed in a future chapter.
Differentiating Division, Log, Sqrt, ...
So far we have worked with functions that are smooth. However, functions like division, log, sqrt, or ‖·‖₂ are not differentiable everywhere. We have to be a bit careful with these functions because SciLean tries to perform computations that are, at least in principle, fully formalizable. Let's try to differentiate division
#check ∂! (x:ℝ), 1/x

fun x => (∂ (x:=x), x⁻¹) 1 : ℝ → ℝ
We did not get the expected -x⁻². When differentiation, or any tactic, is not doing what we expect, we can turn on tracing. Let's try again with tracing on
set_option trace.Meta.Tactic.fun_trans true in
#check ∂! (x:ℝ), 1/x
and the beginning of the trace says that fun_trans tried to apply the theorem HDiv.hDiv.arg_a0a1.fderiv_rule_at but failed to discharge x ≠ 0

[Meta.Tactic.fun_trans] [❌] ∂ (x:=x), 1 / x
[Meta.Tactic.fun_trans] candidate theorems for HDiv.hDiv: [HDiv.hDiv.arg_a0a1.fderiv_rule_at]
[Meta.Tactic.fun_trans] [❌] applying: HDiv.hDiv.arg_a0a1.fderiv_rule_at
[Meta.Tactic.fun_trans] [❌] discharging: x ≠ 0
[Meta.Tactic.fun_trans] HDiv.hDiv.arg_a0a1.fderiv_rule_at, failed to discharge hypotheses
      x ≠ 0
This makes sense, as the division 1/x is well defined and differentiable only away from zero. Therefore we have to differentiate it at a concrete point that is not equal to zero.
variable (x₀ : ℝ) (hx₀ : x₀ ≠ 0)

#check (∂ (x:=x₀), 1/x) rewrite_by fun_trans (disch:=assumption)

-(x₀ ^ 2)⁻¹ : ℝ
We introduced a point x₀ and the assumption hx₀ that it is not equal to zero. By default fun_trans does not see this assumption, so we have to provide a discharger. A discharger is any tactic that tries to satisfy (discharge) the assumptions of the theorems fun_trans is using. In this simple case the assumption tactic is enough, as it just looks through the local context and tries to directly apply any existing assumptions.
Using assumption is not enough for a more complicated expression
variable (x₀ : ℝ) (hx₀ : x₀ ≠ 0)

#check (∂ (x:=x₀), 1/x^2) rewrite_by fun_trans (disch:=assumption)

∂ (x:=x₀), (x ^ 2)⁻¹ : ℝ
The tracing shows

[Meta.Tactic.fun_trans] HDiv.hDiv.arg_a0a1.fderiv_rule_at, failed to discharge hypotheses
      x₀ ^ 2 ≠ 0
We need a tactic capable of inferring (x₀^2 ≠ 0) from (x₀ ≠ 0). A very general and powerful tactic is aesop
variable (x₀ : ℝ) (hx₀ : x₀ ≠ 0)

#check (∂ (x:=x₀), 1/x^2) rewrite_by fun_trans (disch:=aesop)

-(2 * x₀) / (x₀ ^ 2) ^ 2 : ℝ
In multiple dimensions we often want to differentiate the norm
open SciLean Scalar

variable (x₀ : ℝ×ℝ) (hx₀ : x₀ ≠ 0)

#check (∂ (x:=x₀), ‖x‖₂) rewrite_by fun_trans (disch:=assumption)
The fact that the norm is not differentiable everywhere can cause issues. A common practice when writing numerical algorithms is to regularize the norm using a small positive ε.
open SciLean Scalar

variable (ε : ℝ) (hε : 0 < ε)

#check (∂ (x:ℝ×ℝ), sqrt (ε + ‖x‖₂²)) rewrite_by
  fun_trans (disch:=intro x; nlinarith [norm2_nonneg ℝ x])

fun w => fun x =>L[ℝ] 2 * ⟪x, w⟫_ℝ / (2 * sqrt (ε + ‖w‖₂²)) : ℝ × ℝ → ℝ × ℝ →L[ℝ] ℝ
Figuring out the right discharger tactic, like intro x; nlinarith [norm2_nonneg ℝ x], can be difficult. A planned improvement is a dedicated discharger, unsafe_ad_disch, that assumes common AD side conditions and reports them back to the user; effectively an unsafe differentiation mode that assumes everything works out by recognizing common goals that should be postponed or turned into sorries.
Exercises
- Gradient of energy for an n-body system with Newton's potential or the Lennard-Jones potential:
  - do it for two particles
  - do it for n particles
- Signed distance functions:
  - compute the gradient of the sphere SDF
  - compute the mean and Gaussian curvature of the sphere
  - pick an SDF from https://iquilezles.org/articles/distfunctions/
    - most of them involve functions that are not differentiable everywhere
    - compute the derivative in unsafe mode
    - figure out the minimal conditions under which it is differentiable
Abstract Vector Spaces
In calculus we usually consider only functions \((f : \mathbb{R}^n \rightarrow \mathbb{R}^m) \). The issue is that on computers the type \( \mathbb{R}^n \) can have multiple different realizations. For example, \( \mathbb{R}^3 \) can be modeled by Float×Float×Float, Float^[3] or Float×Float^[2]. They are all equivalent, but in code we have to explicitly convert between these types. For this reason it is better to work with abstract vector spaces instead of \( \mathbb{R}^n \).
Fortunately, mathlib's derivative fderiv is already defined for a function (f : X → Y) between two abstract vector spaces X and Y over a field 𝕜. Mathlib's way of introducing an abstract vector space is rather involved and we need to spend some time talking about it. This presentation will be rather simplified; for the interested reader we provide references at the end of this section that go over mathlib's algebraic structures in more detail.
A vector space X over a field 𝕜 is a set with operations +,-,•,0 such that

∀ (x y z : X), x + (y + z) = (x + y) + z
∀ (x y : X), x + y = y + x
∀ (x : X), x + 0 = x
∀ (x : X), x + (-x) = 0
∀ (r s : 𝕜) (x : X), r • (s • x) = (r * s) • x
∀ (x : X), 1 • x = x
∀ (r : 𝕜) (x y : X), r • (x + y) = r • x + r • y
∀ (r s : 𝕜) (x : X), (r + s) • x = r • x + s • x
In mathlib the axioms talking about addition and negation are captured by the type class AddCommGroup, and the axioms talking about scalar multiplication are captured by the type class Module. Therefore if we want to introduce a new abstract vector space over a field 𝕜 we have to introduce these variables
variable
  {𝕜 : Type} [Field 𝕜]
  {X : Type} [AddCommGroup X] [Module 𝕜 X]

example (s r : 𝕜) (x y : X) :
    (s + r) • (x + y) = s • x + r • x + s • y + r • y := by
  module
When we want to differentiate a function (f : X → Y) between two vector spaces, we also need X and Y to be equipped with a norm. For this purpose there is NormedAddCommGroup, which equips AddCommGroup with a norm and guarantees that it is compatible with addition and negation, and NormedSpace, which equips Module with a norm and guarantees that it is compatible with scalar multiplication. Furthermore, we have to restrict to a field 𝕜 that is either the real numbers ℝ or the complex numbers ℂ; the type class RCLike states exactly that. Therefore, when we work with the derivative in a general setting, the code usually looks like this
variable
  {𝕜 : Type} [RCLike 𝕜]
  {X : Type} [NormedAddCommGroup X] [NormedSpace 𝕜 X]
  {Y : Type} [NormedAddCommGroup Y] [NormedSpace 𝕜 Y]

set_default_scalar 𝕜

example (f g : X → Y) (hf : Differentiable 𝕜 f) (hg : Differentiable 𝕜 g) :
    (∂ x, (f x + g x)) = (∂ f) + (∂ g) := by
  ext x dx; fun_trans
When working with gradients we also need an inner product, as the adjoint is defined through the inner product. Unfortunately, here we diverge from mathlib a little bit. Mathlib defines InnerProductSpace, which equips NormedSpace with an inner product. Understandably, InnerProductSpace requires that ⟪x,x⟫ = ‖x‖², but mathlib made the unfortunate decision of defining the norm on product spaces as ‖(x,y)‖ = max ‖x‖ ‖y‖, which is incompatible with the inner product structure. Therefore a type like ℝ×ℝ can't be equipped with InnerProductSpace. Because of these issues, SciLean introduces AdjointSpace, which is almost identical to InnerProductSpace but only requires that the norm induced by the inner product is equivalent to the existing norm, i.e. ∃ (c d : ℝ⁺), ∀ x, c * ⟪x,x⟫ ≤ ‖x‖^2 ≤ d * ⟪x,x⟫. On AdjointSpace SciLean introduces the L₂-norm, ‖x‖₂ := sqrt ⟪x,x⟫, which you have seen already; it is the norm you most likely want to use instead of the default norm ‖x‖. Therefore, when we work with gradients in a general setting, the code usually looks like this
open SciLean

variable
  {𝕜 : Type} [RCLike 𝕜]
  {X : Type} [NormedAddCommGroup X] [AdjointSpace 𝕜 X] [CompleteSpace X]
  {Y : Type} [NormedAddCommGroup Y] [AdjointSpace 𝕜 Y] [CompleteSpace Y]

set_default_scalar 𝕜

example (f g : X → Y) (hf : Differentiable 𝕜 f) (hg : Differentiable 𝕜 g) :
    (∇ x, (f x + g x)) = (∇ f) + (∇ g) := by
  ext x dy; fun_trans
For the interested reader we recommend the chapter Hierarchies from Mathematics in Lean, which explains how mathlib approaches algebraic hierarchies like monoids, groups, or modules. After that we recommend Differential Calculus in Normed Spaces, which explains how NormedSpace is structured.