2.3. Function Transformation

In this chapter we will look under the hood of the tactic fun_trans, a tactic for general function transformation. We explain the main idea behind it, how to define your own function transformations, and how it works internally.

So far we have encountered a couple of function transformations: fderiv, fwdFDeriv, revFDeriv and adjoint. In general, a function transformation should be a higher order function F taking a function and producing a function, and it should satisfy

F(f ∘ g) = F(f) ∘ F(g)

where the composition should be some kind of generalized composition.

Examples of composition rules for the function transformations we have seen so far:

      (∂ (f ∘ g) x dx) = (∂ f (g x) (∂ g x dx))

     (∂> (f ∘ g) x dx) = let (y,dy) := ∂> g x dx
                         (∂> f y dy)

        (<∂ (f ∘ g) x) = let (y,dg) := <∂ g x;
                         let (z,df) := <∂ f y;
                         (z, dg ∘ df)

     adjoint ℝ (A ∘ B) = adjoint ℝ B ∘ adjoint ℝ A

Also F should preserve identity

F(id)=id

where id is some generalized notion of the identity function.

Examples of identity rules for the function transformations we have seen so far:

  (fderiv ℝ (fun x : ℝ => x)) = (fun _ => fun dx =>L[ℝ] dx)

        (∂> (fun x : ℝ => x)) = fun x dx => (x,dx)

        (<∂ (fun x : ℝ => x)) = fun x => (x, fun dx => dx)

 (adjoint ℝ (fun x : ℝ => x)) = fun x => x

2.3.1. Implementing Forward Mode AD

Let's demonstrate fun_trans's capability by defining a slight variant of the forward mode derivative

    @[fun_trans]
    noncomputable
    def fwdFDeriv' (f : X → Y) : X×X → Y×Y :=
      fun xdx => (f xdx.1, fderiv R f xdx.1 xdx.2)

The only difference is that fwdFDeriv' is an uncurried version of fwdFDeriv.
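As a quick sanity check (a sketch; both definitions should unfold to the same `(f x, fderiv R f x dx)` pair, so the equality is definitional):

```lean
-- Sanity check: `fwdFDeriv'` is just the uncurried `fwdFDeriv`.
example (f : X → Y) :
    fwdFDeriv' R f = fun xdx => fwdFDeriv R f xdx.1 xdx.2 := by
  unfold fwdFDeriv' fwdFDeriv
  rfl
```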

By marking the definition with the @[fun_trans] attribute we let Lean know that it is a function transformation. For fun_trans to work properly with this newly defined function transformation we have to state its transformation rules/theorems. There are a couple of types:

  1. Lambda theorems. Theorems about basic lambda terms like the identity fun x => x, the constant function fun x => y, the composition fun x => f (g x), and a few more.

  2. Function theorems. Theorems about functions like addition, multiplication etc.

  3. Free variable theorems. These theorems usually state a transformation rule under some special condition. For example, the derivative of a linear function is the function itself.

  4. Morphism theorems. Theorems about bundled morphisms. These are very special theorems dealing with bundled morphisms, i.e. functions satisfying a certain property, like fun x : R =>L[R] 2*x, which is a continuous linear function.

We will demonstrate each of these kinds of theorems in the rest of this chapter.

2.3.2. Lambda Theorems

The tactic fun_trans transforms a complicated function by first decomposing it into a sequence of function compositions and then transforming each of those functions separately. For example, when we want to transform the function (fun (x : R) => ((2*x) + x)^2), fun_trans effectively rewrites it as

    example : (fun (x : R) => ((2*x) + x)^2)
              =
              (fun x => x^2) ∘ (fun (x,y) => x + y) ∘ (fun x => (2*x,x)) := by
      rfl

Therefore fun_trans needs to know how to handle transforming compositions, but to transform (fun x => (2*x,x)) it also needs to know how to transform the pairing of the two functions fun x => 2*x and fun x => x.
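The pairing step itself is purely definitional; for instance (a trivial sketch):

```lean
-- Pairing `fun x => (2*x, x)` is definitionally the pair of the two
-- component functions applied pointwise.
example : (fun x : R => (2*x, x))
    = fun x => ((fun x => 2*x) x, (fun x => x) x) := by rfl
```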

Lambda theorems are responsible for reducing the problem of transforming a function like (fun (x : R) => ((2*x) + x)^2) to the transformation of the elementary functions (fun x => x^2), (fun (x,y) => x+y) and (fun x => 2*x). Those are handled by function theorems, which we will talk about a bit later.

There are six basic lambda theorems saying how basic lambda terms are transformed:

  1. Identity rule for transforming fun (x : X) => x

  2. Constant rule for transforming fun (x : X) => y

  3. Composition rule for transforming fun (x : X) => f (g x)

  4. Let rule for transforming fun (x : X) => let y := g x; f x y

  5. Apply rule for transforming fun (f : X → Y) => f x

  6. Pi rule for transforming fun (x : X) (y : Y) => f x y

There are also three theorems about the product type which technically speaking fall under the category of "function theorems", but they are so fundamental that we list them here:

  1. Product rule for transforming fun (x : X) => (f x, g x)

  2. First projection rule for transforming fun (xy : X×Y) => xy.1

  3. Second projection rule for transforming fun (xy : X×Y) => xy.2

The collection of these theorems allows fun_trans to transform any complicated lambda expression if it can transform all the atomic functions in the lambda expression.
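To see the mechanism by hand, we can apply the composition rule stated below directly; after rewriting with it, the remaining let-bindings reduce definitionally (a sketch, assuming fwdFDeriv'.comp_rule as formulated in this section):

```lean
-- With `comp_rule`, transforming `fun x => f (g x)` reduces to
-- transforming `f` and `g` separately and chaining the results.
example (f : Y → Z) (g : X → Y)
    (hf : Differentiable R f) (hg : Differentiable R g) :
    fwdFDeriv' R (fun x => f (g x))
    = fun xdx => fwdFDeriv' R f (fwdFDeriv' R g xdx) := by
  rw [fwdFDeriv'.comp_rule f g hf hg]
```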

Let us give examples of a few lambda theorems for fwdFDeriv'

    theorem fwdFDeriv'.id_rule :
        fwdFDeriv' R (fun x : X => x) = fun xdx => xdx := by
      unfold fwdFDeriv'; fun_trans

    theorem fwdFDeriv'.comp_rule
        (f : Y → Z) (g : X → Y)
        (hf : Differentiable R f) (hg : Differentiable R g) :
        fwdFDeriv' R (fun x : X => f (g x))
        =
        fun xdx =>
          let ydy := fwdFDeriv' R g xdx
          fwdFDeriv' R f ydy := by
      unfold fwdFDeriv'; fun_trans

    theorem fwdFDeriv'.pi_rule {I} [IndexType I]
        (f : X → I → Y) (hf : ∀ i, Differentiable R (f · i)) :
        fwdFDeriv' R (fun (x : X) (i : I) => f x i)
        =
        fun xdx =>
          let ydy := fun i => fwdFDeriv' R (f · i) xdx
          (fun i => (ydy i).1, fun i => (ydy i).2) := by
      unfold fwdFDeriv'; fun_trans

2.3.2.1. Exercises

  1. Write down the constant lambda theorem for fwdFDeriv'.

Solution:

    @[fun_trans]
    theorem fwdFDeriv'.const_rule (y : Y) :
        fwdFDeriv' R (fun x : X => y) = fun _ => (y, 0) := by
      unfold fwdFDeriv'; fun_trans
  2. Write down the product and projection lambda theorems of fwdFDeriv'. Hint: The product rule should be formulated for (fun x : X => (g x, f x)) where g and f are differentiable functions. The projection rule can be formulated for (fun (xy : X×Y) => xy.1).

Solution:

    @[fun_trans]
    theorem Prod.mk.arg_fstsnd.fwdFDeriv'_rule
        (g : X → Y) (f : X → Z)
        (hg : Differentiable R g) (hf : Differentiable R f) :
        fwdFDeriv' R (fun x : X => (g x, f x))
        =
        fun xdx =>
          let ydy := fwdFDeriv' R g xdx
          let zdz := fwdFDeriv' R f xdx
          ((ydy.1,zdz.1),(ydy.2,zdz.2)) := by
      unfold fwdFDeriv'; fun_trans

    @[fun_trans]
    theorem Prod.fst.arg_self.fwdFDeriv'_rule :
        fwdFDeriv' R (fun (xy : X×Y) => xy.1)
        =
        fun xydxy => (xydxy.1.1, xydxy.2.1) := by
      unfold fwdFDeriv'; fun_trans

    @[fun_trans]
    theorem Prod.snd.arg_self.fwdFDeriv'_rule :
        fwdFDeriv' R (fun (xy : X×Y) => xy.2)
        =
        fun xydxy => (xydxy.1.2, xydxy.2.2) := by
      unfold fwdFDeriv'; fun_trans
  3. Write down the apply lambda theorem of fwdFDeriv'. Hint: Write the rule for the function fun (f : I → X) => f i where I has an instance of [IndexType I].

Solution:

    @[fun_trans]
    theorem fwdFDeriv'.apply_rule {I} [IndexType I] (i : I) :
        fwdFDeriv' R (fun (f : I → X) => f i)
        =
        fun fdf => (fdf.1 i, fdf.2 i) := by
      unfold fwdFDeriv'; fun_trans
  4. Write down the let lambda theorem of fwdFDeriv'. Pay attention to the use of let bindings on the right-hand side. Careful use of let bindings is important for the efficiency of the resulting code. Also, you do not want to differentiate f w.r.t. x and y separately but rather w.r.t. x and y simultaneously.

Solution:

    @[fun_trans]
    theorem fwdFDeriv'.let_rule
        (f : X → Y → Z) (hf : Differentiable R ↿f)
        (g : X → Y) (hg : Differentiable R g) :
        fwdFDeriv' R (fun x => let y := g x; f x y)
        =
        fun xdx =>
          let ydy := fwdFDeriv' R g xdx
          let xydxy := ((xdx.1,ydy.1),(xdx.2,ydy.2))
          fwdFDeriv' R (fun (x,y) => f x y) xydxy := by
      unfold fwdFDeriv'; fun_trans

2.3.3. Function Theorems

Once we have postulated all the lambda theorems for a new function transformation, we also have to postulate theorems for concrete functions like addition, multiplication, etc.

Let's start with a function theorem for negation, as it is a function of a single argument.

    @[fun_trans]
    theorem Neg.neg.arg_a0.fwdFDeriv'_rule :
        fwdFDeriv' R (fun x : X => - x)
        =
        fun (x,dx) => (-x, -dx) := by
      unfold fwdFDeriv'; fun_trans

For a function of two or more arguments we just uncurry the function sufficiently and formulate the theorem as for a function of a single argument. For example, the transformation rule for addition is

    @[fun_trans]
    theorem HAdd.hAdd.arg_a0a1.fwdFDeriv'_rule :
        (fwdFDeriv' R fun xy : X×X => xy.1 + xy.2)
        =
        fun ((x,y),(dx,dy)) => (x + y, dx + dy) := by
      unfold fwdFDeriv'; fun_trans

Let's define a function of three arguments

def fun1 (x y z : R) := exp x * sin (z + y) + z

We do not have to state the transformation rule in all of its arguments at the same time.

    @[fun_trans]
    theorem fun1.arg_xz.fwdFDeriv'_rule (y : R) :
        fwdFDeriv' R (fun (xz : R×R) => fun1 xz.1 y xz.2)
        =
        fun ((x,z),(dx,dz)) =>
          (fun1 x y z,
           exp x * (dz * cos (z + y)) + dx * exp x * sin (z + y) + dz) := by
      unfold fun1 fwdFDeriv'; fun_trans

Writing rules for user defined functions like fun1 is tedious, and because fun1 is defined in terms of known functions, the right-hand side of this rule should be automatically generated anyway. We will discuss how to do this a bit later. Here we just wanted to demonstrate that you can indeed formulate the transformation rule only for a subset of the input arguments. The important rule is that the order of the arguments should be maintained, e.g. the left-hand side should not be fwdFDeriv' R (fun (zx : R×R) => fun1 zx.2 y zx.1).

In some cases the rule can get a bit tricky and there might be additional requirements on the arguments. The canonical example is division, x / y, which can be differentiated only if we divide by a nonzero y. In this case, the theorem can't be stated as a function in ((x,y),(dx,dy)) anymore. We have to fix a particular value and require that y is nonzero.

    @[fun_trans]
    theorem HDiv.hDiv.arg_a0a1.fwdFDeriv'_rule_simple
        (xydxy : (R×R)×(R×R)) (h : xydxy.1.2 ≠ 0) :
        (fwdFDeriv' R fun xy : R×R => xy.1 / xy.2) xydxy
        =
        let ((x,y),(dx,dy)) := xydxy
        (x/y, (dx*y - x*dy) / y^2) := by
      unfold fwdFDeriv'
      fun_trans (disch:=assumption)

Note that the argument to the theorem is (xydxy : (R×R)×(R×R)) instead of (x y dx dy : R). Not writing the product pattern explicitly as ((x,y),(dx,dy)) on the left-hand side of the rule makes it easier for fun_trans to apply this transformation rule.

2.3.4. User Defined Functions

2.3.5. Free Variable Theorems

Often we can write down a function transformation rule for a whole class of functions at once. The canonical example of this is that the derivative of a linear function is the function itself. For fwdFDeriv' this would be

    open SciLean

    @[fun_trans]
    theorem fwdFDeriv'.linear_rule
        (f : X → Y) (hf : IsContinuousLinearMap R f) :
        fwdFDeriv' R f = fun xdx => (f xdx.1, f xdx.2) := by
      unfold fwdFDeriv'; fun_trans

The characteristic feature of these theorems is that the left-hand side of the rule is about a single free variable function f, and the theorem has some nontrivial conditions on the function f.

These theorems are potentially dangerous, as the left-hand side of the theorem can match an arbitrary function. The tactic fun_trans would be incredibly slow if it tried applying these theorems at every step. Therefore "free variable theorems" are applied only in cases when the function that is being transformed can't be written as a non-trivial composition of two functions.

2.3.5.1. Exercises

  1. Write down a free variable theorem for differentiating an affine function.

Solution (one possible proof decomposes f into its linear part g and the constant part f 0):

    @[fun_trans]
    theorem fwdFDeriv'.affine_rule
        (f : X → Y) (hf : IsAffineMap R f) (hf' : Continuous f) :
        fwdFDeriv' R f = fun xdx => (f xdx.1, f xdx.2 - f 0) := by
      unfold fwdFDeriv'
      funext (x,dx)
      -- split `f` into its linear part `g` and the constant part `f 0`
      set g : X → Y := fun x => f x - f 0 with hg
      have hf'' : IsContinuousLinearMap R g := by
        constructor
        · fun_prop
        · fun_prop
      have h : f = fun x => g x + f 0 := by funext x; simp [hg]
      rw [h]
      fun_trans

2.3.6. Morphism Theorems

2.3.7. Compositional vs Simple Form

There is also an alternative way of formulating function theorems, the so-called "compositional form". The idea is that we introduce an auxiliary type W and parametrize every argument of the function with this type. The "compositional form" of the negation theorem is

    variable {W} [NormedAddCommGroup W] [NormedSpace R W]

    @[fun_trans]
    theorem Neg.neg.arg_a0.fwdFDeriv'_rule_compositional
        (f : W → X) (hf : Differentiable R f) :
        fwdFDeriv' R (fun w : W => - f w)
        =
        fun wdw =>
          let xdx := fwdFDeriv' R f wdw
          (-xdx.1, -xdx.2) := by
      unfold fwdFDeriv'; fun_trans

The original way of formulating "function theorems" is called "simple form" or "uncurried form", as a function of multiple arguments has to be uncurried.

In simple cases it does not really matter which form you use, but in more complicated scenarios, like dealing with higher order functions, the "compositional form" is the only way of formulating the function transformation rules. Also, certain function transformations seem to support only the "compositional form". At the end of this chapter we give an example of such a function transformation, called vectorize.

The general idea behind the "compositional form" is that we parametrize every argument of the function by some auxiliary type W and then state the function transformation rule in terms of the argument (w : W).

The theorem for addition above was stated in "simple form". This form is also called "uncurried form", as the left-hand side contains the uncurried version of addition, fun (x : X×X) => x.1 + x.2.

The "compositional form" of this theorem is

    @[fun_trans]
    theorem HAdd.hAdd.arg_a0a1.fwdFDeriv'_rule_compositional
        (f g : W → X) (hf : Differentiable R f) (hg : Differentiable R g) :
        (fwdFDeriv' R fun w : W => f w + g w)
        =
        fun xdx =>
          let ydy := fwdFDeriv' R f xdx
          let zdz := fwdFDeriv' R g xdx
          (ydy.1 + zdz.1, ydy.2 + zdz.2) := by
      unfold fwdFDeriv'; fun_trans

2.3.8. Higher Order Functions

Higher order functions are functions that accept a function as an input. For example, @DataArrayN.mapMono is a function that accepts an array and a function that is applied to every element of that array. It can happen that the function depends on a parameter we are differentiating with respect to. For example

    #check ∂ (s : Float), ⊞[1.0,2.0,3.0].mapMono (fun x => s * x + s)
    -- ∂ s, DataArrayN.mapMono (fun x => s * x + s) ⊞[1.0, 2.0, 3.0] : Float → Float^[3]

In this section we will show how to write differentiation rules for higher order functions.

Let's consider possibly the simplest higher order function, which is function application

    def apply (f : X → X) (x : X) := f x
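As a sanity check, apply really is just function application, so the following holds definitionally:

```lean
-- `apply f x` unfolds to `f x` by definition
example (f : X → X) (x : X) : apply f x = f x := by rfl
```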

We want to state that this function is differentiable in f and x. There is an issue: the function f can be arbitrary, and differentiability in x would require that f is differentiable. Therefore we somehow need to constrain the function f. The easiest way of doing this is to state differentiability of apply in compositional form. The idea is that in any concrete example the function f and the argument x will be parametrized by some parameter w. For example

    #check fun w : R => apply (fun x : R => w * x + w) (1 + w)
    -- fun w => apply (fun x => w * x + w) (1 + w) : R → R

Now it is sufficient to require that x is differentiable as a function of w and that f is differentiable in x and w. This is exactly what the following theorem expresses

    @[fun_prop]
    theorem apply.arg_fx.Differentiable_rule
        (f : W → X → X) (hf : Differentiable R ↿f)
        (x : W → X) (hx : Differentiable R x) :
        Differentiable R (fun w => apply (f w) (x w)) := by
      unfold apply; fun_prop

Therefore, the general idea is that if a function has a function argument f : X → Y, we need to parametrize it by some auxiliary type W, f : W → X → Y, and require that such a function is differentiable in x and w simultaneously, hf : Differentiable R (↿f).
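Here ↿f denotes the uncurried version of f acting on the product W×X; definitionally (a sketch, using Mathlib's Function.HasUncurry notation):

```lean
-- `↿f (w, x)` is definitionally `f w x`
example (f : W → X → X) (w : W) (x : X) : (↿f) (w, x) = f w x := by rfl
```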

When writing the theorem for differentiation we do something similar: we compute the derivative fwdFDeriv R (↿f)

    @[fun_trans]
    theorem apply.arg_fx.fwdFDeriv_rule
        (f : W → X → X) (hf : Differentiable R ↿f)
        (x : W → X) (hx : Differentiable R x) :
        fwdFDeriv R (fun w => apply (f w) (x w))
        =
        fun w dw =>
          let xdx := fwdFDeriv R x w dw
          let f' := fun xdx : X×X => fwdFDeriv R (↿f) (w,xdx.1) (dw,xdx.2)
          apply f' xdx := by
      unfold apply; fun_trans

2.3.8.1. Exercises

  1. Define a function applyTwice f x = f (f x) and state its differentiability and derivative rules.

Solution:

    @[fun_trans]
    def applyTwice (f : X → X) (x : X) := f (f x)

    @[fun_prop]
    theorem applyTwice.arg_fx.Differentiable_rule
        (f : W → X → X) (hf : Differentiable R ↿f)
        (x : W → X) (hx : Differentiable R x) :
        Differentiable R (fun w => applyTwice (f w) (x w)) := by
      unfold applyTwice; fun_prop

    @[fun_trans]
    theorem applyTwice.arg_fx.fwdFDeriv_rule
        (f : W → X → X) (hf : Differentiable R ↿f)
        (x : W → X) (hx : Differentiable R x) :
        fwdFDeriv R (fun w => applyTwice (f w) (x w))
        =
        fun w dw =>
          let xdx := fwdFDeriv R x w dw
          let f' := fun xdx : X×X => (fwdFDeriv R ↿f) (w,xdx.1) (dw,xdx.2)
          applyTwice f' xdx := by
      unfold applyTwice; fun_trans

Note that we can't even state the simple version of these theorems.