Tensor Operations
In this chapter, we will demonstrate more advanced operations with arrays, such as transformations and reductions. To provide a concrete example, we will build a simple neural network. It's important to note that Lean/SciLean is not yet suitable for running and training neural networks, as it only runs on CPU and the current compiler does not produce the most efficient code. Nevertheless, I believe that writing a simple neural network will nicely demonstrate Lean's expressivity. My secret hope is that this text will motivate someone to write a specialized compiler that will translate a subset of Lean to GPUs.
Transformations and Reductions
One common operation is to transform every element of an array. To do that, we can write a simple for loop. Recall that anytime you want to write imperative-style code, you have to start it with Id.run do, and to modify the input x mutably, we have to introduce a new mutable variable x' and assign x to it:
open SciLean
def map {I : Type} [IndexType I] (x : Float^[I]) (f : Float → Float) := Id.run do
  let mut x' := x
  for i in fullRange I do
    x'[i] := f x'[i]
  return x'
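A quick usage sketch (assuming the name map is unambiguous in the current scope; the printed formatting is an assumption):
#eval map ⊞[1.0, 4.0, 9.0] (fun x => x + 1.0)   -- expected ⊞[2.000000, 5.000000, 10.000000]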
A new thing here is that we wrote this function polymorphically in the index type I. {I : Type} introduces a new type, and [IndexType I] adds the requirement that I behaves as an index. IndexType is a type class that allows us to do a bunch of things with I. We have already seen size I, which tells you the number of elements in I. There are also IndexType.toFin and IndexType.fromFin, which convert i : I into toFin i : Fin (size I) and idx : Fin (size I) into fromFin idx : I. So the function toFin allows you to linearize any index I, and it is the core function used to implement DataArrayN, as all elements of an array have to be stored linearly in memory.
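As a small, hedged illustration of this round trip (ij is a throwaway name; the concrete linear index depends on how SciLean orders product indices):
open SciLean
def ij : Fin 3 × Fin 4 := (1, 2)
#eval IndexType.toFin ij                       -- the linear index of ij, some element of Fin 12
#eval IndexType.fromFin (IndexType.toFin ij)   -- recovers the original pair (1, 2)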
In fact, SciLean already provides this function under the name mapMono. The "mono" stands for the fact that the function f does not change the type; in our case, it accepts and returns Float. Also, this function is defined in the DataArrayN namespace, and because of that, we can use the familiar dot notation x.mapMono. As mapMono is polymorphic in the shape of the array, we can call it on vectors:
open SciLean Scalar
#eval ⊞[1.0,2.0,3.0].mapMono (fun x => sqrt x)
⊞[1.000000, 1.414214, 1.732051]
or matrices:
open SciLean Scalar
#eval ⊞[1.0,2.0;3.0,4.0].mapMono (fun x => sqrt x)
⊞[1.000000, 1.732051, 1.414214, 2.000000]
or higher-rank arrays:
open SciLean Scalar
#check (⊞ (i j k : Fin 2) => i.1.toFloat + 2 * j.1.toFloat + 4 * k.1.toFloat)
⊞ i j k => (↑i).toFloat + 2 * (↑j).toFloat + 4 * (↑k).toFloat : Float^[2, 2, 2]
open SciLean Scalar
#eval (⊞ (i j k : Fin 2) => i.1.toFloat + 2 * j.1.toFloat + 4 * k.1.toFloat).mapMono sqrt
⊞[0.000000, 1.000000, 1.414214, 1.732051, 2.000000, 2.236068, 2.449490, 2.645751]
Here, i.1.toFloat + 2 * j.1.toFloat + 4 * k.1.toFloat computes, for the structured index (i,j,k) of type Fin 2 × Fin 2 × Fin 2, the value of the corresponding linear index of type Fin 8 (what IndexType.toFin (i,j,k) would give) as a Float, and .mapMono sqrt then computes the square root of every element.
An alternative to mapMono is mapIdxMono, which accepts a function f : I → X → X, so you can additionally use the index value to transform the array values:
open SciLean Scalar
#eval (0 : Float^[3]) |>.mapIdxMono (fun i _ => i.1.toFloat) |>.mapMono (fun x => sqrt x)
⊞[0.000000, 1.000000, 1.414214]
where 0 : Float^[3] creates a zero array of size 3, then mapIdxMono (fun i _ => i.1.toFloat) initializes every element to the value of its index, and finally mapMono (fun x => sqrt x) computes the square root of every element.
The next important operation with arrays is reduction, which runs over the elements and reduces them using a provided binary operation. There are two main reductions, x.foldl op init and x.reduceD op default. The difference is that SciLean.DataArrayN.foldl uses init as the initial value that is updated with the elements of the array x, while SciLean.DataArrayN.reduceD uses only the elements of x and returns default if x happens to be empty:
x.foldl op init = (op ... (op (op init x[0]) x[1]) ... x[n-1])
x.reduceD op default = (op ... (op (op x[0] x[1]) x[2]) ... x[n-1])
There is also a version, x.reduce, where you do not have to provide the default element, but it is required that the element type X of the array x : X^I has an Inhabited instance, which allows you to call Inhabited.default, returning a default element of X. For example, (default : Float) returns 0.0.
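For example, a quick sketch contrasting the two (the exact printed formatting is an assumption):
#eval ⊞[1.0,2.0,3.0].reduceD (·+·) 0.0   -- sums the elements; 0.0 is only used if the array is empty
#eval (default : Float)                  -- the Inhabited default, 0.0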
To sum all elements of an array:
#eval ⊞[1.0,2.0,3.0].foldl (·+·) 0
6.000000
or to find the minimal element:
#eval ⊞[1.0,2.0,3.0].reduce (min · ·)
1.000000
Notice that computing the minimal element with foldl and init := 0 would give you an incorrect answer:
#eval ⊞[1.0,2.0,3.0].foldl (min · ·) 0
0.000000
Putting it all together, we can implement softmax:
open SciLean Scalar
def softMax {I : Type} [IndexType I]
    (r : Float) (x : Float^[I]) : Float^[I] := Id.run do
  let m := x.reduce (max · ·)
  let x := x.mapMono fun x => x-m
  let x := x.mapMono fun x => exp (r*x)
  let w := x.reduce (·+·)
  let x := x.mapMono fun x => x/w
  return x
where for numerical stability we first find the maximal element m and subtract it from all the elements. After that, we proceed with the standard definition of softmax. Of course, this is not the most efficient implementation of softmax. In a later chapter, we will show how to transform it into a more efficient version.
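As a quick, hedged sanity check (the printed formatting is an assumption), the resulting weights should be positive and sum to one:
#eval softMax 1.0 ⊞[1.0, 2.0, 3.0]   -- three positive weights that sum to one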
A very common reduction is to sum the elements or to multiply them. SciLean provides the familiar notation for these:
def x := ⊞[1.0,2.0,3.0,4.0]
def B := ⊞[1.0,2.0;3.0,4.0]
#eval ∑ i, x[i]
10.000000
#eval ∏ i, x[i]
24.000000
#eval ∑ i j, B[i,j]
10.000000
#eval ∏ i j, B[i,j]
24.000000
Note for Mathlib users: for performance reasons, SciLean defines sums and products using IndexType instead of Finset. Therefore this notation differs from the one defined in Mathlib's BigOperators namespace.
We can define common matrix operations like matrix-vector multiplication:
def matMul {n m : Nat} (A : Float^[n,m]) (x : Float^[m]) :=
  ⊞ i => ∑ j, A[i,j] * x[j]
or the trace:
def trace {n : Nat} (A : Float^[n,n]) :=
  ∑ i, A[i,i]
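A small usage sketch of both definitions (values chosen arbitrarily; the printed formatting is an assumption):
#eval matMul ⊞[1.0,0.0;0.0,2.0] ⊞[3.0,4.0]   -- expected ⊞[3.000000, 8.000000]
#eval trace ⊞[1.0,0.0;0.0,2.0]               -- expected 3.000000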
Convolution and Operations on Indices
The fundamental operation in machine learning is convolution. The first attempt at writing convolution might look like this:
def conv1d {n k : Nat} (x : Float^[n]) (w : Float^[k]) :=
  ⊞ (i : Fin n) => ∑ j : Fin k, w[j] * x[i-j]
(deterministic) timeout at `typeclass`, maximum number of heartbeats (20000) has been reached
Use `set_option synthInstance.maxHeartbeats <num>` to set the limit.
Additional diagnostic information may be available using the `set_option diagnostics true` command.
TODO: This is a bad error :( It should say that it failed to synthesize HSub (Fin n) (Fin k) _
This error arises because Lean can't infer the subtraction operation between the types Fin n and Fin k, which would produce some unknown type ?m. This makes sense: what does it mean to subtract j : Fin k from i : Fin n? Because we are accessing elements of x, we probably want the result to be Fin n, but what do we do if i - j is smaller than zero? We need to do something more involved when performing operations on indices that have their range specified in their type.
Let's step back a bit and look at the type of the kernel w : Float^[k]. We index it with numbers 0,...,k-1, but that is not how we usually think of kernels. We would rather index them by -k,...,-1,0,1,...,k. SciLean provides useful notation for this: Float^[[-k:k]], which stands for DataArrayN Float (Set.Icc (-k) k), where Set.Icc (-k) k is the closed interval with endpoints -k and k. Because here we consider k to be an integer, Set.Icc (-k) k is exactly the set -k,...,-1,0,1,...,k. Recall that i : Fin n is a pair of the value i.1 : ℕ and a proof i.2 : i.1 < n. Similarly, i : Set.Icc (-k) k is a pair of i.1 : ℤ and a proof i.2 : -k ≤ i.1 ∧ i.1 ≤ k. The type of a two-dimensional kernel would be Float^[[-k:k],[-k:k]], which stands for DataArrayN Float (Set.Icc (-k) k × Set.Icc (-k) k).
Now, instead of writing i-j, we want to shift the index i : Fin n by the index j and obtain another index of type Fin n. Let's define a general function shift that shifts Fin n by an arbitrary integer:
def Fin.shift {n} (i : Fin n) (j : ℤ) : Fin n :=
  { val := ((Int.ofNat i.1 + j) % n).toNat, isLt := sorry_proof }
Here, % is the positive modulo on integers, and we again omitted the proof that the result is indeed smaller than n. It is not a hard proof, but the purpose of this text is not to teach you how to prove theorems in Lean but rather how to use Lean as a programming language, and omitting proofs is a perfectly valid approach.
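A brief sketch of the wrap-around behaviour this gives us (the expected values follow from the positive modulo; the printed formatting is an assumption):
#eval (0 : Fin 5).shift (-1)   -- wraps around to 4
#eval (4 : Fin 5).shift 2      -- wraps around to 1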
Now we can write one-dimensional convolution as:
def conv1d {n k : Nat} (w : Float^[[-k:k]]) (x : Float^[n]) :=
  ⊞ (i : Fin n) => ∑ j, w[j] * x[i.shift j]
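A quick, hedged check of the shapes involved (the kernel radius 1 and image length 10 are arbitrary choices):
variable (w : Float^[[-1:1]]) (x : Float^[10])
#check conv1d w x   -- a length-10 array: Float^[10]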
This immediately generalizes to two dimensions:
def conv2d {n m k : Nat} (w : Float^[[-k:k],[-k:k]]) (x : Float^[n,m]) :=
  ⊞ (i : Fin n) (j : Fin m) => ∑ a b, w[a,b] * x[i.shift a, j.shift b]
In practice, a convolutional layer takes as input a stack of images x, a stack of kernels w, and a bias b. Let's index images by an arbitrary type I and kernels by J×I:
open SciLean
def conv2d {n m : Nat} (k : Nat) (J : Type) {I : Type}
    [IndexType I] [IndexType J] [DecidableEq J]
    (w : Float^[J,I,[-k:k],[-k:k]]) (b : Float^[J,n,m]) (x : Float^[I,n,m]) :
    Float^[J,n,m] :=
  ⊞ κ (i : Fin n) (j : Fin m) =>
    b[κ,i,j]
    +
    ∑ ι a b, w[κ,ι,a,b] * x[ι, i.shift a, j.shift b]
Exercises
- Generalize the function Fin.shift. Mathlib uses the notation x+ᵥy for "shifting" (y : Y) by (x : X). This operation is provided by the type class VAdd X Y. Define an instance VAdd ℤ (Fin n) for adding an integer (j : ℤ) to a number (i : Fin n) that wraps around on overflow or underflow. Feel free to use sorry for proving that the result is indeed in Fin n. Further, define VAdd (Set.Icc a b) (Fin n) for (a b : ℤ) and VAdd J I → VAdd J' I' → VAdd (J×J') (I×I').
open SciLean
instance (n : ℕ) : VAdd ℤ (Fin n) :=
  ⟨fun j i => ⟨((Int.ofNat i.1 + j) % n).toNat, sorry_proof⟩⟩

instance (a b : ℤ) (n : ℕ) : VAdd (Set.Icc a b) (Fin n) :=
  ⟨fun j i => j.1 +ᵥ i⟩

open SciLean
instance {I I' J J'} [VAdd J I] [VAdd J' I'] : VAdd (J×J') (I×I') :=
  ⟨fun (j,j') (i,i') => (j+ᵥi, j'+ᵥi')⟩
- Implement a general convolution for arbitrary-rank tensors.
open SciLean
def convNd {J : Type} [IndexType J] {I : Type} [IndexType I] [VAdd J I]
    (w : Float^[J]) (x : Float^[I]) : Float^[I] :=
  ⊞ (i : I) => ∑ j, w[j] * x[j +ᵥ i]

variable (w : Float^[[-1:1],[-1:1],[-1:1]]) (x : Float^[10,10,10])
#check convNd w x

-- #eval convNd (⊞ (i j k : Set.Icc (-1) 1) => 1.0/(1.0 + |Float.ofInt i.1| + |Float.ofInt j.1| + |Float.ofInt k.1|))
--              (⊞ (i j k : Fin 5) => if i.1 = j.1 ∧ j.1 = k.1 then 1.0 else 0.0)
- Generalize the above to a proper general convolution layer which accepts a stack of filters, images, and biases, similar to what we have done with conv2d previously.
Pooling and Difficulties with Dependent Types
The next piece of a neural network is the pooling layer, a layer that reduces image resolution. Giving a good type to the pooling layer is quite challenging, as we have to divide the image resolution by two. Doing any kind of operation in types brings out all the complexities of dependent type theory. Yes, dependent types can be really hard, but please do not get discouraged by this section. One has to be careful not to put dependent types everywhere, but when used with care, they can provide lots of benefits without causing too many troubles.
The canonical example is this: if you have an index i : Fin (n + m) and a function f : Fin (m + n) → Float, you can't simply call f i, as Fin (n + m) is not obviously equal to Fin (m + n); we need to invoke commutativity of addition on natural numbers. We will explain how to deal with this properly later; a minimal workaround is sketched below, and then we turn to the pooling layer.
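One possible workaround, sketched here with Fin.cast and Nat.add_comm from the standard library (this is an illustration, not the approach developed later):
variable {n m : Nat} (f : Fin (m + n) → Float) (i : Fin (n + m))
#check f (Fin.cast (Nat.add_comm n m) i)   -- the cast rewrites n + m to m + n, so the call typechecks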
Let's start with one dimension. A function that reduces the array size by two by taking the average of x[2*i] and x[2*i+1] is:
def avgPool (x : Float^[n]) : Float^[n/2] :=
  ⊞ (i : Fin (n/2)) =>
    let i₁ : Fin n := ⟨2*i.1, by omega⟩
    let i₂ : Fin n := ⟨2*i.1+1, by omega⟩
    0.5 * (x[i₁] + x[i₂])
Given the index of the new array i : Fin (n/2), we need to produce the indices 2*i.1 and 2*i.1+1 of the old vector, which have type Fin n. Recall that Fin n is just a pair of a natural number and a proof that it is smaller than n. So far, we have always omitted the proof with sorry, but we do not have to. Here, the proof can be easily done by calling the tactic omega, which is very good at proving index bounds. However, remember that when you are writing a program, it is usually a good strategy to inspect all proofs, see if they are plausible, and omit them with sorry. Only once your program is capable of running, you can go back and start filling out the proofs. You can see this as an alternative way of debugging your program.
Beware! Fin n is endowed with modular arithmetic. Naively calling 2*i would multiply i by two and reduce the result modulo n/2, since i : Fin (n/2). We do not want that; we have to take the underlying natural number i.1 and then multiply it by two. For example:
def i : Fin 10 := 6
#eval 2*i
2
#eval 2*i.1
12
One downside of the avgPool as written above is that if we call it multiple times, we get an array with an ugly type. For example, avgPool (avgPool x) has type Float^[n/2/2]. If we have a size that we already know is divisible by four, the n/2/2 does not reduce. For x : Float^[4*n], the term avgPool (avgPool x) has type Float^[4*n/2/2] and not Float^[n].
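A small check of this behaviour (assuming the avgPool definition above is in scope):
variable {n : Nat} (x : Float^[4*n])
#check avgPool (avgPool x)   -- Float^[4*n/2/2], which does not reduce to Float^[n]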
You might attempt to solve this by writing the type of avgPool as:
def avgPool (x : Float^[2*n]) : Float^[n] :=
  ⊞ (i : Fin n) =>
    let i₁ : Fin (2*n) := ⟨2*i.1, by omega⟩
    let i₂ : Fin (2*n) := ⟨2*i.1+1, by omega⟩
    0.5 * (x[i₁] + x[i₂])
Unfortunately, this does not work. Lean's type checking is not smart enough to allow us to call avgPool x for x : Float^[4*m]. It can't figure out that 4*m = 2*(2*m) and infer that n = 2*m when calling avgPool. We have to do something else.
The most flexible way of writing the avgPool function is as follows:
def avgPool (x : Float^[n]) {m} (h : m = n/2 := by infer_var) : Float^[m] :=
  ⊞ (i : Fin m) =>
    let i1 : Fin n := ⟨2*i.1, by omega⟩
    let i2 : Fin n := ⟨2*i.1+1, by omega⟩
    0.5 * (x[i1] + x[i2])
Here, the output dimension m is implicitly inferred from the proof h : m = n/2. Let's go step by step through what is going on.
When you call avgPool x for x : Float^[4*k], the first argument is expected to have type Float^[n]. From this, Lean infers that n = 4*k. The next argument {m} is implicit, so Lean skips it for now, as it is supposed to be inferred from the following arguments. Lastly, we have the argument h : m = n/2, which has the default value by infer_var. The tactic infer_var expects an expression with an undetermined variable, in our case m, and runs normalization on n/2 and assigns the result to m. In this case, 4*k/2 gets simplified to 2*k, and that is the final value of m.
You might be wondering what happens when n is odd. Because n/2 performs natural division, for x : Float^[2*n+1], calling avgPool x produces an array of type Float^[n]. If you want to prevent calling avgPool on arrays of odd length, you can simply modify the proof obligation to (h : 2*m = n). This way, you require that n is even, and calling avgPool x with an odd-sized array x will produce an error.
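For illustration, a hedged sketch of that stricter variant (the name avgPool' is chosen here; whether infer_var can solve 2*m = n for m is an assumption based on the description above):
def avgPool' (x : Float^[n]) {m} (h : 2*m = n := by infer_var) : Float^[m] :=
  ⊞ (i : Fin m) =>
    let i₁ : Fin n := ⟨2*i.1, by omega⟩
    let i₂ : Fin n := ⟨2*i.1+1, by omega⟩
    0.5 * (x[i₁] + x[i₂])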
To build a simple neural network, we need a two-dimensional version of the pooling layer:
open SciLean
variable {n₁ n₂ : Nat} {I : Type} [IndexType I] [DecidableEq I]

def avgPool2d
    (x : Float^[I,n₁,n₂]) {m₁ m₂ : Nat}
    (h₁ : m₁ = n₁/2 := by infer_var)
    (h₂ : m₂ = n₂/2 := by infer_var) : Float^[I,m₁,m₂] :=
  ⊞ (ι : I) (i : Fin m₁) (j : Fin m₂) =>
    let i₁ : Fin n₁ := ⟨2*i.1, by omega⟩
    let i₂ : Fin n₁ := ⟨2*i.1+1, by omega⟩
    let j₁ : Fin n₂ := ⟨2*j.1, by omega⟩
    let j₂ : Fin n₂ := ⟨2*j.1+1, by omega⟩
    -- average the four pixels of the corresponding 2x2 block
    0.25 * (x[ι,i₁,j₁] + x[ι,i₁,j₂] + x[ι,i₂,j₁] + x[ι,i₂,j₂])
Simple Neural Network
We are almost ready to write a simple neural network. The only missing piece is the dense layer, which is just matrix multiplication followed by addition. We have already shown matrix multiplication previously, but it was only for multiplying by an ordinary matrix with n rows and m columns. A general dense layer takes a tensor x of any shape, treats it as a flat array of m elements, and multiplies that by an n×m matrix. Because our arrays allow indexing by an arbitrary type I, we do not need to perform this flattening explicitly and can just multiply by the matrix Float^[n,I].
open SciLean
variable {I : Type} [IndexType I]
def dense (n : Nat) (A : Float^[n,I]) (b : Float^[n]) (x : Float^[I]) : Float^[n] :=
  ⊞ (i : Fin n) => b[i] + ∑ j, A[i,j] * x[j]
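A small usage sketch (values chosen arbitrarily; the printed formatting is an assumption):
#eval dense 2 ⊞[1.0,0.0,1.0;0.0,1.0,1.0] ⊞[0.0,0.0] ⊞[1.0,2.0,3.0]   -- expected ⊞[4.000000, 5.000000]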
Finally, we have all the necessary pieces, and we can implement a simple neural network.
def nnet := fun (w₁,b₁,w₂,b₂,w₃,b₃) (x : Float^[28,28]) =>
  x |>.reshape (Fin 1 × Fin 28 × Fin 28) (by decide)
    |> conv2d 1 (Fin 8) w₁ b₁
    |>.mapMono (fun x => max x 0)
    |> avgPool2d (m₁ := 14) (m₂ := 14)
    |> dense 30 w₂ b₂
    |>.mapMono (fun x => max x 0)
    |> dense 10 w₃ b₃
    |> softMax 0.1
When we check the type of nnet:
#check nnet
nnet : Float^[8, 1, ↑(Set.Icc (-↑1) ↑1), ↑(Set.Icc (-↑1) ↑1)] × Float^[8, 28, 28] × Float^[30, 8, 14, 14] × Float^[30] × Float^[10, 30] × Float^[10] → Float^[28, 28] → Float^[10]
You can see that the type of all the weights is automatically inferred.
TODO: improve unexpander for ↑(Set.Icc (-↑1) ↑1)
The input image has type Float^[28,28], and the output is an array of ten elements Float^[10]. As you might have guessed from the dimensions, later in the book we will train this network to classify handwritten digits from the MNIST database.