!!! *This is just a draft of a post that I’ve published for review* !!!

## Introduction

Neural networks can be a good strategy for solving machine learning problems. Just like I did with gradient descent and linear regression, I want to try to give you an idea of how they work and how you’d implement one from scratch^{1} in Kotlin. I’ll be using the MNIST handwritten digit data set.

Same disclaimer as last time: everything I cover here is covered in Andrew Ng’s course and if you’re looking for a rock solid explanation and implementation, look there instead of here.

## Our Toy Problem

The task we’re interested in is classifying hand-written digits. There’s a standard data set that everyone likes to use for this sort of thing. In fact, TensorFlow’s basic tutorial uses this same problem, so if you’ve ever worked through that tutorial, you’re about to see the magic behind it. The data set is just a bunch of 28 by 28 pixel images that look like this:

## Our neural network

We’re going to work with a neural network with 3 layers: an input layer, a hidden layer, and an output layer.^{2} The input layer just consists of nodes for every pixel in an image. The hidden layer will have 25 nodes, and the final layer will spit out a classification of the image:

All these nodes or “neurons” (represented as circles) are doing is taking in an input (represented as lines pointing to the circles) and doing a computation whose result is passed to the next layer.

How those values are passed to the next layer involves some basic linear algebra. There’s a matrix of weights for each non-input layer, so in our case, we have 2 theta matrices:
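To make the shapes concrete, here’s a back-of-the-envelope sketch (my own, ignoring the bias units the course adds): with 784 input pixels, 25 hidden nodes, and 10 output classes, `theta1` is a 25 × 784 matrix and `theta2` is a 10 × 25 matrix.

```kotlin
// A rough sketch of the layer and weight-matrix sizes (bias units omitted).
const val INPUT_SIZE = 28 * 28   // one input node per pixel
const val HIDDEN_SIZE = 25       // hidden layer nodes
const val OUTPUT_SIZE = 10       // one output node per digit, 0 through 9

fun main() {
    // theta1 maps the input layer to the hidden layer: 25 x 784
    println("theta1: $HIDDEN_SIZE x $INPUT_SIZE = ${HIDDEN_SIZE * INPUT_SIZE} weights")
    // theta2 maps the hidden layer to the output layer: 10 x 25
    println("theta2: $OUTPUT_SIZE x $HIDDEN_SIZE = ${OUTPUT_SIZE * HIDDEN_SIZE} weights")
}
```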

We can represent the nodes in the second layer of the network as a vector we’ll call `a2`. The values for each node in the second layer are equal to this:

`a2 = sigmoid(theta1 * a1)`

where `sigmoid()` just applies the following function to each element in the matrix:

`sigmoid(z) = 1 / (1 + e^(-z))`

Calculating `a3` involves a similar formula:

`a3 = sigmoid(theta2 * a2)`

`a3` is a vector that represents the output layer and it gives us the predictions we’re looking for: each element in the layer is the probability that the digit fed into the neural network is the number at that index in the vector. For example, the probability at the 2nd index in this vector is the probability that the digit passed into the neural network is a `1`.
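The sigmoid itself is tiny in Kotlin. Here’s a sketch of an element-wise version (representing the matrix as a plain array of rows is my own simplification):

```kotlin
import kotlin.math.exp

// sigmoid(z) = 1 / (1 + e^(-z))
fun sigmoid(z: Float): Float = 1f / (1f + exp(-z))

// Applies the sigmoid to every element of a matrix represented
// as an array of rows.
fun sigmoid(matrix: Array<Array<Float>>): Array<Array<Float>> =
    Array(matrix.size) { r ->
        Array(matrix[r].size) { c -> sigmoid(matrix[r][c]) }
    }
```

Note how the function squashes any input into the range (0, 1), which is what lets us read the output layer as probabilities.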

Implementing matrices and matrix operations in Kotlin is actually not too bad with features like operator overloads. Here’s a nice snippet that shows how easy it is to implement matrix multiplication with a class that just wraps an array of arrays:

```
operator fun times(otherMatrix: Matrix): Matrix {
    if (numColumns() != otherMatrix.numRows())
        throw IllegalArgumentException(
            "Cannot multiply ${numRows()} x ${numColumns()} matrix" +
                " by ${otherMatrix.numRows()} x ${otherMatrix.numColumns()} matrix"
        )
    // Each cell of the result is the dot product of a row of this
    // matrix with a column of the other matrix.
    fun compute(rowIndex: Int, columnIndex: Int): Float {
        val row = raw[rowIndex]
        val column = otherMatrix.column(columnIndex)
        return row.zip(column) { a, b -> a * b }.sum()
    }
    return Matrix(Array(numRows()) { rowIndex ->
        Array(otherMatrix.numColumns()) { columnIndex ->
            compute(rowIndex, columnIndex)
        }
    })
}
```

With this code, we can multiply matrices just like we multiply numbers:

`theta1 * a1`
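To see the operator in action, here’s a minimal, self-contained version of such a wrapper class (a sketch with my own helper names, not the post’s full implementation) multiplying two small matrices:

```kotlin
// A minimal matrix wrapper around an array of rows, just enough
// to demonstrate the overloaded * operator.
class Matrix(val raw: Array<Array<Float>>) {
    fun numRows() = raw.size
    fun numColumns() = raw[0].size
    fun column(i: Int): Array<Float> = Array(numRows()) { raw[it][i] }

    operator fun times(otherMatrix: Matrix): Matrix {
        require(numColumns() == otherMatrix.numRows()) { "dimension mismatch" }
        fun compute(r: Int, c: Int): Float =
            raw[r].zip(otherMatrix.column(c)) { a, b -> a * b }.sum()
        return Matrix(Array(numRows()) { r ->
            Array(otherMatrix.numColumns()) { c -> compute(r, c) }
        })
    }
}

fun main() {
    // Multiplying by the 2x2 identity leaves a column vector unchanged.
    val identity = Matrix(arrayOf(arrayOf(1f, 0f), arrayOf(0f, 1f)))
    val vector = Matrix(arrayOf(arrayOf(3f), arrayOf(4f)))
    val result = identity * vector
    println(result.raw.map { it.toList() }) // [[3.0], [4.0]]
}
```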

## Notes

1. In practice, it’s likely better to use a machine learning library like TensorFlow to build a neural network, but I’m trying to solidify my understanding of how neural networks work, so I’m building it myself.
2. This is the same neural network architecture that’s used to solve this problem in the Coursera course.