This article will explain, at a very basic level, the structure of an MNIST classifier written in
Python using Pytorch. I will explain some lower-level details, but I recommend reading up on the
math behind a basic neural network and how it works.
The “Classifier Class”
At the basic program level, you are creating a class to predict a number given an image (from the
MNIST dataset). Using Pytorch we will begin setting up the program and importing all of the modules
that we need.
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import trange
You then want to set up the constructor for the neural network.
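A minimal sketch of that constructor, consistent with the forward pass shown below; the hidden-layer sizes here (128 and 64) are example values, not fixed requirements:

class MNIST(nn.Module):
    def __init__(self):
        super().__init__()
        # 28x28 images arrive flattened as 784-dimensional vectors
        self.h1 = nn.Linear(784, 128)    # hidden sizes 128/64 are example values
        self.act1 = nn.ReLU()
        self.h2 = nn.Linear(128, 64)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(64, 10)  # one score per digit class 0-9
        self.act_output = nn.Sigmoid()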
h1 and h2 are both hidden layers, with h1 also acting as the input layer. On each of the hidden
layers we apply an activation function. For the two hidden layers we apply a simple ReLU activation
function, which zeroes out negative values, and for the final output layer we apply a sigmoid
activation function so that we get a score between 0 and 1 for each class. (Each class being one of
the digits 0-9 to be predicted from the image.)
After this we want to add a forward function to our NN class so that we can feed an image into
the model and get a prediction out.
def forward(self, X):
    X = self.act1(self.h1(X))            # first hidden layer + ReLU
    X = self.act2(self.h2(X))            # second hidden layer + ReLU
    X = self.act_output(self.output(X))  # output layer + sigmoid
    return X
Importing The Dataset
For this example I’m just using Keras, as it was the easiest way for me to get set up quickly. To
import the MNIST dataset using the Keras library, we move on to the main function of the
program.
First we load the dataset into X and Y training and test sets and reshape the image arrays from 3-D
to 2-D. X_train is reshaped from (60000, 28, 28) to (60000, 784), and X_test from (10000, 28, 28) to
(10000, 784). The first shape is equivalent to (n_samples, n_pixels_x_dir, n_pixels_y_dir). After
that, we simply convert the sets to Pytorch tensors so that Pytorch doesn’t throw any errors.
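A minimal sketch of that loading step, assuming the standard keras.datasets.mnist loader; the scaling to [0, 1] and the dtype choices are my own additions (Pytorch’s cross entropy loss wants float inputs and integer class labels):

from keras.datasets import mnist

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

# flatten each 28x28 image into a 784-dimensional vector and scale to [0, 1]
X_train = X_train.reshape(60000, 784).astype(np.float32) / 255.0
X_test = X_test.reshape(10000, 784).astype(np.float32) / 255.0

# convert the numpy arrays to Pytorch tensors
X_train = torch.tensor(X_train)
Y_train = torch.tensor(Y_train, dtype=torch.long)
X_test = torch.tensor(X_test)
Y_test = torch.tensor(Y_test, dtype=torch.long)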
Building and Training The Model
After importing the dataset and prepping the data we can now setup and train the model. First we
have to define our model, loss function, and optimizer.
model = MNIST()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)
We will store our MNIST classifier model in the model variable, which holds the weights and
biases of the network. We will use the cross entropy loss function provided by Pytorch as our
loss function and the Adam optimizer, also provided by Pytorch, as our optimizer. (One thing worth
knowing: nn.CrossEntropyLoss applies a softmax internally and expects raw scores, so it treats the
sigmoid outputs above as unnormalized scores; the model still trains.) I will explain later what
these components do.
Next we will call our training loop.
epochs = 100
batch_size = 32
for epoch in (t := trange(epochs)):
    for i in range(0, len(X_train), batch_size):
        X_batch = X_train[i:i+batch_size]
        Y_batch = Y_train[i:i+batch_size]
        out = model(X_batch)
        optimizer.zero_grad()
        loss = loss_fn(out, Y_batch)
        loss.backward()
        optimizer.step()
    t.set_description("loss %.2f" % loss)
When training our model, we only train on a small slice of the training set in each round of
optimization to increase the speed and efficiency of the training phase. This is what our
batch_size is for. In each epoch, the loop steps through the training set batch by batch, runs the
model, computes the loss, and then adjusts the weights and biases slightly in the direction that
reduces the loss.
Going through the loop, we first slice out our batch training sets as X_batch and Y_batch. We then
run X_batch through our model with its current weights and biases. After this, we calculate the
loss of the model, which we are trying to minimize with each pass through the training phase. Once
we have the loss, we calculate the gradient for each parameter in the neural network using
backpropagation with loss.backward(). The optimizer then uses those calculated gradients to adjust
the model slightly in the right direction with optimizer.step(). We can’t forget to also reset all
the gradients to zero on each pass through the loop with optimizer.zero_grad().
In this example we’re also using tqdm to add a nicer-looking progress bar with the loss in its
description.
Predictions and Accuracy
After successfully completing the training phase, we should test the accuracy of our model. We can
do this simply by running the model on our test dataset and comparing its predictions to the true
labels. First we need a simple accuracy function.
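As a sketch, such a function can take the highest-scoring class for each test image and measure the fraction that match the true labels:

def accuracy(model, X, Y):
    # disable gradient tracking since we're only evaluating
    with torch.no_grad():
        preds = model(X).argmax(dim=1)  # highest-scoring class per sample
    # fraction of predictions that match the true labels
    return (preds == Y).float().mean().item()

print("test accuracy: %.4f" % accuracy(model, X_test, Y_test))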