Neural Network Model Basics

This article explains, at a very basic level, the structure of an MNIST classifier written in Python using PyTorch. I will explain some lower-level details, but I recommend reading up on the math behind a basic neural network and how it works.

The “Classifier Class”

At the basic program level, you are creating a class that predicts a digit given an image (from the MNIST dataset). We begin by setting up the program and importing all of the PyTorch modules that we need.

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import trange

You then want to set up the constructor for the neural network.

class MNIST(nn.Module):
    def __init__(self):
        super().__init__()
        self.h1 = nn.Linear(784, 128) # 784 features = 28 * 28 image
        self.act1 = nn.ReLU()
        self.h2 = nn.Linear(128, 128)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(128, 10)
        self.act_output = nn.Sigmoid()

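h1 and h2 are the two hidden layers, with h1 also being the layer that receives the 784 input pixels. After each hidden layer we apply an activation function. For both hidden layers we use ReLU, which sets negative values to zero and passes positive values through unchanged; this non-linearity is what lets the stacked layers learn more than a single linear map. On the final output layer we apply a sigmoid so that each of the 10 outputs is a score between 0 and 1, one per class (the classes being the digits 0-9 to be predicted from the image). These scores are not true probabilities, since a sigmoid treats each output independently and they do not sum to 1. Note also that nn.CrossEntropyLoss, used later for training, applies its own log-softmax to whatever it receives, so PyTorch models commonly return the raw output of the last linear layer instead; the sigmoid is kept here for simplicity.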
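
To make the activation functions more concrete, here is a small sketch on made-up numbers (the scores tensor is purely illustrative and does not come from the model). It compares ReLU, sigmoid, and softmax on the same values.

```python
import torch

# Illustrative raw scores; these values are made up for the example.
scores = torch.tensor([-2.0, 0.0, 3.0])

relu_out = torch.relu(scores)               # negatives become 0, positives pass through
sigmoid_out = torch.sigmoid(scores)         # each value squashed into (0, 1) on its own
softmax_out = torch.softmax(scores, dim=0)  # values in (0, 1) that sum to 1

print(relu_out)
print(sigmoid_out, sigmoid_out.sum())
print(softmax_out, softmax_out.sum())
```

The sigmoid outputs add up to more than 1 here, which is why they are better read as per-class scores than as a probability distribution.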

After this we want to add a forward function to our NN class so that we can feed an image into the model and get a prediction out.

    def forward(self, X):
        X = self.act1(self.h1(X))
        X = self.act2(self.h2(X))
        X = self.act_output(self.output(X))

        return X
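
As a quick sanity check of the forward pass, the sketch below pushes a batch of random inputs through a stand-in network and looks at the output shape. It assumes the same layer sizes as the class above but builds them with nn.Sequential so the snippet runs on its own; the names sketch_model and fake_batch are hypothetical.

```python
import torch
import torch.nn as nn

# Stand-in with the same layer sizes as the MNIST class (hypothetical name).
sketch_model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10), nn.Sigmoid(),
)

fake_batch = torch.rand(4, 784)  # 4 random "images", already flattened
out = sketch_model(fake_batch)   # calling the module runs its forward pass

print(out.shape)  # torch.Size([4, 10])
```

Each row of the output holds 10 scores, one per digit, for the corresponding input row.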

Importing The Dataset

For this example I’m using Keras to load the data, as it was the quickest for me to set up. To import the MNIST dataset using the Keras library, we move on to the main function of the program.

if __name__ == "__main__":
    from keras.datasets import mnist
    (X_train, Y_train), (X_test, Y_test) = mnist.load_data()

    X_train = X_train.reshape(X_train.shape[0], -1)
    X_test = X_test.reshape(X_test.shape[0], -1)

    X_train = torch.tensor(X_train, dtype=torch.float32)
    X_test = torch.tensor(X_test, dtype=torch.float32)
    Y_train = torch.tensor(Y_train, dtype=torch.long)

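First we load the dataset into training and test sets, then flatten the images from 3D arrays to 2D arrays. X_train is reshaped from (60000, 28, 28) to (60000, 784), and X_test from (10000, 28, 28) to (10000, 784). The original shape is (n_samples, image_height, image_width). After that, we convert the arrays to PyTorch tensors, since the model and the loss function expect tensors: float32 for the inputs and long (integer) for the labels. Y_test is left as a NumPy array because it is only needed by the accuracy function later. The pixel values stay in their original 0-255 range in this example.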
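
The reshape step can be tried on a tiny stand-in array. This sketch uses fake data (5 blank images, with one pixel marked) rather than the real dataset, so the names images and flat are only for illustration.

```python
import numpy as np

# Fake data standing in for MNIST: 5 images of 28 x 28 pixels.
images = np.zeros((5, 28, 28), dtype=np.uint8)
images[0, 1, 2] = 255  # mark one pixel so we can see where it ends up

flat = images.reshape(images.shape[0], -1)  # -1 lets NumPy infer 28 * 28 = 784

print(flat.shape)            # (5, 784)
print(flat[0, 1 * 28 + 2])   # 255: row 1, column 2 lands at index 30
```

Each image is flattened row by row, so the pixel at row r and column c ends up at index r * 28 + c.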

Building and Training The Model

After importing the dataset and prepping the data, we can now set up and train the model. First we have to define our model, loss function, and optimizer.

    model = MNIST()
    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.0001)

We store our MNIST classifier in the model variable, which holds the weights and biases of the network. As our loss function we use the cross entropy loss provided by PyTorch, which measures how far the predicted class scores are from the correct label. As our optimizer we use Adam, also provided by PyTorch, which updates the weights and biases from their gradients, with the learning rate lr controlling the step size.
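
To give a feel for what the loss function computes, here is a small sketch on made-up scores for 2 samples and 3 classes (not MNIST). It compares nn.CrossEntropyLoss with the same quantity worked out by hand: take the log-softmax of the scores, pick the entry for the correct class, negate it, and average over the batch.

```python
import torch
import torch.nn as nn

# Made-up raw scores (logits) for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # index of the correct class for each sample

builtin = nn.CrossEntropyLoss()(logits, targets)

# Same quantity by hand: log-softmax, pick the true class, negate, average.
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(builtin.item(), manual.item())
```

The two numbers should agree up to floating point error; the loss gets smaller as the score of the correct class grows relative to the others.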

Next we will call our training loop.

    epochs = 100
    batch_size = 32

    for epoch in (t := trange(epochs)):
        for i in range(0, len(X_train), batch_size):
            X_batch = X_train[i:i+batch_size]
            Y_batch = Y_train[i:i+batch_size]

            out = model(X_batch)
            optimizer.zero_grad()
            loss = loss_fn(out, Y_batch)
            loss.backward()
            optimizer.step()

        t.set_description("loss %.2f" % loss.item())

When training our model, each optimization step uses only a small slice of the training set rather than all 60000 images at once, which makes each step much cheaper. This is what batch_size is for. In each epoch, the loop walks through the training set one batch at a time, runs the model, computes the loss, and then adjusts the weights and biases slightly in the direction that reduces it.
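
The slicing pattern used for batching can be seen on a tiny list. The sizes here are hypothetical and chosen so that the last batch comes out short.

```python
# Hypothetical small sizes so the output is easy to read.
n_samples = 10
batch_size = 4
data = list(range(n_samples))

# Step through the data in jumps of batch_size and slice out each batch.
batches = [data[i:i + batch_size] for i in range(0, n_samples, batch_size)]

for b in batches:
    print(b)
```

When the number of samples is not a multiple of batch_size, the final slice is simply smaller. With 60000 training images and a batch size of 32 this does not happen, since 60000 / 32 is exactly 1875 batches.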

Going through the loop, we first slice out the current batch as X_batch and Y_batch. We then run X_batch through the model with its current weights and biases and calculate the loss, which is the quantity we are trying to minimize over the course of training. With loss.backward() we use backpropagation to calculate the gradient of the loss with respect to every weight and bias in the network. The optimizer then uses those gradients to adjust the model slightly in the right direction with optimizer.step(). We also have to reset the gradients to zero before each backward pass with optimizer.zero_grad(), because PyTorch otherwise adds new gradients on top of the old ones.
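
A single optimization step can be isolated in a few lines. This is only a sketch under simplifying assumptions: it uses a tiny linear model, random data, and plain SGD instead of Adam, and the names tiny_model, X, and Y are hypothetical.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Tiny stand-in model and random data (hypothetical sizes, not MNIST).
tiny_model = nn.Linear(4, 3)
X = torch.rand(8, 4)
Y = torch.randint(0, 3, (8,))

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(tiny_model.parameters(), lr=0.1)

before = tiny_model.weight.detach().clone()  # copy of the weights before the step

optimizer.zero_grad()             # clear gradients left over from any earlier step
loss = loss_fn(tiny_model(X), Y)  # forward pass and loss
loss.backward()                   # fills .grad on every parameter
optimizer.step()                  # moves each parameter against its gradient

print(tiny_model.weight.grad.shape)            # torch.Size([3, 4])
print(torch.equal(before, tiny_model.weight))  # False: the weights have moved
```

The training loop in this article repeats exactly these four calls for every batch.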

In this example we’re also using tqdm to show a nicer-looking progress bar with the loss in its description.

Predictions and Accuracy

After completing the training phase, we should test the accuracy of our model. We can do this by running the model on the test set and comparing its predictions with the true labels. First we need a simple accuracy function.

def accuracy(pred, true):
    return np.sum(pred == true) / len(true)
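
As a worked example of the accuracy calculation, here it is on four made-up labels. The function is repeated so the snippet runs on its own, with an extra np.asarray conversion added as an assumption so that plain Python lists are also accepted.

```python
import numpy as np

def accuracy(pred, true):
    # Fraction of positions where the prediction matches the label.
    pred, true = np.asarray(pred), np.asarray(true)
    return np.sum(pred == true) / len(true)

# Made-up labels: 3 of the 4 predictions are right.
print(accuracy([1, 2, 3, 4], [1, 2, 0, 4]))  # 0.75
```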

Then, back in the main function, we run the test set through our trained model and take the class with the highest score as the prediction for each image.

    predictions = model(X_test)
    preds = []
    for pred in predictions:
        pred = pred.clone().detach().numpy()
        val = pred.argmax()
        preds.append(val)
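
The per-row loop above can also be written as a single batched argmax. The sketch below shows the idea on made-up scores for 3 samples and 4 classes; wrapping it in torch.no_grad() is an optional addition that tells PyTorch not to track gradients while predicting.

```python
import torch

# Made-up scores for 3 samples and 4 classes.
scores = torch.tensor([[0.1, 0.7, 0.1, 0.1],
                       [0.9, 0.0, 0.05, 0.05],
                       [0.2, 0.2, 0.2, 0.4]])

with torch.no_grad():                   # no gradients needed when only predicting
    batch_preds = scores.argmax(dim=1)  # index of the largest score in each row

print(batch_preds.tolist())  # [1, 0, 3]
```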

To calculate the accuracy, we compare the predictions with the actual classes using the accuracy function from before.

    a = accuracy(preds, Y_test) * 100
    print(f"Accuracy: {a:.2f}%")

Now we’ve successfully built an MNIST classifier using PyTorch! If everything went well, you should get an accuracy of roughly 96-99%.

Full Program

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import trange

def accuracy(pred, true):
    return np.sum(pred == true) / len(true)

class MNIST(nn.Module):
    def __init__(self):
        super().__init__()
        self.h1 = nn.Linear(784, 128) # 784 features = 28 * 28 image
        self.act1 = nn.ReLU()
        self.h2 = nn.Linear(128, 128)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(128, 10)
        self.act_output = nn.Sigmoid()

    def forward(self, X):
        X = self.act1(self.h1(X))
        X = self.act2(self.h2(X))
        X = self.act_output(self.output(X))

        return X

if __name__ == "__main__":
    from keras.datasets import mnist
    (X_train, Y_train), (X_test, Y_test) = mnist.load_data()

    X_train = X_train.reshape(X_train.shape[0], -1)
    X_test = X_test.reshape(X_test.shape[0], -1)

    X_train = torch.tensor(X_train, dtype=torch.float32)
    X_test = torch.tensor(X_test, dtype=torch.float32)
    Y_train = torch.tensor(Y_train, dtype=torch.long)

    model = MNIST()
    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.0001)

    epochs = 100
    batch_size = 32

    for epoch in (t := trange(epochs)):
        for i in range(0, len(X_train), batch_size):
            X_batch = X_train[i:i+batch_size]
            Y_batch = Y_train[i:i+batch_size]

            out = model(X_batch)
            optimizer.zero_grad()
            loss = loss_fn(out, Y_batch)
            loss.backward()
            optimizer.step()

        t.set_description("loss %.2f" % loss.item())

    # make predictions
    predictions = model(X_test)
    preds = []
    for pred in predictions:
        pred = pred.clone().detach().numpy()
        val = pred.argmax()
        preds.append(val)

    a = accuracy(preds, Y_test) * 100
    print(f"Accuracy: {a:.2f}%")