Overfitting, Losses, and Accuracies of a Neural Network Model: A General Visualisation
While training a model, it is good practice to split the dataset into two parts: one for training and one for validation. During training, the average loss for each epoch should be printed for both the training and validation datasets, and, where possible, the accuracy of the model on the validation dataset should be measured as well.
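A minimal sketch of this loop, using a toy dataset and a plain linear model (both are illustrative stand-ins, not the article's actual model or data), might look like:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Toy data standing in for a real dataset (hypothetical shapes).
data = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))

# Split into training and validation parts (80/20 here).
train_ds, val_ds = random_split(data, [80, 20])
train_dl = DataLoader(train_ds, batch_size=16, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=16)

model = nn.Linear(4, 2)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(3):
    model.train()
    train_losses = []
    for xb, yb in train_dl:
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_losses.append(loss.item())

    # Evaluate on the validation set: average loss and accuracy.
    model.eval()
    val_losses, correct = [], 0
    with torch.no_grad():
        for xb, yb in val_dl:
            out = model(xb)
            val_losses.append(loss_fn(out, yb).item())
            correct += (out.argmax(dim=1) == yb).sum().item()

    print(f"epoch {epoch}: "
          f"train_loss={sum(train_losses) / len(train_losses):.4f}, "
          f"val_loss={sum(val_losses) / len(val_losses):.4f}, "
          f"val_acc={correct / len(val_ds):.2%}")
```

Recording these per-epoch averages is what produces the data behind the loss and accuracy curves discussed throughout this article.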
The graph above represents the training and validation loss of a model versus the number of epochs. It was obtained while training a model on the CIFAR-10 dataset using convolutional and linear layers. These layers and their sequence are shown below:
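The original listing is not reproduced here; a minimal sketch of such a model, with illustrative (assumed) layer sizes rather than the exact architecture from the original run, could be:

```python
import torch
from torch import nn

# Convolutional layers followed by linear layers, as described in the
# text. Layer sizes here are illustrative assumptions.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3x32x32 -> 16x32x32
    nn.ReLU(),
    nn.MaxPool2d(2),                             # -> 16x16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1), # -> 32x16x16
    nn.ReLU(),
    nn.MaxPool2d(2),                             # -> 32x8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                   # 10 CIFAR-10 classes
)

out = model(torch.randn(1, 3, 32, 32))  # one fake CIFAR-10 image
print(out.shape)  # torch.Size([1, 10])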
The graph was plotted using Python's Matplotlib library, and the code for the graph is shown below:
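The original snippet is not included here; a minimal Matplotlib sketch, using placeholder loss values in place of the real per-epoch averages from the training run, would be:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt

# Hypothetical per-epoch average losses; the real values come from
# the training run.
train_losses = [2.1, 1.6, 1.2, 0.9, 0.7, 0.55]
val_losses = [2.0, 1.5, 1.2, 1.0, 1.05, 1.15]
epochs = range(1, len(train_losses) + 1)

plt.plot(epochs, train_losses, "-x", label="Training loss")
plt.plot(epochs, val_losses, "-o", label="Validation loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Loss vs. No. of Epochs")
plt.legend()
plt.savefig("loss_vs_epochs.png")
```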
The raw data obtained while training the model and used to plot the graph is given below:
The arguments provided, namely the number of epochs (num_epochs), the learning rate (lr), and the gradient optimizer (opt_func), are given below:
While training a neural network model, the loss obtained on the training dataset is used to update the model parameters (weights and biases) so as to decrease the training loss, which in turn increases the accuracy of the model. This process is repeated over several epochs to bring the model to its best performance.
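This parameter-update step can be seen in isolation with a single-weight example (purely illustrative, not the article's model): gradient descent repeatedly moves the weight in the direction that reduces the loss.

```python
import torch

# One-parameter gradient descent: fit w so that w * x matches y.
# Target relationship: y = 2x, so w should converge toward 2.0.
w = torch.tensor([0.0], requires_grad=True)
x, y = torch.tensor([2.0]), torch.tensor([4.0])

lr = 0.1
for step in range(50):
    loss = ((w * x - y) ** 2).mean()  # squared-error training loss
    loss.backward()                   # compute d(loss)/dw
    with torch.no_grad():
        w -= lr * w.grad              # update decreases the loss
        w.grad.zero_()

print(round(w.item(), 3))  # → 2.0
```

Repeating this update over many batches and epochs is what drives the training-loss curve downward.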
However, when the model is simultaneously evaluated on a validation or test dataset, its performance can behave quite differently across epochs. The general trend is shown below:
The above graph shows that the losses on both the training and validation datasets decrease for some epochs, after which the validation/test loss starts increasing while the training loss keeps decreasing. This is known as overfitting.
Overfitting means that the model starts to memorize the patterns of the training dataset instead of learning its general characteristics. If we look at the graphs of training and validation accuracy below, we will find that the model performs much better on the training dataset it was fitted to than on the validation/test dataset or real-world data.
We can also see the extent of overfitting from the graph. The model reached an accuracy of over 95% on the training dataset, which is expected, but on the validation dataset it never crossed 70% and plateaued there. In other words, no matter how many epochs we train the model for, it cannot exceed the 70% accuracy limit with the given learning rate and optimizer function.
If we look at the graph of validation loss and validation accuracy versus epochs, we will find that the model performs best on the validation/test dataset within the small range of epochs around the minimum validation loss.
After this range, the validation accuracy decreases slightly and then stays nearly constant for further epochs, whereas the validation loss keeps increasing up to the last epoch for which the model is trained.
If we look at the graph of training loss and training accuracy versus epochs, we will see that it is smooth and regular in comparison with the graph above of validation loss and validation accuracy versus epochs.
The smoothness of the curve depends on the learning rate and the number of epochs. If the learning rate is too high, each gradient update will overshoot and fluctuate around the minimum of the loss function; if it is too low, the model parameters will barely change, and there will be no significant improvement in the validation loss or accuracy of the model.
One of many solutions to overfitting is 'early stopping': stopping the training of the model once the validation loss starts increasing consistently. One way to do this is to save the model parameters after each epoch and choose the checkpoint with the minimum validation loss as the final model.
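A minimal sketch of this checkpointing scheme, using a hypothetical per-epoch validation-loss history and a dictionary as a stand-in for `model.state_dict()`, could look like:

```python
import copy

# Hypothetical validation losses, one per epoch; in practice these
# come from the validation loop.
val_losses = [1.8, 1.2, 0.9, 0.85, 0.9, 1.0, 1.2, 1.3]

patience = 2  # stop after this many epochs without improvement
best_loss, best_epoch, best_state = float("inf"), -1, None
bad_epochs = 0

for epoch, loss in enumerate(val_losses):
    state = {"epoch": epoch}  # stand-in for model.state_dict()
    if loss < best_loss:
        # New minimum validation loss: checkpoint these parameters.
        best_loss, best_epoch = loss, epoch
        best_state = copy.deepcopy(state)
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # validation loss keeps increasing: stop training

print(best_epoch, best_loss)  # → 3 0.85
```

The checkpoint from the best epoch (`best_state`) is then restored as the final model, rather than the parameters from the last epoch trained.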
Note: The accuracy of the model can be increased further by adding more convolutional layers along with ReLU and MaxPool2d functions.
The final combined graph is shown below just for fun: