Emotion Recognition with Raspberry Pi
As I began to pre-process my data, I noticed that the dataset was imbalanced. Using Matplotlib and Pandas, I found that the three most common emotions in the dataset were ‘neutral,’ ‘happiness,’ and ‘sadness.’ The image below shows a graph of the total number of images per emotion. The labels are: 0 for ‘anger,’ 1 for ‘disgust,’ 2 for ‘fear,’ 3 for ‘happiness,’ 4 for ‘sadness,’ 5 for ‘surprise,’ and 6 for ‘neutral.’
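A class count like the one in the graph can be sketched with Pandas as follows. This is a minimal illustration, not my original notebook code: the `emotion` column name and the ten sample labels are assumptions made up for the example.

```python
import pandas as pd

# Label mapping as listed in the text above.
EMOTION_NAMES = {
    0: "anger", 1: "disgust", 2: "fear", 3: "happiness",
    4: "sadness", 5: "surprise", 6: "neutral",
}

# Tiny stand-in for the real label column (illustrative values only).
df = pd.DataFrame({"emotion": [3, 6, 4, 3, 0, 6, 3, 4, 5, 6]})

# Count images per label; reindex so labels with no images show up as 0.
counts = df["emotion"].value_counts().reindex(range(7), fill_value=0)
counts.index = [EMOTION_NAMES[i] for i in counts.index]

print(counts)
```

With Matplotlib installed, `counts.plot(kind="bar")` would turn the same series into a bar chart.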
The other four emotions were much less common in the dataset. At this point, I had also realized that accurately predicting many emotions would be quite difficult because everyone displays their emotions differently. Because the FER-2013 dataset had so few images for emotions other than ‘neutral,’ ‘happiness,’ or ‘sadness,’ I decided to reduce the dataset. The new dataset that I created contained only the following three emotions: ‘neutral,’ ‘happiness,’ and ‘sadness.’
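Reducing the dataset to three emotions can be sketched like this. The `emotion` column name, the sample labels, and the new 0/1/2 numbering in `REMAP` are assumptions for illustration; the relabeling is there because a three-class classifier usually expects contiguous labels.

```python
import pandas as pd

KEEP = [3, 4, 6]            # happiness, sadness, neutral in the original labels
REMAP = {3: 0, 4: 1, 6: 2}  # hypothetical new contiguous labels

# Tiny stand-in for the full label column (illustrative values only).
df = pd.DataFrame({"emotion": [0, 3, 6, 4, 2, 3, 5, 6]})

# Keep only the rows whose label is one of the three chosen emotions.
subset = df[df["emotion"].isin(KEEP)].copy()

# Relabel the remaining classes so they run from 0 to 2.
subset["label"] = subset["emotion"].map(REMAP)
subset = subset.reset_index(drop=True)

print(subset)
```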
According to the image below, my model performed very well during training. The accuracy increased from epoch to epoch while the loss decreased.
Although the model’s metrics looked very strong, I still wanted to analyze the model with the testing set. Using TensorFlow/Keras’ built-in predict_classes method, I fed images from the testing set into the model. Along with TensorFlow/Keras, I also used Matplotlib to plot the images, the true emotion of each image, and the emotion predicted by the model. According to the image below, the model predicted the emotion of each image shown accurately.
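The comparison of true and predicted emotions can be sketched with NumPy. In more recent TensorFlow releases `predict_classes` is no longer available, and the usual replacement is to take the argmax of the probabilities returned by `model.predict`. The probability array, the true labels, and the order of `CLASS_NAMES` below are made-up stand-ins, not results from my model.

```python
import numpy as np

CLASS_NAMES = ["happiness", "sadness", "neutral"]  # hypothetical label order

# Stand-in for model.predict(X_test): one row of class probabilities per image.
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
])
y_true = np.array([0, 1, 2, 0])  # illustrative true labels

# The predicted class is the index of the highest probability in each row.
y_pred = np.argmax(probs, axis=1)
accuracy = float(np.mean(y_pred == y_true))

for true_label, pred_label in zip(y_true, y_pred):
    print(f"true: {CLASS_NAMES[true_label]:<10} predicted: {CLASS_NAMES[pred_label]}")
print(f"accuracy: {accuracy:.2f}")
```

Checking a handful of plotted images is a useful sanity check, but an accuracy figure over the whole testing set gives a fuller picture.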
For this project, I identified three main components: data pre-processing, machine learning, and model analysis. To complete these three components, I needed the following libraries: NumPy (for linear algebra), Pandas (for data pre-processing), Matplotlib (for data plotting), scikit-learn (for data pre-processing and machine learning), and TensorFlow/Keras (for machine learning). I also needed Anaconda, a distribution of Python that lets users create virtual environments. A virtual environment allowed me to install the GPU (Graphics Processing Unit) enabled libraries and to maintain only the libraries that I needed for this specific project, so they would keep working without the risk of another project’s dependencies breaking them. Along with Anaconda, I also used Jupyter Notebook, an open-source web application that allows users to perform live data visualization and to run individual blocks of code one at a time.
Now that I had my new dataset (refer to the left), I needed to split the data into a training set and a testing set. The training set is the data that the model trains on. The testing set is the data that the model is evaluated on. Using scikit-learn’s train_test_split function, I split the dataset into X_train, y_train, X_test, and y_test.
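The split can be sketched as below. The arrays are placeholders shaped like FER-2013 images (48x48 grayscale), and the 80/20 ratio, the `stratify` argument, and the `random_state` are assumptions for illustration rather than the settings I actually used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 blank 48x48 grayscale images and three balanced classes.
X = np.zeros((100, 48, 48, 1), dtype="float32")
y = np.arange(100) % 3

# Hold out 20% of the images for testing; stratify keeps the class
# proportions similar in both sets, which helps with imbalanced data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```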
After splitting the dataset into a training set and a testing set, I began to construct my machine learning model. I decided to build a convolutional neural network because of its high accuracy on image classification problems. To construct my model’s architecture, I decided to use TensorFlow/Keras. To build a convolutional neural network based on the FER-2013 dataset, I needed to build a sequential model and use the following layers: two-dimensional convolutional layers, two-dimensional max-pooling layers, flatten layers, dense layers, and dropout layers. A sequential model is a plain stack of layers in which each layer takes one input tensor and produces one output tensor. This type of model allows one image to be fed into the model and the model to output the class that it predicts the image belongs to. A two-dimensional convolutional layer slides filters across an image and preserves the spatial relationship between image pixels while learning image features. A two-dimensional max-pooling layer takes the maximum value in each small window of the feature maps that the convolutional layer produces, which shrinks the feature maps while keeping the strongest feature responses. A flatten layer reshapes the output of the convolutional and pooling layers into a single long feature vector, which can then be fed into a dense layer. In a convolutional neural network, a dense layer often appears as the last layer. It takes the feature vector, applies a weighted sum followed by an activation function, and returns a probability distribution over the classes. A dropout layer randomly turns off a fraction of the neurons in a layer during training to reduce overfitting. Overfitting occurs when the error on the training set is driven to a very small value but the error on new data is very large.
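A network that uses these layer types can be sketched in Keras as follows. This is a minimal sketch, not my exact architecture: the number of layers, the filter counts, the dropout rate, the optimizer, and the `build_model` helper are all assumptions for illustration. The 48x48x1 input shape matches the grayscale FER-2013 images, and the three outputs match the reduced dataset.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(input_shape=(48, 48, 1), num_classes=3):
    """Build a small CNN (hypothetical layer sizes, for illustration only)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        # Convolution learns local image features; pooling downsamples them.
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        # Flatten turns the feature maps into one long vector for the dense layers.
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        # Dropout randomly disables half of the units during training.
        layers.Dropout(0.5),
        # Softmax returns one probability per emotion class.
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model


model = build_model()
model.summary()
```

Training would then be a call such as `model.fit(X_train, y_train, epochs=100, validation_data=(X_test, y_test))`, with the arrays coming from the train/test split.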
After training the network for 100 epochs (full passes over the training set) on my CUDA-enabled NVIDIA GPU, I was able to move on to analyzing my model (refer to the left).
Now that the model had been built, I needed to quantize it. Quantization is a conversion technique that reduces the size of the model, which allows for better overall performance on lower-performance devices like the Raspberry Pi. Using TensorFlow’s built-in TFLiteConverter class, I was able to quantize the model. I performed dynamic range quantization, which converts the weights of the model from floating-point values to integers with 8 bits of precision.
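To give a feel for what 8-bit weight quantization does, here is a NumPy sketch of the underlying arithmetic. It is a simplified symmetric, per-tensor scheme written for illustration, not TensorFlow Lite's actual implementation; the `quantize_int8` and `dequantize` helpers and the random weights are hypothetical.

```python
import numpy as np


def quantize_int8(weights):
    """Map float weights to int8 using one shared scale (simplified scheme)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale


# Random stand-in for one layer's float32 weights.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(64, 32)).astype(np.float32)

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = float(np.max(np.abs(weights - restored)))

print("float32 bytes:", weights.nbytes)
print("int8 bytes:   ", q.nbytes)
print("max rounding error:", max_error)
```

The int8 copy takes a quarter of the memory of the float32 weights, at the cost of a small rounding error per weight. In TensorFlow itself, dynamic range quantization is typically requested by creating a converter with `tf.lite.TFLiteConverter.from_keras_model(model)`, setting `converter.optimizations = [tf.lite.Optimize.DEFAULT]`, and then calling `converter.convert()`.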
Now that I have built a model that can classify emotions, I will move on to building a GUI (graphical user interface) and a recommendation system that makes recommendations based on a person’s emotion.