Smile Detection – Image Recognition

Using OpenCV, an industrial-grade library useful for building machine learning and deep learning classifiers, it is possible to construct software to detect a smiling face from a video stream. Going a step further, it can even be used to detect a variety of emotions including joy, sadness, anger, disgust, and confusion. Hopefully.

Engineer

Aaron S.

Area of Interest

Software Engineering – Computer Vision with Machine Learning and Deep Learning

School

Lynbrook High School

Grade

Rising Junior

Final Milestone

Final Milestone Video

       For my main project, I set out to create, train, and test a classifier that works in real time to distinguish a smiling face from one that is not smiling. 

       Compared to the first milestone, the description of the project differs by a mere one or two words, yet arriving at this point took far more effort than it appears on the surface. Since the first milestone, I have taken the existing Olivetti face dataset and used Jupyter Notebook and the ipywidgets library to create a simple user interface to classify the faces by hand and store the data in a .xml file using the json library. 
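A minimal sketch of how hand-made labels could be written and read back with the json library. The file name, the dictionary layout, and the helper names save_labels and load_labels are assumptions for illustration, not the code from the notebook.

```python
import json
import os
import tempfile

def save_labels(labels, path):
    """Write a {image_index: is_smiling} mapping to disk as JSON."""
    # JSON object keys must be strings, so the integer indices are converted.
    with open(path, "w") as f:
        json.dump({str(i): bool(v) for i, v in labels.items()}, f)

def load_labels(path):
    """Read the mapping back, restoring integer image indices."""
    with open(path) as f:
        return {int(i): v for i, v in json.load(f).items()}

# Hypothetical labels for the first four faces in the dataset.
labels = {0: True, 1: False, 2: False, 3: True}
path = os.path.join(tempfile.gettempdir(), "smile_labels.json")
save_labels(labels, path)
restored = load_labels(path)
```

Keeping the labels keyed by image index means a labeling session can be interrupted and resumed without losing track of which faces are already done.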

       Then, using this training data and the sklearn library, I created a classifier and estimated the model’s accuracy on general data using a method called k-fold cross-validation. In this process, the training data is split into k groups, and k classifiers are trained, each on a different portion of the original training data. Each group is used once as testing data and k-1 times as training data. The mean accuracy of these k classifiers then indicates how a classifier trained on the entire set will probably perform. 
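The splitting described above can be sketched in plain Python. This is an illustration of the idea only: the majority-label "classifier" and the toy labels are assumptions standing in for the real model and the Olivetti data.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def fit_majority(labels):
    """Toy stand-in for training: always predict the most common label."""
    return max(set(labels), key=labels.count)

def accuracy(prediction, labels):
    """Fraction of labels that match the single predicted label."""
    return sum(1 for label in labels if label == prediction) / len(labels)

def cross_validate(y, k):
    """Train k toy classifiers, test each on its held-out fold, return the mean accuracy."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        model = fit_majority([y[i] for i in train])
        scores.append(accuracy(model, [y[i] for i in test]))
    return sum(scores) / len(scores)

# Toy labels: 1 = smiling, 0 = not smiling.
y = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
folds = list(k_fold_indices(len(y), 5))
mean_accuracy = cross_validate(y, 5)
```

In scikit-learn the same procedure is available as cross_val_score in sklearn.model_selection, which is the more practical choice with a real classifier.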

       I also split the data using train_test_split, a function in the model_selection module of the sklearn library, which splits the data into just two groups this time: training and testing data. Then, using this split, I created a confusion matrix along with a classification report.

       Though the report seems like a jumble of decimals and numbers, it reveals very important information about how the classifier will perform on general data. There are two values that are particularly important: the recall and the precision.

If there is high recall but low precision, most of the positive examples are correctly recognized (low FN), but there are a lot of false positives (high FP). In layman’s terms, this means the classifier uses the label overly liberally, labelling not only the correct cases but also many incorrect cases with the same label.

On the other extreme, if there is low recall and high precision, we will miss a lot of positive examples (high FN), but those we predict as positive are indeed positive (low FP). In other words, the classifier uses the label sparingly, but when it does, it is almost always correct.
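To make the two extremes concrete, here is a small sketch that counts the confusion matrix entries and computes precision and recall by hand. The label lists are made-up toy data, not results from the project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for the given positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy data: 1 = smiling, 0 = not smiling.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
liberal = [1, 1, 1, 1, 1, 1, 1, 0]   # labels almost everything as smiling
sparing = [1, 0, 0, 0, 0, 0, 0, 0]   # labels almost nothing as smiling

liberal_scores = precision_recall(y_true, liberal)
sparing_scores = precision_recall(y_true, sparing)
```

The liberal predictions catch every smile but are often wrong when they say "smiling", while the sparing predictions are never wrong when they say "smiling" but miss most smiles.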

       In a smile classifier, it is best to have a balance and not lean heavily towards either extreme. Next I combined this smile classifier with the Haar Cascade face detection classifier from the first milestone to isolate a face and classify it. Though this sounds simple, extracting a face can stretch the image, so I mapped the vertical stretch coefficient against the horizontal stretch coefficient on two test images. Doing so let me visually identify which coefficients would distort the image in a negative way (and the maps themselves look kinda cool). At this point, I was finally ready to combine it all and add the video from the webcam, with this being the end result. 
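One way such a coefficient map could be set up is sketched below. Everything here is an assumption for illustration: the coefficients are treated as the fraction of the detected box trimmed from each side, the 64 x 64 target matches the size of the Olivetti images, and the function names are hypothetical.

```python
def crop_with_coefficients(box, h_coef, v_coef):
    """Trim a detected face box (x, y, w, h) inward by a fraction of its size on each side."""
    x, y, w, h = box
    dx = round(h_coef * w)
    dy = round(v_coef * h)
    return (x + dx, y + dy, w - 2 * dx, h - 2 * dy)

def scale_factors(box, target=64):
    """Horizontal and vertical scaling needed to fit the box into a target x target image."""
    _, _, w, h = box
    return target / w, target / h

def coefficient_grid(steps=5, max_coef=0.2):
    """All (h_coef, v_coef) pairs to try when mapping one coefficient against the other."""
    values = [max_coef * i / (steps - 1) for i in range(steps)]
    return [(hc, vc) for vc in values for hc in values]

# Hypothetical face box as (x, y, width, height) in pixels.
box = (100, 80, 200, 200)
cropped = crop_with_coefficients(box, 0.1, 0.05)
sx, sy = scale_factors(cropped)
grid = coefficient_grid()
```

When the two scale factors differ, the resized face is stretched more along one axis than the other, which is the kind of distortion a coefficient map makes visible.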

       I learned so much from this project, mostly regarding the various machine learning methods and libraries out there. I also learned how to read a confusion matrix, what recall and precision mean, what k-fold cross-validation is, how to create a .xml file, how to use the pickle library to store and load classifiers, how to send and load data using json information packets, and how to use the json library to send data packets from Jupyter Notebook to the Spyder Python IDE, among many other small gains in knowledge. 
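As a small illustration of storing and loading with the pickle library, the sketch below saves an object to disk and reads it back. The dict of parameters is a hypothetical stand-in for a trained classifier, and the file name is an assumption; a fitted scikit-learn estimator can be saved the same way.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained classifier: a plain dict of parameters.
model = {"kind": "linear", "weights": [0.4, -1.2, 0.7], "bias": 0.1}

path = os.path.join(tempfile.gettempdir(), "smile_classifier.pkl")

# Pickle files are binary, so the file is opened in "wb" and "rb" modes.
with open(path, "wb") as f:
    pickle.dump(model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)
```

Only load pickle files that come from a trusted source, since unpickling can execute arbitrary code.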

       The greatest struggle I had was my lack of knowledge and experience with the plethora of commands and functions from all these different libraries. Online documentation was often out-of-date and led to many error messages. Another major setback was that after building a completed project, I found the classifier to be too inaccurate, and ended up attributing this to the initial classification of the Olivetti dataset. I ended up reclassifying the dataset twice more, creating a total of 3 completed, trained classifiers, with this being the last and best-performing one. 

This was a rewarding experience that gave me my first concrete step into machine learning. There is still so much out there to learn about: the various methods of detection, single-class and multi-class classification, and the theory behind them. My next goal will be to investigate another, more powerful method of classification and apply it to a similar but more advanced multi-class problem.

       Thank you for your time.

User Interface

Confusion Matrix and Classification Report

Face Extraction Stretch Coefficient Mapping

First Milestone

First Milestone - Face Detection

       For my main project, I set out to create, train, and test a classifier that works in real time to distinguish a smiling face from one that is not smiling. The first milestone towards the completion of this project is to isolate the face from each frame in a video. 

       This was implemented using the OpenCV library, where CV stands for computer vision. This industry-level library makes this task fairly simple, with many pre-trained classifiers for face detection easily accessible. One thing to point out is the distinction between face detection and recognition: detection finds where a face is in an image, while recognition decides something about the face once it has been found. What I set out to do is expression recognition, so my focus is on the recognition rather than on detecting the face itself, which would have been an equally challenging task.

       The method that the OpenCV face detection software uses is something called a Haar Cascade classifier, which is based on the idea that some very simple features can be used to define and identify a particular object. In the case of a face, for instance, the area above the eyebrows is always distinctly brighter and lighter in color. Other examples are that the area under the eyes is brighter than the eye socket above it, and that the bridge of the nose is a bright, vertical, line-like feature, with the eyebrows and eyes on either side being a darker shade. The OpenCV classifier is a highly optimized version of this classifier. It takes an image, decreases the resolution, and looks for these simple Haar features in order to detect objects, a method first proposed by Paul Viola and Michael Jones.
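A single Haar-like feature can be sketched in plain Python. This is only an illustration of the "bright region above a dark region" idea using a summed-area table (the integral image from the Viola and Jones method); it is not OpenCV's implementation, and the 4 x 4 patch is made-up data.

```python
def integral_image(img):
    """Summed-area table: ii[r][c] is the sum of all pixels in img[0:r][0:c]."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_vertical(ii, top, left, height, width):
    """Upper half minus lower half: positive when the upper region is brighter."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return upper - lower

# Toy grayscale patch: bright forehead rows above darker eyebrow rows.
patch = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
]
ii = integral_image(patch)
feature = two_rect_vertical(ii, 0, 0, 4, 4)
```

A large positive value means the patch matches the "brighter above, darker below" pattern; a cascade combines many such cheap tests and rejects a window as soon as one stage fails.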

       During this task, I struggled at length with installing the OpenCV library. Through its updates, the library had become incompatible with Python 3.7 because of a missing dependency. I opened more than 50 tabs and investigated over 18 different methods, including but not limited to installing it with pip, installing it with Anaconda, downgrading Python on my computer to 2.7, tracking down which dependencies were missing, downloading those dependencies as binary wheels from Christoph Gohlke’s unofficial site, creating a virtual environment that would run a different version of Python from the rest of my computer, and downgrading Python on just Anaconda to 2.7. Finally, I found a solution that worked for me, which was to create a new environment on Anaconda separate from the original. Through the process of filtering through a mess of conflicting solutions, the most valuable lessons I took from this milestone were patience and the ability to troubleshoot effectively.

       My next step would be to take an existing dataset of faces, such as the Olivetti dataset or the IMDB dataset, and manually classify them as smiling or not using a simple widget-based user interface. Then using this training data, I would train a classifier and test it on a completely new dataset or a live webcam video feed. 

 

Thank you for your time

Setting Everything Up

Before even beginning the project, setting up OpenCV is in and of itself a huge hurdle for most people. From personal experience scavenging through online forums (Stack Overflow, GitHub, Quora) discussing the installation process of OpenCV, I have found that the solution varies depending on the system the user has.

The very first step is to consider what version of Python the project will be using. The main options up for consideration are Python 2.7, 3.6, and whatever is the newest version available at the time of reading this. Python 2.7 is the platform for many machine learning libraries, and many tutorials are also written for it. However, over time, many of these libraries have begun to update and move towards newer versions of Python, including TensorFlow, Keras, OpenCV, and scikit-learn, which are the libraries I used. Unfortunately, some of these libraries have not yet caught up to the newest version. At the time of writing this in July 2019, the newest version of Python available is Python 3.7. Unfortunately, Keras is incompatible with this version. As a result, 3.6 is the recent version of Python that seems to be the best for this project. Because I learned mostly through trial and error, I ended up creating the main, intensive part of the project in 3.7 and worked on modifications or improvements in Python 3.6. Fortunately, I only used the Keras library in the modifications, so this worked out until the completion of the project.

Because most users wish to use the most up-to-date Python version available for most of their coding projects, it is recommended to take the path of virtual environments for this project. Two options are most prevalent: the first, which I recommend, is to use Anaconda, while the second is to use virtual environments with Linux. Below is a description of the process I would recommend.

Before all else however, the most important piece of wisdom I can give to new machine learning users who wish to follow in my footsteps is this:

Before committing to the download, check the steps several times and look through forums for recommendations on the method of download.

Many of these packages can be installed in several ways, including Pip, Pip3, Anaconda Prompt, and Miniconda. I am unable to provide much information regarding the differences between each method, so I strongly recommend researching them for yourself. This site is not meant to be a hand-holding tutorial; much will have to be researched on your own.


Setting Up Libraries

  1. Download Anaconda
  2. Open a new environment in Python 3.6
  3. Download the various libraries necessary: scikit-learn (sklearn), numpy, matplotlib, tensorflow, keras, opencv
  4. Open Spyder IDE or Jupyter Notebook or Jupyter Lab and start coding!
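Once the environment is set up, a quick check like the sketch below can confirm that the libraries from step 3 are visible to Python. The function name missing_packages is hypothetical; the import names (sklearn for scikit-learn, cv2 for OpenCV) are the standard ones.

```python
import importlib.util

# Package name as listed above, mapped to the name used in "import ...".
REQUIRED = {
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "tensorflow": "tensorflow",
    "keras": "keras",
    "opencv": "cv2",
}

def missing_packages(required):
    """Return the package names whose import module cannot be found."""
    # find_spec looks the module up without actually importing it.
    return [name for name, module in required.items()
            if importlib.util.find_spec(module) is None]

missing = missing_packages(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```

Running this inside the new environment (not the base one) shows which installs still need attention before starting to code.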

Many videos describing this installation process exist on YouTube, and here are a few articles and videos that helped me.

  • https://inmachineswetrust.com/posts/deep-learning-setup/
  • https://www.pugetsystems.com/labs/hpc/How-to-Install-TensorFlow-with-GPU-Support-on-Windows-10-Without-Installing-CUDA-UPDATED-1419/
  • https://anaconda.org/conda-forge/opencv
  • https://www.youtube.com/watch?v=5mDYijMfSzs
  • https://medium.com/@margaretmz/anaconda-jupyter-notebook-tensorflow-and-keras-b91f381405f8
  • And always check the official websites for Anaconda and all these libraries for help on installation

Starter Project – Simon Says

Starter Project Video- Simon Says

       For my starter project, I chose Simon Says from the Sparkfun kit. I picked this project because it incorporated both electronic components and software components. I wanted to refresh myself with some light software coding, and I also thought the electronic side was interesting as it used a wide variety of electronic elements, with flashy lights and sounds.

       This small device has 4 different colored LED lights, with 4 corresponding color-coded buttons. When a button is pressed, its light flashes and a buzzer plays an accompanying note, with a different frequency for each button. When the game starts, the software manipulates the lights and the buzzer to play a sequence of notes accompanied by their respective colored LEDs. The user then replays the sequence using the buttons, and for each round of this cycle, the length of the sequence increases by one note. When the sequence reaches 10, the game ends and plays a jingle signifying a victory. For demonstration purposes, I lowered it from 10 to 5 in the video. When a wrong note is played or there is too long of a delay, the game instead plays a losing tune and ends.
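The round structure described above can be simulated in a few lines. This is a Python sketch of the game logic only, not the C code that runs on the SparkFun board: the function name play_simon and the player callbacks are hypothetical, and the timeout, lights, and buzzer are left out.

```python
import random

def play_simon(get_player_input, rounds_to_win=10, n_buttons=4, rng=None):
    """Run one game and return (won, rounds_completed).

    get_player_input(sequence) must return the list of button indices
    the player pressed after being shown the sequence.
    """
    rng = rng or random.Random()
    sequence = []
    while len(sequence) < rounds_to_win:
        # Each round adds one more note to the sequence.
        sequence.append(rng.randrange(n_buttons))
        if get_player_input(list(sequence)) != sequence:
            # A wrong note ends the game (the real device plays a losing tune).
            return False, len(sequence) - 1
    return True, len(sequence)

def perfect_player(sequence):
    """Always repeats the sequence correctly."""
    return sequence

def forgetful_player(sequence):
    """Only remembers the first three notes."""
    return sequence[:3]

won, completed = play_simon(perfect_player, rounds_to_win=5)
lost, reached = play_simon(forgetful_player, rounds_to_win=5)
```

Setting rounds_to_win to 5 mirrors the shortened game shown in the video.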

       I used pre-written code for the musical sequences and for configuring the system's input and output channels. The core logic of the game, however, I re-programmed. From this starter project, I learned how to use a breadboard and how to strip wires, and I refreshed my knowledge of programming in C. The greatest struggle of all was getting the buttons to work. More often than not, the small metal strips on the buttons were too shallow to reach the metal strips of the breadboard, resulting in unresponsive buttons. Attempting to debug code while simultaneously dealing with buttons popping out of the electronics was cumbersome. 
