Package Detection

The Package Detection project uses a Raspberry Pi, a camera module, and the NanoNets API to detect whether a package is present. The camera first captures an image and sends it to the API, whose model has been trained on a dataset of images; the API then sends back the image with annotations showing where the object was located. In addition, the system can send and receive text messages when a package arrives.


Dhruv A

Area of Interest

Software Engineering


Evergreen Valley High School


Rising Junior


For my final modifications before the demo, I improved the accuracy of my machine learning model and sent the user a text message whenever a package was detected. To improve the accuracy of the predictions, I had to add more images to my dataset. To send the text message, I used the Twilio API. In addition, I allowed the user to send text messages back, and provided a menu from which the user could ask for specific information. Finally, I made a local CSV database that stored the packages that had already arrived and displayed them in an HTML table.

How To Improve the Accuracy of A Model

The accuracy of a model depends on the number and the quality of the images in the dataset. Typically, the more images there are in the dataset, the more accurate the model becomes. However, one needs to be wary of overtraining (also called overfitting), which was a problem that I faced. After a few attempts where my accuracy was around 50%, I knew that I would have to increase the number of images I had, from around 200 to 600. What I didn't realize was that a machine learning model can become too dependent on certain features of its training images, and that it struggles to generalize to new images if too many of the training images are similar to each other. As a result, I decided to train on around 500 images, and I made sure that they showed different types of packages in different scenarios, such as in someone's hand, on someone's porch, or against a white background.

Twilio API

Twilio is an API (application programming interface) that makes it easy to send and receive text messages and phone calls. The first step in using this API is to get your own Twilio phone number, for which a trial account is sufficient. This is the phone number that Twilio uses to send and receive text messages. Registering also provides you with an account SID and an auth token, which authenticate your requests so that no one else can use your account.

After the authentication and registration are complete, the actual message can be sent to the user; in my case, it contained some text and the unannotated image. One problem that I faced while doing this was figuring out how to upload an image to the internet, as Twilio only accepts a link to an image as a parameter, not an image file from a computer. I fixed this using the Imgur API, explained below.
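The sending step described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: it assumes Twilio's Python helper library is installed (`pip install twilio`), and the environment variable names, the message text, and the function names `build_alert` and `send_alert` are placeholders chosen for illustration.

```python
import os


def build_alert(to_number, from_number, image_url):
    """Collect the arguments for one picture message."""
    return {
        "to": to_number,
        "from_": from_number,        # the Twilio phone number
        "body": "A package has arrived!",
        "media_url": [image_url],    # must be a publicly reachable link
    }


def send_alert(params):
    """Send the message; needs the twilio package and real credentials."""
    # Imported here so that build_alert can be used without twilio installed.
    from twilio.rest import Client

    # The environment variable names are an assumption of this sketch.
    client = Client(os.environ["TWILIO_ACCOUNT_SID"],
                    os.environ["TWILIO_AUTH_TOKEN"])
    message = client.messages.create(**params)
    return message.sid
```

Calling `send_alert(build_alert(user_number, twilio_number, link))` would send one message. Keeping the account SID and auth token in environment variables keeps them out of the source code.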

In addition to sending a text message, I also set up the receiving end, which accounts for all user responses, whether they ask for the menu options, the unannotated image, the annotated image, the date the package arrived, or the likelihood that the package is actually a package. It also accounts for any invalid responses, in which case it tells the user that the option is invalid and gives them the list of options.
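The menu logic described above could look something like the sketch below. The option numbers, the wording of the replies, and the keys of the `package` dictionary are all hypothetical; only the list of things the user can ask for comes from the project.

```python
MENU = (
    "Options:\n"
    "1 - unannotated image\n"
    "2 - annotated image\n"
    "3 - arrival date\n"
    "4 - likelihood that it is a package"
)


def reply_for(text, package):
    """Return the reply for one incoming text message.

    `package` describes the latest detection; its keys are
    hypothetical names chosen for this sketch.
    """
    choice = text.strip().lower()
    if choice == "menu":
        return MENU
    replies = {
        "1": package["image_url"],
        "2": package["annotated_url"],
        "3": "Arrived on " + package["date"],
        "4": "Likelihood: {:.0%}".format(package["score"]),
    }
    # Anything unrecognised gets an error message plus the menu again.
    return replies.get(choice, "Invalid option.\n" + MENU)
```

Using a dictionary lookup with a default keeps the invalid-response case in one place, so adding a new menu option only means adding one entry.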

One problem that I faced while receiving text messages was that I needed somewhere to receive the incoming message, as Twilio requires you to run your own server. As a result, I set up ngrok, which lets you run a local server while making it available at a public URL, so that the Twilio API could reach it from anywhere on the Internet.
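As an illustration of the receiving side, here is a minimal sketch of a local server that Twilio could call through ngrok. It uses only Python's standard library rather than whatever framework the project actually used, and the port number and the echo reply are assumptions. Twilio delivers the incoming message as a form-encoded POST (the text is in the `Body` field) and expects a small XML document (TwiML) in return.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def twiml_reply(text):
    """Wrap reply text in the TwiML XML that Twilio expects back."""
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            "<Response><Message>" + escape(text) + "</Message></Response>")


def incoming_text(raw_body):
    """Pull the message text out of Twilio's form-encoded POST body."""
    fields = parse_qs(raw_body.decode("utf-8"))
    return fields.get("Body", [""])[0]


class SmsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        text = incoming_text(self.rfile.read(length))
        payload = twiml_reply("You said: " + text).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=5000):
    """Run locally, then expose the port with `ngrok http 5000`."""
    HTTPServer(("", port), SmsHandler).serve_forever()
```

After calling `serve()` and starting ngrok, the public URL that ngrok prints would be entered as the messaging webhook for the phone number in the Twilio console.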

The Imgur API

I used the Imgur API to upload an image to the internet. The first step in this process is to download dependencies related to Imgur, and then to register a client on Imgur. Then, I needed to write the code that would allow me to upload the image, and then retrieve the image for use by the Twilio API.
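The upload step can be sketched as below. Note that this sketch talks to Imgur's HTTP endpoint directly with the standard library instead of using an Imgur client package, so it is not the project's actual code. The function names are hypothetical, and `client_id` stands for the Client ID received when registering a client on Imgur.

```python
import base64
import json
import urllib.parse
import urllib.request

IMGUR_UPLOAD_URL = "https://api.imgur.com/3/image"


def build_upload_request(image_bytes, client_id):
    """Build the POST request for an anonymous Imgur upload."""
    form = urllib.parse.urlencode({
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "type": "base64",
    }).encode("ascii")
    return urllib.request.Request(
        IMGUR_UPLOAD_URL,
        data=form,
        headers={"Authorization": "Client-ID " + client_id},
    )


def upload_image(path, client_id):
    """Upload a local image file and return its public link."""
    with open(path, "rb") as f:
        request = build_upload_request(f.read(), client_id)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The link is what gets handed to Twilio as the media URL.
    return body["data"]["link"]
```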

The CSV File & The HTML Table

A CSV file is a text file that stores data separated by a delimiter, which is usually a comma. This is where I stored my package data: whenever the machine learning model found a package, the program wrote a new entry to the second line of the CSV file. Then, I read back the whole CSV file, with the data from all of the packages, and displayed it in an HTML table that the user could view.
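A minimal sketch of this write-then-display flow is shown below. The column names and function names are assumptions made for illustration; the idea of inserting the newest package on the second line (directly under the header) follows the description above.

```python
import csv
import html
import os

HEADER = ["date", "score", "image_url"]   # column names are assumptions


def record_package(csv_path, row):
    """Insert `row` on the second line, directly below the header."""
    rows = []
    if os.path.exists(csv_path):
        with open(csv_path, newline="") as f:
            rows = list(csv.reader(f))[1:]      # drop the old header
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerow(row)                    # newest package first
        writer.writerows(rows)


def csv_to_html_table(csv_path):
    """Read the whole CSV file back and render it as an HTML table."""
    with open(csv_path, newline="") as f:
        header, *rows = list(csv.reader(f))
    head = "".join(f"<th>{html.escape(h)}</th>" for h in header)
    lines = ["<table>", "<tr>" + head + "</tr>"]
    for row in rows:
        cells = "".join(f"<td>{html.escape(c)}</td>" for c in row)
        lines.append("<tr>" + cells + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Escaping each cell with `html.escape` keeps stray characters such as `<` or `&` in the data from breaking the table markup.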


All in all, this project really helped me achieve my goals of learning machine learning and finding out how computers are able to do things that humans can do. In the future, I believe that computers will learn so much that they become smarter than any human, and I want to be part of that movement. This project has really prepared me for the future, and I hope to apply what I learned in an internship!

Second Milestone

For my Second Milestone, I used my own dataset of images rather than a pretrained model: I chose a dataset of annotated faces, and the model was then trained on these images. In addition, I captured an image with the Pi camera using some Python code. The image was sent to the API, which made a prediction and sent it back to the Raspberry Pi, where it was displayed on the monitor.
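The capture-and-predict loop might look roughly like the sketch below. It assumes the `picamera` and `requests` packages are installed on the Raspberry Pi. The endpoint path in `prediction_url` and the use of the API key as the Basic-auth username are assumptions based on NanoNets' v2 object detection API and should be checked against its documentation; the function names and the model ID are placeholders.

```python
def prediction_url(model_id):
    """URL that accepts an image for prediction (assumed v2 endpoint)."""
    return ("https://app.nanonets.com/api/v2/ObjectDetection/Model/"
            + model_id + "/LabelFile/")


def capture_image(path="capture.jpg"):
    """Take one still photo with the Pi camera module."""
    from picamera import PiCamera   # only available on the Raspberry Pi
    from time import sleep

    camera = PiCamera()
    try:
        camera.start_preview()
        sleep(2)                    # give the sensor time to adjust exposure
        camera.capture(path)
    finally:
        camera.close()
    return path


def predict(path, model_id, api_key):
    """POST the photo to the API and return the decoded JSON response."""
    import requests                 # third-party: pip install requests

    with open(path, "rb") as image:
        response = requests.post(
            prediction_url(model_id),
            auth=(api_key, ""),     # API key as username, empty password
            files={"file": image},
        )
    response.raise_for_status()
    return response.json()
```

On the Pi, `predict(capture_image(), model_id, api_key)` would take a photo and return the API's response, which can then be drawn on the image or printed to the monitor.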

How Machine Learning Works

Behind the scenes, the NanoNets API is completing many of the steps required in object detection, including annotating the image that is given to it after it is captured by the camera, doing the pre-processing, and finally the training itself.

During training, the machine tries to find patterns and similarities in the data. After it learns from this data, it can recognize those patterns in new data that it is presented with, which in this case is an image. This is how it is able to make a prediction.

How the API Request Works

An API processes requests using HTTP methods such as POST and GET, which are the two most common. A POST request sends data to a server, which in this case was the NanoNets API, while a GET request retrieves data from a server.
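The difference between the two methods can be seen in a short sketch using Python's standard library. The URL is a hypothetical placeholder, and nothing is actually sent over the network here; the requests are only constructed.

```python
import urllib.request

# Hypothetical endpoint, used only to show the two request types.
URL = "https://example.com/api/predictions"

# No body attached: urllib treats this as a GET, a request to read data.
get_request = urllib.request.Request(URL)

# A body is attached: urllib treats this as a POST, a request that sends data.
post_request = urllib.request.Request(URL, data=b"image bytes would go here")

print(get_request.get_method())   # GET
print(post_request.get_method())  # POST
```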

First Milestone

My First Milestone for the Object Detection project was to have the Raspberry Pi set up and to get the pretrained model working. The software work focused on installing Raspberry Pi OS on the Raspberry Pi and using the terminal to run the pretrained model. I also connected the camera module to the Raspberry Pi and set up the pretrained model using NanoNets.

Raspberry Pi & Camera Code

Raspberry Pi OS is the operating system that must be installed in order for the Raspberry Pi to have many of the necessary packages. I first wrote it to the SD card using the Raspberry Pi Imager, and then inserted the card into the Raspberry Pi itself.

The NanoNets API allowed for the use of a pretrained model, as it already had built-in models with annotations. The API took in the model and made predictions on a set of images that I, the user, provided.

An API key was necessary, as it lets NanoNets know how many users are using its API and identify each user uniquely. Because the key is a secret token, it also ensures that I alone can use my specific model.
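One common way to handle such a key is sketched below: read it from an environment variable so it never appears in the source code, then send it in an HTTP Basic authentication header. The variable name `NANONETS_API_KEY` and both function names are hypothetical, and sending the key as the username with an empty password is an assumption about how NanoNets expects it.

```python
import base64
import os


def load_api_key(env_var="NANONETS_API_KEY"):
    """Read the key from the environment so it never lands in source code."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError("Set " + env_var + " before running.")
    return key


def basic_auth_header(api_key):
    """Build an HTTP Basic auth header with the key as the username."""
    token = base64.b64encode((api_key + ":").encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}
```

Anyone who obtains the key can use the model as if they were its owner, which is why it should stay out of shared code and public repositories.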


Bluestamp Engineering