The problem
My response to the COVID-19 lockdown has been to run laps of my garden instead of my regular public routes.
Several laps in, it was obvious that the GPS on my smartwatch was struggling to track my runs in the restricted space of my garden. A quick chat with colleagues who have worked on GPS confirmed that its accuracy is only about 4 m, so GPS was not going to work in the confines of my garden.
A possible solution?
Having experience with Machine Learning (ML), we decided to try using it, together with time-lapse video capture, to automatically count my laps of the garden. All I needed to do was measure a lap and let ML do the counting, giving me the length of my run – an ideal solution!
For machine learning we used ML.NET. It offers several different machine learning tasks, including image classification and object detection, and is built on Microsoft's .NET technology, which runs on Windows, macOS and Linux.
All that was left was to record myself running laps of my garden using time-lapse software on my phone and pick a ‘start/finish’ region within the scene to count laps.
Decisions, decisions
To detect me passing through the 'start/finish' region, we decided to use Object Detection rather than Image Classification; both are machine learning features of ML.NET.
Although Image Classification seemed like a good approach (it can distinguish between valid and non-valid images), it requires a lot of work to 'train' the model beforehand. That would make it difficult to change the start/finish region, or to use the app in someone else's garden.
Object detection
Object Detection, on the other hand, uses a pre-trained model to identify objects within an image and give a confidence score for each identification. Handily, it also provides the coordinates and size of each object in the image.
For image analysis we used the small and fast YOLOv2 pre-trained ONNX model, which can detect people and 19 other object classes and can be imported into ML.NET.
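To give a flavour of the setup, here is a minimal sketch of wiring such a model into an ML.NET pipeline. It assumes the Tiny YOLOv2 model from the ONNX Model Zoo, whose input tensor is named 'image' and whose output is named 'grid'; the file path and class names are placeholders rather than our actual code:

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public class FrameInput
{
    public string ImagePath { get; set; }
}

public class YoloOutput
{
    // Tiny YOLOv2 returns a 13x13 grid of cells, 125 channels per cell
    [ColumnName("grid")]
    public float[] Grid { get; set; }
}

public static class LapPipeline
{
    public static ITransformer Build(MLContext mlContext)
    {
        // Load each frame from disk, resize it to the model's fixed
        // 416x416 input, extract the raw pixels, then score with ONNX.
        var pipeline = mlContext.Transforms
            .LoadImages(outputColumnName: "image", imageFolder: "",
                        inputColumnName: nameof(FrameInput.ImagePath))
            .Append(mlContext.Transforms.ResizeImages("image", 416, 416))
            .Append(mlContext.Transforms.ExtractPixels("image"))
            .Append(mlContext.Transforms.ApplyOnnxModel(
                outputColumnNames: new[] { "grid" },
                inputColumnNames: new[] { "image" },
                modelFile: "TinyYolo2_model.onnx")); // placeholder path

        // No training involved - Fit just initialises the transform chain.
        return pipeline.Fit(
            mlContext.Data.LoadFromEnumerable(new List<FrameInput>()));
    }
}
```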
First run using a desktop application
That's not a person!
Instead of creating a console application, we opted to create a desktop application. This enabled us to provide a simple UI to browse to a folder of images and show some on-screen details (number of images, etc.). Later, we expanded the UI to add some additional features.
In the first version we chose to load all the images in the specified folder and pass them to the ML.NET analyser in one go. The video I had shot contained 2,403 images, so each one had to be loaded into memory and processed, and the results were then looped through one by one to detect any objects.
This was taking approximately 30 seconds per frame! On top of that, the resolution of the images (1280 x 720 pixels) made detection accuracy low. The amount of detail in each image made it difficult to distinguish a person … from a tree, as you can see in the photo.
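For reference, the one-shot scoring in that first version looked roughly like the sketch below. The folder path is hypothetical, and decoding the YOLO grid into bounding boxes is a separate step we have elided:

```csharp
using System.IO;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext();
var model = LapPipeline.Build(mlContext);
var imageFolder = @"C:\laps\frames"; // hypothetical folder of time-lapse frames

// First version: queue up every frame in the folder and score the lot in one go.
var frames = Directory.GetFiles(imageFolder, "*.jpg")
    .Select(path => new FrameInput { ImagePath = path });
var scored = model.Transform(mlContext.Data.LoadFromEnumerable(frames));

// Then loop through the results one by one looking for objects.
foreach (var output in mlContext.Data
    .CreateEnumerable<YoloOutput>(scored, reuseRowObject: false))
{
    // decode output.Grid into bounding boxes, labels and confidences here
}
```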
Take Two! Let’s get specific
The first version worked, but we needed to make some changes to the processing pipeline to speed it up and narrow down the search area. We identified that we needed to do the following:
- Select a smaller, targeted region of the image and use that for object detection
- Check the detected objects and see if a person has been identified
- If a person is identified, skip the next few frames (effectively seconds, at one frame per second) to avoid double counting
- Process each image individually
To do this, we modified the UI to allow the user to draw a search region within the image and then roll the video sequence several frames back and forth to check that the person was correctly passing through this region.
We then changed the processing pipeline so that it first cropped each image to the area selected in the UI before processing it.
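The crop itself is straightforward with System.Drawing; a sketch, where 'region' is the rectangle the user drew in the UI:

```csharp
using System.Drawing;

// Copy just the user-selected start/finish region out of a full frame.
static Bitmap CropToRegion(Bitmap frame, Rectangle region)
{
    var crop = new Bitmap(region.Width, region.Height);
    using (var g = Graphics.FromImage(crop))
    {
        g.DrawImage(frame,
            new Rectangle(0, 0, region.Width, region.Height), // destination
            region, GraphicsUnit.Pixel);                      // source
    }
    return crop;
}
```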
To avoid double counting the laps we decided to skip five frames each time a person was detected, assuming 1 frame = 1 second.
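Putting those pieces together, the revised per-frame loop looks roughly like this sketch. DetectPerson is a hypothetical helper standing in for the score-and-threshold step (run the crop through the model and check for a 'person' box with sufficient confidence); framePaths and searchRegion stand in for the folder scan and the UI selection:

```csharp
using System.Drawing;
using System.IO;

var framePaths = Directory.GetFiles(imageFolder, "*.jpg"); // from the folder scan
var searchRegion = new Rectangle(600, 400, 200, 150);      // hypothetical UI selection

// Revised loop: crop to the start/finish region, detect, and jump
// ahead five frames after each hit to avoid counting a lap twice.
int laps = 0;
for (int i = 0; i < framePaths.Length; i++)
{
    using (var frame = new Bitmap(framePaths[i]))
    using (var crop = CropToRegion(frame, searchRegion))
    {
        if (DetectPerson(crop))  // hypothetical helper, see lead-in
        {
            laps++;
            i += 5;              // 1 frame = 1 second, so skip ~5 s
        }
    }
}
```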
Better and faster
Having made the changes, the process was now taking a far more respectable 7 seconds per frame, and it was only processing a handful of frames per lap rather than every one.
Additionally, only processing the cropped area raised the accuracy of the person detection to over 85%. The app could now show an accurate lap count to the user and let them work out the distance they had run.
What’s next?
Whilst the application works and can calculate the number of laps, there are several areas that could be improved.
In the next version we will add a field for the length of each lap, allowing the app to calculate the distance automatically. It could also calculate speed and various average metrics.
We would also look at using a pre-trained model that only looks for people, which would make the analysis quicker.
Next, we will add the ability to load videos as well as images and later implement a version that can accept a live feed from a camera.
We might add Face/Person recognition so that the app can track multiple people, not just a generic ‘human’. This will allow the application to track a family running in the garden and display an accurate lap count for each person – another opportunity to use Machine Learning.
We might also add a real-time mode with a scoreboard, so that the runner can see how well they are doing… or not.
This was a quick and fun project to complete and has proved very useful – I just need to work on repairing the lawn now!
Find out more about ML.NET on the Microsoft website: ML.NET
Thanks for reading and stay safe.