In-Browser object detection using YOLO and TensorFlow.js
ffff
ome time ago, I spent several evenings playing around with state of the art object detection model called YOLO, which is certainly known to those who are interested in Machine Learning on a daily basis. Originally written in Darknet — open source neural network framework — YOLO performs really well in the tasks of locating and recognizing objects on the pictures. Due to the fact that I have been interested in TensorFlow.js for a few weeks now, I decided to check how YOLO will handle the limitations of In-Browser computing. The entire source code as well as my previous TF.js projects can be found on GitHub. If you want to play with the demo version, visit “I Learn Machne Learning” project website.
ome time ago, I spent several evenings playing around with state of the art object detection model called YOLO, which is certainly known to those who are interested in Machine Learning on a daily basis. Originally written in Darknet — open source neural network framework — YOLO performs really well in the tasks of locating and recognizing objects on the pictures. Due to the fact that I have been interested in TensorFlow.js for a few weeks now, I decided to check how YOLO will handle the limitations of In-Browser computing. The entire source code as well as my previous TF.js projects can be found on GitHub. If you want to play with the demo version, visit “I Learn Machne Learning” project website.
Note: I also recommend reading an article that recently appeared on Towards Data Science, which was the starting point for my project. I will try not to duplicate the information contained in it but rather broaden the topics raised in it and hopefully enrich it with my experience.
Old guns for now…
A
few months ago, the third version of YOLO was released. I had the
opportunity to test its capabilities in Python and I had a great hope
that I could use it in my little project. After spending two days
browsing through repositories, forums and documentation, it turned out
that it is not possible to do so right now. As described in the
aforementioned article, to use original YOLO model in your TensorFlow.js
project youmust first make a two-step conversion. The first of steps
takes us from Darknet to TensorFlow / Keras and the second converts our
model into a form understandable for TensorFlow.js. Unfortunately, due
to the fact that YOLOv3 has introduced new layers to its architecture,
and none of the most popular tools like Darkflow or YAD2K
has yet to support their conversion to TensorFlow, we have to stick to
old guns for now. In the future I will definitely need to come back and
change v2 for a newer model.
Let’s get our hands dirty
The
procedure of cennecting model with our application is pretty much
standard and it was already described in details in the first article
of this series. However this time, there is much more dirty work
waiting for us, involved mostly in data processing both before and after
the prediction.
First
of all, our model must be provided with a tensor of appropriate
dimensions - [1, 416, 416, 1] to be exact. As it usually happens these
values are related to the dimensions of training images and batch size.
Such a square input is problematic, because typically pictures are not
cropped this way. Cutting images to meet the above condition, carries
risk of losing valuable data which may result in false recognition of
objects in the picture. To limit this undesirable effect, we use the
popular smartcrop
library, which frames the photo by selecting the most interesting
fragemnt. The picture below is a excellent example of the described
mechanism and a successful prediction that would probably fail without
this trick. Finally, we normalize the values of each pixel, so that they
are between 0 and 1. The last point is particularly important to me, as
I spend almost two hours looking for bug causing my model to perform so
badly. Better late than never…
As
a result of each prediction, model returns tensor with rather strange
dimensions [1, 13, 13, 425]. These enigmatic numbers have been
effectively exposed in this article,
which perfectly explains what is happening under the hood of YOLO. I
recommend it to anyone who would like to understand the meaning of this
beautiful algorithm. Our task now is to convert this tensor into neat
rectangles surrounding the objects in the pictures. This step is quite
extensive and could easily be the subject of a separate article. Without
going into too much detail, I will say that we will use techniques such
as Intersect over Union and Non-Maxima Suppression to get rid of
unlikely results and aggregate the remaining rectangles with high
probabilities into bounding boxes of detected objects. I recommend
viewing the source code, containing these calculations.
Inconsistency across different devices
After
finishing the work on the alpha version, I decided to show off my new
toy in front of my friends. In this way, quite accidentally, I
discovered that the model can behave quite differently on different
devices. The class of detected objects does not change but their
probability values can change by up to several dozen percent. In the
model shown below, the threshold value has been set to 0.5. This means
that all objects with lower probabilities will be filtered out. This was
the fate of the zebra in the lower left image, its probability dropped
by over 25%. TensorFlow.js is still a young library and is struggling
with certain problems - currently there are several issues
related to inconsistancy on their GitHub. Apparently, it is not easy to
make calculations identical on each device. I keep my fingers crossed
for the TensorFlow.js team and I hope that they will solve all these
problems.
Speed kills
Finally,
I would like to write just a few words about one of the important
aspects of web programming (though often overlooked) which is the speed
of the application. After converting YOLO into a form understood by
TF.js, over twenty files are created, which together weigh about 45 MB.
Loading such a large amount of data on a slow 3G connection requires
almost sacred patience. It is certainly worth paying attention to if we
decided to use this type of solution in production.
In few words
TensorFlow.js
is still very young but it gives us developers and date scientists
amazing possibilities. You should be aware of certain limitations which I
mentioned, but it is worth giving TF.js a chance, because its real
capabilities are in my opinion, unexplored.
Comments
Post a Comment