Imagine, for some reason, someone wanted a classifier on a mobile device that could recognize various object classes. That is, if the system recognized a face, it would not be able to tell to whom it belonged, but it would know that it was a face. Viola and Jones provide a nice system (a boosted cascade of Haar-like features) for doing this kind of classification for faces. For anyone who wants to go further and tell faces apart, Eigenfaces are a good place to start looking.
What I'm asking is: is there a way to rig a classifier based on Haar-like features to classify multiple object classes in parallel? That is, can we efficiently detect 50+ object classes on an iPod/iPad in real time? (I like developing on iOS. I know Android is more "open", but Apple's tools are a pleasure to use!)
Derivatives are the solution. I forget which paper I read this in, but the rectangular features that are subtracted from each other act like coarse derivatives that use many pixel values for normalization purposes. This rests on the fact that the Haar-like features form an overcomplete basis (one that is not orthonormal).
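To make the "coarse derivative" reading concrete, here is a minimal Python sketch (Python is chosen only for illustration; an iOS build would use Swift or C). It builds an integral image, as Viola and Jones do, and evaluates a two-rectangle feature as the right-half sum minus the left-half sum, which behaves like a horizontal derivative summed over the whole rectangle. The names `integral_image`, `rect_sum` and `horizontal_edge` are my own, not from any library.

```python
def integral_image(img):
    """img: list of rows of pixel values. Returns an (h+1) x (w+1) summed-area table."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above and to the left of (x, y), inclusive.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in columns [x, x+w) and rows [y, y+h), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def horizontal_edge(ii, x, y, w, h):
    """Two-rectangle feature: right half minus left half (w assumed even).
    Acts like a coarse d/dx, summed over every row of the rectangle."""
    half = w // 2
    return rect_sum(ii, x + half, y, half, h) - rect_sum(ii, x, y, half, h)
```

On a patch that is dark on the left and bright on the right the feature responds strongly, while on a flat patch it returns zero, which is exactly what a derivative would do.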
In other words, the Haar-like features provide a reduced, compressed representation of the image that is convenient for recognition.
According to my arithmetic, at lower capture resolutions 10 passes over the image are needed to scan a 24x24 window across it (one pass per scale). We start with a 24x24 window that is slid across the entire image, then we increase its size to, say, 28x28 pixels and slide it across the image again, and so forth. According to Viola and Jones, the size should grow by a fixed percentage at each step (they use a factor of 1.25), and the shift should scale along with it. I really recommend reading their paper: Rapid Object Detection using a Boosted Cascade of Simple Features.
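The pass count is easy to check with a small sketch. It assumes a 24x24 base window, the 1.25 scale factor from Viola and Jones, and a base shift of `delta` pixels that is multiplied by the scale and rounded at each pass. The function name `scan_scales` and the 320x240 frame size are illustrative choices, not values from this post.

```python
def scan_scales(width, height, base=24, scale_factor=1.25, delta=1.0):
    """Return (window_size, step, window_count) for each pass of a multi-scale scan."""
    passes = []
    scale = 1.0
    while int(base * scale) <= min(width, height):
        size = int(base * scale)                  # window side length at this scale
        step = max(1, int(delta * scale + 0.5))   # shift grows with scale: round(s * delta)
        nx = (width - size) // step + 1           # horizontal positions
        ny = (height - size) // step + 1          # vertical positions
        passes.append((size, step, nx * ny))
        scale *= scale_factor
    return passes


for size, step, count in scan_scales(320, 240):
    print(f"{size}x{size} window, step {step}: {count} positions")
```

For a 320x240 frame this yields 11 passes, from 24 up to 223 pixels; a smaller growth step, such as the 24 to 28 example above (roughly 1.17), gives more passes.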
Then I started to ask myself: could a Haar-like feature be shared among multiple object classes? Yes. The features are just differences of rectangle sums, so a value computed once per window can be tested by every class. The differing per-class thresholds for accepting a feature may pose a problem, but that could be solved by how the data is organized.
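One hypothetical way to organize the data: compute the shared feature once per window and keep each class's acceptance threshold in a sorted list, so a single binary search (`bisect_right`) returns every class that accepts the value. This assumes every class accepts values at or above its threshold; real Viola-Jones weak classifiers also carry a polarity, which would need a second sorted list. The class name `SharedFeature` and the example classes are made up for illustration.

```python
from bisect import bisect_right


class SharedFeature:
    """One Haar-like feature whose value is computed once and tested by many classes.
    Assumption: a class accepts the value when value >= its threshold."""

    def __init__(self, class_thresholds):
        # class_thresholds: dict mapping class name -> acceptance threshold
        items = sorted(class_thresholds.items(), key=lambda kv: kv[1])
        self.names = [name for name, _ in items]
        self.thresholds = [t for _, t in items]

    def accepting_classes(self, value):
        # Every class whose threshold is <= value sits before index k.
        k = bisect_right(self.thresholds, value)
        return self.names[:k]


feature = SharedFeature({"face": 10, "hand": 5, "cup": 20})
print(feature.accepting_classes(12))
```

The lookup costs O(log n) per window regardless of how many classes share the feature, which matters when the goal is 50+ classes.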
But I have a bet, and an interesting one. Suppose we have just a horizontal and a vertical feature. Could we restrict each of them to a square? Yes. That means the first feature we pick is going to be a square, split either horizontally or vertically for the subtraction. It also means we have divided the space into two buckets.
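One possible reading of the two buckets, sketched under my own assumptions: evaluate a single s x s square with both split orientations and file the window under whichever split responds more strongly. The helper names are hypothetical, `s` is assumed even, and the box sums are computed directly for brevity where a real detector would use an integral image.

```python
def square_split_responses(img, x, y, s):
    """Responses of one s x s square split two ways (s assumed even)."""
    half = s // 2

    def box(x0, y0, w, h):
        # Direct sum over the box; an integral image would make this O(1).
        return sum(img[r][c] for r in range(y0, y0 + h) for c in range(x0, x0 + w))

    vertical_split = box(x + half, y, half, s) - box(x, y, half, s)    # right - left
    horizontal_split = box(x, y + half, s, half) - box(x, y, s, half)  # bottom - top
    return vertical_split, horizontal_split


def bucket(img, x, y, s):
    """Put the window in one of two buckets by the stronger split response."""
    v, h = square_split_responses(img, x, y, s)
    return "vertical" if abs(v) >= abs(h) else "horizontal"
```

Only windows in the matching bucket would then need the follow-up features for that orientation.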
OK, next round: what is the standard deviation of the feature values across the training set? That should determine the next round of features, which will be placed spatially around the initial feature. So we detect a rectangle, but that rectangle may belong to a larger feature, which we must then look for by checking the surrounding rectangles.
However, we cannot let the number of rectangles to test explode after the initial round.
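A hedged sketch of that next round, assuming "most useful" means the lowest standard deviation of a neighbor's response across positive training samples: rank the candidate neighboring offsets by `statistics.pstdev` and keep at most `k` of them, which is one simple way to stop the number of rectangles from exploding. The `responses` layout and the function name `pick_neighbors` are illustrative assumptions.

```python
from statistics import pstdev


def pick_neighbors(responses, k):
    """responses: dict mapping a neighbor offset (dx, dy) -> list of feature values
    measured on positive training samples. Returns at most k offsets, most stable first."""
    ranked = sorted(responses, key=lambda offset: pstdev(responses[offset]))
    return ranked[:k]


responses = {
    (24, 0): [1, 1, 1],     # very consistent across the training set
    (0, 24): [0, 10, 20],   # all over the place
    (-24, 0): [5, 6, 5],
}
print(pick_neighbors(responses, 2))
```

Capping the result at `k` keeps the per-window cost bounded no matter how many candidate neighbors the training data suggests.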
With this trick, AdaBoost in the form Viola and Jones describe would no longer be sufficient. Instead, I believe that arranging the features so they don't overlap may provide the solution I need.
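Non-overlapping placement is easy to guarantee if the surrounding rectangles are the seed square's eight same-sized neighbors on a grid. This sketch (hypothetical helpers, rectangles as `(x, y, w, h)` tuples) builds that ring and includes an overlap test to check it.

```python
def overlaps(a, b):
    """True if rectangles a and b, given as (x, y, w, h), share at least one pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def ring_of_squares(seed):
    """The eight same-sized squares tiling the area around a square seed."""
    x, y, s, _ = seed
    return [
        (x + dx * s, y + dy * s, s, s)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    ]


seed = (10, 10, 4, 4)
print(ring_of_squares(seed))
```

Squares that only touch along an edge do not count as overlapping, so the ring plus the seed tiles a 3s x 3s block exactly once.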
This is as far as my thinking has gone, apart from a way to accelerate the training so it can be done on the iOS device as well. I've solved that and found a way to reduce storage needs, but it is the simple part of the puzzle I'm looking at.