Tracking with deep networks
Tracking is the process of locating a user-selected object across frames as it moves around a scene. It has a variety of uses, such as human-computer interaction, gesture recognition, driver-assistance systems, security monitoring, medical imaging, and agricultural automation. Tracking has been studied extensively over the last four decades, and many different tracking algorithms have been proposed. However, most of these trackers are limited to simple scenarios: no occlusion, no illumination or appearance change, and no complex object motion. On the other hand, we have examples of near-perfect trackers: humans and animals! The object-tracking performance of the human visual system is currently unsurpassed by engineered systems, so our research tries to take inspiration from, and reverse-engineer, the known principles of cortical processing during visual tracking.

Inspired by recent findings on shallow feature extractors in the visual cortex, we postulate that simple tracking processes are based on a shallow neural network that can quickly identify similarities between object features repeated in time. We propose an algorithm that tracks an object and extracts its motion based on the similarity between local features observed in subsequent frames. The local features are initially defined by a bounding box that delineates the object to track.
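As a rough illustration of this idea (a minimal sketch, not the paper's actual implementation), the snippet below scores a candidate patch by the fraction of pixels whose difference from the template falls below a threshold, then exhaustively scans a small window around the previous location. The function names, threshold value, and search radius are all illustrative assumptions.

```python
import numpy as np

def match_ratio(template, candidate, threshold=10.0):
    # Fraction of pixels whose absolute difference is below the
    # threshold -- more robust to outliers than a summed difference.
    diff = np.abs(template.astype(float) - candidate.astype(float))
    return float(np.mean(diff < threshold))

def track_step(prev_frame, frame, bbox, search_radius=8, threshold=10.0):
    # bbox = (x, y, w, h). Cut the template from the previous frame,
    # then search a window around the previous location in the new frame.
    x, y, w, h = bbox
    template = prev_frame[y:y + h, x:x + w]
    H, W = frame.shape[:2]
    best_score, best_xy = -1.0, (x, y)
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            nx, ny = x + dx, y + dy
            if nx < 0 or ny < 0 or nx + w > W or ny + h > H:
                continue  # candidate would fall outside the frame
            score = match_ratio(template, frame[ny:ny + h, nx:nx + w], threshold)
            if score > best_score:
                best_score, best_xy = score, (nx, ny)
    return (best_xy[0], best_xy[1], w, h), best_score
```

Because the score counts matching pixels rather than summing errors, a few strongly mismatched pixels (e.g. a partial occlusion) do not dominate the result.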
The Similarity Matching Ratio (SMR) Tracker
The SMR tracker achieved state-of-the-art performance on the TLD [1] dataset, as presented in Table 2. See the SMR paper to learn more about it, and download the code to try it yourself!
Figure 1 shows snapshots from the videos, and Table 1 lists their properties. A detection is considered correct if its overlap with the ground-truth bounding box is larger than 25%.
Figure 1 : Snapshots from the sequences with the objects marked by the bounding box [1]
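The overlap criterion above can be computed as the intersection-over-union of the two bounding boxes (the post does not spell out the exact measure, so the standard PASCAL-style definition is assumed here):

```python
def overlap(box_a, box_b):
    # Boxes as (x, y, w, h). Returns intersection area divided by
    # union area (intersection-over-union), in [0, 1].
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    ix = max(0, min(xa + wa, xb + wb) - max(xa, xb))
    iy = max(0, min(ya + ha, yb + hb) - max(ya, yb))
    inter = ix * iy
    union = wa * ha + wb * hb - inter
    return inter / union if union else 0.0

# A detection is accepted when overlap(detection, ground_truth) > 0.25.
```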
Videos of the SMR tracker on the TLD dataset
- David: https://www.youtube.com/watch?v=FiUbhmwtASM
- Jumping: https://www.youtube.com/watch?v=zkhv6cvK-cQ
- Pedestrian 1: https://www.youtube.com/watch?v=Pdt7wti2wVw
- Pedestrian 2: https://www.youtube.com/watch?v=nVhkO6ZT5sg
- Pedestrian 3: https://www.youtube.com/watch?time_continue=20&v=gcsLCIGYvcA
- Car: https://www.youtube.com/watch?v=1eIV1r3tShg
References
- Z. Kalal, J. Matas, and K. Mikolajczyk. P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints. CVPR, 2010.
- Z. Kalal and K. Mikolajczyk. Forward-Backward Error: Automatic Detection of Tracking Failures. ICPR, 2010.
- J. Lim, D. Ross, R. Lin, and M. Yang. Incremental learning for visual tracking. NIPS, 2005.
- R. Collins, Y. Liu, and M. Leordeanu. Online selection of discriminative tracking features. PAMI, 27(10):1631–1643, 2005.
- S. Avidan. Ensemble tracking. PAMI, 29(2):261–271, 2007.
- B. Babenko, M.-H. Yang, and S. Belongie. Visual tracking with online multiple instance learning. CVPR, 2009.
NOTE: this is an old post from our research in 2012