Abstract

This project presents a framework for simultaneously tracking, learning, and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. The AOG is discriminatively learned online to account for the appearance variations (e.g., lighting and partial occlusion) and structural variations (e.g., different poses and viewpoints) of the object itself, as well as for distractors (e.g., similar objects) in the scene background. In tracking, the state of the object (i.e., its bounding box) is inferred by parsing with the current AOG using a spatial-temporal dynamic programming (DP) algorithm. When the AOG grows large to handle objects with large variations in long-term tracking, we propose a bottom-up/top-down scheduling scheme for efficient inference, which performs focused inference with the most stable and discriminative small sub-AOG. During online learning, the AOG is re-learned iteratively in two steps:
(i) Identifying the false positives and false negatives of the current AOG in a new frame by exploiting the spatial and temporal constraints observed in the trajectory;
(ii) Updating the structure of the AOG and re-estimating its parameters based on the augmented training dataset. In experiments, the proposed method outperforms state-of-the-art tracking algorithms on a recent public tracking benchmark with 50 testing videos, on which 29 publicly available trackers have been evaluated.
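
Conceptually, tracking and learning alternate in a single online loop: parse the new frame with the current AOG, verify the result against the trajectory, and re-learn the AOG from the augmented examples. The sketch below only illustrates this loop schematically; the names (SimpleAOG, parse, mine_examples, relearn) are hypothetical placeholders with greatly simplified scoring and learning, not the parsing or learning procedures from the paper.

from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)

@dataclass
class SimpleAOG:
    """Stand-in for the hierarchical And-Or graph; here just a one-weight scorer."""
    weight: float = 1.0

    def score(self, frame, box: Box) -> float:
        # Placeholder appearance score. A real AOG would run dynamic programming
        # over its And-nodes (compositions) and Or-nodes (alternative structures).
        x, y, w, h = box
        return self.weight * float(w * h) * 1e-6

def propose_candidates(prev: Box, step: int = 8) -> List[Box]:
    """Dense local search window around the previous box."""
    x, y, w, h = prev
    return [(x + dx, y + dy, w, h)
            for dx in range(-step, step + 1, step)
            for dy in range(-step, step + 1, step)]

def parse(aog: SimpleAOG, frame, candidates: List[Box]) -> Box:
    """Pick the highest-scoring candidate (stand-in for spatial-temporal DP parsing)."""
    return max(candidates, key=lambda b: aog.score(frame, b))

def mine_examples(box: Box, trajectory: List[Box]) -> Tuple[List[Box], List[Box]]:
    """Label the new box with a simple temporal-smoothness check, standing in for
    the spatial and temporal constraints on the trajectory (step i)."""
    px, py, pw, ph = trajectory[-1]
    x, y, w, h = box
    jumped = abs(x - px) + abs(y - py) > max(pw, ph)
    return ([], [box]) if jumped else ([box], [])

def relearn(aog: SimpleAOG, positives: List[Box], negatives: List[Box]) -> SimpleAOG:
    """Placeholder for re-estimating the AOG structure and parameters (step ii)."""
    aog.weight *= 1.0 + 0.01 * (len(positives) - len(negatives))
    return aog

def track(frames, init_box: Box) -> List[Box]:
    aog, trajectory = SimpleAOG(), [init_box]
    for frame in frames:
        box = parse(aog, frame, propose_candidates(trajectory[-1]))   # tracking
        pos, neg = mine_examples(box, trajectory)                     # step (i)
        aog = relearn(aog, pos, neg)                                  # step (ii)
        trajectory.append(box)
    return trajectory

The bottom-up/top-down scheduling over sub-AOGs mentioned above is not reflected in this sketch; it would replace the exhaustive scoring of all candidates inside parse.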

 

  Demo

 

  Paper

Yang Lu, Tianfu Wu and Song-Chun Zhu, Online Object Tracking, Learning and Parsing with And-Or Graphs, CVPR, 2014

@inproceedings{LuTianfuZhu2014,
    title={Online Object Tracking, Learning and Parsing with And-Or Graphs},
    author={Lu, Yang and Wu, Tianfu and Zhu, Song-Chun},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2014},
    organization={IEEE}
}

 

  Results

We test our method on a recent public benchmark consisting of 50 video clips that cover different challenging aspects such as illumination variation, scale variation, non-rigid deformation, occlusion, and out-of-view.

Table 1. Overall performance comparison of the top 10 trackers evaluated on the 50-video benchmark. We follow the evaluation protocol proposed with the benchmark to compute the precision and success rate.
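
For reference, the precision and success rate in this protocol are derived from the center location error and the bounding-box overlap between the tracked and ground-truth boxes. The sketch below shows a common way to compute them (the 20-pixel and 0.5-overlap thresholds are conventional defaults assumed here, not values taken from this page); sweeping the thresholds produces precision and success plots like those in Figure 1.

import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)

def center_error(a: Box, b: Box) -> float:
    """Euclidean distance between the centers of two boxes."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))

def overlap(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def precision(results: List[Box], gt: List[Box], threshold: float = 20.0) -> float:
    """Fraction of frames whose center location error is within `threshold` pixels."""
    errors = [center_error(r, g) for r, g in zip(results, gt)]
    return sum(e <= threshold for e in errors) / len(errors)

def success(results: List[Box], gt: List[Box], threshold: float = 0.5) -> float:
    """Fraction of frames whose overlap with the ground truth exceeds `threshold`."""
    ious = [overlap(r, g) for r, g in zip(results, gt)]
    return sum(o > threshold for o in ious) / len(ious)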

Figure 1. Overall performance comparison on the 50 videos in the benchmark. The proposed method (“AOGTracker”) obtains better performance in both the precision plot (left) and the success plot (right).

Figure 2. Detailed comparisons on subsets of the benchmark divided by the main variation of the object to be tracked (e.g., 15 videos exhibit non-rigid deformation, including “Basketball”, “Bolt”, “Couple”, “Crossing”, etc.). For details of the subsets, please refer to the benchmark. The proposed method (“AOGTracker”) obtains better or comparable performance on all subsets.

 

  Code
