Hierarchical Sparse FRAME

IEEE Conference on Computer Vision and Pattern Recognition 2017

Generative Hierarchical Learning of Sparse FRAME Models

Jianwen Xie ¹ Yifei Xu ² Erik Nijkamp ¹ Ying Nian Wu ¹ Song-Chun Zhu ¹

¹ University of California, Los Angeles (UCLA), USA ² Shanghai Jiao Tong University

Abstract

This paper proposes a method for generative learning of hierarchical random field models. The resulting model, which we call the hierarchical sparse FRAME (Filters, Random field, And Maximum Entropy) model, generalizes the original sparse FRAME model by decomposing it into multiple parts that are allowed to shift their locations, scales, and rotations, so that the resulting model becomes a hierarchical deformable template. The model can be trained by an EM-type algorithm that alternates the following two steps: (1) Inference: given the current model, we match it to each training image by inferring the unknown locations, scales, and rotations of the object and its parts using recursive sum-max maps; and (2) Re-learning: given the inferred geometric configurations of the objects and their parts, we re-learn the model parameters by maximum likelihood estimation via a stochastic gradient algorithm. Experiments show that the proposed method is capable of learning meaningful and interpretable templates that can be used for object detection, classification, and clustering.
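The alternation between inference and re-learning can be illustrated with a deliberately small toy. The sketch below is not the paper's algorithm: it assumes 1D signals that are circularly shifted copies of one pattern, uses a single unknown shift in place of locations, scales, and rotations, and replaces maximum likelihood estimation with a plain average. The names `infer_shift` and `hard_em` are hypothetical.

```python
import numpy as np

def infer_shift(signal, template):
    """Inference step (toy): pick the circular shift that best matches the template."""
    scores = [np.dot(np.roll(signal, -s), template) for s in range(len(signal))]
    return int(np.argmax(scores))

def hard_em(signals, n_iters=5):
    """Alternate between inferring shifts and re-estimating the template."""
    # Assumption: the first signal is a reasonable initial template.
    template = signals[0].astype(float)
    shifts = [0] * len(signals)
    for _ in range(n_iters):
        # (1) Inference: match the current template to every signal.
        shifts = [infer_shift(x, template) for x in signals]
        # (2) Re-learning: undo the inferred shifts and average.
        aligned = [np.roll(x, -s) for x, s in zip(signals, shifts)]
        template = np.mean(aligned, axis=0)
    return template, shifts

base = np.array([0, 0, 1, 3, 1, 0, 0, 0])
signals = [np.roll(base, s) for s in (0, 2, 5)]
template, shifts = hard_em(signals)
```

Because the inference step takes a maximum instead of averaging over all configurations, this is a hard-assignment variant of EM, which is one reading of the term "EM-type" used above.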

Model

Representation

Figure 1. Hierarchical representation. (a) A hierarchical sparse FRAME model with 2 × 2 parts is learned from roughly aligned observed images. The parts are visualized by the synthesized images generated by the learned model. (b) A test image with bounding boxes showing the inferred locations, rotations, and scales of the object (red) and parts (blue). (c) A mixture of hierarchical sparse FRAME models is learned by an EM-type algorithm from animal face images of four categories without manual labeling. The learned mixture model is visualized as an And-Or graph, where an OR node (in black) represents a selection between different alternatives and an AND node (in blue) represents a composition of terminal nodes or child nodes. The object and part templates shown in the And-Or graph are synthesized images generated by the learned model via MCMC.
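Matching a template whose parts may shift can be sketched as a MAX step followed by a SUM step. The code below is a minimal illustration under stated assumptions, not the paper's implementation: each part is assumed to already have a 2D score map, parts may move only by a few pixels (no scale or rotation), and the names `local_max`, `sum_max_map`, `offsets`, and `radius` are hypothetical.

```python
import numpy as np

def local_max(score, radius):
    """MAX step: best part score within +/- radius pixels of each position."""
    score = np.asarray(score, dtype=float)
    h, w = score.shape
    padded = np.pad(score, radius, mode="constant", constant_values=-np.inf)
    out = np.full((h, w), -np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def sum_max_map(part_scores, offsets, radius):
    """SUM step: add the MAX maps of all parts at their nominal offsets."""
    h, w = np.asarray(part_scores[0]).shape
    total = np.zeros((h, w))
    for score, (oy, ox) in zip(part_scores, offsets):
        m = local_max(score, radius)
        # shifted[y, x] = m[y + oy, x + ox]; positions outside the map get -inf.
        shifted = np.full((h, w), -np.inf)
        dst_y = slice(max(0, -oy), min(h, h - oy))
        dst_x = slice(max(0, -ox), min(w, w - ox))
        src_y = slice(max(0, oy), min(h, h + oy))
        src_x = slice(max(0, ox), min(w, w + ox))
        shifted[dst_y, dst_x] = m[src_y, src_x]
        total += shifted
    return total

# Two toy parts: the second is displaced one pixel from its nominal position.
s0 = np.zeros((5, 5)); s0[1, 1] = 3.0
s1 = np.zeros((5, 5)); s1[2, 3] = 2.0
total = sum_max_map([s0, s1], offsets=[(0, 0), (0, 2)], radius=1)
```

In this toy, the object score at position (1, 1) combines both part peaks even though the second part sits one pixel away from its nominal offset. Applying the same two steps again, with object-level maps built from part-level maps, gives the recursive structure that a hierarchy of parts would need.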

Learning

Figure 2. EM-type learning algorithm for hierarchical sparse FRAME models. (a) 2 × 2 parts of synthesized images generated by the learned model. (b) 2 × 2 parts of sketch templates, where each Gabor wavelet is illustrated by a bar. (c) 12 examples of 26 non-aligned training images from the cat category, with bounding boxes showing the locations, scales, and rotations of the objects (black) and parts (colored) inferred by the learned model in the E-step. (d) Inference results of the learned model on two test images.
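The re-learning step relies on a general property of exponential-family models: the gradient of the log-likelihood is the difference between the observed statistics and the statistics of samples synthesized from the current model. The sketch below shows this update on a one-parameter toy, not on the FRAME model itself. It assumes a density proportional to exp(λx) times a standard normal reference, which is a normal with mean λ and unit variance, so exact sampling stands in for MCMC. The function name `fit_lambda` and all hyperparameters are hypothetical.

```python
import numpy as np

def fit_lambda(observed, n_iters=500, n_samples=64, step=0.1, seed=0):
    """Stochastic gradient ascent on the log-likelihood of a toy model.

    Toy model (assumption): p(x; lam) is proportional to exp(lam * x) * N(x; 0, 1),
    which equals N(lam, 1), so we can sample it exactly.
    """
    rng = np.random.default_rng(seed)
    lam = 0.0
    obs_mean = observed.mean()
    for _ in range(n_iters):
        # Synthesize samples from the current model (exact here; MCMC in general).
        synthesized = rng.normal(loc=lam, scale=1.0, size=n_samples)
        # Gradient estimate: observed statistic minus synthesized statistic.
        lam += step * (obs_mean - synthesized.mean())
    return lam

observed = np.array([1.5, 2.5, 2.0, 2.0])
lam = fit_lambda(observed)
```

The estimate fluctuates around the observed mean because each gradient is computed from a finite set of synthesized samples; a decreasing step size would reduce that fluctuation.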

Experiments

Exp 1 (Clustering): Evaluating mixture models by clustering tasks

Exp 2 (Detection): Object, part, and key point localization

Exp 3 (Classification): Evaluating models learned without supervision via classification

References

[1] Xie, J., Lu, Y., Zhu, S. C., & Wu, Y. N. (2016). Inducing wavelets into random fields via generative boosting. Applied and Computational Harmonic Analysis.
[2] Xie, J., Hu, W., Zhu, S. C., & Wu, Y. N. (2015). Learning sparse FRAME models for natural image patterns. International Journal of Computer Vision.

The work is supported by NSF DMS 1310391, DARPA SIMPLEX N66001-15-C-4035, ONR MURI N00014-16-1-2007, and DARPA ARO W911NF-16-1-0579.