Jianwen Xie ^{1*},
Ruiqi Gao ^{2*},
Zilong Zheng ^{2},
Song-Chun Zhu ^{2},
and Ying Nian Wu ^{2}

(* Equal contributions)

^{1} Hikvision Research Institute, Santa Clara, USA

^{2} University of California, Los Angeles (UCLA), USA

Dynamic patterns are characterized by complex spatial and motion patterns. Understanding dynamic patterns requires a disentangled representational model that separates the factorial components. A commonly used model for dynamic patterns is the state space model, where the state evolves over time according to a transition model and generates the observed image frames according to an emission model. To model the motions explicitly, it is natural for the model to be based on the motions, or the displacement fields, of the pixels. Thus, in the emission model, we let the hidden state generate the displacement field, which warps the trackable component of the previous image frame to generate the next frame, while a simultaneously emitted residual image accounts for the change that cannot be explained by the deformation. The warping of the previous frame accounts for the trackable part of the change between frames, while the residual image accounts for the intrackable part. We learn the model parameters by a maximum likelihood algorithm that iterates between inferring the latent noise vectors that drive the transition model and updating the parameters given the inferred latent vectors. Meanwhile, we adopt a regularization term that penalizes the norms of the residual images, encouraging the model to explain the change between image frames by trackable motion. Unlike existing methods for dynamic patterns, we learn our model in an unsupervised setting, without ground-truth displacement fields or optical flows. In addition, our model defines a notion of intrackability by the separation of the warped component and the residual component in each image frame. We show that our method can synthesize realistic dynamic patterns and disentangle appearance, trackable motion, and intrackable motion. The learned model can be useful for motion transfer, and it is natural to adopt it to define and measure the intrackability of a dynamic pattern.
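The following is a minimal TensorFlow 2 sketch of the generator described above: a transition network evolves the hidden state from latent noise, and an emission network maps the state to a displacement field plus a residual image, so that each frame is a warped copy of the previous frame plus a residual. The network architectures, tensor sizes, and the use of `tfa.image.dense_image_warp` from TensorFlow Addons for the warping step are illustrative assumptions, not the released code, and the loss shown is only the reconstruction-plus-residual-penalty objective; the actual learning alternates between inferring the latent noise vectors and updating the parameters.

```python
# Illustrative sketch only; names, architectures, and sizes are assumptions.
import tensorflow as tf
import tensorflow_addons as tfa


class MotionBasedGenerator(tf.keras.Model):
    """State-space model: a transition net evolves the hidden state, and an
    emission net maps the state to a displacement field and a residual image."""

    def __init__(self, state_dim=64, image_size=64):
        super().__init__()
        # Transition model: s_t = f(s_{t-1}, xi_t), driven by latent noise xi_t.
        self.transition = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(state_dim),
        ])
        # Emission model: the state generates a dense displacement field
        # (2 channels) and a residual image (1 channel, grayscale for simplicity).
        self.emission = tf.keras.Sequential([
            tf.keras.layers.Dense(image_size * image_size * 3),
            tf.keras.layers.Reshape((image_size, image_size, 3)),
        ])

    def step(self, prev_frame, prev_state, noise):
        """Generate the next frame by warping the previous frame and adding a residual."""
        state = self.transition(tf.concat([prev_state, noise], axis=-1))
        out = self.emission(state)
        flow, residual = out[..., :2], out[..., 2:]             # displacement field, residual
        warped = tfa.image.dense_image_warp(prev_frame, flow)   # trackable part
        next_frame = warped + residual                          # intrackable part added
        return next_frame, state, residual


def sequence_loss(model, frames, noises, init_state, lam=0.1):
    """Reconstruction loss plus a penalty on the residual norm, which encourages
    the model to explain frame-to-frame change by trackable motion."""
    loss, frame, state = 0.0, frames[:, 0], init_state
    for t in range(1, frames.shape[1]):
        frame, state, residual = model.step(frame, state, noises[:, t])
        loss += tf.reduce_mean(tf.square(frames[:, t] - frame))
        loss += lam * tf.reduce_mean(tf.square(residual))
    return loss
```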

The paper can be downloaded here.

The tex file can be downloaded here.

The poster can be downloaded here.

The AAAI 2020 Oral presentation can be downloaded here.

The Python code using TensorFlow is coming soon.

If you wish to use our code, please cite the following paper:

Jianwen Xie*, Ruiqi Gao*, Zilong Zheng, Song-Chun Zhu, and Ying Nian Wu. "Motion-Based Generator Model: Unsupervised Disentanglement of Appearance, Trackable and Intrackable Motions in Dynamic Patterns." AAAI 2020.


This work is supported by DARPA XAI project N66001-17-2-4029, ARO project W911NF1810296, and ONR MURI project N00014-16-1-2007. We thank Yifei Xu for his assistance with the experiments. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
