by Alternating Back-Propagation Through Time

Jianwen Xie ^{1*},
Ruiqi Gao ^{2*},
Zilong Zheng ^{2},
Song-Chun Zhu ^{2},
and Ying Nian Wu ^{2}

(* Equal contributions)

^{1} Hikvision Research Institute, Santa Clara, USA

^{2} University of California, Los Angeles (UCLA), USA

This paper studies the dynamic generator model for spatial-temporal processes such as dynamic textures and action sequences in video data. In this model, each time frame of the video sequence is generated by a generator model, which is a non-linear transformation of a latent state vector, where the non-linear transformation is parametrized by a top-down neural network. The sequence of latent state vectors follows a non-linear auto-regressive model, where the state vector of the next frame is a non-linear transformation of the state vector of the current frame as well as an independent noise vector that provides randomness in the transition. The non-linear transformation of this transition model can be parametrized by a feedforward neural network. We show that this model can be learned by an alternating back-propagation through time algorithm that iteratively samples the noise vectors and updates the parameters in the transition model and the generator model. We show that our training method can learn realistic models for dynamic textures and action patterns.
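The generative process described in the abstract can be illustrated with a minimal sketch in plain Python. It covers only forward sampling (state transition plus frame generation), not the learning algorithm. Everything specific here is an assumption for illustration: the function names (`make_layer`, `affine_tanh`, `sample_sequence`), the single-layer tanh networks, the dimensions, and the random weights that stand in for learned parameters. The released code uses TensorFlow and its architecture may differ.

```python
import math
import random


def make_layer(rng, n_in, n_out, scale=0.5):
    """Random weights (n_out x n_in) and zero bias; a stand-in for learned parameters."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def affine_tanh(layer, vec):
    """One fully connected layer followed by a tanh non-linearity."""
    weights, bias = layer
    return [math.tanh(sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def sample_sequence(num_frames, state_dim=3, noise_dim=2, frame_dim=8, seed=0):
    """Sample a sequence of frames from a toy dynamic generator model."""
    rng = random.Random(seed)
    # Transition model: next state = F(current state, noise vector).
    transition = make_layer(rng, state_dim + noise_dim, state_dim)
    # Generator model: frame = G(state). A real model would emit an image.
    generator = make_layer(rng, state_dim, frame_dim)

    state = [rng.gauss(0.0, 1.0) for _ in range(state_dim)]  # initial latent state
    frames = []
    for _ in range(num_frames):
        frames.append(affine_tanh(generator, state))
        # Independent noise provides the randomness in the transition.
        noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        state = affine_tanh(transition, state + noise)
    return frames


if __name__ == "__main__":
    for t, frame in enumerate(sample_sequence(num_frames=4)):
        print(t, [round(v, 3) for v in frame])
```

In this sketch the noise vectors are drawn fresh at each step. During learning, as the abstract describes, the noise vectors would instead be sampled given the observed frames, alternating with updates of the transition and generator parameters.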

The paper can be downloaded here.

The tex file can be downloaded here.

The poster can be downloaded here.

The AAAI 2019 Oral presentation can be downloaded here.

The Python code using TensorFlow can be downloaded here.

If you wish to use our code, please cite the following paper:

Jianwen Xie*, Ruiqi Gao*, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu


Part of the work was done while Ruiqi Gao was an intern at Hikvision Research Institute during the summer of 2018. She thanks Director Jane Chen for her help and guidance. The work is supported by DARPA XAI project N66001-17-2-4029; ARO project W911NF1810296; ONR MURI project N00014-16-1-2007; and a Hikvision gift to UCLA. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

[1] Xie, Jianwen, et al. "Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet." *CVPR*. 2017.

[2] Xie, Jianwen, et al. "Cooperative Training of Descriptor and Generator Networks." *PAMI*. 2018.

[3] Han, Tian, et al. "Alternating Back-Propagation for Generator Network." *AAAI*. 2017.

[4] Tulyakov, Sergey, et al. "MoCoGAN: Decomposing Motion and Content for Video Generation." *CVPR*. 2018.

[5] Doretto, Gianfranco, et al. "Dynamic textures." *IJCV*. 2003.