Cooperative Training of Descriptor and Generator Networks



Jianwen Xie, Yang Lu, Ruiqi Gao, Song-Chun Zhu, and Ying Nian Wu

University of California, Los Angeles (UCLA), USA


Abstract

This paper studies the cooperative training of two probabilistic models of signals such as images. Both models are parametrized by convolutional neural networks (ConvNets). The first network is a descriptor network, which is an exponential family model or an energy-based model, whose feature statistics or energy function are defined by a bottom-up ConvNet, which maps the observed signal to the feature statistics. The second network is a generator network, which is a non-linear version of factor analysis. It is defined by a top-down ConvNet, which maps the latent factors to the observed signal. The maximum likelihood training algorithms of both the descriptor net and the generator net are in the form of alternating back-propagation, and both algorithms involve Langevin sampling. We observe that the two training algorithms can cooperate with each other by jumpstarting each other's Langevin sampling, and they can be naturally and seamlessly interwoven into a CoopNets algorithm that can train both nets simultaneously.

Paper

The paper can be downloaded here.

Code and Data

The code and data for CoopNets can be downloaded here.

The code and data for the recovery experiment can be downloaded here.

Illustration

(Flow charts of Algorithm D, Algorithm G, and the CoopNets algorithm, described below.)

CoopNets Algorithm

The flow chart of Algorithm D for training the descriptor net. The update in Step D2 is based on the difference between the observed examples and the synthesized examples. The Langevin sampling of the synthesized examples from the current model in Step D1 can be time-consuming.
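For concreteness, the following is a minimal numpy sketch of Algorithm D on a toy one-dimensional exponential family model p(x; theta) ∝ exp(theta · h(x)) q(x), with q(x) a standard Gaussian reference distribution. The feature map h, the step sizes, and the iteration counts are illustrative assumptions, not the ConvNet parametrization used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    # Toy feature statistics; a stand-in for the bottom-up ConvNet features.
    return np.stack([x, -0.5 * x**2], axis=-1)

def grad_x_f(x, theta):
    # Gradient of f(x; theta) = theta . h(x) with respect to x.
    return theta[0] - theta[1] * x

def langevin_d(x, theta, n_steps=30, delta=0.1):
    """Step D1: Langevin sampling of synthesized examples from the current model."""
    for _ in range(n_steps):
        grad_log_p = grad_x_f(x, theta) - x   # the -x term comes from the Gaussian reference q(x)
        x = x + 0.5 * delta**2 * grad_log_p + delta * rng.normal(size=x.shape)
    return x

observed = rng.normal(loc=2.0, scale=0.7, size=200)   # toy training data
theta = np.zeros(2)
lr = 0.05
for it in range(200):
    synthesized = langevin_d(rng.normal(size=200), theta)
    # Step D2: move theta along the difference of feature statistics
    # between observed and synthesized examples.
    theta += lr * (h(observed).mean(axis=0) - h(synthesized).mean(axis=0))
```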

The flow chart of Algorithm G for training the generator net. The update in Step G2 is based on the observed examples and their inferred latent factors. The Langevin sampling of the latent factors from the current posterior distribution in Step G1 can be time-consuming.
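Similarly, here is a minimal sketch of Algorithm G with a toy linear generator x = W z + eps, eps ~ N(0, sigma² I), standing in for the top-down ConvNet; the dimensions, step sizes, and warm-starting of the Langevin chains are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z, n, sigma = 16, 4, 100, 0.3

# Toy "observed" data drawn from a ground-truth linear generator.
W_true = rng.normal(size=(d_x, d_z))
observed = rng.normal(size=(n, d_z)) @ W_true.T + sigma * rng.normal(size=(n, d_x))

def infer_z(x, W, z, n_steps=30, delta=0.05):
    """Step G1: Langevin sampling of latent factors from p(z | x; W)."""
    for _ in range(n_steps):
        grad = (x - z @ W.T) @ W / sigma**2 - z   # gradient of log p(x, z) in z
        z = z + 0.5 * delta**2 * grad + delta * rng.normal(size=z.shape)
    return z

W = rng.normal(size=(d_x, d_z)) * 0.1
z = rng.normal(size=(n, d_z))
lr = 1e-3
for it in range(200):
    z = infer_z(observed, W, z)   # warm-start each chain from the previous z
    # Step G2: update W using the observed examples and inferred latent factors.
    grad_W = (observed - z @ W.T).T @ z / sigma**2
    W += lr * grad_W / n
```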

The flow chart of the CoopNets algorithm. The part of the flow chart for training the descriptor is similar to Algorithm D, except that the D1 Langevin sampling is initialized from the initial synthesized examples supplied by the generator. The part of the flow chart for training the generator can also be mapped to Algorithm G, except that the revised synthesized examples play the role of the observed examples, and the known generated latent factors can be used as the inferred latent factors.
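Putting the two together, the following sketch runs one-dimensional toy versions of both models through the CoopNets loop: the generator supplies initial synthesized examples, the descriptor's Langevin dynamics revises them, the descriptor learns from the difference of feature statistics, and the generator learns from the revised examples paired with the known latent factors. All model forms, step sizes, and iteration counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, scale=0.7, size=200)   # toy training data

def h(x):
    # Toy feature statistics for the descriptor f(x; theta) = theta . h(x).
    return np.stack([x, -0.5 * x**2], axis=-1)

theta, w, sigma = np.zeros(2), 0.1, 0.3   # descriptor params, toy generator x = w z + eps
lr_d, lr_g, delta = 0.05, 0.01, 0.1

for it in range(200):
    # The generator supplies initial synthesized examples from known latent factors z.
    z = rng.normal(size=200)
    x_hat = w * z
    # D1: the descriptor's Langevin dynamics revises them; because the chains
    # are jump-started by the generator, only a few steps are run.
    x_tilde = x_hat.copy()
    for _ in range(10):
        grad = theta[0] - theta[1] * x_tilde - x_tilde
        x_tilde += 0.5 * delta**2 * grad + delta * rng.normal(size=200)
    # D2: descriptor update from observed vs. revised synthesized examples.
    theta += lr_d * (h(observed).mean(0) - h(x_tilde).mean(0))
    # G2: generator update, with x_tilde playing the role of the observed
    # examples and the known z serving as the inferred latent factors.
    w += lr_g * np.mean((x_tilde - w * z) * z) / sigma**2
```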

Experiments

Contents

Exp 1 : Learning Texture Patterns (stationary CoopNets)
Exp 2 : Learning Object Patterns from Aligned Images (non-stationary CoopNets)
Exp 3 : Learning Scene Patterns from Non-aligned Images (non-stationary CoopNets)
Exp 4 : Face Completion

Experiment 1: Learning Texture Patterns (stationary CoopNets)


Figure 1. Generating texture patterns. For each category, the first image displays the training image, and the remaining three display images generated by the CoopNets algorithm.

Experiment 2: Learning Object Patterns from Aligned Images (non-stationary CoopNets)


Figure 2. Generating object patterns. Each row displays one object experiment: the first four images are four of the training images, and the last four are images generated by the CoopNets algorithm.

Experiment 3: Learning Scene Patterns from Non-aligned Images (non-stationary CoopNets)


Figure 3. Generating scene patterns for desert category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 4. Generating scene patterns for forest road category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 5. Generating scene patterns for hotel room category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 6. Generating scene patterns for rock category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 7. Generating scene patterns for swimming pool category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 8. Generating scene patterns for volcano category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 9. Generating scene patterns for apartment building category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 10. Generating patterns for dining table category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 11. Generating patterns for school bus category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 12. Generating patterns for zebra category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 13. Generating patterns for lemon category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 14. Generating patterns for balloon category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 15. Generating patterns for lifeboat category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.


Figure 16. Generating patterns for strawberry category. Left panel: original images. Right panel: synthesized images generated by the CoopNets algorithm.

Experiment 4: Face Completion

We conduct an experiment on learning from complete training images of human faces, and then test the learned model on completing occluded testing images. The training data are 10,000 human faces randomly selected from the CelebA dataset (Liu et al., 2015). We run 600 cooperative learning iterations. Figure 17 displays 144 human faces synthesized by the descriptor net.


Figure 17. Generating human face patterns. The synthesized images are generated by the CoopNets algorithm learned from 10,000 images.

We test the learned generator on the task of recovering the occluded pixels of testing images. We design 3 experiments, in which we randomly place a 20 × 20, 30 × 30, or 40 × 40 mask on each 64 × 64 testing image. These 3 experiments are denoted M20, M30, and M40 respectively (M for mask). We report the recovery errors and compare our method with 8 different image inpainting methods. Methods MRF-L2 and MRF-L1 are based on Markov random field priors whose nearest-neighbor potential terms are L2 and L1 differences, respectively. Methods interp-1 through interp-6 are interpolation methods. Table 1 displays the recovery errors of the 3 experiments, where the error is measured by the per-pixel difference between the original image and the recovered image on the occluded region, averaged over 100 testing images. Figure 18 displays some recovery results by our method. The first row shows the original images as the ground truth. The second row displays the testing images with occluded pixels. The third row displays the images recovered by the generator net trained by the CoopNets algorithm on the 10,000 training images.


Figure 18. Face completion. First row: original images (ground truth). Second row: testing images with occluded pixels. Third row: images recovered by the generator net trained by the CoopNets algorithm.
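The recovery can be understood as Langevin inference of the latent factors in which only the observed (unoccluded) pixels drive the reconstruction gradient; the occluded pixels are then filled in by the learned generator. Below is a minimal numpy sketch with a hypothetical linear generator in place of the learned top-down ConvNet; the random square mask and the per-pixel error follow the setup described above, while the dimensions and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 64   # testing images are 64 x 64

def random_mask(size=20):
    m = np.ones((H, H), dtype=bool)
    r, c = rng.integers(0, H - size, size=2)
    m[r:r + size, c:c + size] = False   # False marks occluded pixels
    return m.ravel()

d_z, sigma = 20, 0.3
W = rng.normal(size=(H * H, d_z)) / np.sqrt(H * H)   # toy stand-in for the learned generator
x_true = W @ rng.normal(size=d_z)                    # ground-truth test image (flattened)
mask = random_mask(20)                               # the M20 setting

# Infer z by Langevin dynamics, letting only observed pixels drive the chain.
z, delta = np.zeros(d_z), 0.05
for _ in range(200):
    residual = np.where(mask, x_true - W @ z, 0.0)   # zero out occluded pixels
    grad = W.T @ residual / sigma**2 - z             # gradient of log p(x_obs, z)
    z += 0.5 * delta**2 * grad + delta * rng.normal(size=d_z)

recovered = W @ z
# Per-pixel recovery error on the occluded region, as reported in Table 1.
error = np.abs(x_true - recovered)[~mask].mean()
print(f"per-pixel recovery error: {error:.4f}")
```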

Table 1: Comparison of recovery errors (per-pixel difference on the occluded region, averaged over 100 testing images) among different inpainting methods in the 3 experiments

Exp ours MRF-L2 MRF-L1 interp-1 interp-2 interp-3 interp-4 interp-5 interp-6
M20 0.0966 0.1545 0.1506 0.1277 0.1123 0.2493 0.1123 0.1126 0.1277
M30 0.1112 0.1820 0.1792 0.1679 0.1321 0.3367 0.1310 0.1312 0.1679
M40 0.1184 0.2055 0.2032 0.1894 0.1544 0.3809 0.1525 0.1526 0.1894

Acknowledgement

The code in our work is based on the MATLAB package MatConvNet (Vedaldi & Lenc, 2015). We thank the authors for sharing their code with the community.

We thank Hansheng Jiang for her work on this project as a summer visiting student of the UCLA Cross-disciplinary Scholars in Science and Technology (CSST) program. We thank Tian Han for sharing the code on learning the generator network, and for helpful discussions.

The work is supported by NSF DMS 1310391, DARPA SIMPLEX N66001-15-C-4035, ONR MURI N00014-16-1-2007, and DARPA ARO W911NF-16-1-0579.
