Jianwen Xie,
Yang Lu,
Ruiqi Gao,
Song-Chun Zhu,
and Ying Nian Wu

University of California, Los Angeles (UCLA), USA

This paper studies the cooperative training of two probabilistic models of signals such as images. Both models are parametrized by convolutional neural networks (ConvNets). The first network is a descriptor network, which is an exponential family model or an energy-based model, whose feature statistics or energy function are defined by a bottom-up ConvNet, which maps the observed signal to the feature statistics. The second network is a generator network, which is a non-linear version of factor analysis. It is defined by a top-down ConvNet, which maps the latent factors to the observed signal. The maximum likelihood training algorithms of both the descriptor net and the generator net are in the form of alternating back-propagation, and both algorithms involve Langevin sampling. We observe that the two training algorithms can cooperate with each other by jumpstarting each other's Langevin sampling, and they can be naturally and seamlessly interwoven into a CoopNets algorithm that can train both nets simultaneously.

The paper can be downloaded here.

The code and data for CoopNets can be downloaded here.

The code and data for recovery experiment can be downloaded here.

The flow chart of Algorithm D for training the descriptor net. The updating in Step D2 is based on the difference between the observed examples and the synthesized examples. The Langevin sampling of the synthesized examples from the current model in Step D1 can be time-consuming.

The flow chart of Algorithm G for training the generator net. The updating in Step G2 is based on the observed examples and their inferred latent factors. The Langevin sampling of the latent factors from the current posterior distribution in Step G1 can be time-consuming.
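To make the loop concrete, here is a minimal NumPy sketch of Algorithm D on a toy one-dimensional model. The quadratic energy, step sizes, and chain lengths are illustrative assumptions, not the paper's ConvNet parametrization; the point is the structure: D1 Langevin sampling of synthesized examples, then a D2 update driven by the observed-minus-synthesized difference. Algorithm G has the same alternating shape, with the Langevin dynamics run over the latent factors instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy descriptor: p(x; theta) ∝ exp(-theta * x^2 / 2); its score
# (gradient of the log density in x) is -theta * x.
def score(x, theta):
    return -theta * x

def langevin_step(x, theta, step=0.1):
    # Step D1: one Langevin update, gradient ascent on log p plus noise.
    return x + 0.5 * step**2 * score(x, theta) + step * rng.standard_normal(x.shape)

def update_theta(theta, x_obs, x_syn, lr=0.05):
    # Step D2: the learning gradient is the difference between the
    # sufficient statistic (-x^2 / 2) averaged over the observed
    # examples and over the synthesized examples.
    return theta + lr * ((-x_obs**2 / 2).mean() - (-x_syn**2 / 2).mean())

theta = 0.5
x_obs = rng.standard_normal(1000)   # "data" from N(0, 1): true theta is 1
x_syn = rng.standard_normal(1000)   # persistent synthesis chains
for _ in range(200):
    for _ in range(10):             # D1: Langevin sampling
        x_syn = langevin_step(x_syn, theta)
    theta = update_theta(theta, x_obs, x_syn)
# theta now approximates the maximum likelihood estimate (about 1).
```

The inner loop is the expensive part in practice: each outer iteration must re-run the Langevin chains, which is what the cooperative scheme below speeds up.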

The flow chart of the CoopNets algorithm. The part of the flow chart for training the descriptor is similar to Algorithm D, except that the D1 Langevin sampling is initialized from the initial synthesized examples supplied by the generator. The part of the flow chart for training the generator can also be mapped to Algorithm G, except that the revised synthesized examples play the role of the observed examples, and the known generated latent factors serve as the inferred latent factors.
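The interweaving can be sketched in the same toy setting. Here a scalar descriptor p(x; theta) ∝ exp(-theta * x^2 / 2) and a linear generator x = w * z stand in for the paper's two ConvNets (both are illustrative assumptions): the generator supplies the initial synthesis, the descriptor's Langevin dynamics revises it, and each net then updates from the result without any separate inference step.

```python
import numpy as np

rng = np.random.default_rng(1)

def langevin_revise(x, theta, steps=30, step=0.1):
    # D1, jumpstarted: the chain starts from the generator's output,
    # so only a short Langevin run is needed to revise it.
    for _ in range(steps):
        x = x + 0.5 * step**2 * (-theta * x) + step * rng.standard_normal(x.shape)
    return x

def coopnets_step(theta, w, x_obs, n_syn=500, lr_d=0.02, lr_g=0.02):
    z = rng.standard_normal(n_syn)          # latent factors, known by construction
    x_init = w * z                          # generator proposes initial synthesis
    x_syn = langevin_revise(x_init, theta)  # descriptor revises it
    # D2: match statistics of observed vs. revised examples.
    theta += lr_d * ((-x_obs**2 / 2).mean() - (-x_syn**2 / 2).mean())
    # G2: the revised examples play the role of observed data, and the
    # known z replaces posterior inference, so the generator update is
    # a plain regression step.
    w += lr_g * (z * (x_syn - w * z)).mean()
    return theta, w, x_syn

theta, w = 0.5, 1.0
x_obs = rng.standard_normal(2000)
for _ in range(100):
    theta, w, x_syn = coopnets_step(theta, w, x_obs)
```

Note how neither costly loop from the standalone algorithms survives: the descriptor's chains start near the model distribution, and the generator never has to infer its latent factors.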

We conduct an experiment on learning from complete training images of human faces, and then test the learned model on completing occluded testing images. The training data are 10,000 human faces randomly selected from the CelebA dataset (Liu et al., 2015). We run 600 cooperative learning iterations. Figure 17 displays 144 human faces synthesized by the descriptor net.

We test the learned generator on the task of recovering the occluded pixels of testing images. We design 3 experiments, where we randomly place a 20 × 20, 30 × 30, or 40 × 40 mask on each 64 × 64 testing image. These 3 experiments are denoted M20, M30, and M40 respectively (M for mask). We report the recovery errors and compare our method with 8 different image inpainting methods. Methods MRF-L2 and MRF-L1 are based on a Markov random field prior, where the nearest-neighbor potential terms are L2 and L1 differences respectively. Methods interp-1 to interp-6 are interpolation methods. **Table 1** displays the recovery errors of the 3 experiments, where the error is measured by the per-pixel difference between the original image and the recovered image on the occluded region, averaged over 100 testing images. **Figure 18** displays some recovery results by our method. The first row shows the original images as the ground truth. The second row displays the testing images with occluded pixels. The third row displays the images recovered by the generator net trained by the CoopNets algorithm on the 10,000 training images.

| Exp | ours | MRF-L2 | MRF-L1 | interp-1 | interp-2 | interp-3 | interp-4 | interp-5 | interp-6 |
|-----|------|--------|--------|----------|----------|----------|----------|----------|----------|
| M20 | 0.0966 | 0.1545 | 0.1506 | 0.1277 | 0.1123 | 0.2493 | 0.1123 | 0.1126 | 0.1277 |
| M30 | 0.1112 | 0.1820 | 0.1792 | 0.1679 | 0.1321 | 0.3367 | 0.1310 | 0.1312 | 0.1679 |
| M40 | 0.1184 | 0.2055 | 0.2032 | 0.1894 | 0.1544 | 0.3809 | 0.1525 | 0.1526 | 0.1894 |

Our code is based on the MATLAB toolbox MatConvNet (Vedaldi & Lenc, 2015). We thank the authors for sharing their code with the community.

We thank Hansheng Jiang for her work on this project as a summer visiting student of the UCLA Cross-disciplinary Scholars in Science and Technology (CSST) program. We thank Tian Han for sharing the code on learning the generator network, and for helpful discussions.

The work is supported by NSF DMS 1310391, DARPA SIMPLEX N66001-15-C-4035, ONR MURI N00014-16-1-2007, and DARPA ARO W911NF-16-1-0579.