Yang Lu, Song-Chun Zhu, and Ying Nian Wu
University of California, Los Angeles (UCLA), USA
The convolutional neural network (ConvNet or CNN) has proven to be very successful in many tasks such as those in computer vision. In this conceptual paper, we study the generative perspective of the discriminative CNN. In particular, we propose to learn the generative FRAME (Filters, Random field, And Maximum Entropy) model using the highly expressive filters pre-learned by the CNN at the convolutional layers. We show that the learning algorithm can generate realistic and rich object and texture patterns in natural scenes. We explain that each learned model corresponds to a new CNN unit at a layer above the layer of filters employed by the model. We further show that it is possible to learn a new layer of CNN units using a generative CNN model, which is a product of experts model, and the learning algorithm admits an EM interpretation with binary latent variables.
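To make the abstract concrete, here is the form of the model and its learning rule, following the paper's formulation (notation may differ slightly from the paper). Let $F_k$, $k = 1, \dots, K$, denote the pre-learned convolutional filters at a given CNN layer, and let $h(r) = \max(0, r)$ be the ReLU non-linearity. The generative FRAME model is an exponential tilting of a Gaussian white noise reference distribution $q(I)$:

\[
p(I; w) = \frac{1}{Z(w)} \exp\left\{ \sum_{k=1}^{K} \sum_{x \in D} w_{k,x} \, h\!\left([F_k * I](x)\right) \right\} q(I),
\]

where $[F_k * I](x)$ is the response of filter $F_k$ at position $x$, $w = (w_{k,x})$ are the weights to be learned, and $Z(w)$ is the normalizing constant. The weights are learned by stochastic gradient ascent on the log-likelihood, which matches the filter statistics of images $\tilde{I}_i$ synthesized from the current model (via Langevin dynamics) to those of the training images $I_i$:

\[
w_{k,x}^{(t+1)} = w_{k,x}^{(t)} + \gamma_t \left( \frac{1}{n} \sum_{i=1}^{n} h\!\left([F_k * I_i](x)\right) - \frac{1}{\tilde{n}} \sum_{i=1}^{\tilde{n}} h\!\left([F_k * \tilde{I}_i](x)\right) \right).
\]

The sketch below illustrates this analysis-by-synthesis loop. It is a hypothetical PyTorch re-rendering for illustration only: the released code is MATLAB/MatConvNet, all names here are invented, and the sketch assumes single-channel images and a fixed filter bank while omitting the practical details of the actual implementation.

```python
# Hypothetical PyTorch sketch of FRAME learning with fixed CNN filters.
# (Illustration only; the paper's released code is MATLAB/MatConvNet.)
import torch
import torch.nn.functional as F

def learn_frame(train_imgs, filters, steps=100, langevin_steps=10,
                lr=0.01, delta=0.3, sigma=1.0):
    """train_imgs: (n, 1, H, W) training images.
    filters: (K, 1, h, w) pre-learned CNN filters, held fixed throughout."""
    # Observed statistics: mean ReLU filter responses on training images.
    obs = F.relu(F.conv2d(train_imgs, filters)).mean(0)
    w = torch.zeros_like(obs)                      # model weights w_{k,x}
    syn = sigma * torch.randn_like(train_imgs)     # init chains from Gaussian q(I)
    for _ in range(steps):
        # Langevin dynamics: sample synthesized images from the current model.
        for _ in range(langevin_steps):
            syn.requires_grad_(True)
            log_p = (w * F.relu(F.conv2d(syn, filters))).sum() \
                    - (syn ** 2).sum() / (2 * sigma ** 2)   # log p(I; w) + const
            grad = torch.autograd.grad(log_p, syn)[0]
            syn = (syn + 0.5 * delta ** 2 * grad
                   + delta * torch.randn_like(syn)).detach()
        # Gradient ascent on w: observed minus synthesized statistics.
        model_stats = F.relu(F.conv2d(syn, filters)).mean(0)
        w = w + lr * (obs - model_stats)
    return w, syn
```

Because the filters $F_k$ stay fixed and only the top-layer weights $w$ are learned, the learned model computes a single weighted sum of ReLU filter responses, which is exactly a new CNN unit at the layer above the filters, as stated above.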
The paper can be downloaded from here.
@inproceedings{LuZhuWu2016,
  title     = {Learning FRAME Models Using CNN Filters},
  author    = {Lu, Yang and Zhu, Song-chun and Wu, Ying Nian},
  booktitle = {Thirtieth AAAI Conference on Artificial Intelligence},
  year      = {2016}
}
The code and data can be downloaded from here. For more details, please refer to the instructions.
Figure 1. Generating object patterns. For each category, the first image displays one of the training images, and the second displays one of the generated images. Please click on the category names or images for more details.
Figure 2. Generating texture patterns. For each category, the first image is the training image, and the second is one of the generated images. Please click on the category names or images for more details.
Figure 3. Generating hybrid object patterns. For hybrid objects, the first two images display two of the training images, and the last one displays one of the generated images. Please click on the category names or images for more details.
Figure 4. Generating non-aligned patterns. For non-aligned objects, the first image displays the training image, and the second displays one of the generated images. Please click on the category names or images for more details.
Our code builds on the MATLAB toolbox MatConvNet, and the experiments use the VGG filters of "Very Deep Convolutional Networks for Large-Scale Visual Recognition".
We thank Jifeng Dai for earlier collaboration on the generative CNN. We thank Junhua Mao and Zhuowen Tu for sharing their expertise on CNNs. We thank one reviewer for insightful comments and suggestions. This work is supported by NSF DMS 1310391, ONR MURI N00014-10-1-0933, DARPA SIMPLEX N66001-15-C-4035, and DARPA MSEE FA 8650-11-1-7149.