Research of Ying Nian Wu

Research interests: generative models, representation learning, unsupervised learning, computer vision, computational neuroscience, bioinformatics.

Full list of publications

Below are selected recent papers with annotations

Recent themes:

(1) Representation learning and modeling in latent space, in the form of algebraic structures (Gao et al. 2020) and probability models (NeurIPS 2020 by Pang et al.), with applications in computational neuroscience.
(2) Maximum likelihood learning of deep generative models, including top-down directed graphical models, undirected energy-based models (which we called Gibbs models, random field models, or descriptive models before the term energy-based model became popular), and their integration (NeurIPS 2020 by Pang et al.). Learning is powered by short-run MCMC for both inference (ECCV 2020 by Nijkamp et al.) and synthesis (NeurIPS 2019 by Nijkamp et al.), and short-run MCMC can be compared to attractor dynamics in neuroscience.
(3) Joint training and discriminative training of various models, e.g., energy-based models, flow-based models, generator models, and inference models, without resorting to MCMC; instead, MCMC is amortized by learned computation.

Earlier themes:

* Maximum likelihood learning of modern ConvNet-parametrized energy-based model, ICML 16 by Xie et al. It can be seen as a multi-layer generalization of the FRAME (Filters, Random field, And Maximum Entropy) model, Neural Computation 1997 by Zhu, Wu, and Mumford, where sampling is accomplished by Langevin dynamics, interpreted as Gibbs Reaction And Diffusion Equations (GRADE), PAMI 1998 by Zhu and Mumford.
* Scale up maximum likelihood learning of ConvNet-EBM to big datasets, CVPR 18 by Gao et al.
* Adversarial interpretation of maximum likelihood learning of ConvNet-EBM, CVPR 17 by Xie et al. The EBM serves the roles of both the generator (actor) and the discriminator (critic), so MLE learning is self-critical.
* Formulate modern ConvNet-parametrized EBM as exponential tilting of a reference distribution, and connect it to the discriminative ConvNet classifier, ICLR 15 by Dai et al., ICML 16 by Xie et al. The EBM is a generative version of a discriminator.
* Maximum likelihood learning of generator network, including its dynamic version, using the alternating back-propagation algorithm, without resorting to an inference model, AAAI 17 by Han et al., AAAI 19 by Xie et al.
* Cooperative learning of EBM and generator network, where the EBM (teacher model) revises examples generated by the generator network (student model), and the generator network learns from the EBM's revision, AAAI 18, PAMI 20 by Xie et al. The generator is a learned sampler of the EBM, and it partially amortizes the MCMC sampling of the EBM. Learned computation can be considered temporal difference learning from internal data produced by MCMC.
* Divergence triangle, which unifies variational learning and adversarial learning, CVPR 19, 20 by Han et al. The divergence triangle avoids MCMC sampling, or amortizes MCMC by learned networks. Various forms of the divergence triangle can explain almost all existing generative learning algorithms.
* Joint training of flow-based model and EBM by noise contrastive estimation, CVPR 20 by Gao et al., to avoid MCMC and to facilitate semi-supervised learning.
* Learning generator models for motion, deformation, sparse coding, and neuroscience, AAAI 19, 2020 by Xie et al., CVPR 19, 2020 by Xing et al., IJCAI 18 and Neural Computation 2019 by Han et al.



(Synthesized images)
R Gao, Y Song, B Poole, YN Wu, and DP Kingma (2021) Learning energy-based models by diffusion recovery likelihood. International Conference on Learning Representations (ICLR). pdf


(learned grid cells)
R Gao, J Xie, X Wei, SC Zhu, and YN Wu (2021) On path integration model of grid cells: locally conformal embedding and matrix Lie algebra. pdf
Comment: The brain represents self-position by a vector. When the agent moves, the vector is transformed by a recurrent network. Locally, the self-motion is represented by the derivative of the recurrent network. The isotropy of the norm of this derivative underlies the hexagonal grid patterns. A locally linear recurrent model leads to a matrix Lie algebra, and to a matrix Lie group via the exponential map. The matrix Lie group forms an explicit representation of self-motion.
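The exponential-map idea in the comment above can be illustrated with a toy sketch. It assumes a single fixed 2x2 skew-symmetric generator B (a hypothetical Lie algebra element with an assumed frequency `omega`) for motion along one direction; in the paper the vector is high-dimensional and the generator depends on the direction of self-motion and is learned. The helper names (`mat_exp`, `move`, etc.) are ours, for illustration only.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_vec(a, v):
    """Matrix-vector product."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def scale(a, s):
    return [[s * x for x in row] for row in a]

def mat_exp(a, terms=40):
    """Exponential map via the truncated Taylor series sum_k A^k / k!."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(mat_mul(term, a), 1.0 / k)  # A^k / k! from A^(k-1) / (k-1)!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

# Hypothetical Lie algebra element for motion along one fixed direction:
# a 2x2 skew-symmetric generator with an assumed frequency omega.
omega = 1.3
B = [[0.0, -omega], [omega, 0.0]]

def move(v, dr):
    """Transform the position vector for a displacement dr: v(x + dr) = exp(dr B) v(x)."""
    return mat_vec(mat_exp(scale(B, dr)), v)

v0 = [1.0, 0.0]
two_steps = move(move(v0, 0.4), 0.7)  # path integration over two short moves
one_step = move(v0, 1.1)              # a single move of the same total length
```

Here `two_steps` and `one_step` agree, reflecting the group property exp(r1 B) exp(r2 B) = exp((r1 + r2) B), and because B is skew-symmetric the norm of the vector is preserved. For small dr, exp(dr B) is approximately I + dr B, which is the locally linear recurrent model.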


J Xie*, Z Zheng*, X Fang, SC Zhu, and YN Wu (2021) Learning cycle-consistent cooperative networks via alternating MCMC teaching for unsupervised cross-domain translation. The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI) 2021. pdf



(Left: latent space EBM stands on the generator. Right: short-run MCMC in latent space)
B Pang*, T Han*, E Nijkamp*, SC Zhu, and YN Wu (2020) Learning latent space energy-based prior model. Neural Information Processing Systems (NeurIPS), 2020. pdf (one-page code in appendix) project page slides
Comment: This paper originates from an early work, IJCV 2003 by Guo, Zhu, and Wu, where a top-down model generates textons and an energy-based model regulates the perceptual organization of textons, i.e., describes the Gestalt laws of textons.
The latent space EBM stands on a top-down generation network. It is like a value network or cost function defined in latent space.
The scalar-valued energy function is an objective function, a cost function, an evaluator, or a critic. It is about constraints, regularities, rules, perceptual organizations, and Gestalt laws. The energy-based model is descriptive rather than generative, which is why we used to call it the descriptive model: it only describes what it wants, without bothering with how to get it. Compared to the generator model, the energy-based model is like setting up an equation, whereas the generator model is like giving the solution directly. It is much easier to set up the equation than to give the answer, i.e., it is easier to specify a scalar-valued energy function than a vector-valued generation function; the latter is like a policy network.
The energy-based model in latent space is simple and yet expressive, capturing rules or regularities implicitly but effectively. The latent space seems the right home for the energy-based model.
Short-run MCMC in latent space for prior and posterior sampling is efficient and mixes well. One can amortize MCMC with a learned network (see our recent work on semi-supervised learning), but in this initial paper we prefer to keep it pure and simple, without mixing in tricks from VAE and GAN.
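Short-run prior sampling in latent space can be sketched as follows, with the prior written as an exponential tilting p(z) ∝ exp(f(z)) p0(z), p0 = N(0, 1). This is a minimal sketch, not the paper's code: the linear energy term f(z) = a z and all function names are hypothetical stand-ins for the learned network, chosen so that the tilted prior is exactly N(a, 1) and the sampler can be checked. The update z ← z + s ∇log p(z) + sqrt(2s) ε is one common Langevin discretization, assumed here.

```python
import random

def grad_log_prior(z, f_grad):
    """Gradient of log[exp(f(z)) * N(z; 0, 1)], i.e. f'(z) - z."""
    return f_grad(z) - z

def short_run_langevin(f_grad, n_chains=2000, n_steps=100, step=0.1, seed=0):
    """Noise-initialized, fixed-length Langevin chains (no burn-in, no persistence)."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n_chains)]  # start every chain from p0 = N(0, 1)
    for _ in range(n_steps):
        zs = [z + step * grad_log_prior(z, f_grad) + (2 * step) ** 0.5 * rng.gauss(0.0, 1.0)
              for z in zs]
    return zs

# Hypothetical linear energy term f(z) = a * z, so the tilted prior is exactly N(a, 1).
a = 1.5
samples = short_run_langevin(lambda z: a)
mean = sum(samples) / len(samples)
```

The sample mean lands close to a. In the actual model f is a small network, and the same chains, with a likelihood term added to the gradient, serve posterior sampling.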



(Left: latent EBM captures chemical rules implicitly in latent space. Right: generated molecules)
B Pang, T Han, and YN Wu (2020) Learning latent space energy-based prior model for molecule generation. Machine Learning for Molecules Workshop at NeurIPS 2020. pdf
Comment: The EBM in latent space captures the chemical rules effectively (and implicitly).



(the symbolic one-hot y is coupled with the dense vector z to form an associative memory, and z is the information bottleneck between x and y)
B Pang, E Nijkamp, J Cui, T Han, and YN Wu (2020) Semi-supervised learning by latent space energy-based model of symbol-vector coupling. ICBINB Workshop at NeurIPS 2020. pdf
Comment: In this paper, we jointly train the model with an inference network to amortize posterior sampling. The EBM in latent space couples a dense vector for generation with a one-hot vector for classification. The symbol-vector coupling is like a coin with a symbolic side and a dense vector side, similar to the particle-wave duality. There may be many such coins, and they may be organized in multiple layers. The symbol-vector coupling seeks to model the interaction between the hippocampus and the entorhinal cortex. The latent vector serves as an information bottleneck.
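The coupling can be sketched as a joint density p(y, z) ∝ exp(<y, f(z)>) p0(z) over a one-hot symbol y and a dense vector z, which is our reading of the setup and should be treated as an assumption. Summing out y gives the latent prior, and conditioning on z gives a softmax classifier. The linear map `W`, the dimensions, and all function names below are hypothetical.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(xs):
    lse = logsumexp(xs)
    return [math.exp(x - lse) for x in xs]

# Hypothetical linear coupling f(z) = W z with K = 3 symbols and a 2-d latent z.
W = [[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]]

def f(z):
    return [sum(w_i * z_i for w_i, z_i in zip(row, z)) for row in W]

def unnormalized_log_joint(y, z):
    """log p(y, z) + const = <y, f(z)> + log N(z; 0, I), with y given as an integer label."""
    return f(z)[y] - 0.5 * sum(zi * zi for zi in z)

def unnormalized_log_marginal(z):
    """Summing out the one-hot y: logsumexp_k f_k(z) + log N(z; 0, I), same constant dropped."""
    return logsumexp(f(z)) - 0.5 * sum(zi * zi for zi in z)

def classify(z):
    """p(y | z) is a softmax classifier on f(z)."""
    return softmax(f(z))

z = [0.4, -1.2]
probs = classify(z)
```

The same scores f(z) thus serve two roles: their log-sum-exp is the energy of the latent prior used for generation, and their softmax is the classifier used for the symbolic side.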



(VAE as alternating projection)
T Han, J Zhang and YN Wu (2020) From em-projections to variational auto-encoder. Deep Learning through Information Geometry Workshop at NeurIPS 2020. pdf



(mode traversing HMC chains)
E Nijkamp*, R Gao*, P Sountsov, S Vasudevan, B Pang, SC Zhu, and YN Wu (2020) Learning energy-based model with flow-based backbone by neural transport MCMC. pdf
Comment: Energy-based correction of a top-down backbone model is more appealing than learning an undirected latent energy-based model from scratch. Fast-mixing MCMC should be brought back to deep learning.
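Neural transport MCMC runs the sampler in the latent space of the flow. If x = g(z) with z ~ N(0, 1) and the EBM is p(x) ∝ exp(f(x)) q(x), where q is the flow density, then the Jacobian terms cancel and the target in z-space is ∝ exp(f(g(z))) N(z; 0, 1). A minimal sketch, assuming a hypothetical affine "flow" g(z) = mu + sigma z and a hypothetical linear correction f(x) = b x, so that the corrected density is exactly N(mu + b sigma^2, sigma^2); the HMC step size and leapfrog count are our choices.

```python
import math
import random

# Hypothetical affine flow backbone x = g(z) = mu + sigma * z, z ~ N(0, 1).
mu, sigma = 1.0, 2.0
# Hypothetical energy-based correction f(x) = b * x.
b = 0.5

def g(z):
    return mu + sigma * z

def log_target_z(z):
    """Pulled-back target: f(g(z)) + log N(z; 0, 1) (the flow's Jacobian cancels)."""
    return b * g(z) - 0.5 * z * z

def grad_log_target_z(z):
    return b * sigma - z

def hmc_step(z, rng, step=0.2, n_leapfrog=8):
    p = rng.gauss(0.0, 1.0)
    z_new, p_new = z, p
    p_new += 0.5 * step * grad_log_target_z(z_new)      # initial half step for momentum
    for i in range(n_leapfrog):
        z_new += step * p_new                           # full step for position
        if i < n_leapfrog - 1:
            p_new += step * grad_log_target_z(z_new)    # full step for momentum
    p_new += 0.5 * step * grad_log_target_z(z_new)      # final half step for momentum
    h_old = -log_target_z(z) + 0.5 * p * p
    h_new = -log_target_z(z_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, h_old - h_new)):  # Metropolis correction
        return z_new
    return z

def sample_x(n_samples=3000, seed=0):
    rng = random.Random(seed)
    z, xs = 0.0, []
    for _ in range(n_samples):
        z = hmc_step(z, rng)
        xs.append(g(z))  # push each latent sample through the flow
    return xs

xs = sample_x()
mean_x = sum(xs) / len(xs)
```

With these values, the samples should have a mean near 3 and a variance near 4. When the flow already approximates the target well, the pulled-back density is close to a standard Gaussian, which is why HMC in z-space can mix fast.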


E Nijkamp*, B Pang*, T Han, L Zhou, SC Zhu, and YN Wu (2020) Learning multi-layer latent variable model via variational optimization of short run MCMC for approximate inference, European Conference on Computer Vision (ECCV). pdf project page
Comment: The goal is to do away with a learned inference network completely. Short-run MCMC is convenient and automatic for complex top-down models, where top-down feedback and lateral inhibition emerge automatically, and short-run MCMC can be compared to attractor dynamics in neuroscience.
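Short-run posterior inference without an inference network can be sketched for the simplest case where the answer is known. The model assumes a hypothetical one-layer linear generator x = w z + eps with Gaussian noise, so the exact posterior is Gaussian and the short-run chains can be checked against it. The paper's models are multi-layer and nonlinear, and the step size and chain length here are our choices.

```python
import random

# Hypothetical linear generator x = w * z + eps, eps ~ N(0, noise_sd^2), z ~ N(0, 1).
w, noise_sd = 2.0, 0.5

def grad_log_joint(z, x):
    """d/dz [log p(x | z) + log p(z)] for the linear-Gaussian generator."""
    return w * (x - w * z) / noise_sd ** 2 - z

def short_run_inference(x, n_chains=1000, n_steps=50, step=0.01, seed=0):
    """Noise-initialized short-run Langevin: every chain restarts from the prior N(0, 1)."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n_chains)]
    for _ in range(n_steps):
        zs = [z + step * grad_log_joint(z, x) + (2 * step) ** 0.5 * rng.gauss(0.0, 1.0)
              for z in zs]
    return zs

x_obs = 3.0
zs = short_run_inference(x_obs)
approx_mean = sum(zs) / len(zs)
precision = 1 + w ** 2 / noise_sd ** 2            # exact posterior precision (Gaussian case)
exact_mean = (w * x_obs / noise_sd ** 2) / precision
```

Only the gradient of log p(x, z) is needed, so the same loop applies unchanged to any differentiable top-down model.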



(learned V1 cells)
R Gao, J Xie, S Huang, Y Ren, SC Zhu, and YN Wu (2020) Learning V1 simple cells with vector representations of local contents and matrix representations of local motions. pdf project page
Comment: Simple cells in the primary visual cortex are modeled by Gabor wavelets, and adjacent simple cells exhibit quadrature phase relations (e.g., a sine and cosine pair). The local image contents are represented by vectors, and the local motions are represented by rotations of these vectors. This explains the aforementioned neuroscience observations.
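A toy 1-D illustration of why a quadrature pair turns motion into rotation: the responses of a cosine and a sine Gabor filter to a local grating form a 2-D vector, and shifting the grating by delta rotates that vector by omega times delta. The filter frequency, width, sampling grid, and names below are hypothetical. The paper learns vector representations and matrix representations of 2-D local motions rather than using fixed filters.

```python
import math

def gabor_pair_response(signal, xs, omega, width):
    """Project a 1-D signal onto a cosine/sine Gabor pair (quadrature phase)."""
    c = s = 0.0
    for x in xs:
        env = math.exp(-x * x / (2 * width ** 2))
        c += signal(x) * env * math.cos(omega * x)
        s += signal(x) * env * math.sin(omega * x)
    return c, s

omega, width = 1.0, 4.0
xs = [i * 0.05 for i in range(-400, 401)]  # sampling grid on [-20, 20]
phase, shift = 0.3, 0.5

# Local content: a grating at the pair's preferred frequency; local motion: a shift.
before = gabor_pair_response(lambda x: math.cos(omega * x + phase), xs, omega, width)
after = gabor_pair_response(lambda x: math.cos(omega * (x - shift) + phase), xs, omega, width)

angle_change = math.atan2(after[1], after[0]) - math.atan2(before[1], before[0])
```

The response vector keeps its length and rotates by omega * shift, so in this toy case the motion is a 2x2 rotation matrix acting on the content vector.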


Y Xu, J Xie, T Zhao, C Baker, Y Zhao, and YN Wu (2020) Energy-based continuous inverse optimal control. Machine Learning for Autonomous Driving Workshop at NeurIPS 2020. pdf
Comment: The energy function plays the role of cost function for optimal control, and it can be learned from human drivers for autonomous driving.
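The role of the energy as a cost function can be sketched on the optimal-control side only. The cost below is a hypothetical quadratic over a 1-D control sequence with a goal penalty, not the paper's learned cost or vehicle dynamics. The energy C(u) defines p(u) ∝ exp(-C(u)), and its minimizer is the optimal control. Learning the cost from human driving data is not shown, and plain gradient descent is just one way to find the minimizer.

```python
# Hypothetical quadratic cost (energy) over a 1-D control sequence u_1..u_T:
#   C(u) = sum_t u_t^2 + lam * (x_T - goal)^2,  with dynamics x_T = x0 + sum_t u_t.
x0, goal, lam, T = 0.0, 5.0, 10.0, 4

def cost(u):
    x_T = x0 + sum(u)
    return sum(ut * ut for ut in u) + lam * (x_T - goal) ** 2

def grad_cost(u):
    x_T = x0 + sum(u)
    return [2 * ut + 2 * lam * (x_T - goal) for ut in u]

def optimal_control(step=0.01, n_iters=2000):
    """Minimizing the energy solves the control problem (here by gradient descent)."""
    u = [0.0] * T
    for _ in range(n_iters):
        u = [ut - step * gt for ut, gt in zip(u, grad_cost(u))]
    return u

u_star = optimal_control()
closed_form = lam * (goal - x0) / (1 + lam * T)  # every u_t equals this at the optimum
```

Setting the gradient to zero shows that all controls are equal at the optimum, giving the closed form used as a check. With a learned energy, sampling from exp(-C) instead of minimizing C would produce varied, human-like trajectories.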



J Xie*, Z Zheng*, R Gao, W Wang, SC Zhu, and YN Wu (2020) Generative VoxelNet: learning energy-based models for 3D shape synthesis and analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). Accepted. pdf project page


(The model generates both displacement field and appearance)
X Xing, R Gao, T Han, SC Zhu, and YN Wu (2020) Deformable generator networks: unsupervised disentanglement of appearance and geometry. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). Accepted. pdf project page
Comment: Separating geometry and appearance is crucial for vision. The model represents the displacement of the image grid explicitly.



(neural-symbolic learning)
Q Li, S Huang, Y Hong, Y Chen, YN Wu, and SC Zhu (2020) Closed loop neural-symbolic learning via integrating neural perception, grammar parsing, and symbolic reasoning. International Conference on Machine Learning (ICML). pdf project page



R Gao, E Nijkamp, DP Kingma, Z Xu, AM Dai, and YN Wu (2020) Flow contrastive estimation of energy-based model. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pdf project page
Comment: Noise contrastive estimation (NCE) with a flow-based model serving as the contrastive or negative distribution. The flow-based model transports the Gaussian noise distribution closer to the data distribution, thus providing a stronger contrast.
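The NCE part can be sketched as logistic regression between data and noise, where the logit is log p(x) - log q(x) and the log-normalizer c of the EBM is learned as an ordinary parameter. In this minimal sketch, a fixed Gaussian stands in for the flow (in FCE the flow is also updated to sharpen the contrast). The EBM is a hypothetical one-parameter Gaussian family, and fitting by Newton's method is our choice, not the paper's.

```python
import math
import random

rng = random.Random(0)
n = 1000
data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # observed data; true density N(0, 1)
noise_sd = 2.0                                  # stand-in for the flow: a fixed N(0, 2^2)
noise = [rng.gauss(0.0, noise_sd) for _ in range(n)]
labeled = [(x, 1.0) for x in data] + [(x, 0.0) for x in noise]

def log_q(x):
    return -0.5 * x * x / noise_sd ** 2 - math.log(noise_sd * math.sqrt(2 * math.pi))

def log_p(x, a, c):
    """Unnormalized EBM log-density -a x^2 / 2 + c, with c a learned log-normalizer."""
    return -0.5 * a * x * x + c

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_nce(n_iters=20):
    # Start from the noise distribution itself, where the logit is 0 everywhere.
    a, c = 1.0 / noise_sd ** 2, -math.log(noise_sd * math.sqrt(2 * math.pi))
    for _ in range(n_iters):
        g = [0.0, 0.0]
        h = [[0.0, 0.0], [0.0, 0.0]]
        for x, label in labeled:
            feat = (-0.5 * x * x, 1.0)  # derivative of the logit w.r.t. (a, c)
            prob = sigmoid(log_p(x, a, c) - log_q(x))
            for i in range(2):
                g[i] += (label - prob) * feat[i]
                for j in range(2):
                    h[i][j] -= prob * (1 - prob) * feat[i] * feat[j]
        # Newton step on the concave logistic objective: theta <- theta - H^{-1} g
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        da = (h[1][1] * g[0] - h[0][1] * g[1]) / det
        dc = (-h[1][0] * g[0] + h[0][0] * g[1]) / det
        a, c = a - da, c - dc
    return a, c

a_hat, c_hat = fit_nce()
```

The fitted `a_hat` and `c_hat` should approach 1 and -0.5 log(2 pi), i.e. NCE recovers both the shape and the normalizing constant. The closer q is to the data, the harder the classification and the more informative the contrast.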



X Xing, T Wu, SC Zhu, and YN Wu (2020) Inducing hierarchical compositional model by sparsifying generator network. Computer Vision and Pattern Recognition (CVPR). pdf project page
Comment: By sparsifying the activities of neurons at multiple layers of a dense top-down model, the learned connections are also sparsified as a result, so that a hierarchical compositional model can emerge.



(generated by the trained model)
T Han, E Nijkamp, B Pang, L Zhou, SC Zhu, and YN Wu (2020) Joint training of variational auto-encoder and latent energy-based model. Computer Vision and Pattern Recognition (CVPR). pdf project page
Comment: Another version of the divergence triangle, which unifies variational learning and adversarial learning by an objective function of symmetric and anti-symmetric form, consisting of three Kullback-Leibler divergences between three joint distributions.


J Xie*, R Gao*, Z Zheng, SC Zhu, and YN Wu (2020) Motion-based generator model: unsupervised disentanglement of appearance, trackable and intrackable motions in dynamic patterns. AAAI-20: 34th AAAI Conference on Artificial Intelligence. pdf project page



E Nijkamp*, M Hill*, T Han, SC Zhu, and YN Wu (2020) On the anatomy of MCMC-based maximum likelihood learning of energy-based models. (* equal contribution). AAAI-20: 34th AAAI Conference on Artificial Intelligence. pdf project page


J Xie, R Gao, E Nijkamp, SC Zhu, and YN Wu (2020) Representation learning: a statistical perspective. Annual Review of Statistics and Its Application (ARSIA). pdf



J Xie, Y Lu, R Gao, SC Zhu, and YN Wu (2020) Cooperative learning of descriptor and generator networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). pdf slides project page video
Comment: The descriptor is an energy-based model, which we used to call the descriptive model, random field model, or Gibbs model in our earlier work on the FRAME (Filters, Random field, And Maximum Entropy) model, one of the earliest energy-based models, proposed before the term energy-based model was coined. The generator serves as a learned sampler of the EBM, amortizing its MCMC sampling. The generator learns from the MCMC samples by a temporal difference scheme that we call MCMC teaching. The MCMC samples can be considered internal data. The point is that a neural network can learn from external data of observations for modeling, but it can also learn from internal data of computations for amortized computing.



(reconstruction by short-run MCMC: yes, it can reconstruct observed images)
E Nijkamp, M Hill, SC Zhu, and YN Wu (2019) On learning non-convergent non-persistent short-run MCMC toward energy-based model. Neural Information Processing Systems (NeurIPS), 2019 pdf (code in appendix)


Z Zhang*, Z Pan*, Y Ying, Z Xie, S Adhikari, J Phillips, RP Carstens, DL Black, YN Wu, and Y Xing (2019) Deep-learning augmented RNA-seq analysis of transcript splicing. Nature Methods, 16:307-10, pdf


T Han, X Xing, J Wu, and YN Wu (2019) Replicating neuroscience observations on ML/MF and AM face patches by deep generative model. Neural Computation, pdf



(learned grid cells)
R Gao*, J Xie*, SC Zhu, and YN Wu (2019) Learning grid cells as vector representation of self-position coupled with matrix representation of self-motion. (* equal contribution). International Conference on Learning Representations (ICLR). pdf project page


J Xie, SC Zhu, and YN Wu (2019) Learning energy-based spatial-temporal generative ConvNet for dynamic patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). pdf project page



(faces generated and interpolated by the learned model)
T Han*, E Nijkamp*, X Fang, M Hill, SC Zhu, and YN Wu (2019) Divergence triangle for joint training of generator model, energy-based model, and inference model. (* equal contribution). Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pdf project page
Comment: The divergence triangle unifies variational learning and adversarial learning by an objective function of symmetric and anti-symmetric form, consisting of three Kullback-Leibler divergences between three joint distributions.
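The three-KL objective can be evaluated on tiny discrete distributions. As we read the CVPR 2019 paper (treat this as an assumption), the joints are Q(x, z) = q_data(x) q(z|x) (data with the inference model), P(x, z) = p(z) p(x|z) (the generator), and Pi(x, z) = pi(x) q(z|x) (the EBM with the inference model), and the objective is Delta = KL(Q||P) + KL(P||Pi) - KL(Q||Pi), minimized over the generator and inference model and maximized over the EBM. All probability tables below are hypothetical, and training is not shown.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as dicts {outcome: prob}."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)

def joint(marginal_x, cond_z_given_x):
    """Joint table {(x, z): prob} from a marginal over x and a conditional over z."""
    return {(x, z): marginal_x[x] * cond_z_given_x[x][z]
            for x in marginal_x for z in cond_z_given_x[x]}

def divergence_triangle(Q, P, Pi):
    """Delta = KL(Q||P) + KL(P||Pi) - KL(Q||Pi)."""
    return kl(Q, P) + kl(P, Pi) - kl(Q, Pi)

q_data = {0: 0.7, 1: 0.3}                               # hypothetical data distribution
q_infer = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # hypothetical q(z | x)
pi_ebm = {0: 0.6, 1: 0.4}                               # hypothetical EBM marginal pi(x)
p_z = {0: 0.5, 1: 0.5}                                  # generator prior p(z)
p_gen = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}      # hypothetical p(x | z), indexed [z][x]

Q = joint(q_data, q_infer)
Pi = joint(pi_ebm, q_infer)
P = {(x, z): p_z[z] * p_gen[z][x] for z in p_z for x in (0, 1)}
delta = divergence_triangle(Q, P, Pi)
```

Because Q and Pi share the inference model, KL(Q||Pi) reduces to KL(q_data||pi), and if the generator joint P matched Q exactly, Delta would be zero.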


X Xing, T Han, R Gao, SC Zhu, and YN Wu (2019) Unsupervised disentanglement of appearance and geometry by deformable generator network. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pdf project page



(videos generated by the learned model)
J Xie*, R Gao*, Z Zheng, SC Zhu, and YN Wu (2019) Learning dynamic generator model by alternating back-propagation through time. AAAI-19: 33rd AAAI Conference on Artificial Intelligence. pdf project page


YN Wu, R Gao, T Han, and SC Zhu (2019) A tale of three probabilistic families: discriminative, descriptive and generative models. Quarterly of Applied Mathematics. pdf


T Han, J Wu, and YN Wu (2018) Replicating active appearance model by generator network. International Joint Conference on Artificial Intelligence (IJCAI). pdf


J Xie*, Z Zheng*, R Gao, W Wang, SC Zhu, and YN Wu (2018) Learning descriptor networks for 3D shape synthesis and analysis. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pdf project page


R Gao*, Y Lu*, J Zhou, SC Zhu, and YN Wu (2018) Learning generative ConvNets via multigrid modeling and sampling. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pdf project page
Comment: Scales up maximum likelihood learning of the modern ConvNet energy-based model to big datasets.



J Xie, Y Lu, R Gao, and YN Wu (2018) Cooperative learning of energy-based model and latent variable model via MCMC teaching. AAAI-18: 32nd AAAI Conference on Artificial Intelligence. pdf slides project page
Comment: The EBM is the teacher, and the generator is the student. The student writes a draft, the teacher revises it, and the student learns from the revision. Compared to GAN, cooperative learning has a revision process that improves the results of the generator.
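The draft, revise, learn loop can be sketched with both networks reduced to toys. The teacher is a fixed, hypothetical 1-D EBM N(target, 1). The student is a hypothetical generator x = mu + sigma z in which only the shift mu is learned, by regressing on the revised samples with the same z. In the paper both the EBM and the generator are ConvNets trained together.

```python
import random

# Hypothetical teacher: a fixed 1-D EBM with log p(x) = -(x - target)^2 / 2 + const.
target = 3.0

def grad_log_ebm(x):
    return -(x - target)

def revise(x, rng, n_steps=10, step=0.05):
    """Teacher revises a draft with a few Langevin steps under the EBM."""
    for _ in range(n_steps):
        x = x + step * grad_log_ebm(x) + (2 * step) ** 0.5 * rng.gauss(0.0, 1.0)
    return x

def mcmc_teaching(n_iters=300, batch=100, lr=0.5, seed=0):
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # student generator x = mu + sigma * z; only mu is learned here
    for _ in range(n_iters):
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        drafts = [mu + sigma * z for z in zs]       # student writes drafts
        revised = [revise(x, rng) for x in drafts]  # teacher revises them
        # Student learns from the revision: gradient step on mean (revised - g(z))^2 / 2 w.r.t. mu.
        grad_mu = sum(r - d for r, d in zip(revised, drafts)) / batch
        mu += lr * grad_mu
    return mu

mu_hat = mcmc_teaching()
```

The learned shift `mu_hat` moves to the teacher's mode near 3. The revised samples, not the data, are the training targets here, which is the "internal data" view of MCMC teaching.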



J Xie, SC Zhu, and YN Wu (2017) Synthesizing dynamic patterns by spatial-temporal generative ConvNet. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pdf project page
Comment: The paper gives an adversarial interpretation of maximum likelihood learning of the ConvNet-parametrized energy-based model.



(learning directly from occluded images. Row 1: original images, not available to the model; Row 2: training images; Row 3: learning and reconstruction.)
T Han*, Y Lu*, SC Zhu, and YN Wu (2017) Alternating back-propagation for generator network. AAAI-17: 31st AAAI Conference on Artificial Intelligence. pdf project page
Comment: Maximum likelihood learning of generator network, without resorting to inference model.
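Alternating back-propagation alternates two steps. Inferential back-propagation runs Langevin dynamics on each latent z_i, and learning back-propagation takes a gradient step on the generator parameters given the inferred z_i. The sketch below assumes a hypothetical 1-D linear generator x = w z + eps in place of a ConvNet, and it warm-starts each z_i from the previous iteration (our choice). For this model, the fixed point can be checked in closed form: |w| = sqrt(mean(x^2) - noise_sd^2).

```python
import math
import random

data_rng = random.Random(0)
noise_sd, w_true = 1.0, 2.0
data = [w_true * data_rng.gauss(0.0, 1.0) + noise_sd * data_rng.gauss(0.0, 1.0)
        for _ in range(300)]

def grad_z(z, x, w):
    """Inferential back-propagation: d/dz [log p(x | z) + log p(z)]."""
    return w * (x - w * z) / noise_sd ** 2 - z

def alternating_backprop(n_iters=150, n_langevin=20, step=0.02, lr=0.5, seed=1):
    rng = random.Random(seed)
    w = 0.5                   # initial generator weight
    zs = [0.0] * len(data)    # latent vectors, warm-started across iterations
    for _ in range(n_iters):
        # (1) Inferential back-propagation: Langevin on each z_i given the current w.
        for _ in range(n_langevin):
            zs = [z + step * grad_z(z, x, w) + (2 * step) ** 0.5 * rng.gauss(0.0, 1.0)
                  for z, x in zip(zs, data)]
        # (2) Learning back-propagation: gradient ascent on the mean of log p(x_i | z_i; w).
        grad_w = sum((x - w * z) * z for z, x in zip(zs, data)) / (len(data) * noise_sd ** 2)
        w += lr * grad_w
    return w

w_hat = alternating_backprop()
w_fixed_point = math.sqrt(sum(x * x for x in data) / len(data) - noise_sd ** 2)
```

No inference network appears anywhere: the latent vectors are inferred by MCMC, and the generator is fitted to the inferred pairs.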



(left: observed; right: synthesized.)
J Xie*, Y Lu*, SC Zhu, and YN Wu (2016) A theory of generative ConvNet. International Conference on Machine Learning (ICML). pdf project page
Comment: Maximum likelihood learning of modern ConvNet-parametrized energy-based model, with connections to Hopfield network, auto-encoder, score matching and contrastive divergence.
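For an EBM p(x) ∝ exp(f_theta(x)) q(x), the maximum likelihood gradient is E_data[df/dtheta] - E_model[df/dtheta]: observed statistics minus synthesized statistics. A minimal sketch, assuming a hypothetical one-parameter energy f_theta(x) = theta x with reference q = N(0, 1) in place of a ConvNet, and persistent Langevin chains for synthesis as a simplifying choice. The NeurIPS 2019 work above instead restarts short-run chains from noise.

```python
import random

data_rng = random.Random(0)
data = [data_rng.gauss(2.0, 1.0) for _ in range(500)]  # observed examples

def langevin(xs, theta, rng, n_steps=20, step=0.1):
    """Synthesize from p_theta(x) ∝ exp(theta * x) N(x; 0, 1) with Langevin dynamics."""
    for _ in range(n_steps):
        xs = [x + step * (theta - x) + (2 * step) ** 0.5 * rng.gauss(0.0, 1.0) for x in xs]
    return xs

def mle_ebm(n_iters=200, n_chains=200, lr=0.2, seed=1):
    rng = random.Random(seed)
    theta = 0.0
    synth = [rng.gauss(0.0, 1.0) for _ in range(n_chains)]
    data_mean = sum(data) / len(data)
    for _ in range(n_iters):
        synth = langevin(synth, theta, rng)  # analysis by synthesis
        # Log-likelihood gradient: E_data[x] - E_model[x], since df/dtheta = x.
        theta += lr * (data_mean - sum(synth) / n_chains)
    return theta, data_mean

theta_hat, data_mean = mle_ebm()
```

Learning stops when the synthesized examples match the observed ones in the statistics defined by f, which for this tilted Gaussian means theta equals the data mean.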



(Langevin dynamics for sampling ConvNet-EBM)
Y Lu, SC Zhu, and YN Wu (2016) Learning FRAME models using CNN filters. AAAI-16: 30th AAAI Conference on Artificial Intelligence. pdf project page
Comment: The modern ConvNet-parametrized energy-based model is a multi-layer generalization of the FRAME (Filters, Random field, And Maximum Entropy) model, Neural Computation 1997 by Zhu, Wu, and Mumford. This paper generates realistic images by Langevin sampling of the modern ConvNet-EBM. The Langevin dynamics was interpreted as Gibbs Reaction And Diffusion Equations (GRADE) in PAMI 1998 by Zhu and Mumford.


J Dai, Y Lu, and YN Wu (2015) Generative modeling of convolutional neural networks. International Conference on Learning Representations (ICLR). pdf project page
Comment: The paper formulates the modern ConvNet-parametrized energy-based model as exponential tilting of a reference distribution, and connects it to the discriminative ConvNet classifier.
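The tilting-to-classifier connection can be checked in closed form. If each class density is p_k(x) = exp(f_k(x)) q(x) / Z_k, then Bayes' rule gives p(k | x) = softmax_k(f_k(x) - log Z_k + log rho_k), a discriminative classifier on the scores f_k. This sketch assumes hypothetical linear scores f_k(x) = w_k x and a standard Gaussian reference q instead of ConvNet scores on images, so that Z_k = exp(w_k^2 / 2) and p_k = N(w_k, 1).

```python
import math

def gaussian_pdf(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)

ws = [-1.0, 0.5, 2.0]       # hypothetical tilt weights, one per class
priors = [0.2, 0.5, 0.3]    # hypothetical class priors rho_k

def log_Z(w):
    """log E_q[exp(w x)] for q = N(0, 1), which equals w^2 / 2."""
    return 0.5 * w * w

def numeric_log_Z(w, lo=-12.0, hi=12.0, n=24000):
    """Trapezoid-rule check of the normalizing constant."""
    h = (hi - lo) / n
    vals = [math.exp(w * (lo + i * h)) * gaussian_pdf(lo + i * h, 0.0) for i in range(n + 1)]
    return math.log(h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))

def classifier(x):
    """Bayes posterior over classes as a softmax of tilt scores."""
    scores = [w * x - log_Z(w) + math.log(rho) for w, rho in zip(ws, priors)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def bayes_direct(x):
    """The same posterior computed from the explicit class densities N(w_k, 1)."""
    joint = [rho * gaussian_pdf(x, w) for w, rho in zip(ws, priors)]
    total = sum(joint)
    return [j / total for j in joint]
```

Running the tilts backwards gives the converse reading: a softmax classifier's scores define, up to normalizers, a family of class-conditional EBMs on a shared reference distribution.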


Below are selected earlier papers

J Xie, Y Lu, SC Zhu, and YN Wu (2016) Inducing wavelets into random fields via generative boosting. Applied and Computational Harmonic Analysis, 41, 4-25. pdf project page

J Xie, W Hu, SC Zhu, and YN Wu (2014) Learning sparse FRAME models for natural image patterns. International Journal of Computer Vision. pdf project page

J Dai, Y Hong, W Hu, SC Zhu, and YN Wu (2014) Unsupervised learning of dictionaries of hierarchical compositional models. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pdf project page

J Dai, YN Wu, J Zhou, and SC Zhu (2013) Cosegmentation and cosketch by unsupervised learning. Proceedings of International Conference on Computer Vision (ICCV). pdf project page

Y Hong, Z Si, WZ Hu, SC Zhu, and YN Wu (2013) Unsupervised learning of compositional sparse code for natural image representation. Quarterly of Applied Mathematics. pdf project page

YN Wu, Z Si, H Gong, SC Zhu (2010) Learning active basis model for object detection and recognition. International Journal of Computer Vision, 90, 198-235. pdf project page

Z Si, H Gong, SC Zhu, YN Wu (2010) Learning active basis models by EM-type algorithms. Statistical Science, 25, 458-475. pdf project page

YN Wu, C Guo, SC Zhu (2008) From information scaling of natural images to regimes of statistical models. Quarterly of Applied Mathematics, 66, 81-122. pdf

YN Wu, Z Si, C Fleming, and SC Zhu (2007) Deformable template as active basis. Proceedings of International Conference on Computer Vision. pdf project page

M Zheng, LO Barrera, B Ren, YN Wu (2007) ChIP-chip: data, model and analysis. Biometrics, 63, 787-796. pdf

C Guo, SC Zhu, and YN Wu (2007) Primal sketch: integrating structure and texture. Computer Vision and Image Understanding, 106, 5-19. pdf project page

C Guo, SC Zhu, and YN Wu (2003) Towards a mathematical theory of primal sketch and sketchability. Proceedings of International Conference on Computer Vision. 1228-1235. pdf project page

G Doretto, A Chiuso, YN Wu, S Soatto (2003) Dynamic textures. International Journal of Computer Vision, 51, 91-109. pdf (source code given in paper) project page

C Guo, SC Zhu, and YN Wu (2003) Modeling visual patterns by integrating descriptive and generative models. International Journal of Computer Vision, 53(1), 5-29. pdf

YN Wu, SC Zhu, X Liu (2000) Equivalence of Julesz ensembles and FRAME models. International Journal of Computer Vision, 38, 247-265. pdf project page

JS Liu, YN Wu (1999) Parameter expansion for data augmentation. Journal of the American Statistical Association, 94, 1264-1274. pdf

C Liu, DB Rubin, YN Wu (1998) Parameter expansion to accelerate EM -- the PX-EM algorithm. Biometrika, 85, 755-770. pdf

SC Zhu, YN Wu, DB Mumford (1998) Filters, Random fields, And Maximum Entropy (FRAME): towards a unified theory for texture modeling. International Journal of Computer Vision, 27, 107-126. pdf

SC Zhu, YN Wu, DB Mumford (1997) Minimax entropy principle and its application to texture modeling. Neural Computation, 9, 1627-1660. pdf

YN Wu (1995) Random shuffling: a new approach to matching problem. Proceedings of American Statistical Association, 69-74. Longer version pdf