(1) Representation learning and modeling in latent space, in the form of algebraic structures (Gao et al. 2020) and probability models (NeurIPS 2020 by Pang et al.), with applications in computational neuroscience.

(2) Maximum likelihood learning of deep generative models, including top-down directed graphical models and undirected energy-based models (which we used to call Gibbs models, random field models, or descriptive models before the term energy-based models became popular), as well as their integrations (NeurIPS 2020 by Pang et al.), powered by short-run MCMC for inference (ECCV 2020 by Nijkamp et al.) and synthesis (NeurIPS 2019 by Nijkamp et al.), which can be compared to attractor dynamics in neuroscience.

(3) Joint training and discriminative training of various models, e.g., the energy-based model, flow-based model, generator model, and inference model, without resorting to MCMC, which is instead amortized by learned computation.

* Maximum likelihood learning of the modern ConvNet-parametrized energy-based model, ICML 16 by Xie et al. It can be seen as a multi-layer generalization of the FRAME (Filters, Random field And Maximum Entropy) model, Neural Computation 1997 by Zhu, Wu and Mumford, where sampling is accomplished by Langevin dynamics, interpreted as Gibbs Reaction And Diffusion Equations (GRADE), PAMI 1998 by Zhu and Mumford.
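For concreteness, the Langevin dynamics underlying this line of work can be sketched in a few lines of NumPy. Below is a minimal toy illustration on a quadratic energy (the function names are ours, for illustration only); in the papers, the energy gradient comes from back-propagation through a ConvNet.

```python
import numpy as np

def langevin_sample(grad_energy, x0, step=0.1, n_steps=200, rng=None):
    """Langevin update: x <- x - (step^2 / 2) * grad U(x) + step * N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - 0.5 * step**2 * grad_energy(x) + step * rng.standard_normal(x.shape)
    return x

# Toy energy U(x) = ||x||^2 / 2, whose stationary distribution is N(0, I).
grad_U = lambda x: x
samples = np.stack([langevin_sample(grad_U, np.zeros(2), rng=np.random.default_rng(i))
                    for i in range(500)])
print(samples.mean(axis=0), samples.std())   # mean near [0, 0], std near 1
```

The same two-term update (gradient descent on the energy plus Gaussian noise) is what GRADE interprets as reaction and diffusion.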

* Scaling up maximum likelihood learning of the ConvNet-EBM to big datasets, CVPR 18 by Gao et al.

* Adversarial interpretation of maximum likelihood learning of the ConvNet-EBM, CVPR 17 by Xie et al. The EBM serves the roles of both the generator (actor) and the discriminator (critic); the MLE learning is self-critical.

* Formulating the modern ConvNet-parametrized EBM as exponential tilting of a reference distribution, and connecting it to the discriminative ConvNet classifier, ICLR 15 by Dai et al., ICML 16 by Xie et al. The EBM is a generative version of a discriminator.
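In symbols (our notation, not copied verbatim from the papers), the exponential-tilting formulation is:

```latex
p_\theta(x) \;=\; \frac{1}{Z(\theta)}\,\exp\!\big(f_\theta(x)\big)\,q(x),
\qquad
Z(\theta) \;=\; \int \exp\!\big(f_\theta(x)\big)\,q(x)\,dx,
```

where q(x) is the reference distribution (e.g., Gaussian white noise) and f_θ(x) is parametrized by a bottom-up ConvNet. f_θ plays the same role as the class logit of a discriminative ConvNet, which is the sense in which the EBM is a generative version of a discriminator.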

* Maximum likelihood learning of the generator network, including its dynamic version, using the alternating back-propagation algorithm, without resorting to an inference model, AAAI 17 by Han et al., AAAI 19 by Xie et al.
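The alternation can be sketched on a toy linear generator x = Wz + ε (all names and constants are ours, chosen only to make the sketch run): a Langevin step on the latent vectors (inferential back-propagation) alternates with a gradient step on the generator weights (learning back-propagation), with no separate inference network.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, sigma = 5, 2, 200, 0.5
W_true = rng.standard_normal((d, k))
X = rng.standard_normal((n, k)) @ W_true.T + sigma * rng.standard_normal((n, d))

W = 0.1 * rng.standard_normal((d, k))   # generator weights: x = W z + eps
Z = np.zeros((n, k))                    # one persistent latent vector per example
delta, lr = 0.2, 0.5

for it in range(150):
    # Inferential back-propagation: Langevin sampling of z ~ p(z | x, W).
    for _ in range(10):
        grad_z = (X - Z @ W.T) @ W / sigma**2 - Z   # grad of log p(x, z) w.r.t. z
        Z = Z + 0.5 * delta**2 * grad_z + delta * rng.standard_normal(Z.shape)
    # Learning back-propagation: gradient step on log-likelihood w.r.t. W.
    W = W + lr * (X - Z @ W.T).T @ Z / n

recon = np.mean((X - Z @ W.T) ** 2)
print(recon)   # well below the raw data variance np.mean(X**2)
```

In the papers the generator is a deep network and both gradients are computed by back-propagation; the structure of the alternation is the same.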

* Cooperative learning of the EBM and the generator network, where the EBM (teacher model) revises examples generated by the generator network (student model), and the generator network learns from the EBM revision, AAAI 18, PAMI 20 by Xie et al. The generator is a learned sampler of the EBM and partially amortizes its MCMC sampling. The learned computation can be considered temporal difference learning from internal data produced by MCMC.
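The teacher-student loop can be illustrated on a 1-D toy problem (a sketch under our own simplifying assumptions: a Gaussian-mean EBM and a linear generator, not the ConvNet models of the papers): the generator proposes samples, the EBM revises them by Langevin dynamics, the EBM is updated by contrasting data with revised samples, and the generator regresses the revised samples on the latent codes that produced them (MCMC teaching).

```python
import numpy as np

rng = np.random.default_rng(0)
data = 2.0 + 0.5 * rng.standard_normal(1000)   # toy 1-D "images"

theta = 0.0        # EBM with energy U(x) = (x - theta)^2 / 2 (learnable mean)
a, b = 0.0, 1.0    # generator x = a + b * z, z ~ N(0, 1)
delta, K = 0.3, 30

for it in range(300):
    z = rng.standard_normal(100)
    x = a + b * z                          # student proposes initial samples
    for _ in range(K):                     # teacher revises them by Langevin
        x = x - 0.5 * delta**2 * (x - theta) + delta * rng.standard_normal(100)
    # EBM update: contrast observed examples with revised synthesized examples.
    theta += 0.1 * (data.mean() - x.mean())
    # Generator update (MCMC teaching): regress revised samples on their z.
    err = x - (a + b * z)
    a += 0.1 * err.mean()
    b += 0.1 * (err * z).mean()

print(theta, a)   # both drift toward the data mean
```

In this toy case both the EBM mean and the generator bias converge to the data mean; the generator's proposals let the EBM's Langevin chains start warm, which is the amortization at work.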

* Divergence triangle that unifies variational learning and adversarial learning, CVPR 19, 20 by Han et al. The divergence triangle avoids MCMC sampling, or amortizes MCMC by learned networks. Various forms of the divergence triangle explain most generative learning algorithms.

* Joint training of a flow-based model and an EBM by noise contrastive estimation, CVPR 20 by Gao et al., to avoid MCMC and to facilitate semi-supervised learning.
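The core of noise contrastive estimation can be shown on a 1-D toy example (our construction, not the flow+EBM setup of the paper, where the flow plays the role of the noise distribution): an unnormalized model is fit by logistic regression that discriminates data from samples of a known noise density, with the log normalizing constant absorbed into a learnable intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
x_data = 1.0 + rng.standard_normal(2000)     # data ~ N(1, 1), unknown to the learner
x_noise = 2.0 * rng.standard_normal(2000)    # noise ~ N(0, 4), density known exactly

def log_noise(x):
    return -0.5 * x**2 / 4.0 - 0.5 * np.log(2.0 * np.pi * 4.0)

def feats(x):                                 # unnormalized log-model: theta . (1, x, x^2)
    return np.stack([np.ones_like(x), x, x**2], axis=1)

theta = np.zeros(3)                           # theta[0] absorbs -log Z, as usual in NCE
for it in range(3000):
    for x, y in ((x_data, 1.0), (x_noise, 0.0)):
        F = feats(x)
        logit = F @ theta - log_noise(x)      # log p_model(x) - log p_noise(x)
        p = 1.0 / (1.0 + np.exp(-logit))      # probability that x is real data
        theta += 0.05 * F.T @ (y - p) / len(x)   # logistic-regression gradient step

var_hat = -1.0 / (2.0 * theta[2])             # read off mean and variance from the
mean_hat = theta[1] * var_hat                 # learned quadratic log-density
print(mean_hat, var_hat)                      # close to the true (1, 1)
```

No MCMC is needed anywhere in this loop, which is the point of the joint-training scheme.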

* Learning generator models for motion, deformation, sparse coding, and neuroscience, AAAI 19, 20 by Xie et al., CVPR 19, 20 by Xing et al., IJCAI 19, Neural Computation 2020 by Han et al.

(Synthesized images)

(learned grid cells)

J Xie*, Z Zheng*, X Fang, SC Zhu, and YN Wu (2021) Learning cycle-consistent cooperative networks via alternating MCMC teaching for unsupervised cross-domain translation. The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI) 2021. pdf

(Left: latent space EBM stands on generator. Right: Short-run MCMC in latent space)

The latent space EBM stands on a top-down generation network. It is like a value network or cost function defined in latent space.

The scalar-valued energy function is an objective function, a cost function, an evaluator or a critic. It is about constraints, regularities, rules, perceptual organizations, and Gestalt laws. The energy-based model is descriptive rather than generative, which is why we used to call it the descriptive model. It only describes what it wants without bothering with how to get it. Compared to the generator model, the energy-based model is like setting up an equation, whereas the generator model is like giving the solution directly. It is much easier to set up the equation than to give the answer, i.e., it is easier to specify a scalar-valued energy function than a vector-valued generation function; the latter is like a policy network.

The energy-based model in latent space is simple and yet expressive, capturing rules or regularities implicitly but effectively. The latent space seems the right home for the energy-based model.

Short-run MCMC in latent space for prior and posterior sampling is efficient and mixes well. One can amortize MCMC with a learned network (see our recent work on semi-supervised learning), but in this initial paper we prefer to keep it pure and simple, without mixing in tricks from VAEs and GANs.
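Because the latent space is low-dimensional, posterior sampling reduces to a short Langevin chain. Below is a toy sketch (all densities and constants are ours for illustration): a latent EBM prior obtained by tilting N(0, 1), a linear "generator" x = 2z + ε, and a short-run chain sampling z given an observed x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent EBM prior: p(z) ∝ exp(f(z)) N(z; 0, 1), with a toy tilt f(z) = 1.5 z.
# Toy generator: x = 2 z + eps, eps ~ N(0, s2).  One observed x:
x_obs, s2 = 4.0, 0.25

def grad_log_post(z):
    # d/dz [ f(z) - z^2/2 - (x - 2z)^2 / (2 s2) ]: tilt + prior + likelihood terms
    return 1.5 - z + 2.0 * (x_obs - 2.0 * z) / s2

delta, K = 0.2, 50
z = rng.standard_normal(1000)            # initialize chains from the base prior N(0, 1)
for _ in range(K):                       # short-run Langevin in latent space
    z = z + 0.5 * delta**2 * grad_log_post(z) + delta * rng.standard_normal(1000)

print(z.mean())   # close to the exact posterior mean (1.5 + 2*x_obs/s2) / (1 + 4/s2)
```

In this conjugate toy case the exact posterior is Gaussian, so the short-run chain can be checked against it; with a deep generator and a ConvNet tilt, the same few-step chain is run with gradients from back-propagation.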

(Left: latent EBM captures chemical rules implicitly in latent space. Right: generated molecules)

(the symbolic one-hot y is coupled with dense vector z to form an associative memory, and z is the information bottleneck between x and y)

(VAE as alternating projection)

T Han, J Zhang and YN Wu (2020) From em-projections to variational auto-encoder. Deep Learning through Information Geometry Workshop at NeurIPS 2020. pdf

(mode traversing HMC chains)

(learned V1 cells)

Y Xu, J Xie, T Zhao, C Baker, Y Zhao, and YN Wu (2020) Energy-based continuous inverse optimal control. Machine Learning for Autonomous Driving Workshop at NeurIPS 2020. pdf

(The model generates both displacement field and appearance)

(neural-symbolic learning)

(generated by the trained model)

J Xie*, R Gao*, Z Zheng, SC Zhu, and YN Wu (2020) Motion-based generator model: unsupervised disentanglement of appearance, trackable and intrackable motions in dynamic patterns. AAAI-20: 34th AAAI Conference on Artificial Intelligence. pdf project page

E Nijkamp*, M Hill*, T Han, SC Zhu, and YN Wu (2020) On the anatomy of MCMC-based maximum likelihood learning of energy-based models. (* equal contribution). AAAI-20: 34th AAAI Conference on Artificial Intelligence. pdf project page

J Xie, R Gao, E Nijkamp, SC Zhu, and YN Wu (2020) Representation learning: a statistical perspective. Annual Review of Statistics and Its Application (ARSIA). pdf

(reconstruction by short-run MCMC, yes it can reconstruct observed images)

Z Zhang*, Z Pan*, Y Ying, Z Xie, S Adhikari, J Phillips, RP Carstens, DL Black, YN Wu, and Y Xing (2019) Deep-learning augmented RNA-seq analysis of transcript splicing. Nature Methods, 16:307-310. pdf

T Han, X Xing, J Wu, and YN Wu (2019) Replicating neuroscience observations on ML/MF and AM face patches by deep generative model. Neural Computation, pdf

(learned grid cells)

J Xie, SC Zhu, and YN Wu (2019) Learning energy-based spatial-temporal generative ConvNet for dynamic patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). pdf project page

(faces generated and interpolated by the learned model)

X Xing, T Han, R Gao, SC Zhu, and YN Wu (2019) Unsupervised disentanglement of appearance and geometry by deformable generator network. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pdf project page

(videos generated by the learned model)

YN Wu, R Gao, T Han, and SC Zhu (2019) A tale of three probabilistic families: discriminative, descriptive and generative models. Quarterly of Applied Mathematics. pdf

T Han, J Wu, and YN Wu (2018) Replicating active appearance model by generator network. International Joint Conference on Artificial Intelligence (IJCAI). pdf

J Xie*, Z Zheng*, R Gao, W Wang, SC Zhu, and YN Wu (2018) Learning descriptor networks for 3D shape synthesis and analysis. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pdf project page

(learning directly from occluded images. Row 1: original images, not available to model; Row 2: training images. Row 3: learning and reconstruction. )

(left: observed; right: synthesized.)

(Langevin dynamics for sampling ConvNet-EBM)

J Xie, W Hu, SC Zhu, and YN Wu (2014) Learning sparse FRAME models for natural image patterns. International Journal of Computer Vision. pdf project page

J Dai, Y Hong, W Hu, SC Zhu, and YN Wu (2014) Unsupervised learning of dictionaries of hierarchical compositional models. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pdf project page

J Dai, YN Wu, J Zhou, and SC Zhu (2013) Cosegmentation and cosketch by unsupervised learning. Proceedings of International Conference on Computer Vision (ICCV). pdf project page

Y Hong, Z Si, WZ Hu, SC Zhu, and YN Wu (2013) Unsupervised learning of compositional sparse code for natural image representation. Quarterly of Applied Mathematics. pdf project page

YN Wu, Z Si, H Gong, SC Zhu (2010) Learning active basis model for object detection and recognition. International Journal of Computer Vision, 90, 198-235. pdf project page

Z Si, H Gong, SC Zhu, YN Wu (2010) Learning active basis models by EM-type algorithms. Statistical Science, 25, 458-475. pdf project page

YN Wu, C Guo, SC Zhu (2008) From information scaling of natural images to regimes of statistical models. Quarterly of Applied Mathematics, 66, 81-122. pdf

YN Wu, Z Si, C Fleming, and SC Zhu (2007) Deformable template as active basis. Proceedings of International Conference of Computer Vision. pdf project page

M Zheng, LO Barrera, B Ren, YN Wu (2007) ChIP-chip: data, model and analysis. Biometrics, 63, 787-796. pdf

C Guo, SC Zhu, and YN Wu (2007) Primal sketch: integrating structure and texture. Computer Vision and Image Understanding, 106, 5-19. pdf project page

C Guo, SC Zhu, and YN Wu (2003) Towards a mathematical theory of primal sketch and sketchability. Proceedings of International Conference of Computer Vision. 1228-1235. pdf project page

G Doretto, A Chiuso, YN Wu, S Soatto (2003) Dynamic textures. International Journal of Computer Vision, 51, 91-109. pdf (source code given in paper) project page

C Guo, SC Zhu, and YN Wu (2003) Modeling visual patterns by integrating descriptive and generative models. International Journal of Computer Vision, 53(1), 5-29. pdf

YN Wu, SC Zhu, X Liu (2000) Equivalence of Julesz ensembles and FRAME models. International Journal of Computer Vision, 38, 247-265. pdf project page

JS Liu, YN Wu (1999) Parameter expansion for data augmentation. Journal of the American Statistical Association, 94, 1264-1274. pdf

C Liu, DB Rubin, YN Wu (1998) Parameter expansion to accelerate EM -- the PX-EM algorithm. Biometrika, 85, 755-770. pdf

SC Zhu, YN Wu, DB Mumford (1998) Minimax entropy principle and its application to texture modeling. Neural Computation, 9, 1627-1660. pdf

SC Zhu, YN Wu, DB Mumford (1997) Filter, Random field, And Maximum Entropy (FRAME): towards a unified theory for texture modeling. International Journal of Computer Vision, 27, 107-126. pdf

YN Wu (1995) Random shuffling: a new approach to matching problem. Proceedings of American Statistical Association, 69-74. Longer version pdf