Applied and Computational Harmonic Analysis
Inducing Wavelets into Random Fields via Generative Boosting
Jianwen Xie
Yang Lu
SongChun Zhu
Ying Nian Wu
University of California, Los Angeles (UCLA), USA
Paper  Latex  Talk
Abstract 
This paper proposes a learning algorithm for the random field models whose energy functions are in the form of linear combinations of rectified filter responses from subsets of wavelets selected from a given overcomplete dictionary. The algorithm consists of the following two components. (1) We propose to induce the wavelets into the random field model by a generative version of the epsilonboosting algorithm. (2) We propose to generate the synthesized images from the random field model using the Gibbs sampling on the coefficients (or responses) of the selected wavelets. We show that the proposed learning and sampling algorithms are capable of generating realistic image patterns. We also evaluate our learning method on a dataset of clustering tasks to demonstrate that the models can be learned in unsupervised setting. The learned models encode the patterns in wavelet sparse coding. Moreover, they can be mapped to the secondlayer nodes of a sparsely connected convolutional neural network (CNN).

1. Introduction
Learning Process as Painting Process. Figure 1 illustrates the learning process, which is similar to the way an artist paints a picture by sequentially fleshing out more and more details. (a) displays the training images. (b) displays the sequence of synthesized images generated by the learned model as more and more wavelets are induced into the random field by the generative boosting process. Each wavelet is like a stroke in the painting process. The wavelets are selected from a given dictionary of oriented and elongated Gabor wavelets at a dense collection of locations, orientations and scales. The dictionary also includes isotropic Difference of Gaussian (DoG) wavelets. (c) displays the sequence of sketch templates of the learned model where each selected Gabor wavelet is illustrated by a bar with the same location, orientation and length as the corresponding wavelet, and each selected DoG wavelet is illustrated by a circle (in each sketch template, wavelets of smaller scales appear darker.) (d) displays more synthesized images independently generated from the final model.
Analysis by synthesis. We synthesize images by sampling from the current model, and then select the next wavelet that reveals the most conspicuous difference between the synthesized images and the observed images. As more wavelets are added into the model, the synthesized images become more similar to the observed images. In other words, the generative model is endowed with the ability of imagination, which enables it to adjust its parameters so that its imaginations match the observations in terms of statistical properties that the model is concerned with.


Fig 1. Learning process by generative boosting. (a) observed training images (b) a sequence of synthesized images generated by the learned model as more and more wavelets are induced into the model. (c) a sequence of sketch templates that illustrate the wavelets selected from an overcomplete dictionary. (d) more synthesized images. 

2. Past Work
Sparse FRAME and Twostage learning. The random field model based on wavelets is called the sparse FRAME model in our previous work [1]. The model is an inhomogeneous and sparsified generalization of the original FRAME (Filters, Random field, And Maximum Entropy) model [2]. In [1], we developed a twostage learning algorithm for the sparse FRAME model. The algorithm consists of the following two stages. Stage 1: selecting a subset of wavelets from a given dictionary by a shared sparse coding algorithm, such as shared matching pursuit. Stage 2: estimating the parameters of the model by maximum likelihood via stochastic gradient ascent where the MCMC sampling is powered by Hamiltonian Monte Carlo.
Shared Matching Pursuit. Figure 2 illustrates the shared matching pursuit algorithm. (a) displays the training images. (b) displays the sequence of sketch templates that illustrates the selected wavelets. (c) displays the sequences of reconstructed images for 2 of the 5 training images as more and more wavelets are selected by the shared matching pursuit. One conceptual concern with the twostage learning algorithm is that the selection of the wavelets in the first stage is not guided by the loglikelihood, but by a least squares criterion that is only an approximation to the loglikelihood. In other words, the first stage seeks to reconstruct the training images by selecting wavelets, without any regard for the patterns formed by the coefficients of the selected wavelets. 

Fig 2. Shared matching pursuit for the purpose of wavelet selection. (a) training images. (b) sequence of sketch templates that illustrate the wavelets selected sequentially in order to reconstruct all the training images simultaneously. (c) sequence of reconstructed images by the selected wavelets for all the 1st and 3rd training images in (a). 

3. Random Field Models
3.1 Dense random field
Let I be an image defined on the rectangular domain D. Let B_{x,s,α} denote a basis function such as Gabor wavelet (or difference of Gaussian (DoG) filter) centered at pixel x (a twodimensional vector) and tuned to scale s and orientation α. Given a dictionary of wavelets or filer bank {B_{x,s,α}, ∀x,s,α} (where the scale s and the orientation α are properly discretized), the dense version of the random field model is of the following form
which is a Markov random field model because B_{x,s,α} are localized. The model is also a Gibbs distribution, an exponential family model, a loglinear model, or an energybased model, as it is called in different contexts. In model (1), <I, B_{x,s,α}> is the inner product between I and B_{x,s,α}, or the filter response of I to the wavelet B_{x,s,α}. h() is a nonlinear rectification function, which can be h(r) = r, h(r) = max(0, r), or h(r) = max(0, rt) where t is a threshold. λ={λ_{x,s,α}, ∀x,s,α} are the weight parameters, Z(λ) is the normalizing constant, and q(I) is a white noise reference distribution whose pixel values follow independent N(0, σ^{2}) distributions,
where I is the L2 norm of I, and D denotes the number of pixels in the image domain D. In this paper, we fix σ^{2}=1. We normalize the training images to have marginal mean 0 and a fixed marginal variance (e.g., 10). We use h(r) = r, if λ_{x,s,α} > 0, then the model encourages large magnitude of the filter response <I, B_{x,s,α}>, regardless of its sign. This is often appropriate if we want to model object shape patterns.
3.2 Sparse random field
The dense model can be sparsified so that only a small number of are nonzero, i.e., only a small subset of wavelets is selected from the given dictionary. This leads to the following sparse model
where B=(B_{xi,si,αi},i=1,...,n) are the n wavelets selected from the given dictionary, and λ=(λ_{i}, i=1,...,n) are the corresponding weight parameters.
4. Generative Boosting for Inducing Wavelets into Random Field
4.1 Generative boosting
The literature on the application of boosting in learning generative models is relatively scarce. The seminal paper [3] proposed a pursuit method for inducing features into random fields. Our work can be considered a highly specialized example with a very different and specialized computational core. We propose to induce wavelets into random field by a generative version of the epsilonboosting algorithm [4]. For notational simplicity, we use the form of dense model in equation (1), but we assume λ={λ_{x,s,α}, ∀x,s,α} is a sparse vector, i.e., only a small number of λ_{x,s,α} are nonzero. In fitting the model in (3), the algorithm starts from λ=0. Then the generative boosting algorithm updates the weight parameters λ based on the Monte Carlo estimation of the gradient. However, instead of adjusting all the weight parameters according to the gradient, the generative boosting only chooses to update the particular weight parameter that corresponds to the maximal coordinate of the gradient. Specifically, at the tth step, we select
and update the corresponding weight parameter
where (Δx , Δα) are the perturbations of the location and orientation of the basis function B_{x,s,α}. They are introduced to account for shape deformation. {_{m}, m = 1,...,} are synthesized images drawn from p(I; λ^{(t)} ). The local max pooling is only applied to the observed images to filter out shape deformations, and we assume p(I; λ^{(t)} ) models the images prior to shape deformations. is the step size, assumed to be a sufficiently small value. Figures 3 and Figure 4 illustrate the learning process by displaying the synthesized images generated by the learned model as more and more wavelets are included.
Fig 3. Learning sequence by generative epsilonboosting (Flamingo). First row: training images. Second row: synthesized images are generated when the number of selected wavelets = 1, 40, 60, 80, 100, 200, 300, 400, 500, 600, and 700.
Fig 4. Learning sequence by generative epsilonboosting (bike). First row: training images. Second row: synthesized images are generated when the number of selected wavelets = 1, 40, 60, 80, 100, 200, 300, 500, 600, and 700.
4.2 L1 solution path
The epsilonboosting has as interesting relationship with L1 regularization in the Lasso and the basis pursuit. As pointed out by [5], under a monotonicity condition, such an algorithm approximately traces the solution path of the L1 regularized minimization problem:
where the regulariztion parameter starts from a big value so that all the components of are zero, and gradually lowers itself to allow more components to be nonzero and to be induced the model.
4.3 Correspondence to CNN
The filer bank {B_{x,s,α}, ∀x,s,α} can be mapped to the first layer of Gaborlike filters of a CNN. The max_{∆x,∆α} operations correspond to the max pooling layer. Each sparse random field model (3) can be mapped to a node in the second layer of a CNN, where each second layer node is sparsely and selectively connected to the nodes of the first layer, with λ_{i} being the connection weights. The generative boosting algorithm is used to sparsely wire the connections. − log Z(λ) can be mapped to the bias term of the secondlayer node.
5. Gibbs Sampler on Wavelets Coefficients
5.1 Wavelet sparse coding
In order to power the generative boosting algorithm, we need to sample from the currently learned model. We develop a Gibbs sampling algorithm that generates the synthesized images by exploiting the generative form of the model. The sparse model in equation (3) can be translated into a generative model where the selected wavelets serve as linear basis functions. Given the selected basis functionsB=(B_{i}=B_{xi,si,αi},i=1,...,n), we can represent the image I by
where c_{i} are the least squares reconstruction coefficients, and Ɛ is the residual image after we project I onto the subspace spanned by B. We can write I, B_{i}, and C as column vectors, so that B=(B_{i}, i=1,...,n) is a matrix. It can be shown that under the model p(I; B, λ) in (3), the distribution of C is
where
is a multivariate Gaussian distribution, i.e., under q_{C}(C), C ~ N(0, σ^{2}(B^{T}B)^{1}). The notation  •  is the operation taking elementwise absolute values.
5.2 Moving along basis vectors
For Gibbs sampling of C~q_{C}(C; λ), an equivalent but simpler way of updating the ith component is by first sampling from
and let
where d_{i} is a scalar increment of c_{i}. The change from I to I+d_{i} B_{i} only changes c_{i} while the other coefficients remained fixed. So this scheme implements the same move as the update in the Gibbs sampler.
6. Synthesis by Generative Boosting
Figure 5 displays some images generated by the sparse random field models, which are learned by the generative epsilonboosting from roughly aligned images. The proposed learning algorithm is capable of modeling a wide variety of image patterns.

Fig 5. Synthesis by generative boosting. Images generated by the random field models learned from different categories of objects. Typical size of the images are 100x100. Typical number of training images for each category is around 5. Number of the selected wavelets is 700. 

References
[1] 
Xie, J., Hu, W., Zhu, S. C., & Wu, Y. N. Learning Sparse FRAME Models for Natural Image Patterns. International Journal of Computer Vision, 114(23), 91112, 2015 
[2] 
Zhu, S. C., Wu, Y. N., & Mumford, D. Minimax entropy principle and its application to texture modeling. Neural computation, 9(8), 16271660, 1997 
[3] 
Stephen Della Pietra, Vincent Della Pietra, and John Lafferty. Inducing features of random fields. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19(4):380–393, 1997 
[4] 
Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.

[5] 
Saharon Rosset, Ji Zhu, and Trevor Hastie. Boosting as a regularized path to a maximum margin classifier. The Journal of Machine Learning Research, 5:941–973, 2004. 
The work is supported by NSF DMS 1310391, ONR MURI N000141010933, DARPA MSEE FA86501117149.