Note: The experiments below were done at an early stage of this project. See THIS PAGE for more recent experiments where allowing rotation and flipping leads to cleaner templates.


Experiment 5

Learning from non-aligned images

In experiment 5, we learn a template from training images that are not aligned.

Negative experience in Experiments 5a-c. When there are cluttered edges in the background, the detection step may fail to locate the objects. When the objects have large deformations or pose changes, the learned template may not be clean, and may fail to sketch the objects in the training images correctly. In Experiments 5b and 5c, if the objects do not occupy significant portions of the training images, our method may fail to establish correct alignment.

Negative experience in Experiment 5d. When the part-template is small relative to the whole objects, the method often fails to establish correct correspondence among the images. The above difficulty suggests that we should add constraints for more reliable learning of the parts. If the bounding boxes are given as in Experiment 1, we can restrict the ranges of movements of parts in the training images, so that in the detection step, we do not need to search over the whole images. If the bounding boxes are not given as in Experiment 5a, we can simultaneously learn multiple parts while restricting their relative positions, and this is very much like a recursion of Experiment 5a or 5c.

Experiment 5.a

In experiment 5.a, we assume that the bounding box of the object in the first image is given. This requirement is eliminated in Experiment 5.c. The subsampling rate is 1 pixel. The allowed activity in location is 3 pixels, except in the horse example, where the subsampling rate is 2 pixels, and the allowed activity in location is 6 pixels.
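The allowed activity in location can be read as a local maximum over small shifts. The following is a minimal 1D sketch of that idea, not the project code: the function name `local_max` and the toy response array are hypothetical, and the real code works on 2D filter response maps.

```python
def local_max(responses, activity):
    """For each position x, return (best response, shift) found within
    +/- `activity` positions of x. Toy 1D stand-in for a response map."""
    n = len(responses)
    pooled = []
    for x in range(n):
        lo = max(0, x - activity)          # clip the search range at the borders
        hi = min(n - 1, x + activity)
        best = max(range(lo, hi + 1), key=lambda i: responses[i])
        pooled.append((responses[best], best - x))
    return pooled


# Hypothetical responses with a single strong edge at position 4.
toy_responses = [0, 1, 0, 0, 5, 0, 0, 0]
pooled = local_max(toy_responses, activity=3)
```

With an activity of 3, every position within 3 pixels of the strong response picks it up, together with the shift needed to reach it.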

(1) data, codes, and readme for learning from non-aligned images (March 2009)
(1.1) Code with within-window normalization (May 2009)
(1.2) Code that pools q() from 2 large natural images (June 2009)


Experiment 5.a.1. The bounding box of the first image is given. The size of the bounding box is 136*140. The number of elements in the active basis is 60. eps 1 2 3 4 5

(1.3) Code with local normalization (August 2009)


Experiment 5.a.1. Scale = 1, Half window size = 20, Number of elements = 30. eps 1 2 3 4 5

(1.4) Code with multi-scale Gabors


Lengths of Gabor wavelets: 17, 25, 33, 39. Number of elements at the lowest scale is 60. The numbers of elements are inversely proportional to the scales. eps1 2 3 4
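As a rough illustration of the inverse proportionality, the sketch below computes element counts per scale. It assumes that the scale is proportional to the wavelet length and that counts are rounded to the nearest integer; both are assumptions, and the counts used by the actual code may differ.

```python
def elements_per_scale(lengths, base_count):
    """Element counts inversely proportional to wavelet length.

    Assumes scale is proportional to length and that counts are rounded
    to the nearest integer (neither is confirmed by the project code)."""
    shortest = lengths[0]
    return [round(base_count * shortest / length) for length in lengths]


counts = elements_per_scale([17, 25, 33, 39], base_count=60)
# counts == [60, 41, 31, 26] under the assumptions above
```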

(2) another example (March 2009)
(2.1) Code with within-window normalization (May 2009)
(2.2) Code that pools q() from 2 large natural images (June 2009)


Experiment 5.a.2. The bounding box is 117*117. Number of elements is 60. eps 1 2 3 4 5

(2.3) Code with local normalization (August 2009)


Experiment 5.a.2. Scale = 1, Half window size = 20, Number of elements = 30. eps 1 2 3 4 5

(3) another example (March 2009)
(3.1) Code with within-window normalization (May 2009)
(3.2) Code that pools q() from 2 large natural images (June 2009)


Experiment 5.a.3. The bounding box is 103*129. Number of elements is 30. eps 1 2 3 4 5

(4) another example (March 2009)
(4.1) Code with within-window normalization (May 2009)
(4.2) Code that pools q() from 2 large natural images (June 2009)


Experiment 5.a.4. The bounding box is 129*178. Number of elements is 50. eps 1 2 3 4 5

(4.3) Code with multi-scale Gabors


Lengths of Gabor wavelets: 17, 25, 33, 39. Number of elements at the lowest scale is 50. The numbers of elements are inversely proportional to the scales. eps1 2 3 4

(5) another example (March 2009)
(5.1) Code with within-window normalization (May 2009)
(5.2) Code that pools q() from 2 large natural images (June 2009)


Experiment 5.a.5. The bounding box is 103*158. Number of elements is 60. eps 1 2 3 4 5

(6) another example (March 2009)
(6.1) Code with within-window normalization (May 2009)
(6.2) Code that pools q() from 2 large natural images (June 2009)


Experiment 5.a.6. The bounding box is 143*149. Number of elements is 50. eps 1 2 3 4 5

(6.3) Code with local normalization (August 2009)


Experiment 5.a.1. Scale = 1, Half window size = 20, Number of elements = 40. eps 1 2 3 4 5

Experiment 5.b

In Experiment 5.b, we learn a template from the first image (on the left) with no activity and without any given bounding box. Then we restore the activity and deform the learned template to sketch the second image (on the right). We scan the template over 7 resolutions of the second image, from 0.7 to 1.3 times the size of the original image. This aligns the first image to the second image. In this experiment, we use active correlation for learning and recognition.
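The scan over resolutions can be sketched as follows. This is an illustration only: it assumes the 7 resolutions are evenly spaced between 0.7 and 1.3, and `score_at` is a hypothetical callable standing in for resizing the second image and computing the best template matching score at that resolution.

```python
def resolution_factors(lowest, highest, count):
    """Evenly spaced resize factors (even spacing is an assumption)."""
    step = (highest - lowest) / (count - 1)
    return [round(lowest + i * step, 3) for i in range(count)]


def best_resolution(score_at, factors):
    """Return (factor, score) for the resolution with the highest score.

    `score_at` is a hypothetical placeholder for 'resize the image by this
    factor, scan the template, and return the maximum matching score'."""
    best_score, best_factor = max((score_at(f), f) for f in factors)
    return best_factor, best_score


factors = resolution_factors(0.7, 1.3, 7)
# Toy score that peaks at 1.1 times the original size.
factor, score = best_resolution(lambda f: -abs(f - 1.1), factors)
```

The same helper with `resolution_factors(0.3, 1.7, 15)` would cover a wider search over 15 resolutions, again assuming even spacing.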

(1) data, codes, and readme for image alignment

Experiment 5.b.1. The number of elements is 50. eps1 eps2


Experiment 5.b.2. The number of elements is 50. eps1 eps2

(2) another example

Experiment 5.b.3. The number of elements is 50. eps1 eps2


Experiment 5.b.4. The number of elements is 100. eps1 eps2

(3) another example

Experiment 5.b.5. The number of elements is 80. eps1 eps2

Experiment 5.c

In experiment 5.c, we do not assume that the bounding box of the object in the first image is given. We simply start from the template learned from the whole image of the first example.
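The alternation between detecting the object with the current template and re-learning the template from the aligned examples can be illustrated with a toy 1D analogue. The sketch below is not the active basis algorithm: it uses circular shifts of short lists, plain correlation as the matching score, and averaging as the learning step, and all names in it are hypothetical.

```python
def best_shift(template, signal):
    """Detection step: circular shift of `signal` that best matches `template`."""
    n = len(signal)

    def score(shift):
        return sum(template[i] * signal[(i + shift) % n] for i in range(n))

    return max(range(n), key=score)


def learn_template(signals, iterations=3):
    """Start from the whole first example, then alternate detection and re-learning."""
    template = list(signals[0])
    n = len(template)
    shifts = [0] * len(signals)
    for _ in range(iterations):
        # Detection: align every example to the current template.
        shifts = [best_shift(template, s) for s in signals]
        # Re-learning: average the aligned examples.
        template = [
            sum(s[(i + d) % n] for s, d in zip(signals, shifts)) / len(signals)
            for i in range(n)
        ]
    return template, shifts


def shifted(pattern, d):
    """Circularly shift a toy pattern by d positions."""
    n = len(pattern)
    return [pattern[(j - d) % n] for j in range(n)]


pattern = [0, 0, 1, 3, 1, 0, 0, 0]
examples = [shifted(pattern, d) for d in (0, 2, 5)]
template, shifts = learn_template(examples)
```

In this toy setting the recovered shifts match the shifts used to generate the examples, and the learned template equals the original pattern.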

data, codes, and readme for learning from non-aligned images (March 2009)

Experiment 5.c.1. The first image is the starting template. The second image is the learned template. The number of elements in the active basis is 40. Scale of Gabors is 1. eps0 eps 1 2 3 4 5
Code with window normalization with q() pooled from 2 large natural images (June 2009)


Experiment 5.c.2. The first image is the starting template. The second image is the learned template. The number of elements in the active basis is 50. Allowed displacement in location is up to 4 pixels. eps0 eps 1 2 3 4 5
Code with window normalization with q() pooled from 2 large natural images (June 2009)


Experiment 5.c.3. The first image is the starting template. The second image is the learned template. The number of elements in the active basis is 30. The allowed activity in location is 3. The sub-sampling rate is 1. eps0 eps 1 2 3 4 5
Code with window normalization with q() pooled from 2 large natural images (June 2009)


Experiment 5.c.4. The first image is the starting template. The second image is the learned template. The number of elements in the active basis is 50. eps0 eps 1 2 3 4 5
Code with window normalization with q() pooled from 2 large natural images (June 2009)


Experiment 5.c.5. The first image is the starting template. The second image is the learned template. The number of elements in the active basis is 50. eps0 eps 1 2 3 4 5
Code with window normalization with q() pooled from 2 large natural images (June 2009)


Experiment 5.c.6. The first image is the starting template. The second image is the learned template. The number of elements in the active basis is 60. eps0 eps 1 2 3 4 5
Code with window normalization with q() pooled from 2 large natural images (June 2009)

Code with local normalization (August 2009)


Experiment 5.c.7. Scale = 1, Half window size = 20, Number of elements = 50. eps 1 2 3 4 5 6 7

another example

Experiment 5.c.8. The first plot is the starting template, learned from the first image with no activity and with no given bounding box. The remaining plots display the learned templates in the first 10 iterations. The images are rescaled to 0.5 times the original sizes. The number of elements in the active basis is 60. eps0 1 2 3 4 5 6 7 8 9 10


Experiment 5.c.8. We use the first 20 images of the Weizmann horse images. The above plots display the superposed deformed templates in the last iteration. The Weizmann horse images have large deformations, so it is more reasonable to learn part templates first and then compose them into a recursive active basis. We do not claim that a single-layer active basis can model articulated objects. eps1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Experiment 5.c.8. Testing results on the remaining 308 images.


another example

Experiment 5.c.8. We double the scale of the Gabor filters. The number of elements is 20. Learning takes 3 minutes. eps0 1 2 3 4 5 6 7 8 9 10


eps1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Code with local normalization

Experiment 5.c.8. Scale = 1. Half window size for local normalization = 20. The first plot is the starting template, learned from the first image with no activity and with no given bounding box. The remaining plots display the learned templates in the first 10 iterations. The images are rescaled to 0.5 times the original sizes. The number of elements in the active basis is 40. The detection step searches over 15 resolutions: 0.3 to 1.7 times the input images. eps0 1 2 3 4 5 6 7 8 9 10


eps1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Code with local normalization at a smaller scale

another example

Experiment 5.c.9. The number of elements is 50. eps0 1 2 3 4 5


eps1 2 3 4 5 6 7 8 9 10 11 12 13

Code with local normalization

Experiment 5.c.9. Scale = 1. Half window size for local normalization = 20. Number of elements is 30. 10


eps1 2 3 4 5 6 7 8 9 10 11 12 13

another example

The number of elements is 60. eps0 1 2 3 4 5


eps1 2 3 4 5 6 7 8 9 10 11

Code with local normalization

Scale = 1. Half window size for local normalization = 20. 10


eps1 2 3 4 5 6 7 8 9 10 11

another example

The number of elements is 50. eps0 1 2 3


eps1 2 3 4 5 6 7

Code with local normalization

Scale = 1. Half window size for local normalization = 20. eps0 10


eps1 2 3 4 5 6 7

Code with local normalization

Scale = 1. Half window size for local normalization = 20. 5


eps1 2 3 4 5 6 7 8 9

Code with local normalization

Scale = 1. Half window size for local normalization = 20. 5


eps1 2 3 4 5 6 7 8 9 10 11 12 13 14

another example (March 2009)
Code with within-window normalization (May 2009)
Code that pools q() from 2 large natural images (June 2009)


The number of elements is 60. 5


eps1 2 3 4 5 6 7 8 9

Code with multi-scale Gabors


Lengths of Gabor wavelets: 17, 25, 33. Number of elements at the lowest scale is 60. The numbers of elements are inversely proportional to the scales. eps1 2 3

Code with local normalization

Scale = 1. Half window size for local normalization = 20. 10


eps1 2 3 4 5 6 7 8 9

Lion

Scale = 1. Half window size for local normalization = 20. 5


eps1 2 3 4 5 6 7 8 9 10 11 12 13

Tiger

The number of elements is 50. Scale = 1. Local normalization within 41x41 window. 5


1 2 3 4 5 6 7
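A half window size of 20 corresponds to a 41x41 window (2*20 + 1 = 41), as quoted for the tiger example. The sketch below shows one plausible form of local normalization, dividing each response by the mean response in the surrounding window. The exact normalization in the project code may differ (for example, in how orientations are pooled), and the function names and the small `eps` guard are assumptions.

```python
def window_size(half_window):
    """Side length of the square window for a given half window size."""
    return 2 * half_window + 1


def local_normalize(energy, half_window, eps=1e-12):
    """Divide each response by the mean response in its local window.

    `energy` is a 2D list of non-negative filter responses. Windows are
    clipped at the image borders; `eps` avoids division by zero."""
    rows, cols = len(energy), len(energy[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - half_window), min(rows, r + half_window + 1)
            c0, c1 = max(0, c - half_window), min(cols, c + half_window + 1)
            window = [energy[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            local_mean = sum(window) / len(window)
            out[r][c] = energy[r][c] / (local_mean + eps)
    return out


# A constant toy response map normalizes to (approximately) 1 everywhere.
flat = [[4.0] * 6 for _ in range(5)]
normalized = local_normalize(flat, half_window=2)
```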

Experiment 5.d

In Experiment 5.d, we learn visual words using essentially the same code as in Experiment 5.a. We start from a large number of bounding boxes cropped from the training images and learn the templates. We then select the K templates with the highest alignment scores. We did not perform spatial inhibition. The size of the bounding box is 100*100. The number of elements is 40. The allowed activity in location is up to 3 pixels. The allowed activity in orientation is up to pi/15. We learn the visual words at two resolutions, the second being double the first.
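The selection of the K templates with the highest alignment scores can be sketched as below. The dictionary layout, the field names `name` and `score`, and the toy scores are hypothetical; how the alignment score itself is computed is not shown here.

```python
def select_top_k(templates, k):
    """Return the k templates with the highest alignment scores,
    ordered from best to worst. No spatial inhibition is applied."""
    return sorted(templates, key=lambda t: t["score"], reverse=True)[:k]


# Hypothetical learned templates with made-up alignment scores.
learned = [
    {"name": "box_a", "score": 12.5},
    {"name": "box_b", "score": 30.1},
    {"name": "box_c", "score": 7.0},
    {"name": "box_d", "score": 21.4},
]
visual_words = select_top_k(learned, k=2)
```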

data, codes, and readme for learning visual words







double the resolution





Back to active basis homepage