Note: The experiments below were done at an early stage of this project.
See THIS PAGE
for more recent experiments, where allowing rotation and flipping leads to cleaner templates.
Experiment 5
Learning from non-aligned images
In Experiment 5, we learn a template from training images that are not aligned.
Negative experience in Experiments 5a-c. When there are cluttered edges in the
background, the detection step may fail to locate the objects. When the objects
have large deformations or pose changes, the learned template may not be clean,
and may fail to sketch the objects in the training images correctly. In Experiments
5b and 5c, if the objects do not occupy significant portions of the training images,
our method may fail to establish correct alignment.
Negative experience in Experiment 5d. When the part template is small relative to
the whole object, the method often fails to establish correct correspondence among
the images. This difficulty suggests that we should add constraints for more
reliable learning of the parts. If the bounding boxes are given, as in Experiment 1,
we can restrict the range of movement of each part in the training images, so that
the detection step does not need to search over the whole images. If the bounding
boxes are not given, as in Experiment 5a, we can learn multiple parts simultaneously
while restricting their relative positions; this is very much like a recursion
of Experiment 5a or 5c.
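To make the first suggestion concrete, here is a minimal sketch of restricting the detection search to a window around an expected part position instead of the whole image. The score map and the helper `best_location` are hypothetical; in the actual method the score would come from matching the part template, which is not reproduced here.

```python
import numpy as np


def best_location(score_map, center=None, radius=None):
    """Return (row, col) of the highest score in `score_map`.

    If `center` and `radius` are given, only a square window of half-width
    `radius` around `center` is searched (a restricted range of movement);
    otherwise the whole map is searched.
    """
    rows, cols = score_map.shape
    if center is None or radius is None:
        r0, r1, c0, c1 = 0, rows, 0, cols
    else:
        r, c = center
        r0, r1 = max(r - radius, 0), min(r + radius + 1, rows)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, cols)
    window = score_map[r0:r1, c0:c1]
    i, j = np.unravel_index(np.argmax(window), window.shape)
    return int(r0 + i), int(c0 + j)


# Toy score map: strong clutter far away, weaker true part near (25, 25).
scores = np.zeros((50, 50))
scores[5, 45] = 10.0
scores[24, 26] = 3.0
print(best_location(scores))                              # (5, 45): clutter wins
print(best_location(scores, center=(25, 25), radius=5))   # (24, 26)
```

The toy map shows why the constraint helps with cluttered backgrounds: an unrestricted search is drawn to the strongest response anywhere, while the restricted search stays near where the part is expected.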
Experiment 5.a
In Experiment 5.a, we assume that the bounding box of the object in the first image is
given. This requirement is eliminated in Experiment 5.c. The subsampling rate is 1 pixel,
and the allowed activity in location is 3 pixels, except in the horse example, where the
subsampling rate is 2 pixels and the allowed activity in location is 6 pixels.
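Learning from non-aligned images alternates between detecting the object in every training image with the current template and re-learning the template from the detected windows. The toy sketch below illustrates only that alternation. It swaps in normalized cross-correlation on raw pixels and a pixel-wise average as stand-ins for the active basis score and learning step, and it ignores activity and multiple resolutions; `ncc`, `detect`, and `learn` are hypothetical names.

```python
import numpy as np


def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def detect(image, template):
    """Exhaustive search for the window that best matches `template`."""
    th, tw = template.shape
    best, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            s = ncc(image[r:r + th, c:c + tw], template)
            if s > best:
                best, best_pos = s, (r, c)
    return best_pos


def learn(images, first_box, n_iter=3):
    """first_box = (row, col, height, width) of the object in images[0]."""
    r, c, h, w = first_box
    template = images[0][r:r + h, c:c + w].astype(float)
    for _ in range(n_iter):
        # Detection step: locate the object in every image.
        positions = [detect(im, template) for im in images]
        # Learning step: re-estimate the template from the detected windows.
        crops = [im[p[0]:p[0] + h, p[1]:p[1] + w] for im, p in zip(images, positions)]
        template = np.mean(crops, axis=0)
    return template, positions


# Synthetic data: one 8 x 8 pattern pasted at different places in noisy images.
rng = np.random.default_rng(0)
pattern = rng.normal(size=(8, 8))
true_pos = [(3, 4), (10, 20), (22, 7), (15, 15), (0, 24)]
images = []
for pr, pc in true_pos:
    im = 0.05 * rng.normal(size=(32, 32))
    im[pr:pr + 8, pc:pc + 8] += pattern
    images.append(im)

template, found = learn(images, first_box=(3, 4, 8, 8))
print(found)
```

On this clean synthetic data the detected positions recover where the pattern was pasted; the negative experience described above (clutter, large deformations) is exactly where such a loop can lock onto the wrong windows.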
(1)
data, codes, and readme for learning from non-aligned images (March 2009)
(1.1)
Code with within-window normalization (May 2009)
(1.2)
Code that pools q() from 2 large natural images (June 2009)
Experiment 5.a.1. The bounding box of the first image is
given. The size of the bounding box is 136*140. The number
of elements in the active basis is 60.
[Figures: eps, 1-5]
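Each template in these experiments is specified partly by its number of elements (60 in Experiment 5.a.1). As a rough illustration of what choosing a fixed number of elements involves, the sketch below greedily picks Gabor elements whose responses, summed over all training images, are largest, and suppresses nearby candidates after each pick so that selected elements do not overlap. This is a simplified stand-in, not the project's code: the input array layout, the sum-of-responses score, and the square inhibition rule are all assumptions, and `greedy_select` is a hypothetical name.

```python
import numpy as np


def greedy_select(responses, n_elements, inhibit_radius):
    """Greedily pick (orientation, row, col) elements for a template.

    responses: array of shape (n_images, n_orientations, H, W) holding
    non-negative, already locally max-pooled Gabor responses (assumed layout).
    """
    # Score of each candidate element = its response summed over all images.
    score = responses.sum(axis=0).astype(float)
    _, H, W = score.shape
    selected = []
    for _ in range(n_elements):
        o, r, c = np.unravel_index(np.argmax(score), score.shape)
        if not np.isfinite(score[o, r, c]):
            break  # every candidate has been inhibited
        selected.append((int(o), int(r), int(c)))
        # Inhibition (assumed rule): block all orientations in a square
        # neighbourhood so that selected elements do not overlap.
        r0, r1 = max(r - inhibit_radius, 0), min(r + inhibit_radius + 1, H)
        c0, c1 = max(c - inhibit_radius, 0), min(c + inhibit_radius + 1, W)
        score[:, r0:r1, c0:c1] = -np.inf
    return selected


# Three toy images, two orientations, 20 x 20 pixels.
responses = np.zeros((3, 2, 20, 20))
responses[:, 0, 5, 5] = 1.0    # edge shared by all images (summed score 3)
responses[0, 1, 5, 6] = 2.0    # strong in one image only, next to (5, 5)
responses[:, 1, 15, 15] = 0.5  # weaker shared edge elsewhere (summed score 1.5)
print(greedy_select(responses, n_elements=2, inhibit_radius=2))
# [(0, 5, 5), (1, 15, 15)]
```

Summing over images favours edges that are shared across the training set, which is the intuition behind a common template; the one-image edge at (5, 6) loses both to the shared edge and to inhibition.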
(1.3)
Code with local normalization (August 2009)
Experiment 5.a.1. Scale = 1, half window size = 20, number of elements = 30.
[Figures: eps, 1-5]
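A half window size of 20 corresponds to a (2 × 20 + 1) = 41 × 41 normalization window, which matches the "41x41 window" stated for the tiger example. The sketch below assumes that local normalization divides each Gabor response by the root mean square of the responses, averaged over orientations and over that window; the exact formula in the August 2009 code may differ, and `box_mean` and `local_normalize` are hypothetical names.

```python
import numpy as np


def box_mean(x, half):
    """Mean of `x` over a (2*half+1) x (2*half+1) window, clipped at the borders.

    Uses an integral image so the cost does not depend on the window size.
    """
    H, W = x.shape
    ii = np.zeros((H + 1, W + 1))
    ii[1:, 1:] = x.cumsum(axis=0).cumsum(axis=1)
    r, c = np.arange(H), np.arange(W)
    r0, r1 = np.clip(r - half, 0, H), np.clip(r + half + 1, 0, H)
    c0, c1 = np.clip(c - half, 0, W), np.clip(c + half + 1, 0, W)
    total = ii[r1][:, c1] - ii[r0][:, c1] - ii[r1][:, c0] + ii[r0][:, c0]
    area = np.outer(r1 - r0, c1 - c0)
    return total / area


def local_normalize(responses, half=20, eps=1e-8):
    """responses: array (n_orientations, H, W) of Gabor responses at one scale."""
    energy = (responses ** 2).mean(axis=0)    # average over orientations
    rms = np.sqrt(box_mean(energy, half))     # local RMS within the window
    return responses / (rms + eps)


rng = np.random.default_rng(0)
responses = rng.normal(size=(4, 60, 60))          # 4 orientations, toy data
normalized = local_normalize(responses, half=20)  # 41 x 41 windows
print(normalized.shape)  # (4, 60, 60)
```

Compared with normalizing over one whole window, a local window keeps a high-contrast region in one part of the image from suppressing weaker but genuine edges elsewhere.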
(1.4)
Code with multi-scale Gabors
Lengths of Gabor wavelets: 17, 25, 33, 39. The number of elements at the lowest
scale is 60. The numbers of elements are inversely proportional to the scales.
[Figures: eps, 1-4]
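A worked version of the element counts, assuming that "scale" means a quantity proportional to the Gabor length $L_s$, so that $n_s = \mathrm{round}(n_1 L_1 / L_s)$ with $L_1 = 17$. The rounding rule is also an assumption; if "scale" instead meant the scale index (1, 2, 3, 4), the counts would differ. `elements_per_scale` is a hypothetical helper.

```python
def elements_per_scale(lengths, n_lowest):
    """Element counts inversely proportional to scale.

    Assumes the scale of each Gabor family is proportional to its length,
    and rounds each count to the nearest integer.
    """
    base = lengths[0]
    return [round(n_lowest * base / length) for length in lengths]


print(elements_per_scale([17, 25, 33, 39], 60))  # [60, 41, 31, 26]
print(elements_per_scale([17, 25, 33, 39], 50))  # [50, 34, 26, 22]
print(elements_per_scale([17, 25, 33], 60))      # [60, 41, 31]
```

The second and third calls correspond to the multi-scale runs with 50 elements at the lowest scale and with three wavelet lengths, under the same assumption.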
(2) another example (March 2009)
(2.1)
Code with within-window normalization (May 2009)
(2.2)
Code that pools q() from 2 large natural images (June 2009)
Experiment 5.a.2. The bounding box is 117*117. Number of elements is 60.
[Figures: eps, 1-5]
(2.3)
Code with local normalization (August 2009)
Experiment 5.a.2. Scale = 1, half window size = 20, number of elements = 30.
[Figures: eps, 1-5]
(3) another example (March 2009)
(3.1)
Code with within-window normalization (May 2009)
(3.2)
Code that pools q() from 2 large natural images (June 2009)
Experiment 5.a.3.
The bounding box is 103*129. Number of elements is 30.
[Figures: eps, 1-5]
(4) another example (March 2009)
(4.1)
Code with within-window normalization (May 2009)
(4.2)
Code that pools q() from 2 large natural images (June 2009)
Experiment 5.a.4.
The bounding box is 129*178. Number of elements is 50.
[Figures: eps, 1-5]
(4.3)
Code with multi-scale Gabors
Lengths of Gabor wavelets: 17, 25, 33, 39. The number of elements at the lowest
scale is 50. The numbers of elements are inversely proportional to the scales.
[Figures: eps, 1-4]
(5) another example (March 2009)
(5.1)
Code with within-window normalization (May 2009)
(5.2)
Code that pools q() from 2 large natural images (June 2009)
Experiment 5.a.5.
The bounding box is 103*158. Number of elements is 60.
[Figures: eps, 1-5]
(6) another example (March 2009)
(6.1)
Code with within-window normalization (May 2009)
(6.2)
Code that pools q() from 2 large natural images (June 2009)
Experiment 5.a.6. The bounding box is 143*149.
Number of elements is 50.
[Figures: eps, 1-5]
(6.3)
Code with local normalization (August 2009)
Experiment 5.a.6. Scale = 1, half window size = 20, number of elements = 40.
[Figures: eps, 1-5]
Experiment 5.b
In Experiment 5.b, we learn a template from the first image (on the left) with no activity and
without any given bounding box. Then we restore the activity and deform the learned template to sketch
the second image (on the right). We scan the template over 7 resolutions of the second image, from 0.7
to 1.3 times the size of the original image. This aligns the first image with the second image. In this
experiment, we use active correlation for learning and recognition.
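The 7 resolutions from 0.7 to 1.3 are evenly spaced in steps of 0.1; the same arithmetic gives the 15 resolutions from 0.3 to 1.7 used in Experiment 5.c.8. The sketch below builds such a grid and scans an image over it, keeping the best-scoring resolution. Nearest-neighbour rescaling is a simple stand-in for whatever resizing the code uses, and `score_fn` is a hypothetical placeholder for matching the template (active correlation is not implemented here).

```python
import numpy as np


def resolutions(lo, hi, n):
    """n evenly spaced rescaling factors from lo to hi (inclusive)."""
    return np.linspace(lo, hi, n)


def rescale_nearest(image, factor):
    """Nearest-neighbour rescaling of a 2-D array by `factor`."""
    H, W = image.shape
    h, w = max(1, int(round(H * factor))), max(1, int(round(W * factor)))
    rows = np.minimum((np.arange(h) / factor).astype(int), H - 1)
    cols = np.minimum((np.arange(w) / factor).astype(int), W - 1)
    return image[np.ix_(rows, cols)]


def scan_resolutions(image, score_fn, factors):
    """Rescale `image` by each factor and keep the best (score, factor).

    score_fn is a placeholder for matching the template against the rescaled
    image (e.g. the maximum template score over all positions).
    """
    best = None
    for f in factors:
        s = score_fn(rescale_nearest(image, f))
        if best is None or s > best[0]:
            best = (s, float(f))
    return best


print(resolutions(0.7, 1.3, 7))    # 0.7, 0.8, ..., 1.3
print(resolutions(0.3, 1.7, 15))   # 0.3, 0.4, ..., 1.7

# Toy score that prefers an image about 120 pixels tall.
score, factor = scan_resolutions(np.zeros((100, 100)),
                                 lambda im: -abs(im.shape[0] - 120),
                                 resolutions(0.7, 1.3, 7))
print(round(factor, 2))  # 1.2
```

Rescaling the image while keeping the template fixed is equivalent, for alignment purposes, to trying templates of different sizes, which is why a small grid of resolutions suffices to align two images of the same object at different sizes.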
(1)
data, codes, and readme for image alignment
Experiment 5.b.1. The number of elements is 50.
[Figures: eps1, eps2]
Experiment 5.b.2. The number of elements is 50.
[Figures: eps1, eps2]
(2)
another example
Experiment 5.b.3. The number of elements is 50.
[Figures: eps1, eps2]
Experiment 5.b.4. The number of elements is 100.
[Figures: eps1, eps2]
(3)
another example
Experiment 5.b.5. The number of elements is 80.
[Figures: eps1, eps2]
Experiment 5.c
In Experiment 5.c, we do not assume that the bounding box of the object in the
first image is given. We simply start from the template learned from the
whole image of the first example.
data, codes, and readme for learning from non-aligned images (March 2009)
Experiment 5.c.1. The first image is the starting template. The second image
is the learned template. The number of elements in the active basis is 40. Scale
of Gabors is 1.
[Figures: eps0, eps, 1-5]
Code with window normalization with q() pooled from 2 large natural images (June 2009)
Experiment 5.c.2. The first image is the starting template. The second image
is the learned template. The number of elements in the active basis is 50.
Allowed displacement in location is up to 4 pixels.
[Figures: eps0, eps, 1-5]
Code with window normalization with q() pooled from 2 large natural images (June 2009)
Experiment 5.c.3. The first image is the starting template. The second image
is the learned template. The number of elements in the active basis is 30.
The allowed activity in location is 3 pixels. The sub-sampling rate is 1 pixel.
[Figures: eps0, eps, 1-5]
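The two settings above can be read together: each element is evaluated on a grid sub-sampled every 1 pixel, and may shift by up to 3 pixels to find its best response. A minimal sketch of that reading as local max pooling follows; it assumes activity means a square neighbourhood of displacements, ignores perturbations in orientation, and `local_max` is a hypothetical name.

```python
import numpy as np


def local_max(response, activity, step=1):
    """Max-pool a 2-D response map over a (2*activity+1)^2 neighbourhood.

    The output is evaluated on a grid sub-sampled every `step` pixels, so an
    element placed at a grid point may move by up to `activity` pixels in each
    direction to find its best response.
    """
    H, W = response.shape
    grid_r, grid_c = range(0, H, step), range(0, W, step)
    pooled = np.empty((len(grid_r), len(grid_c)))
    for a, r in enumerate(grid_r):
        for b, c in enumerate(grid_c):
            r0, r1 = max(r - activity, 0), min(r + activity + 1, H)
            c0, c1 = max(c - activity, 0), min(c + activity + 1, W)
            pooled[a, b] = response[r0:r1, c0:c1].max()
    return pooled


response = np.zeros((20, 20))
response[10, 10] = 1.0                            # a single edge response
fine = local_max(response, activity=3, step=1)    # settings of Experiment 5.c.3
coarse = local_max(response, activity=6, step=2)  # horse-example settings in 5.a
print(fine[7, 10], fine[6, 10])   # 1.0 0.0: reachable only within 3 pixels
print(coarse.shape)               # (10, 10)
```

Doubling both the sub-sampling rate and the activity, as in the horse example, keeps the same relative tolerance to deformation while evaluating a quarter as many grid points.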
Code with window normalization with q() pooled from 2 large natural images (June 2009)
Experiment 5.c.4. The first image is the starting template. The second image
is the learned template. The number of elements in the active basis is 50.
[Figures: eps0, eps, 1-5]
Code with window normalization with q() pooled from 2 large natural images (June 2009)
Experiment 5.c.5. The first image is the starting template. The second image
is the learned template. The number of elements in the active basis is 50.
[Figures: eps0, eps, 1-5]
Code with window normalization with q() pooled from 2 large natural images (June 2009)
Experiment 5.c.6. The first image is the starting template. The second image
is the learned template. The number of elements in the active basis is 60.
[Figures: eps0, eps, 1-5]
Code with window normalization with q() pooled from 2 large natural images (June 2009)
Code with local normalization (August 2009)
Experiment 5.c.7. Scale = 1, half window size = 20, number of elements = 50.
[Figures: eps, 1-7]
another example
Experiment 5.c.8. The first plot is the starting template, learned from the
first image with no activity and with no given bounding box. The remaining plots
display the learned templates in the first 10 iterations. The images are rescaled
to 0.5 times their original sizes. The number of elements in the active basis is 60.
[Figures: eps0, 1-10]
Experiment 5.c.8. We use the first 20 images of the Weizmann horse images. The
plots display the superposed deformed templates in the last iteration.
The Weizmann horse images have large deformations, so it is more reasonable to learn
part templates first and then compose them into a recursive active basis. We do
not claim that a single-layer active basis can model articulated objects.
[Figures: eps, 1-20]
Experiment 5.c.8. Testing results on the
remaining 308 images.
another example
Experiment 5.c.8. We double the scale of the Gabor filters. The number of elements is 20.
Learning takes 3 minutes.
[Figures: eps0, 1-10]
[Figures: eps, 1-20]
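This run doubles the scale of the Gabor filters. The sketch below builds a generic complex Gabor kernel (cosine part real, sine part imaginary) whose size, envelope width, and wavelength all grow linearly with a scale parameter, so doubling the scale doubles the spatial extent of each wavelet. The parameterization and the `base_*` defaults are hypothetical, not the project's definitions; they were chosen so that scales 1, 1.5, and 2 give kernel lengths 17, 25, and 33, which match three of the lengths listed in the multi-scale runs by construction of the sketch only.

```python
import numpy as np


def gabor_kernel(scale=1.0, theta=0.0, base_half=8, base_sigma=3.0,
                 base_wavelength=8.0):
    """Complex Gabor kernel: real part even (cosine), imaginary part odd (sine).

    Kernel half-size, envelope width and wavelength all grow linearly with
    `scale`. The base_* defaults are hypothetical, not the project's parameters.
    """
    half = int(round(base_half * scale))
    sigma = base_sigma * scale
    wavelength = base_wavelength * scale
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    u = x * np.cos(theta) + y * np.sin(theta)   # coordinate across the stripes
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = 2.0 * np.pi * u / wavelength
    even = envelope * np.cos(carrier)
    even -= envelope * (even.sum() / envelope.sum())  # zero mean: no DC response
    odd = envelope * np.sin(carrier)                  # zero mean by symmetry
    return even + 1j * odd


for s in (1.0, 1.5, 2.0):
    print(s, gabor_kernel(scale=s).shape)   # 17x17, 25x25, 33x33
```

Larger wavelets respond to coarser structure and are less sensitive to small texture, which is one reason a coarser scale can get by with fewer elements, as in this run.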
Code with local normalization
Experiment 5.c.8. Scale = 1. Half window size for local normalization = 20.
The first plot is the starting template, learned from the
first image with no activity and with no given bounding box. The remaining plots
display the learned templates in the first 10 iterations. The images are rescaled
to 0.5 times their original sizes. The number of elements in the active basis is 40.
The detection step searches over 15 resolutions: 0.3 to 1.7 times the size of the input images.
[Figures: eps0, 1-10]
[Figures: eps, 1-20]
Code with local normalization at a smaller scale
another example
Experiment 5.c.9. The number of elements is 50.
[Figures: eps0, 1-5]
[Figures: eps, 1-13]
Code with local normalization
Experiment 5.c.9. Scale = 1. Half window size for local normalization = 20.
Number of elements is 30.
[Figures: 10; eps, 1-13]
another example
The number of elements is 60.
[Figures: eps0, 1-5]
[Figures: eps, 1-11]
Code with local normalization
Scale = 1. Half window size for local normalization = 20.
[Figures: 10; eps, 1-11]
another example
The number of elements is 50.
[Figures: eps0, 1-3]
[Figures: eps, 1-7]
Code with local normalization
Scale = 1. Half window size for local normalization = 20.
[Figures: eps0, 10; eps, 1-7]
Code with local normalization
Scale = 1. Half window size for local normalization = 20.
[Figures: 5; eps, 1-9]
Code with local normalization
Scale = 1. Half window size for local normalization = 20.
[Figures: 5; eps, 1-14]
another example (March 2009)
Code with within-window normalization (May 2009)
Code that pools q() from 2 large natural images (June 2009)
The number of elements is 60.
[Figures: 5; eps, 1-9]
Code with multi-scale Gabors
Lengths of Gabor wavelets: 17, 25, 33. The number of elements at the lowest
scale is 60. The numbers of elements are inversely proportional to the scales.
[Figures: eps, 1-3]
Code with local normalization
Scale = 1. Half window size for local normalization = 20.
[Figures: 10; eps, 1-9]
Lion
Scale = 1. Half window size for local normalization = 20.
[Figures: 5; eps, 1-13]
Tiger
The number of elements is 50. Scale = 1.
Local normalization within a 41x41 window.
[Figures: 5; 1-7]
Experiment 5.d
In Experiment 5.d, we learn visual words using essentially the same code
as in Experiment 5.a. We start from a large number of bounding boxes cropped
from the training images and learn the templates. Then we select the first
K templates with the highest alignment scores. We did not perform spatial
inhibition. The size of the bounding box is 100*100. The number of elements
is 40. The allowed activity in location is up to 3 pixels. The allowed activity
in orientation is up to pi/15. We learn the visual words at two resolutions;
the second is double the first.
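The selection step described above (keep the K learned templates with the highest alignment scores, with no spatial inhibition) amounts to a plain top-K sort. A minimal sketch with hypothetical template names and scores; the alignment score itself is whatever the learning code reports and is not reproduced here.

```python
def select_visual_words(templates, scores, k):
    """Return the k templates with the highest alignment scores.

    No spatial inhibition: overlapping or near-duplicate templates are kept
    if their scores are high enough.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [templates[i] for i in order[:k]]


words = select_visual_words(["box_a", "box_b", "box_c", "box_d"],
                            [0.4, 0.9, 0.1, 0.7], k=2)
print(words)  # ['box_b', 'box_d']
```

Without inhibition, two bounding boxes that cover nearly the same region can both survive; adding inhibition would mean discarding a candidate whose box overlaps an already selected one.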
data, codes, and readme for learning visual words
double the resolution