iFRAME (inhomogeneous Filters Random Field And Maximum Entropy)

Experiment 5.5: Object Classification on Domain Adaptation Data Sets

Code and dataset: version1 version2

Experiment setup

In real-world visual recognition, many factors (such as illumination or pose) can cause a significant mismatch between the source domain on which classifiers are trained and the target domain to which those classifiers are applied. Without explicitly using domain adaptation techniques, the objects represented by codewords learnt with our sparse-FRAME codebook learning framework can be transferred to different domains automatically. In this experiment, we test the domain adaptability of our representation by image classification on four datasets and compare directly with published results [1] [2] [3] [4] [5] [6] [7]. The four datasets are: Amazon (images downloaded from online merchants), Webcam (low-resolution images taken by a web camera), DSLR (high-resolution images taken by a digital SLR camera), and Caltech-256 [8]. Each dataset is regarded as a domain. For the experiment with a single training source, the 10 classes common to all four datasets are used: BACKPACK, TOURING-BIKE, CALCULATOR, HEAD-PHONES, COMPUTER-KEYBOARD, LAPTOP-101, COMPUTER-MONITOR, COMPUTER-MOUSE, COFFEE-MUG, and VIDEO-PROJECTOR; for the experiment with multiple training sources, all 31 classes in Amazon, Webcam, and DSLR are used.

We follow the evaluation protocol of [3]: labeled data randomly sampled from the source domain serve as training examples, and unlabeled data from the target domain serve as testing examples. We learn 3 codewords (templates) for each object category to construct a codebook. The codeword responses extracted from each image are then fed to a spatial pyramid matching step, which evenly divides an image into 1, 4, and 16 regions and concatenates the maximum codeword responses within each region into a feature vector (a minimal sketch of this step follows below). We train multi-class SVM image classifiers on these feature vectors and evaluate their classification accuracies on the target domain. For each pair of source and target domains, we run 10 random trials and report the averaged accuracy on the target domain as well as the standard deviation across trials. Tables 1 and 2 report recognition accuracies on the target domains for single-source training; Tables 3 and 4 report them for multiple-source training. Our method performs significantly better than the other methods on 7 out of the 11 sub-tasks and is competitive on most of the remaining ones.
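The feature construction just described can be summarized with the following minimal sketch. It assumes the learnt codebook has already been applied to each image, yielding a stack of per-codeword response maps; the function and variable names (pyramid_max_features, source_maps, etc.) are illustrative and are not the interface of the released code.

```python
import numpy as np
from sklearn.svm import LinearSVC

def pyramid_max_features(response_maps, levels=(1, 2, 4)):
    """Max-pool codeword responses over a 1x1, 2x2 and 4x4 grid (1 + 4 + 16
    regions) and concatenate the pooled values into a single feature vector."""
    num_codewords, height, width = response_maps.shape
    feats = []
    for g in levels:
        ys = np.linspace(0, height, g + 1).astype(int)
        xs = np.linspace(0, width, g + 1).astype(int)
        for i in range(g):
            for j in range(g):
                cell = response_maps[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                feats.append(cell.max(axis=(1, 2)))  # max response per codeword
    return np.concatenate(feats)  # length = num_codewords * 21

# Illustrative protocol: train on labeled source-domain images, test on the
# unlabeled target domain (source_maps / target_maps are lists of response stacks).
# X_src = np.stack([pyramid_max_features(m) for m in source_maps])
# clf = LinearSVC().fit(X_src, y_src)   # multi-class (one-vs-rest) linear SVM
# acc = clf.score(np.stack([pyramid_max_features(m) for m in target_maps]), y_tgt)
```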

Parameter settings

General Parameters: nOrient = 16; sizeTemplatex = 108; sizeTemplatey = 108; GaborScaleList = [0.7]; DoGScaleList = []; sigsq = 10; locationShiftLimit = 2; orientShiftLimit = 1; numSketch = 40; isGlobalNormalization = true; isLocalNormalize = true; minHeightOrWidth = 150;
HMC Parameters: lambdaLearningRate = 0.1/sqrt(sigsq); epsilon = 0.03; L = 10; nIteraton = 40; 12x12 chains;
Codebook Parameters: flipOrNot = false; rotateShiftLimit = 1; allResolution = [0.8, 1, 1.2]; #EM iteration = 12; numCluster = 3; maxNumClusterMember = 50; LocationPerturbationFraction = 0.4;
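For reference, the derived HMC learning rate and chain count above can be written out explicitly. The snippet below is a plain-Python sketch that only restates the listed values; it is not the interface of the released code.

```python
import math

# HMC parameters from the listing above; lambdaLearningRate is derived from sigsq.
sigsq = 10.0
hmc = {
    "epsilon": 0.03,                               # leapfrog step size
    "L": 10,                                       # leapfrog steps per HMC proposal
    "nIteration": 40,                              # learning iterations
    "numChains": 12 * 12,                          # 12x12 parallel sampling chains
    "lambdaLearningRate": 0.1 / math.sqrt(sigsq),  # = 0.1/sqrt(10) ~ 0.0316
}
```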

Comparison 1: Object Classification by Codebook Learnt from a Single Source

Table 1. Recognition accuracies on target domains by sparse-FRAME codebook in different random trials (10 categories)
(C: Caltech, A:Amazon, W:Webcam, and D:DSLR)

| Trial | C→A | C→D | A→C | A→W | W→C | W→A | D→A | D→W |
|---|---|---|---|---|---|---|---|---|
| 1 (seed=1) | 60.1293 | 51.1811 | 45.1052 | 56.9811 | 39.8902 | 55.4957 | 50.2155 | 76.9811 |
| 2 (seed=2) | 63.3621 | 58.2677 | 44.9222 | 56.6038 | 35.4986 | 49.0302 | 55.4957 | 69.4340 |
| 3 (seed=3) | 61.9612 | 48.8189 | 48.8564 | 60.0000 | 37.6944 | 43.7500 | 56.5733 | 72.8302 |
| 4 (seed=4) | 61.5302 | 52.7559 | 47.1180 | 55.4717 | 39.5242 | 51.5086 | 58.1897 | 75.8491 |
| 5 (seed=5) | 62.9310 | 55.1181 | 47.8500 | 57.3585 | 36.7795 | 55.1724 | 57.2198 | 70.9434 |
| 6 (seed=6) | 61.9612 | 48.8189 | 44.2818 | 46.7925 | 38.7008 | 55.8190 | 59.5905 | 69.8113 |
| 7 (seed=7) | 60.5603 | 54.3307 | 48.9478 | 45.6604 | 35.9561 | 55.7112 | 54.5259 | 71.6981 |
| 8 (seed=8) | 61.0991 | 48.8189 | 42.0860 | 50.9434 | 43.0924 | 59.1595 | 51.8319 | 75.8491 |
| 9 (seed=9) | 65.8405 | 57.4803 | 47.3010 | 48.6792 | 38.8838 | 54.6336 | 53.1250 | 73.2075 |
| 10 (seed=10) | 62.3922 | 46.4567 | 50.4117 | 53.9623 | 45.1052 | 51.9397 | 56.0345 | 67.5472 |

Table 2. Comparison of recognition accuracies on target domains (single-source training, 10 categories)
(C: Caltech, A:Amazon, W:Webcam, and D:DSLR)

| Method | C→A | C→D | A→C | A→W | W→C | W→A | D→A | D→W |
|---|---|---|---|---|---|---|---|---|
| Metric [1] | 33.7 ± 0.8 | 35.0 ± 1.1 | 27.3 ± 0.7 | 36.0 ± 1.0 | 21.7 ± 0.5 | 32.3 ± 0.8 | 30.3 ± 0.8 | 55.6 ± 0.7 |
| SGF [2] | 40.2 ± 0.7 | 36.6 ± 0.8 | 37.7 ± 0.5 | 37.9 ± 0.7 | 29.2 ± 0.7 | 38.2 ± 0.6 | 39.2 ± 0.7 | 69.5 ± 0.9 |
| GFK [3] | 46.1 ± 0.6 | 55.0 ± 0.9 | 39.6 ± 0.4 | 56.9 ± 1.0 | 32.8 ± 0.7 | 46.2 ± 0.7 | 46.2 ± 0.6 | 80.2 ± 0.4 |
| FDDL [4] | 39.3 ± 2.9 | 55.0 ± 2.8 | 24.3 ± 2.2 | 50.4 ± 3.5 | 22.9 ± 2.6 | 41.1 ± 2.6 | 36.7 ± 2.5 | 65.9 ± 4.9 |
| MMDT [5] | 49.4 ± 0.8 | 56.5 ± 0.9 | 36.4 ± 0.8 | 64.6 ± 1.2 | 32.2 ± 0.8 | 47.7 ± 0.9 | 46.9 ± 1.0 | 74.1 ± 0.8 |
| SDDL [6] | 49.5 ± 2.6 | 76.7 ± 3.9 | 27.4 ± 2.4 | 72.0 ± 4.8 | 29.7 ± 1.9 | 49.4 ± 2.1 | 48.9 ± 3.8 | 72.6 ± 2.1 |
| Our method | 62.2 ± 1.6 | 52.2 ± 4.0 | 46.7 ± 2.5 | 53.2 ± 4.9 | 39.1 ± 3.0 | 53.2 ± 4.4 | 55.3 ± 2.9 | 72.4 ± 3.1 |
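The "Our method" row above is obtained by aggregating the per-trial accuracies of Table 1 over the 10 random trials. The following minimal sketch shows the aggregation for the C→A column, assuming the ± value is the sample standard deviation across trials (which reproduces the tabulated 62.2 ± 1.6):

```python
import numpy as np

# Per-trial C->A accuracies from Table 1 (seeds 1-10).
c_to_a = np.array([60.1293, 63.3621, 61.9612, 61.5302, 62.9310,
                   61.9612, 60.5603, 61.0991, 65.8405, 62.3922])
print(f"C->A: {c_to_a.mean():.1f} +/- {c_to_a.std(ddof=1):.1f}")  # -> 62.2 +/- 1.6
```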

Comparison 2: Object Classification by Codebook Learnt From Multiple Sources

Table 3. Recognition accuracies on target domains by the sparse-FRAME codebook in different random trials (31 categories)

| Source | Target | 1 (seed=1) | 2 (seed=2) | 3 (seed=3) | 4 (seed=4) | 5 (seed=5) | 6 (seed=6) | 7 (seed=7) | 8 (seed=8) | 9 (seed=9) | 10 (seed=10) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DSLR, Amazon | Webcam | 54.1311 | 50.9972 | 50.2849 | 52.8490 | 53.1339 | 51.5670 | 51.8519 | 53.9886 | 52.2792 | 50.5698 |
| Amazon, Webcam | DSLR | 50.8642 | 52.8395 | 56.5432 | 60.9877 | 52.3457 | 54.3210 | 51.8519 | 53.3333 | 53.0864 | 59.0123 |
| Webcam, DSLR | Amazon | 30.1395 | 33.9574 | 30.6902 | 31.2775 | 32.7093 | 30.4699 | 31.5345 | 32.4890 | 32.5624 | 35.0220 |

Table 4. Performance comparison on multiple-source domain adaptation

| Source | Target | SGF [2] | RDALR [7] | FDDL [4] | Our method |
|---|---|---|---|---|---|
| DSLR, Amazon | Webcam | 52 ± 2.5 | 36.9 ± 1.1 | 41.0 ± 2.4 | 52.2 ± 1.4 |
| Amazon, Webcam | DSLR | 39 ± 1.1 | 31.2 ± 1.3 | 38.4 ± 3.4 | 54.5 ± 3.3 |
| Webcam, DSLR | Amazon | 28 ± 0.8 | 20.9 ± 0.9 | 19.0 ± 1.2 | 32.1 ± 1.6 |

References

[1] Saenko, K., Kulis, B., Fritz, M., & Darrell, T. (2010) Adapting visual category models to new domains. ECCV: 213-226.
[2] Gopalan, R., Li, R., & Chellappa, R. (2011) Domain adaptation for object recognition: An unsupervised approach. ICCV: 999-1006.
[3] Gong, B., Shi, Y., Sha, F. & Grauman, K. (2012) Geodesic flow kernel for unsupervised domain adaptation. CVPR: 2066-2073.
[4] Yang, M., Zhang, L., Feng, X., & Zhang, D. (2011) Fisher discrimination dictionary learning for sparse representation. ICCV: 543-550.
[5] Hoffman, J., Rodner, E., Donahue, J., Saenko, K., & Darrell, T. (2013) Efficient learning of domain-invariant image representations. ICLR.
[6] Shekhar, S., Patel, V. M., Nguyen, H. V., & Chellappa, R. (2013) Generalized domain adaptive dictionaries. CVPR.
[7] Jhuo, I., Liu, D., Lee, D. T., & Chang, S. (2012) Robust visual domain adaptation with low-rank reconstruction. CVPR: 2168-2175.
[8] Griffin, G., Holub, A., & Perona, P. (2007) Caltech-256 object category dataset. Technical report, Caltech.