In real-world visual recognition, many factors (such as illumination or pose) can cause a significant mismatch between the source domain on which classifiers are trained and the target domain to which they are applied. Without explicitly using domain adaptation techniques, the objects represented by codewords learned with our sparse-FRAME codebook learning framework can be transferred across domains automatically. In this experiment, we test the domain adaptability of our representation through image classification on four datasets and compare directly with published results [1] [2] [3] [4] [5] [6] [7]. The four datasets are: Amazon (images downloaded from online merchants), Webcam (low-resolution images taken by a web camera), DSLR (high-resolution images taken by a digital SLR camera), and Caltech-256 [8]. Each dataset is regarded as a domain. For the experiment with single-source training, the 10 classes common to all four datasets are used: BACKPACK, TOURING-BIKE, CALCULATOR, HEAD-PHONES, COMPUTER-KEYBOARD, LAPTOP-101, COMPUTER-MONITOR, COMPUTER-MOUSE, COFFEE-MUG, and VIDEO-PROJECTOR. For the experiment with multiple-source training, all 31 classes in Amazon, Webcam, and DSLR are used.

We follow the evaluation protocol of [3]: labeled data are randomly sampled from the source domain as training examples, and unlabeled data from the target domain serve as testing examples. We then learn 3 codewords (templates) for each object category to construct a codebook. The codebook responses extracted from each image are fed into a spatial pyramid matching scheme, which divides the image evenly into 1, 4, and 16 regions and concatenates the maximum codeword responses within each region into a single feature vector (a sketch of this step follows this paragraph). We train image classifiers on these feature vectors with a multi-class SVM and evaluate their classification accuracies on the target domain. For each pair of source and target domains, we run 10 random trials (seeds 1 to 10 in the tables below) and report the averaged accuracies on the target domain together with the standard deviations over the trials.

Table 1 and Table 2 compare the recognition accuracies on the target domains for single-source and multiple-source training, respectively. Our method performs significantly better than the other methods on 7 out of 11 subtasks, and is on par with the best-performing method on the remaining tasks.
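To make the feature construction concrete, the following is a minimal Python/NumPy sketch of the three-level pyramid pooling (1 + 4 + 16 = 21 regions, maximum codeword response per region), together with hypothetical multi-class SVM usage via scikit-learn's LinearSVC (one-vs-rest). Array shapes and all names such as `pyramid_max_pool` and `source_response_maps` are illustrative assumptions, not our actual code:

```python
import numpy as np
from sklearn.svm import LinearSVC

def pyramid_max_pool(responses):
    """responses: (K, H, W) codeword response maps for one image.
    Returns a feature vector of length 21 * K: the maximum response
    of each codeword within each cell of 1x1, 2x2, and 4x4 grids."""
    K, H, W = responses.shape
    feats = []
    for n in (1, 2, 4):                       # 1, 4, 16 image regions
        hs, ws = H // n, W // n
        for i in range(n):
            for j in range(n):
                cell = responses[:, i * hs:(i + 1) * hs,
                                    j * ws:(j + 1) * ws]
                feats.append(cell.max(axis=(1, 2)))
    return np.concatenate(feats)

# Hypothetical usage: train on source-domain features, test on target.
# X_src = np.stack([pyramid_max_pool(r) for r in source_response_maps])
# clf = LinearSVC().fit(X_src, y_src)     # one-vs-rest multi-class SVM
# X_tgt = np.stack([pyramid_max_pool(r) for r in target_response_maps])
# print(clf.score(X_tgt, y_tgt))          # target-domain accuracy
```

The exact multi-class SVM variant used in the experiments is not specified beyond "multi-class SVM"; LinearSVC is one plausible stand-in.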
General Parameters: nOrient = 16; sizeTemplatex = 108; sizeTemplatey = 108; GaborScaleList = [0.7]; DoGScaleList = []; sigsq = 10; locationShiftLimit = 2; orientShiftLimit = 1; numSketch = 40; isGlobalNormalization = true; isLocalNormalize = true; minHeightOrWidth = 150.

HMC Parameters: lambdaLearningRate = 0.1/sqrt(sigsq); epsilon = 0.03; L = 10; nIteration = 40; 12×12 parallel chains.

Codebook Parameters: flipOrNot = false; rotateShiftLimit = 1; allResolution = [0.8, 1, 1.2]; #EM iterations = 12; numCluster = 3; maxNumClusterMember = 50; LocationPerturbationFraction = 0.4.
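In the sparse-FRAME learning loop, HMC is used to sample synthesized images from the current model, with epsilon and L above being, by the standard HMC convention, the leapfrog step size and the number of leapfrog steps, and the 12×12 chains run in parallel. For reference, here is a minimal generic HMC update in Python/NumPy; it is a sketch under those assumptions, not our actual implementation, and `U`/`grad_U` stand for the potential energy (negative log-density) and its gradient:

```python
import numpy as np

def hmc_step(x, U, grad_U, epsilon=0.03, L=10, rng=np.random):
    """One generic HMC update: L leapfrog steps of size epsilon,
    followed by a Metropolis accept/reject test."""
    p = rng.normal(size=x.shape)               # fresh auxiliary momentum
    x_new, p_new = x.copy(), p.copy()

    # Leapfrog integration of the Hamiltonian dynamics.
    p_new -= 0.5 * epsilon * grad_U(x_new)     # half step for momentum
    for i in range(L):
        x_new += epsilon * p_new               # full step for position
        if i < L - 1:
            p_new -= epsilon * grad_U(x_new)   # full step for momentum
    p_new -= 0.5 * epsilon * grad_U(x_new)     # final half step

    # Metropolis correction based on the change in total energy.
    h_old = U(x) + 0.5 * np.sum(p ** 2)
    h_new = U(x_new) + 0.5 * np.sum(p_new ** 2)
    return x_new if rng.uniform() < np.exp(h_old - h_new) else x
```

The lambdaLearningRate above presumably scales the model-parameter updates between sampling sweeps rather than the sampler itself.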
Per-trial classification accuracies (%) of our method for single-source training (C = Caltech-256, A = Amazon, W = Webcam, D = DSLR; source → target):

Trial | C→A | C→D | A→C | A→W | W→C | W→A | D→A | D→W
1 (seed=1) | 60.1293 | 51.1811 | 45.1052 | 56.9811 | 39.8902 | 55.4957 | 50.2155 | 76.9811
2 (seed=2) | 63.3621 | 58.2677 | 44.9222 | 56.6038 | 35.4986 | 49.0302 | 55.4957 | 69.4340
3 (seed=3) | 61.9612 | 48.8189 | 48.8564 | 60.0000 | 37.6944 | 43.7500 | 56.5733 | 72.8302
4 (seed=4) | 61.5302 | 52.7559 | 47.1180 | 55.4717 | 39.5242 | 51.5086 | 58.1897 | 75.8491
5 (seed=5) | 62.9310 | 55.1181 | 47.8500 | 57.3585 | 36.7795 | 55.1724 | 57.2198 | 70.9434
6 (seed=6) | 61.9612 | 48.8189 | 44.2818 | 46.7925 | 38.7008 | 55.8190 | 59.5905 | 69.8113
7 (seed=7) | 60.5603 | 54.3307 | 48.9478 | 45.6604 | 35.9561 | 55.7112 | 54.5259 | 71.6981
8 (seed=8) | 61.0991 | 48.8189 | 42.0860 | 50.9434 | 43.0924 | 59.1595 | 51.8319 | 75.8491
9 (seed=9) | 65.8405 | 57.4803 | 47.3010 | 48.6792 | 38.8838 | 54.6336 | 53.1250 | 73.2075
10 (seed=10) | 62.3922 | 46.4567 | 50.4117 | 53.9623 | 45.1052 | 51.9397 | 56.0345 | 67.5472
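The "Our method" row of Table 1 below summarizes these trials; the ± values match the sample standard deviation over the 10 runs. A minimal check in Python/NumPy for the C→A column (values copied from the table above):

```python
import numpy as np

# C->A accuracies of the 10 trials above (seeds 1-10).
acc = np.array([60.1293, 63.3621, 61.9612, 61.5302, 62.9310,
                61.9612, 60.5603, 61.0991, 65.8405, 62.3922])
print(f"{acc.mean():.1f} +/- {acc.std(ddof=1):.1f}")  # -> 62.2 +/- 1.6
```

The other columns aggregate in the same way.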
Table 1. Recognition accuracies (%) on target domains with single-source training:

Method | C→A | C→D | A→C | A→W | W→C | W→A | D→A | D→W
Metric [1] | 33.7 ± 0.8 | 35.0 ± 1.1 | 27.3 ± 0.7 | 36.0 ± 1.0 | 21.7 ± 0.5 | 32.3 ± 0.8 | 30.3 ± 0.8 | 55.6 ± 0.7
SGF [2] | 40.2 ± 0.7 | 36.6 ± 0.8 | 37.7 ± 0.5 | 37.9 ± 0.7 | 29.2 ± 0.7 | 38.2 ± 0.6 | 39.2 ± 0.7 | 69.5 ± 0.9
GFK [3] | 46.1 ± 0.6 | 55.0 ± 0.9 | 39.6 ± 0.4 | 56.9 ± 1.0 | 32.8 ± 0.7 | 46.2 ± 0.7 | 46.2 ± 0.6 | 80.2 ± 0.4
FDDL [4] | 39.3 ± 2.9 | 55.0 ± 2.8 | 24.3 ± 2.2 | 50.4 ± 3.5 | 22.9 ± 2.6 | 41.1 ± 2.6 | 36.7 ± 2.5 | 65.9 ± 4.9
MMDT [5] | 49.4 ± 0.8 | 56.5 ± 0.9 | 36.4 ± 0.8 | 64.6 ± 1.2 | 32.2 ± 0.8 | 47.7 ± 0.9 | 46.9 ± 1.0 | 74.1 ± 0.8
SDDL [6] | 49.5 ± 2.6 | 76.7 ± 3.9 | 27.4 ± 2.4 | 72.0 ± 4.8 | 29.7 ± 1.9 | 49.4 ± 2.1 | 48.9 ± 3.8 | 72.6 ± 2.1
Our method | 62.2 ± 1.6 | 52.2 ± 4.0 | 46.7 ± 2.5 | 53.2 ± 4.9 | 39.1 ± 3.0 | 53.2 ± 4.4 | 55.3 ± 2.9 | 72.4 ± 3.1
Per-trial classification accuracies (%) of our method for multiple-source training (sources → target):

Trial | DSLR, Amazon → Webcam | Amazon, Webcam → DSLR | Webcam, DSLR → Amazon
1 (seed=1) | 54.1311 | 50.8642 | 30.1395
2 (seed=2) | 50.9972 | 52.8395 | 33.9574
3 (seed=3) | 50.2849 | 56.5432 | 30.6902
4 (seed=4) | 52.8490 | 60.9877 | 31.2775
5 (seed=5) | 53.1339 | 52.3457 | 32.7093
6 (seed=6) | 51.5670 | 54.3210 | 30.4699
7 (seed=7) | 51.8519 | 51.8519 | 31.5345
8 (seed=8) | 53.9886 | 53.3333 | 32.4890
9 (seed=9) | 52.2792 | 53.0864 | 32.5624
10 (seed=10) | 50.5698 | 59.0123 | 35.0220
Table 2. Recognition accuracies (%) on target domains with multiple-source training:

Source | Target | SGF [2] | RDALR [7] | FDDL [4] | Our method
DSLR, Amazon | Webcam | 52 ± 2.5 | 36.9 ± 1.1 | 41.0 ± 2.4 | 52.2 ± 1.4
Amazon, Webcam | DSLR | 39 ± 1.1 | 31.2 ± 1.3 | 38.4 ± 3.4 | 54.5 ± 3.3
Webcam, DSLR | Amazon | 28 ± 0.8 | 20.9 ± 0.9 | 19.0 ± 1.2 | 32.1 ± 1.6