We evaluate the above bag-of-words representation, extracted by a codebook of sparse FRAME templates, on binary classification tasks. We test it on the mixed dataset published in [1], which is a collection of 16 categories from the Caltech-101 dataset [2], all 5 categories from the ETHZ Shape dataset [3], and all 3 categories from the Graz-02 dataset [4]. The task is to separate each category from a negative category. We resize all images so that the minimum of their height and width is 150 pixels, without changing their aspect ratios, and convert them to grey-level images. We randomly choose 30 positive and 30 negative images as training data and keep the rest as testing data. For Caltech-101 and Graz-02, negative images are chosen from the background category, while for ETHZ, negative examples are chosen from images of categories other than the target category. For each category, we learn a codebook of T = 10 sparse FRAME templates. Each template is of size 100 × 100 pixels and has n = 40 wavelets. We allow the templates to appear at scales S ∈ {0.8, 1, 1.2} and orientations A ∈ {-1, 0, +1} × π/16. Binary classification is done with linear logistic regression regularized by the L2 norm [5].

We compare our results with those obtained by SIFT [6] features and SVM classifiers, where the SIFT features are quantized into "words" by K-means clustering (K = 50, 100, 500) and fed into a linear or kernel SVM. The best result among these six combinations (3 numbers of words × 2 types of SVM) is then reported. Table 2 shows the results of the binary classification experiments. All experiments are repeated five times with different randomly selected training and testing images, and the average accuracies and 95% confidence intervals are reported. Our method generally outperforms the SIFT + SVM baseline, despite using much smaller codebooks (10 "words" versus 50, 100, or 500 "words").
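As a concrete illustration of the quantization step shared by both pipelines, a bag-of-words histogram assigns each local descriptor to its nearest codeword and counts occurrences. The sketch below uses random toy descriptors and a random codebook (placeholders, not the actual SIFT features or learned templates):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors against a codebook and return an
    L1-normalized bag-of-words histogram (one bin per codeword)."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                            # hard assignment
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)                   # normalize counts

# Toy example: 200 random 128-d "SIFT-like" descriptors, K = 50 words.
rng = np.random.default_rng(0)
desc = rng.normal(size=(200, 128))
codebook = rng.normal(size=(50, 128))
h = bow_histogram(desc, codebook)                        # 50-d feature vector
```

The resulting fixed-length histogram is what gets fed to the linear classifier, regardless of how many descriptors the image produced.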
General parameters: nOrient = 16; sizeTemplatex = 100; sizeTemplatey = 100; GaborScaleList = [0.7]; DoGScaleList = []; sigsq = 10; locationShiftLimit = 2; orientShiftLimit = 1; numSketch = 40; isGlobalNormalization = true; isLocalNormalize = true; minHeightOrWidth = 150.

HMC parameters: lambdaLearningRate = 0.1/sqrt(sigsq); epsilon = 0.03; L = 10; nIteration = 40; 12 × 12 parallel chains.

Codebook parameters: flipOrNot = false; rotateShiftLimit = 1; allResolution = [0.8, 1, 1.2]; #EM iterations = 12; numCluster = 10; maxNumClusterMember = 50; locationPerturbationFraction = 0.4.
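The HMC parameters above (step size epsilon = 0.03 and L = 10 leapfrog steps) plug into the standard leapfrog update. A generic single-step sketch follows, with the target log-density supplied by the caller; the sparse FRAME potential itself is not reproduced here, so the standard-normal target in the comments is only an assumption for illustration:

```python
import numpy as np

def hmc_step(x, log_p_grad, log_p, epsilon=0.03, L=10, rng=None):
    """One generic HMC update: L leapfrog steps of size epsilon,
    followed by a Metropolis accept/reject on the Hamiltonian."""
    rng = rng or np.random.default_rng()
    p = rng.normal(size=x.shape)                  # resample momentum
    x_new, p_new = x.copy(), p.copy()
    p_new += 0.5 * epsilon * log_p_grad(x_new)    # initial half momentum step
    for _ in range(L):
        x_new += epsilon * p_new                  # full position step
        p_new += epsilon * log_p_grad(x_new)      # full momentum step
    p_new -= 0.5 * epsilon * log_p_grad(x_new)    # trim back to a half step
    # Accept with probability min(1, exp(H_old - H_new)).
    h_old = -log_p(x) + 0.5 * (p ** 2).sum()
    h_new = -log_p(x_new) + 0.5 * (p_new ** 2).sum()
    if np.log(rng.uniform()) < h_old - h_new:
        return x_new
    return x
```

Running many such chains in parallel (here, a 12 × 12 batch) amortizes the gradient computation across samples.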
Table 2: Classification accuracies (%, mean ± 95% confidence interval over five repetitions).

| Datasets | SIFT+SVM | Our method |
|---|---|---|
| Caltech-Watch | 90.1 ± 1.0 | 89.1 ± 1.6 |
| Caltech-Sunflower | 76.0 ± 2.5 | 89.6 ± 3.7 |
| Caltech-Laptop | 73.5 ± 5.3 | 89.8 ± 2.7 |
| Caltech-Chair | 62.5 ± 5.0 | 82.9 ± 4.7 |
| Caltech-Piano | 84.5 ± 4.2 | 93.8 ± 2.6 |
| Caltech-Lamp | 61.5 ± 4.5 | 86.6 ± 4.3 |
| Caltech-Ketch | 82.2 ± 0.8 | 83.3 ± 6.5 |
| Caltech-Dragonfly | 66.0 ± 4.0 | 89.9 ± 5.7 |
| Caltech-Motorbike | 93.9 ± 1.2 | 92.2 ± 2.9 |
| Caltech-Umbrella | 73.4 ± 4.4 | 90.0 ± 0.7 |
| Caltech-Guitar | 70.0 ± 2.4 | 77.3 ± 6.3 |
| Caltech-Cellphone | 68.7 ± 5.1 | 95.7 ± 1.8 |
| Caltech-Schooner | 64.3 ± 2.2 | 87.7 ± 2.8 |
| Caltech-Face | 91.8 ± 2.3 | 94.4 ± 2.3 |
| Caltech-Ibis | 67.8 ± 6.0 | 85.3 ± 2.7 |
| Caltech-Starfish | 73.1 ± 6.7 | 90.0 ± 2.3 |
| ETHZ-Bottle | 68.6 ± 3.2 | 77.5 ± 5.6 |
| ETHZ-Cup | 66.0 ± 3.3 | 62.5 ± 3.0 |
| ETHZ-Swans | 64.2 ± 1.5 | 74.2 ± 7.5 |
| ETHZ-Giraffes | 61.5 ± 6.4 | 73.3 ± 4.8 |
| ETHZ-Apple | 55.0 ± 1.8 | 65.8 ± 6.1 |
| Graz02-Person | 70.4 ± 1.2 | 68.2 ± 3.8 |
| Graz02-Car | 64.0 ± 6.7 | 59.6 ± 5.5 |
| Graz02-Bike | 68.5 ± 2.8 | 71.3 ± 5.1 |
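The reported intervals can be reproduced from the five per-run accuracies. One common convention (assumed here, since the exact formula is not stated in the text) is the Student-t interval with n − 1 degrees of freedom; the five accuracies below are illustrative placeholders, not values from the experiments:

```python
import numpy as np

def mean_ci95(accuracies):
    """Mean and 95% confidence half-width over a small number of
    repeated runs, via the Student-t interval (n must be 2..5 here)."""
    a = np.asarray(accuracies, dtype=float)
    n = a.size
    # Two-sided t critical values (0.975 quantile) for df = n - 1.
    t975 = {2: 12.706, 3: 4.303, 4: 3.182, 5: 2.776}[n]
    half = t975 * a.std(ddof=1) / np.sqrt(n)   # sample std, not population
    return a.mean(), half

# Hypothetical example: five repeated accuracies for one category.
m, hw = mean_ci95([0.90, 0.92, 0.88, 0.91, 0.89])
```

With only five repetitions, the t critical value (2.776) is noticeably larger than the normal-approximation 1.96, which widens the reported intervals accordingly.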
[1] Y. Hong, Z. Si, W. Hu, S.-C. Zhu, and Y. N. Wu. Unsupervised learning of compositional sparse code for natural image representation. Quarterly of Applied Mathematics, 72, 373-406, 2013.
[2] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. CVPR Workshop, 2004.
[3] V. Ferrari, F. Jurie, and C. Schmid. From images to shape models for object detection. IJCV, 87, 284-303, 2010.
[4] M. Marszalek and C. Schmid. Accurate object localization with shape masks. CVPR, 2007.
[5] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871-1874, 2008.
[6] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60, 91-110, 2004.