Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

Shuo Wang1,2     Jungseock Joo2     Yizhou Wang1     Song-Chun Zhu2
1Peking University        2University of California, Los Angeles

In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with textual attributes, where the precise location of each attribute is left unknown. Our method has three components. (i) Compositional scene configuration. We learn the spatial layouts of scenes with the Hierarchical Space Tiling (HST) representation, which can generate a very large number of scene configurations through the hierarchical composition of a relatively small number of parts. (ii) Attribute association. The scene attributes contain nouns and adjectives, corresponding to objects and their appearance descriptions respectively. We assign the nouns to the nodes (parts) in HST using non-maximum suppression on their correlations, and then train an appearance model for each noun+adjective attribute pair. (iii) Joint inference and learning. For each image, we compute the most probable parse tree, together with its attributes, as an instantiation of the HST by dynamic programming. We then update the HST and the attribute associations based on the inferred parse trees. For evaluation, we propose a new outdoor scene dataset and evaluate the proposed method by (i) showing the improvement in attribute recognition accuracy and (ii) comparing the average precision of localizing attributes to scene parts.
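The dynamic-programming step in (iii) can be illustrated with a toy version of space tiling. The sketch below is a simplification under stated assumptions, not the authors' implementation: the image is a small grid, each rectangle is either a leaf that takes a single noun label or is cut in two by a horizontal or vertical split, leaf scores are sums of hypothetical per-cell label scores, and a fixed `split_cost` stands in for the learned prior over configurations. The names `parse_tiling`, `leaves`, and `split_cost` are illustrative only.

```python
from functools import lru_cache


def parse_tiling(cell_scores, split_cost=0.5):
    """Best binary space-tiling parse of a grid (toy stand-in for HST inference).

    cell_scores[label][r][c] is a hypothetical score for giving grid cell (r, c)
    the noun `label`. A rectangle is (r0, c0, r1, c1) with exclusive upper bounds.
    Returns (best_score, tree), where tree is ("leaf", rect, label) or
    ("split", axis, first_subtree, second_subtree).
    """
    labels = list(cell_scores)
    rows = len(cell_scores[labels[0]])
    cols = len(cell_scores[labels[0]][0])

    def leaf(r0, c0, r1, c1):
        # Terminal node: the whole rectangle takes its single best-scoring label.
        best_score, best_label = None, None
        for lab in labels:
            s = sum(cell_scores[lab][r][c]
                    for r in range(r0, r1) for c in range(c0, c1))
            if best_score is None or s > best_score:
                best_score, best_label = s, lab
        return best_score, ("leaf", (r0, c0, r1, c1), best_label)

    @lru_cache(maxsize=None)
    def best(r0, c0, r1, c1):
        # Start from the leaf option, then try every split of this rectangle.
        result = leaf(r0, c0, r1, c1)
        for r in range(r0 + 1, r1):  # horizontal cuts: top / bottom
            a, b = best(r0, c0, r, c1), best(r, c0, r1, c1)
            s = a[0] + b[0] - split_cost  # assumed fixed penalty per split
            if s > result[0]:
                result = (s, ("split", "h", a[1], b[1]))
        for c in range(c0 + 1, c1):  # vertical cuts: left / right
            a, b = best(r0, c0, r1, c), best(r0, c, r1, c1)
            s = a[0] + b[0] - split_cost
            if s > result[0]:
                result = (s, ("split", "v", a[1], b[1]))
        return result

    return best(0, 0, rows, cols)


def leaves(tree):
    """Flatten a parse tree into a list of (rectangle, label) pairs."""
    if tree[0] == "leaf":
        return [(tree[1], tree[2])]
    return leaves(tree[2]) + leaves(tree[3])


# Usage: a 2x2 image whose top row looks like sky and bottom row like ground.
scores = {
    "sky":    [[1, 1], [0, 0]],
    "ground": [[0, 0], [1, 1]],
}
score, tree = parse_tiling(scores)
print(score)         # 3.5
print(leaves(tree))  # [((0, 0, 1, 2), 'sky'), ((1, 0, 2, 2), 'ground')]
```

Memoizing on rectangle coordinates means each sub-rectangle is solved once and shared by every parent that contains it, which is how a small set of parts can explain many configurations without enumerating them. Without the split penalty, the finest tiling would never score lower than a coarser one, so some prior of this kind is needed to favor compact parses.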
  • Outdoor Scene Attributes (SceneAtt) [web]: a new dataset created by the authors containing 1226 outdoor scene images and their text descriptions.
  • Pre-processed LabelMe Outdoor dataset (LMO) [download]: we preprocess the LMO [1] dataset by (i) merging synonyms, such as grass and field, plant and tree, and water and rivers; (ii) filling unlabeled regions; and (iii) ignoring tiny regions of small objects that can appear anywhere, such as birds, poles, and street lights. We provide the pre-processed annotations as XML files.
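Preprocessing steps (i) and (iii) amount to relabeling and filtering annotated regions. A minimal sketch, assuming each region is given as a (label, pixel area) pair; the `SYNONYMS` table, the canonical label chosen for each synonym pair, and the `min_frac` area threshold are hypothetical and not taken from the released XML files. Step (ii), filling unlabeled regions, is not shown because the text does not say how it is done.

```python
# Hypothetical synonym table: which member of each pair is canonical is an assumption.
SYNONYMS = {"grass": "field", "plant": "tree", "river": "water", "rivers": "water"}


def preprocess(regions, image_area, min_frac=0.001):
    """Merge synonym labels and drop tiny regions.

    regions: list of (label, area_in_pixels) pairs for one image.
    min_frac: assumed threshold; regions smaller than min_frac * image_area
              are ignored (e.g. birds, poles, street lights).
    Returns a dict mapping canonical label -> total area in pixels.
    """
    merged = {}
    for label, area in regions:
        if area < min_frac * image_area:
            continue  # step (iii): ignore tiny regions
        canon = SYNONYMS.get(label, label)  # step (i): merge synonyms
        merged[canon] = merged.get(canon, 0) + area
    return merged


# Usage on a made-up annotation of a 100,000-pixel image.
regions = [("grass", 29960), ("field", 20000), ("river", 15000),
           ("bird", 40), ("sky", 35000)]
result = preprocess(regions, image_area=100_000)
print(result)  # {'field': 49960, 'water': 15000, 'sky': 35000}
```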
author = {Shuo Wang and Jungseock Joo and Yizhou Wang and Song-Chun Zhu},
title = {Weakly Supervised Learning for Attribute Localization in Outdoor Scenes},
journal = {CVPR},
year = {2013},
[1] C. Liu, J. Yuen, and A. Torralba. Nonparametric scene parsing: Label transfer via dense scene alignment. In CVPR, 2009.