Indoor functional objects exhibit large view and appearance variations, thus are difficult to be recognized by the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations:
i) The functionality is the most essential property to define an indoor object, e.g. "a chair to sit on";
ii) The geometry (3D shape) of an object is designed to serve its function.
We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes.
We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes; ii) assign functional labels to these 3D primitive shapes; iii) fill in missing objects/parts according to the functional labels; and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate the proposed approach not only significantly widens the scope of indoor scene parsing algorithm from the segmentation and the 3D recovery to the functional object recognition, but also yields improved overall performance.