This project studies an algorithm for parsing indoor images based on two observations:
i) Functionality is the most essential property defining an indoor object, e.g. "a chair is for sitting on";
ii) The geometry (3D shape) of an object is designed to serve its function.
We formulate the nature of functionality, i.e. object affordance and contextual relations, in a stochastic grammar model that characterizes a joint distribution over the function-geometry-appearance hierarchy.


This project is based on a simple observation: by human design, objects in static scenes should be stable with respect to gravity and mild disturbances.
Given an input 3D point cloud, our work addresses two aspects:
i) Stability reasoning: recovering solid 3D volumetric primitives and pursuing a physically stable interpretation (parse) of the 3D scene;
ii) Safety reasoning: detecting potentially falling objects, i.e. physically unsafe objects in the scene. We first infer hidden and situated "causes" (disturbances) in the scene, and then apply intuitive physical mechanics to predict possible "effects" (falls) as consequences of those causes.

Stochastic Scene Grammar

This project studies a parsing algorithm for scene understanding that covers four tasks: computing the 3D scene layout, detecting 3D objects (e.g. furniture), detecting 2D faces (windows, doors, etc.), and segmenting the background. We use a generative Stochastic Scene Grammar (SSG) to represent the compositional structure of visual entities, from scene categories, 3D foreground/background, and 2D faces down to 1D lines. The grammar includes three types of production rules and two types of contextual relations.

Interactive Image Segmentation

This project studies an interactive image segmentation framework that is both ultra-fast and accurate. Our framework, termed "CO3", consists of three components: a coupled representation, a conditional model, and convex inference.


PhD committee:

Song-Chun Zhu, Yingnian Wu, Demetri Terzopoulos
Hongjing Lu, Keith Holyoak

MIT, CogSci:

Dr. Peter Battaglia, Dr. Tao Gao, Prof. Josh Tenenbaum

U of Tokyo, 3D Vision:

Prof. Bo Zheng, Prof. Katsushi Ikeuchi


Dr. Xiaobai Liu, Dr. Wei Liang, Dr. Ping Wei

PhD students: Yixin Zhu, Joey C. Yu

MS students: Siyuan Qi, Michael C. Wang, Nawin Waree

UG students: Steven Holzen, Sam Freitas

Professional Activities

Workshop Chair / Organizer:

CVPR Workshop on Vision Meets Cognition (FPIC2014)
CVPR Workshop on Vision Meets Cognition (FPIC2015)
CVPR Workshop on Language and Vision (2015)
CogSci Workshop on Physical and Social Scene Understanding (2015)

Program Committee / Reviewer:

International Journal of Computer Vision (IJCV); Transactions on Image Processing (TIP);
Pattern Recognition Letters (PRL); Neural Computing and Applications (NCA)

Conferences: CVPR, ICCV, ECCV, ICRA, ICME, CogSci, EAPCogSci, 3DV.