
I am a postdoctoral research associate at MIT in Josh Tenenbaum's Computational Cognitive Science (cocosci) group in the Brain and Cognitive Sciences department.

I got my PhD from UCLA in Song-Chun Zhu's Vision, Cognition, Learning and Autonomous (VCLA) lab in the Department of Statistics.

I work on cognitive robots for understanding 3D physical scenes and collaborating with humans. My research interests span computer vision, cognitive science, AI, robotics, machine learning, and statistical inference.


77 Massachusetts Ave.
MIT Building 46-4053
Cambridge, MA 02139

ybz at mit.edu


1. I am co-organizing two interdisciplinary workshops:
-- SIGGRAPH Asia 2016 Virtual Reality meets Physical Reality: Modeling and Simulating Virtual Humans and Environments, December 5, 2016, Macao.
-- CogSci 2016 Physical and Social Scene Understanding, July 22, Philadelphia, PA.

2. Chris Baker and I presented our work on "Intent inference for human-robot coordination" at the Toyota-CSAIL Joint Research Center, May 4, 2016.

3. Our paper “Inferring Forces and Learning Human Utilities From Videos” was accepted as an Oral presentation at CVPR 2016. [ Demo ] [ Paper ] [ code ]

4. I gave an invited talk at the MIT CSAIL Computer Vision Group, April 19, 2016.

5. Our paper “What is Where: Inferring Containment Relations from Videos” was accepted by IJCAI 2016.

6. I gave an invited talk at the MIT Media Lab Personal Robots Group, and met the brothers and sisters of JIBO, March 22, 2016.

7. I gave an invited talk at Princeton Vision and Robotics Group, February 5, 2015.

8. I successfully defended my Ph.D. thesis on September 9, 2016. Woo-hoo!
-- A Quest for Visual Commonsense: Scene Understanding by Functional and Physical Reasoning

9. I co-organized three interdisciplinary workshops:
-- CVPR 2015 Workshop on Vision Meets Cognition: Functionality, Physics, Intentionality and Causality, June 11, Boston MA.
-- CVPR 2015 Workshop on Language and Vision, June 11, Boston MA.
-- CogSci 2015 Workshop on Physical and Social Scene Understanding, July 22, Pasadena CA.

10. Our paper "Understanding Tool Use: a Task-oriented Vision Problem" was accepted to appear at CVPR 2015. [ Demo ] [ Paper ] [ Abstract ]

11. I gave an oral presentation of our paper "Evaluating Human Cognition of Containing Relations with Physical Simulation" at CogSci 2015. [ Paper ]

12. I was a teaching assistant for "Introduction to Statistical Models and Data Mining" (Stat101C), Spring 2015.

13. Our paper "Scene Understanding by Reasoning Stability and Safety" was accepted by IJCV.

14. I was honored to receive the Outstanding Reviewer Award of ECCV 2014.

15. Our work was featured on the UCLA Statistics department home page.

16. I was a teaching assistant for "Monte Carlo Methods for Optimization" (Stat202C), Spring 2014.

17. Two papers were accepted by CVPR 2014, Columbus, Ohio. Watch the demo of my work with my collaborator Xiaobai Liu if you are interested.

18. The first CVPR Workshop on Vision Meets Cognition: Functionality, Physics, Intentionality and Causality was held on June 23, 2014, in Columbus, Ohio.

Current Projects


This project studies an algorithm to parse indoor images based on two observations:
i) Functionality is the most essential property defining an indoor object, e.g., "a chair to sit on";
ii) The geometry (3D shape) of an object is designed to serve its function.
We formulate the nature of functionality, i.e., object affordances and contextual relations, into a stochastic grammar model, which characterizes a joint distribution over the function-geometry-appearance hierarchy.
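Schematically, such a hierarchy suggests a factorization in which function generates geometry, and geometry generates appearance (the notation below is an illustrative sketch, not the paper's exact formulation):

```latex
% F = function, G = geometry (3D shape), A = appearance
% Illustrative factorization over the function-geometry-appearance hierarchy:
p(F, G, A) = p(F)\, p(G \mid F)\, p(A \mid G)
```

Under this sketch, parsing an image amounts to inverting the generative chain: inferring the most probable function and geometry given the observed appearance.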


This project is based on a simple observation:
-- by human design, objects in static scenes should be stable with respect to gravity and mild disturbances.
Given the input 3D point cloud, our work consists of two aspects:
i) Stability reasoning: recovering solid 3D volumetric primitives and pursuing a physically stable interpretation (parse) of the 3D scene;
ii) Safety reasoning: detecting potential falling objects, i.e., physically unsafe objects in the scene. We first infer hidden and situated "causes" (disturbances) of the scene, and then introduce intuitive physical mechanics to predict possible "effects" (falls) as consequences of those causes.
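The core intuition behind stability reasoning can be illustrated with a minimal sketch (the function name and single-box setup here are hypothetical simplifications, not the actual model): a rigid object resting on a surface is statically stable when the ground-plane projection of its center of mass lies inside its support polygon.

```python
def is_stable(center_of_mass, support_polygon):
    """Return True if the center of mass projects inside the 2D support polygon.

    center_of_mass: (x, y) ground-plane projection of the object's mass center
    support_polygon: list of (x, y) vertices of the contact region, in order
    """
    x, y = center_of_mass
    inside = False
    n = len(support_polygon)
    for i in range(n):
        x1, y1 = support_polygon[i]
        x2, y2 = support_polygon[(i + 1) % n]
        # Ray-casting point-in-polygon test: count edge crossings to the right
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# A box whose mass center projects over its base is stable;
# one leaning past the edge of its contact region is not.
base = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(is_stable((0.5, 0.5), base))  # True
print(is_stable((1.5, 0.5), base))  # False
```

The full model goes well beyond this geometric test, grouping primitives and reasoning under disturbances, but the stable/unstable distinction bottoms out in checks of this kind.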

Stochastic Scene Grammar

This project studies a parsing algorithm for scene understanding which includes four aspects: computing the 3D scene layout, detecting 3D objects (e.g., furniture), detecting 2D faces (windows, doors, etc.), and segmenting the background. We use a generative Stochastic Scene Grammar (SSG) to represent the compositional structures of visual entities, from scene categories, 3D foreground/background, and 2D faces down to 1D lines. The grammar includes three types of production rules and two types of contextual relations.
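As a toy illustration of the compositional idea (the node names and rules below are invented for illustration; they are not the paper's actual grammar or rule types), a scene node can expand through AND rules, which decompose a node into all of its parts, and OR rules, which choose one alternative:

```python
import random

# Hypothetical toy grammar: AND rules decompose a node into all children;
# OR rules pick one alternative (here uniformly at random).
AND_RULES = {
    "scene":  ["layout", "foreground", "background"],
    "layout": ["floor", "walls", "ceiling"],
}
OR_RULES = {
    "foreground": ["furniture", "clutter"],
    "furniture":  ["chair", "table", "sofa"],
}

def expand(symbol, rng=random):
    """Recursively expand a symbol into a parse tree (nested dict);
    symbols with no rules are terminals and are returned as-is."""
    if symbol in AND_RULES:
        return {symbol: [expand(c, rng) for c in AND_RULES[symbol]]}
    if symbol in OR_RULES:
        return {symbol: [expand(rng.choice(OR_RULES[symbol]), rng)]}
    return symbol  # terminal

tree = expand("scene")
```

Sampling from such a grammar yields candidate parse trees; the actual SSG scores parses against image evidence and contextual relations rather than sampling uniformly.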