Modeling 4D Human-Object Interactions
Overview and Demo
We present a 4D human-object interaction (4DHOI) model that jointly solves three vision tasks: i) event segmentation of a video sequence, ii) event recognition and parsing, and iii) contextual object localization.
The 4DHOI model represents the geometric, temporal, and semantic relations in daily events involving human-object interactions. In 3D space, the interactions between human poses and contextual objects are modeled by semantic co-occurrence and geometric compatibility. Along the time axis, the interactions are represented as a sequence of atomic event transitions. The 4DHOI model is a hierarchical spatial-temporal graph, whose structure and parameters are learned using an ordered expectation maximization algorithm. Inference is performed by a dynamic programming beam search algorithm that simultaneously carries out event segmentation, recognition, and object localization.
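To illustrate the flavor of the inference step, here is a minimal, hypothetical sketch of a beam search that jointly segments and labels a frame sequence. The per-frame label scores are illustrative stand-ins for the model's atomic-event likelihoods; the function name and data layout are our own assumptions, not the paper's implementation.

```python
def beam_search_segment(frame_scores, beam_width=3):
    """Jointly segment and label a frame sequence (illustrative sketch).

    frame_scores: list of dicts mapping event label -> per-frame log score
                  (stand-ins for the model's atomic-event likelihoods).
    Returns the best (total_score, [(label, start, end), ...]) hypothesis.
    """
    # Each beam entry: (total_score, closed_segments, current_label, segment_start)
    beams = [(0.0, [], None, 0)]
    for t, scores in enumerate(frame_scores):
        candidates = []
        for total, segs, label, start in beams:
            for new_label, s in scores.items():
                if new_label == label:
                    # Extend the current segment with this frame.
                    candidates.append((total + s, segs, label, start))
                else:
                    # Close the previous segment and open a new one at frame t.
                    new_segs = segs + ([(label, start, t)] if label else [])
                    candidates.append((total + s, new_segs, new_label, t))
        # Prune to the top-scoring hypotheses (the beam).
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    # Close the final open segment in the best hypothesis.
    total, segs, label, start = beams[0]
    return total, segs + [(label, start, len(frame_scores))]
```

For example, with per-frame scores favoring "pour" for two frames and then "drink" for two frames, the search recovers the two segments with their boundaries. The full model scores each hypothesis with the spatial-temporal graph rather than independent per-frame terms, but the segment-and-label search structure is analogous.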
Ping Wei, Yibiao Zhao, Nanning Zheng, Song-Chun Zhu. Modeling 4D human-object interactions for joint event segmentation, recognition, and object localization. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 39(6): 1165-1179, 2017.
Ping Wei, Yibiao Zhao, Nanning Zheng, Song-Chun Zhu. Modeling 4D human-object interactions for event and object recognition. IEEE International Conference on Computer Vision (ICCV), 2013: 3272-3279. [PDF]
We collected a large-scale multiview RGB-D event dataset containing 8 event categories, 11 object classes, 3,815 video sequences, and 383,036 RGB-D frames captured by three RGB-D cameras. A subset (about 8 GB) of the dataset, covering the same event categories and viewpoints but with fewer sequences per event category, can be downloaded from here.