Modeling Concurrent Actions

Ping Wei, Nanning Zheng, Yibiao Zhao, and Song-Chun Zhu


Action recognition has often been posed as a classification problem, which assumes that a video sequence carries only one action class label and that different actions are independent. However, a single human body can perform multiple actions concurrently, and different actions interact with each other. This paper proposes a concurrent action detection model in which action detection is formulated as a structural prediction problem.


In this project, an interval in a video sequence can be described by multiple action labels. A detected action interval is determined both by a unary local detector and by its relations with other actions. We use wavelet features to represent the action sequence, and design a composite temporal logic descriptor to describe the relations between actions. The model parameters are trained by structural SVM learning. Given a long video sequence, a sequential decision window search algorithm is designed to detect the actions. Experiments on our newly collected concurrent action dataset demonstrate the strength of our method.
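To make the idea of temporal relations between action intervals concrete, the sketch below classifies the pairwise relation between two labeled intervals using Allen-style relation names. The relation set and function names here are illustrative assumptions for exposition, not the paper's exact composite temporal logic descriptor.

```python
# Illustrative sketch (not the paper's exact descriptor): classify the
# temporal relation between two action intervals a = (s1, e1), b = (s2, e2),
# then collect the relations of every ordered pair of intervals.

def temporal_relation(a, b):
    """Return an Allen-style relation label between intervals a and b."""
    s1, e1 = a
    s2, e2 = b
    if e1 < s2:                     # a ends before b starts
        return "before"
    if e2 < s1:                     # a starts after b ends
        return "after"
    if s1 == s2 and e1 == e2:       # identical extent
        return "equal"
    if s2 <= s1 and e1 <= e2:       # a lies inside b
        return "during"
    if s1 <= s2 and e2 <= e1:       # b lies inside a
        return "contains"
    return "overlap"                # partial overlap

def relation_descriptor(intervals):
    """Relation label for every ordered pair of action intervals."""
    return {
        (i, j): temporal_relation(intervals[i], intervals[j])
        for i in range(len(intervals))
        for j in range(len(intervals))
        if i != j
    }
```

In a structural prediction setting, such pairwise relation labels would feed into the pairwise terms of the model's score, alongside the unary local detector responses; the structural SVM then learns how much each relation type should reward or penalize a joint labeling.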



The Concurrent Action Dataset includes 12 action classes and 61 long video sequences. Each sequence contains multiple concurrent actions, and all sequences were manually labeled. The dataset can be downloaded from here.