Benjamin (Zhenyu) Yao

Center for Image and Vision Science
Department of Statistics
University of California, Los Angeles

Office: 8145 Math Sciences Building, UCLA

I am currently a third-year Ph.D. student in the Center for Image and Vision Science at the Department of Statistics, UCLA. My academic advisor is Prof. Song-Chun Zhu. My research interests include human action detection and recognition, abnormal event detection, and object tracking. Before joining UCLA, I worked on the LHI image dataset project and a video surveillance project at the Lotus Hill Research Institute from 2006 to 2007.

I received my B.S. degree in EE from the University of Science and Technology of China (USTC), Hefei, China, and my M.S. degree in EE from the Institute of Electronics, Chinese Academy of Sciences, Beijing, China.

[ Projects | Publications | Teaching | Courses | Links | CV ]


Animated Templates with an HMM

[project page]

Learning Animated Basis Model for Action Recognition

We present a deformable action template model that is learnable from cluttered real-world videos with weak supervision. In our generative model, an action template is a sequence of image templates, each of which consists of a set of shape and motion primitives (Gabor bases and optical-flow patches) at selected orientations and locations. These primitives are allowed to slightly perturb their locations and orientations to account for spatial deformations. We use a semi-supervised learning procedure to learn from weakly labeled real-world videos... [project page]

Learning Scene Contextual Model for Anomaly Detection

We present an algorithm to learn a contextual model involving multiple objects in far-field surveillance scenes for tracking and anomaly detection. The algorithm pursues the most informative relationships between object trajectories following a minimax entropy principle. We demonstrate the results by synthesizing entirely new cartoon sequences that reproduce the observed statistics of the training videos ... [project page]

Learning Compositional Models for Object Categories

We present a method for learning a compositional model in a minimax entropy framework for modeling object categories with large intra-class variance. The model combines the flexibility of a stochastic context-free grammar (SCFG), which accounts for variation in object structure, with the neighborhood constraints of an MRF, which enforce spatial context. We learn the model through a generalized minimax entropy framework that accounts for the dynamic structure of the hierarchical model by pursuing relations according to their frequency of occurrence. The learned model can generalize from a small set of training samples (n<100) to generate a combinatorially large number of novel instances using stochastic sampling ... [project page]

[Projects before 2008]

Tracking and Recognition in a Surveillance System

I was involved in the development of a surveillance system at the Lotus Hill Research Institute. We developed an algorithm to track objects in outdoor surveillance scenes and classify them into four types: pedestrian (red), car (green), bicycle (yellow) and other (dark). The tracking algorithm is based on GMM foreground segmentation and/or head-shoulder detection. The recognition algorithm is based on linear SVM classifiers using HOG features.

LHI Large-scale Groundtruth Image Database

We developed a large-scale, general-purpose image database with human-annotated ground truth. We proposed an annotation framework to group visual knowledge at three levels: scene level (global geometric description), object level (segmentation, sketch representation, hierarchical decomposition), and low-mid level (2.1D layered representation, object boundary attributes, curve completion, etc.). Much of this data has not appeared in previous databases. In addition, we use an And-Or Graph representation to organize visual elements and facilitate top-down labeling ... [project page]


  • Benjamin Yao, Xiong Yang, Liang Lin, Mun Wai Lee and Song-Chun Zhu, I2T: Image Parsing to Text Description, Proceedings of the IEEE (invited for the special issue on Internet Vision) [pdf|project page].
  • Benjamin Yao and Song-Chun Zhu, Learning Deformable Action Templates from Cluttered Videos, IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan, 2009 [paper|poster|project page].
  • Jake Porway, Benjamin Yao and Song-Chun Zhu, Learning compositional models for object categories from small sample sets, Book Chapter in Sven Dickinson et al. (eds.), Object Categorization: Computer and Human Vision Perspectives, Cambridge University Press, 2009 [pdf|project page].
  • Benjamin Yao, Liang Wang and Song-Chun Zhu, Learning a Scene Contextual Model for Tracking and Abnormality Detection, Proc. 3rd Int'l Workshop on Semantic Learning and Applications in Multimedia, Anchorage, Alaska, June 2008 [pdf].
  • Benjamin Yao, Xiong Yang and Song-Chun Zhu, Introduction to a large scale general purpose groundtruth dataset: methodology, annotation tool, and benchmarks, EMMCVPR, Springer LNCS 4679, Ezhou, China, Aug 2007 [pdf|project page].


Stat110 Applied statistics (Instructor Nicolas Christou), Sec. 1A
Time & location: Tuesday 2:00-2:50 p.m., MS 5127
Office hours: Monday 4:00-6:00 p.m.

Fall'09 Stat100A Introduction to Probability (Instructor Nicolas Christou), Sec. 3A
Time & location: Thursday 12:00-12:50 p.m., Franz Hall 2258A
Office hours: Thursday 3:00-5:00 p.m.