1 University of California, Los Angeles (UCLA), USA
2 Hikvision Research Institute, Santa Clara, USA
Simple cells in the primary visual cortex (V1) can be approximated by Gabor filters, and adjacent simple cells tend to have a quadrature phase relationship. This paper entertains the hypothesis that a key purpose of such simple cells is to perceive local motions, i.e., displacements of pixels, caused by the relative motion between the agent and the surrounding environment. Specifically, we propose a representational model that couples vector representations of local image contents with matrix representations of local pixel displacements. When the image changes from one time frame to the next due to pixel displacements, the vector at each pixel is rotated by a matrix that represents the displacement of that pixel. We show that by learning from pairs of images that are deformed versions of each other, we can learn both the vector and matrix representations. The units in the learned vector representations reproduce properties of V1 simple cells. The learned model enables perceptual inference of local motions.
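The core coupling described above — a content vector at each pixel that is rotated by a displacement-dependent matrix, with motion inferred by finding the best-matching rotation — can be illustrated with a minimal sketch. This is not the paper's model: the 2-D planar rotation, the grid search, and all names here are simplifying assumptions for exposition (the paper learns higher-dimensional vectors and the matrices themselves from data).

```python
import numpy as np

# Illustrative sketch only: a 2-D rotation stands in for the learned
# matrix M(d) that represents a local pixel displacement d.
def rotation(theta):
    """2x2 rotation matrix representing a (hypothetical) displacement."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
v_t = rng.standard_normal(2)
v_t /= np.linalg.norm(v_t)        # unit content vector at one pixel

true_theta = 0.7                   # stands in for the true displacement
v_next = rotation(true_theta) @ v_t  # next-frame vector: rotated by motion

# Perceptual inference of the motion: choose the candidate displacement
# whose rotation best explains the observed change, by maximizing the
# inner product between the predicted and observed vectors.
candidates = np.linspace(-np.pi, np.pi, 629)
scores = [v_next @ (rotation(th) @ v_t) for th in candidates]
est_theta = candidates[int(np.argmax(scores))]
```

Because the vectors are unit-length, the score equals cos(θ − θ_true), so the grid search recovers the displacement up to the grid spacing; the learned model performs the analogous inference with matrices learned from deformed image pairs.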
The paper can be downloaded here.
The TensorFlow code is coming soon!
If you wish to use our code or results, please cite the following paper:
| Methods | FlowNetC | FlowNetS | FlowNetSD | FlowNetCS | FlowNet2 | Our model (no mixing) | Our model (local mixing) |