# Learning V1 cells with vector representations of local contents and matrix representations of local motions

Ruiqi Gao ^{1},
Jianwen Xie ^{2},
Song-Chun Zhu ^{1},
and Ying Nian Wu ^{1}

^{1} University of California, Los Angeles (UCLA), USA

^{2} Hikvision Research Institute, Santa Clara, USA

## Abstract

Simple cells in the primary visual cortex (V1) can be approximated by Gabor filters, and adjacent simple cells tend to have a quadrature phase relationship. This paper entertains the hypothesis that a key purpose of such simple cells is to perceive local motions, i.e., displacements of pixels, caused by the relative motions between the agent and the surrounding environment. Specifically, we propose a representational model that couples the vector representations of local image contents with the matrix representations of local pixel displacements. When the image changes from one time frame to the next due to pixel displacements, the vector at each pixel is rotated by a matrix that represents the displacement of this pixel. We show that by learning from pairs of images that are deformed versions of each other, we can learn both the vector and matrix representations. The units in the learned vector representations reproduce properties of V1 simple cells, and the learned model enables perceptual inference of local motions.
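The coupling described above can be illustrated with a minimal NumPy sketch. This is not the paper's learned model; it only shows the algebraic idea under illustrative assumptions: the content vector at a pixel is partitioned into 2D sub-vectors, and a scalar displacement `dx` acts on each sub-vector as a rotation whose angle is a frequency `f` times `dx`, so that `v(t+1) = M(dx) v(t)`. The function names and the frequency parameterization are hypothetical.

```python
import numpy as np

def rotation_matrix(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def displace(v, dx, freqs):
    """Rotate each 2D sub-vector of v by freqs[k] * dx.

    Illustrative stand-in for v(t+1) = M(dx) v(t): M(dx) is
    block-diagonal, with one 2x2 rotation block per sub-vector.
    """
    v_next = np.empty_like(v)
    for k, f in enumerate(freqs):
        block = rotation_matrix(f * dx)
        v_next[2 * k:2 * k + 2] = block @ v[2 * k:2 * k + 2]
    return v_next

# Toy usage: a 6-dim content vector (three 2D sub-vectors), displacement 0.5.
rng = np.random.default_rng(0)
freqs = np.array([1.0, 2.0, 3.0])
v = rng.standard_normal(6)
v_shifted = displace(v, 0.5, freqs)

# Because M(dx) is a rotation, the norm of each sub-vector is preserved.
assert np.allclose(np.linalg.norm(v), np.linalg.norm(v_shifted))
```

One consequence of this parameterization is that composing two displacements corresponds to multiplying the rotation matrices, i.e., `M(dx1 + dx2) = M(dx2) M(dx1)`, which is what allows multi-step animation by repeatedly applying the matrix.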

## Paper

The paper can be downloaded here.

## Code

The TensorFlow code is coming soon!

If you wish to use our code or results, please cite the following paper:

**Learning V1 cells with vector representations of local contents and matrix representations of local motions**

```bibtex
@article{gao2019learning,
  title={Learning V1 cells with vector representations of local contents and matrix representations of local motions},
  author={Gao, Ruiqi and Xie, Jianwen and Zhu, Song-Chun and Wu, Ying Nian},
  journal={arXiv preprint arXiv:1902.03871},
  year={2019}
}
```

## Background

## Representational Model

## Experiments

- **Exp 1**: Learned units
- **Exp 2**: Inference of displacement field
- **Exp 3**: Unsupervised learning
- **Exp 4**: Multi-step frame animation
- **Exp 5**: Frame interpolation

### Experiment 1: Learned units

### Experiment 2: Inference of displacement field

### Experiment 3: Unsupervised learning

**Table 1.** Average distance between inferred and ground truth displacements

| Method | FlowNetC | FlowNetS | FlowNetSD | FlowNetCS | FlowNet2 | Our model (no-mixing) | Our model (local mixing) |
|---|---|---|---|---|---|---|---|
| Inference error | 1.324 | 1.316 | 0.799 | 0.713 | 0.686 | 0.884 | **0.444** |

### Experiment 4: Multi-step frame animation