TensorFlow implementation of Recurrent Models of Visual Attention (Mnih et al., 2014), with additional research. Code based on https://github.com/zhongwen/RAM.
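The RAM configurations in the tables below are described by their number of glimpses, base patch size, and number of retina scales. As a rough illustration of what "12 x 12, 3 scales" means, here is a minimal sketch of a multi-scale glimpse sensor written against the TF 2 image ops (`tf.image.extract_glimpse`, `tf.image.resize`); the function and argument names are illustrative, not this repository's (which predates TF 2):

```python
import tensorflow as tf

def glimpse_sensor(images, locations, base_size=12, num_scales=3):
    """Crop progressively larger patches centered on each location and
    resize them all to base_size x base_size (a multi-scale retina).

    images:    (batch, H, W, C) float tensor
    locations: (batch, 2) glimpse centers in [-1, 1], (y, x), image-centered
    """
    patches = []
    for s in range(num_scales):
        size = base_size * (2 ** s)  # e.g. 12, 24, 48 for 3 scales
        patch = tf.image.extract_glimpse(images, [size, size], locations)
        patches.append(tf.image.resize(patch, [base_size, base_size]))
    # Stack the scales along the channel axis:
    # (batch, base_size, base_size, C * num_scales)
    return tf.concat(patches, axis=-1)
```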

## Results

### 60 by 60 Translated MNIST

| Model | Error |
| --- | --- |
| FC, 2 layers (64 hidden units each) | 6.78% |
| FC, 2 layers (256 hidden units each) | 2.65% |
| Convolutional, 2 layers | 1.57% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 1.54% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 1.08% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 0.94% |

### 60 by 60 Cluttered Translated MNIST

| Model | Error |
| --- | --- |
| FC, 2 layers (64 hidden units each) | 29.13% |
| FC, 2 layers (256 hidden units each) | 11.36% |
| Convolutional, 2 layers | 8.37% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 5.15% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 3.33% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 2.63% |

### 100 by 100 Cluttered Translated MNIST

| Model | Error |
| --- | --- |
| Convolutional, 2 layers | 16.22% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 14.86% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 8.3% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 5.9% |

### 60 by 60 Cluttered MNIST, 6-Glimpse Examples

The solid square marks the first glimpse, the line traces the path of attention, and the circle marks the last glimpse.
*(Images: five example pairs, with the mean output of the location policy on the left and the sampled output on the right.)*
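A visualization in this style can be reproduced with a few lines of matplotlib. This is a hypothetical sketch, not this repository's plotting code; it assumes `locs` is an `(N, 2)` array of glimpse centers in pixel coordinates:

```python
import matplotlib.pyplot as plt

def plot_attention_path(image, locs, patch=12):
    """Overlay a glimpse path on an image.
    image: (H, W) array; locs: (N, 2) glimpse centers as (y, x) pixel coords.
    """
    fig, ax = plt.subplots()
    ax.imshow(image, cmap='gray')
    ys, xs = locs[:, 0], locs[:, 1]
    ax.plot(xs, ys, color='red')  # line: path of attention
    # Solid square: first glimpse
    ax.add_patch(plt.Rectangle((xs[0] - patch / 2, ys[0] - patch / 2),
                               patch, patch, color='red', alpha=0.5))
    # Circle: last glimpse
    ax.add_patch(plt.Circle((xs[-1], ys[-1]), patch / 2,
                            fill=False, edgecolor='red'))
    plt.show()
```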