TensorFlow implementation of Recurrent Models of Visual Attention (Mnih et al., 2014), with additional experiments. Code based on https://github.com/zhongwen/RAM.
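The RAM configurations in the tables below use a retina-like glimpse of 12 x 12 patches taken at 3 scales around each attended location. As a rough illustration only, here is a minimal sketch of such a multi-scale glimpse sensor, assuming TensorFlow 2.x and its built-in `tf.image.extract_glimpse`; the function name `glimpse_sensor` and the normalized `(y, x)` location convention are illustrative assumptions, not necessarily how this repo or zhongwen/RAM implement it.

```python
import tensorflow as tf

def glimpse_sensor(images, locations, patch_size=12, num_scales=3):
    """Extract a retina-like multi-scale glimpse at each location.

    images:    [batch, height, width, channels] float tensor
    locations: [batch, 2] glimpse centres as (y, x) in [-1, 1]
    returns:   [batch, patch_size, patch_size, channels * num_scales]
    """
    patches = []
    for s in range(num_scales):
        # Successively larger crops around the same centre: 12, 24, 48 px.
        size = patch_size * (2 ** s)
        patch = tf.image.extract_glimpse(
            images, size=[size, size], offsets=locations,
            centered=True, normalized=True)
        # Resize the coarser crops back to the base resolution before stacking.
        patch = tf.image.resize(patch, [patch_size, patch_size])
        patches.append(patch)
    return tf.concat(patches, axis=-1)
```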
Model | Error |
---|---|
FC, 2 layers (64 hiddens each) | 6.78% |
FC, 2 layers (256 hiddens each) | 2.65% |
Convolutional, 2 layers | 1.57% |
RAM, 4 glimpses, 12 x 12, 3 scales | 1.54% |
RAM, 6 glimpses, 12 x 12, 3 scales | 1.08% |
RAM, 8 glimpses, 12 x 12, 3 scales | 0.94% |

Model | Error |
---|---|
FC, 2 layers (64 hiddens each) | 29.13% |
FC, 2 layers (256 hiddens each) | 11.36% |
Convolutional, 2 layers | 8.37% |
RAM, 4 glimpses, 12 x 12, 3 scales | 5.15% |
RAM, 6 glimpses, 12 x 12, 3 scales | 3.33% |
RAM, 8 glimpses, 12 x 12, 3 scales | 2.63% |

Model | Error |
---|---|
Convolutional, 2 layers | 16.22% |
RAM, 4 glimpses, 12 x 12, 3 scales | 14.86% |
RAM, 6 glimpses, 12 x 12, 3 scales | 8.3% |
RAM, 8 glimpses, 12 x 12, 3 scales | 5.9% |
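In the RAM model, each next glimpse location is drawn from a Gaussian policy centred on the location network's output, which is presumably what the "Mean output" and "Sampled output" columns below contrast: taking that mean directly versus sampling from the policy. Below is a minimal sketch under assumptions: a fixed standard deviation `loc_std` (hypothetical name, a hyperparameter) and locations clipped to `[-1, 1]`.

```python
import tensorflow as tf

def next_location(loc_mean, loc_std=0.1, sample=True):
    """Pick the next glimpse location from a Gaussian policy.

    loc_mean: [batch, 2] mean produced by the location network, in [-1, 1]
    sample:   True  -> stochastic location ("Sampled output")
              False -> use the mean directly ("Mean output")
    """
    if not sample:
        return loc_mean
    noise = tf.random.normal(tf.shape(loc_mean), stddev=loc_std)
    # Keep sampled locations inside the normalized image coordinates.
    return tf.clip_by_value(loc_mean + noise, -1.0, 1.0)
```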
Mean output | Sampled output |
---|---|