VGG

This repo is an attempt to implement the paper

Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman
ICLR 2015 (oral)

in TensorFlow. The initial data.py, utils.py, and logs.py are taken from AlexNet.

Dataset

Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575, 2014. paper | bibtex

Dataset info:

  • Link: ILSVRC2010
  • Training size: 1261406 images
  • Validation size: 50000 images
  • Test size: 150000 images
  • Dataset size: 124 GB

To save time:

I got one corrupted image (n02487347_1956.JPEG). The error read: cannot identify image file '/path/to/image/n02487347_1956.JPEG'. This happened when I read the image using PIL. Before using this code, please make sure you can open n02487347_1956.JPEG with PIL (a quick check is sketched below). If not, delete the image; you won't lose anything by deleting 1 image out of over a million.

So I trained on 1261405 images using an 8 GB GPU.
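
A minimal sketch of such a check, assuming all training JPEGs sit in one directory (the script name, directory argument, and helper function are illustrative and not part of this repo):

  # check_images.py -- illustrative sanity check; not part of the original repo.
  # Walks a directory of ILSVRC JPEGs and reports any file PIL cannot open.
  import os
  import sys
  from PIL import Image

  def find_corrupted(image_dir):
      bad = []
      for name in sorted(os.listdir(image_dir)):
          if not name.endswith('.JPEG'):
              continue
          path = os.path.join(image_dir, name)
          try:
              with Image.open(path) as img:
                  img.verify()  # lightweight integrity check; does not decode the full image
          except Exception as err:
              print('Cannot open %s (%s)' % (path, err))
              bad.append(path)
      return bad

  if __name__ == '__main__':
      find_corrupted(sys.argv[1])  # e.g. python check_images.py <path-to-training-data>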

How to Run

  • To train: python model.py <path-to-training-data> --train true --test false
  • To test: python model.py <path-to-training-data> --train false --test true
  • screenlog-train.0: The log file produced by running python model.py <path-to-training-data> --train true inside a screen session
  • Model and logs: Google Drive

Preprocessing

The following preprocessing steps are performed (a rough sketch of the pipeline follows the list):

  1. Rescaling: Isotropically rescale the image so that its smaller side equals a scale S drawn randomly from [256, 512]. Isotropically means the aspect ratio (width-to-height) of the original image is preserved in the rescaled image.
  2. Cropping: Randomly crop a (224, 224) patch from the rescaled image.
  3. Augmentation: Augment the data in two ways:
    1. Horizontally flip the image with 50% probability.
    2. Add the AlexNet-style PCA color shift to the processed image (color jittering).
  4. Subtract mean: Finally, subtract the mean activity from the processed image.
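
A rough sketch of steps 1, 2, 3.1, and 4 using PIL and NumPy (the function names, the bilinear resampling choice, and the per-channel mean values are assumptions, not this repo's actual data.py; the PCA color shift of step 3.2 is sketched after the note below):

  # Illustrative preprocessing sketch; not the repo's data.py.
  import random
  import numpy as np
  from PIL import Image

  def isotropic_rescale(img, smin=256, smax=512):
      # Resize so the smaller side equals a random S in [smin, smax], preserving aspect ratio.
      s = random.randint(smin, smax)
      w, h = img.size
      if w < h:
          new_w, new_h = s, int(round(h * s / float(w)))
      else:
          new_w, new_h = int(round(w * s / float(h))), s
      return img.resize((new_w, new_h), Image.BILINEAR)

  def random_crop(img, size=224):
      # Take a random size x size crop from the rescaled image.
      w, h = img.size
      left = random.randint(0, w - size)
      top = random.randint(0, h - size)
      return img.crop((left, top, left + size, top + size))

  def random_hflip(img, p=0.5):
      # Horizontally flip the image with probability p.
      return img.transpose(Image.FLIP_LEFT_RIGHT) if random.random() < p else img

  def subtract_mean(img, mean_rgb=(123.68, 116.779, 103.939)):
      # Subtract a per-channel mean; these are the commonly used ImageNet RGB means,
      # the repo may compute or hardcode different values.
      return np.asarray(img, dtype=np.float32) - np.asarray(mean_rgb, dtype=np.float32)

  # Usage (illustrative):
  # img = Image.open(path).convert('RGB')
  # x = subtract_mean(random_hflip(random_crop(isotropic_rescale(img))))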

Note: Calculating the eigenvalues and eigenvectors over the whole ImageNet dataset would require a significant amount of RAM, so the values are taken from Stack Overflow and hardcoded when applying the PCA color shift.
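
For reference, a hedged sketch of the AlexNet-style PCA color shift; the eigenvalues and eigenvectors below are the values commonly quoted online for ImageNet RGB pixels (for values scaled to [0, 1]) and may differ from the ones hardcoded in this repo:

  # Illustrative PCA (AlexNet-style) color augmentation; not the repo's exact code.
  import numpy as np

  # Commonly quoted ImageNet RGB eigenvalues/eigenvectors (assumed; the repo's
  # hardcoded Stack Overflow values may differ). They correspond to pixel values
  # normalized to [0, 1]; scale the shift by 255 for images in [0, 255].
  EIGVALS = np.array([0.2175, 0.0188, 0.0045], dtype=np.float32)
  EIGVECS = np.array([[-0.5675,  0.7192,  0.4009],
                      [-0.5808, -0.0045, -0.8140],
                      [-0.5836, -0.6948,  0.4203]], dtype=np.float32)

  def pca_color_shift(img, std=0.1):
      # Add sum_i alpha_i * lambda_i * p_i to every pixel, with alpha_i ~ N(0, std),
      # as described in the AlexNet paper. img is an HxWx3 float array.
      alpha = np.random.normal(0.0, std, size=3).astype(np.float32)
      shift = EIGVECS.dot(alpha * EIGVALS)  # a single RGB offset for the whole image
      return img + shift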

TensorFlow-Generated Graphs

top1 accuracy:

pictures/top1.png

top5 accuracy:

pictures/top5.png

loss:

pictures/loss.png

Accuracies

  • Top1 accuracy: 67.1013%
  • Top5 accuracy: 85.1460%