-
Notifications
You must be signed in to change notification settings - Fork 0
Image reader
Image reader can be used to read images and apply some image-specific transforms like random cropping, scaling, horizontal flipping and others. It takes as an input a simple text file where each line contains a tab-separated mapping between image file (e.g. JPEG, PNG etc) and 0-based label. For example:
/home/user/data/train/n01440764/n01440764_10026.JPEG<tab>0
/home/user/data/train/n01440764/n01440764_10027.JPEG<tab>0
As some image datasets can be pretty large (e.g. millions of images), it is sometimes beneficial to put these images into a container. Image reader currently supports one container type: ZIP files. In this case it is recommended to use STORE compression mode as it provides fastest access to the contents of the .zip file, also, images usually do not require compression anyway. For example (note the @ sign at the end of .zip file name):
/home/user/data/train.zip@/n01440764/n01440764_10026.JPEG<tab>0
/home/user/data/train.zip@/n01440764/n01440764_10027.JPEG<tab>0
/home/user/data/train_01.zip@/n01534433/n01534433_7285.JPEG<tab>0
/home/user/data/train/n01534433/n01534433_7285.JPEG<tab>0
Note that you can mix-and-match containers and files, however, it is recommended to limit the number of containers for the better performance i.e. put all images into a single (or few) .zip files.
Please also note that in Windows environment Unicode characters in file names are not supported (all characters should be standard ASCII). This limitation comes from OpenCV library that Image reader relies on.
The following example demonstrates usage of ImageReader
:
reader=[
readerType=ImageReader
file=$ConfigDir$/train_map.txt
randomize=Auto
features=[
# Below are the required parameters.
width=224
height=224
channels=3
# Below are the optional parameters.
# Possible values: Center, Random, MultiView10. Default: Center
cropType=Random
# Horizontal random flip, will be enabled by default if cropType=Random
#hflip=0
# Crop scale ratio. Examples: cropRatio=0.9, cropRatio=0.7:0.9. Default: 1.
cropRatio=0.875
# Crop scale ratio jitter type.
# Possible values: None, UniRatio, UniLength, UniArea. Default: UniRatio
jitterType=UniRatio
# Interpolation to use when scaling image to width x height size.
# Possible values: nearest, linear, cubic, lanczos. Default: linear.
interpolations=Linear
# Aspect ratio jitter radius. Default is 0 (disabled).
aspectRatioRadius=0:0.2
# Brightness, contrast and color jittering. Default is 0 (disabled).
# Using 0 in the first epoch so the network can process original images.
brightnessRadius=0:0.2
contrastRadius=0:0.2
saturationRadius=0:0.4
# Intensity jittering: enabled if file is specified and intensityStdDev > 0.
# The file stores 1x3 vector (eigenvalues) and 3x3 matrix (eigenvectors) in OpenCV XML format.
intensityFile="$ConfigDir$/ImageNet1K_intensity.xml"
# StdDev for intensity jittering. Start from the second epoch.
intensityStdDev=0:0.1
# Mean subtraction: enabled if file is specified.
# The file stores mean values for each pixel in OpenCV matrix XML format.
meanFile=$ConfigDir$/ImageNet1K_mean.xml
]
labels=[
labelDim=1000
]
]
There are very few mandatory parameters, like width
, height
and channels
and several optional parameters which configure image transformations.
cropType
specifies which part of the image is cropped and currently can be either Center
or Random
. Random
is usually used during training while Center
- during testing. Random cropping is a popular data augmentation technique used to improve generalization of the DNN. MultiView10
is usually used during testing/evaluation phase and enables 10-view image evaluation: (4(corners) + 1(center)) * 2(original + horizontal flip) = 10.
hflip
parameter specifies whether the image will be randomly flipped in horizontal direction. Horizontal flipping is another popular data augmentation technique and should be used if images exhibit vertical symmetry, for example, like many real-world objects.
cropRatio
specifies the ratio of final image dimension, e.g. width
, to the size of the random crop taken from the image. For example, the ratio 224 / 256 = 0.875 means crop of size 224 will be taken from the image rescaled to 256 (implementation detail: ImageReader
takes the crop and then rescales instead of doing the other way around).
To enable scale jitter (another popular data augmentation technique), use colon-delimited values like cropRatio=0.875:0.466
which means 224 crop will be taken from images randomly scaled to have size in [256, 480] range.
aspectRatioRadius
allows to apply aspect ratio random jitter. For example, a value of 0.2 means aspect ratio of the image can be randomly changed (uniform distribution) to be 80% to 120% of the original.
brightnessRadius
and contrastRadius
allow to apply brightness and contrast jitter. These transforms apply the following formula to every pixel of an image: Yij = alpha * Xij + beta
where alpha
is a contrast adjustment and beta
- brightness. For example, if contrastRadius
is 0.2
then alpha will be randomly chosen (uniform distribution) from 0.8
to 1.2
. brightnessRadius
randomly selects values as a proportion of the image mean, for example, brightnessRadius
equals 0.2
randomly selects values from -0.2 * mean
to 0.2 * mean
.
saturationRadius
allows to apply saturation random jitter. For example, a value of 0.3
means saturation of the image can be changed from 70% to 130% of the original.
intensityFile
and intensityStdDev
allow to apply PCA-based intensity jitter as described in this paper.
intensityFile
is a path to a file which contains eigenvalues and eigenvectors in OpenCV XML format. intensityStdDev
specifies a standard deviation of random coefficients of eigenvalues.
meanFile
parameter is a path to a file in OpenCV XML matrix format which contains per-pixel mean of the whole dataset. For example, for the input layer of size 224x224x3
the file will contain 150528 entries which will be subtracted during the training (or testing) from the input image. The same or similar effect can be achieved by using few other techniques, for example, Mean
node or even just subtracting some fixed value, like 128, from the input.
Getting Started
Additional Documentation
How to use CNTK
Using CNTK Models in Your Code
- Overview
- Nuget Package for Evaluation
- C++ Evaluation Interface
- C# Evaluation Interface
- Evaluating Hidden Layers
- C# Image Transforms for Evaluation
- C# Multi-model Evaluation
- Evaluate in Azure
Advanced topics
Licenses
Source Code & Development