Understanding and Extending Readers
Starting from version 1.5, CNTK is moving away from the monolithic reader design towards a more composable model that allows you to specify and compose input data of different formats.
Before, each and every reader was responsible for different aspects of data reading, including but not limited to:
- Deserialization of the data from external storage into in-memory representation
- Randomization of the whole corpus
- Different transformations of input sequences/samples (e.g. cropping or scaling of images)
- Creation of minibatches for different modes (e.g. frame, sequence, or truncated BPTT) with a layout that can be consumed by the GPU
- Prefetch on the level of minibatches and IO chunks
In version 1.5, the major pieces of the above functionality were factored out and moved to core CNTK to be shared between different readers. This version also introduces two main abstractions that can be extended in order to support new data formats:
- deserializer - responsible for deserializing input from external storage into in-memory sequences
- transform - transforms an input sequence into an output sequence
In the next sections we discuss these abstractions in more detail.
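Conceptually, the two abstractions can be pictured as follows. This is a hypothetical Python sketch, not CNTK's actual (C++) interfaces; the class and method names are purely illustrative:

```python
# Hypothetical sketch of the two reader abstractions; the names below are
# illustrative and do not correspond to CNTK's actual interfaces.

class Deserializer:
    """Turns records in external storage into in-memory sequences."""
    def sequences(self):
        # would yield (logical_key, samples) pairs
        raise NotImplementedError

class Transform:
    """Maps an input sequence to an output sequence."""
    def apply(self, sequence):
        raise NotImplementedError

class DoublingTransform(Transform):
    # Toy transform: doubles every sample value in the sequence.
    def apply(self, sequence):
        return [2 * s for s in sequence]

print(DoublingTransform().apply([1, 2, 3]))  # -> [2, 4, 6]
```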
Let's have a look at the following fragment of configuration for the HTKMLFReader from the end-to-end LSTM/FullUtterance test (full config here):
```
...
# Old reader config. For illustration only.
reader = [
    readerType = "HTKMLFReader"
    readMethod = "blockRandomize"
    nbruttsineachrecurrentiter = 32
    randomize = "auto"
    verbosity = 0
    features = [
        dim = 363
        type = "real"
        scpFile = "$DataDir$/glob_0000.scp"
    ]
    labels = [
        mlfFile = "$DataDir$/glob_0000.mlf"
        labelMappingFile = "$DataDir$/state.list"
        labelDim = 132
        labelType = "category"
    ]
]
```
This fragment of configuration declares a reader that produces two streams of data with the names "features" and "labels". It takes as input two types of files:
- a list of feature files, known in HTK parlance as an scp file ("script" file)
- a label file, known as an mlf file ("master label file")

In the above configuration fragment there are no explicit entities that define how the scp or mlf formats are deserialized; everything is encapsulated in the HTKMLFReader configuration. So if you need to expose yet another input stream of a different data format together with scp and mlf, you would have to change the HTKMLFReader and add support for it there.
To increase composability and reuse, the new configuration for the same input explicitly defines deserializers and the input streams that they produce:
```
reader = [
    verbosity = 0
    randomize = true

    # A list of deserializers the reader uses.
    deserializers = (
        [
            # Type of deserializer, in this case the one that knows
            # how to deserialize HTK feature files.
            type = "HTKFeatureDeserializer"

            # Module (.dll or .so) where this deserializer is implemented
            module = "HTKDeserializers"

            # Description of input streams the deserializer provides,
            # can be one or many, depending on a particular
            # deserializer implementation.
            # For HTKFeatureDeserializer, just one stream can be described.
            input = [
                # Description of the input stream to feed the Input node named "features"
                features = [
                    dim = 363
                    scpFile = "$DataDir$/glob_0000.scp"
                ]
            ]
        ]:
        [
            # Type of deserializer, in this case the one
            # that knows how to deserialize mlf files.
            type = "HTKMLFDeserializer"
            module = "HTKDeserializers"

            # Description of input streams the deserializer provides.
            # For HTKMLFDeserializer, just one stream can be described.
            input = [
                # Description of the input stream to feed the Input node named "labels"
                labels = [
                    dim = 132
                    mlfFile = "$DataDir$/glob_0000.mlf"
                    labelMappingFile = "$DataDir$/state.list"
                ]
            ]
        ]
    )
]
```
The sequences produced by the mlf and htk deserializers are combined based on their logical key (a string that uniquely identifies a speech utterance and is present in both the scp and mlf files). When you need another stream of a different format, you can simply add the corresponding deserializer to the configuration (note that the HTK feature and HTK MLF deserializers currently cannot expose more than one input stream each).
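To illustrate how sequences from two deserializers are bundled by their logical key, here is a toy Python sketch. The utterance ids and values are made up, and CNTK performs this matching internally; this only shows the idea:

```python
# Toy illustration of combining two streams by logical key (utterance id).
# Keys and values below are invented for illustration.
features = {"utt1": [0.1, 0.2], "utt2": [0.3]}  # e.g. from the feature (scp) deserializer
labels   = {"utt1": ["A", "B"], "utt2": ["C"]}  # e.g. from the mlf deserializer

def bundle(features, labels):
    # Only keys present in both streams yield a complete training sequence.
    common = sorted(features.keys() & labels.keys())
    return [(key, features[key], labels[key]) for key in common]

for key, feat, lab in bundle(features, labels):
    print(key, feat, lab)
```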
NOTE:
Currently, both old and new reader configurations are supported.
When the "deserializers" key is used in the reader configuration, the reader type is implicitly set to "CompositeDataReader",
so please make sure that the CompositeDataReader module can be loaded
(on Windows, CompositeDataReader.dll should be located in the same directory as the CNTK executable;
on Linux, CompositeDataReader.so should be in the lib folder that sits side by side with the bin folder containing the CNTK executable).
Currently CNTK supports the following deserializers:

| Deserializer type | Module | Description |
|---|---|---|
| HTKFeatureDeserializer | HTKDeserializers | Deserializer for HTK feature files |
| HTKMLFDeserializer | HTKDeserializers | Deserializer for HTK MLF files |
| ImageDeserializer | ImageReader | Deserializer for images that uses OpenCV |
| CNTKTextFormatDeserializer | CNTKTextFormatReader | Deserializer for CNTK text format files |
Please refer to the tables below for the full description of the configuration parameters.
A transform is a simple abstraction that takes a sequence as input, performs some transformation of the samples in the sequence, and returns the output sequence. Typical examples of transforms are different transformations of images, such as crop, scale, or transpose. Transforms can be configured on a per-input basis.
Let's have a look at how transforms can be applied to an input (the config is taken from the Tests/EndToEndTests/Image/AlexNet test):
```
deserializers = ([
    type = "ImageDeserializer"
    module = "ImageReader"

    # Map file which maps images to labels
    file = "$ConfigDir$/train_map.txt"

    # Description of input streams
    input = [
        # Description of the input stream to feed the Input node named "features"
        features = [
            transforms = (
                [
                    type = "Crop"
                    # Possible values: Center, Random. Default: Center
                    cropType = "Random"
                    # Crop scale ratio.
                    cropRatio = 0.875
                    # Crop scale ratio jitter type
                    jitterType = "uniRatio"
                ]:[
                    type = "Scale"
                    width = 224
                    height = 224
                    channels = 3
                    # Interpolation to use when scaling the image to width x height size.
                    interpolations = "linear"
                ]:[
                    type = "Mean"
                    # Stores mean values for each pixel in OpenCV matrix XML format.
                    meanFile = "$ConfigDir$/ImageNet1K_mean.xml"
                ]:[
                    # Changes the image layout from HWC to CHW
                    type = "Transpose"
                ]
            )
        ]

        # Description of the input stream to feed the Input node named "labels"
        labels = [
            labelDim = 1000
        ]
    ]
])
```
In this configuration four transforms are applied to the input stream "features".
Initially, the image data deserializer produces sequences consisting of a single image in HWC representation.
After that the ordered list of transforms is applied to the image: firstly the Crop transform, followed by Scale and Mean.
The last transformation is Transpose that changes the image layout from HWC to CHW.
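The HWC-to-CHW transposition can be illustrated with a small pure-Python sketch (nested lists stand in for the image tensor; this is not CNTK code):

```python
# Toy HWC -> CHW transpose on a 2x2 image with 3 channels.
# image_hwc[row][col][ch]: height-major, then width, then channel (HWC).
image_hwc = [[[1, 2, 3], [4, 5, 6]],
             [[7, 8, 9], [10, 11, 12]]]

def hwc_to_chw(img):
    h, w, c = len(img), len(img[0]), len(img[0][0])
    # result[ch][row][col] = img[row][col][ch]
    return [[[img[row][col][ch] for col in range(w)] for row in range(h)]
            for ch in range(c)]

image_chw = hwc_to_chw(image_hwc)
print(image_chw[0])  # first channel plane: [[1, 4], [7, 10]]
```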
Currently the following transforms are implemented. For their detailed description please see ImageReader.
| Transform type | Module |
|---|---|
| Crop | ImageReader |
| Scale | ImageReader |
| Color | ImageReader |
| Mean | ImageReader |
| Transpose | ImageReader |
A reader configuration section that composes several data deserializers looks as follows:
```
reader = [
    randomize = true|false
    verbosity = 0|1|2
    ...
    deserializers = (
        [<deserializerConfiguration1>]:
        [<deserializerConfiguration2>]:
        ...
        [<deserializerConfigurationN>]
    )
]
```
Each deserializer configuration is specified as:
```
[
    module = "<readerModuleName>"   # Name of the external module (.dll or .so) where this particular deserializer is implemented
    type = "<deserializerType>"     # The type of the deserializer
    # There could be more deserializer-specific options in this section

    # Data deserializer input - describes the set of streams this deserializer produces.
    # It can be one (as in HTK) or many (as in CNTKTextFormat)
    input = [
        # Replace 'InputNameN' by the name of the corresponding input node in the network.
        InputName1 = [<inputConfiguration>]
        InputName2 = [<inputConfiguration>]
        ...
    ]
]
```
An input configuration contains input-specific options and, optionally, an ordered list of transforms that should be applied to the input:
```
[
    # Per-input data deserializer-specific options

    # Optionally, a pipeline of transforms, to be implemented by the data deserializer's reader module:
    transforms = (
        [<transformationConfiguration1>]:
        [<transformationConfiguration2>]:
        ...
        [<transformationConfigurationN>]
    )
]
```
Transform configuration identifies the transform type and any transform-specific options:
```
[
    type = "<transformName>"
    # Transform-specific options
]
```
| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| verbosity | No | Integer (0, 1, 2) | 0 | Verbosity level, controls the diagnostic output of different components (Randomizer, Deserializer, Bundler, etc.) |
| randomize | No | true, false | true | Specifies whether the input should be randomized. The randomization method is identical to blockRandomize of the HTKMLFReader. |
| randomizationWindow | No | Integer | Size of the dataset | Specifies the randomization range (in number of samples)1 |
| truncationLength | Yes, if truncated is set to true | Positive integer | | Specifies the truncation length in samples for BPTT. Ignored if truncated is false. |
| multiThreadedDeserialization | No | Boolean | false | Specifies whether to use multiple threads when getting the sequences for each minibatch from the deserializers. |
| frameMode | No | Boolean | false | Specifies whether data should be randomized and returned at the frame rather than the sequence level. When set, input sequences are split into frames. If set, truncated needs to be false. |
| truncated | No | Boolean | false | Enables truncated back-propagation through time (BPTT). If set, frameMode needs to be false. |

1 If no randomizationWindow is specified, the randomization range is set equal to the size of the dataset (i.e., the input is randomized across the whole dataset). randomizationWindow is ignored when randomize is set to false.
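The effect of a randomization window can be pictured with a toy sketch in which samples are shuffled only within consecutive chunks of the given size. This is a deliberate simplification, not CNTK's actual block-randomization algorithm:

```python
import random

def windowed_shuffle(samples, window, seed=0):
    # Shuffle each consecutive chunk of `window` samples independently.
    # Simplified illustration: CNTK's real randomizer works on chunks of
    # the input data, but the principle is similar.
    rng = random.Random(seed)
    out = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out

shuffled = windowed_shuffle(list(range(10)), window=5)
# Every sample stays within its original 5-sample window.
assert all(abs(value - index) < 5 for index, value in enumerate(shuffled))
```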
| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| module | Yes | Reader module name | | Specifies the reader module (DLL/SO) implementing the data deserializer |
| type | Yes | Deserializer name | | Specifies a data deserializer exposed by the given reader module |
| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| type | Yes | Transform name | | Specifies a transform exposed by the reader module implementing the data deserializer |
| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| scpFile | Yes | Array of paths | | A list of paths to SCP files to be processed. The files should be HTK-compatible and must be specified in the "archive" format. The details of using an archive are described in HTKMLF Reader. |
| dim | Yes | Positive integer | | The full feature vector dimension including the desired context window.1 |
| contextWindow | No | Positive integer or pair of positive integers | 1 | Specifies the left and right size (first and second integer of the pair) of the context window in samples. A single integer is interpreted as a pair of identical integers. |
| prefixPathInSCP | No | A path prefix | Empty string | A prefix to apply to the paths specified within the SCP files. |

1 For example, if you had 72-dimensional features (24-dimensional filterbank features plus delta and delta-delta coefficients) and the network is designed to process a context window of 11 frames, the specified dimension should be 792.
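The arithmetic behind the footnote can be checked directly (plain Python, just restating the formula):

```python
# Full input dimension = base feature dimension * context window length.
base_dim = 72          # e.g. 24-dim filterbank + delta + delta-delta coefficients
left, right = 5, 5     # a context window of 11 frames is 5 + 1 + 5
window_len = left + 1 + right
print(base_dim * window_len)  # -> 792
```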
| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| mlfFile | Yes, if mlfFileList is not specified | Path | | Path to an HTK-style mlf file that contains the labels for all utterances specified in the scp file(s). |
| mlfFileList | Yes, if mlfFile is not specified | Array of paths | | Paths to HTK-style mlf file(s) that contain the labels for all utterances specified in the scp file(s). |
| dim | Yes, if labelDim is not specified | Positive integer | | Total cardinality of the label set |
| labelMappingFile | Yes | Path | | Path to a file listing all the labels seen in the mlf file, one per line. |
The CNTKTextFormatDeserializer accepts the same options that can be used with the CNTKTextFormatReader.

The ImageDeserializer accepts the following option:
- file: a simple text file where each line contains a tab-separated mapping between a logical sequence key, an image file (e.g. JPEG, PNG, etc.) and a 0-based label.
For more information please see ImageReader
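A map file for the ImageDeserializer might look like the lines below (the keys, paths, and labels are hypothetical examples). The Python snippet simply parses lines of that shape to show the expected tab-separated layout:

```python
# Hypothetical map-file lines: <sequence key>\t<image path>\t<0-based label>
map_lines = [
    "img_000001\timages/cat_01.jpg\t0",
    "img_000002\timages/dog_01.jpg\t1",
]

def parse_map_line(line):
    # Each line has exactly three tab-separated fields.
    key, path, label = line.split("\t")
    return key, path, int(label)

for line in map_lines:
    print(parse_map_line(line))
```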
You will find complete network definitions and the corresponding data set examples in the CNTK Repository. There you will also find unit and end-to-end tests that use deserializers, e.g.:
- https://github.com/Microsoft/CNTK/tree/master/Tests/EndToEndTests/Speech/HTKDeserializers/LSTM/FullUtterance
- https://github.com/Microsoft/CNTK/tree/master/Tests/EndToEndTests/Image/AlexNet
- https://github.com/Microsoft/CNTK/blob/master/Tests/UnitTests/ReaderTests/Config/ImageAndTextReaderSimple_Config.cntk
- https://github.com/Microsoft/CNTK/blob/master/Tests/UnitTests/ReaderTests/Config/CNTKTextFormatReader/dense.cntk