Understanding and Extending Readers

Starting from version 1.5, CNTK is moving away from the monolithic reader design towards a more composable model that allows you to specify and compose input data of different formats.

Previously, each reader was responsible for all aspects of data reading, including but not limited to:

  • Deserialization of the data from external storage into in-memory representation
  • Randomization of the whole corpus
  • Different transformations of input sequences/samples (e.g. cropping or scaling of images)
  • Creation of minibatches for different modes (e.g. frame, sequence, or truncated BPTT) with a layout that can be consumed by the GPU
  • Prefetching at the level of minibatches and IO chunks

In version 1.5, the major pieces of the above functionality were factored out and moved to core CNTK to be shared between different readers. This version also introduces two main abstractions that can be extended in order to support new data formats:

  • deserializer - responsible for deserializing input from external storage into in-memory sequences
  • transform - transforms an input sequence into an output sequence

In the next sections we discuss these abstractions in more detail.

Deserializers

Let's have a look at the following fragment of configuration for the HTKMLFReader from the end-to-end LSTM/FullUtterance test (full config here):

...
# Old reader config. For illustration only.
reader = [
    readerType = "HTKMLFReader"
    readMethod = "blockRandomize"
    nbruttsineachrecurrentiter = 32
    randomize = "auto"
    verbosity = 0

    features = [
        dim = 363
        type = "real"
        scpFile = "$DataDir$/glob_0000.scp"
    ]

    labels = [
        mlfFile = "$DataDir$/glob_0000.mlf"
        labelMappingFile = "$DataDir$/state.list"

        labelDim = 132
        labelType = "category"
    ]
]

This configuration fragment declares a reader that produces two streams of data named "features" and "labels". It takes two types of files as input:

  • a list of feature files known in HTK parlance as an scp file (“script” file)
  • a label file known as an mlf file (“master label file”)

In the configuration fragment above there are no explicit entities that define how the scp or mlf formats are deserialized; everything is encapsulated in the HTKMLFReader configuration. So if you needed to expose yet another input stream in a different data format alongside scp and mlf, you would have to modify HTKMLFReader and add support for it there.

To increase composability and reuse, the new configuration for the same input explicitly defines deserializers and the input streams that they produce:

reader = [
    verbosity = 0
    randomize = true

    # A list of deserializers the reader uses.
    deserializers = (
        [
            # Type of deserializer, in this case the one that knows
            # how to deserialize HTK feature files.
            type = "HTKFeatureDeserializer"
            # Module (.dll or .so) where this deserializer is implemented
            module = "HTKDeserializers"

            # Description of input streams the deserializer provides,
            # can be one or many, depending on a particular
            # deserializer implementation
            # For HTKFeatureDeserializer, just one stream can be described.
            input = [
                # Description of input stream to feed the Input node named "features"
                features = [
                    dim = 363
                    scpFile = "$DataDir$/glob_0000.scp"
                ]
            ]
        ]:
        [
            # Type of deserializer, in this case the one
            # that knows how to deserialize mlf files.
            type = "HTKMLFDeserializer"
            module = "HTKDeserializers"
            # Description of input streams the deserializer provides,
            # For HTKMLFDeserializer, just one stream can be described.
            input = [
                # Description of input stream to feed the Input node named "labels"
                labels = [
                    dim = 132
                    mlfFile = "$DataDir$/glob_0000.mlf"
                    labelMappingFile = "$DataDir$/state.list"
                ]
            ]
        ]
    )
]

The sequences produced by the mlf and HTK feature deserializers are combined based on their logical key (a string that uniquely identifies a speech utterance and is present in both the scp and mlf files). When you need another stream in a different format, you simply add the corresponding deserializer to the configuration (note that neither the HTK feature nor the HTK MLF deserializer can currently expose more than one input stream).
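
To make the notion of a logical key concrete, here is a sketch of how matching entries for one utterance might look in the two files (the utterance name, file names, and numbers below are made up; the scp “archive” format is <utterance key>=<physical feature file>[<first frame>,<last frame>]):

An4/71/71/cen5-fjam-b=$DataDir$/glob_0000.chunk[0,367]

#!MLF!#
"An4/71/71/cen5-fjam-b.lab"
0 2000000 sil
2000000 4000000 ah
...
.

Both deserializers produce a sequence with the key An4/71/71/cen5-fjam-b (the .lab suffix in the mlf entry is part of the HTK MLF convention, not of the key), and these two sequences are combined into the "features" and "labels" streams of one logical sequence.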

NOTE: Currently, both the old and the new reader configurations are supported. When the "deserializers" key is used in the reader configuration, the reader type is implicitly set to "CompositeDataReader", so please make sure that the CompositeDataReader module can be loaded (on Windows, CompositeDataReader.dll should be located in the same directory as the CNTK executable; on Linux, CompositeDataReader.so should be in the lib folder that sits side by side with the bin folder containing the CNTK executable).

Currently CNTK supports the following deserializers:

| Deserializer type | Module | Description |
|---|---|---|
| HTKFeatureDeserializer | HTKDeserializers | Deserializer for HTK feature files |
| HTKMLFDeserializer | HTKDeserializers | Deserializer for HTK MLF files |
| ImageDeserializer | ImageReader | Deserializer for images that uses OpenCV |
| CNTKTextFormatDeserializer | CNTKTextFormatReader | Deserializer for CNTK text format files |

Please refer to the tables below for the full description of the configuration parameters.

Transforms

A transform is a simple abstraction that takes a sequence as input, performs some transformation of the samples in the sequence, and returns the output sequence. Typical examples of transforms are image transformations such as crop, scale, or transpose. Transforms can be configured on a per-input basis.

Let's have a look at how transforms can be applied to an input (the config is taken from the Tests/EndToEndTests/Image/AlexNet test):

deserializers = ([
    type = "ImageDeserializer"
    module = "ImageReader"

    # Map file which maps images to labels
    file = "$ConfigDir$/train_map.txt"

    # Description of input streams
    input = [
            # Description of input stream to feed the Input node named "features"
            features = [
                transforms = (
                    [
                        type = "Crop"
                        # Possible values: Center, Random. Default: Center
                        cropType = "Random"
                        # Crop scale ratio.
                        cropRatio = 0.875
                        # Crop scale ratio jitter type
                        jitterType = "uniRatio"
                    ]:[
                        type = "Scale"
                        width = 224
                        height = 224
                        channels = 3
                        # Interpolation to use when scaling image to width x height size.
                        interpolations = "linear"
                    ]:[
                        type = "Mean"
                        # Stores mean values for each pixel in OpenCV matrix XML format.
                        meanFile = "$ConfigDir$/ImageNet1K_mean.xml"
                    ]:[
                        # Changes the image layout from HWC to CHW
                        type = "Transpose"
                    ]
                )
            ]
            # Description of input stream to feed the Input node named "labels"
            labels = [
                labelDim = 1000
            ]
    ]
])

In this configuration four transforms are applied to the input stream "features". Initially the image deserializer produces sequences consisting of a single image in HWC representation. The ordered list of transforms is then applied to the image: first the Crop transform, followed by Scale and Mean. The last transform is Transpose, which changes the image layout from HWC to CHW.
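
The order of the entries in the transforms list is the order in which they are applied. To add another transform, for example color jittering via the Color transform listed below, you would insert one more entry into the same list. The following is only a sketch; the Color-specific parameters are described on the ImageReader page and are omitted here:

transforms = (
    [ type = "Crop"; cropType = "Random"; cropRatio = 0.875; jitterType = "uniRatio" ]:
    [ type = "Scale"; width = 224; height = 224; channels = 3; interpolations = "linear" ]:
    [ type = "Mean"; meanFile = "$ConfigDir$/ImageNet1K_mean.xml" ]:
    # Additional transform; its parameters are omitted in this sketch.
    [ type = "Color" ]:
    [ type = "Transpose" ]
)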

Currently the following transforms are implemented. For their detailed description please see ImageReader.

| Transform type | Module |
|---|---|
| Crop | ImageReader |
| Scale | ImageReader |
| Color | ImageReader |
| Mean | ImageReader |
| Transpose | ImageReader |

New Reader Configuration Format Description

A reader configuration section that composes several data deserializers looks as follows:

reader = [
    randomize = true|false
    verbosity = 0|1|2
    ...

    deserializers = (
        [<deserializerConfiguration1>]:
        [<deserializerConfiguration2>]:
        ...
        [<deserializerConfigurationN>]
    )
]

Each deserializer configuration is specified as:

[
    module = "<readerModuleName>"   # Name of the external module (.dll or .so) where this particular deserializer is implemented
    type = "<deserializerType>"     # The type of the deserializer

    # There could be more deserializer-specific options in this section

    # Data deserializer input - describes a set of streams this deserializer produces.
    # It can be one (as in HTK) or many (as in CNTKTextFormat)
    input = [
        # Replace 'InputNameN' by the name of the corresponding input node in the network.
        InputName1 = [<inputConfiguration>]
        InputName2 = [<inputConfiguration>]
        ...
    ]
]

An input configuration contains input-specific options and, optionally, an ordered list of transforms that should be applied to the input:

[
    # Per-input data deserializer-specific options

    # Optionally, a pipeline of transformations, implemented by the data deserializer's reader module:
    transforms = (
       [<transformationConfiguration1>]:
       [<transformationConfiguration2>]:
       ...
       [<transformationConfigurationN>]
    )
]

A transform configuration identifies the transform type and any transform-specific options:

[
    type = "<transformName>"
    # Transform-specific options
]

Configuration Options

General Reader Configuration

| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| verbosity | No | Integer (0, 1, 2) | | Verbosity level, controls the diagnostic output of different components (Randomizer, Deserializer, Bundler, etc.) |
| randomize | No | true, false | true | Specifies whether the input should be randomized. The randomization method is identical to blockRandomize of the HTKMLFReader. |
| randomizationWindow | No | Integer | Size of the dataset | Specifies the randomization range (in number of samples)1 |
| truncationLength | If truncated is set | Positive integer | | Specifies the truncation length in samples for BPTT. Ignored if truncated is false. |
| multiThreadedDeserialization | No | Boolean | false | Whether to use multiple threads when collecting the sequences for each minibatch from the deserializers. |
| frameMode | No | Boolean | false | Whether data should be randomized and returned at the frame level rather than the sequence level. When set, input sequences are split into frames. If set, truncated needs to be false. |
| truncated | No | Boolean | false | Enables truncated back-propagation through time (BPTT). If set, frameMode needs to be false. |

1 If no randomizationWindow is specified, the randomization range is set to be equal to the size of the dataset (i.e., the input is randomized across the whole dataset). randomizationWindow is ignored when randomize is set to false.
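
To illustrate how these options fit together, here is a sketch of a reader section configured for truncated BPTT with a bounded randomization window (the concrete values are arbitrary examples, and the deserializers are elided):

reader = [
    verbosity = 0
    randomize = true
    # Randomize within a rolling window of 100000 samples instead of the whole corpus.
    randomizationWindow = 100000

    # Enable truncated BPTT; sequences are cut into pieces of 20 samples.
    # frameMode must stay false when truncated is set.
    truncated = true
    truncationLength = 20

    deserializers = (
        ...
    )
]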

General Deserializer Configuration

| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| module | Yes | Reader module name | | Specifies the reader module (DLL / SO) implementing the data deserializer |
| type | Yes | Deserializer name | | Specifies a data deserializer exposed by the given reader module |

General Transform Configuration

| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| type | Yes | Transform name | | Specifies a transform exposed by the reader module implementing the data deserializer |

HTKFeatureDeserializer options

| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| scpFile | Yes | Array of paths | | A list of paths to SCP files to be processed. The files should be HTK-compatible and must be specified in the “archive” format. The details of using an archive are described in HTKMLF Reader. |
| dim | Yes | Positive integer | | The full feature vector dimension including the desired context window.1 |
| contextWindow | No | Positive integer or pair of positive integers | 1 | Specifies the left and right sizes (first and second integer of the pair) of the context window in samples. A single integer is interpreted as a pair of identical integers. |
| prefixPathInSCP | No | Path prefix | Empty string | A prefix to apply to the paths specified within the SCP files. |

1 For example, if you had 72-dimensional features (24-dimensional filterbank features plus delta and delta-delta coefficients) and the network is designed to process a context window of 11 frames, the specified dimension should be 792.
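
As a sketch of how that example maps onto the options above (the scp path is a placeholder, and the contextWindow value is a hypothetical one that assumes the pair interpretation from the table, i.e. 5 frames of left and 5 frames of right context around the current frame, 11 frames in total):

[
    type = "HTKFeatureDeserializer"
    module = "HTKDeserializers"
    input = [
        features = [
            # 72-dimensional base features x 11-frame context window = 792
            dim = 792
            # Hypothetical value: 5 frames of left and 5 frames of right context
            contextWindow = 5
            scpFile = "$DataDir$/features.scp"
        ]
    ]
]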

HTKMLFDeserializer options

| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| mlfFile | If mlfFileList is not specified | Path | | Path to an HTK-style mlf file that contains the labels for all utterances specified in the scp file(s). |
| mlfFileList | If mlfFile is not specified | Array of paths | | Paths to HTK-style mlf files that contain the labels for all utterances specified in the scp file(s). |
| dim | If labelDim is not specified | Positive integer | | Total cardinality of the label set |
| labelMappingFile | Yes | Path | | Path to a file that lists all the labels seen in the mlf file, one per line. |

CNTKTextFormatDeserializer options

Accepts the same options that can be used with the CNTKTextFormatReader.
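
For example, a deserializer entry reading a file in CNTK text format might look roughly like the following sketch (the file name, stream names, aliases, dimensions, and formats are placeholders):

[
    type = "CNTKTextFormatDeserializer"
    module = "CNTKTextFormatReader"
    # The CNTK-text-format file containing all streams.
    file = "$DataDir$/train.ctf"
    input = [
        features = [
            alias = "F"        # name of the stream inside the text file
            dim = 100
            format = "dense"
        ]
        labels = [
            alias = "L"
            dim = 10
            format = "sparse"
        ]
    ]
]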

ImageDeserializer options

  • file: a simple text file where each line contains a tab-separated mapping between a logical sequence key, an image file (e.g. JPEG, PNG, etc.), and a 0-based label.
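
For illustration, a few lines of such a map file might look like this (the keys, paths, and labels are made up; the columns are separated by tabs):

sequence001	images/cat_001.jpg	0
sequence002	images/dog_042.jpg	1
sequence003	images/cat_017.jpg	0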

For more information please see ImageReader.

Examples of Configurations and Tests

You will find complete network definitions and the corresponding data set examples in the CNTK Repository. There you will also find Unit and End-to-End Tests that use deserializers.
