Understanding and Extending Readers

Starting from version 1.5, CNTK is moving away from the monolithic reader design towards a more composable model that allows you to specify and compose input data of different formats.

Previously, each reader was responsible for all aspects of data reading, including but not limited to:

  • Deserialization of the data from external storage into in-memory representation
  • Randomization of the whole corpus
  • Different transformations of input sequences/samples (e.g. cropping or scaling of images)
  • Creation of minibatches for different modes (e.g. frame, sequence, or truncated BPTT) with a layout that can be consumed by the GPU
  • Prefetching at the level of minibatches and IO chunks

In version 1.5, the major pieces of the above functionality were factored out and moved to core CNTK to be shared between different readers. This version also introduces two main abstractions that can be extended in order to support new data formats:

  • deserializer - responsible for deserializing input from external storage into in-memory sequences
  • transform - transforms an input sequence into an output sequence

In the next sections we discuss these abstractions in more detail.

Deserializers

Let's have a look at the following fragment of configuration for the HTKMLFReader from the end-to-end LSTM/FullUtterance test (full config here):

...
# Old reader config. For illustration only.
reader = [
    readerType = "HTKMLFReader"
    readMethod = "blockRandomize"
    nbruttsineachrecurrentiter = 32
    randomize = "auto"
    verbosity = 0

    features = [
        dim = 363
        type = "real"
        scpFile = "$DataDir$/glob_0000.scp"
    ]

    labels = [
        mlfFile = "$DataDir$/glob_0000.mlf"
        labelMappingFile = "$DataDir$/state.list"

        labelDim = 132
        labelType = "category"
    ]
]

This configuration fragment declares a reader that produces two streams of data named "features" and "labels". It takes two types of files as input:

  • a list of feature files known in HTK parlance as an scp file (“script” file)
  • a label file known as an mlf file (“master label file”)

In the configuration fragment above there are no explicit entities that define how the scp or mlf formats are deserialized; everything is encapsulated in the HTKMLFReader configuration. So if you needed to expose yet another input stream in a different data format alongside scp and mlf, you would have to modify HTKMLFReader and add support for it there.

To increase composability and reuse, the new configuration for the same input explicitly defines deserializers and the input streams that they produce:

reader = [
    verbosity = 0
    randomize = true

    # A list of deserializers the reader uses.
    deserializers = (
        [
            # Type of deserializer, in this case the one that knows
            # how to deserialize HTK feature files.
            type = "HTKFeatureDeserializer"
            # Module (.dll or .so) where this deserializer is implemented
            module = "HTKDeserializers"

            # Description of input streams the deserializer provides,
            # can be one or many, depending on a particular
            # deserializer implementation
            # For HTKFeatureDeserializer, just one stream can be described.
            input = [
                # Description of input stream to feed the Input node named "features"
                features = [
                    dim = 363
                    scpFile = "$DataDir$/glob_0000.scp"
                ]
            ]
        ]:
        [
            # Type of deserializer, in this case the one
            # that knows how to deserialize mlf files.
            type = "HTKMLFDeserializer"
            module = "HTKDeserializers"
            # Description of input streams the deserializer provides,
            # For HTKMLFDeserializer, just one stream can be described.
            input = [
                # Description of input stream to feed the Input node named "labels"
                labels = [
                    dim = 132
                    mlfFile = "$DataDir$/glob_0000.mlf"
                    labelMappingFile = "$DataDir$/state.list"
                ]
            ]
        ]
    )
]

The sequences produced by the mlf and HTK feature deserializers are combined based on their logical key (a string that uniquely identifies a speech utterance and is present in both the scp and mlf files). When you need another stream in a different format, you simply add the corresponding deserializer to the configuration (note that neither the HTK feature nor the HTK MLF deserializer can currently expose more than one input stream).
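
To make the notion of a logical key concrete, here is a sketch of how matching entries for one utterance might look in the two files (the utterance name, file names, and numbers below are made up; the scp “archive” format is <utterance key>=<physical feature file>[<first frame>,<last frame>]):

An4/71/71/cen5-fjam-b=$DataDir$/glob_0000.chunk[0,367]

#!MLF!#
"An4/71/71/cen5-fjam-b.lab"
0 2000000 sil
2000000 4000000 ah
...
.

Both deserializers produce a sequence with the key An4/71/71/cen5-fjam-b (the .lab suffix in the mlf entry is part of the HTK MLF convention, not of the key), and these two sequences are combined into the "features" and "labels" streams of one logical sequence.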

NOTE: Currently, both the old and the new reader configurations are supported. When the "deserializers" key is used in the reader configuration, the reader type is implicitly set to "CompositeDataReader", so please make sure that the CompositeDataReader module can be loaded (on Windows, CompositeDataReader.dll should be located in the same directory as the CNTK executable; on Linux, CompositeDataReader.so should be in the lib folder that sits side by side with the bin folder containing the CNTK executable).

Currently CNTK supports the following deserializers:

| Deserializer type | Module | Description |
|---|---|---|
| HTKFeatureDeserializer | HTKDeserializers | Deserializer for HTK feature files |
| HTKMLFDeserializer | HTKDeserializers | Deserializer for HTK MLF files |
| ImageDeserializer | ImageReader | Deserializer for images that uses OpenCV |
| CNTKTextFormatDeserializer | CNTKTextFormatReader | Deserializer for CNTK text format files |

Please refer to the tables below for the full description of the configuration parameters.

Transforms

A transform is a simple abstraction that takes a sequence as input, performs some transformation of the samples in the sequence, and returns the output sequence. Typical examples of transforms are image transformations such as crop, scale, or transpose. Transforms can be configured on a per-input basis.

Let's have a look at how transforms can be applied to an input (the config is taken from the Tests/EndToEndTests/Image/AlexNet test):

deserializers = ([
    type = "ImageDeserializer"
    module = "ImageReader"

    # Map file which maps images to labels
    file = "$ConfigDir$/train_map.txt"

    # Description of input streams
    input = [
            # Description of input stream to feed the Input node named "features"
            features = [
                transforms = (
                    [
                        type = "Crop"
                        # Possible values: Center, Random. Default: Center
                        cropType = "Random"
                        # Crop scale ratio.
                        cropRatio = 0.875
                        # Crop scale ratio jitter type
                        jitterType = "uniRatio"
                    ]:[
                        type = "Scale"
                        width = 224
                        height = 224
                        channels = 3
                        # Interpolation to use when scaling image to width x height size.
                        interpolations = "linear"
                    ]:[
                        type = "Mean"
                        # Stores mean values for each pixel in OpenCV matrix XML format.
                        meanFile = "$ConfigDir$/ImageNet1K_mean.xml"
                    ]:[
                        # Changes the image layout from HWC to CHW
                        type = "Transpose"
                    ]
                )
            ]
            # Description of input stream to feed the Input node named "labels"
            labels = [
                labelDim = 1000
            ]
    ]
])

In this configuration four transforms are applied to the input stream "features". Initially the image deserializer produces sequences consisting of a single image in HWC representation. The ordered list of transforms is then applied to the image: first the Crop transform, followed by Scale and Mean. The last transform is Transpose, which changes the image layout from HWC to CHW.
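
The order of the entries in the transforms list is the order in which they are applied. To add another transform, for example color jittering via the Color transform listed below, you would insert one more entry into the same list. The following is only a sketch; the Color-specific parameters are described on the ImageReader page and are omitted here:

transforms = (
    [ type = "Crop"; cropType = "Random"; cropRatio = 0.875; jitterType = "uniRatio" ]:
    [ type = "Scale"; width = 224; height = 224; channels = 3; interpolations = "linear" ]:
    [ type = "Mean"; meanFile = "$ConfigDir$/ImageNet1K_mean.xml" ]:
    # Additional transform; its parameters are omitted in this sketch.
    [ type = "Color" ]:
    [ type = "Transpose" ]
)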

Currently the following transforms are implemented. For their detailed description please see ImageReader.

| Transform type | Module |
|---|---|
| Crop | ImageReader |
| Scale | ImageReader |
| Color | ImageReader |
| Mean | ImageReader |
| Transpose | ImageReader |

New Reader Configuration Format Description

A reader configuration section that composes several data deserializers looks as follows:

reader = [
    randomize = true|false
    verbosity = 0|1|2
    ...

    deserializers = (
        [<deserializerConfiguration1>]:
        [<deserializerConfiguration2>]:
        ...
        [<deserializerConfigurationN>]
    )
]

Each deserializer configuration is specified as:

[
    module = "<readerModuleName>"   # Name of the external module (.dll or .so) where this particular deserializer is implemented
    type = "<deserializerType>"     # The type of the deserializer

    # There could be more deserializer-specific options in this section

    # Data deserializer input - describes a set of streams this deserializer produces.
    # It can be one (as in HTK) or many (as in CNTKTextFormat)
    input = [
        # Replace 'InputNameN' by the name of the corresponding input node in the network.
        InputName1 = [<inputConfiguration>]
        InputName2 = [<inputConfiguration>]
        ...
    ]
]

An input configuration contains input-specific options and, optionally, an ordered list of transforms that should be applied to the input:

[
    # Per-input data deserializer-specific options

    # Optionally, a pipeline of transformations, implemented by the data deserializer's reader module:
    transforms = (
       [<transformationConfiguration1>]:
       [<transformationConfiguration2>]:
       ...
       [<transformationConfigurationN>]
    )
]

A transform configuration identifies the transform type and any transform-specific options:

[
    type = "<transformName>"
    # Transform-specific options
]

Configuration Options

General Reader Configuration

| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| verbosity | No | Integer (0, 1, 2) | | Verbosity level, controls the diagnostic output of different components (Randomizer, Deserializer, Bundler, etc.) |
| randomize | No | true, false | true | Specifies whether the input should be randomized. The randomization method is identical to blockRandomize of the HTKMLFReader. |
| randomizationWindow | No | Integer | Size of the dataset | Specifies the randomization range (in number of samples)1 |
| truncationLength | If truncated is set | Positive integer | | Specifies the truncation length in samples for BPTT. Ignored if truncated is false. |
| multiThreadedDeserialization | No | Boolean | false | Whether to use multiple threads when collecting the sequences for each minibatch from the deserializers. |
| frameMode | No | Boolean | false | Whether data should be randomized and returned at the frame level rather than the sequence level. When set, input sequences are split into frames. If set, truncated needs to be false. |
| truncated | No | Boolean | false | Enables truncated back-propagation through time (BPTT). If set, frameMode needs to be false. |

1 If no randomizationWindow is specified, the randomization range is set to be equal to the size of the dataset (i.e., the input is randomized across the whole dataset). randomizationWindow is ignored when randomize is set to false.
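
To illustrate how these options fit together, here is a sketch of a reader section configured for truncated BPTT with a bounded randomization window (the concrete values are arbitrary examples, and the deserializers are elided):

reader = [
    verbosity = 0
    randomize = true
    # Randomize within a rolling window of 100000 samples instead of the whole corpus.
    randomizationWindow = 100000

    # Enable truncated BPTT; sequences are cut into pieces of 20 samples.
    # frameMode must stay false when truncated is set.
    truncated = true
    truncationLength = 20

    deserializers = (
        ...
    )
]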

General Deserializer Configuration

| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| module | Yes | Reader module name | | Specifies the reader module (DLL / SO) implementing the data deserializer |
| type | Yes | Deserializer name | | Specifies a data deserializer exposed by the given reader module |

General Transform Configuration

| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| type | Yes | Transform name | | Specifies a transform exposed by the reader module implementing the data deserializer |

HTKFeatureDeserializer options

| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| scpFile | Yes | Array of paths | | A list of paths to SCP files to be processed. The files should be HTK-compatible and must be specified in the “archive” format. The details of using an archive are described in HTKMLF Reader. |
| dim | Yes | Positive integer | | The full feature vector dimension including the desired context window.1 |
| contextWindow | No | Positive integer or pair of positive integers | 1 | Specifies the left and right sizes (first and second integer of the pair) of the context window in samples. A single integer is interpreted as a pair of identical integers. |
| prefixPathInSCP | No | Path prefix | Empty string | A prefix to apply to the paths specified within the SCP files. |

1 For example, if you had 72-dimensional features (24-dimensional filterbank features plus delta and delta-delta coefficients) and the network is designed to process a context window of 11 frames, the specified dimension should be 792.
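
As a sketch of how that example maps onto the options above (the scp path is a placeholder, and the contextWindow value is a hypothetical one that assumes the pair interpretation from the table, i.e. 5 frames of left and 5 frames of right context around the current frame, 11 frames in total):

[
    type = "HTKFeatureDeserializer"
    module = "HTKDeserializers"
    input = [
        features = [
            # 72-dimensional base features x 11-frame context window = 792
            dim = 792
            # Hypothetical value: 5 frames of left and 5 frames of right context
            contextWindow = 5
            scpFile = "$DataDir$/features.scp"
        ]
    ]
]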

HTKMLFDeserializer options

| Parameter | Mandatory | Accepted values | Default value | Description |
|---|---|---|---|---|
| mlfFile | If mlfFileList is not specified | Path | | Path to an HTK-style mlf file that contains the labels for all utterances specified in the scp file(s). |
| mlfFileList | If mlfFile is not specified | Array of paths | | Paths to HTK-style mlf files that contain the labels for all utterances specified in the scp file(s). |
| dim | If labelDim is not specified | Positive integer | | Total cardinality of the label set |
| labelMappingFile | Yes | Path | | Path to a file that lists all the labels seen in the mlf file, one per line. |

CNTKTextFormatDeserializer options

Accepts the same options that can be used with the CNTKTextFormatReader.
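
For example, a deserializer entry reading a file in CNTK text format might look roughly like the following sketch (the file name, stream names, aliases, dimensions, and formats are placeholders):

[
    type = "CNTKTextFormatDeserializer"
    module = "CNTKTextFormatReader"
    # The CNTK-text-format file containing all streams.
    file = "$DataDir$/train.ctf"
    input = [
        features = [
            alias = "F"        # name of the stream inside the text file
            dim = 100
            format = "dense"
        ]
        labels = [
            alias = "L"
            dim = 10
            format = "sparse"
        ]
    ]
]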

ImageDeserializer options

  • file: a simple text file where each line contains a tab-separated mapping between a logical sequence key, an image file (e.g. JPEG, PNG, etc.), and a 0-based label.
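
For illustration, a few lines of such a map file might look like this (the keys, paths, and labels are made up; the columns are separated by tabs):

sequence001	images/cat_001.jpg	0
sequence002	images/dog_042.jpg	1
sequence003	images/cat_017.jpg	0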

For more information please see ImageReader.

Examples of Configurations and Tests

You will find complete network definitions and the corresponding data set examples in the CNTK Repository. There you will also find Unit and End-to-End Tests that use deserializers.
