Frank Seide edited this page Jul 27, 2016 · 28 revisions

Convolution() computes the convolution of a weight matrix with an image. There is a simplified syntax for 2D convolutions and a more advanced syntax for N-dimensional convolutions. The 2D convolution syntax is:

Convolution(w, image, 
            kernelWidth, kernelHeight, mapCount,
            horizontalStride, verticalStride,
            zeroPadding=false, maxTempMemSizeInSamples=0, imageLayout="cudnn" /* or "HWC"*/ )

where:

  • w - convolution weight matrix with dimensions [mapCount, kernelWidth * kernelHeight * inputChannels], where inputChannels is the number of channels in the input image.
  • image - the input image.
  • mapCount - depth of output feature map (number of output channels)
  • kernelWidth - width of the kernel
  • kernelHeight - height of the kernel
  • horizontalStride - stride in horizontal direction
  • verticalStride - stride in vertical direction
  • zeroPadding - [named optional] specifies whether the sides of the image should be padded with zeros. Default is false.
  • maxTempMemSizeInSamples - [named optional] maximum amount of auxiliary memory (in samples) that should be reserved to perform convolution operations. Some convolution engines (e.g. cuDNN and GEMM-based engines) can use this workspace to improve performance, at the cost of higher memory utilization in some cases. Default is 0, which means the same as the input samples.
  • imageLayout - [named optional] the storage format of each image. The default is cudnn, which means each image is stored as [width, height, channel]. The alternative is HWC, which means each image is stored as [channel, width, height] in column-major order. If you use cuDNN to speed up training, you should use cudnn. The cudnn layout works on both GPU and CPU, so it is the recommended choice.
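
To make the two layouts concrete, here is a minimal Python sketch (not CNTK code) that computes where a pixel lands in flat memory under each layout. The function names are hypothetical, and the sketch assumes both layouts are column-major, i.e. the first listed dimension varies fastest; the text above states this explicitly only for HWC.

```python
def offset_hwc(w, h, c, width, height, channels):
    """Flat offset for a tensor of shape [channel, width, height], column-major.

    The channel index varies fastest, so all channels of one pixel are adjacent.
    """
    return c + channels * (w + width * h)


def offset_cudnn(w, h, c, width, height, channels):
    """Flat offset for a tensor of shape [width, height, channel], column-major.

    The width index varies fastest, so each channel is one contiguous plane.
    (Column-major order is an assumption here, carried over from the HWC case.)
    """
    return w + width * (h + height * c)


if __name__ == "__main__":
    width, height, channels = 4, 3, 2
    # Stepping one pixel to the right skips over the channels in HWC ...
    print(offset_hwc(1, 0, 0, width, height, channels))    # 2
    # ... but moves to the adjacent element in the cudnn layout.
    print(offset_cudnn(1, 0, 0, width, height, channels))  # 1
```

The practical consequence is that data prepared for one layout must be reordered before it is read under the other; the same flat buffer describes a different image in each case.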

Example (ConvReLULayer NDL macro):

ConvReLULayer(inp, outMap, inWCount, kW, kH, hStride, vStride, wScale, bValue) =
[
    W = LearnableParameter (outMap, inWCount, init="gaussian", initValueScale=wScale)
    b = ImageParameter (1, 1, outMap, init="fixedValue", value=bValue, imageLayout="$imageLayout$")
    c = Convolution (W, inp, kW, kH, outMap, hStride, vStride, zeroPadding=true, imageLayout="$imageLayout$")
    y = RectifiedLinear (c + b)
].y

Note: If you are using the deprecated NDLNetworkBuilder, the optional imageLayout parameter defaults to "HWC" instead, and the example must be written without the trailing .y.
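
The outMap and inWCount arguments of the macro follow from the weight matrix dimensions given in the parameter list. The Python sketch below (a hypothetical helper, not part of CNTK) computes them for an illustrative layer; the layer sizes are made up for the example.

```python
def conv2d_weight_shape(map_count, kernel_width, kernel_height, input_channels):
    """Return (rows, cols) of w: [mapCount, kernelWidth * kernelHeight * inputChannels]."""
    return (map_count, kernel_width * kernel_height * input_channels)


if __name__ == "__main__":
    # Hypothetical layer: 5x5 kernels over a 3-channel input, 16 output feature maps.
    out_map, in_w_count = conv2d_weight_shape(16, 5, 5, 3)
    print(out_map, in_w_count)  # 16 75
```

With these values, the macro would be invoked with outMap = 16, inWCount = 75, kW = 5 and kH = 5.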

N-dimensional Convolution

N-dimensional convolution lets you create convolutions with any number of dimensions and any stride, sharing, or padding. The syntax is:

Convolution(w, input,
            {kernel dimensions}, 
            mapCount = {map dimensions}, 
            stride = {stride dimensions}, 
            sharing = {sharing flags},
            autoPadding = {padding flags (boolean)},
            lowerPad = {lower padding (int)},
            upperPad = {upper padding (int)},
            maxTempMemSizeInSamples = 0,
            imageLayout = "cudnn")

Where:

  • w - convolution weight matrix with dimensions [mapCount, kernelDimensionsProduct], where kernelDimensionsProduct is the product of the kernel dimensions.
  • input - convolution input
  • kernel dimensions - dimensions of the kernel
  • mapCount - [named, optional, default is 0] depth of feature map. 0 means use the row dimension of w
  • stride - [named, optional, default is 1] stride dimensions
  • sharing - [named, optional, default is true] sharing flags for each input dimension
  • autoPadding - [named, optional, default is true] automatic padding flags for each input dimension
  • lowerPad - [named, optional, default is 0] precise lower padding for each input dimension
  • upperPad - [named, optional, default is 0] precise upper padding for each input dimension
  • maxTempMemSizeInSamples - [named optional] maximum amount of auxiliary memory (in samples) that should be reserved to perform convolution operations. Some convolution engines (e.g. cuDNN and GEMM-based engines) can use this workspace to improve performance, at the cost of higher memory utilization in some cases. Default is 0, which means the same as the input samples.
  • imageLayout - [named optional] the storage format of each image. The only supported value is cudnn, which means each image is stored as [width, height, channel].
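
The way stride and padding interact per dimension can be sketched with conventional convolution arithmetic. The Python helper below is hypothetical and is not taken from CNTK: it assumes that automatic padding yields ceil(input / stride) outputs, that explicit padding yields floor((input + lowerPad + upperPad - kernel) / stride) + 1, and that lowerPad and upperPad are ignored for dimensions where autoPadding is on. CNTK's exact rounding may differ.

```python
def conv_output_dims(input_dims, kernel_dims, stride, auto_padding, lower_pad, upper_pad):
    """Estimate output size per dimension (conventional formulas, assumed here)."""
    out = []
    for n, k, s, auto, lo, hi in zip(
        input_dims, kernel_dims, stride, auto_padding, lower_pad, upper_pad
    ):
        if auto:
            # Assumed: automatic padding keeps ceil(n / s) positions.
            out.append(-(-n // s))
        else:
            # Assumed: only positions where the kernel fits in the padded input.
            out.append((n + lo + hi - k) // s + 1)
    return out


if __name__ == "__main__":
    # Hypothetical 32 x 32 x 3 input, 5 x 5 x 3 kernel, padded spatially but
    # not along the channel dimension, which the kernel spans completely.
    dims = conv_output_dims(
        input_dims=[32, 32, 3],
        kernel_dims=[5, 5, 3],
        stride=[1, 1, 3],
        auto_padding=[True, True, False],
        lower_pad=[0, 0, 0],
        upper_pad=[0, 0, 0],
    )
    print(dims)  # [32, 32, 1]
```

In this example the channel dimension collapses to 1 because the kernel covers all input channels; the number of output feature maps is then given by mapCount.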

All dimension arrays are colon-separated. Note: If you use the deprecated NDLNetworkBuilder, these must be comma-separated and enclosed in { } instead.
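
As an illustration of the two notations, the following Python sketch renders a list of values either way. The helper name format_dims is hypothetical; it only produces the separators described above and does not decide whether the surrounding configuration needs extra parentheses around a colon-separated list, which this page does not specify.

```python
def format_dims(dims, ndl=False):
    """Render a dimension or flag array as config text.

    Booleans are written in lowercase (true/false), as in the examples on this page.
    """
    parts = [str(d).lower() if isinstance(d, bool) else str(d) for d in dims]
    if ndl:
        # Deprecated NDLNetworkBuilder: comma-separated, enclosed in { }.
        return "{" + ", ".join(parts) + "}"
    # Default: colon-separated.
    return ":".join(parts)


if __name__ == "__main__":
    print(format_dims([5, 5, 3]))               # 5:5:3
    print(format_dims([True, True, False]))     # true:true:false
    print(format_dims([5, 5, 3], ndl=True))     # {5, 5, 3}
```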
