Frank Seide edited this page Jul 27, 2016 · 28 revisions

Convolution() computes the convolution of a weight matrix with an image or tensor. This operation is used in image-processing and language-processing applications.

Convolution() supports arbitrary dimensions, strides, sharing, and padding. The syntax is:

Convolution(w, input,
            {kernel dimensions}, 
            mapCount = {map dimensions}, 
            stride = {stride dimensions}, 
            sharing = {sharing flags},
            autoPadding = {padding flags (boolean)},
            lowerPad = {lower padding (int)},
            upperPad = {upper padding (int)},
            maxTempMemSizeInSamples = 0,
            imageLayout = "cudnn")

Where:

  • w - the convolution weight matrix; it has dimensions [mapCount, kernelDimensionsProduct], where kernelDimensionsProduct is the product of the kernel dimensions (e.g. 75 for kernel dimensions (5:5:3)).
  • input - convolution input
  • kernel dimensions - dimensions of the kernel
  • mapCount - [named, optional, default is 0] depth of the feature map. 0 means to use the row dimension of w.
  • stride - [named, optional, default is 1] stride dimensions
  • sharing - [named, optional, default is true] sharing flags for each input dimension
  • autoPadding - [named, optional, default is true] automatic padding flags for each input dimension
  • lowerPad - [named, optional, default is 0] precise lower padding for each input dimension
  • upperPad - [named, optional, default is 0] precise upper padding for each input dimension
  • maxTempMemSizeInSamples - [named, optional, default is 0] maximum amount of auxiliary memory (in samples) that should be reserved to perform convolution operations. Some convolution engines (e.g. cuDNN and GEMM-based engines) can benefit from using a workspace, as it may improve performance. However, this may sometimes lead to higher memory utilization. A value of 0 means the same as the number of input samples.

All values of the form {...} must be given as a colon-separated sequence of values, e.g. (5:5) for the kernel dimensions. (If you use the deprecated NDLNetworkBuilder, the values must instead be comma-separated and enclosed in { }.)
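The stride and padding flags above determine the output dimensions per input dimension. As a rough guide (an illustrative Python sketch, not CNTK code; it assumes autoPadding=true behaves like cuDNN-style "same" padding and autoPadding=false like a "valid" convolution):

```python
import math

def conv_output_dim(input_dim, kernel_dim, stride=1, auto_padding=True):
    """Output size along one dimension of a convolution.
    auto_padding=True assumes 'same'-style padding: output = ceil(input / stride).
    auto_padding=False assumes no padding ('valid'): only full kernel windows count."""
    if auto_padding:
        return math.ceil(input_dim / stride)
    return (input_dim - kernel_dim) // stride + 1

# Example: a 28 x 28 x 3 input with a (5:5:3) kernel, stride (1:1:3),
# padding the spatial dimensions but not the depth dimension:
dims = [conv_output_dim(i, k, s, p)
        for i, k, s, p in zip((28, 28, 3), (5, 5, 3), (1, 1, 3), (True, True, False))]
# dims == [28, 28, 1]
```

This mirrors the common convention that padded dimensions keep their size (divided by the stride), while unpadded dimensions shrink by kernel_dim - 1 before striding.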

Example (ConvReLULayer NDL macro):

ConvReLULayer(inp, outMap, inMap, kW, kH, hStride, vStride, wScale, bValue) =
[
    W = Parameter (outMap, inMap * kW * kH, init="gaussian", initValueScale=wScale)
    b = Parameter (outMap, 1, init="fixedValue", value=bValue)
    c = Convolution (W, inp, (kW:kH:inMap), stride=(hStride:vStride), autoPadding=true)
    y = RectifiedLinear (c + b)
].y

Note: If you are using the deprecated NDLNetworkBuilder, there should be no trailing .y in the example.
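In the macro above, the shape of W is tied to the kernel: its column count must equal inMap * kW * kH, matching the [mapCount, kernelDimensionsProduct] layout described earlier. A quick sanity check of that arithmetic (plain Python, for illustration only):

```python
def conv_weight_shape(out_map, in_map, k_w, k_h):
    """Shape of the convolution weight matrix W as [mapCount, kernelDimensionsProduct],
    mirroring Parameter(outMap, inMap * kW * kH) in the ConvReLULayer macro."""
    return (out_map, in_map * k_w * k_h)

# 32 output feature maps over an RGB (3-channel) input with a 5x5 kernel:
shape = conv_weight_shape(32, 3, 5, 5)
# shape == (32, 75), where 75 = 5 * 5 * 3 as in the (5:5:3) example above
```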

Simplified 2D Convolution (deprecated NDL only)

The 2D convolution syntax is:

Convolution(w, image, 
            kernelWidth, kernelHeight,
            horizontalStride, verticalStride,
            zeroPadding=false, maxTempMemSizeInSamples=0, imageLayout="cudnn" /* or "HWC"*/ )

where:

  • w - the convolution weight matrix; it has dimensions [mapCount, kernelWidth * kernelHeight * inputChannels].
  • image - the input image.
  • mapCount - depth of output feature map (number of output channels)
  • kernelWidth - width of the kernel
  • kernelHeight - height of the kernel
  • horizontalStride - stride in horizontal direction
  • verticalStride - stride in vertical direction
  • zeroPadding - [named optional] specifies whether the sides of the image should be padded with zeros. Default is false.
  • maxTempMemSizeInSamples - [named optional] maximum amount of auxiliary memory (in samples) that should be reserved to perform convolution operations. Some convolution engines (e.g. cuDNN and GEMM-based engines) can benefit from using a workspace, as it may improve performance. However, this may sometimes lead to higher memory utilization. A value of 0 means the same as the number of input samples.
  • imageLayout - [named optional] the storage format of each image. By default it is HWC, which means each image is stored as [channel, width, height] in column-major order. If you use cuDNN to speed up training, you should set it to cudnn, which means each image is stored as [width, height, channel]. Note that the cudnn layout works on both GPU and CPU, so it is recommended by default.
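The two layouts differ only in which axis varies fastest in memory. A small Python sketch of the linear-index arithmetic implied by the description above (an illustration under the stated assumption that both layouts are column-major over the listed axes; the helper names are hypothetical, not CNTK identifiers):

```python
def hwc_index(c, w, h, C, W, H):
    # "HWC" layout: each image stored as [channel, width, height], channel fastest.
    return c + C * (w + W * h)

def cudnn_index(c, w, h, C, W, H):
    # "cudnn" layout: each image stored as [width, height, channel], width fastest.
    return w + W * (h + H * c)

def hwc_to_cudnn(buf, C, W, H):
    """Reorder a flat HWC image buffer into the cudnn layout."""
    out = [None] * (C * W * H)
    for h in range(H):
        for w in range(W):
            for c in range(C):
                out[cudnn_index(c, w, h, C, W, H)] = buf[hwc_index(c, w, h, C, W, H)]
    return out

# A 2-channel, 2-wide, 1-high image stored HWC as [c0w0, c1w0, c0w1, c1w1]
# becomes [c0w0, c0w1, c1w0, c1w1] in the cudnn layout:
reordered = hwc_to_cudnn([1, 2, 3, 4], C=2, W=2, H=1)
# reordered == [1, 3, 2, 4]
```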