diff --git a/README.md b/README.md
index 1da29d5bb6..5c2706f597 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,17 @@ AMD's library for high peformance machine learning primitives. MIOpen supports t
 * [Boost](http://www.boost.org/) at least version 1.58
   * MIOpen uses `boost-system` and `boost-filesystem` packages to enable persistent [kernel cache](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/doc/src/cache.md)

-Instructions to install the above dependencies are present in this [section](#installing-the-dependencies).
+## Installing the dependencies
+
+The dependencies can be installed with the `install_deps.cmake` script: `cmake -P install_deps.cmake`
+
+
+This will install to `/usr/local` by default, but it can be installed in another location with the `--prefix` argument:
+```
+cmake -P install_deps.cmake --prefix /some/local/dir
+```
+
+Instructions to manually install all the dependencies on Ubuntu v16 are present in this [section](#installing-the-dependencies-manually).

 ## Installing MIOpen with pre-built packages
@@ -167,19 +177,7 @@ Also, githooks can be installed to format the code per-commit:
 ./.githooks/install
 ```

-## Installing the dependencies
-
-The dependencies can be installed with the `install_deps.cmake`, script:
-
-```
-cmake -P install_deps.cmake
-```
-
-This will install by default to `/usr/local` but it can be installed in another location with `--prefix` argument:
-
-```
-cmake -P install_deps.cmake --prefix /some/local/dir
-```
+## Installing the dependencies manually

 If Ubuntu v16 is used then the `OpenSSL` and `Boost` packages can also be installed by:
 ```
@@ -188,3 +186,15 @@ sudo apt-get install libboost-dev
 sudo apt-get install libboost-system-dev
 sudo apt-get install libboost-filesystem-dev
 ```
+
+The `half` header needs to be installed from [here](http://half.sourceforge.net/).
+
+## Using docker
+
+The easiest way is to use Docker. You can build the top-level Dockerfile:
+
+    docker build -t miopen .
+
+Then, to enter the development environment, use `docker run`:
+
+    docker run --device='/dev/kfd' --device='/dev/dri' -v=`pwd`:/data -w /data --group-add video -it miopen
diff --git a/doc/src/cache.md b/doc/src/cache.md
index 71a7b73a29..4ab864f724 100644
--- a/doc/src/cache.md
+++ b/doc/src/cache.md
@@ -6,7 +6,7 @@ MIOpen will cache binary kernels to disk, so they don't need to be compiled the
 Clear the cache
 ---------------

-The cache can be cleared by simply deleting the cache directory(ie `$HOME/.cache/miopen`). This should only be needed for development purposes or to free disk space. The cache does not need to be cleared when upgrading MIOpen.
+The cache can be cleared by simply deleting the cache directory (i.e., `$HOME/.cache/miopen`). This should only be needed for development purposes or to free disk space. The cache does not need to be cleared when upgrading MIOpen.

 Disabling the cache
 -------------------
diff --git a/doc/src/perfdatabase.md b/doc/src/perfdatabase.md
index ce16f88711..20c34a7339 100644
--- a/doc/src/perfdatabase.md
+++ b/doc/src/perfdatabase.md
@@ -11,6 +11,9 @@ MIOpen performs Exhaustive Search only if explicitly requested via MIOpen API an

 The optimized solution found during the successful Search process is written into the PerfDb for future re-use. That is why MIOpen will not Search for optimized solution more than once for a given problem in this mode.

+See the documentation for the miopenFind*() API calls for more information on how Search can be explicitly requested.
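To illustrate the miopenFind*() request mentioned above, here is a minimal sketch in C. It assumes a MIOpen handle, tensor and convolution descriptors, and device buffers that were created and sized elsewhere (for example, the workspace via miopenConvolutionForwardGetWorkSpaceSize()); the helper name is hypothetical.

```
#include <miopen/miopen.h>

/* Hypothetical helper: handle, descriptors, and device buffers (x, w, y,
 * workSpace) are assumed to have been created elsewhere.                 */
miopenStatus_t find_best_fwd_algo(miopenHandle_t handle,
                                  miopenTensorDescriptor_t xDesc, const void* x,
                                  miopenTensorDescriptor_t wDesc, const void* w,
                                  miopenConvolutionDescriptor_t convDesc,
                                  miopenTensorDescriptor_t yDesc, void* y,
                                  void* workSpace, size_t workSpaceSize,
                                  miopenConvFwdAlgorithm_t* bestAlgo)
{
    miopenConvAlgoPerf_t perf[4];
    int returned = 0;

    /* The last argument (exhaustiveSearch) requests the Search described above. */
    miopenStatus_t rc = miopenFindConvolutionForwardAlgorithm(
        handle, xDesc, x, wDesc, w, convDesc, yDesc, y,
        4, &returned, perf, workSpace, workSpaceSize, 1);

    if(rc == miopenStatusSuccess && returned > 0)
        *bestAlgo = perf[0].fwd_algo; /* first entry is the best result found */

    return rc;
}
```

With the final `exhaustiveSearch` argument set to a non-zero value, MIOpen performs the Search and, on success, the optimized solution is written into the PerfDb for future re-use.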
+
+
 **DB_UPDATE (2)**
 Similar to NONE, but Search will NOT be skipped if PerfDb contains relevant record. If Search is requested via MIOpen API, then MIOpen will perform the Search and update PerfDb.
@@ -34,3 +37,15 @@ Note: This mode is intended for tuning the MIOpen installation. When MIOpen is i
 **DB_CLEAN (5)**
 MIOpen removes relevant records from the PerfDb instead of just reading and using those. Search is blocked, even if explicitly requested.
+
+## MIOPEN_FIND_ENFORCE_SCOPE
+
+This variable allows limiting the scope of `MIOPEN_FIND_ENFORCE`, so that only forward, backward data, or backward weights convolutions will be affected. Both symbolic and numeric values are supported, as shown below.
+
+**ALL (1)** `MIOPEN_FIND_ENFORCE` affects all convolutions. This is the default.
+
+**CONV_FWD (2)** `MIOPEN_FIND_ENFORCE` affects only Forward convolutions.
+
+**CONV_BWD (3)** `MIOPEN_FIND_ENFORCE` affects only Backward Data convolutions.
+
+**CONV_WRW (4)** `MIOPEN_FIND_ENFORCE` affects only Backward With Regard to Weights (a.k.a. WRW) convolutions.
diff --git a/doc/src/releasenotes.md b/doc/src/releasenotes.md
index 14af3fcc4f..e8a93e8240 100644
--- a/doc/src/releasenotes.md
+++ b/doc/src/releasenotes.md
@@ -1,6 +1,39 @@
 ## MIOpen Release notes

+### 03/30/2018 [ 1.3.0 ]
+
+Notes:
+
+- Performance improvements for RNNs
+- Performance improvements for convolutions using 1x1 filters
+- Performance improvement for Batch Normalization
+- This release adds preliminary fp16 support for Inference using CNNs
+- Bug fixes for various components of MIOpen
+
+Changes:
+
+- Added 2 new APIs for RNNs: miopenGetRNNLayerParamOffset and miopenGetRNNLayerBiasOffset
+- Added support for uninitialized hidden states and nullptr outputs in RNNs
+- Added support for Set and Scale operations for strided tensors with dimensions 1 to 5
+- Added multi-thread and multi-process support for the performance database
+- Improved performance for OpTensor
+- Fixed bug in convolutions for backward bias
+- Fixed logic issues in get and set layer functions and related w_supertensor test
+- Fixed hang in batch norm with batch sizes greater than 256
+
+Known Issues:
+
+- RNNs do not support fp16
+- Training with CNNs does not support fp16
+
+
+### 03/08/2018 [ 1.2.1 ]
+
+Notes:
+
+- This release adds support for ROCm 1.7.1.
+
 ### 12/15/2017 [ 1.2.0 ]
diff --git a/doc/src/rnn.rst b/doc/src/rnn.rst
index b57fc80c18..3a53861c97 100644
--- a/doc/src/rnn.rst
+++ b/doc/src/rnn.rst
@@ -51,7 +51,6 @@ miopenGetRNNDescriptor

 .. doxygenfunction:: miopenGetRNNDescriptor

-
 miopenDestroyRNNDescriptor
 --------------------------
@@ -135,6 +134,17 @@ miopenSetRNNLayerBias

 .. doxygenfunction:: miopenSetRNNLayerBias

+miopenGetRNNLayerParamOffset
+----------------------------
+
+.. doxygenfunction:: miopenGetRNNLayerParamOffset
+
+
+miopenGetRNNLayerBiasOffset
+---------------------------
+
+.. doxygenfunction:: miopenGetRNNLayerBiasOffset
+
 miopenRNNForwardTraining
 ------------------------
diff --git a/include/miopen/miopen.h b/include/miopen/miopen.h
index 1ac1bcb86f..01aff0be79 100644
--- a/include/miopen/miopen.h
+++ b/include/miopen/miopen.h
@@ -289,11 +289,11 @@ typedef enum {

 /*! @ingroup convolutions
  * @enum miopenConvolutionMode_t
- * Convolution mode selection for convolution layer preference
+ * Convolution mode selection for convolution layer preference.
*/ typedef enum { - miopenConvolution = 0, /*!< Convolutions */ - miopenTranspose = 1, /*!< Transpose convolutions */ + miopenConvolution = 0, /*!< Cross-Correlation convolution */ + miopenTranspose = 1, /*!< Transpose convolutions -- deconvolution */ } miopenConvolutionMode_t; /*! @ingroup padding @@ -613,7 +613,8 @@ MIOPEN_EXPORT miopenStatus_t miopenDestroyConvolutionDescriptor(miopenConvolutionDescriptor_t convDesc); /*! @enum miopenConvFwdAlgorithm_t - * Convolutional algorithm mode for forward propagation. + * Convolutional algorithm mode for forward propagation. MIOpen use cross-correlation for its + * convolution implementation. */ typedef enum { miopenConvolutionFwdAlgoGEMM = 0, /*!< GEMM variant */ @@ -698,11 +699,11 @@ miopenConvolutionForwardGetWorkSpaceSize(miopenHandle_t handle, * to execute this function, miopenConvolutionForwardGetWorkSpaceSize() must be * run to determine the required memory for this search. * - * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If a - * configuration match is not found, a default configuration will be returned. + * * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If + * a configuration match is not found, a default configuration will be returned. * - * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. If - * a match is not found, an exhaustive search is performed by running individual algorithms. + * * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. + * If a match is not found, an exhaustive search is performed by running individual algorithms. * * @param handle MIOpen handle (input) * @param xDesc Tensor descriptor for data input tensor x (input) @@ -831,11 +832,11 @@ miopenConvolutionBackwardDataGetWorkSpaceSize(miopenHandle_t handle, * execute this function, miopenConvolutionBackwardsDataGetWorkSpaceSize() must be run to determine * the required memory for this search. * - * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If a - * configuration match is not found, a default configuration will be returned. + * * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If + * a configuration match is not found, a default configuration will be returned. * - * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. If - * a match is not found, an exhaustive search is performed by running individual algorithms. + * * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. + * If a match is not found, an exhaustive search is performed by running individual algorithms. * * @param handle MIOpen handle (input) * @param dyDesc Tensor descriptor for data input tensor dy (input) @@ -944,11 +945,11 @@ miopenConvolutionBackwardWeightsGetWorkSpaceSize(miopenHandle_t handle, * execute this function, miopenConvolutionBackwardsWeightsGetWorkSpaceSize() must be run to * determine the required memory for this search. * - * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If a - * configuration match is not found, a default configuration will be returned. + * * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If + * a configuration match is not found, a default configuration will be returned. 
* - * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. If - * a match is not found, an exhaustive search is performed by running individual algorithms. + * * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. + * If a match is not found, an exhaustive search is performed by running individual algorithms. * * @param handle MIOpen handle (input) * @param dyDesc Tensor descriptor for data input tensor dy (input) @@ -1357,6 +1358,7 @@ MIOPEN_EXPORT miopenStatus_t miopenDestroyLRNDescriptor(miopenLRNDescriptor_t lr * * This function takes the input tensor descriptor and outputs a derived tensor for the * normalization scale (gamma) and shift (beta) tensors. + * * For an input tensor NCHW and spatial mode, the output derived tensor is 1C11, while for * per-activation the derived tensor is 1CHW. * @@ -1374,8 +1376,10 @@ MIOPEN_EXPORT miopenStatus_t miopenDeriveBNTensorDescriptor(miopenTensorDescript * Batch normalization pass for forward training pass. * Takes in batch normalization mode bn_mode and input tensor x, output tensor y, bnBias and bnScale * with their descriptor. + * * If either resultSaveMean, or resultSaveInvVariance are null pointers then the values for the mean * and inverse variance will not be used. + * * Likewise, if either resultRunningMean, or resultRunningVariance are null pointers then the values * for the running mean and variance will not be saved. * Running averages and variances are scaled using an exponential averaging factor: \f[ @@ -1429,6 +1433,7 @@ miopenBatchNormalizationForwardTraining(miopenHandle_t handle, * Batch normalization pass for forward inference pass. * Takes in batch normalization mode bn_mode and input tensor x, output tensor y, bnBias and bnScale * with their descriptor. + * * If either estimatedMean, or estimatedVariance are null pointers then the values for the mean and * variance will not be used. * @@ -1469,9 +1474,11 @@ miopenBatchNormalizationForwardInference(miopenHandle_t handle, * * Batch normalization pass for backwards propagation training pass. * The method for backwards propagation batch normalization. + * * Takes in batch normalization mode bn_mode and input tensor data x, input activation tensor dy, * output tensor dx, the learned tensors resultBNBiasDiff and resultBNScaleDiff with their * descriptor. + * * If BOTH savedMean, and savedVariance are not null pointers then the method will use the saved * mean and variance calculated by the forward training phase. * @@ -1694,8 +1701,8 @@ MIOPEN_EXPORT miopenStatus_t miopenSoftmaxBackward(miopenHandle_t handle, * RNN mode selection for rnn layer preference */ typedef enum { - miopenRNNRELU = 0, /*!< RNN ReLU squash */ - miopenRNNTANH = 1, /*!< RNN tanh squash */ + miopenRNNRELU = 0, /*!< RNN ReLU activation */ + miopenRNNTANH = 1, /*!< RNN tanh activation */ miopenLSTM = 2, /*!< LSTM */ miopenGRU = 3, /*!< GRU */ } miopenRNNMode_t; @@ -1767,11 +1774,6 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNDescriptor(miopenRNNDescriptor_t rnnDes int* hiddenSize, int* layer); -/* // discuss later -MIOPEN_EXPORT miopenStatus_t miopenGetRNNDescriptor( - miopenRNNDescriptor_t rnnDesc, miopenRNNMode_t* mode, int* seqLength, int* layer, int* bidir -*/ - /*! 
@brief Destroys the tensor descriptor object * * @param rnnDesc RNN tensor descriptor type (input) @@ -1791,7 +1793,7 @@ MIOPEN_EXPORT miopenStatus_t miopenDestroyRNNDescriptor(miopenRNNDescriptor_t rn * @param rnnMode RNN model type (input) * @param biasMode RNN bias included (input) * @param algo RNN algorithm selected (input) - * @param dataType fp32 or fp16 datatype mode, only fp 16 currently supported for RNNs (input) + * @param dataType Only fp32 currently supported for RNNs (input) * @return miopenStatus_t */ MIOPEN_EXPORT miopenStatus_t miopenSetRNNDescriptor(miopenRNNDescriptor_t rnnDesc, @@ -1935,23 +1937,28 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNHiddenTensorSize(miopenHandle_t handle, * For miopenLSTM paramID 0 to 3 refer to the weight matrices associated * with the input GEMM, 4-7 are associated with matrices associated with the * hidden state GEMM. - * ParamID 0 and 4 are for the input gate operations. - * ParamID 1 and 5 are for the forget gate operations. - * ParamID 2 and 6 are for the memory gate operations. - * ParamID 3 and 7 are for the output gate operations. * + * * paramID 0 and 4 are for the input gate operations. + * + * * paramID 1 and 5 are for the forget gate operations. * - * For miopenGRU paramID 0 to 2 refer to the the weight matrices associated - * with the input GEMM, while 5 through 6 are associated with the hidden state + * * paramID 2 and 6 are for the memory gate operations. + * + * * paramID 3 and 7 are for the output gate operations. + * + * For miopenGRU paramID 0 to 2 refer to the weight matrix offset associated + * with the input GEMM, while 3 through 5 are associated with the hidden state * GEMM. - * ParamID 0 and 4 are for the reset gate operations. - * ParamID 1 and 5 are for the update gate operations. - * ParamID 2 and 6 are for the memory gate operations. + * + * * paramID 0 and 3 are for the reset gate operations. + * + * * paramID 1 and 4 are for the update gate operations. + * + * * paramID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. * - * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) * @param layer The layer number in the RNN stack (input) @@ -1973,27 +1980,30 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerParamSize(miopenHandle_t handle, * weight matrix associated with the in input GEMM, while biasID == 1 retrieves * the bias associated with the hidden state GEMM. * - * For miopenLSTM paramID 0 to 3 refer to the biases associated + * For miopenLSTM biasID 0 to 3 refer to the biases associated * with the input GEMM, 4-7 are associated with biases associated with the * hidden state GEMM. - * biasID 0 and 4 are for the input gate operations. - * biasID 1 and 5 are for the forget gate operations. - * biasID 2 and 6 are for the memory gate operations. - * biasID 3 and 7 are for the output gate operations. * + * * biasID 0 and 4 are for the input gate operations. * - * For miopenGRU biasID 0 to 2 refer to the biases associated - * with the input GEMM, while 5 through 6 are associated with the hidden state - * GEMM. - * biasID 0 and 4 are for the reset gate operations. - * biasID 1 and 5 are for the update gate operations. - * biasID 2 and 6 are for the memory gate operations. + * * biasID 1 and 5 are for the forget gate operations. * + * * biasID 2 and 6 are for the memory gate operations. + * + * * biasID 3 and 7 are for the output gate operations. 
+ * + * For miopenGRU biasID 0 to 2 refer to the biases associated with the input GEMM, + * while 3 through 5 are associated with the hidden state GEMM. + * + * * biasID 0 and 3 are for the reset gate operations. + * + * * biasID 1 and 4 are for the update gate operations. + * + * * biasID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. * - * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) * @param layer The layer number in the RNN stack (input) @@ -2019,18 +2029,24 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBiasSize(miopenHandle_t handle, * For miopenLSTM paramID 0 to 3 refer to the weight matrices associated * with the input GEMM, 4-7 are associated with matrices associated with the * hidden state GEMM. - * ParamID 0 and 4 are for the input gate operations. - * ParamID 1 and 5 are for the forget gate operations. - * ParamID 2 and 6 are for the memory gate operations. - * ParamID 3 and 7 are for the output gate operations. * + * * paramID 0 and 4 are for the input gate operations. + * + * * paramID 1 and 5 are for the forget gate operations. + * + * * paramID 2 and 6 are for the memory gate operations. * - * For miopenGRU paramID 0 to 2 refer to the weight matrices associated - * with the input GEMM, while 5 through 6 are associated with the hidden state + * * paramID 3 and 7 are for the output gate operations. + * + * For miopenGRU paramID 0 to 2 refer to the weight matrix offset associated + * with the input GEMM, while 3 through 5 are associated with the hidden state * GEMM. - * ParamID 0 and 4 are for the reset gate operations. - * ParamID 1 and 5 are for the update gate operations. - * ParamID 2 and 6 are for the memory gate operations. + * + * * paramID 0 and 3 are for the reset gate operations. + * + * * paramID 1 and 4 are for the update gate operations. + * + * * paramID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. @@ -2044,6 +2060,9 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBiasSize(miopenHandle_t handle, * nullptr then only the paramDesc is populated and returned. The size in bytes of the * layer parameter matrix can be determined by using miopenGetRNNLayerParamSize(). * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. In this case miopenGetRNNLayerParam() will return + * a error status miopenStatusBadParm for input paramID associated with the input GEMM. * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) @@ -2075,23 +2094,28 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerParam(miopenHandle_t handle, * bias associated with the in input GEMM, while biasID == 1 retrieves * the bias associated with the hidden state GEMM. * - * For miopenLSTM paramID 0 to 3 refer to the biases associated + * For miopenLSTM biasID 0 to 3 refer to the biases associated * with the input GEMM, 4-7 are associated with biases associated with the * hidden state GEMM. - * biasID 0 and 4 are for the input gate operations. - * biasID 1 and 5 are for the forget gate operations. - * biasID 2 and 6 are for the memory gate operations. - * biasID 3 and 7 are for the output gate operations. * + * * biasID 0 and 4 are for the input gate operations. 
* - * For miopenGRU biasID 0 to 2 refer to the biases associated - * with the input GEMM, while 5 through 6 are associated with the hidden state - * GEMM. - * biasID 0 and 4 are for the reset gate operations. - * biasID 1 and 5 are for the update gate operations. - * biasID 2 and 6 are for the memory gate operations. + * * biasID 1 and 5 are for the forget gate operations. + * + * * biasID 2 and 6 are for the memory gate operations. + * + * * biasID 3 and 7 are for the output gate operations. * * + * For miopenGRU biasID 0 to 2 refer to the biases associated with the input GEMM, + * while 3 through 5 are associated with the hidden state GEMM. + * + * * biasID 0 and 3 are for the reset gate operations. + * + * * biasID 1 and 4 are for the update gate operations. + * + * * biasID 2 and 5 are for the memory gate operations. + * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. * @@ -2104,6 +2128,9 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerParam(miopenHandle_t handle, * nullptr then only the biasDesc is populated and returned. The size in bytes of the * layer bias can be determined by using miopenGetRNNLayerBiasSize(). * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. In this case miopenGetRNNLayerBias() will return + * a error status miopenStatusBadParm for input biasID associated with the input GEMM. * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) @@ -2138,18 +2165,24 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBias(miopenHandle_t handle, * For miopenLSTM paramID 0 to 3 refer to the weight matrix offsets associated * with the input GEMM, 4-7 are associated with matrix offset associated with the * hidden state GEMM. - * ParamID 0 and 4 are for the input gate operations. - * ParamID 1 and 5 are for the forget gate operations. - * ParamID 2 and 6 are for the memory gate operations. - * ParamID 3 and 7 are for the output gate operations. * + * * paramID 0 and 4 are for the input gate operations. + * + * * paramID 1 and 5 are for the forget gate operations. + * + * * paramID 2 and 6 are for the memory gate operations. + * + * * paramID 3 and 7 are for the output gate operations. * * For miopenGRU paramID 0 to 2 refer to the weight matrix offset associated - * with the input GEMM, while 5 through 6 are associated with the hidden state + * with the input GEMM, while 3 through 5 are associated with the hidden state * GEMM. - * ParamID 0 and 4 are for the reset gate operations. - * ParamID 1 and 5 are for the update gate operations. - * ParamID 2 and 6 are for the memory gate operations. + * + * * paramID 0 and 3 are for the reset gate operations. + * + * * paramID 1 and 4 are for the update gate operations. + * + * * paramID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. @@ -2161,6 +2194,10 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBias(miopenHandle_t handle, * The argument layerParamOffset should either be nullptr, or an address to place the * offset. If layerParamOffset is nullptr then only the paramDesc is populated and returned. * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. 
In this case miopenGetRNNLayerParamOffset() will return + * a error status miopenStatusBadParm for input paramID associated with the input GEMM. + * * * @param rnnDesc RNN layer descriptor type (input) * @param layer The layer number in the RNN stack (input) @@ -2185,21 +2222,25 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerParamOffset(miopenRNNDescriptor_t * bias associated with the in input GEMM, while biasID == 1 retrieves * the weight matrix associated with the hidden state GEMM. * - * For miopenLSTM paramID 0 to 3 refer to the bias offset associated + * For miopenLSTM biasID 0 to 3 refer to the bias offset associated * with the input GEMM, 4-7 are the bias offsets associated with the hidden state GEMM. - * biasID 0 and 4 are for the input gate operations. - * biasID 1 and 5 are for the forget gate operations. - * biasID 2 and 6 are for the memory gate operations. - * biasID 3 and 7 are for the output gate operations. * + * * biasID 0 and 4 are for the input gate operations. * - * For miopenGRU biasID 0 to 2 refer to the bias offsets associated - * with the input GEMM, while 5 through 6 are associated with the hidden state - * GEMM. - * biasID 0 and 4 are for the reset gate operations. - * biasID 1 and 5 are for the update gate operations. - * biasID 2 and 6 are for the memory gate operations. + * * biasID 1 and 5 are for the forget gate operations. + * + * * biasID 2 and 6 are for the memory gate operations. + * + * * biasID 3 and 7 are for the output gate operations. + * + * For miopenGRU biasID 0 to 2 refer to the biases associated with the input GEMM, + * while 3 through 5 are associated with the hidden state GEMM. + * + * * biasID 0 and 3 are for the reset gate operations. * + * * biasID 1 and 4 are for the update gate operations. + * + * * biasID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. @@ -2211,6 +2252,9 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerParamOffset(miopenRNNDescriptor_t * The argument layerBiasOffset should either be nullptr, or point to an output address. * If layerBias is nullptr then only the biasDesc is populated and returned. * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. In this case miopenGetRNNLayerBiasOffset() will return + * a error status miopenStatusBadParm for input biasID associated with the input GEMM. * * @param rnnDesc RNN layer descriptor type (input) * @param layer The layer number in the RNN stack (input) @@ -2239,18 +2283,25 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBiasOffset(miopenRNNDescriptor_t r * For miopenLSTM paramID 0 to 3 refer to the weight matrices associated * with the input GEMM, 4-7 are associated with matrices associated with the * hidden state GEMM. - * ParamID 0 and 4 are for the input gate operations. - * ParamID 1 and 5 are for the forget gate operations. - * ParamID 2 and 6 are for the memory gate operations. - * ParamID 3 and 7 are for the output gate operations. * + * * paramID 0 and 4 are for the input gate operations. + * + * * paramID 1 and 5 are for the forget gate operations. + * + * * paramID 2 and 6 are for the memory gate operations. + * + * * paramID 3 and 7 are for the output gate operations. 
* - * For miopenGRU paramID 0 to 2 refer to the weight matrices associated - * with the input GEMM, while 5 through 6 are associated with the hidden state + * + * For miopenGRU paramID 0 to 2 refer to the weight matrix offset associated + * with the input GEMM, while 3 through 5 are associated with the hidden state * GEMM. - * ParamID 0 and 4 are for the reset gate operations. - * ParamID 1 and 5 are for the update gate operations. - * ParamID 2 and 6 are for the memory gate operations. + * + * * paramID 0 and 3 are for the reset gate operations. + * + * * paramID 1 and 4 are for the update gate operations. + * + * * paramID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. @@ -2258,6 +2309,9 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBiasOffset(miopenRNNDescriptor_t r * The input argument paramDesc is a previously populated tensor descriptor typically * by first calling miopenGetRNNLayerParam(). * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. In this case miopenSetRNNLayerParam() will return + * a error status miopenStatusBadParm for input paramID associated with the input GEMM. * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) @@ -2288,22 +2342,27 @@ MIOPEN_EXPORT miopenStatus_t miopenSetRNNLayerParam(miopenHandle_t handle, * weight matrix associated with the in input GEMM, while biasID == 1 retrieves * the bias associated with the hidden state GEMM. * - * For miopenLSTM paramID 0 to 3 refer to the biases associated + * For miopenLSTM biasID 0 to 3 refer to the biases associated * with the input GEMM, 4-7 are associated with the biases associated with the * hidden state GEMM. - * biasID 0 and 4 are for the input gate operations. - * biasID 1 and 5 are for the forget gate operations. - * biasID 2 and 6 are for the memory gate operations. - * biasID 3 and 7 are for the output gate operations. * + * * biasID 0 and 4 are for the input gate operations. * - * For miopenGRU biasID 0 to 2 refer to the biases associated - * with the input GEMM, while 5 through 6 are associated with the hidden state - * GEMM. - * biasID 0 and 4 are for the reset gate operations. - * biasID 1 and 5 are for the update gate operations. - * biasID 2 and 6 are for the memory gate operations. + * * biasID 1 and 5 are for the forget gate operations. + * + * * biasID 2 and 6 are for the memory gate operations. + * + * * biasID 3 and 7 are for the output gate operations. + * + * + * For miopenGRU biasID 0 to 2 refer to the biases associated with the input GEMM, + * while 3 through 5 are associated with the hidden state GEMM. + * + * * biasID 0 and 3 are for the reset gate operations. + * + * * biasID 1 and 4 are for the update gate operations. * + * * biasID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. @@ -2311,6 +2370,9 @@ MIOPEN_EXPORT miopenStatus_t miopenSetRNNLayerParam(miopenHandle_t handle, * The input argument biasDesc is a previously populated tensor descriptor typically * by first calling miopenGetRNNLayeBias(). * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. 
In this case miopenSetRNNLayerBias will return + * a error status miopenStatusBadParm for input biasID associated with the input GEMM. * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) @@ -2351,13 +2413,15 @@ MIOPEN_EXPORT miopenStatus_t miopenSetRNNLayerBias(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hx Pointer to the hidden layer input tensor (input) + * @param hx Pointer to the hidden layer input tensor. If hx is NULL, + * then the initial hidden state will be zero initialized. (input) * @param cxDesc A cell tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param cx Pointer to the cell layer input tensor (input) + * @param cx Pointer to the cell layer input tensor. If cx is NULL, + * then the initial cell state will be zero initialized. (input) * @param wDesc A weights tensor descriptor (input) * @param w Pointer to input weights tensor (input) * @param yDesc An array of fully packed tensor descriptors associated @@ -2373,13 +2437,15 @@ MIOPEN_EXPORT miopenStatus_t miopenSetRNNLayerBias(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hy Pointer to the hidden layer output tensor (output) + * @param hy Pointer to the hidden layer output tensor. If hy is NULL, + * then the final hidden state will not be saved. (output) * @param cyDesc A cell tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param cy Pointer to the cell layer output tensor (output) + * @param cy Pointer to the cell layer output tensor. If hy is NULL, + * then the final cell state will not be saved. (output) * @param workSpace Pointer to memory allocated for forward training (input) * @param workSpaceNumBytes Number of allocated bytes in memory for the workspace (input) * @param reserveSpace Pointer to memory allocated for random states (input / output) @@ -2436,7 +2502,8 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNForwardTraining(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param dcy Pointer to the cell layer input tensor (input) + * @param dcy Pointer to the cell layer input tensor. If dcy is NULL, + * then the initial delta cell state will be zero initialized. 
(input) * @param wDesc A weights tensor descriptor (input) * @param w Pointer to input weights tensor (input) * @param hxDesc An input hidden tensor descriptor that has as its first dimension @@ -2444,13 +2511,15 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNForwardTraining(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hx Pointer to output tensor (input) + * @param hx Pointer to the hidden layer input tensor. If hx is NULL, + * then the initial hidden state will be zero initialized. (input) * @param cxDesc A input cell tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param cx Pointer to the hidden layer output tensor (input) + * @param cx Pointer to the hidden layer input tensor. If cx is NULL, + * then the initial cell state will be zero initialized. (input) * @param dxDesc An array of tensor descriptors. These are the * input descriptors to each time step. The first dimension of each descriptor is the * batch size and may decrease from element n to element n+1 and not increase in size. @@ -2462,13 +2531,15 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNForwardTraining(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param dhx Pointer to the cell layer output tensor (output) + * @param dhx Pointer to the delta hidden layer output tensor. If dhx is NULL + * the hidden gradient will not ouput. (output) * @param dcxDesc A tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param dcx Pointer to the cell layer output tensor (output) + * @param dcx Pointer to the cell layer output tensor. If dcx is NULL + * the cell gradient will not ouput. (output) * @param workSpace Pointer to memory allocated for forward training (input) * @param workSpaceNumBytes Number of allocated bytes in memory for the workspace (input) * @param reserveSpace Pointer to memory allocated for random states (input / output) @@ -2521,7 +2592,8 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNBackwardData(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hx Pointer to the hidden layer input tensor (input) + * @param hx Pointer to the hidden layer input tensor. If hx is NULL, + * then the initial hidden state will be zero initialized. (input) * @param yDesc An array of fully packed tensor descriptors associated * with the output from each time step. 
The first dimension of the tensor descriptors * must equal the first dimension of the first descriptor (batch size) in the xDesc @@ -2529,9 +2601,9 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNBackwardData(miopenHandle_t handle, * depends on the direction mode selected. If the direction mode is unidirectional, * the second dimension is the hiddenSize. If direction mode is bidirectional * the second dimension is twice the hiddenSize. (input) - * @param y Pointer to the cell layer input tensor (input) + * @param y Pointer to the output tensor (input) * @param dwDesc A weights tensor descriptor (input) - * @param dw Pointer to input weights tensor (output) + * @param dw Pointer to input weights tensor (input / output) * @param workSpace Pointer to memory allocated for forward training (input) * @param workSpaceNumBytes Number of allocated bytes in memory for the workspace (input) * @param reserveSpace Pointer to memory allocated for random states (input) @@ -2572,13 +2644,15 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNBackwardWeights(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hx Pointer to the hidden layer input tensor (input) + * @param hx Pointer to the hidden layer input tensor. If hx is NULL, + * then the initial hidden state will be zero initialized. (input) * @param cxDesc A cell tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param cx Pointer to the cell layer input tensor (input) + * @param cx Pointer to the cell layer input tensor. If cx is NULL, + * then the initial cell state will be zero initialized. (input) * @param wDesc A weights tensor descriptor (input) * @param w Pointer to input weights tensor (input) * @param yDesc An array of fully packed tensor descriptors associated @@ -2594,13 +2668,15 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNBackwardWeights(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hy Pointer to the hidden layer output tensor (output) + * @param hy Pointer to the hidden layer output tensor. If hy is NULL, + * then the final hidden state will not be saved. (output) * @param cyDesc A output cell tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param cy Pointer to the cell layer output tensor (output) + * @param cy Pointer to the cell layer output tensor. If cy is NULL, + * then the final cell state will not be saved. 
(output)
 * @param workSpace           Pointer to memory allocated for forward training (input)
 * @param workSpaceNumBytes   Number of allocated bytes in memory for the workspace (input)
 * @return                    miopenStatus_t
diff --git a/src/kernels/MIOpenLRNBwd.cl b/src/kernels/MIOpenLRNBwd.cl
index b5c235b25e..9bdb6f7529 100644
--- a/src/kernels/MIOpenLRNBwd.cl
+++ b/src/kernels/MIOpenLRNBwd.cl
@@ -40,7 +40,7 @@
 #endif

 #define _FLOAT2 PPCAT(_FLOAT, TWO)
-#define _FLOAT2 PPCAT(_FLOAT, THREE)
+#define _FLOAT3 PPCAT(_FLOAT, THREE)
 #define _FLOAT4 PPCAT(_FLOAT, FOUR)
 #define _FLOAT8 PPCAT(_FLOAT, EIGHT)
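For context, the kernel change above fixes a copy-paste bug: `_FLOAT2` was defined twice while no `_FLOAT3` alias existed. Below is a minimal sketch of the two-level token-pasting idiom these lines rely on, assuming the usual `PPCAT`/`PPCAT_NX` definitions that appear earlier in the kernel sources; the exact definitions there may differ in detail.

```
/* Two-level paste: PPCAT expands its arguments (e.g. TWO -> 2) before pasting, */
/* so PPCAT(_FLOAT, TWO) becomes float2 rather than _FLOATTWO.                  */
#define PPCAT_NX(A, B) A##B        /* paste without expanding  */
#define PPCAT(A, B) PPCAT_NX(A, B) /* expand, then paste       */

#define _FLOAT float
#define TWO 2
#define THREE 3

#define _FLOAT2 PPCAT(_FLOAT, TWO)   /* -> float2 */
#define _FLOAT3 PPCAT(_FLOAT, THREE) /* -> float3; the replaced line redefined _FLOAT2 instead */
```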