diff --git a/README.md b/README.md
index 1da29d5bb6..5c2706f597 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,17 @@ AMD's library for high peformance machine learning primitives. MIOpen supports t
 * [Boost](http://www.boost.org/) at least version 1.58
   * MIOpen uses `boost-system` and `boost-filesystem` packages to enable persistent [kernel cache](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/doc/src/cache.md)

-Instructions to install the above dependencies are present in this [section](#installing-the-dependencies).
+## Installing the dependencies
+
+The dependencies can be installed with the `install_deps.cmake` script: `cmake -P install_deps.cmake`
+
+
+This will install to `/usr/local` by default, but it can be installed in another location with the `--prefix` argument:
+```
+cmake -P install_deps.cmake --prefix /some/local/dir
+```
+
+Instructions to manually install all the dependencies on Ubuntu v16 are present in this [section](#installing-the-dependencies-manually).

 ## Installing MIOpen with pre-built packages
@@ -167,19 +177,7 @@ Also, githooks can be installed to format the code per-commit:
 ./.githooks/install
 ```

-## Installing the dependencies
-
-The dependencies can be installed with the `install_deps.cmake`, script:
-
-```
-cmake -P install_deps.cmake
-```
-
-This will install by default to `/usr/local` but it can be installed in another location with `--prefix` argument:
-
-```
-cmake -P install_deps.cmake --prefix /some/local/dir
-```
+## Installing the dependencies manually

 If Ubuntu v16 is used then the `OpenSSL` and `Boost` packages can also be installed by:
 ```
@@ -188,3 +186,15 @@ sudo apt-get install libboost-dev
 sudo apt-get install libboost-system-dev
 sudo apt-get install libboost-filesystem-dev
 ```
+
+The `half` header needs to be installed from [here](http://half.sourceforge.net/).
+
+## Using docker
+
+The easiest way is to use Docker. You can build the top-level Dockerfile:
+
+    docker build -t miopen .
+
+Then, to enter the development environment, use `docker run`:
+
+    docker run --device='/dev/kfd' --device='/dev/dri' -v=`pwd`:/data -w /data --group-add video -it miopen
diff --git a/doc/src/cache.md b/doc/src/cache.md
index 71a7b73a29..4ab864f724 100644
--- a/doc/src/cache.md
+++ b/doc/src/cache.md
@@ -6,7 +6,7 @@ MIOpen will cache binary kernels to disk, so they don't need to be compiled the
 Clear the cache
 ---------------

-The cache can be cleared by simply deleting the cache directory(ie `$HOME/.cache/miopen`). This should only be needed for development purposes or to free disk space. The cache does not need to be cleared when upgrading MIOpen.
+The cache can be cleared by simply deleting the cache directory (i.e., `$HOME/.cache/miopen`). This should only be needed for development purposes or to free disk space. The cache does not need to be cleared when upgrading MIOpen.

 Disabling the cache
 -------------------
diff --git a/doc/src/perfdatabase.md b/doc/src/perfdatabase.md
index ce16f88711..20c34a7339 100644
--- a/doc/src/perfdatabase.md
+++ b/doc/src/perfdatabase.md
@@ -11,6 +11,9 @@ MIOpen performs Exhaustive Search only if explicitly requested via MIOpen API an

 The optimized solution found during the successful Search process is written into the PerfDb for future re-use. That is why MIOpen will not Search for optimized solution more than once for a given problem in this mode.

+See the documentation for the miopenFind*() API calls for more information on how Search can be explicitly requested.
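To illustrate the miopenFind*() request mentioned above, here is a minimal sketch in C. It assumes a MIOpen handle, tensor and convolution descriptors, and device buffers that were created and sized elsewhere (for example, the workspace via miopenConvolutionForwardGetWorkSpaceSize()); the helper name is hypothetical.

```
#include <miopen/miopen.h>

/* Hypothetical helper: handle, descriptors, and device buffers (x, w, y,
 * workSpace) are assumed to have been created elsewhere.                 */
miopenStatus_t find_best_fwd_algo(miopenHandle_t handle,
                                  miopenTensorDescriptor_t xDesc, const void* x,
                                  miopenTensorDescriptor_t wDesc, const void* w,
                                  miopenConvolutionDescriptor_t convDesc,
                                  miopenTensorDescriptor_t yDesc, void* y,
                                  void* workSpace, size_t workSpaceSize,
                                  miopenConvFwdAlgorithm_t* bestAlgo)
{
    miopenConvAlgoPerf_t perf[4];
    int returned = 0;

    /* The last argument (exhaustiveSearch) requests the Search described above. */
    miopenStatus_t rc = miopenFindConvolutionForwardAlgorithm(
        handle, xDesc, x, wDesc, w, convDesc, yDesc, y,
        4, &returned, perf, workSpace, workSpaceSize, 1);

    if(rc == miopenStatusSuccess && returned > 0)
        *bestAlgo = perf[0].fwd_algo; /* first entry is the best result found */

    return rc;
}
```

With the final `exhaustiveSearch` argument set to a non-zero value, MIOpen performs the Search and, on success, the optimized solution is written into the PerfDb for future re-use.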
+
+
 **DB_UPDATE (2)**
 Similar to NONE, but Search will NOT be skipped if PerfDb contains relevant record. If Search is requested via MIOpen API, then MIOpen will perform the Search and update PerfDb.
@@ -34,3 +37,15 @@ Note: This mode is intended for tuning the MIOpen installation. When MIOpen is i
 **DB_CLEAN (5)**
 MIOpen removes relevant records from the PerfDb instead of just reading and using those. Search is blocked, even if explicitly requested.
+
+## MIOPEN_FIND_ENFORCE_SCOPE
+
+This variable allows limiting the scope of `MIOPEN_FIND_ENFORCE`, so that only forward, backward data, or backward weights convolutions will be affected. Both symbolic and numeric values are supported, as shown below.
+
+**ALL (1)** `MIOPEN_FIND_ENFORCE` affects all convolutions. This is the default.
+
+**CONV_FWD (2)** `MIOPEN_FIND_ENFORCE` affects only Forward convolutions.
+
+**CONV_BWD (3)** `MIOPEN_FIND_ENFORCE` affects only Backward Data convolutions.
+
+**CONV_WRW (4)** `MIOPEN_FIND_ENFORCE` affects only Backward With Regard to Weights (a.k.a. WRW) convolutions.
diff --git a/doc/src/releasenotes.md b/doc/src/releasenotes.md
index 14af3fcc4f..e8a93e8240 100644
--- a/doc/src/releasenotes.md
+++ b/doc/src/releasenotes.md
@@ -1,6 +1,39 @@
 ## MIOpen Release notes

+### 03/30/2018 [ 1.3.0 ]
+
+Notes:
+
+- Performance improvements for RNNs
+- Performance improvements for convolutions using 1x1 filters
+- Performance improvement for Batch Normalization
+- This release adds preliminary fp16 support for Inference using CNNs
+- Bug fixes for various components of MIOpen
+
+Changes:
+
+- Added 2 new APIs for RNNs: miopenGetRNNLayerParamOffset and miopenGetRNNLayerBiasOffset
+- Added support for uninitialized hidden states and nullptr outputs in RNNs
+- Added support for Set and Scale operations for strided tensors with dimensions 1 to 5
+- Added multi-thread and multi-process support for the performance database
+- Improved performance for OpTensor
+- Fixed bug in convolutions for backward bias
+- Fixed logic issues in get and set layer functions and related w_supertensor test
+- Fixed hang in batch norm with batch sizes greater than 256
+
+Known Issues:
+
+- RNNs do not support fp16
+- Training with CNNs does not support fp16
+
+
+### 03/08/2018 [ 1.2.1 ]
+
+Notes:
+
+- This release adds support for ROCm 1.7.1.
+
 ### 12/15/2017 [ 1.2.0 ]
diff --git a/doc/src/rnn.rst b/doc/src/rnn.rst
index b57fc80c18..3a53861c97 100644
--- a/doc/src/rnn.rst
+++ b/doc/src/rnn.rst
@@ -51,7 +51,6 @@ miopenGetRNNDescriptor

 .. doxygenfunction:: miopenGetRNNDescriptor

-
 miopenDestroyRNNDescriptor
 --------------------------
@@ -135,6 +134,17 @@ miopenSetRNNLayerBias

 .. doxygenfunction:: miopenSetRNNLayerBias

+miopenGetRNNLayerParamOffset
+----------------------------
+
+.. doxygenfunction:: miopenGetRNNLayerParamOffset
+
+
+miopenGetRNNLayerBiasOffset
+---------------------------
+
+.. doxygenfunction:: miopenGetRNNLayerBiasOffset
+
 miopenRNNForwardTraining
 ------------------------
diff --git a/include/miopen/miopen.h b/include/miopen/miopen.h
index 1ac1bcb86f..01aff0be79 100644
--- a/include/miopen/miopen.h
+++ b/include/miopen/miopen.h
@@ -289,11 +289,11 @@ typedef enum {

 /*! @ingroup convolutions
  * @enum miopenConvolutionMode_t
- * Convolution mode selection for convolution layer preference
+ * Convolution mode selection for convolution layer preference.
*/ typedef enum { - miopenConvolution = 0, /*!< Convolutions */ - miopenTranspose = 1, /*!< Transpose convolutions */ + miopenConvolution = 0, /*!< Cross-Correlation convolution */ + miopenTranspose = 1, /*!< Transpose convolutions -- deconvolution */ } miopenConvolutionMode_t; /*! @ingroup padding @@ -613,7 +613,8 @@ MIOPEN_EXPORT miopenStatus_t miopenDestroyConvolutionDescriptor(miopenConvolutionDescriptor_t convDesc); /*! @enum miopenConvFwdAlgorithm_t - * Convolutional algorithm mode for forward propagation. + * Convolutional algorithm mode for forward propagation. MIOpen use cross-correlation for its + * convolution implementation. */ typedef enum { miopenConvolutionFwdAlgoGEMM = 0, /*!< GEMM variant */ @@ -698,11 +699,11 @@ miopenConvolutionForwardGetWorkSpaceSize(miopenHandle_t handle, * to execute this function, miopenConvolutionForwardGetWorkSpaceSize() must be * run to determine the required memory for this search. * - * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If a - * configuration match is not found, a default configuration will be returned. + * * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If + * a configuration match is not found, a default configuration will be returned. * - * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. If - * a match is not found, an exhaustive search is performed by running individual algorithms. + * * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. + * If a match is not found, an exhaustive search is performed by running individual algorithms. * * @param handle MIOpen handle (input) * @param xDesc Tensor descriptor for data input tensor x (input) @@ -831,11 +832,11 @@ miopenConvolutionBackwardDataGetWorkSpaceSize(miopenHandle_t handle, * execute this function, miopenConvolutionBackwardsDataGetWorkSpaceSize() must be run to determine * the required memory for this search. * - * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If a - * configuration match is not found, a default configuration will be returned. + * * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If + * a configuration match is not found, a default configuration will be returned. * - * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. If - * a match is not found, an exhaustive search is performed by running individual algorithms. + * * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. + * If a match is not found, an exhaustive search is performed by running individual algorithms. * * @param handle MIOpen handle (input) * @param dyDesc Tensor descriptor for data input tensor dy (input) @@ -944,11 +945,11 @@ miopenConvolutionBackwardWeightsGetWorkSpaceSize(miopenHandle_t handle, * execute this function, miopenConvolutionBackwardsWeightsGetWorkSpaceSize() must be run to * determine the required memory for this search. * - * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If a - * configuration match is not found, a default configuration will be returned. + * * If exhaustiveSearch == 0, MIOpen will look for the first kernel with a configuration match. If + * a configuration match is not found, a default configuration will be returned. 
* - * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. If - * a match is not found, an exhaustive search is performed by running individual algorithms. + * * If exhaustiveSearch == 1, MIOpen will look for the best kernel for the provided configuration. + * If a match is not found, an exhaustive search is performed by running individual algorithms. * * @param handle MIOpen handle (input) * @param dyDesc Tensor descriptor for data input tensor dy (input) @@ -1357,6 +1358,7 @@ MIOPEN_EXPORT miopenStatus_t miopenDestroyLRNDescriptor(miopenLRNDescriptor_t lr * * This function takes the input tensor descriptor and outputs a derived tensor for the * normalization scale (gamma) and shift (beta) tensors. + * * For an input tensor NCHW and spatial mode, the output derived tensor is 1C11, while for * per-activation the derived tensor is 1CHW. * @@ -1374,8 +1376,10 @@ MIOPEN_EXPORT miopenStatus_t miopenDeriveBNTensorDescriptor(miopenTensorDescript * Batch normalization pass for forward training pass. * Takes in batch normalization mode bn_mode and input tensor x, output tensor y, bnBias and bnScale * with their descriptor. + * * If either resultSaveMean, or resultSaveInvVariance are null pointers then the values for the mean * and inverse variance will not be used. + * * Likewise, if either resultRunningMean, or resultRunningVariance are null pointers then the values * for the running mean and variance will not be saved. * Running averages and variances are scaled using an exponential averaging factor: \f[ @@ -1429,6 +1433,7 @@ miopenBatchNormalizationForwardTraining(miopenHandle_t handle, * Batch normalization pass for forward inference pass. * Takes in batch normalization mode bn_mode and input tensor x, output tensor y, bnBias and bnScale * with their descriptor. + * * If either estimatedMean, or estimatedVariance are null pointers then the values for the mean and * variance will not be used. * @@ -1469,9 +1474,11 @@ miopenBatchNormalizationForwardInference(miopenHandle_t handle, * * Batch normalization pass for backwards propagation training pass. * The method for backwards propagation batch normalization. + * * Takes in batch normalization mode bn_mode and input tensor data x, input activation tensor dy, * output tensor dx, the learned tensors resultBNBiasDiff and resultBNScaleDiff with their * descriptor. + * * If BOTH savedMean, and savedVariance are not null pointers then the method will use the saved * mean and variance calculated by the forward training phase. * @@ -1694,8 +1701,8 @@ MIOPEN_EXPORT miopenStatus_t miopenSoftmaxBackward(miopenHandle_t handle, * RNN mode selection for rnn layer preference */ typedef enum { - miopenRNNRELU = 0, /*!< RNN ReLU squash */ - miopenRNNTANH = 1, /*!< RNN tanh squash */ + miopenRNNRELU = 0, /*!< RNN ReLU activation */ + miopenRNNTANH = 1, /*!< RNN tanh activation */ miopenLSTM = 2, /*!< LSTM */ miopenGRU = 3, /*!< GRU */ } miopenRNNMode_t; @@ -1767,11 +1774,6 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNDescriptor(miopenRNNDescriptor_t rnnDes int* hiddenSize, int* layer); -/* // discuss later -MIOPEN_EXPORT miopenStatus_t miopenGetRNNDescriptor( - miopenRNNDescriptor_t rnnDesc, miopenRNNMode_t* mode, int* seqLength, int* layer, int* bidir -*/ - /*! 
@brief Destroys the tensor descriptor object * * @param rnnDesc RNN tensor descriptor type (input) @@ -1791,7 +1793,7 @@ MIOPEN_EXPORT miopenStatus_t miopenDestroyRNNDescriptor(miopenRNNDescriptor_t rn * @param rnnMode RNN model type (input) * @param biasMode RNN bias included (input) * @param algo RNN algorithm selected (input) - * @param dataType fp32 or fp16 datatype mode, only fp 16 currently supported for RNNs (input) + * @param dataType Only fp32 currently supported for RNNs (input) * @return miopenStatus_t */ MIOPEN_EXPORT miopenStatus_t miopenSetRNNDescriptor(miopenRNNDescriptor_t rnnDesc, @@ -1935,23 +1937,28 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNHiddenTensorSize(miopenHandle_t handle, * For miopenLSTM paramID 0 to 3 refer to the weight matrices associated * with the input GEMM, 4-7 are associated with matrices associated with the * hidden state GEMM. - * ParamID 0 and 4 are for the input gate operations. - * ParamID 1 and 5 are for the forget gate operations. - * ParamID 2 and 6 are for the memory gate operations. - * ParamID 3 and 7 are for the output gate operations. * + * * paramID 0 and 4 are for the input gate operations. + * + * * paramID 1 and 5 are for the forget gate operations. * - * For miopenGRU paramID 0 to 2 refer to the the weight matrices associated - * with the input GEMM, while 5 through 6 are associated with the hidden state + * * paramID 2 and 6 are for the memory gate operations. + * + * * paramID 3 and 7 are for the output gate operations. + * + * For miopenGRU paramID 0 to 2 refer to the weight matrix offset associated + * with the input GEMM, while 3 through 5 are associated with the hidden state * GEMM. - * ParamID 0 and 4 are for the reset gate operations. - * ParamID 1 and 5 are for the update gate operations. - * ParamID 2 and 6 are for the memory gate operations. + * + * * paramID 0 and 3 are for the reset gate operations. + * + * * paramID 1 and 4 are for the update gate operations. + * + * * paramID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. * - * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) * @param layer The layer number in the RNN stack (input) @@ -1973,27 +1980,30 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerParamSize(miopenHandle_t handle, * weight matrix associated with the in input GEMM, while biasID == 1 retrieves * the bias associated with the hidden state GEMM. * - * For miopenLSTM paramID 0 to 3 refer to the biases associated + * For miopenLSTM biasID 0 to 3 refer to the biases associated * with the input GEMM, 4-7 are associated with biases associated with the * hidden state GEMM. - * biasID 0 and 4 are for the input gate operations. - * biasID 1 and 5 are for the forget gate operations. - * biasID 2 and 6 are for the memory gate operations. - * biasID 3 and 7 are for the output gate operations. * + * * biasID 0 and 4 are for the input gate operations. * - * For miopenGRU biasID 0 to 2 refer to the biases associated - * with the input GEMM, while 5 through 6 are associated with the hidden state - * GEMM. - * biasID 0 and 4 are for the reset gate operations. - * biasID 1 and 5 are for the update gate operations. - * biasID 2 and 6 are for the memory gate operations. + * * biasID 1 and 5 are for the forget gate operations. * + * * biasID 2 and 6 are for the memory gate operations. + * + * * biasID 3 and 7 are for the output gate operations. 
+ * + * For miopenGRU biasID 0 to 2 refer to the biases associated with the input GEMM, + * while 3 through 5 are associated with the hidden state GEMM. + * + * * biasID 0 and 3 are for the reset gate operations. + * + * * biasID 1 and 4 are for the update gate operations. + * + * * biasID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. * - * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) * @param layer The layer number in the RNN stack (input) @@ -2019,18 +2029,24 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBiasSize(miopenHandle_t handle, * For miopenLSTM paramID 0 to 3 refer to the weight matrices associated * with the input GEMM, 4-7 are associated with matrices associated with the * hidden state GEMM. - * ParamID 0 and 4 are for the input gate operations. - * ParamID 1 and 5 are for the forget gate operations. - * ParamID 2 and 6 are for the memory gate operations. - * ParamID 3 and 7 are for the output gate operations. * + * * paramID 0 and 4 are for the input gate operations. + * + * * paramID 1 and 5 are for the forget gate operations. + * + * * paramID 2 and 6 are for the memory gate operations. * - * For miopenGRU paramID 0 to 2 refer to the weight matrices associated - * with the input GEMM, while 5 through 6 are associated with the hidden state + * * paramID 3 and 7 are for the output gate operations. + * + * For miopenGRU paramID 0 to 2 refer to the weight matrix offset associated + * with the input GEMM, while 3 through 5 are associated with the hidden state * GEMM. - * ParamID 0 and 4 are for the reset gate operations. - * ParamID 1 and 5 are for the update gate operations. - * ParamID 2 and 6 are for the memory gate operations. + * + * * paramID 0 and 3 are for the reset gate operations. + * + * * paramID 1 and 4 are for the update gate operations. + * + * * paramID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. @@ -2044,6 +2060,9 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBiasSize(miopenHandle_t handle, * nullptr then only the paramDesc is populated and returned. The size in bytes of the * layer parameter matrix can be determined by using miopenGetRNNLayerParamSize(). * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. In this case miopenGetRNNLayerParam() will return + * a error status miopenStatusBadParm for input paramID associated with the input GEMM. * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) @@ -2075,23 +2094,28 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerParam(miopenHandle_t handle, * bias associated with the in input GEMM, while biasID == 1 retrieves * the bias associated with the hidden state GEMM. * - * For miopenLSTM paramID 0 to 3 refer to the biases associated + * For miopenLSTM biasID 0 to 3 refer to the biases associated * with the input GEMM, 4-7 are associated with biases associated with the * hidden state GEMM. - * biasID 0 and 4 are for the input gate operations. - * biasID 1 and 5 are for the forget gate operations. - * biasID 2 and 6 are for the memory gate operations. - * biasID 3 and 7 are for the output gate operations. * + * * biasID 0 and 4 are for the input gate operations. 
* - * For miopenGRU biasID 0 to 2 refer to the biases associated - * with the input GEMM, while 5 through 6 are associated with the hidden state - * GEMM. - * biasID 0 and 4 are for the reset gate operations. - * biasID 1 and 5 are for the update gate operations. - * biasID 2 and 6 are for the memory gate operations. + * * biasID 1 and 5 are for the forget gate operations. + * + * * biasID 2 and 6 are for the memory gate operations. + * + * * biasID 3 and 7 are for the output gate operations. * * + * For miopenGRU biasID 0 to 2 refer to the biases associated with the input GEMM, + * while 3 through 5 are associated with the hidden state GEMM. + * + * * biasID 0 and 3 are for the reset gate operations. + * + * * biasID 1 and 4 are for the update gate operations. + * + * * biasID 2 and 5 are for the memory gate operations. + * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. * @@ -2104,6 +2128,9 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerParam(miopenHandle_t handle, * nullptr then only the biasDesc is populated and returned. The size in bytes of the * layer bias can be determined by using miopenGetRNNLayerBiasSize(). * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. In this case miopenGetRNNLayerBias() will return + * a error status miopenStatusBadParm for input biasID associated with the input GEMM. * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) @@ -2138,18 +2165,24 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBias(miopenHandle_t handle, * For miopenLSTM paramID 0 to 3 refer to the weight matrix offsets associated * with the input GEMM, 4-7 are associated with matrix offset associated with the * hidden state GEMM. - * ParamID 0 and 4 are for the input gate operations. - * ParamID 1 and 5 are for the forget gate operations. - * ParamID 2 and 6 are for the memory gate operations. - * ParamID 3 and 7 are for the output gate operations. * + * * paramID 0 and 4 are for the input gate operations. + * + * * paramID 1 and 5 are for the forget gate operations. + * + * * paramID 2 and 6 are for the memory gate operations. + * + * * paramID 3 and 7 are for the output gate operations. * * For miopenGRU paramID 0 to 2 refer to the weight matrix offset associated - * with the input GEMM, while 5 through 6 are associated with the hidden state + * with the input GEMM, while 3 through 5 are associated with the hidden state * GEMM. - * ParamID 0 and 4 are for the reset gate operations. - * ParamID 1 and 5 are for the update gate operations. - * ParamID 2 and 6 are for the memory gate operations. + * + * * paramID 0 and 3 are for the reset gate operations. + * + * * paramID 1 and 4 are for the update gate operations. + * + * * paramID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. @@ -2161,6 +2194,10 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBias(miopenHandle_t handle, * The argument layerParamOffset should either be nullptr, or an address to place the * offset. If layerParamOffset is nullptr then only the paramDesc is populated and returned. * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. 
In this case miopenGetRNNLayerParamOffset() will return + * a error status miopenStatusBadParm for input paramID associated with the input GEMM. + * * * @param rnnDesc RNN layer descriptor type (input) * @param layer The layer number in the RNN stack (input) @@ -2185,21 +2222,25 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerParamOffset(miopenRNNDescriptor_t * bias associated with the in input GEMM, while biasID == 1 retrieves * the weight matrix associated with the hidden state GEMM. * - * For miopenLSTM paramID 0 to 3 refer to the bias offset associated + * For miopenLSTM biasID 0 to 3 refer to the bias offset associated * with the input GEMM, 4-7 are the bias offsets associated with the hidden state GEMM. - * biasID 0 and 4 are for the input gate operations. - * biasID 1 and 5 are for the forget gate operations. - * biasID 2 and 6 are for the memory gate operations. - * biasID 3 and 7 are for the output gate operations. * + * * biasID 0 and 4 are for the input gate operations. * - * For miopenGRU biasID 0 to 2 refer to the bias offsets associated - * with the input GEMM, while 5 through 6 are associated with the hidden state - * GEMM. - * biasID 0 and 4 are for the reset gate operations. - * biasID 1 and 5 are for the update gate operations. - * biasID 2 and 6 are for the memory gate operations. + * * biasID 1 and 5 are for the forget gate operations. + * + * * biasID 2 and 6 are for the memory gate operations. + * + * * biasID 3 and 7 are for the output gate operations. + * + * For miopenGRU biasID 0 to 2 refer to the biases associated with the input GEMM, + * while 3 through 5 are associated with the hidden state GEMM. + * + * * biasID 0 and 3 are for the reset gate operations. * + * * biasID 1 and 4 are for the update gate operations. + * + * * biasID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. @@ -2211,6 +2252,9 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerParamOffset(miopenRNNDescriptor_t * The argument layerBiasOffset should either be nullptr, or point to an output address. * If layerBias is nullptr then only the biasDesc is populated and returned. * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. In this case miopenGetRNNLayerBiasOffset() will return + * a error status miopenStatusBadParm for input biasID associated with the input GEMM. * * @param rnnDesc RNN layer descriptor type (input) * @param layer The layer number in the RNN stack (input) @@ -2239,18 +2283,25 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBiasOffset(miopenRNNDescriptor_t r * For miopenLSTM paramID 0 to 3 refer to the weight matrices associated * with the input GEMM, 4-7 are associated with matrices associated with the * hidden state GEMM. - * ParamID 0 and 4 are for the input gate operations. - * ParamID 1 and 5 are for the forget gate operations. - * ParamID 2 and 6 are for the memory gate operations. - * ParamID 3 and 7 are for the output gate operations. * + * * paramID 0 and 4 are for the input gate operations. + * + * * paramID 1 and 5 are for the forget gate operations. + * + * * paramID 2 and 6 are for the memory gate operations. + * + * * paramID 3 and 7 are for the output gate operations. 
* - * For miopenGRU paramID 0 to 2 refer to the weight matrices associated - * with the input GEMM, while 5 through 6 are associated with the hidden state + * + * For miopenGRU paramID 0 to 2 refer to the weight matrix offset associated + * with the input GEMM, while 3 through 5 are associated with the hidden state * GEMM. - * ParamID 0 and 4 are for the reset gate operations. - * ParamID 1 and 5 are for the update gate operations. - * ParamID 2 and 6 are for the memory gate operations. + * + * * paramID 0 and 3 are for the reset gate operations. + * + * * paramID 1 and 4 are for the update gate operations. + * + * * paramID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. @@ -2258,6 +2309,9 @@ MIOPEN_EXPORT miopenStatus_t miopenGetRNNLayerBiasOffset(miopenRNNDescriptor_t r * The input argument paramDesc is a previously populated tensor descriptor typically * by first calling miopenGetRNNLayerParam(). * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. In this case miopenSetRNNLayerParam() will return + * a error status miopenStatusBadParm for input paramID associated with the input GEMM. * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) @@ -2288,22 +2342,27 @@ MIOPEN_EXPORT miopenStatus_t miopenSetRNNLayerParam(miopenHandle_t handle, * weight matrix associated with the in input GEMM, while biasID == 1 retrieves * the bias associated with the hidden state GEMM. * - * For miopenLSTM paramID 0 to 3 refer to the biases associated + * For miopenLSTM biasID 0 to 3 refer to the biases associated * with the input GEMM, 4-7 are associated with the biases associated with the * hidden state GEMM. - * biasID 0 and 4 are for the input gate operations. - * biasID 1 and 5 are for the forget gate operations. - * biasID 2 and 6 are for the memory gate operations. - * biasID 3 and 7 are for the output gate operations. * + * * biasID 0 and 4 are for the input gate operations. * - * For miopenGRU biasID 0 to 2 refer to the biases associated - * with the input GEMM, while 5 through 6 are associated with the hidden state - * GEMM. - * biasID 0 and 4 are for the reset gate operations. - * biasID 1 and 5 are for the update gate operations. - * biasID 2 and 6 are for the memory gate operations. + * * biasID 1 and 5 are for the forget gate operations. + * + * * biasID 2 and 6 are for the memory gate operations. + * + * * biasID 3 and 7 are for the output gate operations. + * + * + * For miopenGRU biasID 0 to 2 refer to the biases associated with the input GEMM, + * while 3 through 5 are associated with the hidden state GEMM. + * + * * biasID 0 and 3 are for the reset gate operations. + * + * * biasID 1 and 4 are for the update gate operations. * + * * biasID 2 and 5 are for the memory gate operations. * * For bi-directional RNNs the backwards in time direction is numbered as the layer * directly after the forward in time direction. @@ -2311,6 +2370,9 @@ MIOPEN_EXPORT miopenStatus_t miopenSetRNNLayerParam(miopenHandle_t handle, * The input argument biasDesc is a previously populated tensor descriptor typically * by first calling miopenGetRNNLayeBias(). * + * Note: When inputSkip mode is selected there is no input layer matrix operation, + * and therefore no associated memory. 
In this case miopenSetRNNLayerBias will return + * a error status miopenStatusBadParm for input biasID associated with the input GEMM. * * @param handle MIOpen handle (input) * @param rnnDesc RNN layer descriptor type (input) @@ -2351,13 +2413,15 @@ MIOPEN_EXPORT miopenStatus_t miopenSetRNNLayerBias(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hx Pointer to the hidden layer input tensor (input) + * @param hx Pointer to the hidden layer input tensor. If hx is NULL, + * then the initial hidden state will be zero initialized. (input) * @param cxDesc A cell tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param cx Pointer to the cell layer input tensor (input) + * @param cx Pointer to the cell layer input tensor. If cx is NULL, + * then the initial cell state will be zero initialized. (input) * @param wDesc A weights tensor descriptor (input) * @param w Pointer to input weights tensor (input) * @param yDesc An array of fully packed tensor descriptors associated @@ -2373,13 +2437,15 @@ MIOPEN_EXPORT miopenStatus_t miopenSetRNNLayerBias(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hy Pointer to the hidden layer output tensor (output) + * @param hy Pointer to the hidden layer output tensor. If hy is NULL, + * then the final hidden state will not be saved. (output) * @param cyDesc A cell tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param cy Pointer to the cell layer output tensor (output) + * @param cy Pointer to the cell layer output tensor. If hy is NULL, + * then the final cell state will not be saved. (output) * @param workSpace Pointer to memory allocated for forward training (input) * @param workSpaceNumBytes Number of allocated bytes in memory for the workspace (input) * @param reserveSpace Pointer to memory allocated for random states (input / output) @@ -2436,7 +2502,8 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNForwardTraining(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param dcy Pointer to the cell layer input tensor (input) + * @param dcy Pointer to the cell layer input tensor. If dcy is NULL, + * then the initial delta cell state will be zero initialized. 
(input) * @param wDesc A weights tensor descriptor (input) * @param w Pointer to input weights tensor (input) * @param hxDesc An input hidden tensor descriptor that has as its first dimension @@ -2444,13 +2511,15 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNForwardTraining(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hx Pointer to output tensor (input) + * @param hx Pointer to the hidden layer input tensor. If hx is NULL, + * then the initial hidden state will be zero initialized. (input) * @param cxDesc A input cell tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param cx Pointer to the hidden layer output tensor (input) + * @param cx Pointer to the hidden layer input tensor. If cx is NULL, + * then the initial cell state will be zero initialized. (input) * @param dxDesc An array of tensor descriptors. These are the * input descriptors to each time step. The first dimension of each descriptor is the * batch size and may decrease from element n to element n+1 and not increase in size. @@ -2462,13 +2531,15 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNForwardTraining(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param dhx Pointer to the cell layer output tensor (output) + * @param dhx Pointer to the delta hidden layer output tensor. If dhx is NULL + * the hidden gradient will not ouput. (output) * @param dcxDesc A tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param dcx Pointer to the cell layer output tensor (output) + * @param dcx Pointer to the cell layer output tensor. If dcx is NULL + * the cell gradient will not ouput. (output) * @param workSpace Pointer to memory allocated for forward training (input) * @param workSpaceNumBytes Number of allocated bytes in memory for the workspace (input) * @param reserveSpace Pointer to memory allocated for random states (input / output) @@ -2521,7 +2592,8 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNBackwardData(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hx Pointer to the hidden layer input tensor (input) + * @param hx Pointer to the hidden layer input tensor. If hx is NULL, + * then the initial hidden state will be zero initialized. (input) * @param yDesc An array of fully packed tensor descriptors associated * with the output from each time step. 
The first dimension of the tensor descriptors * must equal the first dimension of the first descriptor (batch size) in the xDesc @@ -2529,9 +2601,9 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNBackwardData(miopenHandle_t handle, * depends on the direction mode selected. If the direction mode is unidirectional, * the second dimension is the hiddenSize. If direction mode is bidirectional * the second dimension is twice the hiddenSize. (input) - * @param y Pointer to the cell layer input tensor (input) + * @param y Pointer to the output tensor (input) * @param dwDesc A weights tensor descriptor (input) - * @param dw Pointer to input weights tensor (output) + * @param dw Pointer to input weights tensor (input / output) * @param workSpace Pointer to memory allocated for forward training (input) * @param workSpaceNumBytes Number of allocated bytes in memory for the workspace (input) * @param reserveSpace Pointer to memory allocated for random states (input) @@ -2572,13 +2644,15 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNBackwardWeights(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hx Pointer to the hidden layer input tensor (input) + * @param hx Pointer to the hidden layer input tensor. If hx is NULL, + * then the initial hidden state will be zero initialized. (input) * @param cxDesc A cell tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param cx Pointer to the cell layer input tensor (input) + * @param cx Pointer to the cell layer input tensor. If cx is NULL, + * then the initial cell state will be zero initialized. (input) * @param wDesc A weights tensor descriptor (input) * @param w Pointer to input weights tensor (input) * @param yDesc An array of fully packed tensor descriptors associated @@ -2594,13 +2668,15 @@ MIOPEN_EXPORT miopenStatus_t miopenRNNBackwardWeights(miopenHandle_t handle, * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param hy Pointer to the hidden layer output tensor (output) + * @param hy Pointer to the hidden layer output tensor. If hy is NULL, + * then the final hidden state will not be saved. (output) * @param cyDesc A output cell tensor descriptor that has as its first dimension * of the number of layers if the direction mode is unidirectional and twice the * number of layers if the direction mode is bidirectional. The second dimension of * the descriptor must equal the largest first dimension of the xDesc tensor descriptor * array. The third dimension equals the hiddenSize. (input) - * @param cy Pointer to the cell layer output tensor (output) + * @param cy Pointer to the cell layer output tensor. If cy is NULL, + * then the final cell state will not be saved. 
(output)
 * @param workSpace           Pointer to memory allocated for forward training (input)
 * @param workSpaceNumBytes   Number of allocated bytes in memory for the workspace (input)
 * @return                    miopenStatus_t
diff --git a/src/kernels/MIOpenLRNBwd.cl b/src/kernels/MIOpenLRNBwd.cl
index b5c235b25e..9bdb6f7529 100644
--- a/src/kernels/MIOpenLRNBwd.cl
+++ b/src/kernels/MIOpenLRNBwd.cl
@@ -40,7 +40,7 @@
 #endif

 #define _FLOAT2 PPCAT(_FLOAT, TWO)
-#define _FLOAT2 PPCAT(_FLOAT, THREE)
+#define _FLOAT3 PPCAT(_FLOAT, THREE)
 #define _FLOAT4 PPCAT(_FLOAT, FOUR)
 #define _FLOAT8 PPCAT(_FLOAT, EIGHT)
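For context, the kernel change above fixes a copy-paste bug: `_FLOAT2` was defined twice while no `_FLOAT3` alias existed. Below is a minimal sketch of the two-level token-pasting idiom these lines rely on, assuming the usual `PPCAT`/`PPCAT_NX` definitions that appear earlier in the kernel sources; the exact definitions there may differ in detail.

```
/* Two-level paste: PPCAT expands its arguments (e.g. TWO -> 2) before pasting, */
/* so PPCAT(_FLOAT, TWO) becomes float2 rather than _FLOATTWO.                  */
#define PPCAT_NX(A, B) A##B        /* paste without expanding  */
#define PPCAT(A, B) PPCAT_NX(A, B) /* expand, then paste       */

#define _FLOAT float
#define TWO 2
#define THREE 3

#define _FLOAT2 PPCAT(_FLOAT, TWO)   /* -> float2 */
#define _FLOAT3 PPCAT(_FLOAT, THREE) /* -> float3; the replaced line redefined _FLOAT2 instead */
```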