Doc updates for 1.6 release (#1315)
* Fusion and miopen.h doc updates.

* Adding fusion and readme changes, plus debugging environmental variables.

* Added 1.6 release notes.

* updated the supported fusions table

* Fixed comments on PR. Updated release notes and date.

* Changed Roc to roc.

* Added version for rocBlas

* added minimum rocblas version

* updated date

* Added Winograd debug flags.

* Added a dependency path cmake example

* removed a 1

* Updated date
Daniel Lowell authored Nov 19, 2018
1 parent 50084ba commit ffedda8
Showing 6 changed files with 303 additions and 48 deletions.
9 changes: 9 additions & 0 deletions README.md
AMD's library for high performance machine learning primitives. MIOpen supports t
* [OpenSSL](https://www.openssl.org/) or [libressl](https://www.libressl.org/)
* [Boost](http://www.boost.org/) at least version 1.58
* MIOpen uses `boost-system` and `boost-filesystem` packages to enable persistent [kernel cache](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/doc/src/cache.md)
* [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS), minimum version 14.3

## Installing the dependencies

This will install by default to `/usr/local` but it can be installed in another
```
cmake -P install_deps.cmake --prefix /some/local/dir
```
This prefix can be used to specify the dependency path during the configuration phase via `CMAKE_PREFIX_PATH`.

MIOpen's HIP backend uses [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS) by default. Users can install the minimum rocBLAS release with `apt-get install rocblas`. To disable rocBLAS, set the configuration flag `-DMIOPEN_USE_ROCBLAS=Off`. rocBLAS is *not* available for the OpenCL backend.

Instructions to manually install all the dependencies on Ubuntu v16 are present in this [section](#installing-the-dependencies-manually).

The above assumes that OpenCL is installed in one of the standard locations. If
cmake -DMIOPEN_BACKEND=OpenCL -DOPENCL_LIBRARIES=<opencl-library-path> -DOPENCL_INCLUDE_DIRS=<opencl-headers-path> ..
```

And an example setting the dependency path:
```
cmake -DMIOPEN_BACKEND=OpenCL -DCMAKE_PREFIX_PATH=/some/local/dir ..
```

#### For HIP, run:

Set the C++ compiler to `hcc`.
43 changes: 43 additions & 0 deletions doc/src/DebugAndLogging.md
Debugging and Logging
=====================

## Logging
The most basic environment variable for debugging purposes is `MIOPEN_ENABLE_LOGGING=1`. This gives the user a basic layer-by-layer trace of calls and their configurations. If building from source, the environment variable `MIOPEN_ENABLE_LOGGING_CMD=1` can be used to output the associated `MIOpenDriver` command line.


## Log Levels
The `MIOPEN_LOG_LEVEL` environment variable controls the verbosity of the messages MIOpen prints to the console. Allowed values are:
* 0 - Default. Works as level 4 for Release builds, level 5 for Debug builds.
* 1 - Quiet. No logging messages (except those controlled by `MIOPEN_ENABLE_LOGGING`).
* 2 - Fatal errors only (not used yet).
* 3 - Errors and fatals.
* 4 - All errors and warnings.
* 5 - Info. All the above plus information for debugging purposes.
* 6 - Detailed info. All the above plus more detailed information for debugging.
* 7 - Trace: the most detailed debugging info plus all above (not used so far).

All messages are output via `stderr`.
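Because all log messages go to `stderr`, they can be captured separately from program output. A minimal sketch follows; here a `sh -c` echo stands in for a real MIOpen-based application, and the log message text is illustrative only:

```shell
# Run with logging enabled and capture MIOpen's stderr output to a file.
# 'sh -c ...' is a placeholder for a real MIOpen-based executable.
MIOPEN_ENABLE_LOGGING=1 MIOPEN_LOG_LEVEL=5 \
  sh -c 'echo "MIOpen(HIP): Info [example]" >&2' 2> miopen.log
cat miopen.log
```

The same `2> miopen.log` redirection works for any application linked against MIOpen.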


## Layer Filtering
The following environment variables can be helpful both for debugging MIOpen and for integrating it with frameworks.

* `MIOPEN_ENABLE_LOGGING=1` – log all the MIOpen APIs called, including the parameters passed to those APIs.
* `MIOPEN_DEBUG_GCN_ASM_KERNELS=0` – disable hand-tuned assembly kernels for the Direct convolution algorithm and fall back to kernels written in a high-level language.
* `MIOPEN_DEBUG_CONV_FFT=0` – disable the FFT convolution algorithm.
* `MIOPEN_DEBUG_CONV_DIRECT=0` – disable the Direct convolution algorithm.
* `MIOPEN_DEBUG_AMD_ROCM_PRECOMPILED_BINARIES=0` – disable binary Winograd kernels. However, not all Winograd kernels are binaries; to disable all Winograd algorithms, the following two variables can be used:
  * `MIOPEN_DEBUG_AMD_WINOGRAD_3X3=0` – FP32 Winograd Fwd/Bwd, filter size fixed to 3x3.
  * `MIOPEN_DEBUG_AMD_WINOGRAD_RXS=0` – FP32 and FP16 Winograd Fwd/Bwd, variable filter size.
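These variables can be combined. For example, the following sketch disables the FFT and all Winograd kernels so that only the Direct algorithm remains; instead of launching a real MIOpen-based application, it simply prints the environment the application would inherit:

```shell
# Disable FFT and every Winograd variant, leaving only Direct convolution.
export MIOPEN_DEBUG_CONV_FFT=0
export MIOPEN_DEBUG_AMD_ROCM_PRECOMPILED_BINARIES=0
export MIOPEN_DEBUG_AMD_WINOGRAD_3X3=0
export MIOPEN_DEBUG_AMD_WINOGRAD_RXS=0
# Show what an MIOpen-based app would inherit (stand-in for launching it):
env | grep '^MIOPEN_DEBUG' | sort
```

In practice, replace the final `env` line with the command that launches your application or framework.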

## rocBLAS Logging
The `ROCBLAS_LAYER` environment variable can be set to output GEMM information:
* `ROCBLAS_LAYER` unset - no logging
* `ROCBLAS_LAYER=1` - trace logging
* `ROCBLAS_LAYER=2` - bench logging
* `ROCBLAS_LAYER=3` - both trace and bench logging

To disable using rocBLAS entirely, set the configuration flag `-DMIOPEN_USE_ROCBLAS=Off` during MIOpen configuration.


More information on logging with rocBLAS can be found [here](https://github.com/ROCmSoftwarePlatform/rocBLAS/wiki/5.Logging).
251 changes: 213 additions & 38 deletions doc/src/Getting_Started_FusionAPI.md
Note that it is an error to attempt to execute a fusion plan that is
Once the application is done with the fusion plan, the fusion plan and the fusion args objects may be destroyed using the API calls:

```cpp
miopenStatus_t miopenDestroyFusionPlan(miopenFusionPlanDescriptor_t fusePlanDesc);

miopenStatus_t miopenDestroyOperatorArgs(miopenOperatorArgs_t args);
```
Once the fusion plan object is destroyed, all the operations created are destroyed automatically and do not need any special cleanup.
## Supported Fusions
The tables below outline the supported fusions for fp32 and fp16, as well as any applicable constraints. **(C = convolution, B = bias, N = batch normalization, A = activation)**
### Convolution based FP32 Fusion for Inference
<table border=1 cellpadding=0 cellspacing=0 width=714 style='border-collapse:
collapse;table-layout:fixed;width:535pt'>
<col width=93 style='mso-width-source:userset;mso-width-alt:2986;width:70pt'>
<col width=76 style='mso-width-source:userset;mso-width-alt:2432;width:57pt'>
<col width=51 style='mso-width-source:userset;mso-width-alt:1621;width:38pt'>
<col width=171 style='mso-width-source:userset;mso-width-alt:5461;width:128pt'>
<col width=51 style='mso-width-source:userset;mso-width-alt:1621;width:38pt'>
<col width=140 style='mso-width-source:userset;mso-width-alt:4480;width:105pt'>
<col width=132 style='mso-width-source:userset;mso-width-alt:4224;width:99pt'>
<tr height=21 style='height:16.0pt'>
<td colspan=7 height=21 class=xl66 width=714 style='height:16.0pt;width:535pt'>Single
Precision Floating Point</td>
</tr>
<tr height=45 style='height:34.0pt'>
<td height=45 class=xl66 style='height:34.0pt'><center><b>Combination</b></center></td>
<td class=xl66><center><b>Conv Algo</b></center></td>
<td class=xl66><center><b>Stride</b></center></td>
<td class=xl66><center><b>Filter Dims</b></center></td>
<td class=xl68 width=51 style='width:38pt'><center><b>N Mode*</b></center></td>
<td class=xl66><center><b>Activations</b></center></td>
<td class=xl68 width=132 style='width:99pt'><center><b>Other Constraints</b></center></td>
</tr>
<tr height=107 style='mso-height-source:userset;height:80.0pt'>
<td height=107 class=xl65 style='height:80.0pt'>CBNA</td>
<td class=xl65>Direct</td>
<td class=xl67 width=51 style='width:38pt'>1 and 2</td>
<td class=xl67 width=171 style='width:128pt'>3x3, 5x5, 7x7, 9x9, 11x11</td>
<td class=xl65>All</td>
<td class=xl65>All</td>
<td class=xl67 width=132 style='width:99pt'>stride and padding must be either
1 or 2</td>
</tr>
<tr height=63 style='mso-height-source:userset;height:47.0pt'>
<td rowspan=12 height=354 class=xl65 style='height:263.0pt'>CBA</td>
<td class=xl65>Direct</td>
<td class=xl65></td>
<td class=xl67 width=171 style='width:128pt'>1x1</td>
<td class=xl65></td>
<td class=xl65>All</td>
<td class=xl67 width=132 style='width:99pt'>stride/padding not supported</td>
</tr>
<tr height=23 style='height:17.0pt'>
<td rowspan=11 height=291 class=xl65 style='height:216.0pt'>Winograd</td>
<td class=xl65>1</td>
<td class=xl67 width=171 style='width:128pt'>1x1, 2x2</td>
<td class=xl65>N/A</td>
<td class=xl65>Relu, Leaky Relu</td>
<td class=xl67 width=132 style='width:99pt'>c &gt;= 18</td>
</tr>
<tr height=39 style='mso-height-source:userset;height:29.0pt'>
<td height=39 class=xl65 style='height:29.0pt'>1</td>
<td class=xl67 width=171 style='width:128pt'>3x3</td>
<td class=xl65></td>
<td class=xl65>Relu, Leaky Relu</td>
<td class=xl67 width=132 style='width:99pt'>c &gt;= 18 and c is even</td>
</tr>
<tr height=23 style='height:17.0pt'>
<td height=23 class=xl65 style='height:17.0pt'>1</td>
<td class=xl67 width=171 style='width:128pt'>4x4, 5x5, 6x6</td>
<td class=xl65></td>
<td class=xl65>Relu, Leaky Relu</td>
<td class=xl67 width=132 style='width:99pt'>4 x c &gt;= 18</td>
</tr>
<tr height=23 style='height:17.0pt'>
<td height=23 class=xl65 style='height:17.0pt'>1</td>
<td class=xl67 width=171 style='width:128pt'>7x7, 8x8, 9x9</td>
<td class=xl65></td>
<td class=xl65>Relu, Leaky Relu</td>
<td class=xl67 width=132 style='width:99pt'>12 x c &gt;= 18</td>
</tr>
<tr height=23 style='height:17.0pt'>
<td height=23 class=xl65 style='height:17.0pt'>1</td>
<td class=xl67 width=171 style='width:128pt'>10x10, 11x11, 12x12</td>
<td class=xl65></td>
<td class=xl65>Relu, Leaky Relu</td>
<td class=xl67 width=132 style='width:99pt'>16 x c &gt;= 18</td>
</tr>
<tr height=23 style='height:17.0pt'>
<td height=23 class=xl65 style='height:17.0pt'>1</td>
<td class=xl67 width=171 style='width:128pt'>larger filter sizes</td>
<td class=xl65></td>
<td class=xl65>Relu, Leaky Relu</td>
<td class=xl67 width=132 style='width:99pt'>none</td>
</tr>
<tr height=23 style='height:17.0pt'>
<td height=23 class=xl65 style='height:17.0pt'>2</td>
<td class=xl67 width=171 style='width:128pt'>1x1</td>
<td class=xl65></td>
<td class=xl65>Relu, Leaky Relu</td>
<td class=xl67 width=132 style='width:99pt'>2 x c &gt;= 18</td>
</tr>
<tr height=23 style='height:17.0pt'>
<td height=23 class=xl65 style='height:17.0pt'>2</td>
<td class=xl67 width=171 style='width:128pt'>2x2, 3x3, 4x4, 5x5, 6x6</td>
<td class=xl65></td>
<td class=xl65>Relu, Leaky Relu</td>
<td class=xl67 width=132 style='width:99pt'>4 x c &gt;= 18</td>
</tr>
<tr height=23 style='height:17.0pt'>
<td height=23 class=xl65 style='height:17.0pt'>2</td>
<td class=xl67 width=171 style='width:128pt'>7x7</td>
<td class=xl65></td>
<td class=xl65>Relu, Leaky Relu</td>
<td class=xl67 width=132 style='width:99pt'>12 x c &gt;= 18</td>
</tr>
<tr height=45 style='height:34.0pt'>
<td height=45 class=xl65 style='height:34.0pt'>2</td>
<td class=xl67 width=171 style='width:128pt'>8x8, 9x9, 10x10, 11x11, 12x12</td>
<td class=xl65></td>
<td class=xl65>Relu, Leaky Relu</td>
<td class=xl67 width=132 style='width:99pt'>16 x c &gt;= 18</td>
</tr>
<tr height=23 style='height:17.0pt'>
<td height=23 class=xl65 style='height:17.0pt'>2</td>
<td class=xl67 width=171 style='width:128pt'>larger filter sizes</td>
<td class=xl65></td>
<td class=xl65>Relu, Leaky Relu</td>
<td class=xl67 width=132 style='width:99pt'>none</td>
</tr>
<tr height=45 style='height:34.0pt'>
<td height=45 class=xl65 style='height:34.0pt'>NA</td>
<td class=xl65>-</td>
<td class=xl65></td>
<td class=xl65>-</td>
<td class=xl65>All</td>
<td class=xl65>All</td>
<td class=xl67 width=132 style='width:99pt'>Padding not supported</td>
</tr>
</table>
*N mode is either spatial or per-activation. For CBA, other asymmetric kernels are supported as well, but are not enumerated here for brevity.
<br><br>
### Convolution based FP16 Fusion for Inference
<table border=1 cellpadding=0 cellspacing=0 width=714 style='border-collapse:
collapse;table-layout:fixed;width:535pt'>
<col width=93 style='mso-width-source:userset;mso-width-alt:2986;width:70pt'>
<col width=76 style='mso-width-source:userset;mso-width-alt:2432;width:57pt'>
<col width=51 style='mso-width-source:userset;mso-width-alt:1621;width:38pt'>
<col width=171 style='mso-width-source:userset;mso-width-alt:5461;width:128pt'>
<col width=51 style='mso-width-source:userset;mso-width-alt:1621;width:38pt'>
<col width=140 style='mso-width-source:userset;mso-width-alt:4480;width:105pt'>
<col width=132 style='mso-width-source:userset;mso-width-alt:4224;width:99pt'>
<tr height=21 style='height:16.0pt'>
<td colspan=7 height=21 class=xl67 width=714 style='height:16.0pt;width:535pt'><center><b>Half
Precision Floating Point</b></center></td>
</tr>
<tr height=45 style='height:34.0pt'>
<td height=45 class=xl66 style='height:34.0pt'><center><b>Combination</b></center></td>
<td class=xl66><center><b>Conv Algo</b></center></td>
<td class=xl66><center><b>Stride</b></center></td>
<td class=xl66><center><b>Filter Dims</b></center></td>
<td class=xl68 width=51 style='width:38pt'><center><b>N Mode*</b></center></td>
<td class=xl66><center><b>Activations</b></center></td>
<td class=xl68 width=132 style='width:99pt'><center><b>Other Constraints</b></center></td>
</tr>
<tr height=68 style='height:51.0pt'>
<td height=68 style='height:51.0pt'>CBNA</td>
<td>Direct</td>
<td class=xl69 width=51 style='width:38pt'>1 and 2</td>
<td>3x3, 5x5, 7x7, 9x9, 11x11</td>
<td>All</td>
<td>All</td>
<td class=xl68 width=132 style='width:99pt'>stride and padding must be either
1 or 2</td>
</tr>
<tr height=45 style='height:34.0pt'>
<td height=45 class=xl66 style='height:34.0pt'>CBA</td>
<td>Direct</td>
<td class=xl65></td>
<td>1x1</td>
<td></td>
<td>All</td>
<td class=xl68 width=132 style='width:99pt'>stride/padding not supported</td>
</tr>
</table>
*N mode is either spatial or per-activation.
<br><br>
### Batch Normalization based fusion for FP32 and FP16 for Inference and Training
<table border=1 cellpadding=0 cellspacing=0 width=713 style='border-collapse:
collapse;table-layout:fixed;width:534pt'>
<col width=108 style='mso-width-source:userset;mso-width-alt:3456;width:81pt'>
<col width=87 style='width:65pt'>
<col width=123 style='mso-width-source:userset;mso-width-alt:3925;width:92pt'>
<col width=87 style='width:65pt'>
<tr height=45 style='height:34.0pt'>
<td height=45 class=xl65 width=108 style='height:34.0pt;width:81pt'><center><b>Combination</b></center></td>
<td class=xl65 width=87 style='width:65pt'><center><b>N mode*</b></center></td>
<td class=xl65 width=123 style='width:92pt'><center><b>Activations</b></center></td>
<td class=xl65 width=87 style='width:65pt'><center><b>Constraints</b></center></td>
</tr>
<tr height=45 style='height:34.0pt'>
<td height=45 class=xl66 width=108 style='height:34.0pt;width:81pt'>NA for inference</td>
<td class=xl66 width=87 style='width:65pt'><center>All</center></td>
<td class=xl66 width=123 style='width:92pt'><center>All</center></td>
<td class=xl66 width=87 style='width:65pt'>None</td>
</tr>
<tr height=45 style='height:34.0pt'>
<td height=46 class=xl67 width=108 style='height:34.0pt;width:81pt'>NA forward training</td>
<td class=xl66 width=87 style='width:65pt'><center>All</center></td>
<td class=xl66 width=123 style='width:92pt'><center>All</center></td>
<td class=xl66 width=87 style='width:65pt'>None</td>
</tr>
<tr height=45 style='height:34.0pt'>
<td height=46 class=xl67 width=108 style='height:34.0pt;width:81pt'>NA backward training</td>
<td class=xl66 width=87 style='width:65pt'><center>All</center></td>
<td class=xl66 width=123 style='width:92pt'><center>All</center></td>
<td class=xl66 width=87 style='width:65pt'>None</td>
</tr>
</table>
*N mode is either spatial or per-activation.
<br><br>
## Performance Comparison to Non-Fused Kernels
1 change: 1 addition & 0 deletions doc/src/index.rst
Welcome to MIOpen
releasenotes
install
driver
DebugAndLogging
cache
perfdatabase
Getting_Started_FusionAPI
27 changes: 27 additions & 0 deletions doc/src/releasenotes.md

## MIOpen Release notes


### 11/18/2018 [ 1.6.0 ]

- Training in fp16 (half precision), including mixed-precision, is now fully supported
- Batch Normalization in fp16 (half precision), including mixed-precision, is now available
- Performance improvements for 3x3 and 1x1 single-precision convolutions
- Layer fusions for BatchNorm+Activation are now available
- Layer fusions with convolutions now support varying strides and padding configurations

Changes:

- rocBLAS is now used as the default BLAS library for the HIP backend (minimum version 14.3.0)
- Fixed various bugs in convolution kernels
- Fixed issues with bad references in layer fusion
- Fixed gfx803 assembly issues
- Added support for fp16 Winograd convolutions
- Added support for fp16 pooling
- Improved error reporting for convolutions and layer fusions
- Improved documentation

Known Issues:

- RNNs do not support fp16
- OpenCL backend does not have full fp16 support
- Layer fusions for convolution 1x1 fp16 are not supported


### 09/14/2018 [ 1.5.0 ]

Notes: