Release MIOpen v2.0.0 · ROCm/MIOpen

Notes:

This release contains several new features including an immediate mode for selecting convolutions, bfloat16 support, new layers, modes, and algorithms.
MIOpenDriver, a tool for benchmarking and developing kernels, is now shipped with MIOpen.
BFloat16 is now supported in HIP and requires an updated rocBLAS as a GEMM backend.
Immediate mode API now provides the ability to quickly obtain a convolution kernel.
MIOpen now contains HIP source kernels and implements the ImplicitGEMM kernels. This is a new feature and is currently disabled by default. Set the environment variable "MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=1" to activate this feature. ImplicitGEMM requires an up-to-date HIP version of at least 1.5.9211.
A new "loss" category of layers has been added, of which CTC loss is the first. See the API reference for more details.
2.0 is the last release of active support for gfx803 architectures. In future releases, MIOpen will not actively debug and develop new features specifically for gfx803.
The System Find-Db in-memory cache is disabled by default. Please see the build instructions to enable this feature.
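
The ImplicitGEMM switch in the notes above is normally set in the shell before launching an application. The following is a minimal sketch of setting it from inside a host program instead. It assumes a POSIX system (for `setenv`) and assumes the variable has to be in place before the first MIOpen call in the process; the helper names `enable_implicit_gemm` and `implicit_gemm_requested` are hypothetical and are not part of MIOpen.

```cpp
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // setenv (POSIX, not standard C++)
#include <string>

// Hypothetical helper, not an MIOpen API: sets the environment variable named
// in the release notes for the current process.
// Assumption: call this before any MIOpen function so the library sees it.
inline bool enable_implicit_gemm()
{
    // Third argument 1 means "overwrite an existing value".
    return setenv("MIOPEN_DEBUG_CONV_IMPLICIT_GEMM", "1", 1) == 0;
}

// Hypothetical helper: reports whether the variable is set to "1" in this process.
inline bool implicit_gemm_requested()
{
    const char* value = std::getenv("MIOPEN_DEBUG_CONV_IMPLICIT_GEMM");
    return value != nullptr && std::string(value) == "1";
}
```

Calling `enable_implicit_gemm()` at the top of `main` only affects the current process and its children, so it does not change the default for other applications.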

Changes:

Added support for bfloat16 datatype in convolutions
Added softmax channel mode and new softmax version 2 API
Added fast / accurate / log softmax algorithms
Added new implicit GEMM convolution algorithm for forward and backwards data passes, disabled by default
Added int32 datatype support for output tensors in int8 convolutions
Added immediate mode for finding the best convolution kernel for a given configuration
Added a Find-Db infrastructure which stashes results of find on a user's system
Added a shipped System Find-Db containing offline run Find() results
Added an additional, faster batch norm assembly kernel for fp16
Added CTC loss layer
Added MIOpenDriver as a default component in MIOpen's build #34
Fixed C compatibility for boolean types in C API #103
Fixed incorrect calculation in per-activation batch norm backwards pass #104
Fixed bug #95 with asm batch norm ISA
Fixed IsApplicable bug in Conv3x3Asm for group convolutions
Improved performance of 1x1 stride 2 fp32 convolutions in the forward and backwards data passes
Improved 3-D convolution stability
Improved applicability of direct convolution backwards weights for 2x2, 5x10, and 5x20 filter sizes
Improved maintainability in kernels and cpp code
Updated rocBLAS minimum version to branch master-rocm-2.6
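
For readers unfamiliar with the fast / accurate / log softmax distinction listed above, here is a small CPU reference sketch of how these three variants are usually defined. It is an illustration under the common textbook meaning of the terms, not MIOpen's kernel code, and the names `SoftmaxAlgo` and `softmax_reference` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; not MIOpen identifiers.
enum class SoftmaxAlgo { Fast, Accurate, Log };

// Reference softmax over a single vector.
//   Fast:     exp(x_i) / sum_j exp(x_j), no rescaling (can overflow).
//   Accurate: subtracts max(x) before exponentiating for numerical stability.
//   Log:      returns log(softmax(x)) computed in the stable form.
std::vector<float> softmax_reference(const std::vector<float>& x, SoftmaxAlgo algo)
{
    std::vector<float> y(x.size());
    if (x.empty())
        return y;

    // Only the stable variants shift the inputs by their maximum.
    const float shift =
        (algo == SoftmaxAlgo::Fast) ? 0.0f : *std::max_element(x.begin(), x.end());

    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        y[i] = std::exp(x[i] - shift);
        sum += y[i];
    }

    for (std::size_t i = 0; i < x.size(); ++i)
    {
        y[i] = (algo == SoftmaxAlgo::Log) ? (x[i] - shift - std::log(sum))
                                          : (y[i] / sum);
    }
    return y;
}
```

For moderate inputs the fast and accurate variants agree to within rounding. For large inputs such as 1000.0f, `exp` overflows in single precision without the shift, which is the usual reason to prefer the accurate variant when input ranges are not known in advance.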
