Half precision floating point
KAWASHIMA Takahiro edited this page Feb 13, 2023 · 13 revisions
- Format
- IEEE
- ISO/IEC/IEEE JTC 1/SC 25
- C
- ISO/IEC JTC 1/SC 22 WG 14 (C WG)
- C++
- ISO/IEC JTC 1/SC 22 WG 21 (C++ WG)
- Fortran
- ISO/IEC JTC 1/SC 22 WG 5 (FORTRAN WG)
- MPI
- MPI Forum
- IEEE 754-2019 - IEEE Standard for Floating-Point Arithmetic
- https://ieeexplore.ieee.org/document/8766229 (2019-07-22)
- `binary16` is defined as a format
- ISO/IEC 60559:2020 Information technology - Microprocessor Systems - Floating-Point arithmetic
- https://www.iso.org/standard/80985.html (2020-05)
- Same as IEEE 754-2019
- bfloat16
- ARM alternative half-precision format
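The formats listed above differ mainly in how the 16 bits are divided: IEEE 754 `binary16` uses 1 sign bit, 5 exponent bits, and 10 fraction bits, while bfloat16 uses 1 sign bit, 8 exponent bits, and 7 fraction bits (the upper half of a `binary32` value). The following is a minimal sketch of decoding both bit patterns in portable C, assuming `float` is IEEE 754 `binary32`. The function names are hypothetical and not taken from any library, and the ARM alternative format (which has no infinities or NaNs) is not covered.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumed to be IEEE 754 binary32). */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Decode an IEEE 754 binary16 bit pattern: 1 sign, 5 exponent, 10 fraction bits. */
float binary16_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exponent = (h >> 10) & 0x1Fu;
    uint32_t fraction = h & 0x3FFu;

    if (exponent == 0) {
        /* Zero or subnormal: the value is fraction * 2^-24, exact in binary32. */
        float magnitude = (float)fraction * 0x1p-24f;
        return sign ? -magnitude : magnitude;
    }
    if (exponent == 0x1Fu) {
        /* Infinity (fraction == 0) or NaN: keep the payload in the upper fraction bits. */
        return bits_to_float(sign | 0x7F800000u | (fraction << 13));
    }
    /* Normal number: rebias the exponent from 15 to 127 and widen the fraction. */
    return bits_to_float(sign | ((exponent + 112u) << 23) | (fraction << 13));
}

/* Decode a bfloat16 bit pattern: it is the upper 16 bits of a binary32 value. */
float bfloat16_bits_to_float(uint16_t b)
{
    return bits_to_float((uint32_t)b << 16);
}
```

For example, the `binary16` pattern `0x3C00` and the bfloat16 pattern `0x3F80` both decode to `1.0f`, which shows that the same 16-bit payload must be tagged with its format before it can be interpreted.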
- ISO/IEC JTC 1/SC 22/WG 14 N1945 (ISO/IEC TS 18661-3:2015): Information technology - Programming languages, their environments, and system software interfaces - Floating-point extensions for C - Part 3: Interchange and extended types
- https://www.iso.org/standard/65615.html (2015-10)
- https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1945.pdf (2015-06-10)
- ISO/IEC TS (technical specification) which adds IEEE 754-2008 support to C
- `_Float16` and `_Float16 _Complex` are defined as types for IEEE 754-2008 `binary16`
- Extension of ISO/IEC 9899:2011 (C11)
- Not included in ISO/IEC 9899:2017 (C18)
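Because the TS is an extension rather than part of C11 or C18, portable code has to check whether the implementation provides `_Float16` at all. The following is a minimal sketch of one way to do that, assuming the implementation follows the TS convention of exposing `FLT16_MANT_DIG` and `FLT16_MAX_EXP` from `<float.h>` once `__STDC_WANT_IEC_60559_TYPES_EXT__` is defined. The function names are hypothetical, and both return 0 when the macros are absent.

```c
/* Request the TS 18661-3 macros; this must precede the first #include <float.h>. */
#define __STDC_WANT_IEC_60559_TYPES_EXT__
#include <float.h>

/* Significand precision of _Float16 in bits, or 0 if the macros are not provided. */
int float16_mant_dig(void)
{
#ifdef FLT16_MANT_DIG
    return FLT16_MANT_DIG;
#else
    return 0;
#endif
}

/* Maximum binary exponent of _Float16, or 0 if the macros are not provided. */
int float16_max_exp(void)
{
#ifdef FLT16_MAX_EXP
    return FLT16_MAX_EXP;
#else
    return 0;
#endif
}
```

For a `binary16`-based `_Float16`, the expected values are 11 significand bits (10 stored plus the implicit leading bit) and a maximum exponent of 16.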
- ISO/IEC JTC 1/SC 22/WG 14 N2016: Adding Fundamental Type for Short Float
- https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2016.pdf (2016-02-14)
- Same as ISO/IEC JTC 1/SC 22/WG 21 P0192R1 of C++
- ISO/IEC JTC 1/SC 22/WG 14 N2487: short float
- https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2487.htm (2020-02-20)
- Follow-up on N2016
- C++ Standards Committee Papers
- ISO/IEC JTC 1/SC 22/WG 21 P0192R0: Adding Fundamental Type for Short Float
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0192r0.pdf (2015-11-11)
- `short float` is proposed for C and C++ as a new floating-point type shorter than 32 bits
- No description of complex types
- ISO/IEC JTC 1/SC 22/WG 21 P0192R1: Adding Fundamental Type for Short Float
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0192r1.pdf (2016-02-14)
- Improved version of P0192R0
- Same as ISO/IEC JTC 1/SC 22/WG 14 N2016 of C
- ISO/IEC JTC 1/SC 22/WG 21 P0303R0: Extensions to C++ for Short Float Type
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0303r0.pdf (2017-10-15)
- Based on P0192R1
- Shows the proposed changes against the then-current C++ working draft
- `std::complex<short float>` is added as a C++ complex type
- ISO/IEC JTC 1/SC 22/WG 21 P0192R4: `short float` and fixed-size floating point types
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0192r4.html (2018-10-08)
- Improved version of P0192R1
- Rejected in November 2018
- `short float`
- Same as or shorter than `float`
- Bit length is not specified
- Same as or shorter than `std::complex<short float>`
- `std::float16_t`, `std::float32_t`, `std::float64_t`
- IEEE 754-2008-compliant types
- Bit length is specified
- ISO/IEC JTC 1/SC 22/WG 21 P1467R9: Extended floating-point types and standard names
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html (2022-04-22)
- `std::float16_t`, `std::float32_t`, `std::float64_t`, `std::float128_t`
- IEEE 754-2008-compliant types
- Bit length is specified
- `std::bfloat16_t`
- ISO/IEC JTC 1/SC 22/WG 21 P1468R3: Fixed-layout floating-point type aliases
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1468r3.html (2020-01-10)
- Merged into P1467R4 (and newer)
- GitHub issues
- 16-bit floating-point support for C/C++
- define language-agnostic, IEEE types
- Data type naming rule
- Slides
- pt2pt wg FP16 MPI Forum Meeting Dec 4 - Dec 7, 2017, San Jose
- MPI Forum virtual meeting FP16 Jan 31, 2018
- AArch64 v8.2 FP16 extensions are supported for ARM starting from QEMU 2.12
- `_Float16` and `__fp16` are supported in C and C++ for several platforms
- ARM: GCC 7 and newer
- AArch64: GCC 7 and newer
- x86: GCC 12 and newer
- Documents:
- https://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html (latest version)
- https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html (latest version)
- https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Floating-Types.html (GCC 7.1.0)
- https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Half-Precision.html (GCC 7.1.0)
- https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Floating-Types.html (GCC 12.1.0)
- https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Half-Precision.html (GCC 12.1.0)
- `_Float16` is supported in C and C++ for several platforms
- 32-bit Arm: Clang 6 and newer
- 64-bit Arm (AArch64): Clang 6 and newer
- SPIR: Clang 6 and newer
- AMDGPU: Clang 13 and newer
- X86: Clang 14 and newer
- Other platforms: Clang 6 and 7 (dropped in Clang 8 pending ABI standardization)
- `__fp16` is supported in C and C++ for all platforms starting from Clang 6
- `__bf16` is supported in C and C++ for several platforms
- 32-bit ARM: Clang 13 and newer
- 64-bit ARM (AArch64): Clang 13 and newer
- X86: Clang 14 and newer
- `real(kind=2)` is supported in Fortran (details have not been examined)
- Documents:
- https://clang.llvm.org/docs/LanguageExtensions.html (latest version)
- http://releases.llvm.org/6.0.0/tools/clang/docs/LanguageExtensions.html#half-precision-floating-point (Clang 6.0.0)
- http://releases.llvm.org/8.0.0/tools/clang/docs/LanguageExtensions.html#half-precision-floating-point (Clang 8.0.0)
- http://releases.llvm.org/13.0.0/tools/clang/docs/LanguageExtensions.html#half-precision-floating-point (Clang 13.0.0)
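Since support depends on both the compiler version and the target, code that uses these types usually needs a compile-time check. The following is a minimal sketch, assuming that GCC and Clang predefine `__FLT16_MANT_DIG__` on targets where `_Float16` is usable, and that ARM toolchains define the ACLE macros `__ARM_FP16_FORMAT_IEEE` or `__ARM_FP16_FORMAT_ALTERNATIVE` to indicate which `__fp16` format is selected. The function names are hypothetical; verify the macros against the compiler documents listed above before relying on them.

```c
#include <stddef.h>

/* Storage size of _Float16 in bytes, or 0 when the compiler does not
 * advertise the type through __FLT16_MANT_DIG__ (an assumed convention). */
size_t float16_storage_size(void)
{
#ifdef __FLT16_MANT_DIG__
    return sizeof(_Float16);
#else
    return 0;
#endif
}

/* Which half-precision format __fp16 uses, as reported by the ACLE macros.
 * Returns "unknown" on non-ARM targets or when neither macro is defined. */
const char *fp16_format(void)
{
#if defined(__ARM_FP16_FORMAT_IEEE)
    return "ieee";
#elif defined(__ARM_FP16_FORMAT_ALTERNATIVE)
    return "alternative";
#else
    return "unknown";
#endif
}
```

A build system could call these from a small configure-time probe to decide whether a native half-precision type is available or whether a software fallback is needed.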
- GitHub issue for `MPI_REAL2` and `MPI_COMPLEX4`
- GitHub PR for `MPI_REAL2`, `MPI_COMPLEX4`, `MPIX_SHORT_FLOAT`, `MPIX_C_SHORT_FLOAT_COMPLEX`, `MPIX_CXX_SHORT_FLOAT_COMPLEX`, and `MPIX_C_FLOAT16`
- GitHub issue and PR for `MPIX_C_FLOAT16`