Half precision floating point

Standardizing Bodies

Format
- IEEE
- ISO/IEC JTC 1/SC 25
C
- ISO/IEC JTC 1/SC 22 WG 14 (C WG)
  - https://www.open-std.org/JTC1/SC22/WG14/
C++
- ISO/IEC JTC 1/SC 22 WG 21 (C++ WG)
  - https://www.open-std.org/JTC1/SC22/WG21/
Fortran
- ISO/IEC JTC 1/SC 22 WG 5 (Fortran WG)
  - https://wg5-fortran.org/
MPI
- MPI Forum
  - http://mpi-forum.org/

Standardizations

Format

IEEE 754-2019 - IEEE Standard for Floating-Point Arithmetic
- https://ieeexplore.ieee.org/document/8766229 (2019-07-22)
- binary16 is defined as a format
ISO/IEC 60559:2020 Information technology - Microprocessor Systems - Floating-Point arithmetic
- https://www.iso.org/standard/80985.html (2020-05)
- Same as IEEE 754-2019
bfloat16
ARM alternative half-precision format
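
To make the bit layouts concrete, here is a minimal C sketch that decodes raw 16-bit patterns into `float`. It assumes the usual layouts: binary16 has 1 sign bit, 5 exponent bits (bias 15), and 10 fraction bits, and bfloat16 is the upper 16 bits of a binary32 value. The function names `binary16_to_float`, `bfloat16_to_float`, and `pow2_small` are made up for this illustration and do not come from any library; NaN payloads are not preserved.

```c
#include <math.h>    /* only the INFINITY and NAN macros are used */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return 2^e as a float (exact for the small e used here). */
static float pow2_small(int e)
{
    float r = 1.0f;
    for (; e > 0; e--) r *= 2.0f;
    for (; e < 0; e++) r *= 0.5f;
    return r;
}

/* Decode an IEEE 754 binary16 bit pattern into a float. */
float binary16_to_float(uint16_t h)
{
    int sign     = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int fraction = h & 0x3FF;
    float value;

    if (exponent == 0) {
        /* Zero or subnormal: fraction * 2^-24 */
        value = (float)fraction * pow2_small(-24);
    } else if (exponent == 0x1F) {
        /* Infinity if the fraction is zero, NaN otherwise */
        value = fraction ? NAN : INFINITY;
    } else {
        /* Normal: (1024 + fraction) * 2^(exponent - 15 - 10) */
        value = (float)(fraction | 0x400) * pow2_small(exponent - 25);
    }
    return sign ? -value : value;
}

/* Decode a bfloat16 bit pattern: place it in the upper half of a binary32. */
float bfloat16_to_float(uint16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float value;
    memcpy(&value, &bits, sizeof value);  /* assumes float is IEEE 754 binary32 */
    return value;
}
```

With these definitions, `binary16_to_float(0x3C00)` gives 1.0, `binary16_to_float(0x7BFF)` gives 65504 (the largest finite binary16 value), and `bfloat16_to_float(0x3F80)` gives 1.0.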

C

ISO/IEC JTC 1/SC 22/WG 14 N1945 (ISO/IEC TS 18661-3:2015): Information technology - Programming languages, their environments, and system software interfaces - Floating-point extensions for C -
- https://www.iso.org/standard/65615.html (2015-10)
- https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1945.pdf (2015-06-10)
- ISO/IEC TS (technical specification) which adds IEEE 754-2008 support to C
- _Float16 and _Float16 _Complex are defined as types for IEEE 754-2008 binary16
- Extension of ISO/IEC 9899:2011 (C11)
- Not included in ISO/IEC 9899:2018 (C17, also known as C18)
ISO/IEC JTC 1/SC 22/WG 14 N2016: Adding Fundamental Type for Short Float
- https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2016.pdf (2016-02-14)
- Same as ISO/IEC JTC 1/SC 22/WG 21 P0192R1 of C++
ISO/IEC JTC 1/SC 22/WG 14 N2487: short float
- https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2487.htm (2020-02-20)
- Follow-up on N2016
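
As an illustration of the `_Float16` type from ISO/IEC TS 18661-3 listed above, the following C sketch requests the TS macros by defining `__STDC_WANT_IEC_60559_TYPES_EXT__` before including `<float.h>`, then tests `FLT16_MANT_DIG` to see whether `_Float16` is available. The names `half_t`, `HALF_IS_NATIVE`, and `half_sum` are hypothetical and exist only for this example. The `float` fallback is an assumption made so that the sketch compiles everywhere; it does not round values to 16 bits.

```c
/* Ask for the TS 18661-3 interchange-type macros before including any header. */
#define __STDC_WANT_IEC_60559_TYPES_EXT__
#include <float.h>

#ifdef FLT16_MANT_DIG
/* _Float16 is available on this compiler and target. */
typedef _Float16 half_t;
#define HALF_IS_NATIVE 1
#else
/* Fallback for illustration only: plain float, NOT rounded to 16 bits. */
typedef float half_t;
#define HALF_IS_NATIVE 0
#endif

/* Sum n values, accumulating in float and rounding to half_t once on return. */
half_t half_sum(const half_t *x, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        acc += (float)x[i];
    }
    return (half_t)acc;
}
```

Accumulating in `float` is a design choice for this sketch: with only 11 significant bits in binary16, summing directly in the narrow type loses precision quickly.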

C++

C++ Standards Committee Papers
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/
ISO/IEC JTC 1/SC 22/WG 21 P0192R0: Adding Fundamental Type for Short Float
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0192r0.pdf (2015-11-11)
- Proposes short float for C and C++ as a new floating-point type shorter than 32 bits
- Complex types are not described
ISO/IEC JTC 1/SC 22/WG 21 P0192R1: Adding Fundamental Type for Short Float
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0192r1.pdf (2016-02-14)
- Improved version of P0192R0
- Same as ISO/IEC JTC 1/SC 22/WG 14 N2016 of C
ISO/IEC JTC 1/SC 22/WG 21 P0303R0: Extensions to C++ for Short Float Type
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0303r0.pdf (2017-10-15)
- Based on P0192R1
- Shows the proposed changes against the then-current C++ working draft
- std::complex<short float> is added for a C++ complex type
ISO/IEC JTC 1/SC 22/WG 21 P0192R4: short float and fixed-size floating point types
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0192r4.html (2018-10-08)
- Improved version of P0192R1
- Rejected in November 2018
- short float
  - Same as or shorter than float
  - Bit length is not specified
- std::complex<short float>
- std::float16_t, std::float32_t, std::float64_t
  - IEEE 754-2008-compliant types
  - Bit length is specified
ISO/IEC JTC 1/SC 22/WG 21 P1467R9: Extended floating-point types and standard names
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html (2022-04-22)
- std::float16_t, std::float32_t, std::float64_t, std::float128_t
  - IEEE 754-2008-compliant types
  - Bit length is specified
- std::bfloat16_t
ISO/IEC JTC 1/SC 22/WG 21 P1468R3: Fixed-layout floating-point type aliases
- https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1468r3.html (2020-01-10)
- Merged into P1467R4 (and newer)

MPI

GitHub issues
- 16-bit floating-point support for C/C++
  - https://github.com/mpi-forum/mpi-issues/issues/65
- Define language-agnostic, IEEE types
  - https://github.com/mpi-forum/mpi-issues/issues/66
- Data type naming rule
  - https://github.com/mpi-forum/mpi-issues/issues/74
Slides
- pt2pt wg FP16 MPI Forum Meeting Dec 4 - Dec 7, 2017, San Jose
  - https://github.com/mpi-forum/mpi-forum.github.io/blob/master/slides/2017/12/FP16-201712-AH-rev4.pdf
- MPI Forum virtual meeting FP16 Jan 31, 2018
  - https://github.com/mpi-forum/mpi-issues/files/1672779/FP16-20180131-AH-rev1.pdf

Emulators

QEMU

AArch64 v8.2 FP16 extensions are supported for ARM starting from QEMU 2.12
- https://wiki.qemu.org/ChangeLog/2.12#ARM

Compilers

GCC

_Float16 and __fp16 are supported in C and C++ for several platforms
- ARM: GCC 7 and newer
- AArch64: GCC 7 and newer
- x86: GCC 12 and newer
Documents:
- https://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html (latest version)
- https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html (latest version)
- https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Floating-Types.html (GCC 7.1.0)
- https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Half-Precision.html (GCC 7.1.0)
- https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Floating-Types.html (GCC 12.1.0)
- https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Half-Precision.html (GCC 12.1.0)
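
Because availability depends on both the compiler version and the target, code often has to probe for these types at compile time. The following C sketch relies on two assumptions that should be checked against the documents above for your compiler version: that GCC predefines `__FLT16_MANT_DIG__` when `_Float16` is available, and that the ACLE macros `__ARM_FP16_FORMAT_IEEE` and `__ARM_FP16_FORMAT_ALTERNATIVE` describe the `__fp16` format on ARM targets. The function names `has_float16` and `fp16_storage_format` are hypothetical.

```c
/* Return 1 if the compiler reports _Float16 support for this target, else 0.
 * Assumption: __FLT16_MANT_DIG__ is predefined exactly when _Float16 exists. */
int has_float16(void)
{
#ifdef __FLT16_MANT_DIG__
    return 1;
#else
    return 0;
#endif
}

/* Describe the format of __fp16 using the ACLE macros (ARM targets only).
 * Returns "none" when neither macro is defined, e.g. on x86. */
const char *fp16_storage_format(void)
{
#if defined(__ARM_FP16_FORMAT_IEEE)
    return "ieee";
#elif defined(__ARM_FP16_FORMAT_ALTERNATIVE)
    return "alternative";
#else
    return "none";
#endif
}
```

Both functions resolve at compile time, so a build script can compile and run a tiny program that calls them and prints the results to record what a given toolchain provides.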

Clang/Flang/LLVM

_Float16 is supported in C and C++ for several platforms
- 32-bit ARM: Clang 6 and newer
- 64-bit ARM (AArch64): Clang 6 and newer
- SPIR: Clang 6 and newer
- AMDGPU: Clang 13 and newer
- X86: Clang 14 and newer
- Other platforms: Clang 6 and 7 only (dropped in Clang 8 to wait for ABI standardization)
__fp16 is supported in C and C++ for all platforms starting from Clang 6
__bf16 is supported in C and C++ for several platforms
- 32-bit ARM: Clang 13 and newer
- 64-bit ARM (AArch64): Clang 13 and newer
- X86: Clang 14 and newer
real(kind=2) is supported in Fortran (details not yet examined)
Documents:
- https://clang.llvm.org/docs/LanguageExtensions.html (latest version)
- http://releases.llvm.org/6.0.0/tools/clang/docs/LanguageExtensions.html#half-precision-floating-point (Clang 6.0.0)
- http://releases.llvm.org/8.0.0/tools/clang/docs/LanguageExtensions.html#half-precision-floating-point (Clang 8.0.0)
- http://releases.llvm.org/13.0.0/tools/clang/docs/LanguageExtensions.html#half-precision-floating-point (Clang 13.0.0)

MPI Libraries

Open MPI

GitHub issue for MPI_REAL2 and MPI_COMPLEX4
- https://github.com/open-mpi/ompi/issues/2653
GitHub PR for MPI_REAL2, MPI_COMPLEX4, MPIX_SHORT_FLOAT, MPIX_C_SHORT_FLOAT_COMPLEX, MPIX_CXX_SHORT_FLOAT_COMPLEX, and MPIX_C_FLOAT16
- https://github.com/open-mpi/ompi/pull/6205

MPICH

GitHub issue and PR for MPIX_C_FLOAT16
- https://github.com/pmodels/mpich/issues/3389
- https://github.com/pmodels/mpich/pull/3455
