Enable gt half precision types for HIP #295

cmpfeil · 2025-02-12T13:40:28Z

Uses the 16-bit floating-point types from the HIP headers (<hip/hip_fp16.h>, <hip/hip_bf16.h>) when the CUDA headers are not available.

BF16 tests pass on an AMD MI300A when built with the rocm/6.3 module loaded, via:

cmake -S . -B build-hip -DCMAKE_INSTALL_PREFIX=build-hip -DGTENSOR_DEVICE=hip -DBUILD_TESTING=ON -DGTENSOR_ENABLE_BF16=ON -DCMAKE_CXX_COMPILER=$(which hipcc)
cmake --build build-hip --target install

(FP16 tests pass analogously when built with -DGTENSOR_ENABLE_FP16=ON.)
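
For readers unfamiliar with the pattern, the header selection described above can be sketched roughly as follows. This is not the code from this PR: `half_t`, `to_half`, `from_half`, `roundtrip` and `SKETCH_HALF_ON_DEVICE` are hypothetical names, and the sketch assumes the compiler-defined macros `__CUDACC__` and `__HIPCC__` are a reasonable stand-in for whatever device macros gtensor actually checks. It also assumes the host-callable `__float2half` / `__half2float` conversions that both toolkits ship in their FP16 headers.

```cpp
// Sketch only: all names below are hypothetical, not gtensor's API.
#if defined(__CUDACC__)
#include <cuda_fp16.h>     // CUDA toolkit provides __half
#define SKETCH_HALF_ON_DEVICE 1
#elif defined(__HIPCC__)
#include <hip/hip_fp16.h>  // HIP provides a type with the same name
#define SKETCH_HALF_ON_DEVICE 1
#else
#define SKETCH_HALF_ON_DEVICE 0
#endif

#if SKETCH_HALF_ON_DEVICE
// Device build: store values in the vendor's 16-bit type.
using half_t = __half;
inline half_t to_half(float x) { return __float2half(x); }
inline float from_half(half_t h) { return __half2float(h); }
#else
// Host-only fallback (an assumption of this sketch) so the file still
// compiles without either toolkit; no 16-bit rounding happens here.
using half_t = float;
inline half_t to_half(float x) { return x; }
inline float from_half(half_t h) { return h; }
#endif

// Convert to the storage type and back.
inline float roundtrip(float x) { return from_half(to_half(x)); }
```

Because CUDA and HIP use the same type and conversion names here, only the include line differs between the two device branches. That is presumably what allows a single on-device flag, as in the "Simplify to FLOAT16T_ON_DEVICE flag" commit, though the actual implementation may differ.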

cmpfeil added 4 commits February 11, 2025 17:13

Feature FP16 for HIP

7dc16f5

Simplify to FLOAT16T_ON_DEVICE flag

1277a72

Feature BF16 for HIP

15b354f

Add suffix underscore for private member variables

5386fc9

cmpfeil marked this pull request as ready for review February 12, 2025 15:00
