Skip to content

Commit

Permalink
Fix Windows hipcub vector test failure for type bool on gfx12 (#426)
Browse files Browse the repository at this point in the history
Currently, when we perform tests on CubVector types, we:
- initialize vectors to a constant value
- double them (using addition) on the device
- double them on the host
- compare the host and device results to make sure they're the same,
using a bitwise-comparison

When the test is instantiated with type bool, it can fail. Here's why:

To maintain CUB compatibility, the CubVector type for bools uses
unsigned char as the backing storage type. As a result, in the vector_double_kernel,
when device_input is a CubVector of bools, the addition really operates on unsigned char,
and there is no cast back to bool on write-back to the device_output.

The cast back to bool is only done later on the host, and must use a reinterpret_cast to a bool pointer,
followed by a dereference. In this scenario, the compiler does not convert non-zero values to 1 (as it would
with a static_cast to bool operating on a value).
That means we can end up comparing two values that are both true but have different (non-zero) binary values.

This change works around the problem by switching to using subtraction in the vector_double_kernel
(device_output = device_input - device_input)
instead of addition. This means the result will be zero, which will always cast to the same binary false value (0).

To fix this properly requires grabbing the underlying storage type of the CubVector (unsigned char), reinterpret_casting
to that first, and then static_casting to bool. This would ensure that the compiler always converts non-zero values to 1
on the second step. Unfortunately, currently there's no platform-independent way to grab the actual storage type.
  • Loading branch information
umfranzw authored Nov 14, 2024
1 parent 1da9237 commit 42e8cbe
Showing 1 changed file with 16 additions and 5 deletions.
21 changes: 16 additions & 5 deletions test/hipcub/test_hipcub_vector.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -61,11 +61,22 @@ TYPED_TEST_SUITE(HipcubVector, Params);

template<class T, unsigned int VecSize, unsigned int BlockSize>
__global__
void vector_double_kernel(T* device_input, T* device_output)
void vector_test_kernel(T* device_input, T* device_output)
{
unsigned int index = hipThreadIdx_x + (hipBlockIdx_x * BlockSize);

device_output[index] = device_input[index] + device_input[index];
// Note about why subtraction is used here:
// To maintain CUB compatibility, the CubVector type for bools uses
// unsigned char as the backing storage type.
// As a result, when device_input is a CubVector of bools, the math below really operates on unsigned char,
// and there is no cast back to bool on write-back to the device_output.

// The cast back to bool is only done later on the host, and must use a reinterpret_cast to a bool pointer,
// followed by a dereference. In this scenario, the compiler does not convert non-zero values to 1 (as it would
// with a static_cast to bool operating on a value).
// That means we can end up comparing two values that are both true but have different (non-zero) binary values.
// Using subtraction below results in zero, which will always cast to the same binary false value (0).
device_output[index] = device_input[index] - device_input[index];
}

template<class T, unsigned int vec_size>
Expand All @@ -82,7 +93,7 @@ void run_vector_test()
Vector* device_input;
HIP_CHECK(test_common_utils::hipMallocHelper(&device_input, size * sizeof(Vector)));

T input_num = 10;
T input_num = static_cast<T>(10);

Vector input_vec;
for(unsigned int i = 0; i < vec_size; i++)
Expand All @@ -99,7 +110,7 @@ void run_vector_test()
Vector* device_output;
HIP_CHECK(test_common_utils::hipMallocHelper(&device_output, size * sizeof(Vector)));

vector_double_kernel<Vector, vec_size, block_size>
vector_test_kernel<Vector, vec_size, block_size>
<<<size / block_size, block_size>>>(device_input, device_output);

std::vector<Vector> output(size);
Expand All @@ -108,7 +119,7 @@ void run_vector_test()
output.size() * sizeof(Vector),
hipMemcpyDeviceToHost));

const T expected_num = input_num + input_num;
const T expected_num = static_cast<T>(input_num - input_num);

for(unsigned int i = 0; i < size; i++)
{
Expand Down

0 comments on commit 42e8cbe

Please sign in to comment.