Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMDE is swapping the definition of SIMDE_MM_HINT_T0 and T2 #1186

Open
galanhir opened this issue May 27, 2024 · 5 comments
Open

SIMDE is swapping the definition of SIMDE_MM_HINT_T0 and T2 #1186

galanhir opened this issue May 27, 2024 · 5 comments

Comments

@galanhir
Copy link

On one hand, the prefetch macros are originally defined _MM_HINT_T0 (resp, T2) by Intel as:

xmmintrin.h:#define _MM_HINT_T0  3
...
xmmintrin.h:#define _MM_HINT_T2  1

On the other hand, SIMDE is defining

  #define SIMDE_MM_HINT_T0   1
...
  #define SIMDE_MM_HINT_T2   3

eventually identifying one for the other in case of SIMDE_X86_SSE_ENABLE_NATIVE_ALIASES:

  #undef  _MM_HINT_T0
  #define _MM_HINT_T0   SIMDE_MM_HINT_T0
...
  #undef  _MM_HINT_T2
  #define _MM_HINT_T2   SIMDE_MM_HINT_T2

This is probably wrong, as it ends up swapping _MM_HINT_T0 and _MM_HINT_T2 when built nativelly for Intel x86 targets.

@mr-c
Copy link
Collaborator

mr-c commented May 27, 2024

Thank you for your report @galanhir ; can you send us a pull request with a fix?

@aqrit
Copy link
Contributor

aqrit commented May 28, 2024

MSVC and ICC define _MM_HINT_T0 as 1.
GCC, CLANG, and ICX define it as 3.

I will confirm that SIMDE_MM_HINT_T0 produces the wrong prefetch (PREFETCHT2) under gcc.
But the correct PREFETCHT0 under msvc.

@galanhir
Copy link
Author

galanhir commented May 30, 2024

I hereby confirm that MSVC and ICC header files (xmmintrin.h) are defining _MM_HINT_T0 as 1, whereas both GCC, CLANG, and XCode header files (also xmmintrin.h) are defining _MM_HINT_T0 as 3. I also confirm that all those compilers generate PREFETCHT0 when compiling _mm_prefetch(ptr, _MM_HINT_T0).

I have also noted that Intel documentation (https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_prefetch) states:

    _MM_HINT_T0 // 3, move data using the T0 hint. The PREFETCHT0 instruction will be generated.
...
    _MM_HINT_T2 // 1, move data using the T2 hint. The PREFETCHT2 instruction will be generated.

which seems to indicate at least a mismatch btw MSVC header and Intel documentation.

At the end of the day, that sounds like a great deal of confusion between supported compilers.

Maybe, this calls for a delicate compiler filtering, with different definitions for SIMDE_MM_HINT_T0 depending on the compiler. But not clear what "default" behaviour shall be chosen and which compilers to filter out. A simple approach would be to keep the default as is, and revert the definittion when CLANG or GCC are detected, but, such simple approach may miss other x86_64 compilers detected as native by SIMDE and still behaving like CLANG or GCC.

I am not clear about the situation of MSVN, when CLANG-CL is used as back-end (I guess the same than CLANG, but i have not verified).

A session of GODBOLT to test usual configurations: https://godbolt.org/z/hM7c6E6zj.

I have verified that the mapping on __builtin_prefetch() directive follows (at least) CLANG translation usage, i.e. _mm_prefetch(ptr, _MM_HINT_T0) maps __builtin_prefetch(p, 0, 3) as expected and seems therefore correct on ARM-CLANG. I have not verified on other non x86_64 targets (but, maybe, it matters less).

@aqrit
Copy link
Contributor

aqrit commented May 30, 2024

Maybe, this calls for a delicate compiler filtering

maybe, something like:

#if defined(SIMDE_X86_SSE_NATIVE) && defined(_MM_HINT_T0)
#  define SIMDE_MM_HINT_T0 _MM_HINT_T0
#else 
#  define SIMDE_MM_HINT_T0 3
#endif

Might get tricky though, some of those defines are extensions, or otherwise not always defined by every compiler... ?

Intel documentation

There are Intel produced pdf's out there that also give the currently used enums.

I think there is some copy & paste going with the online guide...
For example, they swapped the arguments of _mm_test_mix_ones_zeros to reflex a typo in llvm/gcc headers.
But in the headers, despite the incorrect naming, the args still mapped to testc in the correct order...
so the guide is now incorrect.

@mr-c
Copy link
Collaborator

mr-c commented Sep 14, 2024

@aqrit and @galanhir (or anyone else); a PR to fix this would be very welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants