blake2module.c needs to be compiled with libhacl SIMD flags #130213

jmroot · 2025-02-17T02:11:19Z

Bug report

Bug description:

blake2module.c may include _hacl/Hacl_Hash_Blake2b_Simd256.h and/or _hacl/Hacl_Hash_Blake2s_Simd128.h, and thus needs to compile with LIBHACL_SIMD128_FLAGS and/or LIBHACL_SIMD256_FLAGS, or you get an error like this if the compiler doesn't enable those SIMD features by default:

In file included from ./Modules/blake2module.c:139:
In file included from ./Modules/_hacl/Hacl_Hash_Blake2b_Simd256.h:42:
In file included from ./Modules/_hacl/libintvector.h:28:
/Library/Developer/CommandLineTools/usr/bin/../lib/clang/7.0.2/include/smmintrin.h:28:2: error: "SSE4.1 instruction set not enabled"

I would create a PR but unfortunately I can't figure out where this needs to be added.

CPython versions tested on:

3.14

Operating systems tested on:

macOS

Linked PRs

gh-130213: Check availability of Intel SIMD types #130332

The text was updated successfully, but these errors were encountered:

picnixz · 2025-02-17T11:02:09Z

Aren't we setting them in configure? In addition we have:

// Small mismatch between the variable names Python defines as part of configure
// at the ones HACL* expects to be set in order to enable those headers.
#define HACL_CAN_COMPILE_VEC128 HACL_CAN_COMPILE_SIMD128
#define HACL_CAN_COMPILE_VEC256 HACL_CAN_COMPILE_SIMD256

#include "_hacl/Hacl_Hash_Blake2b.h"
#include "_hacl/Hacl_Hash_Blake2s.h"
#if HACL_CAN_COMPILE_SIMD256
#include "_hacl/Hacl_Hash_Blake2b_Simd256.h"
#endif
#if HACL_CAN_COMPILE_SIMD128
#include "_hacl/Hacl_Hash_Blake2s_Simd128.h"
#endif

So, the headers shouldn't even be included if HACL_* are not defined

picnixz · 2025-02-17T11:09:15Z

In addition, -mavx2 should be sufficient to enable SSE 4.1, at least on gcc. I don't know whether this is the case on clang because I don't know how to check if -mavx2 implies -msse4.1:

if test "$ac_sys_system" != "Linux-android" -a "$ac_sys_system" != "WASI" || test "$ANDROID_API_LEVEL" -ge 28; then
  AX_CHECK_COMPILE_FLAG([-mavx2],[
    [LIBHACL_SIMD256_FLAGS="-mavx2"]
    AC_DEFINE([HACL_CAN_COMPILE_SIMD256], [1], [HACL* library can compile SIMD256 implementations])

picnixz · 2025-02-17T11:12:32Z

Oh, I think I see what happens:

Modules/blake2module.o: $(srcdir)/Modules/blake2module.c $(MODULE__BLAKE2_DEPS) $(MODULE_DEPS_SHARED) $(PYTHON_HEADERS); $(CC)  -I$(srcdir)/Modules/_hacl/include $(PY_STDMODULE_CFLAGS) $(CCSHARED) -c $(srcdir)/Modules/blake2module.c -o Modules/blake2module.o

We might be missing some flag here indeed. OTOH, we have

_blake2 blake2module.c -I$(srcdir)/Modules/_hacl/include Modules/_hacl/libHacl_Hash_Blake2.a

in Setup.stdlib, but maybe we're missing a -D_BSD_SOURCE -D_DEFAULT_SOURCE here as SHA and MD5 modules are compiled with this.

picnixz · 2025-02-17T11:15:25Z

I won't have time to investigate more so I leave it to @msprotz

msprotz · 2025-02-17T15:24:42Z

@jmroot I'm slightly confused by your report. It looks like merely including smmintrin.h triggers an error if you do not pass -msse4.1, as opposed to actually using definitions from smmintrin.h

@jmroot I dimly recall something like that happening on old clang versions. Can you check with a very recent clang please?

@picnixz blake2module.c should not be compiled with any -m flags, otherwise, the compiler might start introducing, say, SSE4.1 instructions in a section of code that is not guarded behind the run-time check for e.g. has_simd256(), which will generate illegal instruction errors if Python is executed on a machine that does not have these instructions.

The idea is that one should be able to include various *mmintrin.h headers without issues, but must guard their usage behind suitable runtime checks.

jmroot · 2025-02-17T17:46:20Z

@jmroot I'm slightly confused by your report. It looks like merely including smmintrin.h triggers an error if you do not pass -msse4.1, as opposed to actually using definitions from smmintrin.h

Yes, that seems to be the case.

@jmroot I dimly recall something like that happening on old clang versions. Can you check with a very recent clang please?

It works fine with current clang of course, because -msse4.1 is on by default.

jmroot · 2025-02-17T17:57:58Z

The trigger for the error in smmintrin.h is #ifndef __SSE4_1__, so guarding the include with #ifdef __SSE4_1__ should be a partial solution.

msprotz · 2025-02-17T18:06:57Z

On my system (which definitely does not have avx512):

jonathan@absinthe:/tmp $ cat test.c
#include <immintrin.h>
jonathan@absinthe:/tmp $ cc -c test.c
jonathan@absinthe:/tmp $ echo $?
0

the header inclusion works just fine. But I try to use the header without the right -m flags:

jonathan@absinthe:/tmp $ cat -p test.c
#include <immintrin.h>

int main() {
  __m512 res, src, a, b;
  __mmask16  k = 0x5555;

  res = _mm512_mask_add_ps(src, k, a, b);
  return 0;
}
jonathan@absinthe:/tmp $ cc test.c
test.c:7:9: error: always_inline function '_mm512_mask_add_ps' requires target feature 'avx512f', but would be inlined into function 'main' that is compiled without support for 'avx512f'
    7 |   res = _mm512_mask_add_ps(src, k, a, b);
      |         ^
test.c:7:9: error: AVX vector argument of type '__m512' (vector of 16 'float' values) without 'avx512f' enabled changes the ABI
2 errors generated.

Then I do get an actual error. So it appears on my system, you're allowed to include headers for intrinsics that are not currently enabled via suitable -m flags, but you do get an error if you try to use such intrinsics.

On your system, it appears that the behavior of the headers is much stricter.

What you suggest would work, but I would like to make sure I 100% understand why it's needed before I add it.

Do you agree that there is a change of behavior for intrinsic headers across clang versions?
Is compiling with clang7 a supported scenario for Python?

Thanks!

jmroot · 2025-02-17T18:16:39Z

Do you agree that there is a change of behavior for intrinsic headers across clang versions?

Slightly hard to tell given that with current clang from Xcode 16.2

% /usr/bin/clang -E -dM -x c-header /dev/null | grep -F SSE4
#define __SSE4_1__ 1

But it's certainly consistent with the evidence.

Is compiling with clang7 a supported scenario for Python?

I don't know what the policy is. The only requirement I'm aware of is C11. Disabling SIMD for older clang would also be an acceptable though less preferable resolution.

jmroot · 2025-02-18T20:53:32Z

The other part of the problem is that one of the types defined in the intrinsics headers is always used (__m256i, __m128i). Even when the includes are not guarded in libintvector.h, immintrin.h will internally not include avxintrin.h if __AVX__ is not defined. Simply adding the typedef results in a link-time error: LLVM ERROR: Do not know how to split this operator's operand!

Disabling SIMD may well be the only easy fix here. What do you think would be the best way to do that? Perhaps a configure check for whether smmintrin.h can be included without -msse4.1?

msprotz · 2025-02-18T22:09:48Z

This an omission on my end and on #130157 I mention that in the absence of intrinsics headers these types ought to be defined as void*.

I'm open to guarding the include for compatibility with old versions of clang. @picnixz thoughts?

msprotz · 2025-02-19T18:30:44Z

@jmroot I re-read your message above. Just to recap (correct me if I missed something):

if the toolchain can compile a "VEC128" implementation, then blake2module.c includes the header for Blake2s_128
this does not mean that the code from blake2module.c will run on a machine that supports VEC128 -- just that blake2module.c might elect, if support is found at runtime, to dispatch to the VEC128 implementation
corollary: blake2module.c cannot be compiled with e.g. -mavx2 as it would insert avx2 instructions in code that is not guaranteed to execute on a machine that supports AVX2
the vec128 implementations have function signatures that talk about the __m128i type
on old toolchains, the headers that contain __m128i and corresponding intrinsics give a hard error if cc is not running with the right -m flags
on old toolchains, the headers might not even want to define the right types, which causes an issue with typedef __m128i LibIntVector_vec128 for instance

Fixes include:

your suggestion: disabling SIMD if these headers cannot be included without passing the right -m flags -- with a test at configure-time; @jmroot can you tell which version of clang started behaving "the right way"? if it's only super old clang versions, then I guess that's ok?
big hack: because we know that blake2module.c only ever manipulates __m128i * (and not __m128), tweak libintvector.h to use a macro instead of a typedef, and locally define __m128 as void so that all usages of the vector behind a pointer translate to void * in the context of blake2module.c -- this seems like a terrible idea that might break in undebuggable ways
big expensive refactoring: make sure all the modules in blake2 do not #include <libintvector.h>in their public headers, relying on forward struct definitions to hide the usage of vector types from the perspective of client files

Option 3. will take considerable time and effort, and option 2 is a big hack. I lean towards your proposed solution (option 1).

@jmroot do you know which version of clang it is that those headers started behaving "the right way"?

jmroot · 2025-02-19T21:23:39Z

@jmroot I re-read your message above. Just to recap (correct me if I missed something):

if the toolchain can compile a "VEC128" implementation, then blake2module.c includes the header for Blake2s_128

this does not mean that the code from blake2module.c will run on a machine that supports VEC128 -- just that blake2module.c might elect, if support is found at runtime, to dispatch to the VEC128 implementation

corollary: blake2module.c cannot be compiled with e.g. -mavx2 as it would insert avx2 instructions in code that is not guaranteed to execute on a machine that supports AVX2

the vec128 implementations have function signatures that talk about the __m128i type

on old toolchains, the headers that contain __m128i and corresponding intrinsics give a hard error if cc is not running with the right -m flags

I think that's all correct.

on old toolchains, the headers might not even want to define the right types, which causes an issue with typedef __m128i LibIntVector_vec128 for instance

This seems to be specific to immintrin.h which is only used in the VEC256 case. The ones used for VEC128 should either error or provide the needed typedefs.

@jmroot do you know which version of clang it is that those headers started behaving "the right way"?

It looks like on the Apple side it happened in either Xcode 8 or some late 7.x version.

jmroot · 2025-02-20T03:33:07Z

@msprotz PR opened for option 1, please take a look.

chris-eibl · 2025-02-20T18:42:52Z

The not-so-old clang-cl 18.1.8 currently shipped with Visual Studio 2022 fails with:

  In file included from ..\Modules\blake2module.c:139:
  In file included from ..\Modules\_hacl/Hacl_Hash_Blake2b_Simd256.h:42:
..\Modules\_hacl\libintvector.h(230,9): error : unknown type name '__m256i' [E:\cpython_clang_pgo2\PCbuild\pythoncore.vcxproj]

I think this is related?

clang-cl 19.1.1 shipped with VS 2022 Preview 5 compiles fine.

If option 1 is the way to go, then IIUC we'll have to do something similar in pythoncore.vcxproj, i.e.

do not define HACL_CAN_COMPILE_SIMD128 and HACL_CAN_COMPILE_SIMD256 for blake2module.c based on LLVMToolsVersion
then we'd even could skip compiling Hacl_Hash_Blake2s_Simd128.c and Hacl_Hash_Blake2s_Simd256.c - they'd just be dead code?

msprotz · 2025-02-20T20:55:25Z

It seems like the same problem indeed. Based on my understanding, yes to both points.

Pinging @gpshead who might be interested in this discussion

msprotz · 2025-02-20T20:56:12Z

FYI, @jmroot, I separately have been exploring option 3 (use forward struct decls everywhere), which would solve the issue with blake2, but would immediately reappear for the upcoming hmac pr #130157 and would require further tricks. It seems like it's not really worth it.

jmroot · 2025-02-20T21:28:28Z

Thanks for looking into it. I figured option 3 would likely be a longer term effort.

jmroot added the type-bug An unexpected behavior, bug, or error label Feb 17, 2025

aisk added the build The build process and cross-build label Feb 17, 2025

msprotz mentioned this issue Feb 19, 2025

Fix long-standing issue with C Abstract Structs FStarLang/karamel#542

Open

bedevere-app bot mentioned this issue Feb 20, 2025

gh-130213: Check availability of Intel SIMD types #130332

Open

chris-eibl mentioned this issue Feb 20, 2025

gh-130090: Support PGO for clang-cl #129907

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

blake2module.c needs to be compiled with libhacl SIMD flags #130213

blake2module.c needs to be compiled with libhacl SIMD flags #130213

jmroot commented Feb 17, 2025 •

edited by bedevere-app bot

Loading

picnixz commented Feb 17, 2025

picnixz commented Feb 17, 2025 •

edited

Loading

picnixz commented Feb 17, 2025 •

edited

Loading

picnixz commented Feb 17, 2025

msprotz commented Feb 17, 2025

jmroot commented Feb 17, 2025

jmroot commented Feb 17, 2025

msprotz commented Feb 17, 2025

jmroot commented Feb 17, 2025

jmroot commented Feb 18, 2025

msprotz commented Feb 18, 2025

msprotz commented Feb 19, 2025

jmroot commented Feb 19, 2025

jmroot commented Feb 20, 2025

chris-eibl commented Feb 20, 2025

msprotz commented Feb 20, 2025

msprotz commented Feb 20, 2025

jmroot commented Feb 20, 2025

blake2module.c needs to be compiled with libhacl SIMD flags #130213

blake2module.c needs to be compiled with libhacl SIMD flags #130213

Comments

jmroot commented Feb 17, 2025 • edited by bedevere-app bot Loading

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs

picnixz commented Feb 17, 2025

picnixz commented Feb 17, 2025 • edited Loading

picnixz commented Feb 17, 2025 • edited Loading

picnixz commented Feb 17, 2025

msprotz commented Feb 17, 2025

jmroot commented Feb 17, 2025

jmroot commented Feb 17, 2025

msprotz commented Feb 17, 2025

jmroot commented Feb 17, 2025

jmroot commented Feb 18, 2025

msprotz commented Feb 18, 2025

msprotz commented Feb 19, 2025

jmroot commented Feb 19, 2025

jmroot commented Feb 20, 2025

chris-eibl commented Feb 20, 2025

msprotz commented Feb 20, 2025

msprotz commented Feb 20, 2025

jmroot commented Feb 20, 2025

jmroot commented Feb 17, 2025 •

edited by bedevere-app bot

Loading

picnixz commented Feb 17, 2025 •

edited

Loading

picnixz commented Feb 17, 2025 •

edited

Loading