Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blake2module.c needs to be compiled with libhacl SIMD flags #130213

Open
jmroot opened this issue Feb 17, 2025 · 18 comments
Open

blake2module.c needs to be compiled with libhacl SIMD flags #130213

jmroot opened this issue Feb 17, 2025 · 18 comments
Labels
build The build process and cross-build type-bug An unexpected behavior, bug, or error

Comments

@jmroot
Copy link
Contributor

jmroot commented Feb 17, 2025

Bug report

Bug description:

blake2module.c may include _hacl/Hacl_Hash_Blake2b_Simd256.h and/or _hacl/Hacl_Hash_Blake2s_Simd128.h, and thus needs to compile with LIBHACL_SIMD128_FLAGS and/or LIBHACL_SIMD256_FLAGS, or you get an error like this if the compiler doesn't enable those SIMD features by default:

In file included from ./Modules/blake2module.c:139:
In file included from ./Modules/_hacl/Hacl_Hash_Blake2b_Simd256.h:42:
In file included from ./Modules/_hacl/libintvector.h:28:
/Library/Developer/CommandLineTools/usr/bin/../lib/clang/7.0.2/include/smmintrin.h:28:2: error: "SSE4.1 instruction set not enabled"

I would create a PR but unfortunately I can't figure out where this needs to be added.

CPython versions tested on:

3.14

Operating systems tested on:

macOS

Linked PRs

@jmroot jmroot added the type-bug An unexpected behavior, bug, or error label Feb 17, 2025
@aisk aisk added the build The build process and cross-build label Feb 17, 2025
@picnixz
Copy link
Member

picnixz commented Feb 17, 2025

Aren't we setting them in configure? In addition we have:

// Small mismatch between the variable names Python defines as part of configure
// at the ones HACL* expects to be set in order to enable those headers.
#define HACL_CAN_COMPILE_VEC128 HACL_CAN_COMPILE_SIMD128
#define HACL_CAN_COMPILE_VEC256 HACL_CAN_COMPILE_SIMD256

#include "_hacl/Hacl_Hash_Blake2b.h"
#include "_hacl/Hacl_Hash_Blake2s.h"
#if HACL_CAN_COMPILE_SIMD256
#include "_hacl/Hacl_Hash_Blake2b_Simd256.h"
#endif
#if HACL_CAN_COMPILE_SIMD128
#include "_hacl/Hacl_Hash_Blake2s_Simd128.h"
#endif

So, the headers shouldn't even be included if HACL_* are not defined

@picnixz
Copy link
Member

picnixz commented Feb 17, 2025

In addition, -mavx2 should be sufficient to enable SSE 4.1, at least on gcc. I don't know whether this is the case on clang because I don't know how to check if -mavx2 implies -msse4.1:

if test "$ac_sys_system" != "Linux-android" -a "$ac_sys_system" != "WASI" || test "$ANDROID_API_LEVEL" -ge 28; then
  AX_CHECK_COMPILE_FLAG([-mavx2],[
    [LIBHACL_SIMD256_FLAGS="-mavx2"]
    AC_DEFINE([HACL_CAN_COMPILE_SIMD256], [1], [HACL* library can compile SIMD256 implementations])

@picnixz
Copy link
Member

picnixz commented Feb 17, 2025

Oh, I think I see what happens:

Modules/blake2module.o: $(srcdir)/Modules/blake2module.c $(MODULE__BLAKE2_DEPS) $(MODULE_DEPS_SHARED) $(PYTHON_HEADERS); $(CC)  -I$(srcdir)/Modules/_hacl/include $(PY_STDMODULE_CFLAGS) $(CCSHARED) -c $(srcdir)/Modules/blake2module.c -o Modules/blake2module.o

We might be missing some flag here indeed. OTOH, we have

_blake2 blake2module.c -I$(srcdir)/Modules/_hacl/include Modules/_hacl/libHacl_Hash_Blake2.a

in Setup.stdlib, but maybe we're missing a -D_BSD_SOURCE -D_DEFAULT_SOURCE here as SHA and MD5 modules are compiled with this.

@picnixz
Copy link
Member

picnixz commented Feb 17, 2025

I won't have time to investigate more so I leave it to @msprotz

@msprotz
Copy link
Contributor

msprotz commented Feb 17, 2025

@jmroot I'm slightly confused by your report. It looks like merely including smmintrin.h triggers an error if you do not pass -msse4.1, as opposed to actually using definitions from smmintrin.h

@jmroot I dimly recall something like that happening on old clang versions. Can you check with a very recent clang please?

@picnixz blake2module.c should not be compiled with any -m flags, otherwise, the compiler might start introducing, say, SSE4.1 instructions in a section of code that is not guarded behind the run-time check for e.g. has_simd256(), which will generate illegal instruction errors if Python is executed on a machine that does not have these instructions.

The idea is that one should be able to include various *mmintrin.h headers without issues, but must guard their usage behind suitable runtime checks.

@jmroot
Copy link
Contributor Author

jmroot commented Feb 17, 2025

@jmroot I'm slightly confused by your report. It looks like merely including smmintrin.h triggers an error if you do not pass -msse4.1, as opposed to actually using definitions from smmintrin.h

Yes, that seems to be the case.

@jmroot I dimly recall something like that happening on old clang versions. Can you check with a very recent clang please?

It works fine with current clang of course, because -msse4.1 is on by default.

@jmroot
Copy link
Contributor Author

jmroot commented Feb 17, 2025

The trigger for the error in smmintrin.h is #ifndef __SSE4_1__, so guarding the include with #ifdef __SSE4_1__ should be a partial solution.

@msprotz
Copy link
Contributor

msprotz commented Feb 17, 2025

On my system (which definitely does not have avx512):

jonathan@absinthe:/tmp $ cat test.c
#include <immintrin.h>
jonathan@absinthe:/tmp $ cc -c test.c
jonathan@absinthe:/tmp $ echo $?
0

the header inclusion works just fine. But I try to use the header without the right -m flags:

jonathan@absinthe:/tmp $ cat -p test.c
#include <immintrin.h>

int main() {
  __m512 res, src, a, b;
  __mmask16  k = 0x5555;

  res = _mm512_mask_add_ps(src, k, a, b);
  return 0;
}
jonathan@absinthe:/tmp $ cc test.c
test.c:7:9: error: always_inline function '_mm512_mask_add_ps' requires target feature 'avx512f', but would be inlined into function 'main' that is compiled without support for 'avx512f'
    7 |   res = _mm512_mask_add_ps(src, k, a, b);
      |         ^
test.c:7:9: error: AVX vector argument of type '__m512' (vector of 16 'float' values) without 'avx512f' enabled changes the ABI
2 errors generated.

Then I do get an actual error. So it appears on my system, you're allowed to include headers for intrinsics that are not currently enabled via suitable -m flags, but you do get an error if you try to use such intrinsics.

On your system, it appears that the behavior of the headers is much stricter.

What you suggest would work, but I would like to make sure I 100% understand why it's needed before I add it.

  • Do you agree that there is a change of behavior for intrinsic headers across clang versions?
  • Is compiling with clang7 a supported scenario for Python?

Thanks!

@jmroot
Copy link
Contributor Author

jmroot commented Feb 17, 2025

  • Do you agree that there is a change of behavior for intrinsic headers across clang versions?

Slightly hard to tell given that with current clang from Xcode 16.2

% /usr/bin/clang -E -dM -x c-header /dev/null | grep -F SSE4
#define __SSE4_1__ 1

But it's certainly consistent with the evidence.

  • Is compiling with clang7 a supported scenario for Python?

I don't know what the policy is. The only requirement I'm aware of is C11. Disabling SIMD for older clang would also be an acceptable though less preferable resolution.

@jmroot
Copy link
Contributor Author

jmroot commented Feb 18, 2025

The other part of the problem is that one of the types defined in the intrinsics headers is always used (__m256i, __m128i). Even when the includes are not guarded in libintvector.h, immintrin.h will internally not include avxintrin.h if __AVX__ is not defined. Simply adding the typedef results in a link-time error: LLVM ERROR: Do not know how to split this operator's operand!

Disabling SIMD may well be the only easy fix here. What do you think would be the best way to do that? Perhaps a configure check for whether smmintrin.h can be included without -msse4.1?

@msprotz
Copy link
Contributor

msprotz commented Feb 18, 2025

This an omission on my end and on #130157 I mention that in the absence of intrinsics headers these types ought to be defined as void*.

I'm open to guarding the include for compatibility with old versions of clang. @picnixz thoughts?

@msprotz
Copy link
Contributor

msprotz commented Feb 19, 2025

@jmroot I re-read your message above. Just to recap (correct me if I missed something):

  • if the toolchain can compile a "VEC128" implementation, then blake2module.c includes the header for Blake2s_128
  • this does not mean that the code from blake2module.c will run on a machine that supports VEC128 -- just that blake2module.c might elect, if support is found at runtime, to dispatch to the VEC128 implementation
  • corollary: blake2module.c cannot be compiled with e.g. -mavx2 as it would insert avx2 instructions in code that is not guaranteed to execute on a machine that supports AVX2
  • the vec128 implementations have function signatures that talk about the __m128i type
  • on old toolchains, the headers that contain __m128i and corresponding intrinsics give a hard error if cc is not running with the right -m flags
  • on old toolchains, the headers might not even want to define the right types, which causes an issue with typedef __m128i LibIntVector_vec128 for instance

Fixes include:

  1. your suggestion: disabling SIMD if these headers cannot be included without passing the right -m flags -- with a test at configure-time; @jmroot can you tell which version of clang started behaving "the right way"? if it's only super old clang versions, then I guess that's ok?
  2. big hack: because we know that blake2module.c only ever manipulates __m128i * (and not __m128), tweak libintvector.h to use a macro instead of a typedef, and locally define __m128 as void so that all usages of the vector behind a pointer translate to void * in the context of blake2module.c -- this seems like a terrible idea that might break in undebuggable ways
  3. big expensive refactoring: make sure all the modules in blake2 do not #include <libintvector.h>in their public headers, relying on forward struct definitions to hide the usage of vector types from the perspective of client files

Option 3. will take considerable time and effort, and option 2 is a big hack. I lean towards your proposed solution (option 1).

@jmroot do you know which version of clang it is that those headers started behaving "the right way"?

@jmroot
Copy link
Contributor Author

jmroot commented Feb 19, 2025

@jmroot I re-read your message above. Just to recap (correct me if I missed something):

  • if the toolchain can compile a "VEC128" implementation, then blake2module.c includes the header for Blake2s_128
  • this does not mean that the code from blake2module.c will run on a machine that supports VEC128 -- just that blake2module.c might elect, if support is found at runtime, to dispatch to the VEC128 implementation
  • corollary: blake2module.c cannot be compiled with e.g. -mavx2 as it would insert avx2 instructions in code that is not guaranteed to execute on a machine that supports AVX2
  • the vec128 implementations have function signatures that talk about the __m128i type
  • on old toolchains, the headers that contain __m128i and corresponding intrinsics give a hard error if cc is not running with the right -m flags

I think that's all correct.

  • on old toolchains, the headers might not even want to define the right types, which causes an issue with typedef __m128i LibIntVector_vec128 for instance

This seems to be specific to immintrin.h which is only used in the VEC256 case. The ones used for VEC128 should either error or provide the needed typedefs.

@jmroot do you know which version of clang it is that those headers started behaving "the right way"?

It looks like on the Apple side it happened in either Xcode 8 or some late 7.x version.

@jmroot
Copy link
Contributor Author

jmroot commented Feb 20, 2025

@msprotz PR opened for option 1, please take a look.

@chris-eibl
Copy link
Contributor

The not-so-old clang-cl 18.1.8 currently shipped with Visual Studio 2022 fails with:

  In file included from ..\Modules\blake2module.c:139:
  In file included from ..\Modules\_hacl/Hacl_Hash_Blake2b_Simd256.h:42:
..\Modules\_hacl\libintvector.h(230,9): error : unknown type name '__m256i' [E:\cpython_clang_pgo2\PCbuild\pythoncore.vcxproj]

I think this is related?

clang-cl 19.1.1 shipped with VS 2022 Preview 5 compiles fine.

If option 1 is the way to go, then IIUC we'll have to do something similar in pythoncore.vcxproj, i.e.

  • do not define HACL_CAN_COMPILE_SIMD128 and HACL_CAN_COMPILE_SIMD256 for blake2module.c based on LLVMToolsVersion
  • then we'd even could skip compiling Hacl_Hash_Blake2s_Simd128.c and Hacl_Hash_Blake2s_Simd256.c - they'd just be dead code?

@msprotz
Copy link
Contributor

msprotz commented Feb 20, 2025

It seems like the same problem indeed. Based on my understanding, yes to both points.

Pinging @gpshead who might be interested in this discussion

@msprotz
Copy link
Contributor

msprotz commented Feb 20, 2025

FYI, @jmroot, I separately have been exploring option 3 (use forward struct decls everywhere), which would solve the issue with blake2, but would immediately reappear for the upcoming hmac pr #130157 and would require further tricks. It seems like it's not really worth it.

@jmroot
Copy link
Contributor Author

jmroot commented Feb 20, 2025

Thanks for looking into it. I figured option 3 would likely be a longer term effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build The build process and cross-build type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants