-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Re-enable acceleration of Vector512<long>.op_Multiply #111832
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@@ -3335,7 +3335,7 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic, | |||
{ | |||
// Emulate NI_AVX512DQ_VL_MultiplyLow with SSE41 for SIMD16 | |||
} | |||
else | |||
else if (simdSize != 64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, I think this needs Avx512DQ
ISA check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're treating any of the Vector512
methods other than IsSupported
as intrinsic, that implies we have the full baseline AVX-512 set (F,DQ,BW,CD,VL). It's a bit confusing because some of the import paths assert or check that, but most don't. I'm actually cleaning up some of those redundant asserts in a different branch now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, I thought that Vector512.IsHardwareAccelerated
only relies on AVX512F
, but looks like DOTNET_EnableAVX512DQ=0
turns it off so it's ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, IsHardwareAccelerated
😄
The rule is Vector512.IsHardwareAccelerated
will return false unless all of the following are satisfied:
- Baseline AVX-512 support (F,CD,DW,BW,VL)
- Not throttling based on CPUID check for Skylake-X and others with severe downclocking for 512-bit vector instructions, or
DOTNET_PreferredVectorBitWidth: >= 512
- No
DOTNET_PreferredVectorBitWidth: < 512
The rule for whether Vector512
methods actually import as intrinsic is only that we have the baseline AVX-512 set, meaning IsHardwareAccelerated
may return false, but all methods may actually be accelerated anyway.
So the fact that we're importing the methods for Vector512
as intrinsic in the first place means the ISA requirements have already been met.
Vector128
and Vector256
are a bit different, because the baseline ISA requirement may not be enough to accelerate all methods.
Vector256.IsHardwareAccelerated
returns true only if AVX2 is supported, but we attempt to import methods as intrinsic as long as AVX is supported. Since many of the methods require AVX2 for acceleration, they have an extra check for AVX2 and then fall back to managed if it's not available. Hence all the (simdSize != 32) || compOpportunisticallyDependsOn(InstructionSet_AVX2)
checks.
Similar checks are not included for Vector128
, because the base requirement is SSE2, so almost all methods can be accelerated, minus a few that require SSE4.1 and check for it explicitly.
Clear as mud, I know...
This was a regression in 9.0, from #103555
https://godbolt.org/z/11hs3Kqdd