Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Re-enable acceleration of Vector512<long>.op_Multiply #111832

Merged
merged 1 commit into from
Jan 26, 2025

Conversation

saucecontrol
Copy link
Member

This was a regression in 9.0, from #103555

https://godbolt.org/z/11hs3Kqdd

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 25, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 25, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -3335,7 +3335,7 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
{
// Emulate NI_AVX512DQ_VL_MultiplyLow with SSE41 for SIMD16
}
else
else if (simdSize != 64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I think this needs Avx512DQ ISA check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're treating any of the Vector512 methods other than IsSupported as intrinsic, that implies we have the full baseline AVX-512 set (F,DQ,BW,CD,VL). It's a bit confusing because some of the import paths assert or check that, but most don't. I'm actually cleaning up some of those redundant asserts in a different branch now.

Copy link
Member

@EgorBo EgorBo Jan 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I thought that Vector512.IsHardwareAccelerated only relies on AVX512F, but looks like DOTNET_EnableAVX512DQ=0 turns it off so it's ok.

Copy link
Member Author

@saucecontrol saucecontrol Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, IsHardwareAccelerated 😄

The rule is Vector512.IsHardwareAccelerated will return false unless all of the following are satisfied:

  1. Baseline AVX-512 support (F,CD,DW,BW,VL)
  2. Not throttling based on CPUID check for Skylake-X and others with severe downclocking for 512-bit vector instructions, or DOTNET_PreferredVectorBitWidth: >= 512
  3. No DOTNET_PreferredVectorBitWidth: < 512

The rule for whether Vector512 methods actually import as intrinsic is only that we have the baseline AVX-512 set, meaning IsHardwareAccelerated may return false, but all methods may actually be accelerated anyway.

So the fact that we're importing the methods for Vector512 as intrinsic in the first place means the ISA requirements have already been met.

Vector128 and Vector256 are a bit different, because the baseline ISA requirement may not be enough to accelerate all methods.

Vector256.IsHardwareAccelerated returns true only if AVX2 is supported, but we attempt to import methods as intrinsic as long as AVX is supported. Since many of the methods require AVX2 for acceleration, they have an extra check for AVX2 and then fall back to managed if it's not available. Hence all the (simdSize != 32) || compOpportunisticallyDependsOn(InstructionSet_AVX2) checks.

Similar checks are not included for Vector128, because the base requirement is SSE2, so almost all methods can be accelerated, minus a few that require SSE4.1 and check for it explicitly.

Clear as mud, I know...

@EgorBo EgorBo merged commit d5c8265 into dotnet:main Jan 26, 2025
117 of 119 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants