SIMD horizontal adds #65

Open
ppenzin opened this issue Jun 4, 2024 · 3 comments


@ppenzin
Collaborator

ppenzin commented Jun 4, 2024

Originally thought to be a post-MVP feature: WebAssembly/simd#20

There is a PR to LLVM that introduces shuffle patterns which, combined with other instructions, would translate to horizontal additions. That is motivation to restart the conversation on horizontal SIMD ops, or at least to provide a way to disseminate this among the runtimes.
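
For readers new to the idea, here is a rough scalar model of how shuffles plus lane-wise adds can express a horizontal add. It is only a sketch: Python lists stand in for 4-lane vectors, `shuffle`, `add` and `horizontal_add` are hypothetical helper names, the shuffle indices are per lane rather than per byte, and the exact patterns in the LLVM PR may differ.

```python
def shuffle(a, b, indices):
    """Pick lanes from the concatenation of a and b (lane-granular model of a two-input shuffle)."""
    both = a + b
    return [both[i] for i in indices]


def add(a, b):
    """Lane-wise addition of two equally sized vectors."""
    return [x + y for x, y in zip(a, b)]


def horizontal_add(v):
    """Sum four lanes with two shuffle+add steps; the total ends up in every lane."""
    # Step 1: swap the two halves and add, giving [a+c, b+d, c+a, d+b].
    step1 = add(v, shuffle(v, v, [2, 3, 0, 1]))
    # Step 2: swap neighbouring lanes and add, so each lane holds a+b+c+d.
    step2 = add(step1, shuffle(step1, step1, [1, 0, 3, 2]))
    return step2[0]
```

A runtime that wants to emit a native horizontal instruction has to recognise a sequence like this one, which is why the choice of shuffle pattern matters.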

@sparker-arm as the author of the patch, sorry to put you on the spot.

@sparker-arm

From my brief look at this spec, it looks like we'd use dup_odd to pattern match a pairwise operation. So, IIUC, pattern matching with flexible vectors should be a bit easier than the current SIMD shuffles.

But we would still have the trouble of choosing a canonical form that maps well to hardware, and of getting all the runtimes to perform the matching. For instance, the current shuffle approach in LLVM would map to concat_lower_upper and that, again, isn't useful for the horizontal FP instructions that I'm aware of.

So, I would definitely be in favour of having dedicated wasm instruction(s), for both fixed and flexible!

Arm hardware-wise, Neon includes faddp for floats, which is chained for a full reduction, and addv is used for integer reduction. SVE includes faddv, which performs a recursive pairwise reduction on floats, but I'm not sure what we use for integers.
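
To illustrate the chaining, here is a minimal scalar sketch of a pairwise add applied repeatedly until one lane holds the full sum. The function names are hypothetical and Python lists stand in for vector registers; this only models the arithmetic, not the actual instruction encoding or register layout.

```python
def faddp(n, m):
    """Model of a pairwise add: sum adjacent pairs of n, then adjacent pairs of m."""
    both = n + m
    return [both[i] + both[i + 1] for i in range(0, len(both), 2)]


def reduce_pairwise(v):
    """Chain pairwise adds until lane 0 holds the sum (lane count must be a power of two)."""
    steps = len(v).bit_length() - 1  # log2 of the lane count
    for _ in range(steps):
        # Feeding the same vector as both operands halves the number of distinct partial sums.
        v = faddp(v, v)
    return v[0]
```

For four lanes this takes two pairwise adds: `[a, b, c, d]` becomes `[a+b, c+d, a+b, c+d]`, and then every lane holds `a+b+c+d`.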

@akirilov-arm

We have the SADDV and UADDV instructions to deal with integers (signed and unsigned respectively) in SVE. BTW another option for floating-point values is FADDA, which is strictly ordered, but realistically has a performance cost.
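
The ordering matters because floating-point addition is not associative. A small sketch of the difference, assuming Python doubles as stand-ins for lanes and hypothetical function names; the tree order shown is one possible grouping, and real hardware may group lanes differently:

```python
def ordered_sum(lanes, acc=0.0):
    """Strictly ordered reduction: add lanes one by one, left to right."""
    for x in lanes:
        acc += x
    return acc


def tree_sum(lanes):
    """Tree reduction: add adjacent pairs repeatedly (lane count must be a power of two)."""
    while len(lanes) > 1:
        lanes = [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]
    return lanes[0]


# Large values that cancel, mixed with small ones that get absorbed by rounding.
lanes = [1e16, 1.0, -1e16, 1.0]
print(ordered_sum(lanes))  # 1.0
print(tree_sum(lanes))     # 0.0
```

Both answers are valid roundings of some evaluation order, so a wasm instruction would have to either fix the order (and pay for it on some hardware) or leave the result order-dependent.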

There are also pairwise operations, i.e. ADDP and FADDP.

@ppenzin
Collaborator Author

ppenzin commented Jun 6, 2024

I vaguely remember horizontal ops were not great on x86. @rrwinterton, do you have any thoughts?
