[Downstream Change] sincos() vectorization #87

Open
3 tasks
MacDue opened this issue Feb 11, 2025 · 3 comments

Comments

@MacDue
Contributor

MacDue commented Feb 11, 2025

This issue tracks the downstream changes needed to enable sincos() vectorization in the Arm toolchain. These changes apply only to the release 20.x branch (https://github.com/arm/arm-toolchain/tree/release/arm-software/20.x), as we intend to finish upstreaming them for LLVM 21.

The patches could not be upstreamed in time for LLVM 20 due to long review times.

@kiranchandramohan
Contributor

Can you also provide links to the llvm-project main branch patches? Have these landed or are they in review?

@MacDue
Contributor Author

MacDue commented Feb 12, 2025

> Can you also provide links to the llvm-project main branch patches? Have these landed or are they in review?

Added the upstream PR links (above). The upstream patches have not landed.

dcandler pushed a commit that referenced this issue Feb 24, 2025
…izing literal struct return values (#82)

This patch adds initial support for vectorizing literal struct return
values. Currently, this is limited to the case where the struct is
homogeneous (all elements have the same type) and not packed. The users
of the call also must all be `extractvalue` instructions.

The intended use case for this is vectorizing intrinsics such as:

```
declare { float, float } @llvm.sincos.f32(float %x)
```

Mapping them to structure-returning library calls such as:

```
declare { <4 x float>, <4 x float> } @Sleef_sincosf4_u10advsimd(<4 x float>)
```

Or their widened form (such as `@llvm.sincos.v4f32` in this case).

Implementing this required two main changes:

1. Supporting widening `extractvalue`
2. Adding support for vectorized struct types in LV
  * This is mostly limited to parts of the cost model and scalarization

Since the supported use case is narrow, the required changes are
relatively small.
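
As a rough illustration of the mapping described above, the following C++ sketch models the scalar `{ float, float }` result and its widened counterpart for VF = 4. The names `SinCos`, `SinCosVec4`, `sincosScalar`, and `sincosWidened` are hypothetical and are not part of LLVM or SLEEF; the lane loop only stands in for what a real vector library call would compute.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Scalar form: same shape as `{ float, float } @llvm.sincos.f32(float)`.
// A homogeneous, non-packed struct (both fields are float).
struct SinCos {
  float sinVal;
  float cosVal;
};

inline SinCos sincosScalar(float x) { return {std::sin(x), std::cos(x)}; }

// Widened form for VF = 4: every struct field becomes a 4-lane vector,
// i.e. `{ <4 x float>, <4 x float> }`.
constexpr std::size_t VF = 4;
using Vec4 = std::array<float, VF>;

struct SinCosVec4 {
  Vec4 sinVal;
  Vec4 cosVal;
};

// Stand-in for a vector library call: computes each lane with the scalar
// function. Reading `.sinVal` / `.cosVal` from the result plays the role of
// a widened `extractvalue`.
inline SinCosVec4 sincosWidened(const Vec4 &x) {
  SinCosVec4 result{};
  for (std::size_t lane = 0; lane < VF; ++lane) {
    SinCos s = sincosScalar(x[lane]);
    result.sinVal[lane] = s.sinVal;
    result.cosVal[lane] = s.cosVal;
  }
  return result;
}
```

Because each field is only ever read back out as a whole, the widened struct never needs to be treated as an aggregate vector itself, which is consistent with the restriction that all users must be `extractvalue` instructions.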

---

Downstream issue: #87
dcandler pushed a commit that referenced this issue Feb 24, 2025
…l] calls to llvm.sincos.* when -fno-math-errno is set (#83)

This will allow vectorizing these calls (after a few more patches). This
should not change the codegen for targets that enable the use of AA
during codegen (via `TargetSubtargetInfo::useAA()`), such as AArch64.
x86 is notably not one of these targets, but this can be worked around
by passing `-mllvm -combiner-global-alias-analysis=true` to clang.
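
For context, a loop of the kind this change is aimed at might look like the sketch below. It assumes a glibc-style `sincosf` (a GNU extension) is available, and the function name `sincosLoop` is made up for illustration. Whether a given compiler version actually emits `llvm.sincos.*` for it depends on the patches discussed here and on building with `-fno-math-errno`.

```cpp
#include <math.h>  // sincosf: GNU extension, assumed to be declared here
#include <cstddef>

// Computes sin and cos of each input element through output pointers.
// Each iteration is independent, which makes the loop a vectorization
// candidate once the call is represented as an intrinsic.
void sincosLoop(const float *in, float *sinOut, float *cosOut,
                std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    sincosf(in[i], &sinOut[i], &cosOut[i]);
}
```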

---

Downstream issue: #87
@MacDue
Contributor Author

MacDue commented Feb 27, 2025

Update: All patches have landed upstream, so they should be available in the 21.x release without any downstream changes.

kiranchandramohan pushed a commit that referenced this issue Feb 28, 2025
…nd vectorize llvm.sincos intrinsics (#84)

This teaches the loop vectorizer that `llvm.sincos` is trivially
vectorizable. Additionally, this patch updates the cost model to
correctly cost intrinsics that return multiple values. Previously, the
cost model assumed that only intrinsics returning a `VectorType` need
scalarizing, so intrinsics that return multiple vectors (and also need
scalarizing) were costed far too cheaply (as a single function call).

The `llvm.sincos` intrinsic also has a custom cost when a vector
function library is available, as certain VFs can be expanded (later in
code-gen) to a vector function call, reducing the cost to a single call
(plus any loads needed when the vector function returns its values via
output pointers).
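
The difference between the two costs can be sketched with a toy model. This is not LLVM's cost model: the `UnitCosts` struct, both functions, and the unit values used below are hypothetical, chosen only to show why scalarizing a multi-result intrinsic costs far more than a single call.

```cpp
// Illustrative per-operation costs (arbitrary units, not LLVM's numbers).
struct UnitCosts {
  unsigned scalarCall;  // one scalar call
  unsigned extractElt;  // extract one lane of the vector argument
  unsigned insertElt;   // insert one lane into a result vector
  unsigned vectorCall;  // one vector library call
  unsigned vectorLoad;  // one vector load from an output pointer
};

// Scalarized: per lane, extract the argument, call the scalar function,
// and insert each returned value into its result vector.
unsigned scalarizedCost(unsigned vf, unsigned numResults,
                        const UnitCosts &c) {
  return vf * (c.extractElt + c.scalarCall + numResults * c.insertElt);
}

// Vector library: a single call, plus one load per result when results
// come back through output pointers.
unsigned vectorLibCost(unsigned numResults, const UnitCosts &c) {
  return c.vectorCall + numResults * c.vectorLoad;
}
```

With unit costs `{10, 1, 1, 10, 1}`, VF = 4, and two results, `scalarizedCost` gives 52 and `vectorLibCost` gives 12, whereas treating the scalarized intrinsic as a single call would have given 10.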

---

Downstream issue: #87