Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Do not rewrite or approximate math functions on ROCm #19970

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bjacob
Copy link
Contributor

@bjacob bjacob commented Feb 12, 2025

On ROCm, we want to use the device library for all math functions.

This expands on #19969, which only concerned math.erf.

We only leave one category of rewrites enabled: the operand casts to f32. The ROCm device library internally performs the same for many math functions, but we leave that unchanged here for incrementality. We might get to that in a follow-up PR.

@bjacob bjacob force-pushed the no-approx-at-all-on-rocm branch from 4ae6121 to 35e4441 Compare February 12, 2025 03:18
@MaheshRavishankar
Copy link
Contributor

If CI passes that's a good indication for e2e correctness for now I think

@MaheshRavishankar
Copy link
Contributor

Interesting. It has compilation failures

// CHECK: math.exp2
// CHECK: math.expm1
// CHECK: math.cbrt
// CHECK: math.erf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have we figured out the numerical issue with math.erf library function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have (99%) figured that there was no issue with it, and the issues we ran into were caused by PolynomialApproximationPass being too coarse-grained and too convoluted, so that when we thought earlier that we were enabling/disabling math.erf approximation, we were also enabling/disabling a number of other things, unintentionally. This is what #19922 solved. Now in the present PR we are finally at a place where we have some fine-grained, well-defined levers to play with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remaining issue has been diagnosed in #20074 (comment) as an issue with overly strict tests requiring agreement with the less accurate polynomial approximations in f16, as opposed to the ROCm device lib performing the approximation after upcasting to f32.

@bjacob bjacob force-pushed the no-approx-at-all-on-rocm branch from 35e4441 to 86c508b Compare February 20, 2025 15:25
@bjacob
Copy link
Contributor Author

bjacob commented Feb 21, 2025

For a moment I felt that I was very close to resolving this with llvm/llvm-project#128203. That PR does solve the CI issues observed here, except that I have to add all the remaining math ops to scalarization, which is not desirable.

If and when we revive this, we have good reasons at this point to handle the necessary vector-flattening downstream.

Signed-off-by: Benoit Jacob <[email protected]>
@bjacob bjacob force-pushed the no-approx-at-all-on-rocm branch from 86c508b to 2d14e5e Compare February 27, 2025 21:00
@bjacob
Copy link
Contributor Author

bjacob commented Feb 27, 2025

Good news, this should be unblocked by llvm/llvm-project#128915. Retrying now.

@bjacob bjacob marked this pull request as ready for review February 27, 2025 21:05
@bjacob bjacob requested a review from hanhanW as a code owner February 27, 2025 21:05
Copy link
Contributor

@MaheshRavishankar MaheshRavishankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but we maybe need some end-to-end tests to make sure it compiles as a whole. Do we have unit tests for these already in tests/e2e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants