-
Notifications
You must be signed in to change notification settings - Fork 668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Do not rewrite or approximate math functions on ROCm #19970
base: main
Are you sure you want to change the base?
Conversation
4ae6121
to
35e4441
Compare
If CI passes that's a good indication for e2e correctness for now I think |
Interesting. It has compilation failures |
// CHECK: math.exp2 | ||
// CHECK: math.expm1 | ||
// CHECK: math.cbrt | ||
// CHECK: math.erf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have we figured out the numerical issue with math.erf
library function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have (99%) figured that there was no issue with it, and the issues we ran into were caused by PolynomialApproximationPass being too coarse-grained and too convoluted, so that when we thought earlier that we were enabling/disabling math.erf approximation, we were also enabling/disabling a number of other things, unintentionally. This is what #19922 solved. Now in the present PR we are finally at a place where we have some fine-grained, well-defined levers to play with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remaining issue has been diagnosed in #20074 (comment) as an issue with overly strict tests requiring agreement with the less accurate polynomial approximations in f16, as opposed to the ROCm device lib performing the approximation after upcasting to f32.
35e4441
to
86c508b
Compare
For a moment I felt that I was very close to resolving this with llvm/llvm-project#128203. That PR does solve the CI issues observed here, except that I have to add all the remaining math ops to scalarization, which is not desirable. If and when we revive this, we have good reasons at this point to handle the necessary vector-flattening downstream. |
Signed-off-by: Benoit Jacob <[email protected]>
86c508b
to
2d14e5e
Compare
Good news, this should be unblocked by llvm/llvm-project#128915. Retrying now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but we maybe need some end-to-end tests to make sure it compiles as a whole. Do we have unit tests for these already in tests/e2e
On ROCm, we want to use the device library for all math functions.
This expands on #19969, which only concerned
math.erf
.We only leave one category of rewrites enabled: the operand casts to f32. The ROCm device library internally performs the same for many math functions, but we leave that unchanged here for incrementality. We might get to that in a follow-up PR.