
WIP: DO NOT MERGE: ARMv8 NaN Correctness Proposal #1273

Open
wants to merge 1 commit into
base: master

@Syonyk Syonyk commented Jan 30, 2025

NaN handling in SIMDe is very far from hardware behavior when it comes to properly handling input NaNs and propagating them through operations. This is not a primary concern for many users of the library, but obviously at least one user particularly cares...

My proposed solution, in this PR (general working sample code to discuss, but certainly not a final proposal!), is to add optional NaN checking after results have been generated. This permits the use of whatever vector acceleration may be applicable, instead of requiring the slow fallback path to be used.

The concept is simple enough: after results are generated in the non-native path, check the input operands for NaNs. This is done with x86 SSE intrinsics, as I'm working on x86, and these should work properly on other platforms as well through emulation. If any NaN values are found, the "ARM NaN Propagation Algorithm" is applied to the affected elements to create the correct output NaN in the slots that require it.
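To make the post-pass concrete, here is a minimal scalar sketch of the idea for two-operand float32 operations. It is not the code in this PR (which uses SSE intrinsics). The helper names `arm_nan_select_f32` and `arm_nan_fixup_f32` are hypothetical, and the selection order is an assumption based on the `FPProcessNaNs` pseudocode in the Arm Architecture Reference Manual with default-NaN mode (FPCR.DN) off: a signaling NaN in the first operand wins, then a signaling NaN in the second, then a quiet NaN in the first, then a quiet NaN in the second, with signaling NaNs quieted on the way out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define F32_ABS_MASK  UINT32_C(0x7FFFFFFF) /* everything except the sign bit */
#define F32_EXP_MASK  UINT32_C(0x7F800000) /* all-ones exponent (infinity)   */
#define F32_QUIET_BIT UINT32_C(0x00400000) /* top fraction bit: set = quiet  */

/* NaN: exponent all ones and a non-zero fraction. */
static int f32_is_nan(uint32_t bits) {
  return (bits & F32_ABS_MASK) > F32_EXP_MASK;
}

/* Signaling NaN: a NaN whose quiet bit is clear. */
static int f32_is_snan(uint32_t bits) {
  return f32_is_nan(bits) && !(bits & F32_QUIET_BIT);
}

/* Hypothetical helper: pick the output for one lane.  If either input is a
 * NaN, return the NaN that (by assumption, DN = 0) ARM would propagate;
 * otherwise keep the result the fast path already computed. */
static uint32_t arm_nan_select_f32(uint32_t a, uint32_t b, uint32_t computed) {
  if (f32_is_snan(a)) return a | F32_QUIET_BIT; /* quieted sNaN from op1 */
  if (f32_is_snan(b)) return b | F32_QUIET_BIT; /* quieted sNaN from op2 */
  if (f32_is_nan(a))  return a;                 /* qNaN from op1, as-is  */
  if (f32_is_nan(b))  return b;                 /* qNaN from op2, as-is  */
  return computed;
}

/* Post-pass: r[] already holds the results of the accelerated path; only
 * lanes with a NaN input are rewritten.  memcpy is used to move bit
 * patterns so that no floating-point instruction touches the NaNs. */
static void arm_nan_fixup_f32(const float *a, const float *b, float *r,
                              size_t lanes) {
  for (size_t i = 0; i < lanes; i++) {
    uint32_t ua, ub, ur;
    memcpy(&ua, &a[i], sizeof ua);
    memcpy(&ub, &b[i], sizeof ub);
    memcpy(&ur, &r[i], sizeof ur);
    ur = arm_nan_select_f32(ua, ub, ur);
    memcpy(&r[i], &ur, sizeof ur);
  }
}
```

Operations with a different operand count or ordering (fused multiply-add, for example) would need their own selection rule, so this should be read as an illustration of the two-operand case only.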

Note that this does not solve the problem of the host hardware generating a NaN that is inconsistent with ARM when an operation itself is invalid; it only corrects the propagation of input NaNs through functions. I don't have tests written yet in the SIMDe test library, but in my own "throw large amounts of random data at the intrinsics" tester, it matches hardware exactly for silencing and propagating NaNs.

Feedback desired. Several points I am unclear on:

  • Should this be gated on SIMDE_FAST_NANS or SIMDE_NO_FAST_NANS? By default, neither one appears to be set. I am fine with this behavior being off by default.
  • Is there a better way to handle the 64-bit vector into a 128-bit vector case? I would prefer to avoid MMX registers for performance reasons, as they alias the x87 floating-point registers on x86 processors.
