WIP: DO NOT MERGE: ARMv8 NaN Correctness Proposal #1273
NaN handling in SIMDe is very far from what hardware does when it comes to handling input NaNs and propagating them through operations. However, this is not a primary concern for many users of the library. Obviously, at least one user particularly cares...
My proposed solution, in this PR (general working sample code to discuss, but certainly not a final proposal!), is to add optional NaN checking after results have been generated. This permits the use of whatever vector acceleration may be applicable, instead of requiring the slow fallback path to be used.
The concept is simple enough: After results are generated in the non-native path, check for input operand NaN conditions. This is done with x86 SSE intrinsics, as I'm working on x86, and these should work properly on other platforms as well through emulation. If any NaN values are found, then the "ARM NaN Propagation Algorithm" is applied to the elements to create the correct output NaN in slots as required.
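As a rough illustration of the per-element fix-up described above, here is a scalar C sketch. It is not the code from this PR (which uses x86 SSE intrinsics), and the helper names are hypothetical. It assumes a two-operand float32 operation with default-NaN mode (FPCR.DN) disabled, and it follows the priority order that I understand the Arm `FPProcessNaNs` pseudocode to use: a signaling NaN in the first operand, then a signaling NaN in the second, then a quiet NaN in the first, then a quiet NaN in the second. Signaling NaNs are quieted by setting the top fraction bit. Lanes are handled as raw bit patterns so that NaN payloads are never altered by float loads or stores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names for illustration; none of these exist in SIMDe. */

#define F32_QUIET_BIT 0x00400000u /* top fraction bit of an IEEE 754 binary32 */

/* NaN: exponent all ones and a non-zero fraction (sign bit ignored). */
static int f32_bits_is_nan(uint32_t u) {
  return (u & 0x7FFFFFFFu) > 0x7F800000u;
}

/* Signaling NaN: a NaN whose quiet bit is clear. */
static int f32_bits_is_snan(uint32_t u) {
  return f32_bits_is_nan(u) && (u & F32_QUIET_BIT) == 0;
}

/* Fix up one lane after the operation has already produced `result`.
 * Assumes FPCR.DN == 0 and a plain two-operand operation. */
static uint32_t arm_nan_fixup_f32_bits(uint32_t a, uint32_t b, uint32_t result) {
  if (f32_bits_is_snan(a)) return a | F32_QUIET_BIT; /* quieted sNaN from a */
  if (f32_bits_is_snan(b)) return b | F32_QUIET_BIT; /* quieted sNaN from b */
  if (f32_bits_is_nan(a)) return a;                  /* qNaN from a, unchanged */
  if (f32_bits_is_nan(b)) return b;                  /* qNaN from b, unchanged */
  return result;                                     /* no input NaN: keep result */
}

/* Apply the fix-up to a four-lane vector stored as bit patterns. */
static void arm_nan_fixup_f32x4(const uint32_t a[4], const uint32_t b[4],
                                uint32_t r[4]) {
  for (size_t i = 0; i < 4; i++) {
    r[i] = arm_nan_fixup_f32_bits(a[i], b[i], r[i]);
  }
}
```

Because the fix-up only reads the original operands and overwrites affected lanes, the operation itself can still run on whatever accelerated path is available. A vectorized version would compute the same selection with compare masks and blends rather than branches.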
Note that this does not solve the problem of hardware generating an inconsistent NaN on an invalid operation; it only corrects how input NaNs are propagated through functions. I don't have tests written yet in the SIMDe test library, but in my own "throw large amounts of random data at the intrinsics" tester, it matches hardware exactly for silencing and propagating NaNs.
Feedback desired. Several points I am unclear on: