Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify correct rounding for sqrt #826

Open
hpkfft opened this issue Jul 28, 2024 · 8 comments · May be fixed by #882
Open

Specify correct rounding for sqrt #826

hpkfft opened this issue Jul 28, 2024 · 8 comments · May be fixed by #882
Labels
Maintenance Bug fix, typo fix, or general maintenance.
Milestone

Comments

@hpkfft
Copy link
Contributor

hpkfft commented Jul 28, 2024

In accuracy.rst, correct rounding is required for add, sub, mul, and divide. I propose adding sqrt to this list.
Note that sqrt is not listed later among the mathematical functions whose accuracy is not precisely defined.

Note that the IEEE Std 754-2019 for Floating-Point Arithmetic mandates correct rounding for squareRoot.

@asmeurer
Copy link
Member

Is this something that's actually the case for existing libraries/devices? If so, there shouldn't be an issue with adding it.

By the way, is your proposal for this to apply to both real and complex inputs or just to real inputs?

@hpkfft
Copy link
Contributor Author

hpkfft commented Jul 29, 2024

Excellent point! My proposal is for this to apply just to real inputs. Given industry adoption of IEEE Std 754-2019, correct rounding is actually the case for existing libraries/devices.
Sometimes there is an option for less accuracy (and presumably higher performance). For example, CUDA in single precision requires compiling with -prec-sqrt=true, and division requires -prec-div=true, for correct rounding. Since this document requires correctly rounded division, it equally well ought to require correctly rounded square root. [Double precision CUDA divide and square root are always correctly rounded. See the "Mathematical Functions" section of the "CUDA C++ Programming Guide".]

@hpkfft
Copy link
Contributor Author

hpkfft commented Jul 30, 2024

I believe the existing accuracy requirements for element-wise arithmetic operations are only intended to apply to real inputs.
For example, the usual 6 flop implementation of complex multiplication

(a + bi)(c + di) = (ac - bd) + (ad + bc)i

does not result in correctly rounded real and imaginary components.
I do not believe this array standard means to prohibit this implementation, but this should be clarified.

@kgryte kgryte added the Maintenance Bug fix, typo fix, or general maintenance. label Jul 30, 2024
@kgryte kgryte added this to the v2024 milestone Jul 30, 2024
@kgryte kgryte added the Needs Discussion Needs further discussion. label Jul 30, 2024
@kgryte
Copy link
Contributor

kgryte commented Sep 19, 2024

I do not believe this array standard means to prohibit this implementation

Yes, I think this is correct. As the specification advocates for IEEE 754 compliance (with some caveats; e.g., subnormals) and IEEE 754 only applies to reals, we don't have an equivalent mandate for complex number operations. Accordingly, we should make this distinction explicit.

Re: sqrt. Yes, I think this is a reasonable recommendation. The main point for the arithmetic operations being correctly rounded is to limit error accumulation. As sqrt is a fundamental operation--and more fundamental than the various transcendentals--I agree that requiring correctly rounded behavior is reasonable and should be added.

@hpkfft
Copy link
Contributor Author

hpkfft commented Oct 2, 2024

For those interested in the accuracy of complex operators/functions, my colleagues and I have done a comparison with different
compilers/libraries: complex.pdf

@kgryte
Copy link
Contributor

kgryte commented Jan 9, 2025

PR: #882

@kgryte kgryte removed the Needs Discussion Needs further discussion. label Jan 9, 2025
@hpkfft
Copy link
Contributor Author

hpkfft commented Jan 10, 2025

By default, the CUDA compiler sets -prec-div=true, -prec-sqrt=true, and -ftz=false.
https://docs.nvidia.com/cuda/floating-point/index.html#compiler-flags

The CuPy library is compiled with -ftz=true (overriding the default for this particular flag).
Thanks, @leofang, for this info.

@hpkfft
Copy link
Contributor Author

hpkfft commented Jan 10, 2025

Note that for IEEE square root a subnormal result can never be produced. So, a hardware mode that flushes subnormal results to zero is irrelevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Maintenance Bug fix, typo fix, or general maintenance.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants