Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Quaddtype #98

Merged
merged 44 commits into from
Aug 16, 2024
Merged

Adding Quaddtype #98

merged 44 commits into from
Aug 16, 2024

Conversation

SwayamInSync
Copy link
Contributor

This PR adds the initial support of SLEEF backed QuadDtype

@mattip
Copy link
Member

mattip commented Aug 8, 2024

Heh. Over at openblas-lib we were just discussing what would happen if someone started using the libquadmath functions linked from gfortran in MacPython/openblas-libs#85. Will this eventually percolate into linalg calls? It would require a new build of OpenBLAS, since by default support is disabled with an admonition that the feature is not well implemented.

How will this interact with numpy ufuncs, are you planning to add required loops?

@shibatch
Copy link

shibatch commented Aug 8, 2024

tlfloat is a library positioned completely differently from sleef.
In a word, it is similar to mpfr, but tlfloat works everywhere.

@ngoldbaum
Copy link
Member

@mattip if we upstream this to NumPy (still lots of work before we can do that) we'd probably need to use SLEEF exclusively, which has the boost (BSD-like) license.

The plan is to add ufuncs and casts in followups.

If you'd like we should chat about this project sometime so you can meet Swayam.

@ngoldbaum
Copy link
Member

@shibatch thanks for your interest here! We'd love your opinion bout this work.

The goal here is to maybe eventually replace numpy's longdouble dtype with something that doesn't consistently produce annoying platform-specific bugs that maintainers who do not care about longdouble need to fix. Also providing true 128 bit floats. NumPy has long had a float128 DType that is not really a float128.

tlfloat seems like a good option for the MPFR DType @seberg wrote, if anyone wants to push that farther forward.

@shibatch
Copy link

shibatch commented Aug 8, 2024

I still don't understand, but are you guys interested in correctly-rounded results?
Or are you more interested in speeding up computation with SIMD instructions?
How much are you interested in computing in octuple-precision?

One of the reasons I started developing tlfloat was the goal of creating a library that can produce proper correctly-rounded results.
The four basic arithmetic operations in quad-precision with sleef are actually slow, and you need to use AVX-512 to get a clear speed advantage.

@carlkl
Copy link
Member

carlkl commented Aug 8, 2024

I guess there is interest to have something similar to Intels 80-bit longdouble support, that is higher precision compared to double but without to much perfomance impact compared to double. This is the reason for the existence of libraries like D.Bailey's double-double library or the now available libxprec library.

There is also interest in full quad-precision. libquadmath supports this, but with a non-compatible license (LGPL) for scipy.

It would be good to know wether tfloat's quads is more performant compared to sleef's quad.

Personally I have no opinion about octupole precision.

@ngoldbaum
Copy link
Member

It's not correctly rounded results that we care about so much as a migration path for users of longdouble. Also a way to get higher precision floats from a cross-platform library.

Agreed about the speed of SLEEF. Is there something faster we can use? That said, speed is less of a concern IMO than providing the functionality at all. Right now in NumPy if you're not on Linux you're out of luck if you need anything higher precision than 64 bit floats.

We also talked about writing the code in such a way that you could select the float backend at runtime with a parameter for the dtype constructor, since that would help with benchmarking and experimentation.

@ngoldbaum
Copy link
Member

Oh, I misinterpreted your earlier message, tlfloat does implement quads.

@shibatch
Copy link

shibatch commented Aug 8, 2024

@ngoldbaum Thank you for your interest in sleef and tlfloat.

I have not yet benchmarked the computation speed of tlfloat, but I believe it is similar to libquadmath in terms of the basic four arithmetic operations, since the internal processing is equivalent to libquadmath.

For elementary functions, sleef should be significantly faster. Sleef internally performs operations in triple-double format, and the reason it is slower for the basic four arithmetic operations is due to the slow conversion between triple-double and IEEE format.
For elementary functions, it performs many calculations with triple-doubles, so it is easier to gain a speed advantage.

@ngoldbaum ngoldbaum mentioned this pull request Aug 8, 2024
Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a bunch of comments inline but it's awesome to see how much progress you've already made!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update the contents here rather than just delete the README. No need to go into detail, just a sentence or two explaining what it is. It might also help to explain you need to install SLEEF and how to build against SLEEF (e.g. with environment variables or by installing it system-wide).

quaddtype/quaddtype/src/casts.cpp Outdated Show resolved Hide resolved
inline Sleef_quad
to_quad<long double>(long double x)
{
return Sleef_cast_from_doubleq1(x);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth looking in the SLEEF docs or implementation at whether x needs to be memory-aligned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure here, but as handling earlier we can use memcpy to copy the long double value to an aligned buffer before passing it to the Sleef_cast_from_doubleq1
that way it can handle the non-alignment, something like this

alignas(long double) long double aligned_x;
memcpy(&aligned_x, &x, sizeof(long double));
return Sleef_cast_from_doubleq1(aligned_x);

quaddtype/quaddtype/src/casts.cpp Outdated Show resolved Hide resolved

loop_descrs[0] = PyArray_GetDefaultDescr(dtypes[0]);
*view_offset = 0;
return NPY_SAFE_CASTING;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it always safe? I don't actually know if, for example, all 64 bit ints are safely convertible to float128 without rounding.

Copy link
Contributor Author

@SwayamInSync SwayamInSync Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that make sense looking on the sizes as,
A 64-bit integer has a maximum value of $2^{64} - 1$ for unsigned integers and $-2^{63}$ to $2^{63} - 1$ for signed integers. The float128 format has a 113-bit significand (mantissa) so technically it should represent without rounding
Also like currently dealing with dtypes that all are smaller than float128 but the actual individual implementation might be different and may require rounding

npy_intp *view_offset)
{
if (given_descrs[1] == NULL) {
loop_descrs[1] = (PyArray_Descr *)new_quaddtype_instance();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your DType isn't parametric so it's perfectly valid to create a singleton instance and then return the singleton instead of creating a new instance whenever you need one.

quaddtype/quaddtype/src/umath.cpp Show resolved Hide resolved

// comparison functions

template <cmp_def comp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to above, unnecessary template

return 0;
}

template <cmp_def comp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too


assert np.abs(np.float64(quad_result) - float_result) < 1e-10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is testing much, let's chat more about how to test these ufuncs at our next meeting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't get a chance to talk about this last night. The idea I had was to use sympy and/or mpfloat to calculate these results with arbitrary precision, then make sure the result you get is within the expected error bounds (1-4 ULP depending on the operation IIRC in the SLEEF docs).

@ngoldbaum
Copy link
Member

OK, tests are passing and I think all my comments have been addressed, let's bring this in. Let's try to keep followups more bite-sized to make reviewing these easier.

@ngoldbaum ngoldbaum merged commit cf6cf91 into numpy:main Aug 16, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

5 participants