Adding Quaddtype #98

SwayamInSync · 2024-08-07T23:34:41Z

This PR adds initial support for a SLEEF-backed QuadDtype.

mattip · 2024-08-08T04:23:49Z

Heh. Over at openblas-libs we were just discussing, in MacPython/openblas-libs#85, what would happen if someone started using the libquadmath functions linked from gfortran. Will this eventually percolate into linalg calls? It would require a new build of OpenBLAS, since by default support is disabled with an admonition that the feature is not well implemented.

How will this interact with NumPy ufuncs? Are you planning to add the required loops?

shibatch · 2024-08-08T09:33:41Z

tlfloat is a library positioned completely differently from sleef.
In a word, it is similar to mpfr, but tlfloat works everywhere.

ngoldbaum · 2024-08-08T12:32:52Z

@mattip if we upstream this to NumPy (still lots of work before we can do that) we'd probably need to use SLEEF exclusively, which has the Boost (BSD-like) license.

The plan is to add ufuncs and casts in followups.

If you'd like we should chat about this project sometime so you can meet Swayam.

ngoldbaum · 2024-08-08T12:44:42Z

@shibatch thanks for your interest here! We'd love your opinion about this work.

The goal here is to maybe eventually replace NumPy's longdouble dtype with something that doesn't consistently produce annoying platform-specific bugs that maintainers who do not care about longdouble need to fix. It would also provide true 128-bit floats. NumPy has long had a float128 DType that is not really a float128.

tlfloat seems like a good option for the MPFR DType @seberg wrote, if anyone wants to push that farther forward.

shibatch · 2024-08-08T12:58:58Z

I still don't understand, but are you guys interested in correctly-rounded results?
Or are you more interested in speeding up computation with SIMD instructions?
How much are you interested in computing in octuple-precision?

One of the reasons I started developing tlfloat was the goal of creating a library that can produce proper correctly-rounded results.
The four basic arithmetic operations in quad-precision with sleef are actually slow, and you need to use AVX-512 to get a clear speed advantage.

carlkl · 2024-08-08T13:13:36Z

I guess there is interest in having something similar to Intel's 80-bit longdouble support, that is, higher precision than double but without too much performance impact compared to double. This is the reason for the existence of libraries like D. Bailey's double-double library or the now available libxprec library.
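
For anyone unfamiliar with the double-double idea, here is a minimal sketch in plain Python of how a pair of doubles can carry extra precision. It uses Knuth's TwoSum error-free transformation. The names `two_sum` and `dd_add` are illustrative only and are not the API of Bailey's library or libxprec, and the addition shown is the simple variant, not a fully accurate one.

```python
def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) where s is the rounded sum a + b
    and e is the rounding error, so that a + b == s + e exactly
    (barring overflow)."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-double values, each stored as a (hi, lo) pair.

    Simple variant: add the high parts exactly, fold the low parts
    into the error term, then renormalize into a new (hi, lo) pair.
    """
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)


# A plain double loses the small term entirely ...
print(1.0 + 1e-20 == 1.0)                  # True
# ... while the double-double pair keeps it in the low word.
print(dd_add((1.0, 0.0), (1e-20, 0.0)))    # (1.0, 1e-20)
```

Each double-double operation costs a handful of ordinary double operations, which is why this format sits between double and full quad precision in terms of speed.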

There is also interest in full quad precision. libquadmath supports this, but its license (LGPL) is not compatible with SciPy.

It would be good to know whether tlfloat's quad is more performant than SLEEF's quad.

Personally I have no opinion about octuple precision.

ngoldbaum · 2024-08-08T13:22:43Z

It's not correctly rounded results that we care about so much as a migration path for users of longdouble, and also a way to get higher precision floats from a cross-platform library.

Agreed about the speed of SLEEF. Is there something faster we can use? That said, speed is less of a concern IMO than providing the functionality at all. Right now in NumPy if you're not on Linux you're out of luck if you need anything higher precision than 64 bit floats.

We also talked about writing the code in such a way that you could select the float backend at runtime with a parameter for the dtype constructor, since that would help with benchmarking and experimentation.

ngoldbaum · 2024-08-08T13:43:01Z

Oh, I misinterpreted your earlier message, tlfloat does implement quads.

shibatch · 2024-08-08T14:54:56Z

@ngoldbaum Thank you for your interest in sleef and tlfloat.

I have not yet benchmarked the computation speed of tlfloat, but I believe it is similar to libquadmath in terms of the basic four arithmetic operations, since the internal processing is equivalent to libquadmath.

For elementary functions, sleef should be significantly faster. Sleef internally performs operations in triple-double format, and it is slower for the basic four arithmetic operations because of the slow conversion between triple-double and IEEE format.
For elementary functions, it performs many calculations with triple-doubles, so it is easier to gain a speed advantage.

quaddtype/tests/test_arr.py

ngoldbaum

I left a bunch of comments inline but it's awesome to see how much progress you've already made!

ngoldbaum · 2024-08-12T19:58:32Z

quaddtype/README.md

Let's update the contents here rather than just delete the README. No need to go into detail, just a sentence or two explaining what it is. It might also help to explain that you need to install SLEEF and how to build against it (e.g. with environment variables or by installing it system-wide).