Integrate cpp_double_fp_backend #648
base: develop
check if integer width is adequate in split()
Fixes #92 with good final report
minor change in header order
fix silly mistakes
Gsoc2021 double float chris
…into gsoc2021_double_float_chris
# Conflicts:
#   .github/workflows/multiprecision_quad_double_only.yml
#   .gitignore
#   performance/performance_test.cpp
#   test/test_arithmetic.hpp
Gsoc2021 double float chris
Gsoc2021 double float chris
Wow, with last single small change I reduced the |
Hi Chris (@ckormanyos), what happens if you replace |
It ruins the performance completely and entirely. What a great question, Janek. It took so long that I am still waiting for the timing result. I had real/imag components separated. See the pic below. In summary, the pow function killed performance on that particular benchmark. I went from 17 seconds to 170 seconds, a factor of 10.
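To illustrate the kind of difference being discussed, here is a minimal sketch of a Mandelbrot inner loop with separate real and imaginary components, written once with plain multiplications and once with the general-purpose `pow`. It is not the benchmark code from this thread; the function names `mandelbrot_iterations_mul` and `mandelbrot_iterations_pow` are hypothetical, and the exact expression that was replaced in the real benchmark is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the PR): count Mandelbrot iterations for
// c = (cr, ci), squaring the components with plain multiplications.
template <typename T>
int mandelbrot_iterations_mul(T cr, T ci, int max_iter)
{
    assert(max_iter >= 0);
    T zr = 0, zi = 0;
    int n = 0;
    while (n < max_iter) {
        const T zr2 = zr * zr;
        const T zi2 = zi * zi;
        if (zr2 + zi2 > T(4)) break;   // escaped: |z| > 2
        zi = T(2) * zr * zi + ci;      // imaginary part of z^2 + c
        zr = zr2 - zi2 + cr;           // real part of z^2 + c
        ++n;
    }
    return n;
}

// Same loop, but every square goes through the general-purpose pow(),
// which is the variant reported above as roughly ten times slower.
template <typename T>
int mandelbrot_iterations_pow(T cr, T ci, int max_iter)
{
    using std::pow;                    // lets ADL pick up pow() for other number types
    assert(max_iter >= 0);
    T zr = 0, zi = 0;
    int n = 0;
    while (n < max_iter) {
        const T zr2 = pow(zr, T(2));
        const T zi2 = pow(zi, T(2));
        if (zr2 + zi2 > T(4)) break;
        zi = T(2) * zr * zi + ci;
        zr = zr2 - zi2 + cr;
        ++n;
    }
    return n;
}
```

Both functions are templates, so the same loop could in principle be instantiated with `double` or with a multiprecision number type, which is how a like-for-like timing comparison would be set up.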
Whew. That's good news for me actually, because yade uses
Meaning that in yade Before removing calls to
So you may notice that |
I am not sure if on that screenshot the lines 140 and 141 are correct? You have |
You are right Janek. That was a silly, late evening, hurried blunder. When I used the proper |
Just curious, did we not optimize the default pow function for integer exponents? Also just FYI, the boost::math::pow<N>(x) function is designed to optimise exactly this case: a power with a constant integer exponent. So far as I know there is no way within the language to detect that pow(T, int) is being called with an integer literal?
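The idea of a power with a compile-time integer exponent can be sketched as follows. This is only an illustration of the technique (square-and-multiply resolved at compile time), not the implementation used by Boost.Math or Multiprecision; the names `ipow` and `ipow_impl` are hypothetical.

```cpp
#include <cassert>

// Hypothetical sketch: x^N for a compile-time N, using square-and-multiply.
// The recursion on N is resolved by the compiler, so only multiplications
// remain at run time.
template <unsigned N>
struct ipow_impl
{
    template <typename T>
    static T apply(const T& x)
    {
        const T half = ipow_impl<N / 2>::apply(x);   // x^(N/2)
        return (N % 2 == 0) ? half * half : half * half * x;
    }
};

// Base cases stop the recursion.
template <>
struct ipow_impl<0>
{
    template <typename T>
    static T apply(const T&) { return T(1); }
};

template <>
struct ipow_impl<1>
{
    template <typename T>
    static T apply(const T& x) { return x; }
};

// Convenience wrapper: ipow<3>(x) computes x * x * x.
template <unsigned N, typename T>
T ipow(const T& x)
{
    return ipow_impl<N>::apply(x);
}
```

A call such as `ipow<2>(x)` reduces to a single multiplication, which is why a constant-exponent overload can be far cheaper than a general `pow(x, y)`.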
Yes John, you are right. The generic collection of functions in Multiprecision DOES include specializations of I am experimenting with a local version, but I am not able to get significantly faster than the default version in Multiprecision, maybe only At the moment, I am not able to see any more clear bottlenecks in the overall performance of |
@ckormanyos does this PR improve power performance at all: #649?
Chris (@ckormanyos) can you share your Mandelbrot benchmark code? I want to make sure that I can reproduce your results. Because if I don't, then we know it's not a problem with |
Chris (@ckormanyos) in this post with the Mandelbrot benchmark which version of g++ and optimization flags ( |
See also: BoostGSoC21#190
Hi Janek (@cosurgi), I have made a dedicated issue for this discussion. In that issue, I will provide the benchmark code and, yes, it does offer the ability to compare bin-float, dec-float, float128 and double-double. Give me a day or so to prepare a branch of the Mandelbrot for your dedicated use.
Cc: @jzmaddock and @sinandredemption
Hi John (@jzmaddock). In a word, yes. Treating small powers in that super-fast way is something we should probably do. Another thing I have been playing around with is a more subtle issue. In my recent pushes here, I have introduced a concept called In As it turns out, the floating-point-class checks actually do slow down these tiny, tiny backends significantly. We also found this to be relevant for the work in decimal. So a bit down the evolutionary road I will also be separating the work from the safety of mul/div operations as well. So if you have already checked edge cases in a function like As for your changes there, I think they definitely help all of multiprecision, but I still might end up specializing Cc: @cosurgi |
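The separation of edge-case checks from the raw arithmetic described above can be sketched like this. It is a minimal illustration on a pair of `double`s, assuming an FMA-based product; the type `dd` and the functions `two_prod`, `quick_two_sum`, `mul_unchecked` and `mul_checked` are hypothetical names and are not taken from the backend in this PR.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical double-double value: the represented number is hi + lo.
struct dd { double hi; double lo; };

// Error-free product: returns p and e with p + e == a * b exactly
// (for finite inputs, barring overflow/underflow).
inline dd two_prod(double a, double b)
{
    const double p = a * b;
    const double e = std::fma(a, b, -p);
    return dd{p, e};
}

// Renormalize a + b into (hi, lo); assumes |a| >= |b|.
inline dd quick_two_sum(double a, double b)
{
    const double s = a + b;
    const double e = b - (s - a);
    return dd{s, e};
}

// Fast path: no floating-point-class checks. The caller promises that
// both operands are finite.
inline dd mul_unchecked(dd a, dd b)
{
    dd p = two_prod(a.hi, b.hi);
    p.lo += a.hi * b.lo + a.lo * b.hi;   // cross terms
    return quick_two_sum(p.hi, p.lo);
}

// Safe path: handle inf/NaN once, then delegate to the fast path.
inline dd mul_checked(dd a, dd b)
{
    if (!std::isfinite(a.hi) || !std::isfinite(b.hi)) {
        return dd{a.hi * b.hi, 0.0};     // let IEEE rules produce inf/NaN
    }
    return mul_unchecked(a, b);
}
```

A function that has already validated its arguments could call `mul_unchecked` repeatedly in its inner loop and skip the per-operation classification, which is the cost being discussed here.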
…into cpp_double_fp_backend
…into cpp_double_fp_backend
I posted the latest YADE benchmark results in BoostGSoC21#190; suddenly it starts to look good with clang.
Note to self: TODO Hit the edge-cases of the new |
Performance of algebraic functions re-affirmed in BoostGSoC21#190
OK, so the bad-performance mystery is solved and I ran benchmarks of the YADE software. Here are the results:
type | calculation speed | factor |
---|---|---|
cpp_double_double g++ 12.2 | 449.15 iter/sec | 1 |
float128 g++ 12.2 | 263.15 iter/sec | 1.70 |
cpp_bin_float<32> g++ 12.2 | 211.81 iter/sec | 2.12 |
cpp_dec_float<31> g++ 12.2 | 78.15 iter/sec | 5.74 |
mpfr_float_backend<31> g++ 12.2 | 51.01 iter/sec | 8.80 |
Here we can see that cpp_double_double beats float128 by a factor of 1.7 and everyone else by over a factor of two.
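For reference, the factor column appears to be the speed of the fastest entry divided by the speed of each entry. A small sketch of that arithmetic, with the hypothetical names `bench_result` and `slowdown_factors`, assuming that reading of the column is correct:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one row of a benchmark table.
struct bench_result
{
    std::string name;
    double      iter_per_sec;
};

// For each row, compute fastest_speed / row_speed, so the fastest
// configuration gets 1 and slower ones get values above 1.
inline std::vector<double> slowdown_factors(const std::vector<bench_result>& results)
{
    assert(!results.empty());
    double fastest = results.front().iter_per_sec;
    for (const bench_result& r : results) {
        fastest = std::max(fastest, r.iter_per_sec);
    }
    std::vector<double> factors;
    factors.reserve(results.size());
    for (const bench_result& r : results) {
        factors.push_back(fastest / r.iter_per_sec);
    }
    return factors;
}
```

Feeding in 449.15 and 263.15 iter/sec gives a ratio just above 1.70, which matches the table to within the last printed digit.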
cpp_double_long_double
type | calculation speed | factor |
---|---|---|
cpp_bin_float<39> g++ 12.2 | 122.55 iter/sec | 1 |
cpp_double_long_double clang++ 19.1.4 | 108.79 iter/sec | 1.12 |
cpp_bin_float<39> clang++ 19.1.4 | 102.19 iter/sec | 1.20 |
cpp_dec_float<39> g++ 12.2 | 71.42 iter/sec | 1.71 |
mpfr_float_backend<39> g++ 12.2 | 45.75 iter/sec | 2.67 |
cpp_double_long_double g++ 12.2 | 14.97 iter/sec | 8.18 |
Here we can see that cpp_double_long_double performs very well. But the compiler developers will have a mystery to solve: cpp_bin_float<39> g++ 12.2 is faster than cpp_double_long_double clang++ 19.1.4 by just a little, which in turn is faster than cpp_double_long_double g++ 12.2 by a factor of about 7.
cpp_double_float128
type | calculation speed | factor |
---|---|---|
cpp_bin_float<67> g++ 12.2 | 118.43 iter/sec | 1 |
mpfr_float_backend<67> g++ 12.2 | 43.34 iter/sec | 2.73 |
cpp_dec_float<67> g++ 12.2 | 40.09 iter/sec | 2.95 |
cpp_double_float128 g++ 12.2 | 14.99 iter/sec | 7.90 |
Here we can see that cpp_double_float128 has a lot of potential to beat cpp_bin_float<67> once the g++ developers sort out the problems with cpp_double_long_double g++ 12.2. The increase in performance should be about a factor of 8 :)
So all is good. I think we can merge this branch once documentation and other small TODOs are complete.
Thank you Janek (@cosurgi), that was a big effort, and it really provided a lot of information and clarity. Some of the results on The newer i7 processors have extremely powerful 64-bit floating-point hardware operations, and it seems like these are being very well supported nowadays in hardware and software. Down the road I will be doing some non-x86_64 measurements on M1 and/or M2 and a few embedded bare-metal controllers like an ARM(R) Cortex(R) M7 with a double-precision FPU. All-in-all I'm somewhat surprised at how fast I'm happy enough with it to make a first release out of this state. Cc: @sinandredemption and @jzmaddock
There might be one more thing to check: that each of the backend/compiler configurations is doing (roughly) the same amount of work. Something that can happen when there is a tolerance set for termination is that you hit "unfortunate" parameters which cause the code to thrash through many needless iterations which don't actually get you any closer to the end result. I have no idea if this is the case here, but because they don't behave quite like exactly rounded IEEE types, things like
Indeed. There are several potential dangers. Let's say we use At the same time, we know that Even worse, this backend is new, so there might be undiscovered problems in the areas of subnormal/zero. So you might iterate until the maximum iteration setting. We actually had several cases like this when John helped me see through the last tricky spots in the specfun tests. Who knows if we really got all the edge cases? Cc: @jzmaddock and @cosurgi |
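One way to check that every backend/compiler configuration does the same amount of work is to have the iterative routine report how many iterations it used and whether it stopped on the tolerance or on the cap. The sketch below shows this on a Newton iteration for a square root; it is not code from YADE or from this PR, and `newton_result` and `newton_sqrt` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result record: the value plus the work that was done.
template <typename T>
struct newton_result
{
    T    value;
    int  iterations;
    bool converged;   // false means the iteration cap was hit
};

// Newton iteration for sqrt(a), a > 0, stopping when the relative step
// falls below rel_tol or after max_iter steps, whichever comes first.
template <typename T>
newton_result<T> newton_sqrt(T a, T rel_tol, int max_iter)
{
    using std::fabs;                       // lets ADL pick up fabs() for other number types
    assert(a > T(0) && max_iter > 0);
    T x = (a > T(1)) ? a : T(1);           // crude starting guess
    for (int i = 1; i <= max_iter; ++i) {
        const T next  = (x + a / x) / T(2);
        const T delta = fabs(next - x);
        x = next;
        if (delta <= rel_tol * fabs(x)) {
            newton_result<T> ok = {x, i, true};
            return ok;
        }
    }
    newton_result<T> capped = {x, max_iter, false};
    return capped;
}
```

Logging `iterations` and `converged` per number type would show directly whether one configuration thrashes up to the iteration limit while another stops early, so that iter/sec figures are compared on equal work.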