Add benchmark coverage for parabola based cosine approximation #2

milianw · 2022-03-31T14:49:22Z

Add benchmark coverage for parabola based cosine approximation

This covers both the original version by Nick from [1] and
the slightly modified and optimized versions that I came up with
a couple of years ago and shared at [2].

Note, though, that the original version [1] is only defined for the
range [-pi, pi], while the accuracy test harness here tests the
range [0, 2pi], which casts these versions in a bad light.

My version [2] doesn't suffer from this accuracy issue: you
can throw arbitrary input values at it. The performance is
pretty good too; the imprecise version is now even the fastest
cos implementation on my machine. The lookup table
implementations come right behind it, but note that in
real-world use, cache eviction caused by
interactions with the rest of the application code will
further decrease the performance of lookup tables. Finally,
this code is easily autovectorized by compilers like icc and
even gcc.
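For readers unfamiliar with the technique, here is a minimal sketch of a parabola-based cosine that accepts arbitrary inputs. It is not the code from this PR: the names `cos_parabola_sketch` and `parabola_max_error` are hypothetical, and the constants (16, 0.5, 0.225) are the ones commonly used for this kind of approximation, taken here as an assumption. The argument is scaled to units of one period, wrapped into [-0.5, 0.5), and then run through a parabola; an optional second step blends in a correction term.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical sketch of a parabola-based cosine; not the PR's actual code.
// Works for any finite x because the argument is wrapped to one period first.
inline double cos_parabola_sketch(double x, bool extra_precision = false)
{
    constexpr double inv_two_pi = 0.15915494309189535; // 1 / (2*pi)
    x *= inv_two_pi;                  // x now counts periods
    x -= 0.25 + std::floor(x + 0.25); // wrap into [-0.5, 0.5), shifted by a quarter period
    x *= 16.0 * (std::abs(x) - 0.5);  // parabola that hits 0 and +-1 at the right places
    if (extra_precision) {
        // Assumed correction weight 0.225: mixes in y*|y| to flatten the error.
        x += 0.225 * x * (std::abs(x) - 1.0);
    }
    return x;
}

// Largest absolute deviation from std::cos on an evenly spaced grid.
inline double parabola_max_error(double lo, double hi, bool extra_precision,
                                 int samples = 100000)
{
    double worst = 0.0;
    for (int i = 0; i <= samples; ++i) {
        const double x = lo + (hi - lo) * i / samples;
        const double err = std::abs(cos_parabola_sketch(x, extra_precision) - std::cos(x));
        worst = std::max(worst, err);
    }
    return worst;
}
```

Because the wrap uses `std::floor` rather than a branch, the function body is straight-line arithmetic, which is the kind of code compilers tend to vectorize well.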

On my machine, the results for all tests are as follows:

Compiler:

g++ (GCC) 11.2.0
compiling code with `-flto -march=native -O3`

CPU:

11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz

output:

ACCURACY
cos_taylor_literal_4terms_naive     19.9880092736029695
cos_taylor_literal_6terms_naive     1.4652889617438571
cos_taylor_literal_6terms_2pi       1.4652889617438571
cos_taylor_literal_6terms_pi        0.0001004702941281
cos_taylor_literal_6terms           0.0001004702941279
cos_taylor_literal_10terms          0.0000000000756514
cos_taylor_running_6terms           0.0001004702941287
cos_taylor_running_8terms           0.0000001352604422
cos_taylor_running_10terms          0.0000000000756513
cos_taylor_running_16terms          0.0000000000000009
cos_table_1                         0.4944578886434219
cos_table_0_1                       0.0499943500331001
cos_table_0_01                      0.0049999938268771
cos_table_0_001                     0.0004999999109268
cos_table_0_0001                    0.0000499999164148
cos_table_1_LERP                    0.1147496616359112
cos_table_0_1_LERP                  0.0012496954434600
cos_table_0_01_LERP                 0.0000124999013960
cos_table_0_001_LERP                0.0000001249999969
cos_table_0_0001_LERP               0.0000000012500020
cos_math_h                          0.0000000000000000
cos_parabola                        15.9999999810748665
cos_parabola_extra                  63.2499998575883708
cos_parabola_opt                    0.0560095959541279
cos_parabola_extra_opt              0.0010902926026140

TIME
cos_taylor_literal_4terms_naive     0.3642890000000000
cos_taylor_literal_6terms_naive     0.5741620000000000
cos_taylor_literal_6terms_2pi       0.7144020000000000
cos_taylor_literal_6terms_pi        0.7745180000000000
cos_taylor_literal_6terms           0.7218470000000000
cos_taylor_literal_10terms          1.1426369999999999
cos_taylor_running_6terms           0.6787260000000001
cos_taylor_running_8terms           0.9333120000000000
cos_taylor_running_10terms          1.1113160000000000
cos_taylor_running_16terms          1.7794570000000001
cos_table_1                         0.2014240000000000
cos_table_0_1                       0.2031010000000000
cos_table_0_01                      0.2034710000000000
cos_table_0_001                     0.2042740000000000
cos_table_0_0001                    0.2036450000000000
cos_table_1_LERP                    0.3107120000000000
cos_table_0_1_LERP                  0.3346280000000000
cos_table_0_01_LERP                 0.3342020000000000
cos_table_0_001_LERP                0.3342410000000000
cos_table_0_0001_LERP               0.3327000000000000
cos_math_h                          0.7697060000000000
cos_parabola                        0.1096080000000000
cos_parabola_extra                  0.1190130000000000
cos_parabola_opt                    0.1476240000000000
cos_parabola_extra_opt              0.2056920000000000

[1]: https://web.archive.org/web/20171228230531/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648
[2]: https://stackoverflow.com/a/28050328/35250

The range [-pi, pi] is often much easier to approximate for functions that are symmetric about 0, like cos. Compare:

```
[0, 2pi]:
ACCURACY
cos_taylor_literal_4terms_naive     19.9880092736029695
cos_taylor_literal_6terms_naive     1.4652889617438571
cos_taylor_literal_6terms_2pi       1.4652889617438571
...
cos_parabola                        15.9999999810748665
cos_parabola_extra                  63.2499998575883708

[-pi, pi]:
ACCURACY
cos_taylor_literal_4terms_naive     0.0239777873763927
cos_taylor_literal_6terms_naive     0.0001004702957825
cos_taylor_literal_6terms_2pi       0.0001004702957825
cos_parabola                        1.9999999739033667
cos_parabola_extra                  3.3499999445446544
```
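The effect of the input range on a truncated Taylor series can be reproduced with a small sketch. The names `taylor_cos_x8` and `taylor_max_error` are hypothetical, and truncating after the x^8 term is an assumption made for illustration, not something the benchmark source is quoted as doing. The point is only that the same polynomial is far more accurate on [-pi, pi] than on [0, 2pi], because the truncation error grows quickly with |x|.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: Taylor polynomial of cos around 0, truncated after x^8.
inline double taylor_cos_x8(double x)
{
    const double x2 = x * x;
    // Horner form of 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8!
    return 1.0 + x2 * (-1.0 / 2.0
               + x2 * (1.0 / 24.0
               + x2 * (-1.0 / 720.0
               + x2 * (1.0 / 40320.0))));
}

// Largest absolute deviation from std::cos on an evenly spaced grid
// that includes both endpoints of [lo, hi].
inline double taylor_max_error(double lo, double hi, int samples = 100000)
{
    double worst = 0.0;
    for (int i = 0; i <= samples; ++i) {
        const double x = lo + (hi - lo) * i / samples;
        worst = std::max(worst, std::abs(taylor_cos_x8(x) - std::cos(x)));
    }
    return worst;
}
```

Calling `taylor_max_error(-pi, pi)` and `taylor_max_error(0.0, 2.0 * pi)` side by side shows the gap: the largest |x| is pi in the first case and 2pi in the second.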

This is basically the opposite of the new -r arg: we now increase the value range to [-10pi, 10pi]. Anything outside of a single 2pi period will be abysmal for naive functions that don't account for this, see:

```
./benchmarks -R
Cosine benchmark

ACCURACY
cos_taylor_literal_4terms_naive     22237893.9080788344144821
cos_taylor_literal_6terms_naive     1693743289.4118604660034180
cos_taylor_literal_6terms_2pi       1.4652888121124259
cos_taylor_literal_6terms_pi        1.4652886805053995
cos_taylor_literal_6terms           1.4652886805053986
cos_taylor_literal_10terms          0.0003012239456650
cos_taylor_running_6terms           0.0001004702740058
cos_taylor_running_8terms           0.0000001352604069
cos_taylor_running_10terms          0.0000000000756512
cos_taylor_running_16terms          0.0000000000000014
cos_table_1                         0.4944578224012448
cos_table_0_1                       0.0499941818532710
cos_table_0_01                      0.0049999017702790
cos_table_0_001                     0.0004999070288996
cos_table_0_0001                    0.0000499070860950
cos_table_1_LERP                    0.1147496616359124
cos_table_0_1_LERP                  0.0012496954434598
cos_table_0_01_LERP                 0.0000124999013927
cos_table_0_001_LERP                0.0000001249999925
cos_table_0_0001_LERP               0.0000000012499975
cos_math_h                          0.0000000000000000
cos_parabola                        399.9999987128100543
cos_parabola_extra                  36130.4497678874759004
cos_parabola_opt                    0.0560095959541315
cos_parabola_extra_opt              0.0010902926026148
```
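One common way to make a naive polynomial usable on a wide range is to reduce the argument to a single period before evaluating it. The sketch below uses `std::remainder` for that. The function names are hypothetical, the polynomial (Taylor series truncated after x^8) is an assumption chosen for illustration, and this is not necessarily how the implementations in this repository handle range reduction.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical naive polynomial: Taylor series of cos truncated after x^8.
inline double taylor_cos_naive(double x)
{
    const double x2 = x * x;
    return 1.0 + x2 * (-1.0 / 2.0
               + x2 * (1.0 / 24.0
               + x2 * (-1.0 / 720.0
               + x2 * (1.0 / 40320.0))));
}

// Same polynomial, but the argument is first wrapped into [-pi, pi].
inline double taylor_cos_reduced(double x)
{
    constexpr double two_pi = 6.283185307179586;
    // std::remainder returns x - n * two_pi with n the integer nearest to
    // x / two_pi, so the result always lies in [-pi, pi].
    return taylor_cos_naive(std::remainder(x, two_pi));
}

// Largest absolute deviation of f from std::cos on an evenly spaced grid.
inline double reduced_max_error(double (*f)(double), double lo, double hi,
                                int samples = 100000)
{
    double worst = 0.0;
    for (int i = 0; i <= samples; ++i) {
        const double x = lo + (hi - lo) * i / samples;
        worst = std::max(worst, std::abs(f(x) - std::cos(x)));
    }
    return worst;
}
```

On [-10pi, 10pi] the unreduced polynomial diverges by many orders of magnitude, while the reduced variant keeps the error it has on [-pi, pi]. The reduction adds a call per evaluation, so it trades some speed for robustness.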

milianw force-pushed the parabola-approx branch 2 times, most recently from da856bd to a379bde Compare March 31, 2022 16:30

milianw added 4 commits March 31, 2022 18:31

Add basic cmake based buildsystem

73e5430

No fancy compiler flags are added, but they can be set manually using the standard CMake mechanisms.

milianw force-pushed the parabola-approx branch from a379bde to 73e5430 Compare March 31, 2022 16:31
