Add benchmark coverage for parabola based cosine approximation #2

Open
wants to merge 4 commits into
base: master

@milianw commented Mar 31, 2022

Add benchmark coverage for parabola based cosine approximation

@milianw force-pushed the parabola-approx branch 2 times, most recently from da856bd to a379bde (March 31, 2022 16:30)
This covers both the original version by Nick from [1] and
the slightly modified and optimized versions that I came up with
a couple of years ago and shared at [2].

[1]: https://web.archive.org/web/20171228230531/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648
[2]: https://stackoverflow.com/a/28050328/35250

Note though that the original version [1] is only defined for the
range [-pi, pi], but the accuracy test harness here tests the
range [0, 2pi], which casts these versions in a bad light.

My version [2] doesn't suffer from this accuracy issue: you
can throw arbitrary input values at it. The performance is
pretty good too. Apart from the original parabola versions,
which are not valid over the tested range, the imprecise
version (cos_parabola_opt) is the fastest cos implementation
on my machine now. The lookup table implementations are
directly behind it, but note that in real-world use, cache
eviction caused by the rest of the application code will
likely decrease the performance of lookup tables further.
Finally, this code is easily autovectorized by compilers like
icc and even gcc.
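
The range reduction that lets such a version accept arbitrary inputs can be sketched as follows. This is only a minimal sketch in the spirit of [2], not necessarily the code added in this PR; the name `cos_parabola_sketch`, the `extra_precision` flag and the refinement weight 0.225 are assumptions of the sketch.

```cpp
#include <cmath>

// Hypothetical name; a sketch, not necessarily the PR's implementation.
inline double cos_parabola_sketch(double x, bool extra_precision = false)
{
    constexpr double kInvTwoPi = 1.0 / (2.0 * 3.14159265358979323846);

    // Work in turns: one full period maps to a length of 1.
    x *= kInvTwoPi;
    // Subtract a quarter turn and wrap into [-0.5, 0.5);
    // cos(2*pi*u) == -sin(2*pi*(u - 0.25)).
    x -= 0.25 + std::floor(x + 0.25);
    // Two parabola arcs with zeros at -0.5, 0 and 0.5 that reach
    // +1 at -0.25 and -1 at +0.25, i.e. an approximation of -sin.
    x *= 16.0 * (std::fabs(x) - 0.5);
    if (extra_precision) {
        // Refinement: blend y with y*|y| (weights 0.775 and 0.225, assumed).
        x += 0.225 * x * (std::fabs(x) - 1.0);
    }
    return x;
}
```

Because the wrap uses only a multiply, a `floor` and a subtract, the function has no branches on the input value. Its deviation from `std::cos` should stay around 0.056 (around 0.001 with `extra_precision`) however large the input is, which is in line with the `_opt` rows of the results.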

On my machine, the results for all tests are as follows:

Compiler:
```
g++ (GCC) 11.2.0
compiling code with `-flto -march=native -O3`
```

CPU:
```
11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
```

output:
```
ACCURACY
cos_taylor_literal_4terms_naive     19.9880092736029695
cos_taylor_literal_6terms_naive     1.4652889617438571
cos_taylor_literal_6terms_2pi       1.4652889617438571
cos_taylor_literal_6terms_pi        0.0001004702941281
cos_taylor_literal_6terms           0.0001004702941279
cos_taylor_literal_10terms          0.0000000000756514
cos_taylor_running_6terms           0.0001004702941287
cos_taylor_running_8terms           0.0000001352604422
cos_taylor_running_10terms          0.0000000000756513
cos_taylor_running_16terms          0.0000000000000009
cos_table_1                         0.4944578886434219
cos_table_0_1                       0.0499943500331001
cos_table_0_01                      0.0049999938268771
cos_table_0_001                     0.0004999999109268
cos_table_0_0001                    0.0000499999164148
cos_table_1_LERP                    0.1147496616359112
cos_table_0_1_LERP                  0.0012496954434600
cos_table_0_01_LERP                 0.0000124999013960
cos_table_0_001_LERP                0.0000001249999969
cos_table_0_0001_LERP               0.0000000012500020
cos_math_h                          0.0000000000000000
cos_parabola                        15.9999999810748665
cos_parabola_extra                  63.2499998575883708
cos_parabola_opt                    0.0560095959541279
cos_parabola_extra_opt              0.0010902926026140

TIME
cos_taylor_literal_4terms_naive     0.3642890000000000
cos_taylor_literal_6terms_naive     0.5741620000000000
cos_taylor_literal_6terms_2pi       0.7144020000000000
cos_taylor_literal_6terms_pi        0.7745180000000000
cos_taylor_literal_6terms           0.7218470000000000
cos_taylor_literal_10terms          1.1426369999999999
cos_taylor_running_6terms           0.6787260000000001
cos_taylor_running_8terms           0.9333120000000000
cos_taylor_running_10terms          1.1113160000000000
cos_taylor_running_16terms          1.7794570000000001
cos_table_1                         0.2014240000000000
cos_table_0_1                       0.2031010000000000
cos_table_0_01                      0.2034710000000000
cos_table_0_001                     0.2042740000000000
cos_table_0_0001                    0.2036450000000000
cos_table_1_LERP                    0.3107120000000000
cos_table_0_1_LERP                  0.3346280000000000
cos_table_0_01_LERP                 0.3342020000000000
cos_table_0_001_LERP                0.3342410000000000
cos_table_0_0001_LERP               0.3327000000000000
cos_math_h                          0.7697060000000000
cos_parabola                        0.1096080000000000
cos_parabola_extra                  0.1190130000000000
cos_parabola_opt                    0.1476240000000000
cos_parabola_extra_opt              0.2056920000000000
```
The range [-pi, pi] is often much easier to approximate for functions
that are symmetric around 0, like cos.

For example, compare:
```
[0, 2pi]:
ACCURACY
cos_taylor_literal_4terms_naive     19.9880092736029695
cos_taylor_literal_6terms_naive     1.4652889617438571
cos_taylor_literal_6terms_2pi       1.4652889617438571
...
cos_parabola                        15.9999999810748665
cos_parabola_extra                  63.2499998575883708

[-pi, pi]:
ACCURACY
cos_taylor_literal_4terms_naive     0.0239777873763927
cos_taylor_literal_6terms_naive     0.0001004702957825
cos_taylor_literal_6terms_2pi       0.0001004702957825
cos_parabola                        1.9999999739033667
cos_parabola_extra                  3.3499999445446544
```
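
The `cos_parabola` figures in this comparison can be reproduced with a short sketch. It assumes that the benchmarked function evaluates Nick's sine parabola from [1] at x + pi/2 without any wrapping; that is an assumption, although it matches the reported errors. The names `sin_parabola`, `cos_parabola_unwrapped` and `max_abs_error` are hypothetical.

```cpp
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Sine parabola from [1]; only meaningful for y in [-pi, pi].
inline double sin_parabola(double y)
{
    constexpr double B = 4.0 / kPi;
    constexpr double C = -4.0 / (kPi * kPi);
    return B * y + C * y * std::fabs(y);
}

// ASSUMPTION: cosine obtained by shifting the argument, with no wrapping.
inline double cos_parabola_unwrapped(double x)
{
    return sin_parabola(x + kPi / 2.0);
}

// Largest absolute deviation from std::cos on [lo, hi], endpoints included.
inline double max_abs_error(double (*f)(double), double lo, double hi,
                            int steps = 100000)
{
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i) {
        const double x = lo + (hi - lo) * i / steps;
        worst = std::fmax(worst, std::fabs(f(x) - std::cos(x)));
    }
    return worst;
}
```

Under that assumption the worst case sits at the upper end of each range. At x = 2pi the parabola yields 10 - 25 = -15 against a true value of 1 (error 16), and at x = pi it yields 6 - 9 = -3 against -1 (error 2). The error only stays near 0.056 while x + pi/2 remains inside [-pi, pi].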
This is basically the opposite of the new -r arg: -R increases
the value range to [-10pi, 10pi]. Results for inputs outside a
single 2pi period will be abysmal for naive functions that don't
reduce their argument first, see:

```
./benchmarks -R
Cosine benchmark

ACCURACY
cos_taylor_literal_4terms_naive     22237893.9080788344144821
cos_taylor_literal_6terms_naive     1693743289.4118604660034180
cos_taylor_literal_6terms_2pi       1.4652888121124259
cos_taylor_literal_6terms_pi        1.4652886805053995
cos_taylor_literal_6terms           1.4652886805053986
cos_taylor_literal_10terms          0.0003012239456650
cos_taylor_running_6terms           0.0001004702740058
cos_taylor_running_8terms           0.0000001352604069
cos_taylor_running_10terms          0.0000000000756512
cos_taylor_running_16terms          0.0000000000000014
cos_table_1                         0.4944578224012448
cos_table_0_1                       0.0499941818532710
cos_table_0_01                      0.0049999017702790
cos_table_0_001                     0.0004999070288996
cos_table_0_0001                    0.0000499070860950
cos_table_1_LERP                    0.1147496616359124
cos_table_0_1_LERP                  0.0012496954434598
cos_table_0_01_LERP                 0.0000124999013927
cos_table_0_001_LERP                0.0000001249999925
cos_table_0_0001_LERP               0.0000000012499975
cos_math_h                          0.0000000000000000
cos_parabola                        399.9999987128100543
cos_parabola_extra                  36130.4497678874759004
cos_parabola_opt                    0.0560095959541315
cos_parabola_extra_opt              0.0010902926026148
```
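
One common way to make a naive approximation cope with this wider range is to remove whole periods before evaluating it. The sketch below does so for a plain Taylor polynomial; `cos_taylor_naive`, `wrap_to_pi` and `cos_taylor_wrapped` are hypothetical names and not necessarily how the benchmarked functions are written.

```cpp
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Hypothetical naive polynomial: Taylor series of cos around 0 up to x^8.
inline double cos_taylor_naive(double x)
{
    const double x2 = x * x;
    // Horner form of 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8!
    return 1.0 + x2 * (-1.0 / 2.0
               + x2 * (1.0 / 24.0
               + x2 * (-1.0 / 720.0
               + x2 / 40320.0)));
}

// Map any finite x into [-pi, pi) by removing whole periods.
inline double wrap_to_pi(double x)
{
    return x - 2.0 * kPi * std::floor((x + kPi) / (2.0 * kPi));
}

// Same polynomial, but only ever evaluated on [-pi, pi).
inline double cos_taylor_wrapped(double x)
{
    return cos_taylor_naive(wrap_to_pi(x));
}
```

At x = 10pi the unwrapped polynomial is dominated by its x^8 term and ends up in the tens of millions, while the wrapped variant evaluates the polynomial at 0 and returns 1. The price is one `floor`, one division and one multiply per call.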
No fancy compiler args are added, but they can be set manually
using standard CMake procedures.