hpx::reverse() performs worse with the par execution policy than with the seq execution policy. The same behavior is mentioned in: https://devblogs.microsoft.com/cppblog/using-c17-parallel-algorithms-for-better-performance/
Median execution time for 100'000'000 elements:
Par: 9787040.0
Seq: 6424720.0
The performance difference matches the one described in the Microsoft blog post.
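The benchmark source is not included in this report, so the following is only a minimal sketch of how a median timing like the one above could be collected. The helper names `median`, `median_reverse_time_ns`, and `time_std_reverse` are hypothetical, and `std::reverse` stands in as the sequential baseline so the snippet compiles without HPX.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of a set of timing samples (hypothetical helper, not part of HPX).
inline double median(std::vector<double> samples)
{
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    std::size_t const mid = samples.size() / 2;
    return samples.size() % 2 == 1
        ? samples[mid]
        : (samples[mid - 1] + samples[mid]) / 2.0;
}

// Calls reverse_fn(first, last) `runs` times on a vector of n ints and
// returns the median wall-clock time of a single call in nanoseconds.
template <typename ReverseFn>
double median_reverse_time_ns(std::size_t n, int runs, ReverseFn reverse_fn)
{
    std::vector<int> data(n);
    for (std::size_t i = 0; i < n; ++i)
        data[i] = static_cast<int>(i);

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int r = 0; r < runs; ++r)
    {
        auto const start = std::chrono::steady_clock::now();
        reverse_fn(data.begin(), data.end());
        auto const stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::nano>(stop - start).count());
    }
    return median(samples);
}

// Sequential baseline. Assumption: in an HPX build the lambda body would
// instead call hpx::reverse(hpx::execution::par, first, last) or
// hpx::reverse(hpx::execution::seq, first, last).
inline double time_std_reverse(std::size_t n, int runs)
{
    return median_reverse_time_ns(n, runs,
        [](auto first, auto last) { std::reverse(first, last); });
}
```

Calling `time_std_reverse(100'000'000, 10)` would give a sequential reference number for the same element count; the run count of 10 is an arbitrary choice, not taken from the report.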
perf-stat output:

Par:

 Performance counter stats for './par':

          6,856.13 msec task-clock                #    3.624 CPUs utilized
             1,734      context-switches          #  252.912 /sec
                25      cpu-migrations            #    3.646 /sec
         1,051,732      page-faults               #  153.400 K/sec
    27,791,403,155      cycles                    #    4.054 GHz                      (83.55%)
        68,433,024      stalled-cycles-frontend   #    0.25% frontend cycles idle     (83.52%)
       524,413,414      stalled-cycles-backend    #    1.89% backend cycles idle      (83.16%)
    21,075,188,028      instructions              #    0.76  insn per cycle
                                                  #    0.02  stalled cycles per insn  (83.47%)
     3,606,953,700      branches                  #  526.092 M/sec                    (83.21%)
         3,547,443      branch-misses             #    0.10% of all branches          (83.12%)

       1.891962963 seconds time elapsed

       5.695293000 seconds user
       1.143055000 seconds sys

 Performance counter stats for './par':

       426,032,401      cache-references
        51,362,077      cache-misses              #   12.056 % of all cache refs
    24,980,933,230      cycles
    18,996,816,372      instructions              #    0.76  insn per cycle
     3,239,373,302      branches
         1,051,732      faults
                29      migrations

       1.605927855 seconds time elapsed

       5.177040000 seconds user
       0.884079000 seconds sys
Seq:

 Performance counter stats for './seq':

          2,531.75 msec task-clock                #    1.589 CPUs utilized
               401      context-switches          #  158.388 /sec
                28      cpu-migrations            #   11.060 /sec
         1,051,725      page-faults               #  415.414 K/sec
     9,805,342,294      cycles                    #    3.873 GHz                      (83.76%)
        13,256,239      stalled-cycles-frontend   #    0.14% frontend cycles idle     (83.86%)
       457,911,915      stalled-cycles-backend    #    4.67% backend cycles idle      (83.12%)
    23,989,030,223      instructions              #    2.45  insn per cycle
                                                  #    0.02  stalled cycles per insn  (83.14%)
     4,369,579,747      branches                  #    1.726 G/sec                    (83.40%)
         3,370,898      branch-misses             #    0.08% of all branches          (82.81%)

       1.593337873 seconds time elapsed

       1.519018000 seconds user
       1.014025000 seconds sys

 Performance counter stats for './seq':

       422,525,132      cache-references
        20,034,041      cache-misses              #    4.742 % of all cache refs
     9,436,269,563      cycles
    23,995,389,789      instructions              #    2.54  insn per cycle
     4,378,401,854      branches
         1,051,726      faults
                28      migrations

       1.492262393 seconds time elapsed

       1.418568000 seconds user
       0.974770000 seconds sys
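The significantly higher cache-miss rate with par (12.056% of cache references, versus 4.742% with seq) might be the cause of the slowdown. Instructions per cycle also drop from about 2.5 with seq to 0.76 with par.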
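As a quick cross-check of the derived figures, the cache-miss rate and instructions-per-cycle values can be recomputed from the raw counters of the second perf run of each binary. This is only an illustrative sketch; the `PerfCounters` struct and the helper names are hypothetical, and the numbers are copied from the perf output above.

```cpp
#include <cstdint>

// Raw counters copied from the second perf-stat run of each binary.
struct PerfCounters
{
    std::uint64_t cache_references;
    std::uint64_t cache_misses;
    std::uint64_t cycles;
    std::uint64_t instructions;
};

// Cache misses as a percentage of all cache references.
constexpr double miss_rate_percent(PerfCounters const& c)
{
    return 100.0 * static_cast<double>(c.cache_misses) /
        static_cast<double>(c.cache_references);
}

// Instructions retired per cycle.
constexpr double insn_per_cycle(PerfCounters const& c)
{
    return static_cast<double>(c.instructions) /
        static_cast<double>(c.cycles);
}

constexpr PerfCounters par_run{
    426032401ULL, 51362077ULL, 24980933230ULL, 18996816372ULL};
constexpr PerfCounters seq_run{
    422525132ULL, 20034041ULL, 9436269563ULL, 23995389789ULL};
```

With these inputs, `miss_rate_percent` reproduces the 12.056% and 4.742% that perf reports, and `insn_per_cycle` reproduces 0.76 and 2.54. The total number of cache references is nearly identical in both runs, so the difference lies in how many of them miss.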