[CPU] SHM based allreduce improvement for small message size #5571

Merged on Jun 12, 2024 (37 commits)
5830213
add profile for naive all_reduce
delock May 8, 2024
fec2c9b
add multi parallel copy
delock May 9, 2024
4c642a1
alternative multi-parallel memcpy
delock May 10, 2024
ed748ce
use double buffer
delock May 13, 2024
e2312ec
change naive all reduce to symmetric
delock May 15, 2024
3e4b6c3
clean up
delock May 15, 2024
031b831
don't use coll_begin set in naive_all_reduce
delock May 15, 2024
2b15c22
separate buffer for different algorithm
delock May 15, 2024
d5865aa
turn off profile
delock May 16, 2024
2588277
fix distributed naive allreduce
delock May 16, 2024
2f69443
cleanup
delock May 16, 2024
d1b2f09
Remove profiling code
delock May 21, 2024
0ba1f07
add back original naive_all_reduce
delock May 21, 2024
05fc250
remove naive_all_reduce
delock May 21, 2024
7bc708d
cleanup
delock May 21, 2024
af7d4fa
remove barrier which is not needed
delock May 21, 2024
b76937c
cleanup
delock May 21, 2024
0da84b6
can handle > 16 rank with efficiency
delock May 21, 2024
49c2153
Remove REPEAT
delock May 21, 2024
a3cc129
clean up state
delock May 23, 2024
7b41d2f
fix distributed allreduce perf
delock May 23, 2024
87accf4
remove unnecessary state change
delock May 23, 2024
a1ff77e
double buffer for distributed_naive_all_reduce
delock May 23, 2024
8af6113
fix result error
delock May 23, 2024
00a1c27
multiparallel copy #1
delock May 24, 2024
31b3643
single omp region multi parallel copy
delock May 24, 2024
8e5639e
add alternative path
delock May 24, 2024
c0733cb
remove multi-memcpy which actually causes perf drop
delock May 24, 2024
b7713b6
fix distributed accuracy issue
delock May 27, 2024
3f088e4
cleanup
delock May 27, 2024
6c7ec55
fix format
delock May 27, 2024
dabae15
Merge branch 'master' into gma/symmetric_naive_allreduce
delock May 27, 2024
f063421
Merge branch 'master' into gma/symmetric_naive_allreduce
tjruwase Jun 9, 2024
608cf7c
Follow comments, remove unneeded codes and syncs.
delock Jun 12, 2024
1847a10
Merge branch 'master' into gma/symmetric_naive_allreduce
adk9 Jun 12, 2024
7f614cb
fix format
delock Jun 12, 2024
e1853f6
Merge branch 'master' into gma/symmetric_naive_allreduce
adk9 Jun 12, 2024