Work in progress: Parallel Benchmark #78

Open · wants to merge 79 commits into base: main
d1bd46f
Created folder for parallelism
Paramuths Oct 6, 2024
bea23ac
Changed graph.py
Paramuths Oct 15, 2024
9ff8f14
Create parallel_col_atomic; Need to check for error in kernel-generat…
Paramuths Oct 15, 2024
70a8801
Create method for debugging
Paramuths Oct 15, 2024
900b791
Remove unnecessary fill
Paramuths Oct 15, 2024
88edfef
Create separate memory
Paramuths Oct 21, 2024
3e86559
Create slurm script
Oct 21, 2024
c23b28e
Separated lock and atomic
Paramuths Oct 21, 2024
17c755d
Run on slurm
Oct 21, 2024
2198e41
Plot result
Paramuths Oct 21, 2024
c38e42a
Fixed error function name alias
Paramuths Oct 22, 2024
a290133
Run test locally
Paramuths Oct 22, 2024
9800edd
Result slurm
Oct 22, 2024
3d48e65
Merge branch 'parallel' of github.com:willow-ahrens/FinchBenchmarks i…
Oct 22, 2024
945d21c
Fix merge error
Oct 22, 2024
1eacd07
Plot result
Paramuths Oct 22, 2024
15a7cbb
Plot result; remove log scale
Paramuths Oct 22, 2024
776354a
Create helper script
Paramuths Oct 22, 2024
c566bbf
Fix script
Paramuths Oct 22, 2024
5200ece
Run 12 threads
Oct 22, 2024
c7ee166
Update add
Paramuths Oct 22, 2024
b07a4fe
better atomics
willow-ahrens Oct 22, 2024
6220c2b
Merge commit 'refs/pull/73/head' of github.com:finch-tensor/FinchBenc…
willow-ahrens Oct 22, 2024
41091f7
Run on 12 threads
Oct 25, 2024
9a944ae
Implement separated_memory_add with concat
Paramuths Oct 29, 2024
dddccc8
Run SpAdd on 12 threads
Oct 29, 2024
6545d34
Run result
Paramuths Oct 30, 2024
058af8e
Parallel concatenate
Paramuths Nov 4, 2024
8e3595e
Separated sparselist for spmv
Paramuths Nov 5, 2024
b8125fe
Run result
Nov 5, 2024
6c4f932
Plot result with very sparse matrix
Paramuths Nov 5, 2024
1f947a2
Rename file
Paramuths Nov 5, 2024
2757208
Run result
Nov 5, 2024
b8ae05c
Run result final
Nov 5, 2024
11b8939
Plot result with new names
Paramuths Nov 5, 2024
dc8c09a
Make separate sparselist similar to others
Paramuths Nov 5, 2024
04f1f5a
Run result for modified separate sparselist
Nov 6, 2024
89ff013
Rename parallel default
Paramuths Nov 6, 2024
fc68b73
Merge branch 'parallel' of github.com:finch-tensor/FinchBenchmarks in…
Paramuths Nov 6, 2024
eebc7b7
Plot latest result
Paramuths Nov 6, 2024
79824e0
Fix atomic add error
Paramuths Nov 6, 2024
a6bbb2e
Run atomic
Nov 6, 2024
cc73820
Plot atomic result
Paramuths Nov 6, 2024
a1877de
Add result
Nov 12, 2024
98bc345
Fix error
Paramuths Nov 12, 2024
56385f1
Merge branch 'parallel' of github.com:finch-tensor/FinchBenchmarks in…
Paramuths Nov 12, 2024
ad7d472
Plot result
Paramuths Nov 12, 2024
d4496e9
Update Finch
Paramuths Nov 19, 2024
36d547b
Add gitignore
Nov 19, 2024
6694087
Add SparseRooflineBenchmark deps
Paramuths Nov 19, 2024
15420a1
Change submodule
Paramuths Nov 19, 2024
544e40a
Implement load balance static
Paramuths Nov 24, 2024
f64999b
Run load balance static
Nov 25, 2024
91d869a
Plot result
Paramuths Nov 25, 2024
27471ea
Test multiple grain size
Paramuths Nov 25, 2024
c5a0988
Change grain size
Paramuths Nov 25, 2024
4901c41
create lambda grain
Paramuths Nov 25, 2024
39b57e1
Run result
Nov 25, 2024
e8ce653
Plot result
Paramuths Nov 25, 2024
f4afc73
Add atomic element
Paramuths Nov 26, 2024
42978d6
Run result
Nov 27, 2024
d9c595e
plot result
Paramuths Nov 27, 2024
7ce04ae
Run atomic test
Nov 27, 2024
3ac51c1
Add split rows
Paramuths Nov 27, 2024
3c40550
Run split rows
Nov 27, 2024
2cf6b13
Create mutex_level, create skeleton for split_rows_grain
Paramuths Nov 28, 2024
da0f26d
Implement split_rows_grain
Paramuths Nov 28, 2024
779fd24
Run mutex and split grain size
Nov 29, 2024
6c4e456
Plot graph result
Paramuths Nov 29, 2024
77ebc1d
Create spmv_row_kernel & spmv_row_no_init
Paramuths Dec 10, 2024
c02d64e
Add result
Dec 11, 2024
9128fae
Rename algorithms; Use threadpinning
Paramuths Dec 17, 2024
cf86d61
Run result with threadpinning
Dec 17, 2024
c65359f
Add spmv_taco
Paramuths Dec 17, 2024
3e8913a
Merge branch 'parallel' of github.com:finch-tensor/FinchBenchmarks in…
Paramuths Dec 17, 2024
2366d25
Add spmv_taco
Paramuths Dec 17, 2024
203267a
Fix naming error for atomics and mutex
Paramuths Dec 17, 2024
0009a97
Rename permute to transpose
Paramuths Dec 17, 2024
7db3078
Add Finch.@barrier in split rows dynamic grain
Paramuths Dec 17, 2024
Binary file removed .DS_Store
2 changes: 1 addition & 1 deletion .gitmodules
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[submodule "deps/SparseRooflineBenchmark"]
path = deps/SparseRooflineBenchmark
url = https://github.com/SparseRooflineBenchmark/SparseRooflineBenchmark
url = git@github.com:Paramuths/SparseRooflineBenchmark.git
[submodule "deps/taco"]
path = deps/taco
url = https://github.com/tensor-compiler/taco
2 changes: 1 addition & 1 deletion deps/SparseRooflineBenchmark
Submodule SparseRooflineBenchmark updated 46 files
+35 −0 .github/workflows/CI.yml
+4 −3 .gitignore
+49 −0 CONTRIBUTING.md
+3 −3 Readme.md
+14 −0 build.sh
+2 −2 example/Makefile
+20 −0 example/run_slurm.sl
+212 −68 example/spmv.cpp
+170 −0 graph.py
+ graph/runtime/1024-0.1.png
+ graph/runtime/1048576-3000000.png
+ graph/runtime/8192-0.1.png
+ graph/runtime/FEMLAB-poisson3Da.png
+ graph/runtime/FEMLAB-poisson3Db.png
+ graph/speedup/1024-0.1.png
+ graph/speedup/1048576-3000000.png
+ graph/speedup/8192-0.1.png
+ graph/speedup/FEMLAB-poisson3Da.png
+ graph/speedup/FEMLAB-poisson3Db.png
+30 −0 result/1024-0.1/measurements.json
+30 −0 result/1048576-3000000/measurements.json
+30 −0 result/8192-0.1/measurements.json
+30 −0 result/FEMLAB/poisson3Da/measurements.json
+30 −0 result/FEMLAB/poisson3Db/measurements.json
+266 −0 result_julia/spmv_10_threads.json
+266 −0 result_julia/spmv_11_threads.json
+266 −0 result_julia/spmv_12_threads.json
+266 −0 result_julia/spmv_1_threads.json
+266 −0 result_julia/spmv_2_threads.json
+266 −0 result_julia/spmv_3_threads.json
+266 −0 result_julia/spmv_4_threads.json
+266 −0 result_julia/spmv_5_threads.json
+266 −0 result_julia/spmv_6_threads.json
+266 −0 result_julia/spmv_7_threads.json
+266 −0 result_julia/spmv_8_threads.json
+266 −0 result_julia/spmv_9_threads.json
+12 −0 src/Generator/Project.toml
+57 −0 src/Generator/generator.jl
+280 −0 src/Generator/spmv.jl
+53 −0 src/Generator/util.jl
+3 −2 src/Project.toml
+8 −3 src/benchmark.hpp
+0 −167 src/generate.jl
+44 −0 src/output_checker.jl
+7 −0 src/run_local.sh
+16 −0 src/run_slurm.sl
1 change: 1 addition & 0 deletions parallel/.gitignore
@@ -0,0 +1 @@
Manifest.toml
13 changes: 13 additions & 0 deletions parallel/Project.toml
@@ -0,0 +1,13 @@
[deps]
ArgParse = "c7e460c6-2fb9-53a9-8c5b-16f535851c63"
Atomix = "a9b6321e-bd34-4604-b9c9-b65b8de01458"
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
Finch = "9177782c-1635-4eb9-9bfb-d9dfa25e6bce"
IterativeSolvers = "42fd0dbc-a981-5370-80f2-aaf504508153"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MatrixDepot = "b51810bb-c9f3-55da-ae3c-350fc1fbce05"
SuiteSparseGraphBLAS = "c2e53296-7b14-11e9-1210-bddfa8111e1d"
TensorMarket = "8b7d4fe7-0b45-4d0d-9dd8-5cc9b23b4b77"
ThreadPinning = "811555cd-349b-4f26-b7bc-1f208b848042"
2 changes: 2 additions & 0 deletions parallel/spadd/.gitignore
@@ -0,0 +1,2 @@
Manifest.toml
slurm*
66 changes: 66 additions & 0 deletions parallel/spadd/concat.jl
@@ -0,0 +1,66 @@
using Finch
using Base.Threads

function concat(A::Tensor{DenseLevel{Int64,SparseListLevel{Int64,Vector{Int64},Vector{Int64},ElementLevel{0.0,Float64,Int64,Vector{Float64}}}}}, B::Tensor{DenseLevel{Int64,SparseListLevel{Int64,Vector{Int64},Vector{Int64},ElementLevel{0.0,Float64,Int64,Vector{Float64}}}}})
@inbounds @fastmath(begin
A_lvl = A.lvl # DenseLevel
A_lvl_2 = A_lvl.lvl # SparseListLevel
A_lvl_ptr = A_lvl_2.ptr # Vector{Int64}
A_lvl_idx = A_lvl_2.idx # Vector{Int64}
# A_lvl_3 = A_lvl_2.lvl # ElementLevel
A_lvl_2_val = A_lvl_2.lvl.val # Vector{Float64}

B_lvl = B.lvl # DenseLevel
B_lvl_2 = B_lvl.lvl # SparseListLevel
B_lvl_ptr = B_lvl_2.ptr # Vector{Int64}
B_lvl_idx = B_lvl_2.idx # Vector{Int64}
# B_lvl_3 = B_lvl_2.lvl # ElementLevel
B_lvl_2_val = B_lvl_2.lvl.val # Vector{Float64}

# val
C_lvl_2_val = vcat(A_lvl_2_val, B_lvl_2_val)
C_lvl_3 = Element{0.0,Float64,Int64}(C_lvl_2_val)
# shape
A_lvl_2.shape == B_lvl_2.shape || throw(DimensionMismatch("mismatched dimension limits ($(A_lvl_2.shape) != $(B_lvl_2.shape))"))
C_lvl_shape = A_lvl_2.shape
# pointer
B_lvl_ptr_shift = B_lvl_ptr[2:end] .+ (last(A_lvl_ptr) - 1)
C_lvl_ptr = vcat(A_lvl_ptr, B_lvl_ptr_shift)
# index
C_lvl_idx = vcat(A_lvl_idx, B_lvl_idx)

C_lvl_2 = SparseList{Int64}(C_lvl_3, C_lvl_shape, C_lvl_ptr, C_lvl_idx)
C_lvl = Dense{Int64}(C_lvl_2, A_lvl.shape + B_lvl.shape)

C = Tensor(C_lvl)
return C
end)
end

function concat_vec(V::Vector{Tensor{DenseLevel{Int64,SparseListLevel{Int64,Vector{Int64},Vector{Int64},ElementLevel{0.0,Float64,Int64,Vector{Float64}}}}}}, nonzero_offset::Vector{Int64}, columns::Vector{Int64})
@inbounds @fastmath(begin
# val
B_lvl_2_val = Vector{Float64}(undef, last(nonzero_offset))
# shape
B_lvl_shape = V[1].lvl.lvl.shape
# pointer
B_lvl_ptr = Vector{Int64}(undef, last(columns) + 1)
B_lvl_ptr[1] = 1
# idx
B_lvl_idx = Vector{Int64}(undef, last(nonzero_offset))

Threads.@threads for i in 1:length(V)
B_lvl_2_val[nonzero_offset[i]+1:nonzero_offset[i+1]] .= V[i].lvl.lvl.lvl.val
B_lvl_idx[nonzero_offset[i]+1:nonzero_offset[i+1]] .= V[i].lvl.lvl.idx
B_lvl_ptr[columns[i]+2:columns[i+1]+1] = V[i].lvl.lvl.ptr[2:end] .+ nonzero_offset[i]
end
B_lvl_3 = Element{0.0,Float64,Int64}(B_lvl_2_val)

B_lvl_2 = SparseList{Int64}(B_lvl_3, B_lvl_shape, B_lvl_ptr, B_lvl_idx)
B_lvl = Dense{Int64}(B_lvl_2, mapreduce(A -> A.lvl.shape, +, V))

B = Tensor(B_lvl)
return B
end)
end
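The `concat` function above stitches two `Dense(SparseList(Element))` tensors together along the dense (column) dimension: values and row indices are simply appended, and B's column pointers are shifted by the number of entries already stored in A. A minimal Python sketch of the same pointer arithmetic on plain lists (names like `csc_concat` are illustrative, not from the PR):

```python
def csc_concat(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Concatenate two CSC-style matrices (same row count) column-wise.

    ptr is 1-based like Finch's SparseList ptr: ptr[0] == 1 and
    ptr[j+1] - ptr[j] is the number of stored entries in column j.
    """
    # Shift B's column pointers by the number of entries already in A,
    # exactly as `B_lvl_ptr[2:end] .+ (last(A_lvl_ptr) - 1)` does above.
    shift = a_ptr[-1] - 1
    c_ptr = a_ptr + [p + shift for p in b_ptr[1:]]
    c_idx = a_idx + b_idx  # row indices are unchanged
    c_val = a_val + b_val  # values are simply appended
    return c_ptr, c_idx, c_val

# A: two columns holding entries (row 1, 1.0) and (row 2, 2.0);
# B: one column holding (row 1, 3.0).
c_ptr, c_idx, c_val = csc_concat([1, 2, 3], [1, 2], [1.0, 2.0],
                                 [1, 2], [1], [3.0])
# c_ptr == [1, 2, 3, 4], c_idx == [1, 2, 1], c_val == [1.0, 2.0, 3.0]
```

`concat_vec` generalizes this to a whole vector of per-thread partial results, which is why it precomputes `nonzero_offset` and `columns` prefix sums before the parallel copy loop.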

116 changes: 116 additions & 0 deletions parallel/spadd/graph.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
import json
from collections import defaultdict

import matplotlib.pyplot as plt

GRAPH_FOLDER = "graph"
SPEEDUP_FOLDER = "speedup"
RUNTIME_FOLDER = "runtime"
RESULTS_FOLDER = "results"

NTHREADS = [i + 1 for i in range(12)]

DEFAULT_METHOD = "serial_default_implementation"
METHODS = [
DEFAULT_METHOD,
# "parallel_col_separate_sparselist_results",
"separated_memory_concatenate_results",
]

DATASETS = [
{"uniform": ["1000-0.1", "10000-0.1", "1000000-3000000"]},
{"FEMLAB": ["FEMLAB-poisson3Da", "FEMLAB-poisson3Db"]},
]

COLORS = ["gray", "cadetblue", "saddlebrown", "navy", "black"]


def load_json():
combine_results = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: {})))
for n_thread in NTHREADS:
        with open(f"{RESULTS_FOLDER}/spadd_{n_thread}_threads.json", "r") as f:
            results_json = json.load(f)
        for result in results_json:
matrix = (
result["matrix"].replace("/", "-")
if result["dataset"] != "uniform"
else f"{result['matrix']['size']}-{result['matrix']['sparsity']}"
)
combine_results[result["dataset"]][matrix][result["method"]][
result["n_threads"]
] = result["time"]

return combine_results


def plot_speedup_result(results, dataset, matrix, save_location):
plt.figure(figsize=(10, 6))
for method, color in zip(METHODS, COLORS):
plt.plot(
NTHREADS,
[
results[dataset][matrix][DEFAULT_METHOD][n_thread]
/ results[dataset][matrix][method][n_thread]
for n_thread in NTHREADS
],
label=method,
color=color,
marker="o",
linestyle="-",
linewidth=1,
)

plt.title(
f"SpAdd - Speedup for {dataset}: {matrix} (with respect to {DEFAULT_METHOD})"
)
# plt.yscale("log", base=10)
plt.xticks(NTHREADS)
plt.xlabel("Number of Threads")
    plt.ylabel("Speedup")

    plt.legend()
    plt.savefig(save_location)
    plt.close()


def plot_runtime_result(results, dataset, matrix, save_location):
plt.figure(figsize=(10, 6))
for method, color in zip(METHODS, COLORS):
plt.plot(
NTHREADS,
[results[dataset][matrix][method][n_thread] for n_thread in NTHREADS],
label=method,
color=color,
marker="o",
linestyle="-",
linewidth=1,
)

plt.title(f"SpAdd - Runtime for {dataset}: {matrix}")
# plt.yscale("log", base=10)
plt.xticks(NTHREADS)
plt.xlabel("Number of Threads")
    plt.ylabel("Runtime (in seconds)")

    plt.legend()
    plt.savefig(save_location)
    plt.close()


if __name__ == "__main__":
results = load_json()
for datasets in DATASETS:
for dataset, matrices in datasets.items():
for matrix in matrices:
plot_speedup_result(
results,
dataset,
matrix,
f"{GRAPH_FOLDER}/{SPEEDUP_FOLDER}/{dataset}-{matrix}.png",
)
plot_runtime_result(
results,
dataset,
matrix,
f"{GRAPH_FOLDER}/{RUNTIME_FOLDER}/{dataset}-{matrix}.png",
)
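A detail worth noting in `load_json` above: FEMLAB results carry a string matrix name with a `/` that must be flattened for filenames, while uniform results carry a `{size, sparsity}` object. A small standalone sketch of that key derivation (function name `matrix_key` is illustrative):

```python
def matrix_key(result):
    # Mirrors the key construction in load_json above: flatten the
    # "dataset/name" string, or join size and sparsity for uniform inputs.
    if result["dataset"] != "uniform":
        return result["matrix"].replace("/", "-")
    return f"{result['matrix']['size']}-{result['matrix']['sparsity']}"

femlab = {"dataset": "FEMLAB", "matrix": "FEMLAB/poisson3Da"}
uniform = {"dataset": "uniform", "matrix": {"size": 1000, "sparsity": 0.1}}
# matrix_key(femlab)  == "FEMLAB-poisson3Da"
# matrix_key(uniform) == "1000-0.1"
```

These keys are also what name the output PNGs under `graph/speedup/` and `graph/runtime/`.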
Binary file added parallel/spadd/graph/runtime/uniform-1000-0.1.png
19 changes: 19 additions & 0 deletions parallel/spadd/parallel_col_separate_sparselist_results.jl
@@ -0,0 +1,19 @@
using Finch
using BenchmarkTools


function parallel_col_separate_sparselist_results_add(A, B)
_A = Tensor(Dense(SparseList(Element(0.0))), A)
_B = Tensor(Dense(SparseList(Element(0.0))), B)
time = @belapsed begin
(_A, _B) = $(_A, _B)
global _C = Tensor(Dense(Separate(SparseList(Element(0.0)))))
@finch mode = :fast begin
_C .= 0
for j = parallel(_), i = _
_C[i, j] = _A[i, j] + _B[i, j]
end
end
end
return (; time=time, C=_C)
end
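The kernel above parallelizes over columns `j` while writing into a `Separate(SparseList(...))` output, so each thread fills its own column's storage and no write sharing occurs. A hedged Python analogue of that design (dict-of-dicts stands in for the sparse levels; this is a sketch of the idea, not the Finch implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def add_columns(A, B):
    """A, B: dict col -> dict row -> value (CSC-like). Returns A + B.

    Each worker builds its own column's result, mirroring the
    separate-results strategy: no locks or atomics on the output.
    """
    cols = sorted(set(A) | set(B))

    def add_col(j):
        out = dict(A.get(j, {}))          # copy A's entries for column j
        for i, v in B.get(j, {}).items():  # merge in B's entries
            out[i] = out.get(i, 0.0) + v
        return j, out

    with ThreadPoolExecutor() as pool:
        return dict(pool.map(add_col, cols))

A = {0: {0: 1.0}, 1: {2: 2.0}}
B = {0: {0: 0.5}, 2: {1: 3.0}}
C = add_columns(A, B)
# C[0][0] == 1.5, C[1][2] == 2.0, C[2][1] == 3.0
```

The trade-off, visible in the benchmark JSON below, is per-column allocation overhead versus contention-free writes.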
134 changes: 134 additions & 0 deletions parallel/spadd/results/spadd_10_threads.json
@@ -0,0 +1,134 @@
[
{
"time": 0.00759134,
"n_threads": 10,
"method": "serial_default_implementation",
"dataset": "FEMLAB",
"matrix": "FEMLAB/poisson3Da"
},
{
"time": 0.048477826,
"n_threads": 10,
"method": "parallel_col_separate_sparselist_results",
"dataset": "FEMLAB",
"matrix": "FEMLAB/poisson3Da"
},
{
"time": 0.003060656,
"n_threads": 10,
"method": "separated_memory_concatenate_results",
"dataset": "FEMLAB",
"matrix": "FEMLAB/poisson3Da"
},
{
"time": 0.140822392,
"n_threads": 10,
"method": "serial_default_implementation",
"dataset": "FEMLAB",
"matrix": "FEMLAB/poisson3Db"
},
{
"time": 0.343739416,
"n_threads": 10,
"method": "parallel_col_separate_sparselist_results",
"dataset": "FEMLAB",
"matrix": "FEMLAB/poisson3Db"
},
{
"time": 0.025895345,
"n_threads": 10,
"method": "separated_memory_concatenate_results",
"dataset": "FEMLAB",
"matrix": "FEMLAB/poisson3Db"
},
{
"time": 0.002042569,
"n_threads": 10,
"method": "serial_default_implementation",
"dataset": "uniform",
"matrix": {
"size": 1000,
"sparsity": 0.1
}
},
{
"time": 0.004510134,
"n_threads": 10,
"method": "parallel_col_separate_sparselist_results",
"dataset": "uniform",
"matrix": {
"size": 1000,
"sparsity": 0.1
}
},
{
"time": 0.000682558,
"n_threads": 10,
"method": "separated_memory_concatenate_results",
"dataset": "uniform",
"matrix": {
"size": 1000,
"sparsity": 0.1
}
},
{
"time": 0.671362515,
"n_threads": 10,
"method": "serial_default_implementation",
"dataset": "uniform",
"matrix": {
"size": 10000,
"sparsity": 0.1
}
},
{
"time": 0.122503556,
"n_threads": 10,
"method": "parallel_col_separate_sparselist_results",
"dataset": "uniform",
"matrix": {
"size": 10000,
"sparsity": 0.1
}
},
{
"time": 0.059856956,
"n_threads": 10,
"method": "separated_memory_concatenate_results",
"dataset": "uniform",
"matrix": {
"size": 10000,
"sparsity": 0.1
}
},
{
"time": 0.142538191,
"n_threads": 10,
"method": "serial_default_implementation",
"dataset": "uniform",
"matrix": {
"size": 1000000,
"sparsity": 3000000
}
},
{
"time": 3.792686368,
"n_threads": 10,
"method": "parallel_col_separate_sparselist_results",
"dataset": "uniform",
"matrix": {
"size": 1000000,
"sparsity": 3000000
}
},
{
"time": 0.036291917,
"n_threads": 10,
"method": "separated_memory_concatenate_results",
"dataset": "uniform",
"matrix": {
"size": 1000000,
"sparsity": 3000000
}
}
]
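These records are what `graph.py` turns into speedup curves: for each matrix and thread count, the plotted value is the serial baseline time divided by the method's time. For example, using the 10-thread poisson3Da entries above:

```python
# Speedup of separated_memory_concatenate_results over the serial baseline,
# computed from the 10-thread FEMLAB/poisson3Da times in the JSON above.
serial = 0.00759134
separated = 0.003060656
speedup = serial / separated
# roughly 2.48x faster than the serial default implementation
```

By the same ratio, `parallel_col_separate_sparselist_results` at 0.048477826 s is a slowdown on this matrix, which is why it is commented out of `METHODS` in `graph.py`.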