Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't call xpti if there are no subscribers #2230

Merged
merged 1 commit into from
Oct 31, 2024

Conversation

pbalcer
Copy link
Contributor

@pbalcer pbalcer commented Oct 22, 2024

This avoids the overhead of preparing data for xpti, and the cost of the xpti call itself, if nothing is subscribed to the ur.call xpti call stream.

This avoids the overhead of preparing data for xpti,
and the cost of the xpti call itself, if nothing is
subscribed to the ur.call xpti call stream.
@pbalcer pbalcer requested a review from a team as a code owner October 22, 2024 12:23
@github-actions github-actions bot added the loader Loader related feature/bug label Oct 22, 2024
Copy link

Compute Benchmarks level_zero run (with params: --env UR_ENABLE_LAYERS=UR_LAYER_TRACING --compare baseline --compare baseline_traced):
https://github.com/oneapi-src/unified-runtime/actions/runs/11498655354

Copy link

Compute Benchmarks level_zero run (--env UR_ENABLE_LAYERS=UR_LAYER_TRACING --compare baseline --compare baseline_traced):
https://github.com/oneapi-src/unified-runtime/actions/runs/11498655354
Job status: success. Test status: success.

Summary

No diffs to calculate performance change

(result is better)

Performance change in benchmark groups

Relative perf in group api (6): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
api_overhead_benchmark_sycl SubmitKernel out of order 24.291 μs 23.242000 μs 26.110 μs
api_overhead_benchmark_sycl SubmitKernel in order 23.946 μs 22.947000 μs 28.589 μs
api_overhead_benchmark_ur SubmitKernel out of order 14.893 μs 13.977000 μs 17.805 μs
api_overhead_benchmark_ur SubmitKernel in order 13.843 μs 13.669000 μs 14.498 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.251 μs 2.071000 μs 2.449 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.669 μs 1.658000 μs 1.721 μs
Relative perf in group memory (4): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 230.420 μs 225.294000 μs 265.568 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 119.622 μs 118.564000 μs 134.536 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.831 μs 5.784000 μs 6.032 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.116 μs 3.175 μs 2.919000 μs
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 859.376 μs 857.858000 μs 858.421 μs
Relative perf in group Velocity-Bench (6): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
Velocity-Bench Hashtable 384.367 M keys/sec 388.603440 M keys/sec 386.834 M keys/sec
Velocity-Bench Bitcracker 35.186 s 35.199 s 35.165200 s
Velocity-Bench CudaSift 203.033000 ms 203.035 ms 204.129 ms
Velocity-Bench Easywave 230.000000 ms 230.000 ms 230.000 ms
Velocity-Bench QuickSilver 118.590 MMS/CTT 118.490 MMS/CTT 118.930000 MMS/CTT
Velocity-Bench Sobel Filter 525.762 ms 522.235000 ms 525.882 ms
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor 289.587 ms 278.923000 ms 294.286 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor 288.659 ms 282.848000 ms 296.029 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor 282.375 ms 275.142000 ms 288.554 ms
Runtime_IndependentDAGTaskThroughput_SingleTask 282.547 ms 260.384000 ms 279.751 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor 1845.266 ms 1679.876000 ms 1929.469 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor 1871.862 ms 1700.060000 ms 1964.097 ms
Runtime_DAGTaskThroughput_SingleTask 1826.101 ms 1655.006000 ms 1916.744 ms
Runtime_DAGTaskThroughput_BasicParallelFor 1909.025 ms 1722.549000 ms 2021.880 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous 4.338000 ms 4.432 ms 4.465 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous 4.469000 ms 4.501 ms 4.608 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous 4.346000 ms 4.373 ms 4.474 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided 4.379 ms 4.241000 ms 4.475 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous 617.767000 ms 617.777 ms 617.862 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided 617.040000 ms 617.079 ms 617.208 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided 617.029000 ms 617.101 ms 617.199 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous 617.770000 ms 617.807 ms 617.829 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous 4.238 ms 4.220000 ms 4.282 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided 4.354 ms 4.301000 ms 4.517 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided 4.213000 ms 4.247 ms 4.291 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided 4.480000 ms 4.495 ms 4.600 ms
MicroBench_LocalMem_fp32_4096 30.452 ms 30.420000 ms 30.447 ms
MicroBench_LocalMem_int32_4096 30.430 ms 30.393000 ms 30.433 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
Pattern_Reduction_Hierarchical_int32 16.620 ms 16.605 ms 15.222000 ms
Pattern_Reduction_NDRange_int32 16.548 ms 16.382 ms 15.836000 ms
Pattern_SegmentedReduction_NDRange_int64 6.182 ms 6.179000 ms 6.188 ms
Pattern_SegmentedReduction_Hierarchical_fp32 11.594 ms 11.592000 ms 11.595 ms
Pattern_SegmentedReduction_NDRange_int16 6.074 ms 6.065000 ms 6.075 ms
Pattern_SegmentedReduction_Hierarchical_int64 11.778 ms 11.786 ms 11.776000 ms
Pattern_SegmentedReduction_NDRange_int32 5.707000 ms 5.714 ms 5.721 ms
Pattern_SegmentedReduction_Hierarchical_int16 11.807 ms 11.802000 ms 11.811 ms
Pattern_SegmentedReduction_Hierarchical_int32 11.599 ms 11.598000 ms 11.603 ms
Pattern_SegmentedReduction_NDRange_fp32 5.712000 ms 5.714 ms 5.724 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
ScalarProduct_Hierarchical_fp32 9.938 ms 9.933000 ms 9.962 ms
ScalarProduct_Hierarchical_int64 11.320 ms 11.301000 ms 11.313 ms
ScalarProduct_NDRange_int32 6.302000 ms 6.314 ms 6.304 ms
ScalarProduct_NDRange_fp32 6.300 ms 6.286000 ms 6.297 ms
ScalarProduct_NDRange_int64 8.212 ms 8.189000 ms 8.197 ms
ScalarProduct_Hierarchical_int32 10.290 ms 10.285000 ms 10.289 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
USM_Allocation_latency_fp32_shared 0.065 ms 0.061000 ms 0.062 ms
USM_Allocation_latency_fp32_host 37.523000 ms 37.544 ms 38.343 ms
USM_Allocation_latency_fp32_device 0.067 ms 0.065000 ms 0.066 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch 1.040 ms 1.023000 ms 1.070 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch 1.202 ms 1.188000 ms 1.231 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch 1.694 ms 1.626000 ms 1.723 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch 1.848 ms 1.792000 ms 1.883 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
VectorAddition_int64 3.063 ms 3.066 ms 3.059000 ms
VectorAddition_fp32 1.446000 ms 1.467 ms 1.448 ms
VectorAddition_int32 1.477 ms 1.468000 ms 1.477 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
Polybench_2mm 1.224 ms 1.219000 ms 1.221 ms
Polybench_3mm 1.736 ms 1.731 ms 1.730000 ms
Polybench_Atax 6.883 ms 6.688000 ms 6.854 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
Kmeans_fp32 16.162 ms 16.165 ms 16.158000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
LinearRegressionCoeff_fp32 969.636 ms 969.435000 ms 969.581 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline baseline_traced Relative perf Change -
MolecularDynamics 0.030 ms 0.026000 ms 0.032 ms

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),24.291,24.261,3.03%,23.534,228.056,[CPU],[us]

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),23.946,23.929,1.47%,23.011,89.248,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.893,14.779,3.75%,14.236,72.544,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),13.843,13.827,4.68%,13.184,197.688,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),230.420,230.322,1.11%,226.828,456.071,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),119.622,119.489,2.06%,114.951,322.503,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.831,5.494,12.49%,5.043,52.252,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.116,3.124,2.72%,0.556,3.342,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.251,2.216,6.68%,2.002,13.078,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.669,1.664,3.52%,1.571,6.832,[CPU],[us]

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),859.376,860.076,0.41%,825.380,870.188,[GPU],bw [GB/s]

Velocity-Bench Hashtable

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.349192 s
384.366865 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00405375 s
bitcracker - total time for whole calculation: 35.1859 s

Velocity-Bench CudaSift

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1210 1260 32.8537% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1203 1255 32.6636% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1148 1266 31.1702% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1103 1264 29.9484% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1243 1276 33.7497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1263 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1276 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1269 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1256 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1274 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1201 1250 32.6093% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1274 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1216 1250 33.0166% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1271 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1260 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1266 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1270 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1260 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1131 1254 30.7087% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1213 1267 32.9351% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1267 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1268 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1131 1271 30.7087% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1093 1259 29.6769% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1139 1275 30.9259% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1239 1278 33.6411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1193 1267 32.3921% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1258 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1266 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1275 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1257 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1128 1265 30.6272% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1249 1287 33.9126% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1262 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1254 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1263 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1260 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1099 1264 29.8398% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1106 1261 30.0299% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1256 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1101 1276 29.8941% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1272 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1240 1271 33.6682% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1085 1266 29.4597% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1258 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1262 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1267 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1096 1263 29.7583% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 203.033 ms

Velocity-Bench Easywave

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.30049+10)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING
QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 3.768760e-01 6.072850e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.385360e-01 7.455460e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.361730e-01 7.600260e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.730200e-01 8.255770e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.364300e-01 7.888260e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.340040e-01 7.645140e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.315010e-01 7.622300e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.313560e-01 7.848440e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.321480e-01 8.000620e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.330530e-01 7.574550e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.102e+07 1.102e+07 1.102e+07 0.000e+00 100.00
cycleInit 10 3.423e+06 3.423e+06 3.423e+06 0.000e+00 100.00
cycleTracking 10 7.596e+06 7.596e+06 7.596e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.924e+06 4.924e+06 4.924e+06 0.000e+00 100.00
cycleTracking_MPI 117 1.932e+05 1.932e+05 1.932e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.090e+02 4.090e+02 4.090e+02 0.000e+00 100.00
Figure Of Merit 118.59 [Num Mega Segments / Cycle Tracking Time]

Velocity-Bench Sobel Filter

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING
OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.47468 s
sobelfilter - total time for whole calculation: 0.525762 s

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.290868', '0.289587', '0.288743', '0.288743 0.289488 0.289587 0.289857 0.296664', '0.003266', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.290830', '0.288659', '0.282681', '0.282681 0.283558 0.288659 0.292359 0.306891', '0.009802', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.287025', '0.282375', '0.281118', '0.281118 0.281247 0.282375 0.294182 0.296203', '0.007506', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.281219', '0.282547', '0.272972', '0.272972 0.273318 0.282547 0.287315 0.289945', '0.007834', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.845401', '1.845266', '1.840535', '1.840535 1.842311 1.845266 1.846372 1.852523', '0.004607', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.871766', '1.871862', '1.870502', '1.870502 1.871151 1.871862 1.871947 1.873369', '0.001071', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.824745', '1.826101', '1.819092', '1.819092 1.825260 1.826101 1.826109 1.827164', '0.003232', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.909756', '1.909025', '1.905838', '1.905838 1.908509 1.909025 1.911945 1.913464', '0.002999', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004347', '0.004338', '0.004314', '0.004314 0.004327 0.004338 0.004364 0.004392', '0.000031', '28.976416', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004458', '0.004469', '0.004409', '0.004409 0.004439 0.004469 0.004482 0.004492', '0.000034', '28.351602', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004332', '0.004346', '0.004231', '0.004231 0.004314 0.004346 0.004365 0.004405', '0.000065', '29.543578', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004368', '0.004379', '0.004341', '0.004341 0.004341 0.004379 0.004387 0.004391', '0.000025', '28.795626', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617738', '0.617767', '0.617631', '0.617631 0.617720 0.617767 0.617771 0.617799', '0.000066', '0.202386', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617042', '0.617040', '0.616990', '0.616990 0.617034 0.617040 0.617047 0.617101', '0.000040', '0.202597', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617005', '0.617029', '0.616926', '0.616926 0.617000 0.617029 0.617033 0.617039', '0.000047', '0.202617', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617775', '0.617770', '0.617731', '0.617731 0.617768 0.617770 0.617777 0.617832', '0.000037', '0.202354', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004666', '0.004238', '0.004142', '0.004142 0.004224 0.004238 0.004246 0.006478', '0.001014', '30.177171', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004359', '0.004354', '0.004296', '0.004296 0.004333 0.004354 0.004370 0.004440', '0.000053', '29.097308', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004218', '0.004213', '0.004190', '0.004190 0.004196 0.004213 0.004223 0.004266', '0.000030', '29.833341', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004439', '0.004480', '0.004324', '0.004324 0.004415 0.004480 0.004489 0.004489', '0.000072', '28.909749', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030445', '0.030452', '0.030394', '0.030394 0.030442 0.030452 0.030467 0.030471', '0.000031', '10265.258896', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

MicroBench_LocalMem_int32_4096

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030434', '0.030430', '0.030413', '0.030413 0.030419 0.030430 0.030453 0.030456', '0.000020', '10258.908547', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016583', '0.016620', '0.016377', '0.016377 0.016550 0.016620 0.016633 0.016737', '0.000133', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016447', '0.016548', '0.015969', '0.015969 0.016298 0.016548 0.016686 0.016735', '0.000317', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.009931', '0.009938', '0.009886', '0.009886 0.009926 0.009938 0.009949 0.009954', '0.000027', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011297', '0.011320', '0.011211', '0.011211 0.011277 0.011320 0.011336 0.011343', '0.000055', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006311', '0.006302', '0.006299', '0.006299 0.006301 0.006302 0.006307 0.006348', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006302', '0.006300', '0.006285', '0.006285 0.006292 0.006300 0.006305 0.006326', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.008208', '0.008212', '0.008188', '0.008188 0.008195 0.008212 0.008218 0.008228', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010293', '0.010290', '0.010286', '0.010286 0.010287 0.010290 0.010297 0.010307', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006181', '0.006182', '0.006175', '0.006175 0.006177 0.006182 0.006185 0.006187', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011599', '0.011594', '0.011591', '0.011591 0.011593 0.011594 0.011596 0.011621', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006074', '0.006074', '0.006061', '0.006061 0.006066 0.006074 0.006082 0.006088', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011779', '0.011778', '0.011760', '0.011760 0.011764 0.011778 0.011780 0.011812', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005712', '0.005707', '0.005706', '0.005706 0.005706 0.005707 0.005719 0.005723', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011810', '0.011807', '0.011801', '0.011801 0.011804 0.011807 0.011815 0.011824', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011598', '0.011599', '0.011573', '0.011573 0.011597 0.011599 0.011600 0.011621', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005714', '0.005712', '0.005707', '0.005707 0.005711 0.005712 0.005714 0.005725', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000067', '0.000065', '0.000059', '0.000059 0.000065 0.000065 0.000065 0.000082', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037514', '0.037523', '0.037376', '0.037376 0.037437 0.037523 0.037615 0.037621', '0.000108', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000064', '0.000067', '0.000050', '0.000050 0.000067 0.000067 0.000068 0.000069', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001053', '0.001040', '0.001038', '0.001038 0.001039 0.001040 0.001043 0.001105', '0.000029', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001201', '0.001202', '0.001195', '0.001195 0.001199 0.001202 0.001202 0.001206', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.002064', '0.001694', '0.001669', '0.001669 0.001674 0.001694 0.001710 0.003572', '0.000843', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001859', '0.001848', '0.001844', '0.001844 0.001846 0.001848 0.001858 0.001901', '0.000024', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int64

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003064', '0.003063', '0.003039', '0.003039 0.003057 0.003063 0.003073 0.003087', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001449', '0.001446', '0.001441', '0.001441 0.001442 0.001446 0.001452 0.001465', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001476', '0.001477', '0.001439', '0.001439 0.001469 0.001477 0.001483 0.001510', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001219', '0.001224', '0.001209', '0.001209 0.001211 0.001224 0.001224 0.001230', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001732', '0.001736', '0.001715', '0.001715 0.001734 0.001736 0.001736 0.001740', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Atax

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006818', '0.006883', '0.006700', '0.006700 0.006704 0.006883 0.006900 0.006904', '0.000106', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016165', '0.016162', '0.016151', '0.016151 0.016153 0.016162 0.016179 0.016180', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.971036', '0.969636', '0.969362', '0.969362 0.969591 0.969636 0.969869 0.976724', '0.003184', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000035', '0.000030', '0.000026', '0.000026 0.000027 0.000030 0.000030 0.000061', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

@pbalcer
Copy link
Contributor Author

pbalcer commented Oct 30, 2024

@oneapi-src/unified-runtime-maintain please review

@pbalcer pbalcer added the v0.10.x Include in the v0.10.x release label Oct 31, 2024
@pbalcer pbalcer merged commit 6ade245 into oneapi-src:main Oct 31, 2024
74 of 77 checks passed
@pbalcer pbalcer deleted the check-xpti-enabled branch October 31, 2024 08:17
sommerlukas pushed a commit to intel/llvm that referenced this pull request Oct 31, 2024
Pulls in a few L0 and sanitizer changes, and an optimization for the
tracing layer.

oneapi-src/unified-runtime#2230
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
loader Loader related feature/bug v0.10.x Include in the v0.10.x release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants