don't call xpti if there are no subscribers #2230

pbalcer · 2024-10-22T12:23:52Z

This avoids the overhead of preparing data for xpti, and the cost of the xpti call itself, when nothing is subscribed to the ur.call xpti stream.
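The idea can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the code from this PR; `TraceStream`, `has_subscribers`, `notify`, and `traced_call` are hypothetical names, not XPTI or Unified Runtime APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TraceStream:
    """Hypothetical stand-in for a tracing stream such as ur.call."""
    name: str
    subscribers: List[Callable[[Dict[str, Any]], None]] = field(default_factory=list)

    def has_subscribers(self) -> bool:
        return bool(self.subscribers)

    def notify(self, payload: Dict[str, Any]) -> None:
        for callback in self.subscribers:
            callback(payload)


def traced_call(stream: TraceStream, func_name: str, func: Callable[..., Any], *args: Any) -> Any:
    # Fast path: nobody is listening, so skip both the payload
    # construction and the notification itself.
    if not stream.has_subscribers():
        return func(*args)

    # Slow path: build the payloads and notify before and after the call.
    stream.notify({"event": "begin", "function": func_name, "args": args})
    result = func(*args)
    stream.notify({"event": "end", "function": func_name, "result": result})
    return result
```

With no subscribers, `traced_call` reduces to one boolean check plus the wrapped call; the payload dictionaries are only built once something has subscribed.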

github-actions · 2024-10-24T11:49:04Z

Compute Benchmarks level_zero run (with params: --env UR_ENABLE_LAYERS=UR_LAYER_TRACING --compare baseline --compare baseline_traced):
https://github.com/oneapi-src/unified-runtime/actions/runs/11498655354

github-actions · 2024-10-24T12:11:04Z

Compute Benchmarks level_zero run (--env UR_ENABLE_LAYERS=UR_LAYER_TRACING --compare baseline --compare baseline_traced):
https://github.com/oneapi-src/unified-runtime/actions/runs/11498655354
Job status: success. Test status: success.

Summary

No diffs to calculate performance change

(in each row of the tables below, the value printed with six decimal places is the one the report marks as better)

Performance change in benchmark groups

Relative perf in group api (6): cannot calculate

Benchmark	This PR	baseline	baseline_traced
api_overhead_benchmark_sycl SubmitKernel out of order	24.291 μs	23.242000 μs	26.110 μs
api_overhead_benchmark_sycl SubmitKernel in order	23.946 μs	22.947000 μs	28.589 μs
api_overhead_benchmark_ur SubmitKernel out of order	14.893 μs	13.977000 μs	17.805 μs
api_overhead_benchmark_ur SubmitKernel in order	13.843 μs	13.669000 μs	14.498 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	2.251 μs	2.071000 μs	2.449 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	1.669 μs	1.658000 μs	1.721 μs
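
Since the report could not calculate relative performance for this group, the comparison can be done by hand. The following is a minimal sketch, assuming the mean times are compared as a plain percentage change (lower is better for these timings); the helper name `relative_change` and the shortened row labels are illustrative and not part of the benchmark scripts.

```python
# Mean times in microseconds, copied from the api group table:
# (This PR, baseline, baseline_traced)
API_RESULTS = {
    "sycl SubmitKernel out of order": (24.291, 23.242, 26.110),
    "sycl SubmitKernel in order": (23.946, 22.947, 28.589),
    "ur SubmitKernel out of order": (14.893, 13.977, 17.805),
    "ur SubmitKernel in order": (13.843, 13.669, 14.498),
}


def relative_change(value: float, reference: float) -> float:
    """Return the percentage change of value relative to reference."""
    return (value - reference) / reference * 100.0


for name, (this_pr, baseline, traced) in API_RESULTS.items():
    print(
        f"{name}: "
        f"{relative_change(this_pr, baseline):+.1f}% vs baseline, "
        f"{relative_change(this_pr, traced):+.1f}% vs baseline_traced"
    )
```

A positive number means this PR's mean time is higher than the reference column; a negative number means it is lower.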

Relative perf in group memory (4): cannot calculate

Benchmark	This PR	baseline	baseline_traced
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	230.420 μs	225.294000 μs	265.568 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	119.622 μs	118.564000 μs	134.536 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	5.831 μs	5.784000 μs	6.032 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	3.116 GB/s	3.175 GB/s	2.919000 GB/s

Relative perf in group miscellaneous (1): cannot calculate

Benchmark	This PR	baseline	baseline_traced
miscellaneous_benchmark_sycl VectorSum	859.376 GB/s	857.858000 GB/s	858.421 GB/s

Relative perf in group Velocity-Bench (6): cannot calculate

Benchmark	This PR	baseline	baseline_traced
Velocity-Bench Hashtable	384.367 M keys/sec	388.603440 M keys/sec	386.834 M keys/sec
Velocity-Bench Bitcracker	35.186 s	35.199 s	35.165200 s
Velocity-Bench CudaSift	203.033000 ms	203.035 ms	204.129 ms
Velocity-Bench Easywave	230.000000 ms	230.000 ms	230.000 ms
Velocity-Bench QuickSilver	118.590 MMS/CTT	118.490 MMS/CTT	118.930000 MMS/CTT
Velocity-Bench Sobel Filter	525.762 ms	522.235000 ms	525.882 ms

Relative perf in group Runtime (8): cannot calculate

Benchmark	This PR	baseline	baseline_traced
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	289.587 ms	278.923000 ms	294.286 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	288.659 ms	282.848000 ms	296.029 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	282.375 ms	275.142000 ms	288.554 ms
Runtime_IndependentDAGTaskThroughput_SingleTask	282.547 ms	260.384000 ms	279.751 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor	1845.266 ms	1679.876000 ms	1929.469 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor	1871.862 ms	1700.060000 ms	1964.097 ms
Runtime_DAGTaskThroughput_SingleTask	1826.101 ms	1655.006000 ms	1916.744 ms
Runtime_DAGTaskThroughput_BasicParallelFor	1909.025 ms	1722.549000 ms	2021.880 ms

Relative perf in group MicroBench (14): cannot calculate

Benchmark	This PR	baseline	baseline_traced
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	4.338000 ms	4.432 ms	4.465 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	4.469000 ms	4.501 ms	4.608 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	4.346000 ms	4.373 ms	4.474 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	4.379 ms	4.241000 ms	4.475 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	617.767000 ms	617.777 ms	617.862 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	617.040000 ms	617.079 ms	617.208 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	617.029000 ms	617.101 ms	617.199 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	617.770000 ms	617.807 ms	617.829 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	4.238 ms	4.220000 ms	4.282 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	4.354 ms	4.301000 ms	4.517 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	4.213000 ms	4.247 ms	4.291 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	4.480000 ms	4.495 ms	4.600 ms
MicroBench_LocalMem_fp32_4096	30.452 ms	30.420000 ms	30.447 ms
MicroBench_LocalMem_int32_4096	30.430 ms	30.393000 ms	30.433 ms

Relative perf in group Pattern (10): cannot calculate

Benchmark	This PR	baseline	baseline_traced
Pattern_Reduction_Hierarchical_int32	16.620 ms	16.605 ms	15.222000 ms
Pattern_Reduction_NDRange_int32	16.548 ms	16.382 ms	15.836000 ms
Pattern_SegmentedReduction_NDRange_int64	6.182 ms	6.179000 ms	6.188 ms
Pattern_SegmentedReduction_Hierarchical_fp32	11.594 ms	11.592000 ms	11.595 ms
Pattern_SegmentedReduction_NDRange_int16	6.074 ms	6.065000 ms	6.075 ms
Pattern_SegmentedReduction_Hierarchical_int64	11.778 ms	11.786 ms	11.776000 ms
Pattern_SegmentedReduction_NDRange_int32	5.707000 ms	5.714 ms	5.721 ms
Pattern_SegmentedReduction_Hierarchical_int16	11.807 ms	11.802000 ms	11.811 ms
Pattern_SegmentedReduction_Hierarchical_int32	11.599 ms	11.598000 ms	11.603 ms
Pattern_SegmentedReduction_NDRange_fp32	5.712000 ms	5.714 ms	5.724 ms

Relative perf in group ScalarProduct (6): cannot calculate

Benchmark	This PR	baseline	baseline_traced
ScalarProduct_Hierarchical_fp32	9.938 ms	9.933000 ms	9.962 ms
ScalarProduct_Hierarchical_int64	11.320 ms	11.301000 ms	11.313 ms
ScalarProduct_NDRange_int32	6.302000 ms	6.314 ms	6.304 ms
ScalarProduct_NDRange_fp32	6.300 ms	6.286000 ms	6.297 ms
ScalarProduct_NDRange_int64	8.212 ms	8.189000 ms	8.197 ms
ScalarProduct_Hierarchical_int32	10.290 ms	10.285000 ms	10.289 ms

Relative perf in group USM (7): cannot calculate

Benchmark	This PR	baseline	baseline_traced
USM_Allocation_latency_fp32_shared	0.065 ms	0.061000 ms	0.062 ms
USM_Allocation_latency_fp32_host	37.523000 ms	37.544 ms	38.343 ms
USM_Allocation_latency_fp32_device	0.067 ms	0.065000 ms	0.066 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	1.040 ms	1.023000 ms	1.070 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	1.202 ms	1.188000 ms	1.231 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	1.694 ms	1.626000 ms	1.723 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	1.848 ms	1.792000 ms	1.883 ms

Relative perf in group VectorAddition (3): cannot calculate

Benchmark	This PR	baseline	baseline_traced
VectorAddition_int64	3.063 ms	3.066 ms	3.059000 ms
VectorAddition_fp32	1.446000 ms	1.467 ms	1.448 ms
VectorAddition_int32	1.477 ms	1.468000 ms	1.477 ms

Relative perf in group Polybench (3): cannot calculate

Benchmark	This PR	baseline	baseline_traced
Polybench_2mm	1.224 ms	1.219000 ms	1.221 ms
Polybench_3mm	1.736 ms	1.731 ms	1.730000 ms
Polybench_Atax	6.883 ms	6.688000 ms	6.854 ms

Relative perf in group Kmeans (1): cannot calculate

Benchmark	This PR	baseline	baseline_traced
Kmeans_fp32	16.162 ms	16.165 ms	16.158000 ms

Relative perf in group LinearRegressionCoeff (1): cannot calculate

Benchmark	This PR	baseline	baseline_traced
LinearRegressionCoeff_fp32	969.636 ms	969.435000 ms	969.581 ms

Relative perf in group MolecularDynamics (1): cannot calculate

Benchmark	This PR	baseline	baseline_traced
MolecularDynamics	0.030 ms	0.026000 ms	0.032 ms

Details

Benchmark details - environment, command, output...

api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),24.291,24.261,3.03%,23.534,228.056,[CPU],[us]
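
The compute-benchmarks result lines in this report share one CSV layout, so they can be read back programmatically. The following is a minimal sketch using only Python's standard `csv` module. Note that the data row carries one more field than the header (the unit); the key name `Unit` and the helper `parse_result` are assumptions made for this example, not names used by the benchmark tooling.

```python
import csv
import io

HEADER = "TestCase,Mean,Median,StdDev,Min,Max,Type"
ROW = (
    "SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 "
    "KernelExecTime=1 MeasureCompletion=0),"
    "24.291,24.261,3.03%,23.534,228.056,[CPU],[us]"
)


def parse_result(header: str, row: str) -> dict:
    """Turn one header/row pair from the benchmark output into a dict."""
    columns = next(csv.reader(io.StringIO(header)))
    values = next(csv.reader(io.StringIO(row)))
    record = dict(zip(columns, values))
    # The row has a trailing unit field that the header does not name.
    if len(values) > len(columns):
        record["Unit"] = values[len(columns)]
    for key in ("Mean", "Median", "Min", "Max"):
        record[key] = float(record[key])
    # StdDev is reported as a percentage string such as "3.03%".
    record["StdDev"] = float(record["StdDev"].rstrip("%"))
    return record


result = parse_result(HEADER, ROW)
print(result["TestCase"], result["Mean"], result["Unit"])
```

Reading the unit from the row itself, rather than assuming microseconds, matters because some tests in this report give bandwidth in GB/s instead of a time.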

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),23.946,23.929,1.47%,23.011,89.248,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.893,14.779,3.75%,14.236,72.544,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),13.843,13.827,4.68%,13.184,197.688,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),230.420,230.322,1.11%,226.828,456.071,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),119.622,119.489,2.06%,114.951,322.503,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.831,5.494,12.49%,5.043,52.252,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.116,3.124,2.72%,0.556,3.342,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.251,2.216,6.68%,2.002,13.078,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.669,1.664,3.52%,1.571,6.832,[CPU],[us]

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),859.376,860.076,0.41%,825.380,870.188,[GPU],bw [GB/s]

Velocity-Bench Hashtable

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.349192 s
384.366865 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00405375 s
bitcracker - total time for whole calculation: 35.1859 s

Velocity-Bench CudaSift

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1210 1260 32.8537% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1203 1255 32.6636% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1148 1266 31.1702% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1103 1264 29.9484% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1243 1276 33.7497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1263 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1276 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1269 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1256 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1274 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1201 1250 32.6093% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1274 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1216 1250 33.0166% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1271 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1260 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1266 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1270 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1260 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1131 1254 30.7087% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1213 1267 32.9351% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1267 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1268 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1131 1271 30.7087% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1093 1259 29.6769% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1139 1275 30.9259% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1239 1278 33.6411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1193 1267 32.3921% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1258 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1266 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1275 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1257 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1128 1265 30.6272% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1249 1287 33.9126% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1262 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1254 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1263 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1260 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1099 1264 29.8398% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1106 1261 30.0299% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1256 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1101 1276 29.8941% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1272 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1240 1271 33.6682% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1085 1266 29.4597% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1258 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1262 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1267 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1096 1263 29.7583% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 203.033 ms

Velocity-Bench Easywave

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.30049+10)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING
QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 3.768760e-01 6.072850e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.385360e-01 7.455460e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.361730e-01 7.600260e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.730200e-01 8.255770e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.364300e-01 7.888260e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.340040e-01 7.645140e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.315010e-01 7.622300e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.313560e-01 7.848440e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.321480e-01 8.000620e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.330530e-01 7.574550e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.102e+07 1.102e+07 1.102e+07 0.000e+00 100.00
cycleInit 10 3.423e+06 3.423e+06 3.423e+06 0.000e+00 100.00
cycleTracking 10 7.596e+06 7.596e+06 7.596e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.924e+06 4.924e+06 4.924e+06 0.000e+00 100.00
cycleTracking_MPI 117 1.932e+05 1.932e+05 1.932e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.090e+02 4.090e+02 4.090e+02 0.000e+00 100.00
Figure Of Merit 118.59 [Num Mega Segments / Cycle Tracking Time]

Velocity-Bench Sobel Filter

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING
OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.47468 s
sobelfilter - total time for whole calculation: 0.525762 s

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.290868', '0.289587', '0.288743', '0.288743 0.289488 0.289587 0.289857 0.296664', '0.003266', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.290830', '0.288659', '0.282681', '0.282681 0.283558 0.288659 0.292359 0.306891', '0.009802', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.287025', '0.282375', '0.281118', '0.281118 0.281247 0.282375 0.294182 0.296203', '0.007506', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.281219', '0.282547', '0.272972', '0.272972 0.273318 0.282547 0.287315 0.289945', '0.007834', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.845401', '1.845266', '1.840535', '1.840535 1.842311 1.845266 1.846372 1.852523', '0.004607', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.871766', '1.871862', '1.870502', '1.870502 1.871151 1.871862 1.871947 1.873369', '0.001071', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.824745', '1.826101', '1.819092', '1.819092 1.825260 1.826101 1.826109 1.827164', '0.003232', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.909756', '1.909025', '1.905838', '1.905838 1.908509 1.909025 1.911945 1.913464', '0.002999', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004347', '0.004338', '0.004314', '0.004314 0.004327 0.004338 0.004364 0.004392', '0.000031', '28.976416', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
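
The bandwidth rows carry two extra numeric fields: one after the standard deviation and one at the very end of the row. Judging only from the numbers in this log (an inference, not a documented definition), the first appears to equal the last field divided by the minimum run time. The log does not state units, so none are assumed here. `implied_throughput` is a hypothetical helper for checking that relation.

```python
import math


def implied_throughput(metric, min_time_s):
    """Hypothetical helper: last row field divided by the minimum run time."""
    return metric / min_time_s


# (min run time, reported throughput field, last field), copied from rows in this log.
rows = {
    "MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous": (0.004314, 28.976416, 0.125),
    "MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous": (0.617631, 0.202386, 0.125),
    "MicroBench_LocalMem_fp32_4096": (0.030394, 10265.258896, 312.0),
}

for name, (min_time_s, reported, metric) in rows.items():
    implied = implied_throughput(metric, min_time_s)
    # The printed minimum is rounded to six decimals, so allow a small tolerance.
    matches = math.isclose(implied, reported, rel_tol=1e-3)
    print(f"{name}: implied={implied:.6f} reported={reported:.6f} match={matches}")
```

If the relation holds, the roughly 140x gap between the H2D and D2H rows at size 512 comes entirely from the run times, since the last field is the same 0.125 in both.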

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004458', '0.004469', '0.004409', '0.004409 0.004439 0.004469 0.004482 0.004492', '0.000034', '28.351602', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004332', '0.004346', '0.004231', '0.004231 0.004314 0.004346 0.004365 0.004405', '0.000065', '29.543578', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004368', '0.004379', '0.004341', '0.004341 0.004341 0.004379 0.004387 0.004391', '0.000025', '28.795626', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617738', '0.617767', '0.617631', '0.617631 0.617720 0.617767 0.617771 0.617799', '0.000066', '0.202386', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617042', '0.617040', '0.616990', '0.616990 0.617034 0.617040 0.617047 0.617101', '0.000040', '0.202597', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617005', '0.617029', '0.616926', '0.616926 0.617000 0.617029 0.617033 0.617039', '0.000047', '0.202617', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617775', '0.617770', '0.617731', '0.617731 0.617768 0.617770 0.617777 0.617832', '0.000037', '0.202354', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004666', '0.004238', '0.004142', '0.004142 0.004224 0.004238 0.004246 0.006478', '0.001014', '30.177171', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004359', '0.004354', '0.004296', '0.004296 0.004333 0.004354 0.004370 0.004440', '0.000053', '29.097308', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004218', '0.004213', '0.004190', '0.004190 0.004196 0.004213 0.004223 0.004266', '0.000030', '29.833341', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004439', '0.004480', '0.004324', '0.004324 0.004415 0.004480 0.004489 0.004489', '0.000072', '28.909749', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030445', '0.030452', '0.030394', '0.030394 0.030442 0.030452 0.030467 0.030471', '0.000031', '10265.258896', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

MicroBench_LocalMem_int32_4096

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030434', '0.030430', '0.030413', '0.030413 0.030419 0.030430 0.030453 0.030456', '0.000020', '10258.908547', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016583', '0.016620', '0.016377', '0.016377 0.016550 0.016620 0.016633 0.016737', '0.000133', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016447', '0.016548', '0.015969', '0.015969 0.016298 0.016548 0.016686 0.016735', '0.000317', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.009931', '0.009938', '0.009886', '0.009886 0.009926 0.009938 0.009949 0.009954', '0.000027', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011297', '0.011320', '0.011211', '0.011211 0.011277 0.011320 0.011336 0.011343', '0.000055', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006311', '0.006302', '0.006299', '0.006299 0.006301 0.006302 0.006307 0.006348', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006302', '0.006300', '0.006285', '0.006285 0.006292 0.006300 0.006305 0.006326', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.008208', '0.008212', '0.008188', '0.008188 0.008195 0.008212 0.008218 0.008228', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010293', '0.010290', '0.010286', '0.010286 0.010287 0.010290 0.010297 0.010307', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006181', '0.006182', '0.006175', '0.006175 0.006177 0.006182 0.006185 0.006187', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011599', '0.011594', '0.011591', '0.011591 0.011593 0.011594 0.011596 0.011621', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006074', '0.006074', '0.006061', '0.006061 0.006066 0.006074 0.006082 0.006088', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011779', '0.011778', '0.011760', '0.011760 0.011764 0.011778 0.011780 0.011812', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005712', '0.005707', '0.005706', '0.005706 0.005706 0.005707 0.005719 0.005723', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011810', '0.011807', '0.011801', '0.011801 0.011804 0.011807 0.011815 0.011824', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011598', '0.011599', '0.011573', '0.011573 0.011597 0.011599 0.011600 0.011621', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005714', '0.005712', '0.005707', '0.005707 0.005711 0.005712 0.005714 0.005725', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000067', '0.000065', '0.000059', '0.000059 0.000065 0.000065 0.000065 0.000082', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037514', '0.037523', '0.037376', '0.037376 0.037437 0.037523 0.037615 0.037621', '0.000108', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000064', '0.000067', '0.000050', '0.000050 0.000067 0.000067 0.000068 0.000069', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001053', '0.001040', '0.001038', '0.001038 0.001039 0.001040 0.001043 0.001105', '0.000029', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001201', '0.001202', '0.001195', '0.001195 0.001199 0.001202 0.001202 0.001206', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.002064', '0.001694', '0.001669', '0.001669 0.001674 0.001694 0.001710 0.003572', '0.000843', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001859', '0.001848', '0.001844', '0.001844 0.001846 0.001848 0.001858 0.001901', '0.000024', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int64

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003064', '0.003063', '0.003039', '0.003039 0.003057 0.003063 0.003073 0.003087', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001449', '0.001446', '0.001441', '0.001441 0.001442 0.001446 0.001452 0.001465', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001476', '0.001477', '0.001439', '0.001439 0.001469 0.001477 0.001483 0.001510', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001219', '0.001224', '0.001209', '0.001209 0.001211 0.001224 0.001224 0.001230', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001732', '0.001736', '0.001715', '0.001715 0.001734 0.001736 0.001736 0.001740', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Atax

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006818', '0.006883', '0.006700', '0.006700 0.006704 0.006883 0.006900 0.006904', '0.000106', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016165', '0.016162', '0.016151', '0.016151 0.016153 0.016162 0.016179 0.016180', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.971036', '0.969636', '0.969362', '0.969362 0.969591 0.969636 0.969869 0.976724', '0.003184', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

UR_ENABLE_LAYERS=UR_LAYER_TRACING

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000035', '0.000030', '0.000026', '0.000026 0.000027 0.000030 0.000030 0.000061', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

pbalcer · 2024-10-30T14:02:15Z

@oneapi-src/unified-runtime-maintain please review

Pulls in a few L0 and sanitizer changes, and an optimization for the tracing layer. oneapi-src/unified-runtime#2230

don't call xpti if there are no subscribers

9d1d3ac

This avoids the overhead of preparing data for xpti, and the cost of the xpti call itself, if nothing is subscribed to the ur.call xpti call stream.
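
A minimal sketch of the idea in the commit message, assuming a simplified stand-in for the tracing layer. The actual change is not shown in this thread, so every name here (`TraceStream`, `build_payload`, `traced_call`) is hypothetical and none of it is the xpti API. It only illustrates checking for subscribers before any trace data is prepared:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a trace stream such as "ur.call" (not the xpti API).
struct TraceStream {
    std::vector<std::function<void(const std::string &)>> subscribers;

    bool has_subscribers() const { return !subscribers.empty(); }

    void notify(const std::string &payload) const {
        for (const auto &callback : subscribers) {
            callback(payload);
        }
    }
};

// Counts how many times trace data was prepared, to make the saving visible.
static int g_payloads_built = 0;

// Stands in for the per-call work of preparing data for the tracer.
std::string build_payload(const char *fn, std::uint64_t id) {
    assert(fn != nullptr);
    ++g_payloads_built;
    return std::string(fn) + "#" + std::to_string(id);
}

// Wraps an API call. With no subscribers, it skips both the payload
// preparation and the notification, and just forwards the call.
int traced_call(const TraceStream &stream, const char *fn, std::uint64_t id,
                const std::function<int()> &impl) {
    if (!stream.has_subscribers()) {
        return impl(); // fast path: nothing is listening
    }
    stream.notify("begin " + build_payload(fn, id));
    const int result = impl();
    stream.notify("end " + build_payload(fn, id));
    return result;
}
```

With an empty `subscribers` list, `traced_call` forwards straight to `impl` and `g_payloads_built` stays at zero. Once a callback is registered, each call prepares and delivers a begin and an end payload.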

pbalcer requested a review from a team as a code owner October 22, 2024 12:23

github-actions bot added the loader Loader related feature/bug label Oct 22, 2024

aarongreig approved these changes Oct 30, 2024

pbalcer added the v0.10.x Include in the v0.10.x release label Oct 31, 2024

pbalcer merged commit 6ade245 into oneapi-src:main Oct 31, 2024
74 of 77 checks passed

pbalcer deleted the check-xpti-enabled branch October 31, 2024 08:17

pbalcer mentioned this pull request Oct 31, 2024

Candidate for the v0.10.11 release tag #2271

Merged

kbenzie mentioned this pull request Oct 31, 2024

[UR] update to latest intel/llvm#15940

Merged

sommerlukas pushed a commit to intel/llvm that referenced this pull request Oct 31, 2024

[UR] update to latest (#15940)

53ca518

Pulls in a few L0 and sanitizer changes, and an optimization for the tracing layer. oneapi-src/unified-runtime#2230

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!
