The output of profile bandwidth is as follows:
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s
size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s
Which of these is ctog_bdw, which is gtoc_bdw_cache, and which is gtoc_bdw_hidden?
The output of profile matmul is as follows:
device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186
device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026
device: cpu, N: 1024, latency: 0.89 ms, TFLOPS: 3.488
device: cpu, N: 2048, latency: 8.44 ms, TFLOPS: 2.924
Which of these are mm_flops_p, mm_flops_g, bmm_flops_p, bmm_flops_g, and cpu_flops?
Thanks
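While the mapping is unresolved, it may help to collect the bandwidth numbers per transfer direction. The following is only a minimal sketch: the regular expression and the helper name `peak_bandwidth` are my own assumptions based on the output format quoted above, not part of the project. It does not decide which value belongs to ctog_bdw, gtoc_bdw_cache, or gtoc_bdw_hidden; it only reports the highest measured bandwidth for each direction.

```python
import re

# Bandwidth lines copied verbatim from the profile output quoted above.
PROFILE_OUTPUT = """\
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s
size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s
"""

# Assumed line format: "size: <MB> MB, <direction> bandwidth: <GB/s> GB/s".
LINE_RE = re.compile(
    r"size:\s*([\d.]+)\s*MB,\s*(\S+)\s+bandwidth:\s*([\d.]+)\s*GB/s"
)


def peak_bandwidth(text):
    """Return {direction: (size_mb, gb_per_s)} for the fastest transfer seen."""
    best = {}
    for match in LINE_RE.finditer(text):
        size_mb = float(match.group(1))
        direction = match.group(2)
        gbps = float(match.group(3))
        # Keep the entry with the highest bandwidth for each direction.
        if direction not in best or gbps > best[direction][1]:
            best[direction] = (size_mb, gbps)
    return best


if __name__ == "__main__":
    for direction, (size_mb, gbps) in peak_bandwidth(PROFILE_OUTPUT).items():
        print(f"{direction}: {gbps:.3f} GB/s at {size_mb:.2f} MB")
```

Taking the peak is a design choice of this sketch; whether the project's parameters expect peak values, large-transfer values, or something else is exactly the open question here.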
Have you figured this out? I have the same question.