
resnet50 fp16 No tensorcore was used #19630

Open
hi20240217 opened this issue Jan 8, 2025 · 2 comments
Labels
bug 🐞 (Something isn't working), codegen/nvvm (NVVM code generation compiler backend), support (Request support or ask a question)

Comments

@hi20240217

What happened?

Running ResNet-50 in fp16, no tensor cores are used. Nsight Compute reports zero tensor-pipe activity for the matmul kernel below:

main_graph_async_dispatch_78_matmul_1x1000x2048_f16xf16xf32 (1000, 1, 1)x(128, 1, 1), Context 1, Stream 13, Device 0, CC 8.6
Warning: Data collection happened without fixed GPU frequencies. Profiling results may be inconsistent.
Section: Command line profiler metrics
------------------------------------------------------------------------------------ ------------- ------------
Metric Name Metric Unit Metric Value
------------------------------------------------------------------------------------ ------------- ------------
gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed % 73.87
gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed % 13.92
sm__cycles_elapsed.avg cycle 61517.49
sm__cycles_elapsed.max cycle 61774
sm__cycles_elapsed.min cycle 61365
sm__cycles_elapsed.sum cycle 5044434
sm__cycles_elapsed.avg.per_second cycle/nsecond 1.95
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.max.pct_of_peak_sustained_elapsed (!) n/a
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.max.per_second (!) n/a
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.min.pct_of_peak_sustained_elapsed (!) n/a
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.sum (!) n/a
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.sum.pct_of_peak_sustained_elapsed (!) n/a
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.sum.peak_sustained (!) n/a
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.sum.per_second (!) n/a
sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active % 0
sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_elapsed % 0
------------------------------------------------------------------------------------ ------------- ------------
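For readers skimming the table: the two `sm__pipe_tensor_cycles_active` rows at 0% are the direct evidence that the tensor pipe never ran during this kernel. A minimal, hypothetical helper to check this programmatically from ncu's text output (the metric name is taken from the report above; the column layout the parser assumes is just the one shown in this table):

```python
# Hypothetical helper: scan Nsight Compute text output for the
# tensor-pipe activity metric; 0% means tensor cores never ran.
NCU_OUTPUT = """\
sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active % 0
sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_elapsed % 0
"""

def tensor_pipe_active(ncu_text):
    """Return True if any tensor-pipe metric reports a non-zero percentage."""
    for line in ncu_text.splitlines():
        if "sm__pipe_tensor_cycles_active" in line and "%" in line:
            # Assumes the metric value is the last whitespace-separated field.
            value = float(line.rsplit(None, 1)[-1])
            if value > 0:
                return True
    return False

print(tensor_pipe_active(NCU_OUTPUT))  # → False: the tensor pipe was idle
```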

Steps to reproduce your issue

cat> iree_forward_resnet50.py<<-'EOF'  
import numpy as np
import iree.turbine.aot as aot
import torch
import torchvision.models as models
import iree.runtime as rt
import time

input_tensor = torch.ones((1,3,224,224),dtype=torch.half)
model = models.resnet50(weights=None).half()  # pretrained=False is deprecated in torchvision; weights=None is equivalent
model.eval()

export_output = aot.export(model, input_tensor)
export_output.save_mlir("resnet50.mlir")
compiled_binary = export_output.compile(save_to=None,target_backends="cuda")

config = rt.Config("cuda://GPU-b915ad16-a0ba-3cc2-faac-2b6397113fa0")
vmm = rt.load_vm_module(
    rt.VmModule.copy_buffer(config.vm_instance, compiled_binary.map_memory()),
    config)

# warm up
for i in range(3):
    y = vmm.main(input_tensor)

# benchmark
t0=time.time()
for i in range(1000):
    y = vmm.main(input_tensor)
t1=time.time()
print("{:.2f} FPS".format(1000/(t1-t0)))

EOF
python iree_forward_resnet50.py
ncu --clock-control=none --metrics \
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.max.pct_of_peak_sustained_elapsed,\
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.min.pct_of_peak_sustained_elapsed,\
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.sum.pct_of_peak_sustained_elapsed,\
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.sum,\
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.sum.peak_sustained,\
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.sum.per_second,\
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off.max.per_second,\
sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_elapsed,\
sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active,\
gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed,\
gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed,\
sm__cycles_elapsed.avg.per_second,\
sm__cycles_elapsed python iree_forward_resnet50.py
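As a side note on the timing methodology in the script above: `time.time()` over a single 1000-iteration loop yields one averaged number and is sensitive to clock adjustments and outliers. A stdlib-only sketch of a slightly more robust harness, using `time.perf_counter` and the median (the `benchmark` helper here is hypothetical, not part of IREE):

```python
import time
import statistics

def benchmark(fn, warmup=3, iters=100):
    """Time fn() per call and report median latency plus throughput.

    perf_counter is monotonic and higher-resolution than time.time(),
    and the median is less sensitive to outliers than an end-to-end
    average over the whole loop.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    median = statistics.median(samples)
    return median, 1.0 / median  # seconds per call, calls per second

# Trivial stand-in for vmm.main(input_tensor):
latency, fps = benchmark(lambda: sum(range(1000)))
print("median latency: {:.6f} s, {:.2f} FPS".format(latency, fps))
```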

What component(s) does this issue relate to?

No response

Version information

No response

Additional context

No response

@hi20240217 added the bug 🐞 label Jan 8, 2025
@ScottTodd added the codegen/nvvm and support labels Jan 8, 2025
@ScottTodd
Member

Just an idea: have you tried using a specific CUDA target?

https://iree.dev/guides/deployment-configurations/gpu-cuda/#compile-a-program

Canonically a CUDA target (iree-cuda-target) matching the LLVM NVPTX backend of the form sm_<arch_number> is needed to compile towards each GPU architecture. If no architecture is specified then we will default to sm_60.

That --iree-cuda-target option should be set prior to this call:

compiled_binary = export_output.compile(save_to=None,target_backends="cuda")

I think like this? (e.g. for A100, which is sm_80)

export_output.session.set_flags("--iree-cuda-target=sm_80")

@hi20240217
Author

It has no effect:

cat> iree_forward_resnet50.py<<-'EOF'  
import numpy as np
import iree.turbine.aot as aot
import torch
import torchvision.models as models
import iree.runtime as rt
import time

input_tensor = torch.ones((1,3,224,224),dtype=torch.half)
model = models.resnet50(weights=None).half()  # pretrained=False is deprecated in torchvision; weights=None is equivalent
model.eval()

export_output = aot.export(model, input_tensor)
export_output.save_mlir("resnet50.mlir")
export_output.session.set_flags("--iree-cuda-target=rtx3090")
compiled_binary = export_output.compile(save_to=None,target_backends="cuda")

config = rt.Config("cuda://GPU-b915ad16-a0ba-3cc2-faac-2b6397113fa0")
vmm = rt.load_vm_module(
    rt.VmModule.copy_buffer(config.vm_instance, compiled_binary.map_memory()),
    config)

# warm up
for i in range(3):
    y = vmm.main(input_tensor)

# benchmark
t0=time.time()
for i in range(1000):
    y = vmm.main(input_tensor)
t1=time.time()
print("{:.2f} FPS".format(1000/(t1-t0)))

EOF
python iree_forward_resnet50.py
