[Issue]: nvidia-smi not found #515

Open
joerowell opened this issue Feb 20, 2024 · 6 comments

Comments

@joerowell

Problem Description

The estimate_matmul functionality in Triton relies rather heavily on the underlying stats of the GPU. On CUDA platforms, this functionality is realised by calling nvidia-smi and then parsing the results. I see that this code is still present in this fork of Triton:

def nvsmi(attrs):

Would it be possible to add support for rocm-smi here instead? That would make autotuning Triton kernels for GEMM and similar workloads much easier.
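
For concreteness, here is a minimal sketch of what a rocm-smi based analogue of the nvsmi helper could look like, shelling out to the CLI and parsing JSON the same way the CUDA path parses nvidia-smi output. The `--showclocks`/`--json` flags and the shape of the JSON (a "card0"-keyed object) are assumptions to check against `rocm-smi --help` on the target ROCm release, and `rocmsmi_clocks` is a hypothetical name, not existing Triton code:

```python
import json
import subprocess


def rocmsmi_clocks(device: int = 0) -> dict:
    """Hypothetical rocm-smi based stand-in for Triton's nvsmi() helper.

    Assumes `rocm-smi --showclocks --json` prints a JSON object keyed by
    card (e.g. "card0"); field names differ across ROCm releases, so the
    filtering below is illustrative rather than definitive.
    """
    out = subprocess.check_output(["rocm-smi", "--showclocks", "--json"], text=True)
    data = json.loads(out)
    card = data.get(f"card{device}", {})
    # Keep anything that looks like a clock entry (sclk/mclk/fclk ...).
    return {k: v for k, v in card.items() if "clk" in k.lower()}


if __name__ == "__main__":
    print(rocmsmi_clocks(0))
```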

Operating System

CPU

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.0.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@zhanglx13

@joerowell We can add it later, after we merge this fork with upstream.
For GEMM tuning, we have a dedicated script to tune GEMM kernels. You can refer to this README for more info, and let me know if you have more questions.

@zhanglx13

@jataylo @micmelesse This seems to be related to the nvsmi-related test failure. What is the status of that test?

@MARD1NO

MARD1NO commented Dec 25, 2024

> @jataylo @micmelesse This seems to be related to the nvsmi-related test failure. What is the status of that test?

Hi, is there any update on a rocm-smi version of the estimate_matmul function? I am also encountering this problem.

@jataylo

jataylo commented Dec 25, 2024

> @jataylo @micmelesse This seems to be related to the nvsmi-related test failure. What is the status of that test?

We got around this in Inductor by hard-coding flops for specific arches when we required this; @zhanglx13 @micmelesse may have to consider writing amdsmi equivalents in Triton.
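
For illustration only, a rough sketch of what an amdsmi-based equivalent could look like using ROCm's amdsmi Python bindings. The exact function names and return fields (`amdsmi_get_clock_info`, `AmdSmiClkType.GFX`, `"max_clk"`) vary between ROCm releases and should be verified against the installed amdsmi package; this is a sketch under those assumptions, not the workaround used in Inductor:

```python
import amdsmi  # ships with ROCm; API details below are assumptions to verify


def max_clocks_mhz(device: int = 0):
    """Query max engine/memory clocks via the amdsmi Python bindings (assumed API)."""
    amdsmi.amdsmi_init()
    try:
        handle = amdsmi.amdsmi_get_processor_handles()[device]
        gfx = amdsmi.amdsmi_get_clock_info(handle, amdsmi.AmdSmiClkType.GFX)
        mem = amdsmi.amdsmi_get_clock_info(handle, amdsmi.AmdSmiClkType.MEM)
        # "max_clk" is the assumed key for the maximum frequency in MHz.
        return gfx["max_clk"], mem["max_clk"]
    finally:
        amdsmi.amdsmi_shut_down()
```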

@MARD1NO

MARD1NO commented Dec 25, 2024

> @jataylo @micmelesse This seems to be related to the nvsmi-related test failure. What is the status of that test?

> We got around this in Inductor by hard-coding flops for specific arches when we required this; @zhanglx13 @micmelesse may have to consider writing amdsmi equivalents in Triton.

Can you give me the relevant code paths?

@jataylo

jataylo commented Dec 31, 2024

@MARD1NO Not sure if this helps at all: https://github.com/pytorch/pytorch/blob/main/torch/_utils_internal.py#L208

This is how we had to get around the nvsmi call from Triton previously, by hard-coding max clock rates for our gfx arches. Not ideal.
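
In outline, that workaround boils down to a lookup table keyed on the gfx architecture name, something like the sketch below. `gcnArchName` is a real field of `torch.cuda.get_device_properties` on ROCm builds of PyTorch, but the table values here are placeholders to be filled from the hardware spec sheets, not the numbers PyTorch actually hard-codes:

```python
import torch

# Placeholder values only: fill in real max clock rates (MHz) from the
# spec sheets of the gfx architectures you care about.
_MAX_CLOCK_MHZ = {
    "gfx90a": 0,  # MI200-series value goes here
    "gfx942": 0,  # MI300-series value goes here
}


def max_clock_rate_mhz(device: int = 0) -> int:
    """Return a hard-coded max clock rate for the current gfx arch (sketch)."""
    props = torch.cuda.get_device_properties(device)
    # gcnArchName looks like "gfx942:sramecc+:xnack-"; keep only the base name.
    arch = props.gcnArchName.split(":")[0]
    if arch not in _MAX_CLOCK_MHZ:
        raise RuntimeError(f"No hard-coded clock rate for {arch}; extend the table.")
    return _MAX_CLOCK_MHZ[arch]
```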
