Add triton to kernel bench #18

Open
wants to merge 8 commits into
base: main

@PaliC PaliC commented Feb 4, 2025

Adds triton support to kernel bench (not including CoT or multiturn).

The simple bit is just adding a triton prompt and support for switching from cuda to triton.

The riskier bit is evaluation. Triton kernels usually rely on decorators like @triton.jit, which do not work under exec, so instead of extracting the model from the generated code with exec, we now use a hacky solution: write the code to a temp file and import directly from that file. Unfortunately, that temp file has to be deleted manually, but afaict there isn't really another way to cleanly run the decorators without modifying the generated code (other than using the on-disk source file for the generated code, which we could do).
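
For reference, here is a minimal sketch of the temp-file import approach described above. It is not the code in this PR: the helper name load_module_from_source, the module name generated_kernel, and the add_one stand-in are all hypothetical, and plain Python stands in for a Triton kernel so the snippet runs without a GPU. The likely reason a real file helps is that the function's source becomes recoverable through inspect.getsource, which generally cannot find source for code run through exec on a string.

```python
import importlib.util
import inspect
import os
import tempfile


def load_module_from_source(source: str, module_name: str = "generated_kernel"):
    """Write `source` to a temp .py file and import it as a module.

    Hypothetical helper. Returns (module, path); the caller must delete `path`.
    """
    # delete=False keeps the file on disk after the `with` block so it can be imported.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code, decorators included
    return module, path


# Stand-in for generated code; a real sample would define a model class
# and kernels decorated with @triton.jit.
GENERATED = '''
def add_one(x):
    return x + 1
'''

module, path = load_module_from_source(GENERATED)
try:
    result = module.add_one(41)
    # The source can be read back because the function is backed by a real file.
    recovered = inspect.getsource(module.add_one)
finally:
    os.remove(path)  # manual cleanup of the temp file
```

Wrapping the use of the module in try/finally keeps the manual deletion from being skipped when evaluation raises.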

Test Plan:

I ran the following commands and things seemed to work as expected:

python scripts/generate_samples.py run_name="test_hf_level_1" dataset_src="huggingface" level="1" num_workers=50 server_type="deepseek" model_name="deepseek-coder" temperature=0 framework="cuda"

python scripts/eval_from_generations.py level=1 run_name="test_hf_level_1" dataset_src="local" level="1" num_gpu_devices=8 timeout=300

python scripts/generate_samples.py run_name="test_hf_level_1_triton" dataset_src="huggingface" level="1" num_workers=50 server_type="deepseek" model_name="deepseek-coder" temperature=0 framework="triton"

python scripts/eval_from_generations.py level=1 run_name="test_hf_level_1_triton" dataset_src="local" level="1" num_gpu_devices=8 timeout=300

python scripts/generate_and_eval_single_sample.py dataset_src="huggingface" level=2 problem_id=40

python scripts/generate_and_eval_single_sample.py dataset_src="huggingface" level=2 problem_id=40 framework="triton"

@PaliC PaliC marked this pull request as ready for review February 4, 2025 21:33

Zacharias030 commented Feb 5, 2025

Would it make sense to add the artefacts (i.e., the results) produced by at least one of the LLMs when run via this PR to the PR description for reference?

For anything that we do with this new "KernelBench-Triton" variant, it might prove helpful to have some expected numbers to compare against in order to check correctness of this and subsequent implementations.

Appending a handful of the generated prompt->response pairs, covering both success and failure cases, may also help us convince ourselves that everything makes sense, as suggested here. For example, if any particular model obtains a 0% score, I think we should quickly rule out that a trivial issue is causing it.
