Add triton to kernel bench #18

PaliC · 2025-02-04T20:41:51Z

Adds triton support to kernel bench (not including CoT or multiturn).

The simple bit is just adding a triton prompt and support for switching from cuda to triton.

The riskier bit is evaluation. Triton kernels usually use decorators like @triton.jit, which are not supported under exec. So instead of loading the model from the generated code with exec, we now use a hacky solution: write the generated code to a temp file and import directly from that file. Unfortunately, the temp file has to be deleted manually, but afaict there isn't another clean way to run the decorators without modifying the generated code (short of using the on-disk source file for the generated code, which we could do).
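For reference, here is a minimal sketch of the temp-file import approach described above. It is not the code in src/eval.py: the helper name `load_module_from_source`, the module name `generated_kernel`, and the toy `add` function are all hypothetical stand-ins. The sketch uses `inspect.getsource` to show that functions imported this way have their source available on disk, which is presumably what source-inspecting decorators need.

```python
import importlib.util
import inspect
import os
import tempfile


def load_module_from_source(source: str, module_name: str = "generated_kernel"):
    """Write `source` to a temp .py file and import it as a module.

    Hypothetical helper for illustration. Returns (module, path); the
    caller is responsible for deleting `path` once it is done.
    """
    tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False)
    try:
        tmp.write(source)
    finally:
        tmp.close()
    try:
        spec = importlib.util.spec_from_file_location(module_name, tmp.name)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the file, decorators included
    except Exception:
        os.remove(tmp.name)  # do not leak the file if the import fails
        raise
    return module, tmp.name


# Toy stand-in for generated code (a real sample would define ModelNew).
generated = "def add(a, b):\n    return a + b\n"

module, path = load_module_from_source(generated)
try:
    result = module.add(2, 3)
    # The source is readable because the function lives in a real file.
    source_text = inspect.getsource(module.add)
finally:
    os.remove(path)  # manual cleanup, as noted in the description
```

Doing the cleanup in a `finally` block keeps the manual deletion from being skipped when evaluation raises.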

Test Plan:

I ran the following commands and things seemed to work as expected:

python scripts/generate_samples.py run_name="test_hf_level_1" dataset_src="huggingface" level="1" num_workers=50 server_type="deepseek" model_name="deepseek-coder" temperature=0 framework="cuda"

python scripts/eval_from_generations.py level=1 run_name="test_hf_level_1" dataset_src="local" level="1" num_gpu_devices=8 timeout=300

python scripts/generate_samples.py run_name="test_hf_level_1_triton" dataset_src="huggingface" level="1" num_workers=50 server_type="deepseek" model_name="deepseek-coder" temperature=0 framework="triton"

python scripts/eval_from_generations.py level=1 run_name="test_hf_level_1_triton" dataset_src="local" level="1" num_gpu_devices=8 timeout=300

python scripts/generate_and_eval_single_sample.py dataset_src="huggingface" level=2 problem_id=40

python scripts/generate_and_eval_single_sample.py dataset_src="huggingface" level=2 problem_id=40 framework="triton"

scripts/inspect_kernel_pytorch_profiler.py

scripts/generate_and_eval_single_sample.py

src/prompts/model_new_ex_add_triton.py

src/prompt_constructor.py

src/eval.py

README.md

Zacharias030 · 2025-02-05T22:20:00Z

Would it make sense to add the artefacts (i.e., results) that this PR produces for at least one of the LLMs to the PR description for reference?

For anything that we do with this new "KernelBench-Triton" variant, it might prove helpful to have some expected numbers to compare against in order to check correctness of this and subsequent implementations.

Appending a handful of the generated prompt->response pairs, covering both success and failure cases, may also help us convince ourselves that everything makes sense, as suggested here. For example, if any particular model scores 0%, I think we should quickly rule out a trivial issue as the cause.

PaliC and others added 3 commits February 4, 2025 12:40

Add triton to kernel bench

75d28eb

add triton

ec273d7

add triton

4b38254

PaliC marked this pull request as ready for review February 4, 2025 21:33