A fork of the InterCode benchmark used to evaluate natural language to Bash command translation.
HuggingFace Dataset
PyPI Package
- Install Docker Engine - Instructions
- Configure Docker for non-sudo users - Instructions
- Create a Python virtual environment
```bash
sudo apt install python3.12-venv
python3 -m venv icalfa-venv
source icalfa-venv/bin/activate
```
- Install InterCode-ALFA
```bash
pip install icalfa datasets tqdm
```
- [Optional] If you want to use a local LLM, install Ollama and pull a model
```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:70b
```
- [Optional] If you want to use the embedding comparison method, install mxbai-embed-large
```bash
ollama pull mxbai-embed-large
```
- Run the benchmark
```python
import os
from icalfa import submit_command
from datasets import load_dataset
from tqdm import tqdm

# Store OpenAI key as an environment variable
os.environ['ICALFA_OPENAI_API_KEY'] = '...'

# Load the dataset
dataset = load_dataset("westenfelder/NL2SH-ALFA", "test", split="train")

# Iterate through the dataset
score = 0
for index, row in tqdm(enumerate(dataset), total=len(dataset)):
    # Retrieve the natural language prompt
    prompt = row['nl']
    # Retrieve the ground truth commands
    ground_truth_command = row['bash']
    ground_truth_command2 = row['bash2']
    # Convert the natural language prompt to a Bash command here
    # Submit the Bash command for scoring: 0 = incorrect, 1 = correct
    score += submit_command(index=index, command="...")

# Print the benchmark result
print(score / len(dataset))
```
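The conversion step is left to you. As one possible sketch (the model name, prompt wording, and helper functions below are illustrative assumptions, not part of icalfa), a chat model can be queried over the OpenAI REST API with only the standard library:

```python
import json
import os
import urllib.request

def nl_to_bash(nl_task, model="gpt-4o-mini"):
    """Hypothetical helper: ask an OpenAI chat model for a single Bash command."""
    payload = {
        "model": model,  # assumed model name; substitute your own
        "messages": [
            {"role": "system",
             "content": "Translate the task to one Bash command. "
                        "Reply with the command only, no explanation."},
            {"role": "user", "content": nl_task},
        ],
    }
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['ICALFA_OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        content = json.load(resp)["choices"][0]["message"]["content"]
    return strip_fences(content)

def strip_fences(text):
    """Remove Markdown code fences the model may wrap around the command."""
    text = text.strip()
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("bash"):
            text = text[len("bash"):]
    return text.strip()
```

The result can then be passed as the `command` argument to `submit_command`.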
- `submit_command` parameters
```python
# By default, icalfa uses OpenAI's GPT-4 model and expects an API key
submit_command(index, command, eval_mode="openai", eval_param="gpt-4-0613")

# A local model can be used via Ollama and does not require an API key
submit_command(index, command, eval_mode="ollama", eval_param="llama3.1:70b")

# You can also test the original method used in Princeton's InterCode benchmark
submit_command(index, command, eval_mode="tfidf")

# An embedding-based comparison method is also available. It uses the
# mxbai-embed-large model via Ollama; eval_param sets the similarity threshold
submit_command(index, command, eval_mode="embed", eval_param=0.75)
```
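The internals of the embed mode are handled by icalfa, but conceptually a similarity-threshold check over two embedding vectors looks like the sketch below (function names and the exact scoring rule are assumptions for illustration, not icalfa's implementation):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def embed_match(vec_model, vec_truth, threshold=0.75):
    """Score 1 if the embeddings are at least `threshold` similar, else 0."""
    return int(cosine_similarity(vec_model, vec_truth) >= threshold)
```

Raising the threshold makes the evaluation stricter; lowering it accepts more loosely related commands.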
- Manage Docker containers
```bash
# Stop containers
docker stop $(docker ps -a --filter "name=intercode*" -q)
# Delete containers
docker rm $(docker ps -a --filter "name=intercode*" -q)
```
- Update the PyPI package
```bash
# pip install build twine
# Update the version in pyproject.toml and __init__.py
rm -rf dist
python3 -m build
python3 -m twine upload --repository pypi dist/*
pip install --upgrade icalfa
```
InterCode-ALFA is a fork of the InterCode benchmark developed by the Princeton NLP group.
InterCode Website
InterCode PyPI Package