portal-cornell/robotouille

A challenging benchmark for testing LLM agent planning capabilities!

Paper | Project Website | Request Feature

Table of Contents
  1. About The Project
  2. Leaderboard
  3. Getting Started
  4. Usage
  5. Contributing
  6. Built With
  7. Citation
  8. License
  9. Contact
  10. Acknowledgments

About The Project

Many robots working in many kitchens to cook many dishes

Robotouille is a challenging benchmark environment designed to test LLM agents on 30 complex long-horizon planning tasks spanning synchronous, asynchronous, and multi-agent scenarios. Each scenario comes with a curated dataset of 10 unique tasks, each with 10 procedurally generated instances, designed to evaluate reasoning over time delays, diverse long-horizon tasks, and coordination challenges.
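As a quick sanity check on the numbers above, the short sketch below enumerates the dataset layout as described. The scenario labels and the (scenario, task, instance) indexing are illustrative assumptions, not the identifiers used in the actual dataset files.

```python
from itertools import product

# Counts as stated in the description above.
SCENARIOS = ("synchronous", "asynchronous", "multi-agent")
TASKS_PER_SCENARIO = 10
INSTANCES_PER_TASK = 10

# Hypothetical (scenario, task index, instance index) triples; the real
# dataset may name its tasks and instances differently.
instances = list(
    product(SCENARIOS, range(TASKS_PER_SCENARIO), range(INSTANCES_PER_TASK))
)

num_tasks = len(SCENARIOS) * TASKS_PER_SCENARIO
print(num_tasks)       # 30 tasks in total
print(len(instances))  # 300 instances in total
```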

Check out the following papers where we've used Robotouille!

(back to top)

Leaderboard

| Strategy | Synchronous (%) | Asynchronous (%) |
|---|---|---|
| [ReAct] (gpt-4o) | 47.0 | 11.0 |
| [ReAct] (gpt-4o-mini) | 11.0 | 0.00 |
| [ReAct] (Qwen2-72B-Instruct) | 7.00 | 2.00 |
| [ReAct] (Qwen2-32B-Instruct) | 6.00 | 1.00 |
| [ReAct] (claude-3-haiku) | 2.00 | 0.00 |
| [ReAct] (Meta-Llama-3.1-70B-Instruct) | 2.00 | 0.00 |
| [ReAct] (Meta-Llama-3.1-8B-Instruct) | 1.00 | 0.00 |

(back to top)

Getting Started

It is super easy to get started by trying out an existing environment or creating your own environment!

Setup

  1. Create and activate your virtual environment using one of the following options
    # Option 1: Python venv module
    python -m venv robotouille
    source robotouille/bin/activate
    # Option 2: Conda (must have Anaconda installed)
    conda create --name robotouille python=3.9
    conda activate robotouille
    # Option 3: Pyenv (must have pyenv and pyenv-virtualenv installed)
    pyenv install 3.9
    pyenv virtualenv 3.9 robotouille
    pyenv activate robotouille
  2. Install Robotouille and its dependencies
    pip install -e .
    pip install -e agents/prompt_builder/gpt-cost-estimator
  3. Run Robotouille!
    python main.py
    Or import the simulator into your own code by adding
    from robotouille import run_robotouille
    
    run_robotouille("original", "human", max_steps=10)

(back to top)

Usage

Running an LLM Agent

Refer to the README.md under agents/ for details on how to run an LLM agent in Robotouille.

Use Existing Environments

To play an existing environment, you can choose from the JSON files under environments/env_generator/examples/. For example, to play the high_level_lettuce_burger environment, simply run

python main.py ++game.environment_name=high_level_lettuce_burger

You can interact with the environment using the keyboard and mouse, with the following controls:

  • Click to move the robot to stations and to pick up or place down objects. You can also stack and unstack objects by clicking.
  • Press 'e' to cut objects at cutting boards or cook patties at stoves.
  • Press 'space' to stay in place (e.g. while waiting for a patty to cook).
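To see which environments are available to play, a small helper like the one below can list the JSON files in the examples directory. This is a sketch that assumes it is run from the repository root and that each file's stem (the name without `.json`) is the value expected by `++game.environment_name`; `list_environment_names` is a hypothetical helper, not part of Robotouille.

```python
from pathlib import Path

# Directory named in this README; the path is relative to the repository root.
EXAMPLES_DIR = Path("environments/env_generator/examples")


def list_environment_names(examples_dir=EXAMPLES_DIR):
    """Return the sorted stems of all JSON files in examples_dir.

    Assumption: the stem of each JSON file is a valid value for
    ++game.environment_name.
    """
    return sorted(path.stem for path in Path(examples_dir).glob("*.json"))


if __name__ == "__main__":
    for name in list_environment_names():
        print(name)
```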

If you would like to procedurally generate an environment based on a JSON file, run the following command

python main.py ++game.environment_name=high_level_lettuce_burger ++game.seed=42

Refer to the README.md under environments/env_generator/ for details on procedural generation.
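To generate several variants of the same environment, one option is to build the command for each seed in a script. The sketch below only uses the two overrides shown above (`++game.environment_name` and `++game.seed`); `build_command` is a hypothetical helper, and the seed range is arbitrary.

```python
import shlex


def build_command(environment_name, seed=None):
    """Build the argument list for one run of main.py.

    Only the overrides shown in this README are used; the seed is
    omitted when it is None.
    """
    command = ["python", "main.py", f"++game.environment_name={environment_name}"]
    if seed is not None:
        command.append(f"++game.seed={seed}")
    return command


# Print one command per seed. To launch a run instead of printing it,
# pass the list to subprocess.run(...) from the repository root.
for seed in range(3):
    print(shlex.join(build_command("high_level_lettuce_burger", seed)))
```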

Create your own Environment!

To create your own environment, add a new example JSON file to environments/env_generator/examples/. Follow the README.md under environments/env_generator/ for details on how to customize the environment JSON. If you would like to modify the environment's transitions themselves, refer to robotouille.json under environments. We are always adding more objects and transitions to Robotouille to increase the diversity of tasks. Please contact [email protected] for more details if you are interested in contributing or learning more.
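One convenient starting point is to copy an existing example and edit the copy. The sketch below assumes the example files are named after their environments (for instance `high_level_lettuce_burger.json`); `clone_example` is a hypothetical helper, and it copies the JSON unchanged because the schema is documented in the env_generator README rather than here.

```python
import json
from pathlib import Path

# Directory named in this README; the path is relative to the repository root.
EXAMPLES_DIR = Path("environments/env_generator/examples")


def clone_example(source_name, new_name, examples_dir=EXAMPLES_DIR):
    """Copy an example environment JSON to a new file and return its path.

    Refuses to overwrite an existing file. The contents are copied as-is;
    edit the new file by following the env_generator README.
    """
    examples_dir = Path(examples_dir)
    source = examples_dir / f"{source_name}.json"
    target = examples_dir / f"{new_name}.json"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    # Round-trip through json so an invalid source file fails early.
    data = json.loads(source.read_text())
    target.write_text(json.dumps(data, indent=2) + "\n")
    return target
```

For example, `clone_example("high_level_lettuce_burger", "my_burger")` would create `my_burger.json` next to the original, which could then be tried with `python main.py ++game.environment_name=my_burger` under the naming assumption above.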

(back to top)

Contributing

We appreciate all contributions to Robotouille. Bug fixes are always welcome. If you want to implement a new feature, we recommend first opening an issue with the Feature Request label or reaching out to us.

(back to top)

Built With

We build on top of Gym environments, and we render and take keyboard input using PyGame, following the tutorial for making custom Gym environments.

[Currently broken #37] We also support PDDLGym; we programmatically translate Robotouille into a PDDL domain and problem file, which PDDLGym converts into a Gym environment.

(back to top)

Citation

Please cite the Robotouille paper if you use our dataset or code in your research:

@inproceedings{
  gonzalez-pumariega2025robotouille,
  title={Robotouille: An Asynchronous Planning Benchmark for {LLM} Agents},
  author={Gonzalo Gonzalez-Pumariega and Leong Su Yean and Neha Sunkara and Sanjiban Choudhury},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=OhUoTMxFIH}
}

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Gonzalo Gonzalez - [email protected]

Project Link: https://github.com/portal-cornell/robotouille

(back to top)

Acknowledgments

We thank Nicole Thean (@nicolethean) for her help with creating the assets that bring Robotouille to life!

(back to top)