Robotouille

A challenging benchmark for testing LLM agent planning capabilities!

Paper | Project Website | Request Feature

Table of Contents

About The Project
Leaderboard
Getting Started
- Setup
Usage
- Running an LLM Agent
- Use Existing Environments
- Create your own Environment!
Contributing
Built With
Citation
License
Contact
Acknowledgments

About The Project

Robotouille is a challenging benchmark environment designed to test LLM agents on 30 complex long-horizon planning tasks spanning synchronous, asynchronous, and multi-agent scenarios. Each scenario comes with a curated dataset of 10 unique tasks, each with 10 procedurally generated instances, designed to evaluate reasoning over time delays, diverse long-horizon tasks, and multi-agent coordination.
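The dataset sizes above multiply out as follows. This is a small Python sketch that only does the bookkeeping; the scenario labels and the (scenario, task, instance) tuple layout are illustrative assumptions, not the repository's actual dataset format.

```python
from itertools import product

# Illustrative labels only; the repository's actual dataset names and layout may differ.
SCENARIOS = ("synchronous", "asynchronous", "multi-agent")
TASKS_PER_SCENARIO = 10
INSTANCES_PER_TASK = 10


def all_instances():
    """Return one (scenario, task_index, instance_index) tuple per evaluation instance."""
    return list(product(SCENARIOS, range(TASKS_PER_SCENARIO), range(INSTANCES_PER_TASK)))


num_tasks = len(SCENARIOS) * TASKS_PER_SCENARIO  # 3 scenarios x 10 tasks = 30 tasks
instances = all_instances()                      # 30 tasks x 10 instances each = 300 instances
print(num_tasks, len(instances))
```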

Check out the following papers where we've used Robotouille!


Leaderboard

| Strategy | Synchronous (%) | Asynchronous (%) |
| --- | --- | --- |
| ReAct (gpt-4o) | 47.0 | 11.0 |
| ReAct (gpt-4o-mini) | 11.0 | 0.0 |
| ReAct (Qwen2-72B-Instruct) | 7.0 | 2.0 |
| ReAct (Qwen2-32B-Instruct) | 6.0 | 1.0 |
| ReAct (claude-3-haiku) | 2.0 | 0.0 |
| ReAct (Meta-Llama-3.1-70B-Instruct) | 2.0 | 0.0 |
| ReAct (Meta-Llama-3.1-8B-Instruct) | 1.0 | 0.0 |
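Assuming each percentage is the share of a scenario's 100 instances (10 tasks x 10 instances) that the agent completes, a score can be recomputed from per-instance outcomes as below. The `success_rate` helper is hypothetical and is not part of Robotouille's evaluation code.

```python
def success_rate(outcomes):
    """Return the percentage of instances solved, given one bool per instance."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return 100.0 * sum(outcomes) / len(outcomes)


# 47 successes out of 100 instances reproduces a 47.0% entry.
outcomes = [True] * 47 + [False] * 53
print(success_rate(outcomes))
```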


Getting Started

It is super easy to get started by trying out an existing environment or creating your own environment!

Setup

Create and activate a virtual environment using one of the following options:

# Python venv module
python -m venv robotouille
source robotouille/bin/activate
# Conda (requires Anaconda or Miniconda)
conda create --name robotouille python=3.9
conda activate robotouille
# Pyenv (requires pyenv and pyenv-virtualenv)
pyenv install 3.9
pyenv virtualenv 3.9 robotouille
pyenv activate robotouille

From the repository root, install Robotouille and its dependencies:

pip install -e .
pip install -e agents/prompt_builder/gpt-cost-estimator
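After installing, one quick way to confirm that the editable install is visible to your interpreter is to check that the top-level package can be found. This sketch uses only the standard library's `importlib.util.find_spec`; it assumes the package is importable as `robotouille`, which matches the `from robotouille import run_robotouille` import used to run the simulator. The `missing_modules` helper is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Run inside the activated virtual environment after `pip install -e .`
missing = missing_modules(["robotouille"])
print("missing:", missing or "none")
```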

Run Robotouille!

python main.py

or import the simulator into your own code:

from robotouille import run_robotouille

run_robotouille("original", "human", max_steps=10)


Usage

Running an LLM Agent

Refer to the README.md under agents/ for details on how to run an LLM agent in Robotouille.

Use Existing Environments

To play an existing environment, choose one of the JSON files under environments/env_generator/examples/. For example, to play the high_level_lettuce_burger environment, run:

python main.py ++game.environment_name=high_level_lettuce_burger
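Assuming each environment name is the file stem of a JSON file in that directory (as with high_level_lettuce_burger above), the available names and the matching command can be assembled from Python. `list_environments` and `play_command` are hypothetical helpers, not part of Robotouille.

```python
from pathlib import Path

# Relative to the repository root.
EXAMPLES_DIR = Path("environments/env_generator/examples")


def list_environments(examples_dir=EXAMPLES_DIR):
    """Return environment names, assumed to be the stems of the JSON files in examples_dir."""
    return sorted(path.stem for path in Path(examples_dir).glob("*.json"))


def play_command(environment_name):
    """Build the argv list for `python main.py ++game.environment_name=<name>`."""
    return ["python", "main.py", f"++game.environment_name={environment_name}"]
```

From the repository root, `subprocess.run(play_command("high_level_lettuce_burger"), check=True)` launches the same game as the command above.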

You can interact with the environment using the keyboard and mouse:

- Click to move the robot to stations and to pick up or place down objects. You can also stack and unstack objects by clicking.
- Press 'e' to cut objects at cutting boards or to cook patties at stoves.
- Press 'space' to stay in place (e.g., while waiting for a patty to cook).

To procedurally generate an environment based on a JSON file, also pass a seed:

python main.py ++game.environment_name=high_level_lettuce_burger ++game.seed=42
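To produce several procedurally generated variants of the same base environment, the command above can be repeated with different seeds. The sketch below only builds the argument lists; `seed_sweep_commands` is a hypothetical helper, and it assumes, as the example implies, that the seed controls the procedural generation.

```python
def seed_sweep_commands(environment_name, seeds):
    """Build one `python main.py` argv list per seed for the given base environment."""
    return [
        [
            "python",
            "main.py",
            f"++game.environment_name={environment_name}",
            f"++game.seed={seed}",
        ]
        for seed in seeds
    ]


for argv in seed_sweep_commands("high_level_lettuce_burger", [42, 43, 44]):
    print(" ".join(argv))
```

Each list can be passed to `subprocess.run` from the repository root.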

Refer to the README.md under environments/env_generator/ for details on procedural generation.

Create your own Environment!

To create your own environment, add another example to environments/env_generator/examples/. Follow the README.md under environments/env_generator/ for details on how to customize the environment JSON. To modify the environment's transitions entirely, refer to robotouille.json under environments. We are always adding more objects and transitions to Robotouille to increase the diversity of tasks. If you are interested in contributing or learning more, please contact [email protected] for details.
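One low-risk way to start a new environment is to copy an existing example and then edit the copy following the env_generator README. The helper below is a hypothetical sketch; it assumes that an environment's name is its JSON file stem (e.g. high_level_lettuce_burger), and it checks that the template parses as JSON before copying.

```python
import json
import shutil
from pathlib import Path


def copy_example(examples_dir, template_name, new_name):
    """Copy <template_name>.json to <new_name>.json inside examples_dir and return the new path."""
    examples_dir = Path(examples_dir)
    src = examples_dir / f"{template_name}.json"
    dst = examples_dir / f"{new_name}.json"
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    with src.open() as f:
        json.load(f)  # fail early if the template is not valid JSON
    shutil.copyfile(src, dst)
    return dst
```

For example, `copy_example("environments/env_generator/examples", "high_level_lettuce_burger", "my_burger")` followed by `python main.py ++game.environment_name=my_burger` would play the (still unmodified) copy.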


Contributing

We appreciate all contributions to Robotouille. Bug fixes are always welcome; if you want to implement a new feature, we recommend first opening an issue with the Feature Request label or reaching out to us.


Built With

Robotouille builds atop Gym environments and uses PyGame for rendering and keyboard input, building on the tutorial for making custom Gym environments.
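To illustrate the Gym-style interface and the kind of time delay that makes the asynchronous setting hard, here is a toy environment following the classic Gym `reset`/`step` convention, where `step` returns (observation, reward, done, info). It is a hypothetical sketch: it does not import gym or use any Robotouille class, and the actions, `cook_time`, and reward are made up for illustration.

```python
class ToyStoveEnv:
    """Toy Gym-style environment: a patty on the stove cooks only after `cook_time` more steps."""

    ACTIONS = ("place_patty", "wait")

    def __init__(self, cook_time=3, max_steps=10):
        self.cook_time = cook_time
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.steps = 0
        self.timer = None  # None means no patty is on the stove yet
        self.cooked = False
        return self._obs()

    def _obs(self):
        return {"patty_on_stove": self.timer is not None, "cooked": self.cooked}

    def step(self, action):
        if action not in self.ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.steps += 1
        if action == "place_patty" and self.timer is None:
            self.timer = self.cook_time  # start the cooking timer
        elif self.timer is not None and not self.cooked:
            # Any other step while the patty is on the stove lets time pass.
            self.timer -= 1
            self.cooked = self.timer == 0
        reward = 1.0 if self.cooked else 0.0
        done = self.cooked or self.steps >= self.max_steps
        return self._obs(), reward, done, {}


env = ToyStoveEnv(cook_time=3)
obs = env.reset()
obs, reward, done, info = env.step("place_patty")
while not done:
    obs, reward, done, info = env.step("wait")  # like pressing 'space' while the patty cooks
print(env.steps, obs, reward)
```

An agent that ignores the delay and, say, tries to assemble the burger right after placing the patty would fail here, which is the kind of reasoning over time the asynchronous tasks probe.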

[Currently broken #37] We also support PDDLGym: we programmatically translate Robotouille into a PDDL domain and problem file, which PDDLGym converts into a Gym environment.
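As a rough illustration of what emitting a PDDL problem file involves, the sketch below renders objects, an initial state, and a goal as PDDL text. It is not Robotouille's translator; the problem name, domain name, types, and predicates (`on`, `cooked`) are hypothetical.

```python
def atom(parts):
    """Render a ground atom such as ("on", "patty1", "stove1") as "(on patty1 stove1)"."""
    return "(" + " ".join(parts) + ")"


def to_pddl_problem(name, domain, objects, init, goal):
    """Render a PDDL problem definition.

    objects: dict mapping object name -> type
    init, goal: lists of ground atoms given as tuples of strings
    """
    objs = " ".join(f"{obj} - {typ}" for obj, typ in objects.items())
    return "\n".join([
        f"(define (problem {name})",
        f"  (:domain {domain})",
        f"  (:objects {objs})",
        f"  (:init {' '.join(atom(a) for a in init)})",
        f"  (:goal (and {' '.join(atom(a) for a in goal)})))",
    ])


print(to_pddl_problem(
    "cook_patty",
    "toy_kitchen",
    {"patty1": "item", "stove1": "station"},
    init=[("on", "patty1", "stove1")],
    goal=[("cooked", "patty1")],
))
```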


Citation

Please cite the Robotouille paper if you use our dataset or code in your research:

@inproceedings{
  gonzalez-pumariega2025robotouille,
  title={Robotouille: An Asynchronous Planning Benchmark for {LLM} Agents},
  author={Gonzalo Gonzalez-Pumariega and Leong Su Yean and Neha Sunkara and Sanjiban Choudhury},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=OhUoTMxFIH}
}


License

Distributed under the MIT License. See LICENSE for more information.


Contact

Gonzalo Gonzalez - [email protected]

Project Link: https://github.com/portal-cornell/robotouille


Acknowledgments

We thank Nicole Thean (@nicolethean) for her help with creating the assets that bring Robotouille to life!
