Prompt format for multi-step set up #11

Mayer123 · 2025-01-23T07:38:14Z

Hi there,

Congratulations on the great work!
I'm curious how one should format the prompt for agent evaluation, i.e. when there are multiple turns of user-provided observations and agent actions.
I tried the format below on a few OSWorld tasks, but the results don't look good. PROMPT_FOR_COMPUTER is just the prompt provided in the README. In short, I pass only the most recent screenshot and condense the full action history into the same user turn.

# Condense the action history into one numbered text block ("None" on the first step).
previous_actions = "\n".join([f"Step {i+1}: {action}" for i, action in enumerate(self.actions)]) if self.actions else "None"
messages = []
messages.append({
    "role": "system",
    "content": [{"type": "text", "text": "You are a helpful assistant."}]
})
# Single user turn: prompt + instruction + action history, then only the latest screenshot.
messages.append({
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": PROMPT_FOR_COMPUTER + f"{instruction}\nPrevious Actions:\n{previous_actions}"
        },
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{encode_image(obs['screenshot'])}"}
        }
    ],
})

Could you please share some insights here? Thank you!
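
The snippet above calls encode_image without showing it. A minimal sketch of such a helper follows, assuming obs['screenshot'] holds raw PNG bytes; both that assumption and the to_data_url helper are illustrative and not taken from the repository.

```python
import base64


def encode_image(image_bytes: bytes) -> str:
    """Return the base64 text for raw image bytes (assumed here to be PNG data)."""
    return base64.b64encode(image_bytes).decode("utf-8")


def to_data_url(image_bytes: bytes) -> str:
    """Hypothetical convenience wrapper: build the data URL used in the image_url field."""
    return f"data:image/png;base64,{encode_image(image_bytes)}"
```

If the environment returns a file path or a PIL image instead of bytes, the helper would need to read or serialize the image first.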

pooruss · 2025-01-23T10:17:47Z

Hi! Here is pseudocode for the multi-step prompt logic:

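# To predict the third action: a single user turn that starts with the prompt and
# instruction, then interleaves each screenshot with the action that was taken.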
messages.append({
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": PROMPT_FOR_COMPUTER + f"{instruction}"
        },
        {
            "type": "image_url",
            "image_url": screenshot_from_init
        },
        {
            "type": "text",
            "text": previous_actions[0],
        },
        {
            "type": "image_url",
            "image_url": screenshot_from_state_0
        },
        {
            "type": "text",
            "text": previous_actions[1],
        },
        {
            "type": "image_url",
            "image_url": screenshot_from_state_1
        }
    ],
})

Note that we apply the 'history 5' logic for multi-step online tasks, as discussed in the report.
We will also share our inference code later. Stay tuned!
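
For readers who want something runnable before the official inference code is released, here is a rough sketch of the interleaved layout above. It assumes that 'history 5' means keeping only the five most recent screenshots while leaving every earlier action in the prompt as text; that reading, the build_messages name, and the max_images parameter are all assumptions rather than the authors' implementation.

```python
from typing import Any


def build_messages(prompt: str, instruction: str, screenshots: list[str],
                   actions: list[str], max_images: int = 5) -> list[dict[str, Any]]:
    """Build one user turn that interleaves screenshots and past actions.

    screenshots[i] is the data URL of the screen observed before actions[i];
    the last screenshot is the current state, so there is one more screenshot
    than there are actions.
    """
    if len(screenshots) != len(actions) + 1:
        raise ValueError("expected exactly one more screenshot than actions")

    # Assumption: only the newest `max_images` screenshots are sent as images.
    first_kept = max(0, len(screenshots) - max_images)

    content: list[dict[str, Any]] = [{"type": "text", "text": prompt + instruction}]
    for i, shot in enumerate(screenshots):
        if i >= first_kept:
            content.append({"type": "image_url", "image_url": {"url": shot}})
        if i < len(actions):
            # Older actions stay as text even when their screenshot is dropped.
            content.append({"type": "text", "text": actions[i]})
    return [{"role": "user", "content": content}]
```

With three screenshots and two actions this reproduces the text, image, text, image, text, image order shown in the pseudocode.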

llajan · 2025-01-23T19:13:37Z

Congrats on the great work and thanks for the comments.
When trying the prompt format above, the 72B DPO model complains that "More than 1 image is unsupported". Could you kindly comment on this?
