Prompt format for multi-step set up #11

Open
Mayer123 opened this issue Jan 23, 2025 · 2 comments

Comments

@Mayer123

Hi there,

Congratulations on the great work!
How should one format the prompt for agent evaluation, i.e. when there are multiple turns of user-provided observations and agent actions?
I tried the format below on a few OSWorld tasks, but the results don't look good. PROMPT_FOR_COMPUTER is the prompt provided in the README. In short, I pass only the most recent screenshot and condense all previous actions into text within the same user turn.

# Condense all previous actions into one text block ("None" on the first step).
previous_actions = "\n".join([f"Step {i+1}: {action}" for i, action in enumerate(self.actions)]) if self.actions else "None"
messages = []
messages.append({
    "role": "system",
    "content": [{"type": "text", "text": "You are a helpful assistant."}]
})
# Single user turn: task prompt + action history as text, then only the latest screenshot.
messages.append({
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": PROMPT_FOR_COMPUTER + f"{instruction}\nPrevious Actions:\n{previous_actions}"
        },
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{encode_image(obs['screenshot'])}"}
        }
    ],
})
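The snippet above calls `encode_image`, which is not shown in the issue. A minimal sketch of such a helper follows, assuming `obs['screenshot']` holds raw PNG bytes; that is an assumption about the environment, not something stated in the thread, and `to_data_url` is a hypothetical convenience name.

```python
import base64


def encode_image(png_bytes: bytes) -> str:
    """Return the base64 text form of raw image bytes.

    Assumption: the screenshot is already PNG-encoded bytes.
    """
    return base64.b64encode(png_bytes).decode("ascii")


def to_data_url(png_bytes: bytes) -> str:
    """Hypothetical helper: wrap the base64 payload in a PNG data URL."""
    return f"data:image/png;base64,{encode_image(png_bytes)}"
```

If the screenshot arrives as a file path or an image object instead, it would need to be converted to PNG bytes before calling the helper.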

Could you please share some insights here? Thank you!

@pooruss
Collaborator

pooruss commented Jan 23, 2025

Hi! Here is pseudocode for the multi-step prompt logic:

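# To predict the third action: one user turn that interleaves each
# screenshot with the action taken on it, ending with the current screenshot.
# screenshot_from_* are placeholders for image URLs (e.g. base64 data URLs).
messages = []
messages.append({
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": PROMPT_FOR_COMPUTER + f"{instruction}"
        },
        {
            "type": "image_url",
            "image_url": {"url": screenshot_from_init}
        },
        {
            "type": "text",
            "text": previous_actions[0],
        },
        {
            "type": "image_url",
            "image_url": {"url": screenshot_from_state_0}
        },
        {
            "type": "text",
            "text": previous_actions[1],
        },
        {
            "type": "image_url",
            "image_url": {"url": screenshot_from_state_1}
        }
    ],
})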

Note that we apply the 'history 5' logic for multi-step online tasks, as discussed in the report.
We will also share our inference code later. Stay tuned!
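For any step count, the interleaved layout in the pseudocode above could be generated with a small helper. The sketch below is not the maintainers' inference code. The function name `build_messages`, the `max_history` parameter, and the placeholder prompt string are hypothetical. It reads 'history 5' as keeping only the five most recent actions with their screenshots, which is an assumption; the report has the exact rule.

```python
from typing import Any, Dict, List

# Placeholder: substitute the actual prompt from the README.
PROMPT_FOR_COMPUTER = "<prompt from the README>\n"


def build_messages(
    instruction: str,
    screenshots: List[str],
    actions: List[str],
    max_history: int = 5,  # assumed meaning of 'history 5'
) -> List[Dict[str, Any]]:
    """Build one user turn that interleaves screenshots and past actions.

    screenshots[i] is the screen the agent saw before actions[i], so there
    must be exactly one more screenshot (the current screen) than actions.
    Each screenshot is an image URL string, e.g. a base64 data URL.
    """
    if len(screenshots) != len(actions) + 1:
        raise ValueError("expected len(screenshots) == len(actions) + 1")

    # Assumption: drop the oldest steps, keep the last `max_history` actions.
    start = max(0, len(actions) - max_history)

    content: List[Dict[str, Any]] = [
        {"type": "text", "text": PROMPT_FOR_COMPUTER + instruction},
        {"type": "image_url", "image_url": {"url": screenshots[start]}},
    ]
    for i in range(start, len(actions)):
        content.append({"type": "text", "text": actions[i]})
        content.append({"type": "image_url", "image_url": {"url": screenshots[i + 1]}})
    return [{"role": "user", "content": content}]
```

With two past actions and three screenshots this reproduces the structure in the pseudocode: prompt text, initial screenshot, then two action/screenshot pairs. The serving backend must accept several images in one request for this layout to work.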

@llajan

llajan commented Jan 23, 2025

Congrats on the great work and thanks for the comments.
When trying the prompt format above, the 72B DPO model complains that "More than 1 image is unsupported". Could you kindly comment on this?
