Congratulations on the great work!
I'm curious how one should format the prompt for agent evaluation, i.e., when there are multiple turns of user-provided observations and agent actions.
So far I have tried the format below on a few OSWorld tasks, but the results are not good. PROMPT_FOR_COMPUTER is the prompt provided in the README. In short, I use only the most recent screenshot and condense the entire action history into the same user turn.
```python
# Condense all previous actions into one text block ("None" on the first step).
previous_actions = "\n".join([f"Step {i+1}: {action}" for i, action in enumerate(self.actions)]) if self.actions else "None"

messages = []
messages.append({
    "role": "system",
    "content": [{"type": "text", "text": "You are a helpful assistant."}]
})
# A single user turn: prompt + instruction + condensed history + only the latest screenshot.
messages.append({
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": PROMPT_FOR_COMPUTER + f"{instruction}\nPrevious Actions:\n{previous_actions}"
        },
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{encode_image(obs['screenshot'])}"}
        }
    ],
})
```
Could you please share some insights here? Thank you!
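The snippet relies on an `encode_image` helper that is not shown. A minimal sketch, assuming `obs['screenshot']` holds raw PNG bytes (verify this against your own environment wrapper), could look like the following; `image_part` is a hypothetical convenience wrapper for building the OpenAI-style content part, not part of any library:

```python
import base64


def encode_image(image_bytes: bytes) -> str:
    """Return the base64 text of raw PNG bytes, for embedding in a data: URL."""
    return base64.b64encode(image_bytes).decode("utf-8")


def image_part(image_bytes: bytes) -> dict:
    """Hypothetical helper: wrap a screenshot as an OpenAI-style image_url content part."""
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encode_image(image_bytes)}"},
    }
```

If the screenshot arrives as a file path or a PIL image instead of bytes, it has to be read or saved to PNG bytes first.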
Note that we apply the 'history 5' logic for multi-step online tasks, as discussed in the report.
We will also share our inference code later. Stay tuned!
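A minimal sketch of one way to realize such a history window, assuming "history 5" means each request carries at most the five most recent screenshots, interleaved as user turns with the model's previous replies as assistant turns. That reading, the `build_messages` function, and the choice to keep older actions as plain text are illustrative assumptions, not the authors' released inference code:

```python
from typing import Dict, List

HISTORY_N = 5  # assumed reading of "history 5": at most 5 screenshots per request


def text_part(text: str) -> Dict:
    return {"type": "text", "text": text}


def image_part(png_b64: str) -> Dict:
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{png_b64}"}}


def build_messages(task_prompt: str, screenshots_b64: List[str], responses: List[str],
                   history_n: int = HISTORY_N) -> List[Dict]:
    """Build an alternating user/assistant chat that carries at most `history_n` screenshots.

    screenshots_b64[i] is the screen observed before step i; the final entry is the
    current screen, so there is exactly one more screenshot than past responses.
    """
    if history_n < 1:
        raise ValueError("history_n must be at least 1")
    if len(screenshots_b64) != len(responses) + 1:
        raise ValueError("expected one more screenshot than past responses")

    first_kept = max(0, len(screenshots_b64) - history_n)

    # Steps whose screenshots fall outside the window survive only as text.
    older = "\n".join(f"Step {i + 1}: {r}" for i, r in enumerate(responses[:first_kept]))
    intro = task_prompt + (f"\nEarlier actions:\n{older}" if older else "")

    messages = [
        {"role": "system", "content": [text_part("You are a helpful assistant.")]},
        {"role": "user", "content": [text_part(intro), image_part(screenshots_b64[first_kept])]},
    ]
    # Inside the window: the model's reply to each screen, then the next screen.
    for i in range(first_kept, len(responses)):
        messages.append({"role": "assistant", "content": [text_part(responses[i])]})
        messages.append({"role": "user", "content": [image_part(screenshots_b64[i + 1])]})
    return messages
```

Keeping older steps as text preserves the full action trace while bounding the image count, and the roles strictly alternate after the system turn. A request built this way carries up to five images, so the serving backend must accept that many per prompt.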
Congrats on the great work and thanks for the comments.
When I try the prompt format above, the 72B DPO model fails with the error "More than 1 image is unsupported". Could you kindly comment on this?
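That error suggests the inference server, rather than the model, is capping the number of images per request at one. One plausible cause, if the model is served with vLLM, is the per-prompt multimodal limit, which vLLM exposes as the `limit_mm_per_prompt` engine argument (`--limit-mm-per-prompt` on the command line; the exact value syntax differs between versions, so check the docs for yours). Whether this applies depends on how the model is deployed. Independently of the backend, a client-side check like the sketch below (the function names are hypothetical) fails early with a clearer message:

```python
from typing import Dict, List


def count_images(messages: List[Dict]) -> int:
    """Count image_url parts across all messages of an OpenAI-style chat request."""
    total = 0
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):  # plain-string content carries no images
            total += sum(1 for part in content if part.get("type") == "image_url")
    return total


def check_image_budget(messages: List[Dict], max_images: int) -> None:
    """Raise before sending if the request exceeds the server's per-prompt image limit.

    `max_images` is whatever the backend was configured to accept; the caller must
    supply it, since this function cannot discover the server setting.
    """
    n = count_images(messages)
    if n > max_images:
        raise ValueError(
            f"request has {n} images but the server accepts at most {max_images}; "
            "raise the server-side limit or shrink the history window"
        )
```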