Baseline: pure LLM based agent #10

Open
vrodriguezf opened this issue Oct 17, 2023 · 7 comments

Comments

@vrodriguezf
Contributor

Build an agent based purely on text, to serve as a baseline for the hybrid agents that we expect to actually work better here. I would expect anything that incorporates domain-specific knowledge to outperform this.

This can be done in many ways and with different LLMs. To start with, a GPT model (using the Python openai package) should be the easiest thing to connect to. Here, we would have:

  • the prompt that contains the updated observation history.

  • the system prompt, which describes the context of the agent in text. I put one example in the drive folder; feel free to modify it or create new ones and share them there too.

  • A parser that takes the response from the LLM and builds the action vector out of it. Asking the LLM to directly output the numbers to plug in as an action would prevent it from actually "reasoning" about the action, and that wouldn't be useful even as a baseline.

If you haven't done so already, I recommend watching Jeremy Howard's tutorial on how to interact with LLMs to learn the basic usage of the OpenAI Python API (I don't remember whether it covers system prompts, though).
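
To make the pieces above concrete, here is a rough sketch of what the query step could look like, assuming the pre-1.0 openai Python package (ChatCompletion API); the prompt wording and the helper name are made up for illustration, not taken from the repo:

```python
import openai

# Placeholder system prompt; the real one lives in the shared drive folder.
SYSTEM_PROMPT = (
    "You are the agent described in this context. Reason step by step about the "
    "latest observations, then state your final answer as a 4D action vector."
)

def query_llm(observation_history):
    """Build chat messages from the running observation history and ask the model for an action."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Observation history so far:\n" + "\n".join(observation_history)},
    ]
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    # The reply is free text that still "reasons" about the action;
    # a separate parser turns it into the numeric action vector.
    return response["choices"][0]["message"]["content"]
```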

@OhhTuRnz
Collaborator

At the start of the challenge I was playing around with a GPT interaction (I had some difficulty parsing the 4D vector; maybe I can pipeline another GPT chat for "decoding" the values).

I also checked the interaction context box where he uses "vv" to get a shortened answer, so that may work for parsing.

@escharf320
Contributor

I started building an agent to do this. It's in my branch (called eli) under arclab_mit/agents/pure_text_agent. I still need to do a lot of work, but the framework is there. If you have any comments, feel free to let me know as I continue working!

@vrodriguezf
Contributor Author

At the start of the challenge I was playing around with a GPT interaction (I had some difficulty parsing the 4D vector; maybe I can pipeline another GPT chat for "decoding" the values).

Yeah, that's actually a good idea, using an LLM to parse the answer of another LLM lol. It could be simpler though: making the system prompt state clearly that the final answer has to be given as a 4D vector would probably do the trick (the same way the "vv" trick works in J. Howard's system prompt).
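
For reference, a tiny parser along those lines could look like this (just an illustration; the regex and the "return None to re-prompt" behaviour are assumptions, not what's in anyone's branch):

```python
import re

def parse_action_vector(reply: str):
    """Pull the last four numbers out of the LLM's free-text reply as the action vector.

    Returns None if fewer than four numbers are found, so the caller can re-prompt.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?", reply)
    if len(numbers) < 4:
        return None
    return [float(n) for n in numbers[-4:]]
```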

@DumplingLife
Contributor

Another approach we could take is to have it call a function when it wants to return the action vector, using the function calling capabilities. I'll look into this and try to build an agent that does this.
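
Roughly, that could look like the sketch below (again assuming the pre-1.0 openai package; the function name and field names are placeholders, not necessarily how the actual agent defines its schema):

```python
import json
import openai

# Hypothetical schema; the real agent would define its own fields for the 4D action.
ACTION_FUNCTION = {
    "name": "submit_action",
    "description": "Submit the chosen 4D action vector.",
    "parameters": {
        "type": "object",
        "properties": {
            "a1": {"type": "number"},
            "a2": {"type": "number"},
            "a3": {"type": "number"},
            "a4": {"type": "number"},
        },
        "required": ["a1", "a2", "a3", "a4"],
    },
}

def get_action(messages):
    """Ask the model for an action and have it answer through the function call."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        functions=[ACTION_FUNCTION],
        function_call={"name": "submit_action"},  # force the function call so parsing is trivial
    )
    args = json.loads(response["choices"][0]["message"]["function_call"]["arguments"])
    return [args["a1"], args["a2"], args["a3"], args["a4"]]
```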

@DumplingLife
Contributor

I pushed an agent for this to main. It works end-to-end, so you can run it like a regular agent, but it often takes very long (30 seconds or more) and sometimes calls the wrong function. I'll try to mitigate these issues with better prompting.

We can also reuse the function calling code to return the action vector in later agents, like the hybrid agents.

@vrodriguezf
Contributor Author

Very cool! What file is that? If you push directly to the main branch without a PR, add the ID of the related issue in the commit message. That way we'll have it linked directly in this conversation.

@DumplingLife
Contributor

It's here: arclab_mit/agents/jason_function_calling_llm_agent.py
I think I'll make my own branch and do PRs from now on, so it's more organized.
