Code capability enhancement & Bot crash fix #272

Open: wants to merge 18 commits into main

Ninot1Quyi (Contributor) commented Nov 2, 2024

Last Modified Time: November 10, 2024, 5:53 PM

Latest changes are as follows:


  1. Improvement Effects

    • Model: GPT-4o

    • Initial Command: !goal("Your goal is: use only "!newAction" instructions and rely only on code execution to obtain a diamond pickaxe. You must complete this task step by step and by yourself. And can't use another "!command". You should promptly check to see what you have.")

    • Effect: In testing, with the bot relying solely on generated code, it ran stably for at least 30 minutes without crashing (I ended the process manually at 30 minutes) and executed more than 130 validated code snippets.

    • Remaining Issues:

      1. If an illegal command is executed, such as attacking a non-existent entity, the server may kick the bot.
      2. A very small number of tasks may produce no execution result, which crashes the code. I suspect that some exceptional cases when receiving task results are not yet handled.
    • WARNING: If you use the command above or set a goal that takes a long time, watch the execution status and token consumption, because the LLM may keep generating code in certain situations. For example, when the bot has an iron pickaxe and needs to mine diamonds, it may stand still and use code to search for nearby diamonds. Since diamonds are rare, the search can fail repeatedly while the LLM keeps revising the code and gets stuck, consuming a large number of tokens. Please test with caution: testing with gpt-4o for 60 minutes cost me $60. gpt-4o-mini is much cheaper and can be used to test this command.

  2. Added Features:
    2.1 During code generation, the select_num skill docs most relevant to the !newAction("task") request are selected and included in the prompt to help the LLM stay focused on the task. select_num is currently set to 5.
    2.2 Before the code runs, ESLint performs syntax and exception checks on the generated code to catch problems early, including calls to undefined functions; any issues found are added to the messages.
    2.3 During code execution, detailed error information is included in the messages.

  3. Added Files:
    3.1 file path: ./bots/codeCheckTemplate.js
    A template used to check code before execution, because ESLint cannot run inside the sandbox.

    3.2 file path: ./eslint.config.js
    Manages the ESLint rules for code syntax and exception detection.

  4. Modified Code Content:

    4.1 package.json

    - Added: ESLint dependency.

    4.2 settings.js

    - Set: code_timeout_mins=3, so that code execution reports back promptly and a single run cannot block the agent for long.
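A time limit like code_timeout_mins is commonly enforced by racing the running code against a timer. The sketch below is illustrative only: withTimeout is a hypothetical helper, not the function the project uses, and it assumes the generated code's execution is represented as a Promise.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `timeoutMins`.
function withTimeout(promise, timeoutMins) {
  const ms = timeoutMins * 60 * 1000;
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Code execution timed out after ${timeoutMins} min`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (assumed names): await withTimeout(runGeneratedCode(), settings.code_timeout_mins);
```

Promise.race only stops waiting; it does not cancel the underlying task, so real code still needs a way to interrupt the bot's current action when the timer fires.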

    4.3 coder.js

    - Added: checkCode function, which pre-checks the code for syntax errors and exceptions. It first verifies that every function the code calls exists; any that do not are written to the message. It then runs the syntax and exception checks.

    - Modified: changed the return value of the stageCode function from return { main: mainFn }; to return { func: { main: mainFn }, src_check_copy: src_check_copy }; so that exceptions can be detected before execution.
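The first step of checkCode, verifying that every called function exists, can be sketched as a scan for skills.* and world.* calls. This assumes generated code reaches the skill library through those two prefixes; findUndefinedSkillCalls and buildCheckMessage are hypothetical names, and the skill names in the usage lines are only illustrative.

```javascript
// Hypothetical sketch: list skills.* / world.* calls that are not in the skill library.
function findUndefinedSkillCalls(code, knownFunctions) {
  const callPattern = /\b(skills|world)\.(\w+)\s*\(/g;
  const missing = new Set(); // a Set avoids reporting the same call twice
  for (const match of code.matchAll(callPattern)) {
    const qualified = `${match[1]}.${match[2]}`;
    if (!knownFunctions.has(qualified)) missing.add(qualified);
  }
  return [...missing];
}

// Turn the missing names into feedback for the LLM, or null if nothing is missing.
function buildCheckMessage(missing) {
  if (missing.length === 0) return null;
  return `The code calls functions that do not exist: ${missing.join(", ")}. ` +
    "Use only functions from the skill library.";
}

// Usage with illustrative names:
const known = new Set(["skills.collectBlock", "world.getNearestBlock"]);
const code = 'await skills.collectBlock(bot, "oak_log", 3);\nawait skills.mineDiamond(bot);';
console.log(buildCheckMessage(findUndefinedSkillCalls(code, known)));
```

A regex scan is cheap but misses aliased calls such as const s = skills; s.foo(bot);, which is one reason a full lint pass still follows it.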

    4.4 action_manager.js

    - Enhanced: the catch (err) block now adds detailed exception content and the docs for the related code to the messages, improving the LLM's ability to fix its code.
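One way to build the richer catch (err) message described above is sketched below. It assumes skill docs are available as a map from qualified function name to doc text; formatExecutionError is a hypothetical helper, not the project's code.

```javascript
// Hypothetical sketch: build LLM feedback from a caught error.
function formatExecutionError(err, code, skillDocs) {
  const lines = [`Code execution threw: ${err.message}`];
  if (err.stack) {
    // Keep only the first few stack frames; long stacks cost tokens.
    const frames = err.stack.split("\n").slice(1, 4).map(line => line.trim());
    lines.push("Stack:", ...frames);
  }
  // Attach docs for each skill whose name appears in the code (plain substring match).
  const used = Object.keys(skillDocs).filter(name => code.includes(name));
  if (used.length > 0) {
    lines.push("Docs for functions used:", ...used.map(name => skillDocs[name]));
  }
  return lines.join("\n");
}

// Usage (assumed names): catch (err) { messages.push(formatExecutionError(err, code, docsByName)); }
```

The substring match is deliberately simple; a name that is a prefix of another (for example skills.craft and skills.craftRecipe) would pull in both docs.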

    4.5 index.js

    - Modified: docHelper and getSkillDocs now return the docArray of functions from the skill library, which is used for the subsequent embedding-vector calculations.

    4.6 prompter.js

    - Added: this.skill_docs_embeddings = {}; to store the docArray word embedding vectors.

    - Added: Parallel initialization of this.skill_docs_embeddings in initExamples.

    - Added: getRelevantSkillDocs function, which returns the select_num doc texts most relevant to the input message. If select_num >= 0, that many docs are returned; otherwise, all docs are returned sorted by relevance.
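A relevance ranking of this kind can be sketched with cosine similarity between the message embedding and each doc embedding; the PR does not state which similarity measure it uses, so that choice is an assumption. embed stands in for whatever embedding call the configured model provides, and initSkillDocEmbeddings and selectRelevantDocs are hypothetical names. Promise.all embeds all docs concurrently, in the spirit of the parallel initialization described above.

```javascript
// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA && normB ? dot / (Math.sqrt(normA) * Math.sqrt(normB)) : 0;
}

// Embed every doc concurrently and key the vectors by doc text.
async function initSkillDocEmbeddings(docArray, embed) {
  const vectors = await Promise.all(docArray.map(doc => embed(doc)));
  const embeddings = {};
  docArray.forEach((doc, i) => { embeddings[doc] = vectors[i]; });
  return embeddings;
}

// Return the selectNum most relevant docs; a negative selectNum returns all docs, ranked.
async function selectRelevantDocs(message, selectNum, embeddings, embed) {
  const query = await embed(message);
  const ranked = Object.entries(embeddings)
    .map(([doc, vec]) => ({ doc, score: cosineSimilarity(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .map(entry => entry.doc);
  return selectNum >= 0 ? ranked.slice(0, selectNum) : ranked;
}
```

Embedding the docs once at startup and only the message per request keeps the per-request cost to a single embedding call.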

Note: To keep the code clean, I changed only what was necessary and removed test output and test comments. If further changes are needed, please let me know.

Ninot1Quyi closed this Nov 2, 2024
Ninot1Quyi force-pushed the Tasks-more-relevant-docs-and-code-exception-fixes branch from ecaf5e8 to 02232e2 on November 2, 2024 18:03
…e-exception-fixes' into Tasks-more-relevant-docs-and-code-exception-fixes

# Conflicts:
#	src/agent/coder.js
#	src/agent/prompter.js
Ninot1Quyi reopened this Nov 2, 2024

Ninot1Quyi (Contributor Author)

Resolve merge conflicts with the latest code

New additions

  1. Added the codeCheckTemplate.js file under the bots directory for static syntax and exception detection.
  2. Modified the return value of stageCode and the part of generateCodeLoop that runs the code to resolve merge conflict issues.
  3. Added the ESLint configuration file eslint.config.js in the project root directory to manage code syntax and exception detection rules.

JurassikLizard (Contributor)

Can you try re-running this with a stupider model (not state-of-the-art lol). I'm curious to see if they benefit too, or just advanced ones.

Ninot1Quyi (Contributor Author) commented Nov 3, 2024

Comparison Experiment on Low-Performance Models

1. Objective

The objective is set using the following command:
!goal("Your goal is: use only "!newAction" instructions and rely only on code execution to obtain a diamondpickaxe. You must complete this task step by step and by yourself. And can't use another "!command". You should promptly check to see what you have")

2. Model Selection

First, I tested the lowest-performance model, gpt-3.5-turbo, but it could not limit itself to using only !newAction and was unable to complete the task. Subsequently, I tested gpt-4o-mini.
All subsequent tests were conducted using gpt-4o-mini.

3. Experimental Process

  • Created a world and made two copies so that both runs started in the same environment.
  • Ran the bot with the modified and with the unmodified code from the same starting position, entered the goal command, and let it execute the task.

4. Experimental Results

4.1 Original

Total run time: 16 minutes 41 seconds.

  • 0 min: Start
  • 3 min: First crash.
  • 4 min: Second crash.
  • 13 min: Acquired wooden pickaxe. [The bot was continually collecting wood and only completed the wooden pickaxe after multiple reminders to check existing items.]
  • 16 min 24 s: Acquired a stone pickaxe.

4.2 Modified

I didn’t give any reminders to the bot while it was running.
Total run time: 16 minutes 22 seconds.

  • 0 min: Start
  • 4 min: Acquired a wooden pickaxe.
  • 5 min 12 s: First crash.
  • 8 min: Acquired a stone pickaxe.
  • 11 min: Acquired iron ingots.
  • 15 min: The content was obtained.
  • 16 min 22 s: Second crash.

4.3 Complete Comparison Video

Total duration: 16 minutes 41 seconds.
Watch the full comparison video here

Ninot1Quyi closed this Nov 4, 2024
Ninot1Quyi force-pushed the Tasks-more-relevant-docs-and-code-exception-fixes branch from e1dfad9 to 0a21561 on November 4, 2024 15:05
…e-exception-fixes' into Tasks-more-relevant-docs-and-code-exception-fixes

# Conflicts:
#	src/agent/coder.js
Ninot1Quyi reopened this Nov 4, 2024

Ninot1Quyi (Contributor Author) commented Nov 4, 2024

Resolved merge conflict with Action Manager

Ninot1Quyi closed this Nov 8, 2024
Ninot1Quyi force-pushed the Tasks-more-relevant-docs-and-code-exception-fixes branch from f6e309a to a6edd8f on November 8, 2024 10:20
…e-exception-fixes' into Tasks-more-relevant-docs-and-code-exception-fixes

# Conflicts:
#	src/agent/prompter.js
Ninot1Quyi reopened this Nov 8, 2024

Ninot1Quyi (Contributor Author)

There is a part that needs improvement

Ninot1Quyi (Contributor Author)

  • Improve the relevance of docs to !newAction("task")
  • Fix Qwen API concurrency limit issue
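The comment does not describe how the Qwen concurrency limit was handled. A common approach is to cap the number of in-flight requests, for example when embedding many skill docs at once; mapWithConcurrency below is a hypothetical helper showing that pattern, not the PR's fix.

```javascript
// Hypothetical sketch: apply async `worker` to every item with at most `limit` calls in flight.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // shared cursor; safe because the increment runs synchronously
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  // Start up to `limit` runners (at least one) that pull items until none remain.
  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, runner));
  return results;
}

// Usage (assumed names): const vectors = await mapWithConcurrency(docArray, 4, doc => embed(doc));
```

Results keep the input order even though calls finish out of order, so the output can replace a Promise.all over the same items.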
