
How to improve the execution speed of OCR, grounding-dino, and chatgpt-4o models to transition mobile-agent from laboratory research to engineering use? #50

fredajiang opened this issue Sep 4, 2024 · 1 comment


@fredajiang


1. I replaced the original GroundingDINO model with a GPU-backed version, cutting the time per detection from about 7 seconds to roughly 0.2 seconds (a minimal loading sketch follows this list). For details on the GPU version of GroundingDINO, see https://github.com/IDEA-Research/GroundingDINO.
2. For the OCR model, is there a similarly fast GPU-backed version? Currently, each OCR operation takes approximately 3 seconds.
3. For calling GPT-4o, do you have any suggestions for improving its execution speed? At present, each call to GPT-4o takes approximately 6-7 seconds.

Looking forward to your response.
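
For point 1, here is a minimal sketch of running GroundingDINO on the GPU, assuming the Python inference API from the IDEA-Research repository linked above; the config/checkpoint paths, example image, and caption are placeholders, not values from Mobile-Agent:

```python
import torch
from groundingdino.util.inference import load_model, load_image, predict

# Placeholder paths -- point these at your local GroundingDINO checkout
# and downloaded checkpoint.
CONFIG = "groundingdino/config/GroundingDINO_SwinT_OGC.py"
WEIGHTS = "weights/groundingdino_swint_ogc.pth"

device = "cuda" if torch.cuda.is_available() else "cpu"

# Loading and running the model on the GPU is what brings the per-call
# time down from several seconds to a fraction of a second.
model = load_model(CONFIG, WEIGHTS, device=device)

image_source, image = load_image("screenshot.png")
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="icon . button . text field",  # example prompt for UI elements
    box_threshold=0.35,
    text_threshold=0.25,
    device=device,
)
print(phrases, boxes)
```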
@junyangwang0410
Collaborator

Hello. As you said, both the OCR model and GroundingDINO can be loaded onto the GPU; for the OCR model you need to install the matching version of tensorflow-gpu (a quick check is shown below). There is currently no good way to reduce the latency of the VLM (GPT-4o) calls themselves. However, when deploying the project we often use an agent-parallel approach: the reflection agent runs concurrently with the planning agent for the next step, and if the reflection agent concludes that the previous operation was correct, the result of that speculative planning call can be used directly, which hides its latency (see the sketch below). Of course, if you can accept some drop in model quality, gpt-4o-mini is a good choice.
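
To confirm that the OCR side actually uses the GPU after installing tensorflow-gpu, a quick sanity check (not specific to Mobile-Agent) is:

```python
import tensorflow as tf

# An empty list here means TensorFlow cannot see the GPU and the OCR model
# will silently fall back to CPU.
print(tf.config.list_physical_devices("GPU"))
```

The agent-parallel idea can be sketched roughly as follows. This is not code from the Mobile-Agent repository; the function names and the `ThreadPoolExecutor` arrangement are illustrative assumptions for speculatively running the next-stage planning call while the reflection agent verifies the previous step:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real agent calls -- in practice each of
# these would be a chat-completion request to GPT-4o / gpt-4o-mini with a
# prompt and the current screenshot attached.
def run_reflection(history, screenshot):
    """Ask the reflection agent whether the previous operation succeeded."""
    return True  # placeholder verdict

def run_planning(history, screenshot):
    """Ask the planning agent for the next action, assuming success."""
    return "tap(search_box)"  # placeholder action

def step(history, screenshot):
    # Launch reflection and the next-stage planning call concurrently, so
    # the planning latency is hidden behind the reflection latency.
    with ThreadPoolExecutor(max_workers=2) as pool:
        reflection_future = pool.submit(run_reflection, history, screenshot)
        planning_future = pool.submit(run_planning, history, screenshot)

        if reflection_future.result():
            # Previous operation was correct: reuse the speculative plan.
            return planning_future.result()

        # Previous operation failed: drop the speculative plan and re-plan
        # with the failure fed back into the history.
        planning_future.cancel()
        return run_planning(history + ["previous operation failed"], screenshot)

if __name__ == "__main__":
    print(step(["open app"], "screenshot.png"))
```

If the reflection agent confirms success on most steps, each step then costs roughly the maximum of the reflection and planning latencies rather than their sum.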
