
How to improve the execution speed of OCR, grounding-dino, and chatgpt-4o models to transition mobile-agent from laboratory research to engineering use? #50

fredajiang opened this issue Sep 4, 2024 · 1 comment


@fredajiang


1. I replaced the original GroundingDINO model with a GPU-backed version, cutting the time per detection from about 7 seconds to roughly 0.2 seconds (a minimal loading sketch follows this list). For details on the GPU version of GroundingDINO, see https://github.com/IDEA-Research/GroundingDINO.
2. For the OCR model, is there a similarly fast GPU-backed version? Currently, each OCR operation takes approximately 3 seconds.
3. For calling GPT-4o, do you have any suggestions for improving its execution speed? At present, each call to GPT-4o takes approximately 6-7 seconds.

Looking forward to your response.
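
For point 1, here is a minimal sketch of running GroundingDINO on the GPU, assuming the Python inference API from the IDEA-Research repository linked above; the config/checkpoint paths, example image, and caption are placeholders, not values from Mobile-Agent:

```python
import torch
from groundingdino.util.inference import load_model, load_image, predict

# Placeholder paths -- point these at your local GroundingDINO checkout
# and downloaded checkpoint.
CONFIG = "groundingdino/config/GroundingDINO_SwinT_OGC.py"
WEIGHTS = "weights/groundingdino_swint_ogc.pth"

device = "cuda" if torch.cuda.is_available() else "cpu"

# Loading and running the model on the GPU is what brings the per-call
# time down from several seconds to a fraction of a second.
model = load_model(CONFIG, WEIGHTS, device=device)

image_source, image = load_image("screenshot.png")
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="icon . button . text field",  # example prompt for UI elements
    box_threshold=0.35,
    text_threshold=0.25,
    device=device,
)
print(phrases, boxes)
```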
@junyangwang0410
Collaborator

Hello. As you said, both the OCR model and GroundingDINO can be loaded onto the GPU; for the OCR model you need to install the matching version of tensorflow-gpu (a quick check is shown below). There is currently no good way to reduce the latency of the VLM (GPT-4o) calls themselves. However, when deploying the project we often use an agent-parallel approach: the reflection agent runs concurrently with the planning agent for the next step, and if the reflection agent concludes that the previous operation was correct, the result of that speculative planning call can be used directly, which hides its latency (see the sketch below). Of course, if you can accept some drop in model quality, gpt-4o-mini is a good choice.
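
To confirm that the OCR side actually uses the GPU after installing tensorflow-gpu, a quick sanity check (not specific to Mobile-Agent) is:

```python
import tensorflow as tf

# An empty list here means TensorFlow cannot see the GPU and the OCR model
# will silently fall back to CPU.
print(tf.config.list_physical_devices("GPU"))
```

The agent-parallel idea can be sketched roughly as follows. This is not code from the Mobile-Agent repository; the function names and the `ThreadPoolExecutor` arrangement are illustrative assumptions for speculatively running the next-stage planning call while the reflection agent verifies the previous step:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real agent calls -- in practice each of
# these would be a chat-completion request to GPT-4o / gpt-4o-mini with a
# prompt and the current screenshot attached.
def run_reflection(history, screenshot):
    """Ask the reflection agent whether the previous operation succeeded."""
    return True  # placeholder verdict

def run_planning(history, screenshot):
    """Ask the planning agent for the next action, assuming success."""
    return "tap(search_box)"  # placeholder action

def step(history, screenshot):
    # Launch reflection and the next-stage planning call concurrently, so
    # the planning latency is hidden behind the reflection latency.
    with ThreadPoolExecutor(max_workers=2) as pool:
        reflection_future = pool.submit(run_reflection, history, screenshot)
        planning_future = pool.submit(run_planning, history, screenshot)

        if reflection_future.result():
            # Previous operation was correct: reuse the speculative plan.
            return planning_future.result()

        # Previous operation failed: drop the speculative plan and re-plan
        # with the failure fed back into the history.
        planning_future.cancel()
        return run_planning(history + ["previous operation failed"], screenshot)

if __name__ == "__main__":
    print(step(["open app"], "screenshot.png"))
```

If the reflection agent confirms success on most steps, each step then costs roughly the maximum of the reflection and planning latencies rather than their sum.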
