Error when running run_evaluation_multi_uniad.sh #114
Comments
Hi @Eden-Wang1710, CARLA itself is not very stable. Please refer to #32 and #111.
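Hello, my run has already been going for two hours. 4 of the 8 JSON files show complete, there are already 12 folders of saved images, and images have been written to all of them.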
Good, just resume it. Please refer to #89.
Thanks. When running on 8 GPUs, will the results be saved in 8 JSON files or in 220 JSON files?
8 JSON files. The 220 routes are divided into 8 equal parts.
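As a rough illustration of that division (this is not the code in split_xml.py, and the helper name `split_evenly` is hypothetical), the sketch below splits 220 route indices into 8 contiguous parts. Since 220 is not a multiple of 8, this sketch assumes the parts may differ in size by one; the actual script may distribute the remainder differently.

```python
def split_evenly(items, num_parts):
    """Split items into num_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), num_parts)
    chunks = []
    start = 0
    for i in range(num_parts):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Assumption: routes are identified by the indices 0..219.
route_ids = list(range(220))
chunks = split_evenly(route_ids, 8)
sizes = [len(chunk) for chunk in chunks]
print(sizes)  # [28, 28, 28, 28, 27, 27, 27, 27]
```

Each chunk would then correspond to one task and one result JSON.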
Yes, you are right.
Weird, my eight GPUs add up to 148 routes in total.
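One way to check such a total is to sum the progress counters over all result JSON files. The sketch below assumes each file has a `_checkpoint` object with a `progress` pair of `[finished, total]`; that layout and the function name `summarize` are assumptions for illustration and should be verified against your own files.

```python
import json


def summarize(checkpoint_texts):
    """Sum finished and total route counts over several result JSON strings."""
    done = total = 0
    for text in checkpoint_texts:
        # Assumed layout: {"_checkpoint": {"progress": [finished, total], ...}}
        progress = json.loads(text).get("_checkpoint", {}).get("progress", [])
        if len(progress) == 2:
            done += progress[0]
            total += progress[1]
    return done, total


# Made-up sample data standing in for two of the eight files.
samples = [
    json.dumps({"_checkpoint": {"progress": [19, 19], "records": []}}),
    json.dumps({"_checkpoint": {"progress": [10, 18], "records": []}}),
]
print(summarize(samples))  # (29, 37)
```

For real files, read each JSON file's contents into a string and pass the list of strings to `summarize`.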
split_xml.py is very clear and concise. Please check split_xml.py.
Thanks! But why is the task number set directly to 12 here? In my understanding, it should follow TASK_LIST. Also, you recommend a GPU:Task ratio of 1:2, but the current code that allocates routes to each GPU seems to use 1:1 by default. Do we need to modify it if we want to run 2 tasks per GPU? Thanks again for your kind help!
@Eden-Wang1710 Thanks for finding the typo. The task number should be 8 and equal to len(TASK_LIST). I will fix it in the next update.
Hi, after resuming:
GPU 5: [screenshot not preserved]
GPU 6: [screenshot not preserved]
GPU 7: [screenshot not preserved]
You can comment out these crashed routes in the XML file and then resume (see #89). These routes may have crashed because of the agent's behavior.
Yes, I checked the saved images, and the routes did crash because of the agent's behavior. I'm now trying to modify the script so that it automatically resumes the program and skips the crashed routes. Resuming and commenting out routes manually is cumbersome.
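A minimal sketch of the skipping step, assuming the route file has a `<routes>` root with `<route id="...">` children (an assumption about the file layout; `drop_routes` is a hypothetical helper, not part of the repository). It removes the crashed routes instead of commenting them out, so write the result to a new file and keep the original.

```python
import xml.etree.ElementTree as ET


def drop_routes(xml_text, crashed_ids):
    """Return the route XML with the given route ids removed."""
    root = ET.fromstring(xml_text)
    crashed = {str(route_id) for route_id in crashed_ids}
    # Iterate over a copy, since children are removed from root in the loop.
    for route in list(root.findall("route")):
        if route.get("id") in crashed:
            root.remove(route)
    return ET.tostring(root, encoding="unicode")


# Made-up example input; real route files contain more per-route detail.
example = (
    '<routes>'
    '<route id="1" town="Town01"/>'
    '<route id="2" town="Town02"/>'
    '<route id="3" town="Town03"/>'
    '</routes>'
)
cleaned = drop_routes(example, [2])
remaining = [r.get("id") for r in ET.fromstring(cleaned).findall("route")]
print(remaining)  # ['1', '3']
```

The list of crashed ids still has to come from inspecting the logs or saved images, as described above.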
For safety reasons, we do this manually.
Roughly how many times do you usually have to resume and comment out routes for one model?
Our eval JSON is open source. Please check the eval JSON.
Hi, sorry for bothering you again.
When running run_evaluation_multi_uniad.sh on 8 GPUs for 2 hours, I encountered this error: [screenshot not preserved]
At the same time, I found that the memory usage of some GPUs dropped, see GPU 5 and 6: [screenshot not preserved]
Also, only 8 JSON files have been recorded, and no JSON file is being updated anymore. I guess something goes wrong when loading a new route after finishing one?