Connection error: HTTPConnectionPool(...): Read timed out. #775
Comments
We should probably try to display a better error message (somehow?), but the reason you got a timeout error is most likely that your processing node ran out of memory and the operating system started killing processes at random to get the memory back, probably causing the timeout. 29GB of RAM is probably enough to process ~1000-1500 images, not 6500 (without split-merge, see https://docs.opendronemap.org/large.html) |
Thanks for the explanation and the pointer to the split option! 2 questions:
Here is the JSON response for the processing list: [
{
"id":"9e04ce81-503b-4b60-ac0d-3190ef704696",
"project":1,
"processing_node":1,
"processing_node_name":"node-odm-1",
"can_rerun_from":[
"",
"dataset",
"split",
"merge",
"opensfm",
"mve",
"odm_filterpoints",
"odm_meshing",
"mvs_texturing",
"odm_georeferencing",
"odm_dem",
"odm_orthophoto"
],
"uuid":"",
"name":"2018 Unified",
"processing_time":-1,
"auto_processing_node":false,
"status":30,
"last_error":"Connection error: HTTPConnectionPool(host='webodm_node-odm_1', port=3000): Read timed out. (read timeout=30)",
"options":[
{
"name":"pc-classify",
"value":true
},
{
"name":"orthophoto-resolution",
"value":"4"
},
{
"name":"dtm",
"value":true
},
{
"name":"dem-resolution",
"value":"10"
},
{
"name":"dsm",
"value":true
},
{
"name":"verbose",
"value":true
}
],
"available_assets":[
],
"created_at":"2019-12-13T20:13:09.149805Z",
"pending_action":null,
"public":false,
"resize_to":-1,
"upload_progress":0.0,
"resize_progress":0.0,
"running_progress":0.0,
"import_url":"",
"images_count":5863,
"partial":false
}
] |
I have the feeling that once the task gets a timeout, its status is no longer checked properly. Uploading always seems to work fine, and sometimes I am able to see the task output in the dashboard. But after a single timeout, it seems that even fully refreshing the page does not reconnect to the processing node. The last job finished and I needed to download the zip file from the processing node and then upload it into the dashboard. |
Possibly related: #727 |
I've done a little bit of digging, and I believe that the reason why this happens is that we don't get a response with a UUID (needed to track the progress) from (see PyODM/api.py#L268 for the start of the code). Potential solution:
I've made two PRs with the (potential) solution. I still need to verify it. See OpenDroneMap/PyODM#17 and #966. |
Workaround:
$ docker exec -it db bash
# psql --user postgres --db webodm_dev
SELECT id,project_id,name,uuid,status,pending_action,last_error FROM app_task WHERE uuid=''; -- It may be `uuid is NULL` instead of `uuid=''`
UPDATE app_task SET uuid='<UUID from worker>',status=20,last_error=NULL WHERE id='<id with missing uuid>'; |
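Hey @tsmock ✋ thanks for looking into this! The idea of having create_task return a UUID in case of error is a bit... of a hack (I like it, but there might be a better way). The idea is that creating a task is either a pass or a fail. If it fails, there shouldn't be a UUID. The case of this error seems to be that:
1. Upload begins
2. create_task returns a timeout error because a call somewhere (either /task/new/upload or, based on the observations, much more likely /task/new/commit) exceeds the 30-second limit
3. WebODM records the timeout error (assumes a fail), but eventually the request completes in NodeODM and the task happily starts, without WebODM knowing.
To me this points to something we should fix in NodeODM https://github.com/OpenDroneMap/NodeODM/blob/master/libs/taskNew.js#L227. Another (simpler) option could be to increase the timeout for the call to /task/new/commit in PyODM. I think a simple way to test the hypothesis that this is indeed a problem with /task/new/commit would be to modify NodeODM's createTask function to have a long setTimeout (longer than 30 seconds). You can launch NodeODM in test mode with:
node index.js --test
From NodeODM's directory. Let me know what you find! |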
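The three-step failure described above can be reproduced in miniature without NodeODM or PyODM. The following is only a sketch under stated assumptions: `SlowCommitHandler`, `run_demo`, and `created_tasks` are hypothetical names, the server is a local stand-in built on Python's standard library, and the delays are scaled down from the 30-second limit to fractions of a second. It shows a client giving up on a slow request while the server still finishes creating the task.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

created_tasks = []  # hypothetical stand-in for the node's task list


class SlowCommitHandler(BaseHTTPRequestHandler):
    """Stand-in for a slow commit endpoint (not NodeODM's real code)."""

    def do_POST(self):
        time.sleep(0.5)                 # commit takes longer than the client waits
        created_tasks.append("task-1")  # the task is created regardless
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'{"uuid": "task-1"}')
        except OSError:
            pass                        # the client has already hung up

    def log_message(self, *args):
        pass                            # keep the demo quiet


def run_demo(client_timeout=0.1):
    created_tasks.clear()
    server = HTTPServer(("127.0.0.1", 0), SlowCommitHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/task/new/commit"
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        urllib.request.urlopen(request, timeout=client_timeout)
        client_saw_error = False
    except OSError:                     # read timeouts are OSError subclasses
        client_saw_error = True
    time.sleep(1.0)                     # give the server time to finish
    server.shutdown()
    server.server_close()
    return client_saw_error, list(created_tasks)
```

Calling `run_demo()` returns a pair: whether the client saw an error, and which tasks the server ended up creating. The client never receives the UUID even though the task exists on the server side.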
I think that the assumption is the problem:
"WebODM records the timeout error (assumes a fail)"
A longer timeout is a start. Plus, some retries could make sense!
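As a rough illustration of the retry idea, here is a minimal sketch of retrying with exponential backoff. `call_with_retries` and its parameters are hypothetical and not part of PyODM or WebODM. Retrying is generally only safe for idempotent calls such as a status check; blindly retrying task creation could create duplicate tasks.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    TimeoutError and ConnectionError are both OSError subclasses, so the
    default covers socket-level timeouts. Pass a different tuple for
    library-specific exception types.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x ... the base delay between attempts.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` applies.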
…On Sat, Mar 6, 2021 at 3:30 PM Piero Toffanin ***@***.***> wrote:
Hey @tsmock <https://github.com/tsmock> ✋ thanks for looking into this!
The idea of having create_task return a UUID in case of error is a bit...
of a hack (I like it, but there might be a better way). The idea is that
creating a task is either a pass or a fail. If it fails, there shouldn't be
a UUID.
The case of this error seems that:
1. Upload begins
2. create_task returns a timeout error because a call somewhere (either
/task/new/upload or, based on the observations, much more likely
/task/new/commit) exceeds the 30-second limit
3. WebODM records the timeout error (assumes a fail), but eventually
the request completes in NodeODM and the task happily starts, without
WebODM knowing.
To me this points to something we should fix in NodeODM
https://github.com/OpenDroneMap/NodeODM/blob/master/libs/taskNew.js#L227
Another (simpler) option could be to increase the timeout for the call to
/task/new/commit in PyODM.
I think a simple way to test the hypothesis that this is indeed a problem
with /task/new/commit would be to modify NodeODM's createTask function to
have a long setTimeout (longer than 30 seconds). You can launch NodeODM in
test mode with:
node index.js --test
From NodeODM's directory.
Let me know what you find!
|
I looked into it because it affected me. :)
I agree that it is a bit of a hack. However, it does (at least) put the UUID in the db so that we could (potentially) recheck every so often. I was originally going to suggest increasing the timeout, but the issue is that no matter what the timeout is, the call can still time out and yet successfully create a task. My other solution would have been to add a callback of some sort, such that the node can tell the caller "hey, I'm done now!" So I went with the "UUID in error" route just so that we could record the UUID in the database, and take further action later (e.g., check and see if it is actually running on the node). Anyway, I probably won't be able to test until next weekend (I've got a run going right now, and it is at the 21 hour mark so I really don't want to accidentally screw it up). Sometime I'll have to look into reusing older pipelines with additional images (it is still matching images together, and it's a project where I'm adding new imagery every week or so). |
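The "record the UUID, then recheck later" idea could look roughly like the sketch below. Everything here is an assumption for illustration: `recheck_task` and `fetch_node_info` are hypothetical names, the task is modeled as a plain dict, and the status values 20 and 30 are simply the ones that appear in this thread (30 on the timed-out task, 20 in the workaround), read here as "running" and "failed".

```python
STATUS_RUNNING = 20  # value the workaround in this thread writes back
STATUS_FAILED = 30   # value shown on the timed-out task in this thread


def recheck_task(task, fetch_node_info):
    """Return an updated copy of `task` if the node still knows about it.

    `fetch_node_info(uuid)` is a hypothetical callable that returns a dict
    describing the task on the node, or None if the node has no such task.
    """
    if task["status"] != STATUS_FAILED or not task.get("uuid"):
        return task                     # nothing to reconcile
    info = fetch_node_info(task["uuid"])
    if info is None:
        return task                     # the node really has no such task
    updated = dict(task)
    updated["status"] = STATUS_RUNNING  # the task did start after all
    updated["last_error"] = None
    return updated
```

A periodic job could run this over every failed task that has a recorded UUID, which is the automated equivalent of the manual database update in the workaround.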
You're perfectly right, not an ideal solution. I think NodeODM should return immediately a UUID after a call to All of this assuming the call to |
Sorry for the late update; this should be fixed / significantly improved with the changes in OpenDroneMap/NodeODM@18a714b which will be merged in NodeODM soon. 🥂 |
Changes are in. Please update and see if the issue persists? If it does, feel free to re-open this. 🙏 |
What's your browser and operating system?
Chrome 79 on macOS (High Sierra)
What is the problem?
Connection error: HTTPConnectionPool(host='webodm_node-odm_1', port=3000): Read timed out. (read timeout=30)
What should be the expected behavior?
web-odm can reconnect to a running nodeodm processing node
How can we reproduce this?
29GB for 5800 images. DEM+DTM settings. I allocated 2 CPUs and 48GB RAM to the nodeodm container, and the CPUs are running flat out. I can reach webodm_node-odm_1:3000 from the broker/redis and webapp command lines (if I bash into the containers and run
curl http://webodm_node-odm_1:3000
). webapp is accessible via traefik; the rest of the containers are on a separate network.
docker-compose.yml:
Maybe the nodeodm container is too busy processing to bother answering to webapp... but then I would expect it to be impossible to get an answer from
curl http://webodm_node-odm_1:3000
Does webodm continue to retry, or does it give up once it gets the read timeout?
Sorry if this is already known. I couldn't find anything about this error, and it seems that the task is actually running, so I would expect webodm to keep updating the running time to show that the task is still alive and ongoing.