Fatal error during training #89
Comments
It looks like this is failing to update the config file because the file doesn't exist for some reason... I will try to reproduce this locally and push a fix! In the meantime, can you clarify a few things for me?
This is a fresh deployment with d5f41b7. The workflow that triggered this was retraining after initial labeling -> train -> active learning. We have two identical files under .histomicsui_config.yaml, neither generated by hand.
I haven't been able to reproduce the error yet so this is very helpful, thank you! Can you please try to open both and make sure they are both valid yaml files? In the meantime you should be able to safely delete both files and this should clear up the error (the config will be recreated when the UI is accessed again).
It's valid yaml. Both files have identical sha512 hashes. Thanks for the quick workaround. Could this be a permissions problem?
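For anyone repeating the check described above, here is a minimal sketch of comparing two downloaded copies of the config by sha512. It uses only the Python standard library; the function names `sha512_of` and `files_identical` are hypothetical helpers, not part of Girder or HistomicsUI.

```python
import hashlib


def sha512_of(path, chunk_size=65536):
    """Return the hex sha512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """True when both files have the same sha512 digest (hypothetical helper)."""
    return sha512_of(path_a) == sha512_of(path_b)
```

Matching digests only show that the two copies have the same bytes; they say nothing about whether the content parses as yaml.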
If it were a permissions issue I would expect an error about "access denied" or something along those lines... The PUT request should be creating or re-using the [...] Have you seen this with any other projects?
@manthey I cannot reproduce this behavior. Do you have any ideas on how we may have ended up in this state? It seems like there was an attempt to remove the old file, but for some reason it didn't exist in the assetstore, and we ended up with two copies of the config and a broken project...
I have a hypothesis about what is happening: if two requests to write a config file arrive within a short enough time span, the same girder item could end up with two yaml files. The solution is either to add a guard to prevent this (since Mongo doesn't have cross-collection transactions) or to repair the duplicate after it has occurred. There will be a PR in large_image to address this condition.
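To illustrate the race described above, here is a small self-contained sketch. It is not the large_image fix and does not use any Girder API: `FakeItem`, `write_unguarded`, and `write_guarded` are hypothetical stand-ins, and a `threading.Barrier` is used only to force the bad interleaving deterministically.

```python
import threading

CONFIG_NAME = ".histomicsui_config.yaml"


class FakeItem:
    """Hypothetical in-memory stand-in for an item that holds files."""

    def __init__(self):
        self.files = []  # list of (name, content) tuples
        self.lock = threading.Lock()


def write_unguarded(item, content, barrier):
    # Check-then-act: look for an existing config, then write.
    exists = any(name == CONFIG_NAME for name, _ in item.files)
    barrier.wait()  # both writers finish the check before either writes
    if exists:
        item.files[:] = [(n, c) for n, c in item.files if n != CONFIG_NAME]
    item.files.append((CONFIG_NAME, content))


def write_guarded(item, content):
    # The lock makes "remove old copy, add new copy" a single step.
    with item.lock:
        item.files[:] = [(n, c) for n, c in item.files if n != CONFIG_NAME]
        item.files.append((CONFIG_NAME, content))


def run_concurrently(target, args_list):
    threads = [threading.Thread(target=target, args=args) for args in args_list]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def demo():
    racy = FakeItem()
    barrier = threading.Barrier(2)
    run_concurrently(write_unguarded, [(racy, "a: 1", barrier)] * 2)

    safe = FakeItem()
    run_concurrently(write_guarded, [(safe, "a: 1")] * 2)
    return len(racy.files), len(safe.files)
```

In this sketch the unguarded writers both see "no config yet" and each append a copy, while the guarded writers leave exactly one. An in-process lock like this would not be enough for a server running several processes; it only shows the shape of the guard.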
@bnmajor provided an easy workaround for the time being. Not urgent to address.
I think this will have been eliminated via girder/large_image#1467. If you have the latest DSA containers and you see it again, please reopen the issue.
The system crashes when retraining after labeling several chips in the active learning task. Girder is trying to os.unlink something that does not exist.

[2024-02-07 21:52:01,937] ERROR: Failed to delete file /assetstore/4f/23/4f23e7e90af0798b8f38c7afa294c722320357446bf80409eb040371f450f6f998bc8ec87de6ab1679429e4800aba1a2d0b70477cd4694526f1508c6d8a23459
Traceback (most recent call last):
  File "/opt/girder/girder/utility/filesystem_assetstore_adapter.py", line 306, in deleteFile
    os.unlink(path)
FileNotFoundError: [Errno 2] No such file or directory: '/assetstore/4f/23/4f23e7e90af0798b8f38c7afa294c722320357446bf80409eb040371f450f6f998bc8ec87de6ab1679429e4800aba1a2d0b70477cd4694526f1508c6d8a23459'
Additional info:
  Request URL: PUT http://127.0.0.1:8080/api/v1/folder/65c2f5d5201f96fa9ca19598/yaml_config/.histomicsui_config.yaml
  Query string:
  Remote IP: 172.25.0.1
  Request UID: 6f42862a-8b20-4f46-bc2b-c84ca093cfc6

[2024-02-07 21:59:20,409] ERROR: Failed to delete file /assetstore/94/98/9498c946becfd894633f47c3eccb0e61c7ba216d4576b485f57f682747c22ba34e3e2a6e0a152037a4a9738c42292b9978a013b062daf6ba32a5f1b072dd3ccb
Traceback (most recent call last):
  File "/opt/girder/girder/utility/filesystem_assetstore_adapter.py", line 306, in deleteFile
    os.unlink(path)
FileNotFoundError: [Errno 2] No such file or directory: '/assetstore/94/98/9498c946becfd894633f47c3eccb0e61c7ba216d4576b485f57f682747c22ba34e3e2a6e0a152037a4a9738c42292b9978a013b062daf6ba32a5f1b072dd3ccb'
Additional info:
  Request URL: PUT http://127.0.0.1:8080/api/v1/folder/65c2f5d5201f96fa9ca19598/yaml_config/.histomicsui_config.yaml
  Query string:
  Remote IP: 172.25.0.1
  Request UID: 752968ce-1106-4f6f-8ccf-49a5cc90eae3
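The traceback above comes from `os.unlink` raising `FileNotFoundError` on a path that is already gone. As a general illustration only (this is not Girder's `deleteFile` code, and `unlink_if_present` is a hypothetical name), a delete can be made idempotent by treating a missing file as already deleted:

```python
import os


def unlink_if_present(path):
    """Delete `path`; return False instead of raising if it is already gone."""
    try:
        os.unlink(path)
        return True
    except FileNotFoundError:
        # Another writer (or an earlier attempt) removed the file first.
        return False
```

Catching the exception rather than checking `os.path.exists` first avoids a second check-then-act window between the check and the unlink.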