Deposition path gets wiped during archive cleanup and we don't tell the user this will happen #540

Open
krivard opened this issue Jan 23, 2025 · 2 comments
Labels
bug (Something isn't working), good-first-issue

Comments

krivard commented Jan 23, 2025

When using --fsspec, the --deposition-path option receives the directory where the user expects files to be downloaded.

The orchestrator deletes the contents of this directory during cleanup, regardless of whether those files were created by the archiver or predate the run:

https://github.com/catalyst-cooperative/pudl-archiver/blob/43eacab34afb356ec430a698617e883a8eea568a/src/pudl_archiver/orchestrator.py#L29C1-L33C54

This could result in data loss if the user specifies a directory that is used for multiple purposes. We do not inform them of this risk in the README.

Potential resolutions:

  1. Warn the user that the --deposition-path directory gets wiped during cleanup, so they know what to expect. Candidate locations:
  2. Create and destroy our own directory within --deposition-path so it's guaranteed to be used only by us
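
A minimal sketch of option 2, for illustration only. It assumes `--deposition-path` points at a local filesystem directory, and it is not the orchestrator's actual code: the helper name `private_workdir` and the `pudl-archiver-` prefix are hypothetical. The idea is to create a uniquely named subdirectory inside the user's directory, work only there, and remove only that subdirectory during cleanup.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def private_workdir(deposition_path, prefix="pudl-archiver-"):
    """Yield a fresh subdirectory of deposition_path; delete only that subdirectory on exit.

    Hypothetical helper: the name and prefix are assumptions, not part of pudl-archiver.
    """
    base = Path(deposition_path)
    base.mkdir(parents=True, exist_ok=True)
    # mkdtemp picks a unique name, so we never collide with the user's own files.
    workdir = Path(tempfile.mkdtemp(prefix=prefix, dir=base))
    try:
        yield workdir
    finally:
        # Cleanup is scoped to the directory we created, not to deposition_path itself.
        shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as user_dir:
        # A file that predates the run and must survive cleanup.
        existing = Path(user_dir) / "unrelated.csv"
        existing.write_text("keep me")

        with private_workdir(user_dir) as workdir:
            (workdir / "download.zip").write_text("archive contents")

        print("pre-existing file still present:", existing.exists())
        print("private workdir still present:", workdir.exists())
```

With this shape, anything the user already had in the directory is never touched. Whether downloaded files should be moved out of the private directory before it is removed is a separate design question that this sketch does not answer.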

jdangerx commented Jan 23, 2025

Thanks for writing this up! I would personally vote for "don't do something scary" over "warn the user that the code does something scary, then do the thing anyway."

Creating a temp directory seems like a good enough workaround. If y'all want to take this on Margay, go for it - otherwise I'm happy to pull it into Inframundo-land for next sprint.

jdangerx moved this from New to Backlog in Catalyst Megaproject on Jan 23, 2025
e-belfer added the bug label on Jan 24, 2025

jdangerx commented Feb 3, 2025

@mariahjrogers this might be a good issue to jump into, instead of #193 - let me know if this piques your interest!
