Local cache verification and update #38

nocollier · 2024-04-08T19:53:20Z

From the ESGF Compute Working Team: Especially in JupyterHub instances, it is common for teams to share a local cache of datasets on which they base their work. At some point (perhaps just before publishing a paper) they would like to update the datasets to their latest version. A few ideas and issues arise here:

We could do this automatically by walking through the cache, querying each dataset again, and comparing versions.
Any updating should be loud and ask for confirmation; users need to understand what they are doing.
We should also look at checksums of the old files. It may be that a file has been changed locally to fix an error, and an update would overwrite that local fix and re-introduce the error.
This also carries the issue of one member of the group triggering an update that affects the analysis scripts written by others with no warning.
A first idea for addressing this: when to_dataset_dict is called, a local (perhaps hidden) file is saved detailing what was used. Then, when the script is run again, we can check whether anything differs and warn the user.
We probably need to do this anyway, as intake-esgf defaults to always downloading the most up-to-date data. So while the old version wouldn't be deleted from the cache, we would automatically (and silently) download and use the updated version.
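To make the checksum and "record what was used" ideas above concrete, here is a rough sketch using only the Python standard library. It is not how intake-esgf works today: the function names (`sha256sum`, `write_manifest`, `check_manifest`) and the JSON manifest layout are hypothetical, and it assumes we already have the list of local file paths that a call such as to_dataset_dict ended up reading.

```python
import hashlib
import json
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(files, manifest_path):
    """Record the checksum of every file used in an analysis.

    Hypothetical helper: the manifest is a JSON object mapping
    each file path to its SHA-256 checksum.
    """
    manifest = {str(p): sha256sum(p) for p in files}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def check_manifest(manifest_path):
    """Compare the cache with a saved manifest.

    Returns a dict mapping each recorded path to
    "ok", "modified" or "missing".
    """
    recorded = json.loads(Path(manifest_path).read_text())
    status = {}
    for name, checksum in recorded.items():
        if not Path(name).is_file():
            status[name] = "missing"
        elif sha256sum(name) != checksum:
            status[name] = "modified"
        else:
            status[name] = "ok"
    return status
```

On a second run of a script, anything reported as "modified" or "missing" would be a reason to warn the user before touching the cache. A fuller version would probably also record the dataset version next to each checksum so that a version bump can be told apart from a local edit.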

nocollier · 2024-04-08T19:54:55Z

Tagging those who I recall had input here in case I missed something. Will see what we can do here. @huard @aspinuso

nocollier added the enhancement label on Apr 18, 2024
