I suspect that this may end up as a 'won't fix' but I am noting the issue here so that, if that is the case, we can record the decision. This problem was observed in a preproduction IDS where the connected ICAT has a subset of the production database.
In normal operation, the IDS puts data in its cache. If it finds that the volume of data in its cache has exceeded the high threshold, it requests that the storage plugin walk the filesystem, finding the list of 'old' files which, if deleted, would free up enough space to take it below its low threshold. It then looks up the file locations in ICAT and loops over the results to request that the files are archived. (Any that are currently requested won't be archived because of the logic in the deferredOpsQueue).
So, if none of the 'old' files are found in ICAT, the Tidier never frees up any space. What it doesn't do (and it is debatable whether it should) is try to delete more data to compensate for the files it skipped.
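The flow described above can be sketched as follows. This is a minimal illustration, not the actual IDS implementation: the class, the `CandidateFile` type, the threshold parameters, and the ICAT lookup are all hypothetical names standing in for the real storage plugin and `deferredOpsQueue` machinery.

```java
import java.util.List;

/**
 * Sketch of the Tidier behaviour at issue. When cache usage exceeds the
 * high threshold, candidate 'old' files are walked oldest-first; files
 * unknown to ICAT are skipped silently, so their space is never freed.
 * All names here are illustrative, not the real IDS API.
 */
public class TidierSketch {

    static class CandidateFile {
        final String location;
        final long size;
        CandidateFile(String location, long size) {
            this.location = location;
            this.size = size;
        }
    }

    // Hypothetical stand-in for the ICAT location lookup.
    static boolean knownToIcat(String location, List<String> icatLocations) {
        return icatLocations.contains(location);
    }

    /**
     * Returns the total size (bytes) scheduled for archiving. If every
     * candidate is unknown to ICAT, this returns 0 and no space is freed,
     * even though usage remains above the high threshold.
     */
    static long tidy(long used, long highThreshold, long lowThreshold,
                     List<CandidateFile> oldestFirst, List<String> icatLocations) {
        if (used <= highThreshold) return 0;   // nothing to do yet
        long toFree = used - lowThreshold;     // aim: get below the low threshold
        long scheduled = 0;
        for (CandidateFile f : oldestFirst) {
            if (scheduled >= toFree) break;
            if (knownToIcat(f.location, icatLocations)) {
                scheduled += f.size;           // real IDS would enqueue an archive here
            }
            // else: skipped silently -- the behaviour this issue is about
        }
        return scheduled;
    }
}
```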
I think the behaviour here may not be optimal but I'm not sure the use case of lots of unknown data sitting in the IDS cache was foreseen or even if it should be accommodated. Could this problem arise in a production environment?
Solutions to this are complicated because the Tidier delegates finding files to the storage plugin and archiving/deleting files to the deferredOpsQueue. Some possible approaches:
- We issue a warning when an 'old' file is not found in ICAT (and therefore the space it occupies won't be freed).
- We aggregate the size of all the skipped files in the loop and, if this total is >= the difference between the thresholds, then we know that the disk space will never be freed. We issue an error in this case, as the Tidier can no longer prevent the disk from running out of space.
- We document in the installation instructions that the main storage should be empty or contain only ICAT files.
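The second approach could be sketched as below: keep a running total of the skipped sizes during the loop and escalate from a warning to an error once that total alone covers the gap between the thresholds. Class and method names are illustrative, not part of the IDS codebase.

```java
/**
 * Sketch of the skipped-space check proposed above. If the files the
 * Tidier skips (because ICAT does not know them) together exceed the
 * high-low threshold gap, tidying can never bring usage back below the
 * low threshold, so a plain warning is no longer enough.
 */
public class SkippedSpaceCheck {

    public enum Severity { OK, WARN, ERROR }

    /**
     * @param skippedBytes   total size of 'old' files not found in ICAT
     * @param highThreshold  cache usage that triggers tidying
     * @param lowThreshold   target usage after tidying
     */
    public static Severity assess(long skippedBytes,
                                  long highThreshold, long lowThreshold) {
        if (skippedBytes == 0) return Severity.OK;
        // Skipped files alone span the whole gap: the Tidier is stuck.
        if (skippedBytes >= highThreshold - lowThreshold) return Severity.ERROR;
        return Severity.WARN;
    }
}
```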
There are a few fundamental assumptions that the design of IDS is based upon. These assumptions include:
- IDS has exclusive access to the main and archive storage. If other processes have access to the storage, those processes must behave, i.e. their actions must be in line with the internal processes within IDS, and concurrent access must be protected using locking.
- The files in the storage must always be consistent with the content of ICAT, i.e. for every Datafile object in ICAT (having a non-NULL location attribute) there must be a corresponding file in the storage, and vice versa.
If these assumptions are not met, this constitutes an out-of-specification use which results in undefined behavior. Based on this, we could consider this bug as invalid.
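The consistency assumption could be verified out of band by comparing the set of Datafile locations in ICAT against the set of files actually in main storage; any location present on only one side violates it. A minimal sketch, with both inputs and all names hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch of checking the ICAT/storage consistency assumption: every
 * non-NULL Datafile location in ICAT should have a file in storage,
 * and vice versa. Inputs would come from an ICAT query and a storage
 * walk respectively; this class is illustrative only.
 */
public class ConsistencyCheck {

    /** Returns locations present on exactly one side (the violations). */
    public static Set<String> mismatches(Set<String> icatLocations,
                                         Set<String> storageFiles) {
        Set<String> onlyInIcat = new HashSet<>(icatLocations);
        onlyInIcat.removeAll(storageFiles);    // in ICAT, missing on disk
        Set<String> onlyOnDisk = new HashSet<>(storageFiles);
        onlyOnDisk.removeAll(icatLocations);   // on disk, unknown to ICAT
        onlyInIcat.addAll(onlyOnDisk);
        return onlyInIcat;
    }
}
```

Files reported by this check that are "on disk, unknown to ICAT" are exactly the ones the Tidier would skip.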
However, I just checked the sources: error handling in the Tidier is essentially non-existent. So I agree with your first bullet and would even say that the Tidier should log an error for every Dataset or Datafile reported by the plugin that it finds missing in ICAT. In that sense, this issue is a variant of #79. Obviously this would only help if anybody actually checks the error logs. I also agree with your last bullet that the documentation may need improvement.