Automatic submission state reaping #111

Open
2 of 8 tasks
karlcz opened this issue Nov 23, 2020 · 3 comments

karlcz commented Nov 23, 2020

Assuming the ingest workflow ends with a submission loaded into the registry and built as a review catalog (unless it hit a pre-catalog failure), define and add mechanisms to clean up review catalogs based on lifecycle status, replacing the old reaping strategy.

  • Enable restricted ACLs (and org policy?) so all new catalogs should be in CFDE registry
  • Disable legacy cron job in dev VPC which purges catalogs w/o understanding CFDE registry
  • Purge catalog w/ delay after some terminal failure status is reached?
  • Allow a failed-hold status to keep (partial) results for possible problem determination, with user input releasing this hold state?
  • Have secondary timeout based reaping even for held failures? (prevent neglect-based resource leaks)
  • What about rejected catalogs (not ingest failures)?
  • Have a cleanup policy for approved/released content?
  • Any quota- or queue-based limits or triggers to prevent large accumulation of submissions for one DCC?

It seems plausible we would want to keep a successful review catalog not only through the review period but even after approval, for as long as it is still the pending release content for a DCC. Once the same content has been released, the review catalog is redundant, but may be convenient to look at in standalone review mode.

For rejected submissions, should we delete quickly or have some hold period before we clear them? Or, does it depend on whether a subsequent submission has been made...?

karlcz changed the title from "Automatic review catalog reaping" to "Automatic submission state reaping" on Dec 9, 2020

karlcz commented Feb 8, 2021

In prior discussions, I understood most of this to be declared out of scope for epic 2. Right @lliming ? Thus, I think I should remove the epic 2 association from this issue and just leave it in backlog for future work.

lliming commented Feb 9, 2021

Yes, this doesn't need to happen for Epic 2, so it can be left on the backlog.

karlcz commented Aug 16, 2021

We have a prototype expiration script in app-dev which successfully does some of these things via SQL set operations...

  1. Start with a set of candidates to purge
  2. From that set, subtract (EXCEPT) the candidates disqualified by other criteria
  3. Update the CFDE registry metadata with the planned purge
  4. Update the ermrest internal registry to emulate a DELETE /ermrest/catalog/N call for each purged catalog
  5. Wait for the usual ermrest maintenance to actually prune/drop the victims
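
For illustration only, steps 1-3 can be sketched with SQL set operations run through Python's `sqlite3`. The `submission` table, its `held` and `purge_planned` columns, and the sample rows are all hypothetical stand-ins; this is not the real CFDE registry schema, and the ermrest-side steps 4-5 are deliberately left out.

```python
import sqlite3

# Hypothetical, simplified stand-in for the registry. The real CFDE registry
# schema and the ermrest internal registry are NOT modeled here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE submission (
    id            TEXT PRIMARY KEY,
    status        TEXT NOT NULL,
    held          INTEGER NOT NULL DEFAULT 0,  -- assumed disqualifying flag
    purge_planned INTEGER NOT NULL DEFAULT 0   -- assumed "planned purge" marker
);
INSERT INTO submission (id, status, held) VALUES
    ('s1', 'failed',   0),
    ('s2', 'failed',   1),
    ('s3', 'approved', 0);
""")

# Steps 1-3 in one statement: candidates EXCEPT disqualified, then record
# the planned purge in the (stand-in) registry metadata.
conn.execute("""
UPDATE submission
SET purge_planned = 1
WHERE id IN (
    SELECT id FROM submission WHERE status = 'failed'  -- step 1: candidates
    EXCEPT
    SELECT id FROM submission WHERE held = 1           -- step 2: disqualified
)
""")
conn.commit()

planned = [row[0] for row in conn.execute(
    "SELECT id FROM submission WHERE purge_planned = 1 ORDER BY id")]
print(planned)
```

Doing the selection and the metadata update in a single statement keeps the candidate set and the recorded plan consistent, which is presumably part of the appeal of the set-based approach.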

The initial set of candidates is a UNION of these categories:

  • Submissions obsoleted by a newer "approved" submission for the same DCC
  • Submissions in a failed state (some may have an associated catalog)
  • Submissions more than 1 month old at the time the script runs

The set of disqualified candidates is a UNION of these categories:

  • Constituents of a pending (content-ready) release
  • Submissions made in the last 2 weeks
  • Most recent submission for each DCC, regardless of status
  • Most recent failed submission for each DCC
