Implement DeleteIndexes #94
Conversation
Codecov Report

Attention: ❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files:

@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
- Coverage   57.44%   56.49%   -0.95%
==========================================
  Files          12       12
  Lines         947     1039      +92
==========================================
+ Hits          544      587      +43
- Misses        359      401      +42
- Partials       44       51       +7

☔ View full report in Codecov by Sentry.
Add server endpoint to delete encrypted multihashes.
Force-pushed from 4688073 to bef626b.
Thank you for picking this up 👍 Left some suggestions re alternatives.
It would be great to also have benchmarks for deletions (not necessarily in this PR).
pebble/pebble.go (Outdated)

	}
}
if len(encValueKeys) == 0 {
	if err = batch.Delete(mhk.buf, pebble.NoSync); err != nil {
I have a suspicion that this would be slow at scale. But it is certainly a good first pass at implementing deletion 👍
Before doing it any other way, I recommend adding benchmarks. We can then iterate on alternative approaches.
On alternative approaches:
I have thought of two other ways of doing this, which can be mixed and matched for a high-performance opportunistic deletion mechanism (see the sketch after this list):
- Accumulate indexes marked for deletion, sort them, then delete in much larger bulks using range deletion.
- Update the merger implementation to intelligently recognise entries pending deletion and use the merging function to exclude them from the merge, which will effectively result in deletion. This is the same approach I originally implemented in the non-encrypted Pebble store for opportunistic deletion.
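A minimal sketch of what the first approach could look like, assuming a recent Pebble release where NewIter returns (iter, error). The helper flushDeletes and the pending slice are hypothetical names for illustration, not code from this PR:

package main

import (
	"bytes"
	"sort"

	"github.com/cockroachdb/pebble"
)

// flushDeletes sketches approach 1: sort the keys marked for deletion, scan
// the store between the smallest and largest marked key, and collapse runs of
// store-adjacent marked keys into range tombstones.
func flushDeletes(db *pebble.DB, pending [][]byte) error {
	if len(pending) == 0 {
		return nil
	}
	sort.Slice(pending, func(i, j int) bool {
		return bytes.Compare(pending[i], pending[j]) < 0
	})
	// Range bounds are end-exclusive, so pad the upper bound by one zero byte.
	upper := append(append([]byte{}, pending[len(pending)-1]...), 0)
	iter, err := db.NewIter(&pebble.IterOptions{
		LowerBound: pending[0],
		UpperBound: upper,
	})
	if err != nil {
		return err
	}
	defer iter.Close()

	batch := db.NewBatch()
	defer batch.Close()

	flushRun := func(start, end []byte) error {
		// A singleton run could use a point delete instead; kept uniform here.
		return batch.DeleteRange(start, append(append([]byte{}, end...), 0), pebble.NoSync)
	}

	var runStart, runEnd []byte
	i := 0
	for iter.First(); iter.Valid(); iter.Next() {
		// Skip marked keys that are not present in the store.
		for i < len(pending) && bytes.Compare(pending[i], iter.Key()) < 0 {
			i++
		}
		if i < len(pending) && bytes.Equal(pending[i], iter.Key()) {
			if runStart == nil {
				runStart = append([]byte{}, iter.Key()...)
			}
			runEnd = append(runEnd[:0], iter.Key()...)
			i++
			continue
		}
		// An unmarked store key breaks the current run of marked keys.
		if runStart != nil {
			if err := flushRun(runStart, runEnd); err != nil {
				return err
			}
			runStart = nil
		}
	}
	if runStart != nil {
		if err := flushRun(runStart, runEnd); err != nil {
			return err
		}
	}
	return batch.Commit(pebble.NoSync)
}

For hash-shaped multihash keys, adjacent runs may rarely exceed a single key, which is exactly the kind of question the suggested benchmarks should settle.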
This is at least a batched delete (batch.Delete and not s.db.Delete), so all the deletions are collected in a batch and applied in one shot; it should be faster than a large number of individual calls to s.db.Delete. A minimal sketch of the difference follows below.
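Assuming hypothetical db and keys variables (this is a sketch of the pattern, not the PR's code):

package main

import "github.com/cockroachdb/pebble"

// deleteKeys removes the given keys with a single batch commit instead of one
// db.Delete call per key.
func deleteKeys(db *pebble.DB, keys [][]byte) error {
	batch := db.NewBatch()
	defer batch.Close()
	for _, k := range keys {
		// batch.Delete only buffers a tombstone in memory; nothing is
		// written to the store yet.
		if err := batch.Delete(k, pebble.NoSync); err != nil {
			return err
		}
	}
	// A single commit applies all tombstones in one write, which is where the
	// speed-up over repeated individual deletes comes from.
	return batch.Commit(pebble.NoSync)
}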
Since this is done by GC, and is not in the critical path of indexing or lookup, it is OK for it to be slow. I think, for us, it is more important to make sure that no deletion work is done in the indexing and lookup path, even if that makes deletion less optimal. GC can crawl along in the background at a very slow pace, as long as it does not interfere with indexing or lookup.
The suggested approaches are good in general, but I think that approach 2 requires some work in the ingestion path during merge processing. Approach 1 may be more efficient, but if it only improves GC then the efficiency gain is not worth the cost of storing temporary data (which does have a cost: extra compaction work).
I suspect that the compaction cost would be lower with range deletion. For approach No. 2, I think all the changes would remain in this repo, if by ingestion-path changes we mean changes in storetheindex?
Thank you for implementing the batch deletion. That should help us gather some data to see if it needs optimisation at all 👍
By ingestion path, I mean during the process of ingestion. I want writing new index records to remain free from anything other than writing new records.
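For concreteness, a hedged sketch of what approach 2 could look like using Pebble's Merger hook: operands carrying a hypothetical tombstone marker are dropped while values are merged, so deletion happens during compaction rather than in the write path. None of the names, the marker, or the value encoding below come from this repo; they are assumptions for illustration:

package main

import (
	"bytes"
	"io"

	"github.com/cockroachdb/pebble"
)

// tombstone is a hypothetical sentinel prefix that marks a value as pending
// deletion. A real scheme would live alongside the store's value encoding.
var tombstone = []byte("\x00del")

// filteringMerger collects merge operands while excluding any operand that
// carries the tombstone marker, so marked entries disappear at merge time.
type filteringMerger struct {
	values [][]byte
}

func (m *filteringMerger) MergeNewer(value []byte) error {
	if !bytes.HasPrefix(value, tombstone) {
		m.values = append(m.values, append([]byte{}, value...))
	}
	return nil
}

func (m *filteringMerger) MergeOlder(value []byte) error {
	if !bytes.HasPrefix(value, tombstone) {
		// Prepend so the merged result stays in oldest-first order.
		m.values = append([][]byte{append([]byte{}, value...)}, m.values...)
	}
	return nil
}

func (m *filteringMerger) Finish(includesBase bool) ([]byte, io.Closer, error) {
	// Plain concatenation stands in for the store's real value encoding.
	return bytes.Join(m.values, nil), nil, nil
}

// filteringMergerSpec wires the merger into Pebble; the Name must stay stable
// across database opens.
var filteringMergerSpec = &pebble.Merger{
	Name: "filtering-merger",
	Merge: func(key, value []byte) (pebble.ValueMerger, error) {
		m := &filteringMerger{}
		if !bytes.HasPrefix(value, tombstone) {
			m.values = append(m.values, append([]byte{}, value...))
		}
		return m, nil
	},
}

Opening the store with pebble.Options{Merger: filteringMergerSpec} would wire this in; the trade-off discussed above still applies, since the filtering work then runs inside merge processing on the ingestion path.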