
Implement DeleteIndexes #94

Merged (3 commits into main, Nov 29, 2023)
Conversation

gammazero (Contributor)

No description provided.

codecov-commenter commented Nov 28, 2023

Codecov Report

Attention: 49 lines in your changes are missing coverage. Please review.

Comparison: base (cea5ef7) 57.44% vs. head (11a4d1a) 56.49% (-0.95%).

Files Patch % Lines
pebble/pebble.go 44.28% 34 Missing and 5 partials ⚠️
server/server.go 54.54% 8 Missing and 2 partials ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
- Coverage   57.44%   56.49%   -0.95%     
==========================================
  Files          12       12              
  Lines         947     1039      +92     
==========================================
+ Hits          544      587      +43     
- Misses        359      401      +42     
- Partials       44       51       +7     


Add server endpoint to delete encrypted multihashes.
masih (Member) left a comment


Thank you for picking this up 👍 Left some suggestions re alternatives.

It would be great to also have benchmarks for deletions (not necessarily in this PR).

pebble/pebble.go Outdated
}
}
if len(encValueKeys) == 0 {
if err = batch.Delete(mhk.buf, pebble.NoSync); err != nil {
masih (Member):
I have a suspicion that this would be slow at scale, but it is certainly a good first pass at implementing deletion 👍

Before doing it any other way, I recommend adding benchmarks. We can then iterate on alternative approaches.


On alternative approaches:

I have thought of two other ways of doing this, which can be mixed and matched for a high-performance opportunistic deletion mechanism:

  1. Accumulate the indexes marked for deletion, sort them, then delete them in much larger batches using range deletion.
  2. Update the merger implementation to intelligently recognise entries pending deletion and use the merging function to exclude them from the merge, which will effectively result in deletion. This is the same approach I originally implemented in the non-encrypted pebble store for opportunistic deletion.
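As a rough illustration of approach 1, the sketch below (not this PR's code; every name is invented) sorts the accumulated deletion targets and shows the standard prefix-successor computation used to form the exclusive end bound that a range deletion such as pebble's Batch.DeleteRange expects:

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// keyUpperBound returns the smallest key strictly greater than every key
// starting with prefix: the usual exclusive end bound when deleting an
// entire prefix via a range delete. It returns nil when no such bound
// exists (prefix is all 0xff bytes).
func keyUpperBound(prefix []byte) []byte {
	end := append([]byte(nil), prefix...)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] < 0xff {
			end[i]++
			return end[:i+1]
		}
	}
	return nil
}

// sortKeys orders the accumulated deletion targets so they can be applied
// in one sorted pass, with contiguous runs collapsible into range deletes.
func sortKeys(keys [][]byte) {
	sort.Slice(keys, func(i, j int) bool {
		return bytes.Compare(keys[i], keys[j]) < 0
	})
}

func main() {
	keys := [][]byte{[]byte("mh3"), []byte("mh1"), []byte("mh2")}
	sortKeys(keys)
	fmt.Printf("%s %s %s\n", keys[0], keys[1], keys[2]) // mh1 mh2 mh3
	fmt.Printf("%x\n", keyUpperBound([]byte{0x01, 0xff})) // 02
}
```

A real implementation would hand each sorted run of keys to the store's batch (point deletes for isolated keys, a range delete for a run known to contain only marked keys); that wiring is omitted here since it depends on the store's key layout.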

gammazero (Contributor, Author) commented Nov 29, 2023:

This is at least a batched delete (batch.Delete rather than s.db.Delete): all the deletions are collected in a batch and applied in one shot, which should be faster than a large number of individual calls to s.db.Delete.
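The batch-then-commit pattern being described can be sketched with a toy in-memory store (the real code queues point deletes on a pebble Batch and applies them with a single commit; every name below is invented for illustration):

```go
package main

import "fmt"

// store is a toy in-memory stand-in for the key-value store.
type store struct{ m map[string][]byte }

// deleteBatch queues deletions so they can be applied together,
// analogous to accumulating batch.Delete calls on a pebble Batch.
type deleteBatch struct {
	s       *store
	pending []string
}

func (s *store) newBatch() *deleteBatch { return &deleteBatch{s: s} }

// Delete only queues the key; nothing touches the store yet.
func (b *deleteBatch) Delete(key []byte) {
	b.pending = append(b.pending, string(key))
}

// Commit applies every queued deletion in one shot, the analogue of
// committing the pebble batch once at the end of a GC pass.
func (b *deleteBatch) Commit() {
	for _, k := range b.pending {
		delete(b.s.m, k)
	}
	b.pending = nil
}

func main() {
	s := &store{m: map[string][]byte{"mh1": {1}, "mh2": {2}, "mh3": {3}}}
	b := s.newBatch()
	b.Delete([]byte("mh1"))
	b.Delete([]byte("mh3"))
	b.Commit()
	fmt.Println(len(s.m)) // 1
}
```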

Since this is done by GC, and is not in the critical path of indexing or lookup, it is OK for it to be slow. I think, for us, it is more important to make sure that no deletion work is done in the indexing and lookup path, even if that makes deletion less optimal. GC can crawl along in the background at a very slow pace, as long as it does not interfere with indexing or lookup.

The suggested approaches are good in general, but I think approach 2 requires some work in the ingestion path during merge processing. Approach 1 may be more efficient, but if it only speeds up GC, that gain does not justify storing temporary data (which itself has a cost: extra compaction).

masih (Member):

I suspect that the compaction cost would be lower with range deletion. For approach No. 2, I think all the changes would remain in this repo, if by ingestion-path changes we mean changes in storetheindex?

Thank you for implementing the batch deletion. That should help us gather some data to see if it needs optimisation at all 👍
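Approach 2 could look roughly like the following: a merge function that treats specially marked entries as tombstones and filters out their targets during a merge, so merging effectively performs the deletion. This is only an illustrative pure-Go sketch (the tombstone wire format and all names are invented; a real implementation would plug into pebble's Merger):

```go
package main

import (
	"bytes"
	"fmt"
)

// tombstonePrefix marks an entry as "delete the value key that follows".
// This is an invented wire format, purely for illustration.
var tombstonePrefix = []byte{0xff, 0x00}

// mergeValueKeys merges the value-key entries stored under one multihash,
// excluding any entry named by a tombstone. Tombstones themselves are
// also dropped, so the merge output contains only live entries.
func mergeValueKeys(entries [][]byte) [][]byte {
	dead := make(map[string]bool)
	var live [][]byte
	for _, e := range entries {
		if bytes.HasPrefix(e, tombstonePrefix) {
			// Record which value key this tombstone targets.
			dead[string(e[len(tombstonePrefix):])] = true
			continue
		}
		live = append(live, e)
	}
	var out [][]byte
	for _, e := range live {
		if !dead[string(e)] {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	tomb := append(append([]byte(nil), tombstonePrefix...), []byte("vk2")...)
	merged := mergeValueKeys([][]byte{[]byte("vk1"), []byte("vk2"), tomb, []byte("vk3")})
	for _, e := range merged {
		fmt.Println(string(e)) // vk1, then vk3
	}
}
```

The appeal of this scheme, as the thread notes, is that deletion work rides along with merges the store performs anyway; the cost is that the merge function runs on read and compaction paths, which is exactly the ingestion-path coupling discussed above.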

gammazero (Contributor, Author):

By ingestion path, I mean during the process of ingestion. I want writing new index records to remain free from anything other than writing new records.

gammazero marked this pull request as ready for review Nov 29, 2023 02:32
gammazero merged commit 47a57b4 into main Nov 29, 2023
gammazero deleted the delete-indexes branch Nov 29, 2023 22:16
3 participants