
Implement DeleteIndexes #94

Merged (3 commits into main, Nov 29, 2023)
Conversation

gammazero (Contributor)

No description provided.

codecov-commenter commented Nov 28, 2023

Codecov Report

Attention: 49 lines in your changes are missing coverage. Please review.

Comparison: base (cea5ef7) 57.44% vs. head (11a4d1a) 56.49% (-0.95%).

Files Patch % Lines
pebble/pebble.go 44.28% 34 Missing and 5 partials ⚠️
server/server.go 54.54% 8 Missing and 2 partials ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
- Coverage   57.44%   56.49%   -0.95%     
==========================================
  Files          12       12              
  Lines         947     1039      +92     
==========================================
+ Hits          544      587      +43     
- Misses        359      401      +42     
- Partials       44       51       +7     


Add server endpoint to delete encrypted multihashes.
masih (Member) left a comment


Thank you for picking this up 👍 Left some suggestions re alternatives.

It would be great to also have benchmarks for deletions (not necessarily in this PR).

pebble/pebble.go Outdated
}
}
if len(encValueKeys) == 0 {
if err = batch.Delete(mhk.buf, pebble.NoSync); err != nil {
masih (Member):
I have a suspicion that this would be slow at scale, but it is certainly a good first pass at implementing deletion 👍

Before doing it any other way, I recommend adding benchmarks. We can then iterate on alternative approaches.


On alternative approaches:

I have thought of two other ways of doing this, which can be mixed and matched for a high-performance opportunistic deletion mechanism:

  1. Accumulate the indexes marked for deletion, sort them, then delete them in much larger batches using range deletion.
  2. Update the merger implementation to intelligently recognise entries pending deletion and use the merging function to exclude them from the merge, which will effectively result in deletion. This is the same approach I originally implemented in the non-encrypted pebble store for opportunistic deletion.
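As a rough illustration of approach 1, the sketch below (not this PR's code; every name is invented) sorts the accumulated deletion targets and shows the standard prefix-successor computation used to form the exclusive end bound that a range deletion such as pebble's Batch.DeleteRange expects:

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// keyUpperBound returns the smallest key strictly greater than every key
// starting with prefix: the usual exclusive end bound when deleting an
// entire prefix via a range delete. It returns nil when no such bound
// exists (prefix is all 0xff bytes).
func keyUpperBound(prefix []byte) []byte {
	end := append([]byte(nil), prefix...)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] < 0xff {
			end[i]++
			return end[:i+1]
		}
	}
	return nil
}

// sortKeys orders the accumulated deletion targets so they can be applied
// in one sorted pass, with contiguous runs collapsible into range deletes.
func sortKeys(keys [][]byte) {
	sort.Slice(keys, func(i, j int) bool {
		return bytes.Compare(keys[i], keys[j]) < 0
	})
}

func main() {
	keys := [][]byte{[]byte("mh3"), []byte("mh1"), []byte("mh2")}
	sortKeys(keys)
	fmt.Printf("%s %s %s\n", keys[0], keys[1], keys[2]) // mh1 mh2 mh3
	fmt.Printf("%x\n", keyUpperBound([]byte{0x01, 0xff})) // 02
}
```

A real implementation would hand each sorted run of keys to the store's batch (point deletes for isolated keys, a range delete for a run known to contain only marked keys); that wiring is omitted here since it depends on the store's key layout.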

gammazero (Contributor, Author) commented Nov 29, 2023:

This is at least a batched delete (batch.Delete rather than s.db.Delete): all the deletions are collected in a batch and applied in one shot, which should be faster than a large number of individual calls to s.db.Delete.
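The batch-then-commit pattern being described can be sketched with a toy in-memory store (the real code queues point deletes on a pebble Batch and applies them with a single commit; every name below is invented for illustration):

```go
package main

import "fmt"

// store is a toy in-memory stand-in for the key-value store.
type store struct{ m map[string][]byte }

// deleteBatch queues deletions so they can be applied together,
// analogous to accumulating batch.Delete calls on a pebble Batch.
type deleteBatch struct {
	s       *store
	pending []string
}

func (s *store) newBatch() *deleteBatch { return &deleteBatch{s: s} }

// Delete only queues the key; nothing touches the store yet.
func (b *deleteBatch) Delete(key []byte) {
	b.pending = append(b.pending, string(key))
}

// Commit applies every queued deletion in one shot, the analogue of
// committing the pebble batch once at the end of a GC pass.
func (b *deleteBatch) Commit() {
	for _, k := range b.pending {
		delete(b.s.m, k)
	}
	b.pending = nil
}

func main() {
	s := &store{m: map[string][]byte{"mh1": {1}, "mh2": {2}, "mh3": {3}}}
	b := s.newBatch()
	b.Delete([]byte("mh1"))
	b.Delete([]byte("mh3"))
	b.Commit()
	fmt.Println(len(s.m)) // 1
}
```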

Since this is done by GC, and is not in the critical path of indexing or lookup, it is OK for it to be slow. I think, for us, it is more important to make sure that no deletion work is done in the indexing and lookup path, even if that makes deletion less optimal. GC can crawl along in the background at a very slow pace, as long as it does not interfere with indexing or lookup.

The suggested approaches are good in general, but I think approach 2 requires some work in the ingestion path during merge processing. Approach 1 may be more efficient, but if it only speeds up GC, that gain does not justify storing temporary data (which itself has a cost: extra compaction).

masih (Member):

I suspect that the compaction cost would be lower with range deletion. For approach No. 2, I think all the changes would remain in this repo, if by ingestion-path changes we mean changes in storetheindex?

Thank you for implementing the batch deletion. That should help us gather some data to see if it needs optimisation at all 👍
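Approach 2 could look roughly like the following: a merge function that treats specially marked entries as tombstones and filters out their targets during a merge, so merging effectively performs the deletion. This is only an illustrative pure-Go sketch (the tombstone wire format and all names are invented; a real implementation would plug into pebble's Merger):

```go
package main

import (
	"bytes"
	"fmt"
)

// tombstonePrefix marks an entry as "delete the value key that follows".
// This is an invented wire format, purely for illustration.
var tombstonePrefix = []byte{0xff, 0x00}

// mergeValueKeys merges the value-key entries stored under one multihash,
// excluding any entry named by a tombstone. Tombstones themselves are
// also dropped, so the merge output contains only live entries.
func mergeValueKeys(entries [][]byte) [][]byte {
	dead := make(map[string]bool)
	var live [][]byte
	for _, e := range entries {
		if bytes.HasPrefix(e, tombstonePrefix) {
			// Record which value key this tombstone targets.
			dead[string(e[len(tombstonePrefix):])] = true
			continue
		}
		live = append(live, e)
	}
	var out [][]byte
	for _, e := range live {
		if !dead[string(e)] {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	tomb := append(append([]byte(nil), tombstonePrefix...), []byte("vk2")...)
	merged := mergeValueKeys([][]byte{[]byte("vk1"), []byte("vk2"), tomb, []byte("vk3")})
	for _, e := range merged {
		fmt.Println(string(e)) // vk1, then vk3
	}
}
```

The appeal of this scheme, as the thread notes, is that deletion work rides along with merges the store performs anyway; the cost is that the merge function runs on read and compaction paths, which is exactly the ingestion-path coupling discussed above.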

gammazero (Contributor, Author):

By ingestion path, I mean during the process of ingestion. I want writing new index records to remain free from anything other than writing new records.

gammazero marked this pull request as ready for review Nov 29, 2023 02:32
gammazero merged commit 47a57b4 into main Nov 29, 2023
gammazero deleted the delete-indexes branch Nov 29, 2023 22:16
3 participants