db: unconditionally interleave ignorable boundaries #3475

jbowens · 2024-03-29T15:13:12Z

db: unconditionally interleave ignorable boundaries

The levelIter has a concept of ignorable boundary keys. When a levelIter
exhausts a file's point keys, it's possible that the same file still contains
range deletions relevant to other levels of the LSM. The levelIter will, under
some conditions, interleave the largest boundary key of the sstable into
iteration as an 'ignorable boundary key,' so that the file's range deletions
remain accessible until all other levels progress beyound the file's boundary.

When block-property filters are in use, a file's point key iterator may become
exhausted early, before the file's range deletions are irrelevant, even if the
file's largest key is not a range deletion. To work around this subtlety, the
sstable iterator previously would surface knowledge of whether any point keys
may have been skipped through a MaybeFilteredKeys method. The levelIter used
this method to determine when to interleave an ignorable largest boundary in
case there may be additional relevant range deletions.

This commit removes the conditioning on the output of MaybeFilteredKeys,
instead unconditionally interleaving a synthetic boundary at a file's largest
point key if the levelIter user is using the file's range deletion iterator.
This simplifies the logic and removes a fragile reliance on the sstable's
iterators accounting of when it may have filtered keys.

Future work (#2863) will remove the need to interleave synthetic boundaries at
all, instead interleaving the range deletion bounds themselves among point
keys.

Informs #2863.
Close #3334.

sstable: remove Iterator.MaybeFilteredKeys

Remove the MaybeFilteredKeys method that surfaced knowledge of whether an
sstable iterator may have omitted keys due to block-property filters. With the
previous commit's refactor to unconditionally interleave sstable largest
boundaries when using a levelIter to drive range deletion iteration, we no
longer need to know whether keys might've been filtered. This reduces the
code complexity considerably.

sstable: strengthen assertion around mutual exclusion of seek optimizations

Previously an assertion that the TrySeekUsingNext and monotonic bounds
optimizations could not simultaneously be active was disabled due to the
TestIterRandomizedMaybeFilteredKeys test that could violate the invariant. The
test has been removed (along with the method it was testing), so we can
now uncondtionally restore the invariant.

cockroach-teamcity · 2024-03-29T15:13:20Z

This change is

RaduBerinde

Awesome!! Thanks for making this work and for the cleanup!

Reviewable status: 0 of 15 files reviewed, 2 unresolved discussions (waiting on @jbowens)

level_iter.go line 1037 at r1 (raw file):

			// the top of the heap and immediately skip the entry, advancing to
			// the next file.
			if l.iterFile.LargestPointKey.Kind() == InternalKeyKindRangeDelete ||

If the largest point key is a range delete, wouln't *rangeDelIterPtr always be non-nil?

Conversely, if the largest point key kind is not correct (the bound can be loose for vtables), is it correct if we still emit the boundary?

merging_iter.go line 955 at r1 (raw file):

			break
		}
		// Skip ignorable boundary keys. These are not real keys and exist to

I think this is what I was missing to make it work.

Isn't it strange that we need to do this in findPrevEntry but not for findNextEntry?

jbowens

TFTR!

Reviewable status: 0 of 15 files reviewed, 2 unresolved discussions (waiting on @RaduBerinde)

level_iter.go line 1037 at r1 (raw file):

If the largest point key is a range delete, wouln't *rangeDelIterPtr always be non-nil?

I think we should be able to reduce it to this, but there's currently warts in the getIter where it retains the span after it's nil'd the rangeDelIterPtr. I left a TODO.

Conversely, if the largest point key kind is not correct (the bound can be loose for vtables), is it correct if we still emit the boundary?

Yeah, it is; we'll just pay the cost of an unnecessary extra step.

merging_iter.go line 955 at r1 (raw file):

Previously, RaduBerinde wrote…

I think this is what I was missing to make it work.

Isn't it strange that we need to do this in findPrevEntry but not for findNextEntry?

We actually already do it in findNextEntry:

pebble/merging_iter.go

Lines 806 to 816 in 1c7bcd1

    
           // Skip ignorable boundary keys. These are not real keys and exist to 
        
           // keep sstables open until we've surpassed their end boundaries so that 
        
           // their range deletions are visible. 
        
           if m.levels[item.index].isIgnorableBoundaryKey { 
        
           	m.err = m.nextEntry(item, nil /* succKey */) 
        
           	if m.err != nil { 
        
           		return nil, base.LazyValue{} 
        
           	} 
        
           	continue 
        
           }

RaduBerinde

Reviewed 5 of 6 files at r1, 9 of 10 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens)

merging_iter.go line 955 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

We actually already do it in findNextEntry:

pebble/merging_iter.go

Lines 806 to 816 in 1c7bcd1

// Skip ignorable boundary keys. These are not real keys and exist to

// keep sstables open until we've surpassed their end boundaries so that

// their range deletions are visible.

if m.levels[item.index].isIgnorableBoundaryKey {

m.err = m.nextEntry(item, nil /* succKey */)

if m.err != nil {

return nil, base.LazyValue{}

}

continue

}

Oh, I think I looked at the wrong findNextEntry. Great, that explains why I could do it for the lower boundary before. Thanks!!

The levelIter has a concept of ignorable boundary keys. When a levelIter exhausts a file's point keys, it's possible that the same file still contains range deletions relevant to other levels of the LSM. The levelIter will, under some conditions, interleave the largest boundary key of the sstable into iteration as an 'ignorable boundary key,' so that the file's range deletions remain accessible until all other levels progress beyound the file's boundary. When block-property filters are in use, a file's point key iterator may become exhausted early, before the file's range deletions are irrelevant, even if the file's largest key is not a range deletion. To work around this subtlety, the sstable iterator previously would surface knowledge of whether any point keys may have been skipped through a MaybeFilteredKeys method. The levelIter used this method to determine when to interleave an ignorable largest boundary in case there may be additional relevant range deletions. This commit removes the conditioning on the output of MaybeFilteredKeys, instead unconditionally interleaving a synthetic boundary at a file's largest point key if the levelIter user is using the file's range deletion iterator. This simplifies the logic and removes a fragile reliance on the sstable's iterators accounting of when it may have filtered keys. Future work (cockroachdb#2863) will remove the need to interleave synthetic boundaries at all, instead interleaving the range deletion bounds themselves among point keys. Informs cockroachdb#2863. Close cockroachdb#3334.

Remove the MaybeFilteredKeys method that surfaced knowledge of whether an sstable iterator may have omitted keys due to block-property filters. With the previous commit's refactor to unconditionally interleave sstable largest boundaries when using a levelIter to drive range deletion iteration, we no longer need to know whether keys might've been filtered. This reduces the code complexity considerably.

…ations Previously an assertion that the TrySeekUsingNext and monotonic bounds optimizations could not simultaneously be active was disabled due to the TestIterRandomizedMaybeFilteredKeys test that could violate the invariant. The test has been removed (along with the method it was testing), so we can now uncondtionally restore the invariant.

jbowens · 2024-04-05T21:44:25Z

TFTR!

jbowens requested review from a team and RaduBerinde March 29, 2024 15:19

jbowens force-pushed the leveliter branch 2 times, most recently from 815e47d to c8106d3 Compare March 29, 2024 15:45

RaduBerinde approved these changes Mar 29, 2024

View reviewed changes

jbowens force-pushed the leveliter branch from c8106d3 to f55673d Compare March 29, 2024 22:48

jbowens commented Mar 29, 2024

View reviewed changes

RaduBerinde approved these changes Mar 30, 2024

View reviewed changes

jbowens added 3 commits April 1, 2024 17:26

jbowens force-pushed the leveliter branch from f55673d to c614681 Compare April 1, 2024 21:26

jbowens merged commit 45b7a80 into cockroachdb:master Apr 5, 2024
11 checks passed

jbowens deleted the leveliter branch April 5, 2024 21:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

db: unconditionally interleave ignorable boundaries #3475

db: unconditionally interleave ignorable boundaries #3475

jbowens commented Mar 29, 2024

cockroach-teamcity commented Mar 29, 2024

RaduBerinde left a comment

jbowens left a comment

RaduBerinde left a comment

jbowens commented Apr 5, 2024

	// Skip ignorable boundary keys. These are not real keys and exist to
	// keep sstables open until we've surpassed their end boundaries so that
	// their range deletions are visible.
	if m.levels[item.index].isIgnorableBoundaryKey {
	m.err = m.nextEntry(item, nil /* succKey */)
	if m.err != nil {
	return nil, base.LazyValue{}
	}
	continue
	}

db: unconditionally interleave ignorable boundaries #3475

db: unconditionally interleave ignorable boundaries #3475

Conversation

jbowens commented Mar 29, 2024

cockroach-teamcity commented Mar 29, 2024

RaduBerinde left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

jbowens commented Apr 5, 2024