Scan Delete Support Part 3: ArrowReader::build_deletes_row_selection implementation #951
Third part of delete file read support. See #630.
Builds on top of #950 and should be merged after that one.
`build_deletes_row_selection` computes a `RowSelection` from a `&[usize]` representing the indexes of rows in a data file that have been marked as deleted by positional delete files that apply to the data file being read.

The resulting `RowSelection` will be merged with a `RowSelection` resulting from the scan's filter predicate (if present) and supplied to the `ParquetRecordBatchStreamBuilder` so that deleted rows are omitted from the `RecordBatchStream` returned by the reader.

NB: I encountered quite a few edge cases in this method and the logic is quite complex. There is a good chance that a keen-eyed reviewer would be able to conceive of an edge case that I haven't covered.
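For reviewers who want a mental model of the core transformation, here is a minimal sketch of turning deleted row indexes into alternating select/skip runs. It is not the code in this PR: `RowSelector` below is a hypothetical local stand-in for the `parquet` crate's type of the same name, `deletes_to_selectors` and its `total_rows` parameter are illustrative names, and the sketch assumes the indexes arrive sorted in ascending order. It also ignores row-group boundaries and anything else the real method has to account for.

```rust
/// Hypothetical stand-in for the `parquet` crate's `RowSelector`: a run of
/// `row_count` consecutive rows that are either all skipped or all selected.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

impl RowSelector {
    fn select(row_count: usize) -> Self {
        Self { row_count, skip: false }
    }

    fn skip(row_count: usize) -> Self {
        Self { row_count, skip: true }
    }
}

/// Illustrative helper (not part of the PR): converts deleted row positions,
/// assumed sorted ascending, into runs that together cover `total_rows` rows.
fn deletes_to_selectors(deleted: &[usize], total_rows: usize) -> Vec<RowSelector> {
    let mut selectors: Vec<RowSelector> = Vec::new();
    let mut next_row = 0usize; // first row not yet covered by any selector

    for &pos in deleted {
        if pos >= total_rows {
            break; // positions past the end of the file are ignored
        }
        if pos < next_row {
            continue; // duplicate position, already covered by a skip
        }
        if pos > next_row {
            // Rows between the previous delete and this one are kept.
            selectors.push(RowSelector::select(pos - next_row));
        }
        // Grow the previous skip run if this delete is adjacent to it,
        // otherwise start a new skip run of length one.
        let extended = match selectors.last_mut() {
            Some(last) if last.skip => {
                last.row_count += 1;
                true
            }
            _ => false,
        };
        if !extended {
            selectors.push(RowSelector::skip(1));
        }
        next_row = pos + 1;
    }

    // Any rows after the last delete are kept.
    if next_row < total_rows {
        selectors.push(RowSelector::select(total_rows - next_row));
    }
    selectors
}
```

With deleted positions `[1, 2, 5]` in an 8-row file, the sketch yields select 1, skip 2, select 2, skip 1, select 2. Coalescing adjacent deletes into a single skip run keeps the selector list short, which matters when delete files mark long contiguous ranges.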