Scan Delete Support Part 3: ArrowReader::build_deletes_row_selection implementation #951

Open
wants to merge 5 commits into
base: main
from

sdd
Contributor

@sdd sdd commented Feb 8, 2025

Third part of delete file read support. See #630

Builds on top of #950 and should be merged after that one.

build_deletes_row_selection computes a RowSelection from a &[usize] representing the indexes of rows in a data file that have been marked as deleted by positional delete files that apply to the data file being read.
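
To illustrate the idea only, here is a minimal, self-contained sketch of turning delete positions into alternating select/skip runs. It is not the code in this PR: `Run` is a hypothetical stand-in for parquet's `RowSelector`, `deletes_to_runs` and `total_rows` are names made up for the example, and it ignores row-group skipping and the other edge cases the real method has to handle.

```rust
/// Hypothetical stand-in for parquet's `RowSelector`:
/// a run of consecutive rows that is either read or skipped.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Converts delete positions into alternating select/skip runs.
///
/// Assumptions (for this sketch only): `deleted` is sorted ascending and
/// `total_rows` is the number of rows in the data file. Duplicate or
/// out-of-range indexes are ignored rather than treated as errors.
fn deletes_to_runs(deleted: &[usize], total_rows: usize) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    let mut next_row = 0usize; // first row not yet covered by any run

    for &idx in deleted {
        if idx >= total_rows || idx < next_row {
            continue; // out of range, or already covered (duplicate)
        }
        if idx > next_row {
            // Rows between the previous delete and this one are kept.
            runs.push(Run { row_count: idx - next_row, skip: false });
        }
        // Extend the previous skip run when deletes are adjacent.
        match runs.last_mut() {
            Some(last) if last.skip => last.row_count += 1,
            _ => runs.push(Run { row_count: 1, skip: true }),
        }
        next_row = idx + 1;
    }

    // Any rows after the last delete are kept.
    if next_row < total_rows {
        runs.push(Run { row_count: total_rows - next_row, skip: false });
    }
    runs
}
```

For example, `deletes_to_runs(&[1, 2, 5], 8)` yields select 1, skip 2, select 2, skip 1, select 2, which together cover all 8 rows.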

The resulting RowSelection will be merged with a RowSelection resulting from the scan's filter predicate (if present) and supplied to the ParquetRecordBatchStreamBuilder so that deleted rows are omitted from the RecordBatchStream returned by the reader.
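
As a rough picture of what merging the two selections means, the sketch below intersects two run lists so that a row is read only if both inputs select it. It is an illustration under stated assumptions, not the PR's code and not parquet's own implementation: `Selector` is a hypothetical stand-in for parquet's `RowSelector`, `intersect` is a name chosen for the example, and both inputs are assumed to cover the same number of rows (any rows covered by only one input are dropped here).

```rust
/// Hypothetical stand-in for parquet's `RowSelector`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Intersects two selections: a row is selected only if it is selected
/// in both `a` and `b`. Assumes both inputs describe the same row range.
fn intersect(a: &[Selector], b: &[Selector]) -> Vec<Selector> {
    let mut out: Vec<Selector> = Vec::new();
    // Ignore empty runs so that every step makes progress.
    let mut a_iter = a.iter().copied().filter(|s| s.row_count > 0);
    let mut b_iter = b.iter().copied().filter(|s| s.row_count > 0);
    let mut cur_a = a_iter.next();
    let mut cur_b = b_iter.next();

    while let (Some(x), Some(y)) = (cur_a, cur_b) {
        // Consume the overlap of the two current runs.
        let n = x.row_count.min(y.row_count);
        let skip = x.skip || y.skip;

        // Merge with the previous output run when the kind matches.
        match out.last_mut() {
            Some(last) if last.skip == skip => last.row_count += n,
            _ => out.push(Selector { row_count: n, skip }),
        }

        // Keep the unconsumed remainder of each run, or advance.
        cur_a = if x.row_count > n {
            Some(Selector { row_count: x.row_count - n, ..x })
        } else {
            a_iter.next()
        };
        cur_b = if y.row_count > n {
            Some(Selector { row_count: y.row_count - n, ..y })
        } else {
            b_iter.next()
        };
    }
    out
}
```

With a filter selection of select 3, skip 2, select 3 and a deletes selection of select 1, skip 1, select 6, the result is select 1, skip 1, select 1, skip 2, select 3.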

NB: I encountered quite a few edge cases in this method and the logic is quite complex. There is a good chance that a keen-eyed reviewer will be able to think of an edge case that I haven't covered.

@sdd sdd changed the title Feat/build deletes row selection implementation ArrowReader::build_deletes_row_selection implementation Feb 8, 2025
@sdd sdd force-pushed the feat/build-deletes-row-selection-implementation branch 3 times, most recently from f4b6d94 to a52fe50 Compare February 9, 2025 12:17
@sdd sdd force-pushed the feat/build-deletes-row-selection-implementation branch from a52fe50 to 4017a0e Compare February 9, 2025 12:22
sdd added 4 commits February 10, 2025 08:59
* refactor: only pass row groups metadata rather than entire
  parquet metadata to . This
  makes it easier to test  as
  we don't need to mock up a full
@sdd sdd force-pushed the feat/build-deletes-row-selection-implementation branch from 2fc5f70 to 26dc78f Compare February 10, 2025 09:00
@sdd sdd marked this pull request as ready for review February 19, 2025 06:28
@sdd sdd changed the title ArrowReader::build_deletes_row_selection implementation Scan Delete Support Part 3: ArrowReader::build_deletes_row_selection implementation Feb 21, 2025
…haviour when processing a row selection that ends before the end of the stream