[R] Calling ParquetFileWriter$WriteTable with a non-Table crashes #42240

Closed
amoeba opened this issue Jun 21, 2024 · 1 comment
Labels
Component: R Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. Type: bug

@amoeba
Member

amoeba commented Jun 21, 2024

Describe the bug, including details regarding any error messages, version, and platform.

While comparing against how PyArrow does incremental Parquet file writing, I noticed that ParquetFileWriter$WriteTable segfaults if you pass it anything other than the Table it expects:

library(arrow)

tf <- tempfile()
fos <- FileOutputStream$create(tf)
schm <- schema(a = int32())
pfw <- ParquetFileWriter$create(sink=fos, schema=schm, ParquetWriterProperties$create(column_names=names(schm)))

# create a batch and crash when writing it
batch <- RecordBatch$create(data.frame(a=1:10))
pfw$WriteTable(batch, chunk_size = 10)

When run, this produces:

 *** caught segfault ***
address 0x0, cause 'invalid permissions'

Traceback:
 1: parquet___arrow___FileWriter__WriteTable(self, table, chunk_size)
 2: pfw$WriteTable(batch, chunk_size = 10)
An irrecoverable exception occurred. R is aborting now ...
fish: Job 1, 'Rscript crash.R' terminated by signal SIGSEGV (Address boundary error)

The package generally provides more user-friendly wrappers around the R6 classes but I thought I'd file a bug in case others would want to see this fixed.
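
Until this is fixed, one possible workaround is to convert the batch to a Table before writing. The following is a minimal sketch, assuming `Table$create()` accepts a RecordBatch and that closing the writer and then the output stream is enough to finalize the file; it reuses the names from the reproduction above.

```r
library(arrow)

tf <- tempfile()
fos <- FileOutputStream$create(tf)
schm <- schema(a = int32())
pfw <- ParquetFileWriter$create(
  sink = fos,
  schema = schm,
  properties = ParquetWriterProperties$create(column_names = names(schm))
)

# Wrap the batch in a Table so WriteTable receives the type it expects
batch <- RecordBatch$create(data.frame(a = 1:10))
tab <- Table$create(batch)
pfw$WriteTable(tab, chunk_size = 10)

# Finalize the Parquet file, then release the underlying stream
pfw$Close()
fos$close()

# Read the file back to confirm the rows were written
result <- read_parquet(tf)
```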

Component(s)

R

jonkeane pushed a commit that referenced this issue Jul 10, 2024
…Batch (#42241)

### Rationale for this change

See #42240.

### What changes are included in this PR?

- Fixes a crash in `ParquetFileWriter$WriteTable` by asserting the class of what's passed in and stopping if it's not a `Table`
- Since I was already there, added `WriteBatch` to match [`pyarrow.parquet.ParquetWriter.write_batch`](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html#pyarrow.parquet.ParquetWriter.write_batch) which is just a convenience
- Adds a test for the behavior of trying to write to a closed sink
- Bumps the minimum Arrow C++ test version we test the R package with on CI from 13 to 15
- Removes one ARROW_VERSION_MAJOR >= 15 guard
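
The class assertion in the first bullet can be illustrated roughly as follows. This is only a sketch of the idea in base R, not the code in this PR: `assert_table` is a hypothetical helper, and the stand-in objects carry nothing but a class attribute so the example runs without the arrow package.

```r
# Hypothetical guard: stop with an R error instead of letting a non-Table
# object reach the C++ binding, where it would crash the session.
assert_table <- function(x) {
  if (!inherits(x, "Table")) {
    stop("`table` must be a Table, not ", class(x)[1], call. = FALSE)
  }
  invisible(x)
}

# Stand-in objects that only mimic the class names involved
fake_table <- structure(list(), class = "Table")
fake_batch <- structure(list(), class = "RecordBatch")

# Passes silently for a Table
assert_table(fake_table)

# Raises a catchable error for anything else
res <- tryCatch(
  assert_table(fake_batch),
  error = function(e) conditionMessage(e)
)
print(res)
```

Failing in R before the call reaches compiled code turns the segfault into an ordinary error that callers can handle with `tryCatch`.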

### Are these changes tested?

Yes.

### Are there any user-facing changes?

New method on ParquetFileWriter (WriteBatch).
* GitHub Issue: #42240

Authored-by: Bryce Mecum
Signed-off-by: Jonathan Keane
@jonkeane
Member

Issue resolved by pull request 42241
#42241

@jonkeane jonkeane added this to the 18.0.0 milestone Jul 10, 2024
@amoeba amoeba added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Oct 22, 2024