Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36346: [C++] Safe S3 finalization #36442

Merged
merged 2 commits into from
Jul 4, 2023
Merged

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Jul 3, 2023

Rationale for this change

It is required to call FinalizeS3 at process end to cleanly shut down resources used by the AWS SDK before it's too late.
However, once FinalizeS3 has been called, another problem appears: no AWS SDK API can be called safely anymore. Even object destructors can be dangerous.
We therefore need a way to prevent unsafe use of the AWS SDK APIs, regardless of how the application issues filesystems calls.

What changes are included in this PR?

Shield all uses of the internal S3Client class behind a safe RAII facility S3ClientLock. Obtaining a S3ClientLock ensures that S3 finalization has not been called yet and that it will not be called until the S3ClientLock goes out of scope.

Are these changes tested?

Yes, some Python tests exercise calling S3 APIs after explicit finalization.

Are there any user-facing changes?

Not for correctly written application code, ideally.

@pitrou
Copy link
Member Author

pitrou commented Jul 3, 2023

This is an alternative approach to #36437. @westonpace @kou

AwsInstance() : is_initialized_(false), is_finalized_(false) {}
~AwsInstance() { Finalize(/*from_destructor=*/true); }

// Returns true iff the instance was newly initialized with `options`
Result<bool> EnsureInitialized(const S3GlobalOptions& options) {
bool expected = false;
// XXX The individual accesses are atomic but the entire sequence below is not.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I don't think this should be a problem in practice, as calls to EnsureInitialized and Finalize should be serialized at the application level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your opinion.
How about adding this note to this comment or documents for InitializeS3()/FinalizeS3()?
(I think that we can remove the "XXX" mark here.)

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jul 3, 2023
@pitrou
Copy link
Member Author

pitrou commented Jul 3, 2023

@github-actions crossbow submit -g python -g wheel

@github-actions

This comment was marked as outdated.

cpp/src/arrow/filesystem/s3fs.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/s3fs.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/s3fs.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/s3fs.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/s3fs.cc Outdated Show resolved Hide resolved
AwsInstance() : is_initialized_(false), is_finalized_(false) {}
~AwsInstance() { Finalize(/*from_destructor=*/true); }

// Returns true iff the instance was newly initialized with `options`
Result<bool> EnsureInitialized(const S3GlobalOptions& options) {
bool expected = false;
// XXX The individual accesses are atomic but the entire sequence below is not.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your opinion.
How about adding this note to this comment or documents for InitializeS3()/FinalizeS3()?
(I think that we can remove the "XXX" mark here.)

@github-actions github-actions bot added awaiting review Awaiting review awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 4, 2023
@pitrou pitrou marked this pull request as ready for review July 4, 2023 09:16
@pitrou pitrou requested a review from AlenkaF as a code owner July 4, 2023 09:16
@pitrou
Copy link
Member Author

pitrou commented Jul 4, 2023

@github-actions crossbow submit -g python -g wheel

@github-actions github-actions bot added awaiting review Awaiting review and removed awaiting review Awaiting review awaiting changes Awaiting changes labels Jul 4, 2023
@pitrou
Copy link
Member Author

pitrou commented Jul 4, 2023

Thanks a lot for the review @kou . I agree with all your comments and made the necessary changes.

@github-actions
Copy link

github-actions bot commented Jul 4, 2023

Revision: c171b31

Submitted crossbow builds: ursacomputing/crossbow @ actions-1c2290f238

Task Status
test-conda-python-3.10 Github Actions
test-conda-python-3.10-hdfs-2.9.2 Github Actions
test-conda-python-3.10-hdfs-3.2.1 Github Actions
test-conda-python-3.10-pandas-latest Github Actions
test-conda-python-3.10-pandas-nightly Github Actions
test-conda-python-3.10-spark-master Github Actions
test-conda-python-3.10-substrait Github Actions
test-conda-python-3.11 Github Actions
test-conda-python-3.11-dask-latest Github Actions
test-conda-python-3.11-dask-upstream_devel Github Actions
test-conda-python-3.11-hypothesis Github Actions
test-conda-python-3.11-pandas-upstream_devel Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-pandas-1.0 Github Actions
test-conda-python-3.8-spark-v3.1.2 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-pandas-latest Github Actions
test-conda-python-3.9-spark-v3.2.0 Github Actions
test-cuda-python Github Actions
test-debian-11-python-3 Azure
test-fedora-35-python-3 Azure
test-ubuntu-20.04-python-3 Azure
wheel-clean Github Actions
wheel-macos-big-sur-cp310-arm64 Github Actions
wheel-macos-big-sur-cp311-arm64 Github Actions
wheel-macos-big-sur-cp38-arm64 Github Actions
wheel-macos-big-sur-cp39-arm64 Github Actions
wheel-macos-mojave-cp310-amd64 Github Actions
wheel-macos-mojave-cp311-amd64 Github Actions
wheel-macos-mojave-cp38-amd64 Github Actions
wheel-macos-mojave-cp39-amd64 Github Actions
wheel-manylinux-2-28-cp310-amd64 Github Actions
wheel-manylinux-2-28-cp310-arm64 Github Actions
wheel-manylinux-2-28-cp311-amd64 Github Actions
wheel-manylinux-2-28-cp311-arm64 Github Actions
wheel-manylinux-2-28-cp38-amd64 Github Actions
wheel-manylinux-2-28-cp38-arm64 Github Actions
wheel-manylinux-2-28-cp39-amd64 Github Actions
wheel-manylinux-2-28-cp39-arm64 Github Actions
wheel-manylinux-2014-cp310-amd64 Github Actions
wheel-manylinux-2014-cp310-arm64 Github Actions
wheel-manylinux-2014-cp311-amd64 Github Actions
wheel-manylinux-2014-cp311-arm64 Github Actions
wheel-manylinux-2014-cp38-amd64 Github Actions
wheel-manylinux-2014-cp38-arm64 Github Actions
wheel-manylinux-2014-cp39-amd64 Github Actions
wheel-manylinux-2014-cp39-arm64 Github Actions
wheel-windows-cp310-amd64 Github Actions
wheel-windows-cp311-amd64 Github Actions
wheel-windows-cp38-amd64 Github Actions
wheel-windows-cp39-amd64 Github Actions

@pitrou
Copy link
Member Author

pitrou commented Jul 4, 2023

@github-actions crossbow submit -g cpp

@github-actions
Copy link

github-actions bot commented Jul 4, 2023

Revision: c171b31

Submitted crossbow builds: ursacomputing/crossbow @ actions-0d14e99a00

Task Status
test-alpine-linux-cpp Github Actions
test-build-cpp-fuzz Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp Github Actions
test-debian-11-cpp-amd64 Github Actions
test-debian-11-cpp-i386 Github Actions
test-fedora-35-cpp Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-20 Github Actions
test-ubuntu-20.04-cpp-bundled Github Actions
test-ubuntu-20.04-cpp-minimal-with-formats Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-22.04-cpp Github Actions

@pitrou
Copy link
Member Author

pitrou commented Jul 4, 2023

The remaining CI failures seem unrelated. I think I'm going to merge so as to improve CI, as we're nearing the release.

@pitrou pitrou merged commit e7d5028 into apache:main Jul 4, 2023
@pitrou pitrou removed the awaiting review Awaiting review label Jul 4, 2023
@pitrou pitrou deleted the s3-finalize-safe branch July 4, 2023 12:47
westonpace pushed a commit to westonpace/arrow that referenced this pull request Jul 7, 2023
### Rationale for this change

It is required to call `FinalizeS3` at process end to cleanly shut down resources used by the AWS SDK before it's too late.
However, once `FinalizeS3` has been called, another problem appears: no AWS SDK API can be called safely anymore. Even object destructors can be dangerous.
We therefore need a way to prevent unsafe use of the AWS SDK APIs, regardless of how the application issues filesystems calls.

### What changes are included in this PR?

Shield all uses of the internal `S3Client` class behind a safe RAII facility `S3ClientLock`. Obtaining a `S3ClientLock` ensures that S3 finalization has not been called yet _and_ that it will not be called until the `S3ClientLock` goes out of scope.

### Are these changes tested?

Yes, some Python tests exercise calling S3 APIs after explicit finalization.

### Are there any user-facing changes?

Not for correctly written application code, ideally.

* Closes: apache#36346

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
@conbench-apache-arrow
Copy link

Conbench analyzed the 6 benchmark runs on commit e7d5028d.

There were 7 benchmark results indicating a performance regression:

The full Conbench report has more details.

kou pushed a commit that referenced this pull request Aug 3, 2023
### Rationale for this change

In #12227 we decided to use a bundled version of the AWS SDK when compiling Python wheels, in order to downgrade the AWS SDK version.

Now that we have fixed S3 finalization issues (#36442), it should be ok to rely on the vcpkg-installed version of the AWS SDK again.

### What changes are included in this PR?

Remove use of bundled AWS SDK and use S3 vcpkg feature for requirements.

### Are these changes tested?

On CI and via crossbow

### Are there any user-facing changes?

No
* Closes: #36752

Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@hellkite500
Copy link

hellkite500 commented Aug 23, 2023

Did this make it into v13.0.0? If so, I'm still seeing issues

/Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/arrow/cpp/src/arrow/filesystem/s3fs.cc:2829:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit
zsh: segmentation fault  ./test_routing_pybind
(venv) test % pip list | grep pyarrow
pyarrow         13.0.0

@pitrou
Copy link
Member Author

pitrou commented Aug 23, 2023

Yes, it should be in 13.0.0. Can you please open an issue with a reproducer script?

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…apache#36925)

### Rationale for this change

In apache#12227 we decided to use a bundled version of the AWS SDK when compiling Python wheels, in order to downgrade the AWS SDK version.

Now that we have fixed S3 finalization issues (apache#36442), it should be ok to rely on the vcpkg-installed version of the AWS SDK again.

### What changes are included in this PR?

Remove use of bundled AWS SDK and use S3 vcpkg feature for requirements.

### Are these changes tested?

On CI and via crossbow

### Are there any user-facing changes?

No
* Closes: apache#36752

Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++][S3] Crash on exit
3 participants