Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Possible race condition in py_binary bash wrapper #373

Open
liningpan opened this issue Jul 30, 2024 · 5 comments
Open

[Bug]: Possible race condition in py_binary bash wrapper #373

liningpan opened this issue Jul 30, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@liningpan
Copy link

What happened?

When calling the same py_binary target in parallel, the bash wrapper is racy when setting up the virtual environment. For example, we want to run a py_binary target in parallel as runfiles.

Both the old pure bash wrapper and the new one using rattler are removing the existing environment and creating a new one. This step is fundamentally unsafe and we will run into time-of-check to time-of-use issues everywhere.

We need the synchronization mechanism here (fd-lock) as well as a way to ensure the virtual environment is only created once.

Version

Development (host) and target OS/architectures: Linux Centos 7 x86

Output of bazel --version: 6.5.0

Version of the Aspect rules, or other relevant rules from your
WORKSPACE or MODULE.bazel file: 0.6.0 (last version compatible w/ centos 7); I'm pretty sure this will affect the latest version as well.

Language(s) and/or frameworks involved: Python 3.11

How to reproduce

Call `rules_py` `py_binary` target in parallel.

Any other information?

No response

@liningpan liningpan added the bug Something isn't working label Jul 30, 2024
@liningpan
Copy link
Author

@alexeagle Friendly ping here. Is this a known issue and if there would ever be a fix? I would if the best solution is to switch back to rules_python.

@mattem
Copy link
Member

mattem commented Aug 20, 2024

if there would ever be a fix?

Currently my time for OSS support is extremely limited, and generally done in some "free" time, so it may be a while before I am able to spend focused time on this.

If you'd like to discuss a design to move forwards on a fix and submit a PR, I'd be more than happy to have that conversation and work with you through the review. You can also consider sponsoring a bug / feature request via OpenCollective.

@liningpan
Copy link
Author

Hi Matt! Thanks for the response. I'm happy to work on a PR. When I say "if there would ever be a fix", I meant that given the requirement of setting up a virtual environment and updating the virtual environment on each Bazel invocation, it seems like something fundamental to how rules_py is designed to work.

One possible solution I can think of is adding fd-lock and checksum the pth file.

@alexeagle
Copy link
Member

Is it possible that we just don't re-use the same output folder? Could we trivially add a numbered parent folder in the path, similar to Bazel's sandbox? Maybe use the PID of the process to avoid collisions? Or would that result in eventual resource leak where the disk fills up?

Or, could we easily read an environment variable like VENV_LOCATION and use that as the parent folder for rattler to layout the venv? Then it's up to users to avoid having this sort of collisions.

@voxeljorge
Copy link

We have run into this one actually, but a generally ok solution seems to be to just run the program once to force the venv generation.

An alternative here might be to just expose a target that builds the venv the app will use without actually running the app. Users can then be responsible for the preflight.

Separating multiple output directories seems excessive given they would all be the same exact content, and would only be written once each no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants