This repository has been archived by the owner on Oct 25, 2024. It is now read-only.
When using local_mp, each process that uses jax spawns a huge number of threads. I'm running 128 actors, and each one spawns ~500 threads, meaning the program spawns over 50,000 threads!
This puts me over the ulimit for my university cluster, and I suspect it isn't performant either. The recommended solution is to set XLA_FLAGS="--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1". But for some reason this isn't working with PythonProcess. Here's my PythonProcess for each of my nodes:
This results in the error bash: line 1: XLA_FLAGS=--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1: command not found in each process that uses a local resource with those envs. Why is the environment variable being treated as a command here? I've also tried enclosing the value in quotes, which did not work. Thank you!
For anyone else who wants to set XLA_FLAGS, I found a workaround that involves editing your site-packages. I'm using the tmux launcher (file launchpad/launch/run_locally/local_tmux_launcher), which internally calls the (undocumented) subprocess.list2cmdline function on a list that looks like ["env1=val1", "env2=val2", "/path/to/python", "command_name.py"]. Ideally this turns into a command like env1=val1 env2=val2 /path/to/python command_name.py. But if any of the env values contains a space, it puts quotes around the whole key/value pair: env1=val1 "env2=spaced value" /path/to/python command_name.py. Because the word now starts with a quote, bash no longer recognizes it as a variable assignment. So this doesn't set the environment variable env2; instead, bash tries to run env2=spaced value as a command.
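The quoting behavior described above can be reproduced outside of launchpad. This is a minimal sketch that only calls subprocess.list2cmdline directly; the argument list is the illustrative one from this comment, not the exact list the launcher builds.

```python
import subprocess

# Illustrative argument list from the comment above; the second entry
# has a space inside the value.
args = ["env1=val1", "env2=spaced value", "/path/to/python", "command_name.py"]

# list2cmdline wraps any argument containing a space or tab in double quotes.
cmd = subprocess.list2cmdline(args)
print(cmd)
# env1=val1 "env2=spaced value" /path/to/python command_name.py
```

When bash receives this string, the quoted word is treated as a command name rather than as an assignment, which matches the "command not found" error in the original report.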
Maybe that's the desired behavior of subprocess.list2cmdline, but it prevents you from setting env variables with spaces in them. So I just edited the launcher to strip the quotation marks from the generated command: cmd = cmd.replace('"', ""). I also used backslash escaping on the spaces inside the XLA_FLAGS value, so that bash still reads the value as a single word once the quotes are gone.
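Here is a rough sketch of what that workaround does to the command string, assuming the launcher builds it with subprocess.list2cmdline as described. The interpreter path and script name are placeholders, and the real launcher code may differ.

```python
import subprocess

# Spaces inside the value are backslash-escaped by hand (placeholder
# interpreter path and script name).
args = [
    r"XLA_FLAGS=--xla_cpu_multi_thread_eigen=false\ intra_op_parallelism_threads=1",
    "/path/to/python",
    "command_name.py",
]

# list2cmdline still quotes the entry, because it contains a space.
quoted = subprocess.list2cmdline(args)

# The edit from the comment: strip every double quote from the command.
cmd = quoted.replace('"', "")
print(cmd)
```

With the quotes gone and the space escaped, bash sees a plain VAR=value prefix again. Note that a blanket replace also strips quotes that any other argument legitimately needs, so this is a local patch rather than a general fix.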
It would be great to get this fixed or documented, as setting XLA_FLAGS must be a common use case for launchpad!