if __name__ == "__main__":
    mp.set_start_method('spawn')
    device = torch.device("cuda" if torch.cuda.is_available() and args.cuda else "cpu")
    envs = gym.vector.AsyncVectorEnv(
        [make_env(args.gym_id, args.seed + i, i, args.capture_video, run_name, qubits, depth) for i in range(args.num_envs)],
        shared_memory=False)
    agent = AgentGNN(envs, device).to(device)  # Graph Neural Network
    for update in range(1, num_updates + 1):
        for step in range(args.num_steps):
            global_step += 1 * args.num_envs
            dones[step] = next_done
            try:
                with torch.no_grad():
                    action, logprob, _, value, logits, action_ids = agent.get_action_and_value(next_obs_graph, device=device)
                    values[step] = value.flatten()
                actions[step] = action
                logprobs[step] = logprob
                next_obs, reward, done, deprecated, info = envs.step(action_ids.cpu().numpy())
            except TypeError as e:
                print(f"Error: {e}")
            rewards[step] = torch.tensor(reward).to(device).view(-1)
            next_done = torch.Tensor(done).to(device)
As far as I understand the error, this code creates as many threads as the number of environments I request. In one particular thread, the agent breaks in env.step(). As you can see, I tried to handle this with a try-except, but it does not work. I think this may be because the thread stays on hold until it breaks, but I am not sure.
Traceback
Traceback (most recent call last):
  File "/home/jriu/Copt-cquere/rl-zx/ppo.py", line 204, in <module>
    next_obs, reward, done, deprecated, info = envs.step(action_ids.cpu().numpy())
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/site-packages/gym/vector/vector_env.py", line 137, in step
    return self.step_wait()
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/site-packages/gym/vector/async_vector_env.py", line 320, in step_wait
    result, success = pipe.recv()
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
/home/jriu/anaconda3/envs/cquere/lib/python3.10/site-packages/gym/vector/async_vector_env.py:457: UserWarning: WARN: Calling `close` while waiting for a pending call to `step` to complete.
Exception ignored in: <function AsyncVectorEnv.__del__ at 0x7ea18eb856c0>
Traceback (most recent call last):
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/site-packages/gym/vector/async_vector_env.py", line 546, in __del__
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/site-packages/gym/vector/vector_env.py", line 205, in close
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/site-packages/gym/vector/async_vector_env.py", line 461, in close_extras
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/site-packages/gym/vector/async_vector_env.py", line 320, in step_wait
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/multiprocessing/connection.py", line 250, in recv
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
  File "/home/jriu/anaconda3/envs/cquere/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
EOFError:
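The traceback ends in multiprocessing's Connection.recv(), which raises EOFError when nothing is left to read and the other end of the pipe has been closed. That is what the parent would see if a worker process exited unexpectedly. The sketch below reproduces this with the standard library alone, closing one end of a pipe to stand in for a dead worker, and shows that an except TypeError clause does not intercept it. The function names are illustrative only.

```python
import multiprocessing as mp


def recv_outcome(conn) -> str:
    """Report which handler, if any, catches a failed recv()."""
    try:
        conn.recv()
        return "received"
    except TypeError:
        # EOFError is not a subclass of TypeError, so this branch is skipped.
        return "caught TypeError"
    except EOFError:
        return "caught EOFError"


def demo() -> str:
    parent_end, worker_end = mp.Pipe()
    worker_end.close()  # stands in for the worker process dying
    try:
        return recv_outcome(parent_end)
    finally:
        parent_end.close()
```

Calling demo() returns "caught EOFError", which suggests the thing to investigate is why a worker process exited (for example, a crash or an out-of-memory kill inside one environment) rather than the step call in the parent.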
System Info
I use gym 0.26.2, torch 2.0.1, and Python 3.10.14 on Ubuntu 24.04 LTS. All of the packages were installed using pip.
Checklist
I have checked that there is no similar issue in the repo (required)
Describe the bug
The code suddenly raises an EOFError when calling the step method after 12M steps of training.
Code example
I am using gym.vector.AsyncVectorEnv(). I use the function make_env to create my environments. The main part of the code is the training loop shown above.