Failed to load dataset when trying to train aero_graph_net.
Is there any way to fix this?
It fails during the Hydra instantiation step, as shown in the error log.
Minimum reproducible example
Relevant log output
[18:55:46 - agnet - INFO] Loading the training dataset...
Error executing job with overrides: ['+experiment=ahmed/mgn', 'data.data_dir=./data/ahmed_body']
concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
  File "/home/willy/anaconda3/envs/modulus/lib/python3.10/concurrent/futures/process.py", line 392, in wait_result_broken_or_wakeup
    result_item = result_reader.recv()
  File "/home/willy/anaconda3/envs/modulus/lib/python3.10/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 496, in rebuild_storage_fd
    fd = df.detach()
  File "/home/willy/anaconda3/envs/modulus/lib/python3.10/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/willy/anaconda3/envs/modulus/lib/python3.10/multiprocessing/reduction.py", line 189, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/willy/anaconda3/envs/modulus/lib/python3.10/multiprocessing/reduction.py", line 164, in recvfds
    raise RuntimeError('received %d items of ancdata' %
RuntimeError: received 0 items of ancdata
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 92, in _call_target
return _target_(*args, **kwargs)
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/modulus/datapipes/gnn/ahmed_body_dataset.py", line 219, in __init__
for (i, graph, coeff, normal, area) in executor.map(
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/concurrent/futures/process.py", line 575, in _chain_from_iterable_of_lists
for element in iterable:
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
yield _result_or_cancel(fs.pop())
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
return fut.result(timeout)
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/willy/modulus/modulus/examples/cfd/aero_graph_net/train.py", line 267, in <module>
main()
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/home/willy/modulus/modulus/examples/cfd/aero_graph_net/train.py", line 219, in main
trainer = MGNTrainer(cfg)
File "/home/willy/modulus/modulus/examples/cfd/aero_graph_net/train.py", line 54, in __init__
self.dataset = instantiate(cfg.data.train)
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 226, in instantiate
return instantiate_node(
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 347, in instantiate_node
return _call_target(_target_, partial, args, kwargs, full_key)
File "/home/willy/anaconda3/envs/modulus/lib/python3.10/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 97, in _call_target
raise InstantiationException(msg) from e
hydra.errors.InstantiationException: Error in call to target 'modulus.datapipes.gnn.ahmed_body_dataset.AhmedBodyDataset':
BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.')
full_key: data.train
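Reading the cause chain from the bottom up: Hydra wraps a `BrokenProcessPool` raised while iterating `executor.map(...)` in `AhmedBodyDataset.__init__`, and that pool was marked broken because the parent process failed to unpickle a worker's result (`rebuild_storage_fd` raising `RuntimeError: received 0 items of ancdata`). So despite the "terminated abruptly" wording, a worker did not necessarily crash. The standard-library sketch below reproduces that pattern without torch or Modulus: a hypothetical `load_sample` returns an object that pickles fine in the worker but raises while being rebuilt in the parent. All names are illustrative, and it assumes Linux with the `fork` start method.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def _fail_on_receive():
    # Runs in the PARENT while it unpickles a worker's result; stands in for
    # torch's rebuild_storage_fd failing to receive a file descriptor.
    raise RuntimeError("received 0 items of ancdata")


class UnreceivableResult:
    """Hypothetical result that pickles fine in the worker but cannot be
    rebuilt in the parent process."""

    def __reduce__(self):
        return (_fail_on_receive, ())


def load_sample(i: int):
    # Hypothetical stand-in for per-file graph loading; sample 3 is "bad".
    return UnreceivableResult() if i == 3 else i * i


def preload(num_samples: int, workers: int = 2) -> list:
    # "fork" is the Linux default, as in the reporter's environment.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as executor:
        return list(executor.map(load_sample, range(num_samples)))


if __name__ == "__main__":
    print(preload(3))  # every result transfers cleanly
    try:
        preload(5)
    except BrokenProcessPool as exc:
        # The original parent-side error is attached as the cause.
        print(type(exc).__name__, "caused by:", exc.__cause__)
```

The pool gives no per-sample error here: one failed transfer breaks the whole pool, which is why the log only shows the generic `BrokenProcessPool` at the `executor.map` loop.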
Environment details
I changed my command to use fewer samples (data.train.num_samples), and it got past the dataset loading problem.
But why does it return the same error when I set num_samples higher than that?
So anything greater than 10 in data.train.num_samples causes that error to appear?
From the error itself, it looks like something goes wrong during dataset pre-loading in one of the graph-loading processes.
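The innermost error comes from PyTorch's `file_descriptor` sharing strategy (the Linux default): each tensor storage returned from a worker is passed to the parent as a file descriptor, and failing to receive one is often reported when a process runs out of open file descriptors. That would be consistent with the failure depending on `num_samples`, though nothing in this thread confirms it. Two commonly suggested mitigations are raising the open-file limit (what `ulimit -n` does) and switching to the `file_system` strategy with `torch.multiprocessing.set_sharing_strategy`. The helper names below are hypothetical, and calling them near the top of `train.py`, before the dataset is instantiated, is an assumption that relies on forked workers inheriting the setting.

```python
import resource


def raise_open_file_limit(target: int = 65536) -> tuple:
    """Raise this process's soft RLIMIT_NOFILE toward `target`, capped at the
    hard limit. Never lowers the current limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass  # keep the current limit if the OS rejects the new value
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def use_file_system_sharing() -> bool:
    """Switch torch's CPU tensor sharing from file descriptors to the
    "file_system" strategy, if torch is installed and supports it."""
    try:
        import torch.multiprocessing as torch_mp
    except ImportError:
        return False
    if "file_system" not in torch_mp.get_all_sharing_strategies():
        return False
    torch_mp.set_sharing_strategy("file_system")
    return True


if __name__ == "__main__":
    print("open-file limit (soft, hard):", raise_open_file_limit())
    print("file_system sharing enabled:", use_file_system_sharing())
```

Raising the limit keeps PyTorch's default behavior and only gives it more headroom; the `file_system` strategy avoids per-tensor descriptors altogether but, per the PyTorch docs, can leave shared-memory files behind if processes die unexpectedly.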
Unfortunately, I could not reproduce the issue on my side.
You can try adding some simple prints to the create_graph function to see whether the error occurs at a particular file or place (and keep num_workers=1 to simplify debugging).
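A minimal sketch of that debugging loop, assuming you can call the graph-building step yourself for one file at a time. `find_bad_files` is a hypothetical helper, and the real `create_graph` in `ahmed_body_dataset.py` may take different arguments, so wrap it in a lambda or small function that accepts a single path. Printing with `flush=True` before each file means the last line on screen names the culprit even if the process is killed outright.

```python
import sys
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def find_bad_files(
    paths: Iterable[Path], create_graph: Callable[[Path], object]
) -> List[Tuple[Path, str]]:
    """Load samples one at a time in the current process (no worker pool),
    printing each path before it is processed."""
    failures = []
    for i, path in enumerate(paths):
        # Flush so the line is visible even if the process dies right after.
        print(f"[{i}] loading {path}", flush=True)
        try:
            create_graph(path)
        except Exception as exc:  # record ordinary Python errors and keep going
            failures.append((path, repr(exc)))
            print(f"[{i}] FAILED {path}: {exc!r}", file=sys.stderr, flush=True)
    return failures
```

Running in-process like this also bypasses the worker-to-parent tensor transfer seen in the log; if every file loads cleanly here, that points away from a particular bad file and toward the inter-process transfer.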
Also, which environment does this issue happen in?
Version
0.8.0
On which installation method(s) does this occur?
No response