Traceback (most recent call last):
File "/scratch2/jliu/.conda/envs/RL/lib/python3.9/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
result = ray.get(future)
File "/scratch2/jliu/.conda/envs/RL/lib/python3.9/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
return fn(*args, **kwargs)
File "/scratch2/jliu/.conda/envs/RL/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/scratch2/jliu/.conda/envs/RL/lib/python3.9/site-packages/ray/_private/worker.py", line 2524, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::_Inner.train() (pid=1885930, ip=10.20.0.6, actor_id=6d08bc117a6b35cc7647003f01000000, repr=AccelerateTrainer)
File "/scratch2/jliu/.conda/envs/RL/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 375, in train
raise skipped from exception_cause(skipped)
File "/scratch2/jliu/.conda/envs/RL/lib/python3.9/site-packages/ray/train/_internal/utils.py", line 54, in check_for_failure
ray.get(object_ref)
ray.exceptions.RayTaskError(AttributeError): ray::_RayTrainWorker__execute.get_next() (pid=1886047, ip=10.20.0.6, actor_id=dd5dcbaf834905aa00b49be601000000, repr=<ray.train._internal.worker_group.RayTrainWorker object at 0x7f3d4c6d20a0>)
File "/scratch2/jliu/.conda/envs/RL/lib/python3.9/site-packages/ray/train/_internal/worker_group.py", line 32, in __execute
raise skipped from exception_cause(skipped)
File "/scratch2/jliu/.conda/envs/RL/lib/python3.9/site-packages/ray/train/_internal/utils.py", line 129, in discard_return_wrapper
train_func(*args, **kwargs)
File "/scratch2/jliu/.conda/envs/RL/lib/python3.9/site-packages/ray/train/huggingface/accelerate/accelerate_trainer.py", line 411, in _accelerate_train_loop_per_worker
return train_loop_per_worker(*args, **kwargs)
File "/scratch2/jliu/CF_RL/scripts/trlx/examples/ppo_sentiments.py", line 47, in main
trlx.train(
File "/scratch2/jliu/CF_RL/scripts/trlx/trlx/trlx.py", line 92, in train
trainer = get_trainer(config.train.trainer)(
File "/scratch2/jliu/CF_RL/scripts/trlx/trlx/trainer/accelerate_ppo_trainer.py", line 74, in __init__
if not hasattr(self.model, "frozen_head") and not self.model.peft_type:
File "/scratch2/jliu/.conda/envs/RL/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'peft_type'
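The failure comes from the attribute check at trlx/trainer/accelerate_ppo_trainer.py line 74 in the traceback: by that point self.model is already wrapped in DistributedDataParallel, and the wrapper does not forward custom attributes such as peft_type to the underlying module. Below is a minimal, self-contained sketch of that behaviour, using a hypothetical TinyModel and a single-process gloo group (illustrative only, not trlx code):

import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process "gloo" group so DDP can be constructed on CPU without a cluster.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

class TinyModel(torch.nn.Module):
    # Stand-in for the trlx model; peft_type mimics a custom attribute set on it.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)
        self.peft_type = None

model = DDP(TinyModel())

print(hasattr(model, "peft_type"))        # False: the DDP wrapper does not expose it
print(model.module.peft_type)             # None: the attribute lives on the wrapped module
print(getattr(model, "peft_type", None))  # None: a DDP-safe way to probe it

try:
    model.peft_type                       # the lookup that fails in accelerate_ppo_trainer.py
except AttributeError as err:
    print(err)                            # 'DistributedDataParallel' object has no attribute 'peft_type'

dist.destroy_process_group()

In this toy setup the attribute is still reachable through model.module (or via getattr with a default), so the error appears specific to accessing it on the wrapper itself.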
🐛 Describe the bug
Hi, we encountered the DistributedDataParallel issue shown in the traceback above when running the example code with Ray optimization, in which we set Distributed Type: no:
ray start --head --port=6379
python -m trlx.sweep --config configs/sweeps/ppo_sweep.yml --accelerate_config configs/accelerate/ddp.yaml --num_gpus 4 examples/ppo_sentiments.py
The same error occurred when we changed the config file to the iml setting below.
Thank you very much!
Which trlX version are you using?
https://github.com/CarperAI/trlx/tree/3340c2f3a56d1d14fdd5f13ad575121fa26b6d92
Additional system and package information
transformers==4.32.0
python==3.9
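For anyone reproducing, a quick way to print the relevant versions from the environment (illustrative snippet; torch's version was not part of the original report):

import sys

import torch
import transformers

print("python:", sys.version.split()[0])          # 3.9 in the environment above
print("transformers:", transformers.__version__)  # 4.32.0 in the environment above
print("torch:", torch.__version__)                # not recorded in the original report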