DPO trainer example #172
Another issue, with the ORPO trainer.
So I tried the DPO Trainer test, python_test/trainers/dpo_test.py, but it is also getting stuck. How can I initialize from an initial model? I already have an SFT model; is there a way to transfer the parameters from the trained model to the freshly initialized one? I also couldn't understand the logic for this: why was it necessary?
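For anyone following along: DPO needs two copies of the model, a trainable policy and a frozen reference, and both are typically seeded from the same SFT checkpoint; the reference anchors the implicit KL term in the loss, which is why the "transfer" step exists at all. A minimal sketch in plain JAX/optax (init_dpo_state is a hypothetical helper, not EasyDeL's API):

```python
import optax

def init_dpo_state(sft_params, learning_rate=1e-6):
    # JAX arrays are immutable, so the reference model can safely alias
    # the SFT pytree: optimizer updates produce *new* policy arrays and
    # never mutate the originals.
    policy_params = sft_params      # trained by the DPO objective
    reference_params = sft_params   # frozen; anchors the log-ratio term
    tx = optax.adamw(learning_rate)
    opt_state = tx.init(policy_params)
    return policy_params, reference_params, tx, opt_state
```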
Hi @sparsh35, I have added a tutorial for the DPO trainer, and I am sorry if I was slow on that; I was checking some other things. https://github.com/erfanzar/EasyDeL/blob/main/notebooks/dpo-trainer.ipynb
@erfanzar No problem at all; I know you are the sole maintainer of this project, and things do get hectic.
Still getting an issue with dtypes, using the same code as given in the example.
@sparsh35 hello, and thanks for re-opening the issue; I have that fixed now.
@erfanzar Thanks. I also modified the code for hinge loss and KTO to check, and I think there is another bug related to JAX array sharding; this code may not be compatible with multi-host setups. I can't debug this error, can you help? The above exception was the direct cause of the following exception: Traceback (most recent call last):
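For context on the loss modifications mentioned above: the sigmoid and hinge variants differ only in how they penalize the policy/reference log-ratio margin. A rough JAX sketch following the DPO paper's formulation (KTO omitted; this is illustrative, not EasyDeL's implementation):

```python
import jax.numpy as jnp
from jax.nn import log_sigmoid, relu

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.1, loss_type="sigmoid"):
    # Implicit reward margin: how much more the policy prefers the chosen
    # completion over the rejected one, relative to the reference model.
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = pi_logratios - ref_logratios
    if loss_type == "sigmoid":        # standard DPO objective
        losses = -log_sigmoid(beta * logits)
    elif loss_type == "hinge":        # SLiC-style hinge variant
        losses = relu(1.0 - beta * logits)
    else:
        raise ValueError(f"unknown loss_type: {loss_type}")
    return jnp.mean(losses)
```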
@sparsh35 thanks, I'll fix that one ASAP.
Can you try again and tell me if you are still facing this issue?
No, this is the error: Traceback (most recent call last):
Describe the bug
When trying the DPO trainer example, I get a bug with batch size and sharding; maybe the shard axes are not properly set, or it could be a JAX error as well. The system used is a TPU v3-32, 4 hosts.
To Reproduce
Steps to reproduce the behavior:
Just run the DPO trainer example.
The error is this:
File "/home/spars/.local/lib/python3.10/site-packages/easydel/modules/qwen2/modeling_qwen_flax.py", line 862, in call
output = pecs entry. Consider checking that in_specs are correct, and if so consider changing the mesh axis sizes or else padding the input and adapting 'functools.partial(<function ring_attention at 0x7f2e71b196c0>, axis_name='sp', float32_logits=True, platform=None, backend=None, autocheck=True, blocksize_c=None, blocksize_k=128, blocksize_q=128, dtype=<class 'jax.numpy.float32'>, softmax_scale=0.08838834764831843, deterministic=True, dropout_rng=None)' appropriately.
layer(
File "/home/spars/.local/lib/python3.10/site-packages/easydel/modules/qwen2/modeling_qwen_flax.py", line 502, in call
attn_outputs = self.self_attn(
File "/home/spars/.local/lib/python3.10/site-packages/flax/linen/partitioning.py", line 568, in inner
return rematted(variable_groups, rng_groups, *dyn_args)
File "/home/spars/.local/lib/python3.10/site-packages/flax/linen/partitioning.py", line 565, in rematted
y = fn(scope, *args)
File "/home/spars/.local/lib/python3.10/site-packages/easydel/modules/qwen2/modeling_qwen_flax.py", line 382, in call
attentions = self.attention_performer(
File "/home/spars/.local/lib/python3.10/site-packages/easydel/modules/attention_module.py", line 529, in call
return self.ring_attention(
File "/home/spars/.local/lib/python3.10/site-packages/easydel/modules/attention_module.py", line 649, in ring_attention
attn_output = shard_map(
ValueError: shard_map applied to the function 'functools.partial(<function ring_attention at 0x7f9c76b396c0>, axis_name='sp', float32_logits=True, platform=None, backend=None, autocheck=True, blocksize_c=None, blocksize_k=128, blocksize_q=128, dtype=<class 'jax.numpy.float32'>, softmax_scale=0.08838834764831843, deterministic=True, dropout_rng=None)' was given argument arrays with axis sizes that are not evenly divisible by the corresponding mesh axis sizes:
The mesh given has shape (1, 32, 1, 1) with corresponding axis names ('dp', 'fsdp', 'tp', 'sp').
args[0] of shape float32[1,1,28,128], where args[0] is bound to functools.partial(<function ring_attention at 0x7f9c76b396c0>, axis_name='sp', float32_logits=True, platform=None, backend=None, autocheck=True, blocksize_c=None, blocksize_k=128, blocksize_q=128, dtype=<class 'jax.numpy.float32'>, softmax_scale=0.08838834764831843, deterministic=True, dropout_rng=None)'s parameter 'query', corresponds to in_specs[0] of value PartitionSpec(('dp', 'fsdp'), 'sp', 'tp', None), which maps array axis 0 (of size 1) to mesh axes ('dp', 'fsdp') (of total size 32), but 32 does not evenly divide 1
args[1] of shape float32[1,1,28,128], where args[1] is bound to functools.partial(<function ring_attention at 0x7f9c76b396c0>, axis_name='sp', float32_logits=True, platform=None, backend=None, autocheck=True, blocksize_c=None, blocksize_k=128, blocksize_q=128, dtype=<class 'jax.numpy.float32'>, softmax_scale=0.08838834764831843, deterministic=True, dropout_rng=None)'s parameter 'key', corresponds to in_specs[1] of value PartitionSpec(('dp', 'fsdp'), 'sp', 'tp', None), which maps array axis 0 (of size 1) to mesh axes ('dp', 'fsdp') (of total size 32), but 32 does not evenly divide 1
args[2] of shape float32[1,1,28,128], where args[2] is bound to functools.partial(<function ring_attention at 0x7f9c76b396c0>, axis_name='sp', float32_logits=True, platform=None, backend=None, autocheck=True, blocksize_c=None, blocksize_k=128, blocksize_q=128, dtype=<class 'jax.numpy.float32'>, softmax_scale=0.08838834764831843, deterministic=True, dropout_rng=None)'s parameter 'value', corresponds to in_specs[2] of value PartitionSpec(('dp', 'fsdp'), 'sp', 'tp', None), which maps array axis 0 (of size 1) to mesh axes ('dp', 'fsdp') (of total size 32), but 32 does not evenly divide 1
args[3] of shape float32[1,1,1,1], where args[3] is bound to functools.partial(<function ring_attention at 0x7f9c76b396c0>, axis_name='sp', float32_logits=True, platform=None, backend=None, autocheck=True, blocksize_c=None, blocksize_k=128, blocksize_q=128, dtype=<class 'jax.numpy.float32'>, softmax_scale=0.08838834764831843, deterministic=True, dropout_rng=None)'s parameter 'bias', corresponds to in_specs[3] of value PartitionSpec(('dp', 'fsdp'), 'tp', 'sp', None), which maps array axis 0 (of size 1) to mesh axes ('dp', 'fsdp') (of total size 32), but 32 does not evenly divide 1
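The message reduces to a divisibility rule: shard_map requires every array axis to be evenly divisible by the product of the mesh axes its PartitionSpec entry maps it to, and here the batch axis of size 1 is being split across ('dp', 'fsdp') = 32 devices. A small illustrative pre-flight check (check_in_spec is a made-up helper, not part of JAX or EasyDeL):

```python
import numpy as np

def check_in_spec(shape, spec, mesh_shape):
    """Mirror shard_map's divisibility rule: every array axis must be
    evenly divisible by the product of the mesh axes its PartitionSpec
    entry maps it to. mesh_shape is a mapping of mesh axis name -> size
    (what Mesh.shape returns in JAX)."""
    for axis, entry in enumerate(spec):
        if entry is None:
            continue  # unsharded axis, nothing to check
        names = entry if isinstance(entry, tuple) else (entry,)
        total = int(np.prod([mesh_shape[name] for name in names]))
        if shape[axis] % total != 0:
            raise ValueError(
                f"array axis {axis} (size {shape[axis]}) is not divisible "
                f"by mesh axes {names} (total size {total})"
            )

# The failing case from the traceback: the per-host query batch axis has
# size 1, but the spec shards it over ('dp', 'fsdp') with 1 * 32 devices.
check_in_spec(
    shape=(1, 1, 28, 128),
    spec=(("dp", "fsdp"), "sp", "tp", None),
    mesh_shape={"dp": 1, "fsdp": 32, "tp": 1, "sp": 1},
)  # raises ValueError, matching the shard_map failure above
```

If that reading is right, possible workarounds are raising the per-host batch size to a multiple of dp × fsdp, or choosing a partition spec that does not shard the batch axis over 'fsdp'.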