[HPU] llm-full-mp-gpus #299

Open
Delaunay opened this issue Oct 4, 2024 · 2 comments

Comments
Delaunay (Collaborator) commented on Oct 4, 2024

PyTorch version too old for the fused optimizer:

llm-full-mp-gpus.0 [stderr] [rank0]: Traceback (most recent call last):
llm-full-mp-gpus.0 [stderr] [rank0]:   File "/homes/delaunap/milabench/benchmarks/llm/recipes/full_finetune_distributed.py", line 645, in <module>
llm-full-mp-gpus.0 [stderr] [rank0]:     sys.exit(recipe_main())
llm-full-mp-gpus.0 [stderr] [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torchtune/config/_parse.py", line 50, in wrapper
llm-full-mp-gpus.0 [stderr] [rank0]:     sys.exit(recipe_main(conf))
llm-full-mp-gpus.0 [stderr] [rank0]:   File "/homes/delaunap/milabench/benchmarks/llm/recipes/full_finetune_distributed.py", line 632, in recipe_main
llm-full-mp-gpus.0 [stderr] [rank0]:     recipe = FullFinetuneRecipeDistributed(cfg=cfg)
llm-full-mp-gpus.0 [stderr] [rank0]:   File "/homes/delaunap/milabench/benchmarks/llm/recipes/full_finetune_distributed.py", line 116, in __init__
llm-full-mp-gpus.0 [stderr] [rank0]:     raise RuntimeError(
llm-full-mp-gpus.0 [stderr] [rank0]: RuntimeError: Using fused optimizer on CPU is only supported in PyTorch nightly.
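The check that raises here appears to guard against requesting a fused optimizer step on CPU (which fsdp_cpu_offload implies) with a PyTorch build that does not support it. Below is a minimal version-gated fallback sketch; the 2.4.0 cutoff, the helper name, and the AdamW hyperparameters are illustrative assumptions, not taken from the recipe.

# Sketch: only request a fused AdamW when the installed PyTorch build
# supports fused optimizer steps on CPU (relevant with fsdp_cpu_offload).
# The 2.4.0 threshold is an assumption; check your PyTorch release notes.
import torch
from packaging.version import parse


def fused_cpu_optimizer_supported() -> bool:
    # Strip any local build suffix, e.g. "2.3.1+hpu" -> "2.3.1".
    version = parse(torch.__version__.split("+")[0])
    return version.is_devrelease or version >= parse("2.4.0")


model = torch.nn.Linear(8, 8)  # stand-in for the fine-tuned model
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,
    fused=fused_cpu_optimizer_supported(),  # fall back to the non-fused path otherwise
)
print(type(optimizer), optimizer.defaults["fused"])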
Delaunay added the HPU label on Oct 4, 2024
Delaunay (Collaborator, Author) commented on Oct 4, 2024

--- a/benchmarks/llm/configs/llama3_70B_full.yaml
+++ b/benchmarks/llm/configs/llama3_70B_full.yaml
@@ -94,9 +94,9 @@ gradient_accumulation_steps: 1
 device: cuda
 
 # Memory management
-enable_activation_checkpointing: True
-memory_efficient_fsdp_wrap: True
-fsdp_cpu_offload: True
+enable_activation_checkpointing: false
+memory_efficient_fsdp_wrap: false
+fsdp_cpu_offload: false
 
 # Reduced precision
 dtype: bf16
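
For completeness, the same three overrides can be applied programmatically through OmegaConf (the loader torchtune uses for its YAML configs) instead of editing the file by hand. This is an illustrative sketch only; the relative path assumes it is run from a milabench checkout.

# Sketch: apply the overrides from the diff above without hand-editing the YAML.
from omegaconf import OmegaConf

cfg_path = "benchmarks/llm/configs/llama3_70B_full.yaml"
cfg = OmegaConf.load(cfg_path)

cfg.enable_activation_checkpointing = False
cfg.memory_efficient_fsdp_wrap = False
cfg.fsdp_cpu_offload = False  # avoids the fused-optimizer-on-CPU path entirely

OmegaConf.save(cfg, cfg_path)
print(OmegaConf.to_yaml(cfg))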
	* 1 x [rank0]: ValueError: Inconsistent compute device and `device_id` on rank 0: hpu:0 vs hpu
    	| [rank0]: Traceback (most recent call last):
    	| [rank0]:   File "/homes/delaunap/milabench/benchmarks/llm/recipes/full_finetune_distributed.py", line 645, in <module>
    	| [rank0]: 	sys.exit(recipe_main())
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torchtune/config/_parse.py", line 50, in wrapper
    	| [rank0]: 	sys.exit(recipe_main(conf))
    	| [rank0]:   File "/homes/delaunap/milabench/benchmarks/llm/recipes/full_finetune_distributed.py", line 633, in recipe_main
    	| [rank0]: 	recipe.setup(cfg=cfg)
    	| [rank0]:   File "/homes/delaunap/milabench/benchmarks/llm/recipes/full_finetune_distributed.py", line 213, in setup
    	| [rank0]: 	self._model = self._setup_model(
    	| [rank0]:   File "/homes/delaunap/milabench/benchmarks/llm/recipes/full_finetune_distributed.py", line 323, in _setup_model
    	| [rank0]: 	model = FSDP(
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/habana_frameworks/torch/gpu_migration/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 102, in __init__
    	| [rank0]: 	return FullyShardedDataParallel.call_parent_func(
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/habana_frameworks/torch/gpu_migration/core/register.py", line 158, in call_parent_func
    	| [rank0]: 	return func(*args, **kwargs)
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 485, in __init__
    	| [rank0]: 	_auto_wrap(
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py", line 72, in _auto_wrap
    	| [rank0]: 	_post_order_apply(root_module, wrap_fn)
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 79, in _post_order_apply
    	| [rank0]: 	_post_order_apply_inner(root_module, "", None)
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 63, in _post_order_apply_inner
    	| [rank0]: 	_post_order_apply_inner(child_module, child_module_name, module)
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 63, in _post_order_apply_inner
    	| [rank0]: 	_post_order_apply_inner(child_module, child_module_name, module)
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 64, in _post_order_apply_inner
    	| [rank0]: 	optional_module = fn(module)
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 98, in fn
    	| [rank0]: 	return fsdp_fn(module, **kwargs)
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/habana_frameworks/torch/gpu_migration/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 102, in __init__
    	| [rank0]: 	return FullyShardedDataParallel.call_parent_func(
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/habana_frameworks/torch/gpu_migration/core/register.py", line 158, in call_parent_func
    	| [rank0]: 	return func(*args, **kwargs)
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 511, in __init__
    	| [rank0]: 	_init_param_handle_from_module(
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 582, in _init_param_handle_from_module
    	| [rank0]: 	state.compute_device = _get_compute_device(
    	| [rank0]:   File "/homes/delaunap/hpu/results/venv/torch/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 1045, in _get_compute_device
    	| [rank0]: 	raise ValueError(
    	| [rank0]: ValueError: Inconsistent compute device and `device_id` on rank 0: hpu:0 vs hpu
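
The ValueError comes from FSDP comparing the compute device it derives from the module parameters (hpu:0) with the device_id it was handed (a bare hpu). The sketch below only illustrates that mismatch; it is not taken from the recipe, and the environment-variable and keyword names are the usual torchrun/FSDP conventions rather than anything verified against this benchmark.

# Sketch of the mismatch behind the ValueError: a bare device type ("hpu")
# does not compare equal to the indexed compute device ("hpu:0") that FSDP
# derives from the module parameters. Indexing with the local rank is the
# usual remedy.
import os
import torch

local_rank = int(os.environ.get("LOCAL_RANK", "0"))

bare_device = torch.device("hpu")                 # what was passed: "hpu"
indexed_device = torch.device("hpu", local_rank)  # what FSDP computes: "hpu:0"

print(bare_device == indexed_device)  # False -> FSDP raises the ValueError
# Inside the recipe, the indexed device would be the one handed to FSDP, e.g.
#   FSDP(model, device_id=indexed_device, ...)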

Delaunay changed the title from "llm-full-mp-gpus on HPU" to "[HPU] llm-full-mp-gpus" on Oct 15, 2024
Delaunay (Collaborator, Author) commented

The run dies after 15/30 (50% of the bench):


llm-full-mp-gpus.0 [data] {'cpudata': {'load': 6.8, 'memory': [217340166144, 1081801142272]}, 'task': 'main', 'time': 1729186143.3196068}
llm-full-mp-gpus.0 [stderr] Synapse detected a device critical error that requires a restart. Killing process in 5 seconds (hl: 4) 10:28:55 [No progress error]
llm-full-mp-gpus.0 [stderr] W1017 10:29:15.898000 140525722707968 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 3687945 closing signal SIGTERM
llm-full-mp-gpus.0 [stderr] W1017 10:29:15.899000 140525722707968 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 3687946 closing signal SIGTERM
llm-full-mp-gpus.0 [stderr] W1017 10:29:15.899000 140525722707968 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 3687947 closing signal SIGTERM
llm-full-mp-gpus.0 [stderr] W1017 10:29:15.899000 140525722707968 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 3687948 closing signal SIGTERM
llm-full-mp-gpus.0 [stderr] W1017 10:29:15.899000 140525722707968 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 3687949 closing signal SIGTERM
llm-full-mp-gpus.0 [stderr] W1017 10:29:15.899000 140525722707968 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 3687951 closing signal SIGTERM
llm-full-mp-gpus.0 [stderr] W1017 10:29:15.899000 140525722707968 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 3687952 closing signal SIGTERM
llm-full-mp-gpus.0 [stderr] W1017 10:29:45.899000 140525722707968 torch/distributed/elastic/multiprocessing/api.py:868] Unable to shutdown process 3687948 via Signals.SIGTERM, forcefully exiting via Signals.SIGKILL
