Support for Llama-3.1 (8b) - inference #80

Open
Bihan opened this issue Aug 6, 2024 · 7 comments · May be fixed by #109

Comments

@Bihan

Bihan commented Aug 6, 2024

The fine-tuning example works with Llama-3.1 (8b) after bumping Transformers to version 4.43.3 and modifying rope_scaling in the model's config.json.

Below is the error log without the rope_scaling modification:

```
Traceback (most recent call last):
  File "/root/optimum-tpu/examples/custom/train.py", line 138, in <module>
    model, tokenizer = create_and_prepare_model(args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/examples/custom/train.py", line 72, in create_and_prepare_model
    model = AutoModelForCausalLM.from_pretrained(args.model_name, use_cache=False)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling.py", line 64, in from_pretrained
    model = cls.from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/workflow/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3788, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 1180, in __init__
    self.model = LlamaModel(config, rank, world_size)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in __init__
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 746, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 333, in __init__
    self._init_rope()
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 343, in _init_rope
    scaling_type = self.config.rope_scaling["type"]
                   ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'type'
```
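For context, the Llama-3.1 checkpoints ship a rope_scaling block in config.json with new keys (rope_type, low_freq_factor, high_freq_factor, ...), while the code in the traceback above expects the legacy schema with a "type" key. Below is a minimal sketch of the kind of config.json edit I mean; the path, the factor value, and the choice of "dynamic" are illustrative assumptions, not necessarily the exact edit.

```python
import json

# Hypothetical patch script: rewrite rope_scaling in a local copy of the model
# so that loaders expecting the legacy {"type", "factor"} schema can parse it.
# The path and the values below are assumptions for illustration.
config_path = "Meta-Llama-3.1-8B/config.json"

with open(config_path) as f:
    config = json.load(f)

# Llama-3.1 style block looks roughly like:
#   {"rope_type": "llama3", "factor": 8.0, "low_freq_factor": 1.0, ...}
# Replace it with the legacy two-key form that the old code indexes into.
config["rope_scaling"] = {"type": "dynamic", "factor": 8.0}

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```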

I would really appreciate your suggestions/plans on the following:

  1. How can we handle the rope_scaling issue properly?
  2. Will the Transformers version be upgraded to support Llama-3.1 (8b) in the near future?
@tengomucho
Collaborator

Hi @Bihan, we do not support Llama3.1 yet, but we will definitely work on supporting it soon.

@Bihan
Author

Bihan commented Aug 8, 2024

> Hi @Bihan, we do not support Llama3.1 yet, but we will definitely work on supporting it soon.

@tengomucho That is good news. I found that fine-tuning works with the rope_scaling adjustment and by bumping Transformers. Currently I am working on Llama-3.1 inference with optimum-tpu; I will update my progress. With the updated Transformers, text_generation_launcher raises AttributeErrors like the following:

```
AttributeError: 'GenerationConfig' object has no attribute '_eos_token_tensor'
Error: No such option: --otlp-service-name rank=0
```

Looking eagerly for an update.
Thank you

@Bihan Bihan changed the title from "Fine Tuning Llama-3.1 (8b) on TPU-v5e" to "Support for Llama-3.1 (8b)" on Aug 8, 2024
@tengomucho
Collaborator

What kind of support are you trying to achieve, @Bihan: fine-tuning or inference/generation?

@Bihan
Author

Bihan commented Aug 8, 2024

> What kind of support are you trying to achieve, @Bihan: fine-tuning or inference/generation?

@tengomucho Inference/generation

@tengomucho tengomucho changed the title from "Support for Llama-3.1 (8b)" to "Support for Llama-3.1 (8b) - inference" on Aug 8, 2024
@tengomucho
Collaborator

In that case, I would suggest you start with this example instead. It should be much simpler.

@Bihan
Author

Bihan commented Aug 9, 2024

@tengomucho Thank you for your support. I was able to serve Llama-3.1 (8b) inference by setting the default value of the rope_scaling "type" key to "dynamic".
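For anyone hitting the same KeyError, the change described above boils down to reading the scaling type with a fallback instead of indexing it directly. Here is a minimal, self-contained sketch of that fallback; the dictionaries below are illustrative examples, and the real change would live around the _init_rope line shown in the traceback, not in a standalone function like this.

```python
# Sketch of the "default to dynamic" workaround described above.
# The original code raised KeyError because it did:
#     scaling_type = self.config.rope_scaling["type"]
# and Llama-3.1 configs carry "rope_type" instead of "type".

# Illustrative rope_scaling dictionaries (values are examples, not canonical).
llama2_style = {"type": "linear", "factor": 2.0}
llama31_style = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}

def resolve_scaling_type(rope_scaling: dict) -> str:
    # Fall back to "dynamic" when the legacy "type" key is absent.
    return rope_scaling.get("type", "dynamic")

print(resolve_scaling_type(llama2_style))   # -> linear
print(resolve_scaling_type(llama31_style))  # -> dynamic
```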

@tengomucho
Collaborator

Contributed implementation: #85
