Support for Llama-3.1 (8b) - inference #80

Open
Bihan opened this issue Aug 6, 2024 · 7 comments · May be fixed by #109

Comments

@Bihan

Bihan commented Aug 6, 2024

The fine-tuning example works with Llama-3.1 (8b) after bumping Transformers to version 4.43.3 and modifying rope_scaling in the model's config.json.

Below is the error log without the rope_scaling modification:

```
Traceback (most recent call last):
  File "/root/optimum-tpu/examples/custom/train.py", line 138, in <module>
    model, tokenizer = create_and_prepare_model(args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/examples/custom/train.py", line 72, in create_and_prepare_model
    model = AutoModelForCausalLM.from_pretrained(args.model_name, use_cache=False)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling.py", line 64, in from_pretrained
    model = cls.from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/workflow/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3788, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 1180, in __init__
    self.model = LlamaModel(config, rank, world_size)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in __init__
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 746, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 333, in __init__
    self._init_rope()
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 343, in _init_rope
    scaling_type = self.config.rope_scaling["type"]
                   ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'type'
```
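For context, the Llama-3.1 checkpoints ship a rope_scaling block in config.json with new keys (rope_type, low_freq_factor, high_freq_factor, ...), while the code in the traceback above expects the legacy schema with a "type" key. Below is a minimal sketch of the kind of config.json edit I mean; the path, the factor value, and the choice of "dynamic" are illustrative assumptions, not necessarily the exact edit.

```python
import json

# Hypothetical patch script: rewrite rope_scaling in a local copy of the model
# so that loaders expecting the legacy {"type", "factor"} schema can parse it.
# The path and the values below are assumptions for illustration.
config_path = "Meta-Llama-3.1-8B/config.json"

with open(config_path) as f:
    config = json.load(f)

# Llama-3.1 style block looks roughly like:
#   {"rope_type": "llama3", "factor": 8.0, "low_freq_factor": 1.0, ...}
# Replace it with the legacy two-key form that the old code indexes into.
config["rope_scaling"] = {"type": "dynamic", "factor": 8.0}

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```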

I would really appreciate your suggestions/plans on the following:

  1. How can we handle the rope_scaling issue properly?
  2. Will the Transformers version be upgraded to support Llama-3.1 (8b) in the near future?
@tengomucho
Collaborator

Hi @Bihan, we do not support Llama3.1 yet, but we will definitely work on supporting it soon.

@Bihan
Author

Bihan commented Aug 8, 2024

> Hi @Bihan, we do not support Llama3.1 yet, but we will definitely work on supporting it soon.

@tengomucho That is good news. I found that fine-tuning works with the rope_scaling adjustment and by bumping Transformers. Currently I am working on Llama-3.1 inference with optimum-tpu; I will update my progress. With the updated Transformers, text_generation_launcher raises AttributeErrors like the following:

```
AttributeError: 'GenerationConfig' object has no attribute '_eos_token_tensor'
Error: No such option: --otlp-service-name rank=0
```

Looking eagerly for an update.
Thank you

@Bihan Bihan changed the title from "Fine Tuning Llama-3.1 (8b) on TPU-v5e" to "Support for Llama-3.1 (8b)" on Aug 8, 2024
@tengomucho
Collaborator

What kind of support are you trying to achieve, @Bihan: fine-tuning or inference/generation?

@Bihan
Author

Bihan commented Aug 8, 2024

> What kind of support are you trying to achieve, @Bihan: fine-tuning or inference/generation?

@tengomucho Inference/generation

@tengomucho tengomucho changed the title from "Support for Llama-3.1 (8b)" to "Support for Llama-3.1 (8b) - inference" on Aug 8, 2024
@tengomucho
Collaborator

In that case, I would suggest you start with this example instead. It should be much simpler.

@Bihan
Author

Bihan commented Aug 9, 2024

@tengomucho Thank you for your support. I was able to serve Llama-3.1 (8b) inference by setting the default value of the rope_scaling "type" key to "dynamic".
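For anyone hitting the same KeyError, the change described above boils down to reading the scaling type with a fallback instead of indexing it directly. Here is a minimal, self-contained sketch of that fallback; the dictionaries below are illustrative examples, and the real change would live around the _init_rope line shown in the traceback, not in a standalone function like this.

```python
# Sketch of the "default to dynamic" workaround described above.
# The original code raised KeyError because it did:
#     scaling_type = self.config.rope_scaling["type"]
# and Llama-3.1 configs carry "rope_type" instead of "type".

# Illustrative rope_scaling dictionaries (values are examples, not canonical).
llama2_style = {"type": "linear", "factor": 2.0}
llama31_style = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}

def resolve_scaling_type(rope_scaling: dict) -> str:
    # Fall back to "dynamic" when the legacy "type" key is absent.
    return rope_scaling.get("type", "dynamic")

print(resolve_scaling_type(llama2_style))   # -> linear
print(resolve_scaling_type(llama31_style))  # -> dynamic
```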

@tengomucho
Collaborator

Contributed implementation: #85
