diff --git a/examples/language-modeling/llama_tuning.ipynb b/examples/language-modeling/llama_tuning.ipynb index 871b8f17..3219b7a0 100644 --- a/examples/language-modeling/llama_tuning.ipynb +++ b/examples/language-modeling/llama_tuning.ipynb @@ -1,653 +1,658 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "c7227736-2685-4971-9402-d6015b5319b5", - "metadata": {}, - "source": [ - "# Fine-Tune Llama on Google TPU\n", - "\n", - "Training Large Language Models (LLMs) on Google Tensor Processing Units (TPUs) with Single Program Multiple Data (SPMD) offers a multitude of benefits. TPUs provide competitive processing power, enabling good training times and allowing researchers to experiment with larger models and datasets efficiently. SPMD architecture optimizes resource utilization by distributing tasks across multiple TPUs, enhancing parallelism and scalability.\n", - "The easiest approach to tune a model with SPMD is using Fully Sharded Data Parallel [(FSDP)](https://engineering.fb.com/2021/07/15/open-source/fsdp/). Pytorch/XLA most recent and performant implementation is [FSDP v2](https://github.com/pytorch/xla/blob/master/docs/fsdpv2.md), that allows to shard weights, activations and outputs.\n", - "\n", - "\n", - "This example shows to tune one of Meta's Llama models on single host TPUs. For information on TPUs architecture, you can consult the [documentation](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm).\n", - "\n", - "\n", - "### Prerequisites\n", - "\n", - "We consider you have already created a single-host TPU VM, such as a `v5litepod8` setup, and you have ssh access to the machine.\n", - "You need to clone `optimum-tpu` and install few modules:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d4f15b21", - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [], - "source": [ - "git clone https://github.com/huggingface/optimum-tpu.git\n", - "# Install Optimum TPU\n", - "pip install -e . -f https://storage.googleapis.com/libtpu-releases/index.html\n", - "# Install TRL and PEFT for training (see later how they are used)\n", - "pip install trl peft\n", - "# Install Jupyter notebook\n", - "pip install -U jupyterlab notebook\n", - "# Optionally, install widgets extensions for better rendering\n", - "pip install ipywidgets widgetsnbextension\n", - "# This will be necessary for the language modeling example\n", - "pip install datasets evaluate accelerate\n", - "# Change directory and launch Jupyter notebook\n", - "cd optimum-tpu/examples/language-modeling\n", - "jupyter notebook --port 8888" - ] - }, - { - "cell_type": "markdown", - "id": "6a4cc927", - "metadata": {}, - "source": [ - "We should then see the familiar Jupyter output that shows the address accessible from a browser:\n", - "\n", - "```\n", - "http://localhost:8888/tree?token=3ceb24619d0a2f99acf5fba41c51b475b1ddce7cadb2a133\n", - "```\n", - "\n", - "Since we are going to use the gated `llama` model, we will need to log in using a [Hugging Face token](https://huggingface.co/settings/tokens):" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d273dead", - "metadata": {}, - "outputs": [], - "source": [ - "!huggingface-cli login --token YOUR_HF_TOKEN" - ] - }, - { - "cell_type": "markdown", - "id": "75a0bd6e", - "metadata": {}, - "source": [ - "### Enable FSDPv2\n", - "\n", - "To fine-tune an LLM, it might be necessary to shard the model across the TPUs to prevent memory issues and enhance tuning performances. 
Fully Sharded Data Parallel is an algorithm that has been implemented on Pytorch and that allows to wrap modules to distribute them.\n", - "When using Pytorch/XLA on TPUs, [FSDPv2](https://pytorch.org/xla/master/#fully-sharded-data-parallel-via-spmd) is an utility that re-expresses the famous FSDP algorithm using SPMD (Single Program Multiple Data). In `optimum-tpu` it is possible to use dedicated helpers to use FSPDv2. To enable it, you can use the dedicated function, that should be called at the beginning of the execution:" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "6d3c7bc2", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.\n" - ] - } - ], - "source": [ - "from optimum.tpu import fsdp_v2\n", - "fsdp_v2.use_fsdp_v2()" - ] - }, - { - "cell_type": "markdown", - "id": "733118c2", - "metadata": {}, - "source": [ - "Then, the tokenizer and model need to be loaded. We will choose [`meta-llama/Llama-3.2-1B`](https://huggingface.co/meta-llama/Llama-3.2-1B) for this example." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "d07e0b43", - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "from transformers import AutoTokenizer, AutoModelForCausalLM\n", - "\n", - "model_id = \"meta-llama/Llama-3.2-1B\"\n", - "tokenizer = AutoTokenizer.from_pretrained(model_id)\n", - "# Add custom token for padding Llama\n", - "tokenizer.add_special_tokens({'pad_token': tokenizer.eos_token})\n", - "model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)" + "cells": [ + { + "cell_type": "markdown", + "id": "c7227736-2685-4971-9402-d6015b5319b5", + "metadata": {}, + "source": [ + "# Fine-Tune Llama on Google TPU\n", + "\n", + "Training Large Language Models (LLMs) on Google Tensor Processing Units (TPUs) with Single Program Multiple Data (SPMD) offers a multitude of benefits. TPUs provide competitive processing power, enabling good training times and allowing researchers to experiment with larger models and datasets efficiently. SPMD architecture optimizes resource utilization by distributing tasks across multiple TPUs, enhancing parallelism and scalability.\n", + "The easiest approach to tune a model with SPMD is using Fully Sharded Data Parallel [(FSDP)](https://engineering.fb.com/2021/07/15/open-source/fsdp/). Pytorch/XLA most recent and performant implementation is [FSDP v2](https://github.com/pytorch/xla/blob/master/docs/fsdpv2.md), that allows to shard weights, activations and outputs.\n", + "\n", + "\n", + "This example shows to tune one of Meta's Llama models on single host TPUs. For information on TPUs architecture, you can consult the [documentation](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm).\n", + "\n", + "\n", + "### Prerequisites\n", + "\n", + "We consider you have already created a single-host TPU VM, such as a `v5litepod8` setup, and you have ssh access to the machine.\n", + "You need to clone `optimum-tpu` and install few modules:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d4f15b21", + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [], + "source": [ + "git clone https://github.com/huggingface/optimum-tpu.git\n", + "# Install Optimum TPU\n", + "pip install -e . 
-f https://storage.googleapis.com/libtpu-releases/index.html\n", + "# Install TRL and PEFT for training (see later how they are used)\n", + "pip install trl peft\n", + "# Install Jupyter notebook\n", + "pip install -U jupyterlab notebook\n", + "# Optionally, install widgets extensions for better rendering\n", + "pip install ipywidgets widgetsnbextension\n", + "# This will be necessary for the language modeling example\n", + "pip install datasets evaluate accelerate\n", + "# Change directory and launch Jupyter notebook\n", + "cd optimum-tpu/examples/language-modeling\n", + "jupyter notebook --port 8888" + ] + }, + { + "cell_type": "markdown", + "id": "6a4cc927", + "metadata": {}, + "source": [ + "We should then see the familiar Jupyter output that shows the address accessible from a browser:\n", + "\n", + "```\n", + "http://localhost:8888/tree?token=3ceb24619d0a2f99acf5fba41c51b475b1ddce7cadb2a133\n", + "```\n", + "\n", + "Since we are going to use the gated `llama` model, we will need to log in using a [Hugging Face token](https://huggingface.co/settings/tokens):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d273dead", + "metadata": {}, + "outputs": [], + "source": [ + "!huggingface-cli login --token YOUR_HF_TOKEN" + ] + }, + { + "cell_type": "markdown", + "id": "75a0bd6e", + "metadata": {}, + "source": [ + "### Enable FSDPv2\n", + "\n", + "To fine-tune an LLM, it might be necessary to shard the model across the TPUs to prevent memory issues and enhance tuning performances. Fully Sharded Data Parallel is an algorithm that has been implemented on Pytorch and that allows to wrap modules to distribute them.\n", + "When using Pytorch/XLA on TPUs, [FSDPv2](https://pytorch.org/xla/master/#fully-sharded-data-parallel-via-spmd) is an utility that re-expresses the famous FSDP algorithm using SPMD (Single Program Multiple Data). In `optimum-tpu` it is possible to use dedicated helpers to use FSPDv2. To enable it, you can use the dedicated function, that should be called at the beginning of the execution:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "6d3c7bc2", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.\n" + ] + } + ], + "source": [ + "from optimum.tpu import fsdp_v2\n", + "\n", + "\n", + "fsdp_v2.use_fsdp_v2()" + ] + }, + { + "cell_type": "markdown", + "id": "733118c2", + "metadata": {}, + "source": [ + "Then, the tokenizer and model need to be loaded. We will choose [`meta-llama/Llama-3.2-1B`](https://huggingface.co/meta-llama/Llama-3.2-1B) for this example." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d07e0b43", + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "from transformers import AutoModelForCausalLM, AutoTokenizer\n", + "\n", + "\n", + "model_id = \"meta-llama/Llama-3.2-1B\"\n", + "tokenizer = AutoTokenizer.from_pretrained(model_id)\n", + "# Add custom token for padding Llama\n", + "tokenizer.add_special_tokens({'pad_token': tokenizer.eos_token})\n", + "model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)" + ] + }, + { + "cell_type": "markdown", + "id": "d5576762", + "metadata": {}, + "source": [ + "To tune the model with the [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes) dataset, you can load it and obtain the `quote` column:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5c365bdd", + "metadata": {}, + "outputs": [], + "source": [ + "from datasets import load_dataset\n", + "\n", + "\n", + "data = load_dataset(\"Abirate/english_quotes\")\n", + "\n", + "def preprocess_function(samples):\n", + " # Add a simple prompt format to the quotes\n", + " prompts = [f\"Generate a quote:\\n\\n{quote}\\n\" for quote in samples[\"quote\"]]\n", + " # Add EOS token to each prompt\n", + " prompts = [p + tokenizer.eos_token for p in prompts]\n", + " return {\"prompt\": prompts}\n", + "\n", + "# data = data.map(lambda samples: tokenizer(samples[\"quote\"]), batched=True)\n", + "data = data.map(\n", + " preprocess_function,\n", + " batched=True,\n", + " remove_columns=data[\"train\"].column_names\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "id": "73174355", + "metadata": {}, + "source": [ + "You then need to specify the FSDP training arguments to enable the sharding feature, the function will deduce the classes that should be sharded:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "2c0c0797", + "metadata": {}, + "outputs": [], + "source": [ + "fsdp_training_args = fsdp_v2.get_fsdp_training_args(model)" + ] + }, + { + "cell_type": "markdown", + "id": "50b6ebe4", + "metadata": {}, + "source": [ + "The `fsdp_training_args` will specify the Pytorch module that needs to be sharded:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "5d55e0aa", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'fsdp': 'full_shard',\n", + " 'fsdp_config': {'transformer_layer_cls_to_wrap': ['LlamaDecoderLayer'],\n", + " 'xla': True,\n", + " 'xla_fsdp_v2': True,\n", + " 'xla_fsdp_grad_ckpt': True}}" ] }, - { - "cell_type": "markdown", - "id": "d5576762", - "metadata": {}, - "source": [ - "To tune the model with the [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes) dataset, you can load it and obtain the `quote` column:\n" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "5c365bdd", - "metadata": {}, - "outputs": [], - "source": [ - "from datasets import load_dataset\n", - "\n", - "data = load_dataset(\"Abirate/english_quotes\")\n", - "\n", - "def preprocess_function(samples):\n", - " # Add a simple prompt format to the quotes\n", - " prompts = [f\"Generate a quote:\\n\\n{quote}\\n\" for quote in samples[\"quote\"]]\n", - " # Add EOS token to each prompt\n", - " prompts = [p + tokenizer.eos_token for p in prompts]\n", - " return {\"prompt\": prompts}\n", + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fsdp_training_args" + ] + }, + { + "cell_type": "markdown", + "id": 
"ccb4820f", + "metadata": {}, + "source": [ + "Now training can be done as simply as using the standard `Trainer` class:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "e24da8ce", + "metadata": {}, + "outputs": [], + "source": [ + "from peft import LoraConfig\n", + "\n", + "\n", + "lora_config = LoraConfig(\n", + " lora_alpha=128,\n", + " lora_dropout=0.05,\n", + " r=256,\n", + " bias=\"none\",\n", + " target_modules=\"all-linear\",\n", + " task_type=\"CAUSAL_LM\",\n", + ")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "12da486c", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:root:torch_xla.core.xla_model.xrt_world_size() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.world_size instead.\n", + "WARNING:root:torch_xla.core.xla_model.xla_model.get_ordinal() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.global_ordinal instead.\n", + "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': dataset_text_field, max_seq_length, packing. Will not be supported from version '0.13.0'.\n", + "\n", + "Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.\n", + " warnings.warn(message, FutureWarning)\n", + "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:212: UserWarning: You passed a `packing` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n", + " warnings.warn(\n", + "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:300: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n", + " warnings.warn(\n", + "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:328: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n", + " warnings.warn(\n", + "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.\n", + "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1810: UserWarning: For backward hooks to be called, module output should be a Tensor or a tuple of Tensors but received \n", + " warnings.warn(\"For backward hooks to be called,\"\n", + "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/torch_xla/utils/checkpoint.py:183: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.\n", + " torch.cuda.amp.autocast(**ctx.gpu_autocast_kwargs), \\\n", + "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/torch_xla/utils/checkpoint.py:184: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. 
Please use `torch.amp.autocast('cpu', args...)` instead.\n", + " torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):\n" + ] + }, + { + "data": { + "text/html": [ "\n", - "# data = data.map(lambda samples: tokenizer(samples[\"quote\"]), batched=True)\n", - "data = data.map(\n", - " preprocess_function, \n", - " batched=True,\n", - " remove_columns=data[\"train\"].column_names\n", - ")\n" - ] - }, - { - "cell_type": "markdown", - "id": "73174355", - "metadata": {}, - "source": [ - "You then need to specify the FSDP training arguments to enable the sharding feature, the function will deduce the classes that should be sharded:\n" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "2c0c0797", - "metadata": {}, - "outputs": [], - "source": [ - "fsdp_training_args = fsdp_v2.get_fsdp_training_args(model)" - ] - }, - { - "cell_type": "markdown", - "id": "50b6ebe4", - "metadata": {}, - "source": [ - "The `fsdp_training_args` will specify the Pytorch module that needs to be sharded:" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "5d55e0aa", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'fsdp': 'full_shard',\n", - " 'fsdp_config': {'transformer_layer_cls_to_wrap': ['LlamaDecoderLayer'],\n", - " 'xla': True,\n", - " 'xla_fsdp_v2': True,\n", - " 'xla_fsdp_grad_ckpt': True}}" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } + "
\n", + " \n", + " \n", + " [70/70 04:53, Epoch 10/10]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
StepTraining Loss
13.119800
22.775200
32.634400
42.655500
52.498100
62.382000
72.411500
82.359800
92.407700
102.262300
112.309600
122.274900
132.244200
142.264100
152.127000
162.178300
172.091000
182.075100
192.102500
202.084600
212.144500
221.944800
232.194500
241.914700
251.916700
261.840400
271.815600
281.847800
291.750600
301.890700
311.862300
321.880700
331.835900
341.801500
351.771000
361.863100
371.794900
381.652700
391.810800
401.777700
411.822300
421.715600
431.663500
441.744800
451.719900
461.812600
471.837500
481.746000
491.727600
501.771100
511.805000
521.717100
531.710500
541.698000
551.635800
561.766000
571.760800
581.590400
591.770700
601.572800
611.696600
621.756000
631.710000
641.677900
651.691600
661.598200
671.677800
681.642300
691.707100
701.752300

" ], - "source": [ - "fsdp_training_args" + "text/plain": [ + "" ] }, - { - "cell_type": "markdown", - "id": "ccb4820f", - "metadata": {}, - "source": [ - "Now training can be done as simply as using the standard `Trainer` class:" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "e24da8ce", - "metadata": {}, - "outputs": [], - "source": [ - "from peft import LoraConfig\n", - "\n", - "lora_config = LoraConfig(\n", - " lora_alpha=128,\n", - " lora_dropout=0.05,\n", - " r=256,\n", - " bias=\"none\",\n", - " target_modules=\"all-linear\",\n", - " task_type=\"CAUSAL_LM\", \n", - ")\n" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "12da486c", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "WARNING:root:torch_xla.core.xla_model.xrt_world_size() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.world_size instead.\n", - "WARNING:root:torch_xla.core.xla_model.xla_model.get_ordinal() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.global_ordinal instead.\n", - "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': dataset_text_field, max_seq_length, packing. Will not be supported from version '0.13.0'.\n", - "\n", - "Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.\n", - " warnings.warn(message, FutureWarning)\n", - "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:212: UserWarning: You passed a `packing` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n", - " warnings.warn(\n", - "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:300: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n", - " warnings.warn(\n", - "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:328: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n", - " warnings.warn(\n", - "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.\n", - "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1810: UserWarning: For backward hooks to be called, module output should be a Tensor or a tuple of Tensors but received \n", - " warnings.warn(\"For backward hooks to be called,\"\n", - "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/torch_xla/utils/checkpoint.py:183: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.\n", - " torch.cuda.amp.autocast(**ctx.gpu_autocast_kwargs), \\\n", - "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/torch_xla/utils/checkpoint.py:184: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.\n", - " torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):\n" - ] - }, - { - "data": { - "text/html": [ - "\n", - "

\n", - " \n", - " \n", - " [70/70 04:53, Epoch 10/10]\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
StepTraining Loss
13.119800
22.775200
32.634400
42.655500
52.498100
62.382000
72.411500
82.359800
92.407700
102.262300
112.309600
122.274900
132.244200
142.264100
152.127000
162.178300
172.091000
182.075100
192.102500
202.084600
212.144500
221.944800
232.194500
241.914700
251.916700
261.840400
271.815600
281.847800
291.750600
301.890700
311.862300
321.880700
331.835900
341.801500
351.771000
361.863100
371.794900
381.652700
391.810800
401.777700
411.822300
421.715600
431.663500
441.744800
451.719900
461.812600
471.837500
481.746000
491.727600
501.771100
511.805000
521.717100
531.710500
541.698000
551.635800
561.766000
571.760800
581.590400
591.770700
601.572800
611.696600
621.756000
631.710000
641.677900
651.691600
661.598200
671.677800
681.642300
691.707100
701.752300

" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/torch_xla/core/xla_model.py:1457: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.\n", - " xldata.append(torch.load(xbio))\n" - ] - }, - { - "data": { - "text/plain": [ - "TrainOutput(global_step=70, training_loss=1.9438111100878035, metrics={'train_runtime': 409.5626, 'train_samples_per_second': 5.469, 'train_steps_per_second': 0.171, 'total_flos': 9745058664284160.0, 'train_loss': 1.9438111100878035, 'epoch': 10.0})" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments\n", - "from trl import SFTTrainer\n", - "\n", - "trainer = SFTTrainer(\n", - " model=model,\n", - " train_dataset=data[\"train\"],\n", - " args=TrainingArguments(\n", - " per_device_train_batch_size=32,\n", - " num_train_epochs=10,\n", - " max_steps=-1,\n", - " output_dir=\"/tmp/output\",\n", - " optim=\"adafactor\",\n", - " logging_steps=1,\n", - " dataloader_drop_last=True, # Required by FSDP v2\n", - " **fsdp_training_args,\n", - " ),\n", - " peft_config=lora_config,\n", - " dataset_text_field=\"prompt\",\n", - " max_seq_length=512,\n", - " packing=True,\n", - ")\n", - "\n", - "trainer.train()" + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/baptiste_colle/git/optimum-tpu/.venv/lib/python3.10/site-packages/torch_xla/core/xla_model.py:1457: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. 
Please open an issue on GitHub for any issues related to this experimental feature.\n", + " xldata.append(torch.load(xbio))\n" + ] + }, + { + "data": { + "text/plain": [ + "TrainOutput(global_step=70, training_loss=1.9438111100878035, metrics={'train_runtime': 409.5626, 'train_samples_per_second': 5.469, 'train_steps_per_second': 0.171, 'total_flos': 9745058664284160.0, 'train_loss': 1.9438111100878035, 'epoch': 10.0})" ] }, - { - "cell_type": "markdown", - "id": "fc3dd275", - "metadata": {}, - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.12" - } - }, - "nbformat": 4, - "nbformat_minor": 5 - } - \ No newline at end of file + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from transformers import TrainingArguments\n", + "from trl import SFTTrainer\n", + "\n", + "\n", + "trainer = SFTTrainer(\n", + " model=model,\n", + " train_dataset=data[\"train\"],\n", + " args=TrainingArguments(\n", + " per_device_train_batch_size=32,\n", + " num_train_epochs=10,\n", + " max_steps=-1,\n", + " output_dir=\"/tmp/output\",\n", + " optim=\"adafactor\",\n", + " logging_steps=1,\n", + " dataloader_drop_last=True, # Required by FSDP v2\n", + " **fsdp_training_args,\n", + " ),\n", + " peft_config=lora_config,\n", + " dataset_text_field=\"prompt\",\n", + " max_seq_length=512,\n", + " packing=True,\n", + ")\n", + "\n", + "trainer.train()" + ] + }, + { + "cell_type": "markdown", + "id": "fc3dd275", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/optimum/tpu/version.py b/optimum/tpu/version.py index 6aa2623b..dc6e5f1c 100644 --- a/optimum/tpu/version.py +++ b/optimum/tpu/version.py @@ -15,5 +15,5 @@ from packaging.version import parse -__version__ = "0.2.0" +__version__ = "0.2.3" VERSION = parse(__version__) diff --git a/text-generation-inference/docker/Dockerfile b/text-generation-inference/docker/Dockerfile index ae736299..f0e09d97 100644 --- a/text-generation-inference/docker/Dockerfile +++ b/text-generation-inference/docker/Dockerfile @@ -68,7 +68,7 @@ RUN apt-get update -y \ RUN pip3 --no-cache-dir install --upgrade pip ARG ENABLE_GOOGLE_FEATURE -ARG VERSION='0.2.2' +ARG VERSION='0.2.3' RUN test -n ${VERSION:?} FROM base AS optimum-tpu-installer diff --git a/text-generation-inference/server/text_generation_server/version.py b/text-generation-inference/server/text_generation_server/version.py index 30c8700b..d81884df 100644 --- a/text-generation-inference/server/text_generation_server/version.py +++ b/text-generation-inference/server/text_generation_server/version.py @@ -1,5 +1,5 @@ from pkg_resources import parse_version -__version__ = "0.2.0" +__version__ = "0.2.3" VERSION = parse_version(__version__)
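
The training cell in the updated notebook still passes `dataset_text_field`, `max_seq_length`, and `packing` directly to `SFTTrainer`, which is what triggers the deprecation warnings captured in its output (TRL warns these arguments will not be supported from version 0.13.0). As a non-authoritative sketch, the same run could be expressed through `trl.SFTConfig` instead; this assumes TRL 0.9 through 0.12, reuses the `model`, `data`, `lora_config`, and `fsdp_training_args` objects defined earlier in the notebook, and has not been exercised on a TPU here:

```python
# Sketch only: the same training run expressed with trl.SFTConfig (TRL 0.9-0.12 assumed),
# so the SFT-specific options no longer go through deprecated SFTTrainer keyword arguments.
# `model`, `data`, `lora_config` and `fsdp_training_args` come from the notebook cells above.
from trl import SFTConfig, SFTTrainer

training_args = SFTConfig(
    per_device_train_batch_size=32,
    num_train_epochs=10,
    max_steps=-1,
    output_dir="/tmp/output",
    optim="adafactor",
    logging_steps=1,
    dataloader_drop_last=True,    # Required by FSDP v2
    dataset_text_field="prompt",  # SFT-specific options now live on the config
    max_seq_length=512,
    packing=True,
    **fsdp_training_args,         # 'fsdp' and 'fsdp_config' from fsdp_v2.get_fsdp_training_args(model)
)

trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=training_args,
    peft_config=lora_config,
)

trainer.train()
```

If this behaves the same as the cell shown in the diff, the only visible difference should be that the SFTConfig deprecation warnings disappear from the cell output.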