New sample notebook: model downloading with the CLI (arcee-ai#70)

juliensimon · Jul 25, 2024 · 8c8b27a
1 parent 083ed0f
Showing 1 changed file with 217 additions and 0 deletions.
diff --git a/notebooks/model_cli.ipynb b/notebooks/model_cli.ipynb
@@ -0,0 +1,217 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Downloading models with the Arcee Command Line Interface\n",
+    "\n",
+    "In this notebook, you will learn how to download model weights with the Arcee Command Line Interface (CLI).\n",
+    "\n",
+    "The Arcee documentation is available at [docs.arcee.ai](https://docs.arcee.ai/deployment/start-deployment)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Please [sign up](https://app.arcee.ai/account/signup) for Arcee Cloud and create an [API key](https://docs.arcee.ai/getting-arcee-api-key/getting-arcee-api-key).\n",
+    "\n",
+    "Remember to keep this key safe, and **DON'T COMMIT IT to one of your repositories**."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Create a new Python environment (optional but recommended) and install [arcee-python](https://github.com/arcee-ai/arcee-python)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
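+    "# Optional: to work in a virtual environment, run the next three commands (without the leading '#!')\n",
+    "# in a terminal before starting Jupyter: '!source' runs in a subshell, so the activation would not persist here.\n",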
+    "#!pip install -q virtualenv\n",
+    "#!virtualenv -q arcee-cloud\n",
+    "#!source arcee-cloud/bin/activate\n",
+    "\n",
+    "%pip install -q arcee-py"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can now use the `arcee` command-line interface (CLI) tool."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%sh\n",
+    "arcee "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Storing our API key\n",
+    "\n",
+    "The first step is to configure the CLI and provide your API key."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "```bash\n",
+    "$ arcee configure\n",
+    "Current API URL: https://app.arcee.ai/api\n",
+    "API key: not in config (file or env)\n",
+    "\n",
+    "Enter your Arcee API key 🔒\n",
+    "Hit enter to leave it as is.\n",
+    "See https://docs.arcee.ai/getting-arcee-api-key/getting-arcee-api-key for more details.\n",
+    "You can also pass this at runtime with the ARCEE_API_KEY environment variable.\n",
+    ": [MY_API_KEY]\n",
+    "Setting API key\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The key is now stored locally in a configuration file named `config.json`. The default location is platform-dependent, and you can print the directory path by running the cell below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import typer\n",
+    "import pprint\n",
+    "\n",
+    "pprint.pprint(typer.get_app_dir(\"arcee\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If this path doesn't work for you, you can move the configuration file you just created to another location and set its new location with the `ARCEE_CONFIG_LOCATION` environment variable, e.g.:\n",
+    "\n",
+    "```bash\n",
+    "mv \"/Users/julien/Library/Application Support/arcee\" ~\n",
+    "export ARCEE_CONFIG_LOCATION=\"/Users/julien/arcee\"\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once you've configured the CLI, you can quickly check that it's working by printing your default Arcee organization:\n",
+    "\n",
+    "```bash\n",
+    "$ arcee org\n",
+    "Current org: juliens-test-organization\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Downloading model weights\n",
+    "\n",
+    "The CLI lets you download the weights of models hosted in Arcee. You just need to pass the model type (`cpt` for continuously pretrained, `merging` for merged, or `sft` for aligned) and the model name.\n",
+    "\n",
+    "```bash\n",
+    "$ arcee {cpt, merging, sft} download --name [MODEL_NAME]\n",
+    "```\n",
+    "\n",
+    "For example, we can download the weights of the model we aligned in the model alignment notebook:\n",
+    "\n",
+    "```bash\n",
+    "$ arcee sft download --name llama-3-8B-reasoning-share-gpt\n",
+    "Downloading alignment model weights for llama-3-8B-reasoning-share-gpt to /Users/julien/llama-3-8B-reasoning-share-gpt.tar.gz\n",
+    "Downloading llama-3-8B-reasoning-share-gpt weights... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:27:32 0.0/12.7 GB 0:00:14 7.7 MB/s\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once the model weights have been downloaded, we can extract them locally. You can use `gzip` or `pigz` (the faster, multi-threaded option) for decompression:\n",
+    "\n",
+    "```bash\n",
+    "$ mkdir my_llama3\n",
+    "$ pigz -dc llama-3-8B-reasoning-share-gpt.tar.gz | tar xvf - -C my_llama3\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Loading the model with the transformers library\n",
+    "\n",
+    "Finally, we can load the model with the Hugging Face transformers library.\n",
+    "\n",
+    "```python\n",
+    "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
+    "\n",
+    "model_dir = \"my_llama3\"  # directory created in the extraction step\n",
+    "tokenizer = AutoTokenizer.from_pretrained(model_dir)\n",
+    "model = AutoModelForCausalLM.from_pretrained(model_dir)\n",
+    "```\n",
+    "```bash\n",
+    "Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:19<00:00,  5.00s/it]\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This concludes the CLI demonstration. Thank you for your time!\n",
+    "\n",
+    "If you'd like to know more about using Arcee Cloud in your organization, please visit the [Arcee website](https://www.arcee.ai), or contact [[email protected]](mailto:[email protected]).\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
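
The extraction step in the notebook shells out to `pigz` and `tar`. On machines where those tools are not available, the same step could be done from Python with the standard library's `tarfile` module. The block below is a minimal sketch, assuming the downloaded file is an ordinary gzip-compressed tarball; `extract_weights` and the demo file names are hypothetical helpers for illustration and are not part of `arcee-py`.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_weights(archive: Path, target_dir: Path) -> list[str]:
    """Extract a .tar.gz archive into target_dir and return the member names.

    Hypothetical helper: not part of arcee-py.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Extraction filters reject absolute paths and path traversal
            # on Python versions that support them.
            tar.extractall(path=target_dir, filter="data")
        else:
            tar.extractall(path=target_dir)
    return names


if __name__ == "__main__":
    # Self-contained demo: build a tiny archive, then extract it.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sample = root / "config.json"
        sample.write_text("{}")
        archive = root / "demo-model.tar.gz"
        with tarfile.open(archive, mode="w:gz") as tar:
            tar.add(sample, arcname="config.json")
        print(extract_weights(archive, root / "my_llama3"))
```

With a real download, the call would look like `extract_weights(Path("llama-3-8B-reasoning-share-gpt.tar.gz"), Path("my_llama3"))`. This is single-threaded, so `pigz` will likely remain faster for a multi-gigabyte archive.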