Remove TGI folder from Optimum Habana (#597)

huggingface · Dec 13, 2023 · ddc6498
1 parent 54971f0
commit ddc6498

Showing 19 changed files with 1 addition and 4,053 deletions.
diff --git a/text-generation-inference/Dockerfile b/text-generation-inference/Dockerfile
diff --git a/text-generation-inference/Makefile b/text-generation-inference/Makefile
diff --git a/text-generation-inference/README.md b/text-generation-inference/README.md
@@ -16,68 +16,4 @@ limitations under the License.
 
 # Text Generation Inference on Habana Gaudi
 
-To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Habana Gaudi/Gaudi2, follow these steps:
-
-1. Build the Docker image located in this folder with:
-   ```bash
-   docker build -t tgi_gaudi .
-   ```
-2. Launch a local server instance on 1 Gaudi card:
-   ```bash
-   model=meta-llama/Llama-2-7b-hf
-   volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
-
-   docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model
-   ```
-3. Launch a local server instance on 8 Gaudi cards:
-   ```bash
-   model=meta-llama/Llama-2-70b-hf
-   volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
-
-   docker run -p 8080:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model --sharded true --num-shard 8
-   ```
-4. You can then send a request:
-   ```bash
-   curl 127.0.0.1:8080/generate \
-     -X POST \
-     -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-     -H 'Content-Type: application/json'
-   ```
-   > The first call will be slower as the model is compiled.
-5. To run a benchmark test, please refer to [TGI's benchmark tool](https://github.com/huggingface/text-generation-inference/tree/main/benchmark).
-
-   To run it on the same machine, you can do the following:
-   * `docker exec -it <docker name> bash`: pick the container started in step 2 or 3 (use `docker ps` to find it)
-   * `text-generation-benchmark -t <model-id>`: pass the model ID from the `docker run` command
-   * After the tests complete, press Ctrl+C to see the performance data summary.
-   
-> For gated models such as [StarCoder](https://huggingface.co/bigcode/starcoder), you will have to pass `-e HUGGING_FACE_HUB_TOKEN=<token>` to the `docker run` command above with a valid Hugging Face Hub read token.
-
-For more information and documentation about Text Generation Inference, check out [the README](https://github.com/huggingface/text-generation-inference#text-generation-inference) of the original repo.
-
-Not all features of TGI are currently supported as this is still a work in progress.
-
-Changes added in the current release:
-- Sharded feature with support for DeepSpeed-inference auto tensor parallelism. HPU graphs are also used to improve performance.
-- Torch profile. 
-
-
-Environment Variables Added:
-
-<div align="center">
-
-| Name             | Value(s)   | Default                         | Description                                                                  | Usage                        |
-|------------------|:-----------|:--------------------------------|:-----------------------------------------------------------------------------|:-----------------------------|
-| MAX_TOTAL_TOKENS | integer    | 0                               | Control the padding of input                                                 | add -e in docker run, such   |
-| ENABLE_HPU_GRAPH | true/false | true                            | Enable hpu graph or not                                                      | add -e in docker run command |
-| PROF_WARMUPSTEP  | integer    | 0                               | Enable/disable profile, control profile warmup step, 0 means disable profile | add -e in docker run command |
-| PROF_STEP        | integer    | 5                               | Control profile step                                                         | add -e in docker run command |
-| PROF_PATH        | string     | /root/text-generation-inference | Define profile folder                                                        | add -e in docker run command |
-| LIMIT_HPU_GRAPH  | True/False | False                           | Skip HPU graph usage for prefill to save memory                              | add -e in docker run command |
-</div>
-
-
-> The license to use TGI on Habana Gaudi is the one of TGI: https://github.com/huggingface/text-generation-inference/blob/main/LICENSE
->
-> Please reach out to [email protected] if you have any questions.
+Please refer to the following fork of TGI for deploying it on Habana Gaudi: https://github.com/huggingface/tgi-gaudi
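
For reference, the single-card and 8-card launch commands in the removed README differ only in the `PT_HPU_ENABLE_LAZY_COLLECTIVES` variable and the `--sharded`/`--num-shard` flags. The sketch below illustrates that difference. `tgi_gaudi_cmd` is a hypothetical helper that is not part of the repository. It only prints the command instead of running Docker, and every flag is copied from the removed README, so it may not match what the tgi-gaudi fork expects today.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): prints the `docker run` command
# from the removed README for a given model and card count. Nothing is executed.
tgi_gaudi_cmd() {
  local model="$1"               # model ID, e.g. meta-llama/Llama-2-7b-hf
  local num_shard="${2:-1}"      # number of Gaudi cards (1 = no sharding)
  local volume="${3:-$PWD/data}" # host folder shared with the container

  local cmd="docker run -p 8080:80 -v $volume:/data --runtime=habana"
  # Multi-card runs in the removed README set this extra variable.
  if [ "$num_shard" -gt 1 ]; then
    cmd="$cmd -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true"
  fi
  cmd="$cmd -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none"
  cmd="$cmd --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model"
  # Sharding flags are only added when more than one card is requested.
  if [ "$num_shard" -gt 1 ]; then
    cmd="$cmd --sharded true --num-shard $num_shard"
  fi
  printf '%s\n' "$cmd"
}

# Dry-run output for the two cases described in the removed README.
tgi_gaudi_cmd meta-llama/Llama-2-7b-hf 1 /tmp/data
tgi_gaudi_cmd meta-llama/Llama-2-70b-hf 8 /tmp/data
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Docker or Gaudi hardware; the output can be compared against the fork's current instructions before anything is launched.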