From d130f69320e8a60fcc011dfea8185aa7c2a1985d Mon Sep 17 00:00:00 2001
From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com>
Date: Fri, 20 Oct 2023 10:31:31 -0400
Subject: [PATCH 1/5] Update FAQ.md

---
 FAQ.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/FAQ.md b/FAQ.md
index e1d31afa8..3acaa5e81 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -100,3 +100,8 @@ Make sure to run the command as follows
 A: The issue occurs because of not copying the URL correctly. If you right click on the link and copy the link, the link may be copied with url defence wrapper.
 
 To avoid this problem, please select the url manually and copy it
+
+
+**Q: What GPU can I use to run the code in this repository?**
+
+A: The code in this repository supports only NVIDIA GPUs.

From 19c1b3819b2e655a07192e16653ec61c286c60b7 Mon Sep 17 00:00:00 2001
From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com>
Date: Fri, 20 Oct 2023 11:52:53 -0400
Subject: [PATCH 2/5] Update README.md

Adding GPU and OS requirements
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 5a5a63cfa..f8c69f802 100755
--- a/README.md
+++ b/README.md
@@ -57,6 +57,7 @@ torchrun --nproc_per_node 1 example_chat_completion.py \
 - The `–nproc_per_node` should be set to the [MP](#inference) value for the model you are using.
 - Adjust the `max_seq_len` and `max_batch_size` parameters as needed.
 - This example runs the [example_chat_completion.py](example_chat_completion.py) found in this repository but you can change that to a different .py file.
+- For best results we suggest you run the code in this repository on a **Linux Operating System with a NVIDIA GPU**.
 
 ## Inference
 

From 0c3488da89fcb527a12c5f790da8c27fa09ad294 Mon Sep 17 00:00:00 2001
From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com>
Date: Fri, 20 Oct 2023 11:56:44 -0400
Subject: [PATCH 3/5] Update FAQ.md

---
 FAQ.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/FAQ.md b/FAQ.md
index 3acaa5e81..99a9bd7b3 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -104,4 +104,4 @@ To avoid this problem, please select the url manually and copy it
 
 **Q: What GPU can I use to run the code in this repository?**
 
-A: The code in this repository supports only NVIDIA GPUs.
+A: The code in this repository only supports NVIDIA GPUs.

From 33e88b0ce6d361a8c5e1c8365ab2f16918048bb2 Mon Sep 17 00:00:00 2001
From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com>
Date: Mon, 23 Oct 2023 19:06:12 -0400
Subject: [PATCH 4/5] Update FAQ.md

---
 FAQ.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/FAQ.md b/FAQ.md
index 99a9bd7b3..ce456017d 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -104,4 +104,4 @@ To avoid this problem, please select the url manually and copy it
 
 **Q: What GPU can I use to run the code in this repository?**
 
-A: The code in this repository only supports NVIDIA GPUs.
+A: The code in this repository only supports NVIDIA GPUs. While hardware requirements vary based on latency, throughput and cost constraints, good performance was observed when the models were split across multiple GPUs with tensor parallelism in a machine with NVIDIA A100s or H100s. But other types of GPUs like A10G, T4, L4, or even commodity hardware can also be used to deploy these models (e.g. https://github.com/ggerganov/llama.cpp).
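For reference, the README notes these patches keep amending describe the repository's `torchrun` quick-start command. A minimal sketch of the full invocation follows; the `--ckpt_dir` and `--tokenizer_path` flags and the paths passed to them are illustrative assumptions, not taken from these patches, so substitute your local checkpoint and tokenizer locations.

```bash
# Sketch of the README's torchrun entry point (paths below are assumptions).
# --nproc_per_node must equal the MP value of the model being served;
# larger models are split across that many GPUs with tensor parallelism,
# as the FAQ answer in patch 4/5 notes.
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```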
From e592f2955c1a36983f929adf0975a5d66d84e631 Mon Sep 17 00:00:00 2001
From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com>
Date: Mon, 23 Oct 2023 19:07:37 -0400
Subject: [PATCH 5/5] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f8c69f802..452d281f8 100755
--- a/README.md
+++ b/README.md
@@ -57,7 +57,7 @@ torchrun --nproc_per_node 1 example_chat_completion.py \
 - The `–nproc_per_node` should be set to the [MP](#inference) value for the model you are using.
 - Adjust the `max_seq_len` and `max_batch_size` parameters as needed.
 - This example runs the [example_chat_completion.py](example_chat_completion.py) found in this repository but you can change that to a different .py file.
-- For best results we suggest you run the code in this repository on a **Linux Operating System with a NVIDIA GPU**.
+- At this time the code in this repository only supports a **Linux Operating System with a NVIDIA GPU**.
 
 ## Inference
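Patches 2/5 and 5/5 pin the stated platform to a Linux operating system with an NVIDIA GPU. A quick environment check consistent with that requirement could look like the sketch below; it assumes PyTorch is installed, and the commands themselves are standard tooling rather than anything defined by these patches.

```bash
# Verify the platform matches the stated requirement: Linux + NVIDIA GPU.
uname -s        # expect: Linux
nvidia-smi      # should list at least one NVIDIA GPU and its driver version
python -c "import torch; print(torch.cuda.is_available())"  # expect: True
```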