Complete instructions for running Meta-Llama-3-8B-Instruct with a Python script on 4th Gen and newer Intel Xeon processors with IPEX
Steps to run this demo:
- Clone this project into your local directory:
git clone https://github.com/allenwsh82/LLM_ChatBot.git
- Create a new virtual environment inside the project which you just cloned:
python -m venv llm_chatbot_env
- Activate the virtual environment which you just created:
source llm_chatbot_env/bin/activate
- Install the dependencies by running:
pip install -r requirements.txt
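- Optionally, before running anything, verify that the CPU actually exposes AMX. A minimal sketch, assuming Linux, where the amx_tile flag in /proc/cpuinfo indicates AMX support (the filename check_amx.py is just a suggestion):

# check_amx.py -- verify the CPU advertises AMX (Linux only)
with open("/proc/cpuinfo") as f:
    flags = f.read()

if "amx_tile" in flags:
    print("AMX is available on this CPU.")
else:
    print("AMX not detected; IPEX will fall back to AVX-512/AVX2 kernels.")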
- If you open the inference_llama3_8b_bf16_ipex.py script, you will notice that two lines of code are added to enable the Intel AMX AI accelerator to boost performance (a fuller loading sketch follows the snippet):
###########################################################################################################
# Use IPEX
import intel_extension_for_pytorch as ipex
model = ipex.optimize(model.eval(), dtype=torch.bfloat16, inplace=True, level="O1", auto_kernel_selection=True)
###########################################################################################################
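For context, here is a minimal sketch of how those two lines slot into a standard Hugging Face loading flow. The model ID and dtype handling below are assumptions for illustration, not copied from the actual script:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import intel_extension_for_pytorch as ipex

# Assumed model ID; the actual script may load from a local path instead.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# The two IPEX lines from the script: fuse ops and select AMX-friendly BF16 kernels.
model = ipex.optimize(model.eval(), dtype=torch.bfloat16, inplace=True, level="O1", auto_kernel_selection=True)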
- Now that everything is set up, you can run the script (an illustrative generation call is sketched after the snapshot):
python inference_llama3_8b_bf16_ipex.py
Inference example snapshot:
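Continuing the loading sketch above, generation inside such a script looks roughly like this; the prompt and sampling settings are illustrative assumptions, not taken from the actual script:

prompt = "What is Intel AMX?"
inputs = tokenizer(prompt, return_tensors="pt")

# Run BF16 inference on CPU; autocast keeps intermediate ops in bfloat16 so AMX kernels are used.
with torch.inference_mode(), torch.autocast("cpu", dtype=torch.bfloat16):
    output = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(output[0], skip_special_tokens=True))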
- If you want to further optimize inference performance, you can easily quantize the model to INT4 with Intel ipex-llm (see the sketch after the snapshot):
python inference_llama3_8b_INT4_IPEX_LLM.py
Inference example snapshot:
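The INT4 script is not reproduced here, but ipex-llm exposes a drop-in replacement for the transformers model classes, so the quantization step is roughly this simple. This is a sketch under the assumption that ipex-llm is installed (pip install ipex-llm); the model ID is again an assumption:

import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in replacement from ipex-llm

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed; may be a local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
# load_in_4bit=True quantizes the weights to INT4 at load time.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

inputs = tokenizer("What is Intel AMX?", return_tensors="pt")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))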