Skip to content

Latest commit

 

History

History
60 lines (38 loc) · 1.79 KB

README.md

File metadata and controls

60 lines (38 loc) · 1.79 KB

LLM_ChatBot

llama3-8b

Complete instruction on how to run python script on running Meta-Llama-3-8B-Instruct with Intel Xeon 4th Gen and Newer with IPEX

Steps to run this demo:

  1. Clone this project into your local directory:
   git clone https://github.com/allenwsh82/LLM_ChatBot.git
  1. Create a new virtual environment inside the project which you just clone:
   python -m venv llm_chatbot_env
  1. Activate the virtual environment which you just created:
   source llm_chatbot_env/bin/activate
  1. Install the dependencies by running:
pip install -r requirements.txt
  1. If you go to the inference_llama3_8b_bf16_ipex.py script, you will notice where two lines of code are added to enable AMX AI Accelerator to boost up performance:

########################################################################################################### #Use IPEX

import intel_extension_for_pytorch as ipex

model = ipex.optimize(model.eval(), dtype=torch.bfloat16, inplace=True, level="O1", auto_kernel_selection=True)

###########################################################################################################

  1. Now you have setup everything and you can run the script:
   python inference_llama3_8b_bf16_ipex.py

Inference example snapshot:

llama3_8b_inference

  1. If you want to further optimize the inference performance, just quatize the model to INT4 easily with Intel ipex-llm.
   python inference_llama3_8b_INT4_IPEX_LLM.py

Inference example snapshot:

ipex-llm_llama3-8b