Complete instructions for running Meta-Llama-3-8B-Instruct with a Python script on 4th Gen and newer Intel Xeon processors with IPEX
Steps to run this demo:
- Clone this project into your local directory:
git clone https://github.com/allenwsh82/LLM_ChatBot.git
- Create a new virtual environment inside the project which you just cloned:
python -m venv llm_chatbot_env
- Activate the virtual environment which you just created:
source llm_chatbot_env/bin/activate
- Install the dependencies by running:
pip install -r requirements.txt
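- Optionally, before running anything, verify that the CPU actually exposes AMX. A minimal sketch, assuming Linux, where the amx_tile flag in /proc/cpuinfo indicates AMX support (the filename check_amx.py is just a suggestion):

# check_amx.py -- verify the CPU advertises AMX (Linux only)
with open("/proc/cpuinfo") as f:
    flags = f.read()

if "amx_tile" in flags:
    print("AMX is available on this CPU.")
else:
    print("AMX not detected; IPEX will fall back to AVX-512/AVX2 kernels.")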
- If you open the inference_llama3_8b_bf16_ipex.py script, you will notice that two lines of code are added to enable the Intel AMX AI accelerator to boost performance (a fuller loading sketch follows the snippet):
###########################################################################################################
# Use IPEX
import intel_extension_for_pytorch as ipex
model = ipex.optimize(model.eval(), dtype=torch.bfloat16, inplace=True, level="O1", auto_kernel_selection=True)
###########################################################################################################
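For context, here is a minimal sketch of how those two lines slot into a standard Hugging Face loading flow. The model ID and dtype handling below are assumptions for illustration, not copied from the actual script:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import intel_extension_for_pytorch as ipex

# Assumed model ID; the actual script may load from a local path instead.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# The two IPEX lines from the script: fuse ops and select AMX-friendly BF16 kernels.
model = ipex.optimize(model.eval(), dtype=torch.bfloat16, inplace=True, level="O1", auto_kernel_selection=True)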
- Now that everything is set up, you can run the script (an illustrative generation call is sketched after the snapshot):
python inference_llama3_8b_bf16_ipex.py
Inference example snapshot:
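Continuing the loading sketch above, generation inside such a script looks roughly like this; the prompt and sampling settings are illustrative assumptions, not taken from the actual script:

prompt = "What is Intel AMX?"
inputs = tokenizer(prompt, return_tensors="pt")

# Run BF16 inference on CPU; autocast keeps intermediate ops in bfloat16 so AMX kernels are used.
with torch.inference_mode(), torch.autocast("cpu", dtype=torch.bfloat16):
    output = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(output[0], skip_special_tokens=True))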
- If you want to further optimize inference performance, you can easily quantize the model to INT4 with Intel ipex-llm (see the sketch after the snapshot):
python inference_llama3_8b_INT4_IPEX_LLM.py
Inference example snapshot:
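The INT4 script is not reproduced here, but ipex-llm exposes a drop-in replacement for the transformers model classes, so the quantization step is roughly this simple. This is a sketch under the assumption that ipex-llm is installed (pip install ipex-llm); the model ID is again an assumption:

import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in replacement from ipex-llm

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed; may be a local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
# load_in_4bit=True quantizes the weights to INT4 at load time.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

inputs = tokenizer("What is Intel AMX?", return_tensors="pt")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))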