[SERVE][CPP][Android] add native executable program to benchmark models #2987

Open · pfk-beta wants to merge 4 commits into main
Conversation


@pfk-beta commented on Oct 18, 2024

Hello,

I have modified and crafted some code to run an LLM from an adb shell or a Linux shell via MLC-LLM (by the way, great appreciation to the authors and contributors) as a native binary executable.

I'm not an expert in C++, so the code isn't perfect (it is actually the tinkered and glued-together output of ChatGPT, Claude, and my dog), but I think it's easy to read, understand, and run.

How to set it up:

  0. Set up MLC-LLM and a virtualenv (install dependencies, TVM, etc.).
  1. Create a build directory, e.g. build-aarch64-opencl. Run all of the following commands from this directory.
  2. Run cmake from that directory:
cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_TOOLCHAIN_FILE=/home/piotr/android/sdk/ndk/26.1.10909125/build/cmake/android.toolchain.cmake \
  -DCMAKE_INSTALL_PREFIX=. \
  -DCMAKE_CXX_FLAGS="-O3" \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_NATIVE_API_LEVEL=android-31 \
  -DANDROID_PLATFORM=android-31 \
  -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON \
  -DANDROID_STL=c++_static \
  -DUSE_HEXAGON_SDK=OFF \
  -DMLC_LLM_INSTALL_STATIC_LIB=ON \
  -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON \
  -DUSE_OPENCL=ON \
  -DUSE_OPENCL_ENABLE_HOST_PTR=ON \
  -DUSE_CUSTOM_LOGGING=OFF \
  ..
  3. Build with make -j 8. You should now have libmlc_llm_module.so, tvm/libtvm.so, and llm_benchmark.
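After the build, a quick sanity check that the expected artifacts exist can save a confusing failure later on the device. This is a small helper of my own, not part of the PR:

```shell
#!/bin/sh
# Verify that the expected build artifacts exist in a given directory.
# Prints each missing file and returns non-zero if anything is absent.
check_artifacts() {
  dir="$1"
  shift
  missing=0
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return $missing
}

# Usage from the build directory:
# check_artifacts . libmlc_llm_module.so tvm/libtvm.so llm_benchmark
```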
  4. Do the usual steps for building an MLC-LLM model, e.g. (I used GPT-2 because its unmodified version fits in the RAM of my phone; you may need to adjust paths):
mlc_llm convert_weight \
  --quantization q0f16 \
  -o ./gpt2-medium-q0f16 \
  gpt2-medium/

mlc_llm gen_config \
  --quantization q0f16 \
  --max-batch-size 1 \
  --conv-template gpt2 \
  -o ./gpt2-medium-q0f16 \
  gpt2-medium/
  5. Build the model library (without $CC I was getting an error about a missing compiler; I think the Android API level doesn't matter that much; you may also want to modify the predefined device, but that is a story for another time):
CC=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android31-clang++ \
TVM_NDK_CC=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android31-clang++ \
mlc_llm compile ./gpt2-medium-q0f16/mlc-chat-config.json \
    --device android:adreno-so \
    --host aarch64-linux-android \
    -o gpt2-medium-q0f16-opencl-aarch64.so
  6. Upload the files to your phone:
adb shell mkdir -p /data/local/tmp/mlc/gpt2-medium-aarch64-opencl/
adb push llm_benchmark libmlc_llm_module.so tvm/libtvm.so gpt2-medium-q0f16-opencl-aarch64.so /data/local/tmp/mlc/gpt2-medium-aarch64-opencl/
adb push gpt2-medium-q0f16 /data/local/tmp/mlc/gpt2-medium-aarch64-opencl/
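The two push commands above can be wrapped in a tiny helper, handy when iterating on the binary. This is my own convenience function, not part of the PR; DRY_RUN=1 prints the adb commands instead of running them:

```shell
#!/bin/sh
# Create the target directory on the device and push each file to it.
# With DRY_RUN=1 the adb commands are printed rather than executed.
push_model() {
  dest="$1"
  shift
  run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi
  }
  run adb shell mkdir -p "$dest"
  for f in "$@"; do
    run adb push "$f" "$dest"
  done
}

# push_model /data/local/tmp/mlc/gpt2-medium-aarch64-opencl/ \
#   llm_benchmark libmlc_llm_module.so tvm/libtvm.so \
#   gpt2-medium-q0f16-opencl-aarch64.so gpt2-medium-q0f16
```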
  7. Let's rock. Inside adb shell, run the following commands:
cd /data/local/tmp/mlc/gpt2-medium-aarch64-opencl/
LD_LIBRARY_PATH=. ./llm_benchmark \
 ./gpt2-medium-q0f16 \
 ./gpt2-medium-q0f16-opencl-aarch64.so \
 "local" 4 60 250 \
 "Give me short answer who you are?" 3
  8. Arguments: 1st, folder with weights; 2nd, model library file; 3rd, execution mode ("server" and "interactive" are the alternatives to "local"); 4th, device type (4 means OpenCL; the alternatives are described in the source code); 5th, execution timeout in seconds; 6th, max tokens; 7th, prompt; 8th, number of executions (if it is 1, the generated text is printed).
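Since the arguments are positional, naming them in a small wrapper script makes the invocation easier to tweak. The variable names below are my own, not anything the program defines; the order follows the example invocation above, and the binary is only invoked if it is present (e.g. inside adb shell):

```shell
#!/bin/sh
# Named variables for llm_benchmark's positional arguments (names are mine).
WEIGHTS=./gpt2-medium-q0f16                      # 1st: folder with weights
MODEL_LIB=./gpt2-medium-q0f16-opencl-aarch64.so  # 2nd: model library file
MODE=local                                       # 3rd: execution mode
DEVICE=4                                         # 4th: device type, 4 = OpenCL
TIMEOUT=60                                       # 5th: timeout in seconds
MAX_TOKENS=250                                   # 6th: max tokens
PROMPT="Give me short answer who you are?"       # 7th: prompt
RUNS=3                                           # 8th: number of executions

# Run only where the binary actually exists (e.g. on the device).
if [ -x ./llm_benchmark ]; then
  LD_LIBRARY_PATH=. ./llm_benchmark \
    "$WEIGHTS" "$MODEL_LIB" "$MODE" "$DEVICE" \
    "$TIMEOUT" "$MAX_TOKENS" "$PROMPT" "$RUNS"
fi
```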
  9. If you would like to run on a local computer, remove the cross-compilation directives from the cmake invocation and adjust the paths to suit your setup.
  10. I'm afraid this cannot be merged as-is, because it modifies some important files, like openai_format.cc.
  11. Have a nice weekend :)
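For the local-computer case mentioned above, the configure call loses the cross-compilation flags. A minimal sketch, assuming the same build-directory layout; treat this flag set as a starting point to adapt, not a verified configuration:

```shell
# Hypothetical local (non-Android) configure: the NDK toolchain, ABI, and
# platform flags from the Android invocation are dropped.
cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CXX_FLAGS="-O3" \
  -DMLC_LLM_INSTALL_STATIC_LIB=ON \
  -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON \
  -DUSE_OPENCL=ON \
  ..
```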
