[EAGLE-4773] Nvidia NIM dockerfile #444
Open: luv-bansal wants to merge 8 commits into master from integrating-nim
Commits:
- 0bb425a Nvidia NIM dockerfile (luv-bansal)
- 71dd021 Update Dockerfile.nim (luv-bansal)
- 8515a7a Working NIM image (luv-bansal)
- 4578646 Merge branch 'master' of https://github.com/Clarifai/clarifai-python … (luv-bansal)
- b2c58fc resolve conflict (luv-bansal)
- 11f7c0a rupdate nim image (luv-bansal)
- 6948594 NIM integration (luv-bansal)
- dff62f8 NIM integration (luv-bansal)
clarifai/runners/dockerfile_template/Dockerfile.nim.template (new file: 122 additions, 0 deletions)
# Use an intermediate image to install pip and other dependencies
FROM --platform=$TARGETPLATFORM public.ecr.aws/docker/library/python:${PYTHON_VERSION}-slim-bookworm as deps
ENV DEBIAN_FRONTEND=noninteractive

RUN python${PYTHON_VERSION} -m venv /venv && \
    /venv/bin/pip install --disable-pip-version-check --upgrade pip setuptools wheel && \
    ln -sf /usr/bin/python${PYTHON_VERSION} /usr/bin/python3 && \
    apt-get clean && rm -rf /var/lib/apt/lists/*;

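# NGC API key, filled in by the templating system. ENV values only apply to
# the current build stage, so this one is not carried into the final image.
ENV NGC_API_KEY=${NGC_API_KEY}
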
# Use the NIM base image as another build stage
FROM --platform=$TARGETPLATFORM ${BASE_IMAGE} as build

# Final image based on distroless
FROM gcr.io/distroless/python3-debian12:debug

# Virtual env
COPY --from=deps /venv /venv
# We have to overwrite the python3 binary that the distroless image uses
COPY --from=deps /usr/local/bin/python${PYTHON_VERSION} /usr/bin/python3
COPY --from=deps /usr/local/bin/python${PYTHON_VERSION} /usr/local/bin/python${PYTHON_VERSION}

# Copy NIM files
COPY --from=build /opt /opt
COPY --from=build /etc/nim /etc/nim

# Copy necessary binaries and libraries from the NIM base image
COPY --from=build /bin/bash /bin/bash
COPY --from=build /bin/ssh /bin/ssh
COPY --from=build /usr/bin/ln /usr/bin/ln

# Also copy in the library files these binaries depend on.
COPY --from=build /lib /lib
COPY --from=build /lib64 /lib64
COPY --from=build /usr/lib/ /usr/lib/
COPY --from=build /usr/local/lib/ /usr/local/lib/
# ldconfig is needed to update the shared library cache so system libraries (like CUDA) can be found
COPY --from=build /usr/sbin/ldconfig /sbin/ldconfig
COPY --from=build /usr/sbin/ldconfig.real /sbin/ldconfig.real
COPY --from=build /etc/ld.so.conf /etc/ld.so.conf
COPY --from=build /etc/ld.so.cache /etc/ld.so.cache
COPY --from=build /etc/ld.so.conf.d/ /etc/ld.so.conf.d/

# Set environment variables
ENV PYTHONPATH=/venv/lib/python3.10/site-packages:/opt/nim/llm/.venv/lib/python3.10/site-packages:/opt/nim/llm
ENV PATH="/usr/local/bin:/venv/bin:/opt/nim/llm/.venv/bin:/opt/hpcx/ucc/bin:/opt/hpcx/ucx/bin:/opt/hpcx/ompi/bin:$PATH"

ENV LD_LIBRARY_PATH="/opt/hpcx/ucc/lib/ucc:/opt/hpcx/ucc/lib:/opt/hpcx/ucx/lib/ucx:/opt/hpcx/ucx/lib:/opt/hpcx/ompi/lib:/opt/hpcx/ompi/lib/openmpi:/opt/nim/llm/.venv/lib/python3.10/site-packages/tensorrt_llm/libs:/opt/nim/llm/.venv/lib/python3.10/site-packages/nvidia/cublas/lib:/opt/nim/llm/.venv/lib/python3.10/site-packages/tensorrt_libs:/opt/nim/llm/.venv/lib/python3.10/site-packages/nvidia/nccl/lib:$LD_LIBRARY_PATH"

ENV LIBRARY_PATH=/opt/hpcx/ucc/lib:/opt/hpcx/ucx/lib:/opt/hpcx/ompi/lib:$LIBRARY_PATH

ENV CPATH=/opt/hpcx/ompi/include:/opt/hpcx/ucc/include:/opt/hpcx/ucx/include:$CPATH
ENV LLM_PROJECT_DIR=/opt/nim/llm

# Set environment variables for MPI
ENV OMPI_HOME=/opt/hpcx/ompi
ENV HPCX_MPI_DIR=/opt/hpcx/ompi
ENV MPI_HOME=/opt/hpcx/ompi
ENV OPAL_PREFIX=/opt/hpcx/ompi

# Set environment variables for UCC
ENV UCC_DIR=/opt/hpcx/ucc/lib/cmake/ucc
ENV UCC_HOME=/opt/hpcx/ucc
ENV HPCX_UCC_DIR=/opt/hpcx/ucc
ENV USE_UCC=1
ENV USE_SYSTEM_UCC=1

# Set environment variables for HPC-X
ENV HPCX_DIR=/opt/hpcx
ENV HPCX_UCX_DIR=/opt/hpcx/ucx
ENV HPCX_MPI_DIR=/opt/hpcx/ompi

# Set environment variables for UCX
ENV UCX_DIR=/opt/hpcx/ucx/lib/cmake/ucx
ENV UCX_HOME=/opt/hpcx/ucx

ENV HOME=/opt/nim/llm

SHELL ["/bin/bash", "-c"]

# These will be set by the templating system.
ENV CLARIFAI_PAT=${CLARIFAI_PAT}
ENV CLARIFAI_USER_ID=${CLARIFAI_USER_ID}
ENV CLARIFAI_RUNNER_ID=${CLARIFAI_RUNNER_ID}
ENV CLARIFAI_NODEPOOL_ID=${CLARIFAI_NODEPOOL_ID}
ENV CLARIFAI_COMPUTE_CLUSTER_ID=${CLARIFAI_COMPUTE_CLUSTER_ID}
ENV CLARIFAI_API_BASE=${CLARIFAI_API_BASE}

#############################
# User specific requirements
#############################
COPY requirements.txt .

# Install requirements and clarifai package and cleanup before leaving this line.
# Note(zeiler): this could be in a future template as {{model_python_deps}}
RUN pip install --no-cache-dir -r requirements.txt && \
    pip install --no-cache-dir clarifai

# Point the Numba and NIM caches at directories under /tmp
ENV NUMBA_CACHE_DIR=/tmp/numba_cache
ENV LOCAL_NIM_CACHE=/tmp/nim_cache

# Set the working directory to /app
WORKDIR /app

# Copy the current folder into /app/model_dir that the SDK will expect.
# Note(zeiler): would be nice to exclude checkpoints in case they were pre-downloaded.
COPY . /app/model_dir/${name}

# Add the model directory to the python path.
ENV PYTHONPATH=${PYTHONPATH}:/app/model_dir/${name}

# Finally run the clarifai entrypoint to start the runner loop and local dev server.
# Note(zeiler): we may want to make this a clarifai CLI call.
ENTRYPOINT ["python", "-m", "clarifai.runners.server"]
CMD ["--model_path", "/app/model_dir/main"]
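
The template mixes two kinds of variables: placeholders such as `${PYTHON_VERSION}`, `${BASE_IMAGE}` and `${name}` that the templating system is expected to fill in, and references such as `$TARGETPLATFORM` and `$PATH` that must survive untouched so Docker can resolve them. The templating code itself is not part of this diff, so the sketch below only assumes a Python `string.Template`-style substitution, which matches the `${VAR}` syntax used here. The abbreviated snippet, the `render_dockerfile` helper and the values passed in are all hypothetical.

```python
from string import Template

# A few lines from Dockerfile.nim.template, abbreviated for the example.
TEMPLATE_SNIPPET = """\
FROM --platform=$TARGETPLATFORM public.ecr.aws/docker/library/python:${PYTHON_VERSION}-slim-bookworm as deps
FROM --platform=$TARGETPLATFORM ${BASE_IMAGE} as build
ENV PATH="/usr/local/bin:/venv/bin:$PATH"
COPY . /app/model_dir/${name}
"""


def render_dockerfile(template_text: str, values: dict) -> str:
    """Fill in only the placeholders we have values for (hypothetical helper).

    safe_substitute() leaves unknown references such as $TARGETPLATFORM and
    $PATH as-is, so Docker can still resolve them at build time.
    """
    return Template(template_text).safe_substitute(values)


if __name__ == "__main__":
    # Hypothetical values; a real run would take them from the model config.
    rendered = render_dockerfile(
        TEMPLATE_SNIPPET,
        {
            "PYTHON_VERSION": "3.10",
            "BASE_IMAGE": "registry.example.com/nim/some-model:1.0",
            "name": "main",
        },
    )
    print(rendered)
```

Using `safe_substitute` rather than `substitute` is the choice that matters in this assumed setup: `substitute` raises `KeyError` for any placeholder without a value, which would include `$TARGETPLATFORM` and `$PATH`.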
Can we separate this into a base image and this template? It feels overloaded.
Yeah, this looks overloaded to me too. I tried to separate it into a base image, but the issue is that there is a separate NIM image for every model, so I'm not sure we can do that.
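
A minimal sketch of why the shared piece ends up being the template rather than a prebuilt base image: each model's NIM image is plugged in as `BASE_IMAGE`, so the rendered Dockerfiles differ only in the build-stage `FROM` line. It assumes `string.Template`-style substitution; the model names, image references and the `render_per_model` helper are hypothetical placeholders, not part of this PR.

```python
from string import Template

# Stand-in for the shared template: only the build stage depends on the
# model-specific NIM image; the distroless runtime stage is identical.
SHARED_TEMPLATE = """\
FROM --platform=$TARGETPLATFORM ${BASE_IMAGE} as build
FROM gcr.io/distroless/python3-debian12:debug
COPY --from=build /opt /opt
"""

# Hypothetical model -> NIM image mapping (one image per model).
NIM_IMAGES = {
    "model-a": "registry.example.com/nim/model-a:1.0",
    "model-b": "registry.example.com/nim/model-b:1.0",
}


def render_per_model(template_text: str, images: dict) -> dict:
    """Render one Dockerfile per model by swapping in its NIM base image."""
    template = Template(template_text)
    return {
        model: template.safe_substitute(BASE_IMAGE=image)
        for model, image in images.items()
    }


if __name__ == "__main__":
    for model, dockerfile in render_per_model(SHARED_TEMPLATE, NIM_IMAGES).items():
        print(f"--- {model} ---")
        print(dockerfile)
```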