
Commit

Merge pull request #226 from clamsproject/221-base-image-debian-upgrade
base image updated to debian12
keighrim authored Jun 6, 2024
2 parents 28492d0 + 7e81238 commit 392b4c8
Showing 4 changed files with 71 additions and 21 deletions.
14 changes: 14 additions & 0 deletions clams/develop/templates/app/Containerfile.template
@@ -14,6 +14,20 @@ ARG CLAMS_APP_VERSION
ENV CLAMS_APP_VERSION ${CLAMS_APP_VERSION}
################################################################################

################################################################################
# This is duplicate from the base image Containerfile
# but makes sure the cache directories are consistent across all CLAMS apps

# https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/__init__.py#L130
ENV XDG_CACHE_HOME='/cache'
# https://huggingface.co/docs/huggingface_hub/main/en/package_reference/environment_variables#hfhome
ENV HF_HOME="/cache/huggingface"
# https://pytorch.org/docs/stable/hub.html#where-are-my-downloaded-models-saved
ENV TORCH_HOME="/cache/torch"

RUN mkdir /cache && rm -rf /root/.cache && ln -s /cache /root/.cache
################################################################################

################################################################################
# clams-python base images are based on debian distro
# install more system packages as needed using the apt manager
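
Because every app image generated from this template now keeps its caches under the fixed `/cache` path, a single host directory can be bind-mounted there to persist downloaded models between runs and share them across different apps. A minimal sketch, where the image names and the host directory are purely illustrative:

```bash
# one host-side cache directory, reused by any CLAMS app container
mkdir -p ~/clams-cache

# both containers read and write the same model cache mounted at /cache
docker run -d --rm -v ~/clams-cache:/cache -p 5000:5000 clams-app-a
docker run -d --rm -v ~/clams-cache:/cache -p 5001:5000 clams-app-b
```
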
10 changes: 9 additions & 1 deletion container/Containerfile
@@ -1,6 +1,14 @@
-FROM python:3.8-slim-buster
+FROM python:3.8-slim-bookworm
LABEL org.opencontainers.image.description="clams-python image is a base image for CLAMS apps"

ARG clams_version
# https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/__init__.py#L130
ENV XDG_CACHE_HOME='/cache'
# https://huggingface.co/docs/huggingface_hub/main/en/package_reference/environment_variables#hfhome
ENV HF_HOME="/cache/huggingface"
# https://pytorch.org/docs/stable/hub.html#where-are-my-downloaded-models-saved
ENV TORCH_HOME="/cache/torch"

RUN mkdir /cache && rm -rf /root/.cache && ln -s /cache /root/.cache
RUN apt-get update && apt-get install -y pkg-config
RUN pip install --no-cache-dir clams-python==$clams_version
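
For reference, the updated base image can be built and sanity-checked locally; the `clams_version` value and the tag below are illustrative and should be replaced with a released clams-python version:

```bash
# build the base image from the repository root, pinning the clams-python version
docker build -f container/Containerfile --build-arg clams_version=1.2.2 \
    -t clams-python:local container

# confirm the cache setup introduced above
docker run --rm clams-python:local sh -c \
    'readlink /root/.cache && printenv XDG_CACHE_HOME HF_HOME TORCH_HOME'
# expected, one value per line: /cache, /cache, /cache/huggingface, /cache/torch
```
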
8 changes: 4 additions & 4 deletions container/opencv4.containerfile
@@ -2,7 +2,7 @@ ARG clams_version
FROM ghcr.io/clamsproject/clams-python-ffmpeg:$clams_version
LABEL org.opencontainers.image.description="clams-python-opencv image is shipped with clams-python, ffmpeg, and opencv4 with their python bindings"

-ARG OPENCV_VERSION=4.7.0
+ARG OPENCV_VERSION=4.10.0
ARG OPENCV_PATH=/opt/opencv-${OPENCV_VERSION}
ARG OPENCV_EXTRA_PATH=/opt/opencv_contrib-${OPENCV_VERSION}

@@ -13,10 +13,10 @@ RUN apt-get install -y g++ cmake make wget unzip libavcodec-dev libavformat-dev
# opencv download
RUN mkdir /opt || echo '/opt is already there'
WORKDIR /opt
-RUN wget https://github.com/opencv/opencv/archive/${OPENCV_VERSION}.zip -O opencv.zip
+RUN wget -q https://github.com/opencv/opencv/archive/${OPENCV_VERSION}.zip -O opencv.zip
RUN unzip -q opencv.zip
RUN rm opencv.zip
-RUN wget https://github.com/opencv/opencv_contrib/archive/${OPENCV_VERSION}.zip -O opencv_contrib.zip
+RUN wget -q https://github.com/opencv/opencv_contrib/archive/${OPENCV_VERSION}.zip -O opencv_contrib.zip
RUN unzip -q opencv_contrib.zip
RUN rm opencv_contrib.zip

@@ -41,5 +41,5 @@ RUN make -j$(nproc) && make install && ldconfig
WORKDIR /
RUN rm -rf ${OPENCV_PATH} ${OPENCV_EXTRA_PATH}
RUN pip uninstall opencv-python
-RUN pip install --no-cache-dir opencv-python-rolling~=${OPENCV_VERSION}
+RUN pip install --no-cache-dir opencv-python~=${OPENCV_VERSION}
RUN apt-get remove -y g++ cmake make wget unzip libavcodec-dev libavformat-dev libavutil-dev libswscale-dev && apt-get autoremove -y
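
A built image can be spot-checked to confirm that the Python bindings picked up the intended OpenCV release; the image tag below is illustrative:

```bash
docker run --rm clams-python-opencv4:local \
    python -c "import cv2; print(cv2.__version__)"
# should print a 4.10.x version string
```
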
60 changes: 44 additions & 16 deletions documentation/clamsapp.md
@@ -21,17 +21,30 @@ However, there could be other non-Python software/library that are required by t
CLAMS Apps available on the CLAMS App Directory. Currently all CLAMS Apps are open-source projects and are distributed as

1. source code downloadable from code repository
-2. pre-built container image
+1. pre-built container image

Please visit [the app-directory](https://apps.clams.ai) to see which apps are available and where you can download them.

In most cases, you can "install" a CLAMS App by either

-1. downloading source code from the app code repository and manually building a container image
-2. downloading pre-built container image directly
+1. downloading pre-built container image directly (quick-and-easy way)
+1. downloading source code from the app code repository and manually building a container image (more flexible way if you want to modify the app, or have to build for a specific HW)

#### Download prebuilt image

This is the quickest (and recommended) way to get started with a CLAMS App.
CLAMS apps in the App Directory come with public prebuilt container images, available in a container registry.

``` bash
docker pull <prebuilt_image_name>
```

The image name can be found on the App Directory entry of the app.

#### Build image from source code

Alternatively, you can build a container image from the source code.
This is useful when you want to modify the app itself, when you need to change the image building process to adjust to your hardware environment (e.g., a specific compute engine), or when you want to add software dependencies (e.g., [MMIF plugins](https://clams.ai/mmif-python/latest/plugins.html)).
To download the source code, you can either use `git clone` command or download a zip file from the source code repository.
The source code repository address can be found on the App Directory entry of the app.
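
For instance, cloning an app repository could look like the following, where the repository URL is a placeholder to be taken from the App Directory entry:

```bash
git clone <app_repository_url> some-clams-app
cd some-clams-app
```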

@@ -43,16 +56,6 @@ From the locally downloaded project directory, run the following in your termina
docker build . -f Containerfile -t <image_name_you_pick>
```

-#### Download prebuilt image
-
-Alternatively, the app maybe already be available on a container registry.
-
-``` bash
-docker pull <prebulit_image_name>
-```
-
-The image name can be found on the App Directory entry of the app.

### Running CLAMS App


@@ -65,7 +68,11 @@ docker run -v /path/to/data/directory:/data -p <port>:5000 <image_name>
```

where `/path/to/data/directory` is the local location of your media files or MMIF objects and `<port>` is the *host* port number you want your container to be listening to.
-The HTTP inside the container will be listening to 5000 by default. Usually any number above 1024 is fine for the host port number, and you can use the same 5000 number for the host port number.
+The HTTP server inside the container listens on port 5000 by default, so the second part of the `-p` argument is always `5000`.
+Usually any number above 1024 is fine for the host port, and you can also use 5000 itself on the host side.

The mount point for the data directory inside the container can be any path, and we used `/data` just as an example.
However, it is very important to understand that the file location in the input MMIF file must be a valid and available path inside the container (see below for more details).
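
For example, with media files under `/home/me/clams-data` on the host and 8080 picked as the host port (both illustrative), starting a container and checking that it responds could look like:

```bash
docker run -d --rm -v /home/me/clams-data:/data -p 8080:5000 <image_name>

# the server listens on 5000 inside the container, but is reachable on the host via 8080;
# a GET request against the root typically returns the app metadata
curl http://localhost:8080/
```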

> **Note**
> If you are using a Mac, on recent versions of macOS, port 5000 is used by Airplay Receiver by default. So you may need to use a different port number, or turn off the Airplay Receiver in the System Preferences to release 5000.
@@ -78,6 +85,21 @@ The HTTP inside the container will be listening to 5000 by default. Usually any
> ```
> This is because the image you are trying to run is built for Intel/AMD CPUs. To force the container to run on an emulation layer, you can add `--platform linux/amd64` option to the `docker run` command.

Additionally, you can mount a directory to `/cache/` inside the container to persist the cache data between container runs.
This is particularly handy when the app you are using downloads a fairly large pretrained model file on the first run, and you want to keep it for the next run.
Unlike the data directory, the cache directory is not required to be mounted, but if you want to persist the cache data, you can mount a local directory to `/cache/` inside the container (fixed path).
```bash
docker run -v /path/to/data/directory:/data -v /path/to/cache/directory:/cache -p <port>:5000 <image_name>
```
> **Note**
> One might be tempted to bind-mount their entire local cache directory (usually `~/.cache` on Linux systems) to re-use locally downloaded model files across different apps.
> However, doing so will expose all the cached data, not just model files, to the container.
> This can include sensitive information such as browser caches, authentication tokens, and the like, and hence poses a serious security risk.
> It is recommended to create a separate directory to use as a cache directory for CLAMS containers.

#### Running as a local HTTP server

@@ -143,9 +165,15 @@ You will get
}
```

-If an app requires just `Document` inputs (see `input` section of the app metadata), an empty MMIF with required media file locations will suffice. The location has to be a URL or an absolute path, and it is important to ensure that it exists.
+If an app requires just `Document` inputs (see `input` section of the app metadata), an empty MMIF with required media file locations will suffice.
+The location has to be a URL or an absolute path, and it is important to ensure that it exists.
Especially when running the app in a container and the document location is specified as a file system path, the file must be available inside the container.
In the above, we bind-mounted `/path/to/data/directory` (host) to `/data` (container).
That is why we used `/data/audio/some-audio-file.mp3` as the location when generating this MMIF input.
So in this example, the file `/path/to/data/directory/audio/some-audio-file.mp3` must exist on the host side, so that inside the container, it can be accessed as `/data/audio/some-audio-file.mp3`.
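
Putting the pieces together, one way to generate such an input MMIF and send it to the containerized app is sketched below. It assumes the `clams` CLI from clams-python is installed on the host and that a container like the one in the earlier example is listening on host port 8080; check `clams source --help` for the exact syntax of the document specifier:

```bash
# create a source MMIF pointing at the file *as seen from inside the container*
clams source audio:/data/audio/some-audio-file.mp3 > input.mmif

# POST the MMIF to the running app and save the annotated result
curl -s -X POST --data-binary @input.mmif http://localhost:8080/ > output.mmif
```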


-However, some apps only works with input MMIF that already contains some annotations of specific types. To run such apps, you need to run different apps in a sequence.
+Some apps only work with input MMIF that already contains annotations of specific types. To run such apps, you need to run different apps in a sequence.

(TODO: add CLAMS workflow documentation link here.)

