
tl;dr

  1. qsub -I -l nodes=1:gpus=1:gtxtitan:docker to get a Docker-enabled Titan GPU node. Become a member of the docker group if you are not one already; run groups $USER to find out.
  2. module load Anaconda cudnn/7.0 cuda/7.5
  3. nvidia-smi. Figure out which GPU you want to use (the one which is being used the least) and run export CUDA_VISIBLE_DEVICES=gpu_number, where gpu_number is the index of the GPU you want to use.
  4. Either:
  • Build your own image by copying the Dockerfile below into a file named "Dockerfile" in your project, cd-ing into your project's directory, and running
docker build -t username:tagname .
  • Or, just use a prebuilt image: gideonite/hal-tf. As of today (June 15, 2016) there is no centralized DockerHub directory (TODO).
  5. Copy the script from the Docker Run Script section below and use it to run the image. Namely,
docker_run_gpu.sh -v /path/to/your-code-or-data:/mnt/wherever -it docker-img-tag progname

where docker-img-tag is either gideonite/hal-tf or whatever you replaced username:tagname with above. The -v argument mounts a directory into your running Docker container so that you can access it from within the container. progname can be any program available within the image, for example bash.
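
Putting the steps together, a full session might look like this (the project path and GPU index are hypothetical; substitute your own):

qsub -I -l nodes=1:gpus=1:gtxtitan:docker
module load Anaconda cudnn/7.0 cuda/7.5
nvidia-smi                     # suppose GPU 2 is the least used
export CUDA_VISIBLE_DEVICES=2
./docker_run_gpu.sh -v $HOME/myproject:/mnt/myproject -it gideonite/hal-tf bash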

Background

As of today, there is some sort of shared-library incompatibility between the cluster and TensorFlow. The solution we have come to is to use Docker to encapsulate the dependencies. You have to go a bit beyond what is listed in the TensorFlow documentation to get an image working with GPU capability, hence this wiki page.

Hardware Requirements

TensorFlow's GPU build requires CUDA compute capability 3.5 or better, which on hal means a Titan, so make sure that you request one by running qsub -I -l nodes=1:gpus=1:gtxtitan:docker. Also make sure you are a member of the docker group.
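
To check whether you are already in the docker group (prints docker if so):

groups $USER | grep -o docker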

Dockerfile

I think that the easiest way to understand the setup is to simply look at the configuration and/or code. The comments above each command explain its significance. I took this directly from the tensorflow project and made the necessary modifications.

# This is chosen to match what is on hal.
# There is some confusion about cuDNN versions. tl;dr: TensorFlow supports cuDNN 6.5 (v2), 7.0 (v3), and v5. See https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html for more info.
FROM nvidia/cuda:7.5-cudnn3-devel

MAINTAINER you

# download a bunch of basic dependencies for tensorflow.
# apt-get
RUN apt-get update && apt-get install -y \
    curl \
    libfreetype6-dev \
    libpng12-dev \
    libzmq3-dev \
    python-scipy \
    python-yaml \
    libhdf5-serial-dev \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# install pip
RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
    python get-pip.py && \
    rm get-pip.py

# scikit-learn is required for skflow
RUN pip install scikit-learn \
    h5py

# jupyter is commonly used as a tool for interactive development. Many machine learning researchers use it.
RUN pip --no-cache-dir install \
        ipykernel \
        jupyter

# install Tensorflow
# TODO this version should be bumped up to 0.9.0 (June 15, 2016)
ENV TENSORFLOW_VERSION 0.8.0
RUN pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl

# TODO: I am not familiar with TensorBoard, disabled for the time being but kept around in case someone else wants to use it. If that's you, please update this wiki.
# TensorBoard
# EXPOSE 6006

# default command to be run by `docker run`
CMD ["/bin/bash"]

Docker Run Script

To make the GPU accessible from within the container, one needs to manually mount the relevant libraries and devices. The tensorflow project has a script for this; the version below includes some small modifications for hal.

#!/usr/bin/env bash
set -e

export CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}

if [ ! -d ${CUDA_HOME}/lib64 ]; then
  echo "Failed to locate CUDA libs at ${CUDA_HOME}/lib64."
  exit 1
fi

# mount each library matching libcuda.* and pass each /dev/nvidia* device through to the container.
export CUDA_SO=$(\ls /usr/lib64/libcuda.* | \
                    xargs -I{} echo '-v {}:{}')
export DEVICES=$(\ls /dev/nvidia* | \
                    xargs -I{} echo '--device {}:{}')

# mount the two kernel modules.
MODULES=""
for module in "/lib/modules/2.6.32-573.7.1.el6.x86_64/modules.dep.bin" \
              "/lib/modules/2.6.32-573.7.1.el6.x86_64/kernel/drivers/video/nvidia.ko"; do
    MODULES=" -v "$module":"$module" "$MODULES
done

# sets up $LD_LIBRARY_PATH and sets CUDA_VISIBLE_DEVICES to whatever
# is in the current environment.
ENVI="--env LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/"
ENVI=$ENVI" --env CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES" # fetch var from current env.

if [[ "${DEVICES}" = "" ]]; then
  echo "Failed to locate NVidia device(s). Did you want the non-GPU container?"
  exit 1
fi

docker run -it $CUDA_SO $MODULES $DEVICES $ENVI "$@"
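
As a quick smoke test (a sketch, assuming the gideonite/hal-tf image), open a TensorFlow session inside the container; TensorFlow logs the GPU devices it finds when the session is created, much like the log excerpts in the troubleshooting section below:

./docker_run_gpu.sh -it gideonite/hal-tf python -c "import tensorflow as tf; tf.Session()"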

Appendix

General

To have access to data outside the Docker container, mount a directory to e.g. /mnt. In practice you modify the docker run command like so:

  • GPU-version: (inside GPU-setup script) docker run -it $CUDA_SO $DEVICES -v /path/to/directory/on/hal:/mnt "$@"
  • non-GPU-version: docker run -it -v /path/to/directory/on/hal:/mnt b.gcr.io/tensorflow/tensorflow
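
For example, to check that a mount works (the host path is hypothetical):

./docker_run_gpu.sh -v $HOME/data:/mnt -it gideonite/hal-tf bash
ls /mnt                        # inside the container: the contents of $HOME/data on hal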

GPU things

Firstly, the GPU has to have CUDA compute capability 3.5 or better, so the GTX-680s aren't good enough. Get a Titan-and-docker-enabled node with e.g. qsub -I -l nodes=1:gpus=1:gtxtitan:docker:shared -q active

Secondly, the GPU-setup script provided by TensorFlow needs tweaking. Replace

export CUDA_SO=$(\ls /usr/lib/x86_64-linux-gnu/libcuda.* | xargs -I{} echo '-v {}:{}')

with

export CUDA_SO=$(\ls /usr/lib64/libcuda.* | xargs -I{} echo '-v {}:{}')

Thirdly, once you're in the Docker container you need to update LD_LIBRARY_PATH so it can find libcuda.so.1:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/
ldconfig

Errors and Troubleshooting

Some errors I have seen and their fixes:

Unable to find libcuda.so.1

tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
tensorflow/stream_executor/dso_loader.cc:99] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:121] hostname: 137addbf2d79
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:146] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:257] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.39  Fri Aug 14 18:09:10 PDT 2015
GCC version:  gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)

tensorflow/stream_executor/cuda/cuda_diagnostics.cc:150] kernel reported version is: 352.39
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1051] LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1052] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libcuda.so.1: wrong ELF class: ELFCLASS32
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

Fix: Make sure LD_LIBRARY_PATH is set. You can check that libcuda.so.1 has been linked/found with

ldconfig -p | grep libcuda
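
If libcuda.so.1 is not listed, the full remedy inside the container, combining the steps from the GPU things section above, is:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/
ldconfig
ldconfig -p | grep libcuda     # libcuda.so.1 should now appear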

Using a GTX-680

tensorflow/core/common_runtime/gpu/gpu_device.cc:684] Ignoring gpu device (device: 0, name: GeForce GTX 680, pci bus id: 0000:03:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
tensorflow/core/common_runtime/gpu/gpu_device.cc:684] Ignoring gpu device (device: 1, name: GeForce GTX 680, pci bus id: 0000:04:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
tensorflow/core/common_runtime/gpu/gpu_device.cc:684] Ignoring gpu device (device: 2, name: GeForce GTX 680, pci bus id: 0000:83:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
tensorflow/core/common_runtime/gpu/gpu_device.cc:684] Ignoring gpu device (device: 3, name: GeForce GTX 680, pci bus id: 0000:84:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.

Fix: Stop using the GTX-680s.

GPUs in exclusive compute mode

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX TITAN
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:03:00.0
Total memory: 6.00GiB
Free memory: 5.92GiB
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties:
name: GeForce GTX TITAN
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:04:00.0
Total memory: 6.00GiB
Free memory: 5.92GiB
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 2 with properties:
name: GeForce GTX TITAN
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:83:00.0
Total memory: 6.00GiB
Free memory: 5.92GiB
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 3 with properties:
name: GeForce GTX TITAN
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:84:00.0
Total memory: 6.00GiB
Free memory: 5.92GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 2
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 3
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 2
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 3
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 2 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 2 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 3 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 3 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 1:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 2:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 3:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN, pci bus id: 0000:84:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.0KiB
...
(lines like this for a while)
...
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:73] Allocating 5.62GiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:83] GPU 0 memory begins at 0x4310b00000 extends to 0x4478618a67
F tensorflow/stream_executor/cuda/cuda_driver.cc:383] Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(context) (0 vs. 216)
Aborted

Fix: You need to set the compute mode of the GPUs to non-exclusive by including shared in the qsub command, e.g.

qsub -I -q active -l nodes=1:gpus=1:gtxtitan:docker:shared
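
You can confirm the compute mode from the host before starting the container; "Default" means shared, while "Exclusive Thread" or "Exclusive Process" can produce the failure above:

nvidia-smi -q -d COMPUTE | grep "Compute Mode"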


Authors

  • Gideon Dresdner
  • Stephanie Hyland