CSCS-CI ext #46

Open
wants to merge 50 commits into
base: master
50 commits
0b6512a
v0.8.2
simonpintarelli May 31, 2024
4cb220c
add files
simonpintarelli Aug 9, 2024
d917403
fix paths
simonpintarelli Aug 9, 2024
9cfb7d0
use eiger
simonpintarelli Aug 15, 2024
d4f71b6
add slurm-uenv-mount dependencies to docker image
simonpintarelli Aug 15, 2024
3ba60bd
add build stage
simonpintarelli Aug 15, 2024
c225925
update
simonpintarelli Aug 15, 2024
92bdced
add missing image
simonpintarelli Aug 15, 2024
71fd540
change image name
simonpintarelli Aug 15, 2024
cc85da0
push to public
simonpintarelli Aug 15, 2024
7ea5e5c
update c++
simonpintarelli Aug 15, 2024
ac6527b
use gcc12
simonpintarelli Aug 15, 2024
09598ac
explicitly use g++-12 for meson
simonpintarelli Aug 15, 2024
656fd9d
run on daint-mc
simonpintarelli Aug 15, 2024
697462f
try eiger runner
simonpintarelli Aug 15, 2024
cc3ef14
eiger-mc
simonpintarelli Aug 15, 2024
8d33f0e
try sbatch
simonpintarelli Aug 15, 2024
338765d
remove variables
simonpintarelli Aug 16, 2024
10b039f
debug
simonpintarelli Aug 16, 2024
9fc8273
pass slurm_version as argument
simonpintarelli Aug 16, 2024
8fbbadb
persistent image name
simonpintarelli Aug 16, 2024
37be2a2
remove eval
simonpintarelli Aug 16, 2024
87b6799
fix IMAGE_NAME
simonpintarelli Aug 16, 2024
730dffe
add test docker container
simonpintarelli Aug 16, 2024
abe9db4
cleanup
simonpintarelli Aug 17, 2024
1cf0cfb
export missing BASE_IMAGE
simonpintarelli Aug 17, 2024
326b64d
update
simonpintarelli Aug 17, 2024
160cdb2
add test stage (container runner)
simonpintarelli Aug 18, 2024
d1645d5
missing image
simonpintarelli Aug 19, 2024
de55b3c
remove workdir
simonpintarelli Aug 19, 2024
51f170f
use su instead of sudo
simonpintarelli Aug 19, 2024
9d997b5
Revert "use su instead of sudo"
simonpintarelli Aug 19, 2024
6ceaca5
build rpm
simonpintarelli Aug 20, 2024
240b247
skip entrypoint
simonpintarelli Aug 20, 2024
37c04fa
build rpm in local directory (scratch)
simonpintarelli Aug 20, 2024
849d0c9
set gcc12
simonpintarelli Aug 20, 2024
ca14781
run on todi too
simonpintarelli Aug 21, 2024
4769d72
fix
simonpintarelli Aug 21, 2024
a6b6127
attempt to fix override
simonpintarelli Aug 21, 2024
2b00d63
notification context
simonpintarelli Aug 21, 2024
1496ebd
f7t runner for todi
simonpintarelli Aug 22, 2024
9e04230
persist image name
simonpintarelli Aug 22, 2024
744746a
add uarch to docker tag
simonpintarelli Aug 22, 2024
dacbb17
fix image name
simonpintarelli Aug 22, 2024
d13f5d0
wip
simonpintarelli Aug 28, 2024
e95c50e
add macros.meson to repo
simonpintarelli Aug 28, 2024
4d556c4
debug
simonpintarelli Aug 28, 2024
b722547
use github restapi to upload rpm
simonpintarelli Aug 28, 2024
1d272a3
test curl
simonpintarelli Aug 28, 2024
4290fbe
check status codes
simonpintarelli Aug 28, 2024
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.8.1
+0.8.2
110 changes: 110 additions & 0 deletions ci/cscs.yml
@@ -0,0 +1,110 @@
include:
  - remote: 'https://gitlab.com/cscs-ci/recipes/-/raw/master/templates/v2/.ci-ext.yml'

stages:
  - baseimage x86_64
  - build x86_64
  - build rpm x86_64
  - baseimage aarch64
  - build aarch64
  - build rpm aarch64

# dynamic name for sha on watched files, slurm version, uarch
.my-dynamic-image-name:
  extends: [.dynamic-image-name]
  before_script:
    - DOCKER_TAG=`echo $(eval cat $WATCH_FILECHANGES; echo -n $slurm_version) $(uname -m) | sha256sum | head -c 16`
    - export PERSIST_IMAGE_NAME=$PERSIST_IMAGE_NAME:$DOCKER_TAG
    - echo "BASE_IMAGE=$PERSIST_IMAGE_NAME" > build.env

.build slurm base:
  timeout: 10h
  variables:
    CSCS_NOTIFICATION_CONTEXT: "$slurm_version"
    DOCKERFILE: ci/slurm_docker/Dockerfile.base
    DOCKER_BUILD_ARGS: '["SLURM_VERSION=$slurm_version"]'
    WATCH_FILECHANGES: ci/slurm_docker/Dockerfile.base ci/slurm_docker/cgroup.conf ci/slurm_docker/entrypoint.sh ci/slurm_docker/install_slurm.sh ci/slurm_docker/slurm.conf.in
    PERSIST_IMAGE_NAME: $CSCS_REGISTRY_PATH/base/public/slurm-base

build slurm base x86_64:
  stage: baseimage x86_64
  extends: [.my-dynamic-image-name, '.build slurm base', .container-builder-cscs-zen2]

build slurm base aarch64:
  stage: baseimage aarch64
  extends: [.my-dynamic-image-name, '.build slurm base', .container-builder-cscs-gh200]

.build:
  variables:
    CSCS_REBUILD_POLICY: always
    CSCS_NOTIFICATION_CONTEXT: "$slurm_version"
    DOCKERFILE: ci/slurm_docker/Dockerfile
    DOCKER_BUILD_ARGS: '["BASE_IMAGE=${BASE_IMAGE}"]'

build x86_64:
  needs: ["build slurm base x86_64"]
  stage: build x86_64
  extends: [.build, .container-builder-cscs-zen2]
  variables:
    PERSIST_IMAGE_NAME: $CSCS_REGISTRY_PATH/base/public/slurm-uenv-mount-x86_64

build aarch64:
  needs: ["build slurm base aarch64"]
  stage: build aarch64
  extends: [.build, .container-builder-cscs-gh200]
  variables:
    PERSIST_IMAGE_NAME: $CSCS_REGISTRY_PATH/base/public/slurm-uenv-mount-aarch64
.build rpm upload artifact:
  variables:
    CSCS_NOTIFICATION_CONTEXT: "$slurm_version"
  script:
    - |
      _rpm_build_dir=./
      mkdir -p ${_rpm_build_dir}
      CXX=g++-12 CC=gcc-12 /src/rpm/make-rpm.sh --slurm-version "${slurm_version}" ${_rpm_build_dir}
      binary_rpm=$(find RPMS -name '*.rpm')
      # Upload the RPM as a release asset. curl writes the response body to
      # response.json and prints only the HTTP status code, captured in http_code.
      # NOTE: the GitHub docs describe the path segment after /releases/ as the
      # numeric release_id; a ref name such as CI_COMMIT_REF_NAME may not resolve.
      http_code=$(curl -L \
        -X POST \
        -o response.json \
        -w "%{http_code}" \
        -H "Accept: application/vnd.github+json" \
        -H "Authorization: Bearer ${GHUB_WRITE_TOKEN}" \
        -H "X-GitHub-Api-Version: 2022-11-28" \
        -H "Content-Type: application/octet-stream" \
        "https://uploads.github.com/repos/eth-cscs/slurm-uenv-mount/releases/${CI_COMMIT_REF_NAME}/assets?name=$(basename ${binary_rpm})" \
        --data-binary "@${binary_rpm}")
      echo "http_code: $http_code"
      if [ "$http_code" -eq 400 ]; then
        echo "$http_code: Bad request, couldn't upload release"
        exit 1
      fi
      # https://docs.github.com/en/rest/releases/assets?apiVersion=2022-11-28#upload-a-release-asset--status-codes
      if [ "$http_code" -eq 201 ]; then
        echo "$http_code: Successfully uploaded $(basename ${binary_rpm}) to release ${CI_COMMIT_REF_NAME}."
        cat response.json
      fi
      # 422 is returned when an asset with the same filename already exists; the asset is not replaced.
      if [ "$http_code" -eq 422 ]; then
        echo "$http_code: An asset named $(basename ${binary_rpm}) already exists in release ${CI_COMMIT_REF_NAME}; it was not replaced."
        cat response.json
      fi

build rpm x86_64 and upload artifact:
  needs: ["build x86_64"]
  image: $CSCS_REGISTRY_PATH/base/public/slurm-uenv-mount-x86_64
  stage: build rpm x86_64
  extends: ['.build rpm upload artifact', .container-runner-eiger-mc]

build rpm aarch64 and upload artifact:
  needs: ["build aarch64"]
  image: $CSCS_REGISTRY_PATH/base/public/slurm-uenv-mount-aarch64
  stage: build rpm aarch64
  extends: ['.build rpm upload artifact', .f7t-container-runner]
  variables:
    F7T_URL: 'https://firecrest-todi.v1.tds.cscs.ch'
    FIRECREST_SYSTEM: 'todi'
    ARCH: 'aarch64'
    USE_CE: 'YES'
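
The status handling in the upload step of ci/cscs.yml can be sketched as a small shell function. This is only an illustration: `classify_upload_status` is a hypothetical name that does not appear in the PR, and the status meanings (201 created, 422 duplicate filename) are taken from the GitHub REST documentation linked in the script. Unlike the CI script, this sketch treats every non-201 status as a failure, which is an assumption about the desired behavior.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): map the HTTP status of a
# release-asset upload to a short result string and an exit status.
classify_upload_status() {
    local code=$1
    case "$code" in
        201) echo "uploaded"; return 0 ;;          # asset created
        422) echo "duplicate-asset"; return 1 ;;   # same filename already uploaded
        4??|5??) echo "failed"; return 1 ;;        # any other client or server error
        *) echo "unexpected"; return 1 ;;          # anything else, including an empty code
    esac
}

# Example: in CI the status would come from curl -w "%{http_code}".
status=201
if result=$(classify_upload_status "$status"); then
    echo "$status: $result"
else
    echo "$status: $result" >&2
fi
```

Returning a non-zero status for everything except 201 lets the calling job fail with a single `|| exit 1` instead of one `if` block per status code.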
8 changes: 8 additions & 0 deletions ci/slurm_docker/Dockerfile
@@ -0,0 +1,8 @@
ARG BASE_IMAGE
FROM $BASE_IMAGE

COPY . /src

RUN CXX=g++-12 CC=gcc-12 meson setup builddir /src \
    && meson install -C builddir \
    && echo "required /usr/local/lib64/libslurm-uenv-mount.so" > /etc/slurm/plugstack.conf
101 changes: 101 additions & 0 deletions ci/slurm_docker/Dockerfile.base
@@ -0,0 +1,101 @@
FROM opensuse/leap:15.4

ARG SLURM_VERSION=23.02.7
ARG SLURM_ROOT=/usr
ARG SLURM_CONFDIR=/etc/slurm

ENV SLURM_VERSION=${SLURM_VERSION}
ENV SLURM_ROOT=${SLURM_ROOT}
ENV SLURM_CONFDIR=${SLURM_CONFDIR}

RUN zypper install -y \
    munge \
    munge-devel \
    libnuma1 \
    libnuma-devel \
    librrd8 \
    readline-devel \
    hwloc \
    hwloc-devel \
    lz4 \
    liblz4-devel \
    libz1 \
    zlib-devel \
    freeipmi \
    freeipmi-devel \
    dbus-1 \
    dbus-1-devel \
    make \
    gcc12 \
    gcc12-c++ \
    curl \
    tar \
    bzip2 \
    python3 \
    vim \
    ca-certificates \
    less \
    sudo \
    fuse3-devel \
    git \
    sqlite3 \
    sqlite3-devel \
    libopenssl-devel \
    util-linux \
    util-linux-systemd \
    squashfs \
    rpm-build \
    lua53 \
    lua53-devel \
    libmount-devel

RUN useradd -M slurm

RUN mkdir -p /var/log/slurm
RUN mkdir -p /var/spool/slurmctld && chown slurm /var/spool/slurmctld && chmod u+rwx /var/spool/slurmctld
RUN mkdir -p /var/spool/slurmd && chown slurm /var/spool/slurmd && chmod u+rwx /var/spool/slurmd


COPY ci/slurm_docker/install_slurm.sh .

RUN ./install_slurm.sh ${SLURM_VERSION} ${SLURM_ROOT} ${SLURM_CONFDIR} --enable-multiple-slurmd

RUN mkdir -p ${SLURM_CONFDIR}
COPY ci/slurm_docker/cgroup.conf ${SLURM_CONFDIR}
COPY ci/slurm_docker/slurm.conf.in ${SLURM_CONFDIR}

# Build dependencies for slurm-uenv-mount
# Install Python 3.10 from source
RUN curl -O https://www.python.org/ftp/python/3.10.11/Python-3.10.11.tgz \
    && tar xzvf Python-3.10.11.tgz \
    && cd Python-3.10.11 \
    && ./configure \
    && make install -j \
    && cd ../ && rm -r Python-3.10.11
RUN zypper --non-interactive rm libopenssl-devel

# rpmbuild needs /usr/lib/rpm/macros.d/macros.meson, which is missing from this image
RUN python3 -m pip install --upgrade pip && python3 -m pip install meson ninja
RUN curl --fail https://raw.githubusercontent.com/mesonbuild/meson/master/data/macros.meson -o /usr/lib/rpm/macros.d/macros.meson
# rpmbuild expects meson in /usr/bin/meson
RUN ln -s /usr/local/bin/meson /usr/bin/meson

# download bash-bats
RUN curl -L https://github.com/bats-core/bats-core/archive/refs/tags/v1.9.0.tar.gz | tar xz
RUN ln -s /bats-core-1.9.0/bin/bats /usr/bin/bats
RUN mkdir bats-helpers
RUN git clone --depth 1 https://github.com/bats-core/bats-assert.git bats-helpers/bats-assert
RUN git clone --depth 1 https://github.com/bats-core/bats-support.git bats-helpers/bats-support
ENV BATS_LIB_PATH=/bats-helpers

RUN mkdir /user-environment
RUN mkdir /user-profilers
RUN mkdir /user-tools

RUN useradd testuser
RUN mkdir -p /home/testuser
RUN chown testuser /home/testuser

COPY ci/tests /tests

COPY ci/slurm_docker/entrypoint.sh .
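
Because ci/cscs.yml lists this Dockerfile among its watched files, any change here produces a new base-image tag. A minimal sketch of that idea follows, assuming a hypothetical helper `derive_tag` that is not part of the PR. It hashes the raw file contents plus a version and architecture string, so the resulting tags are not byte-compatible with the `DOCKER_TAG` computed in CI, which passes the contents through `echo` first.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): derive a 16-character image tag
# from the contents of the watched files, a version string and an architecture.
derive_tag() {   # $1 = version, $2 = architecture, remaining args = watched files
    local version=$1
    local arch=$2
    shift 2
    { cat "$@"; printf '%s %s\n' "$version" "$arch"; } | sha256sum | head -c 16
}

# Demo with a throwaway file standing in for a watched Dockerfile.
demo_file=$(mktemp)
printf 'FROM opensuse/leap:15.4\n' > "$demo_file"
echo "tag: $(derive_tag 23.02.7 "$(uname -m)" "$demo_file")"
rm -f "$demo_file"
```

The tag stays the same as long as the files, the version, and the architecture are unchanged, which is what allows an already published base image to be reused.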
5 changes: 5 additions & 0 deletions ci/slurm_docker/cgroup.conf
@@ -0,0 +1,5 @@
CgroupAutomount=yes
ConstrainCores=no
ConstrainRAMSpace=no
CgroupMountpoint=/sys/fs/cgroup
CgroupPlugin=cgroup/v1
86 changes: 86 additions & 0 deletions ci/slurm_docker/entrypoint.sh
@@ -0,0 +1,86 @@
#!/bin/bash

dbus-launch
sudo -u munge munged

: "${SLURM_CONF_IN=$SLURM_CONFDIR/slurm.conf.in}"
: "${SLURM_CONF=$SLURM_CONFDIR/slurm.conf}"

# Default number of slurm nodes
: "${SLURM_NUMNODES=3}"

# Default slurm controller
: "${SLURMCTLD_HOST=$HOSTNAME}"
: "${SLURMCTLD_ADDR=127.0.0.1}"

# Default node info
: "${NODE_HOST=$HOSTNAME}"
: "${NODE_ADDR=127.0.0.1}"
: "${NODE_BASEPORT=6001}"

# Default hardware profile
: "${NODE_HW=CPUs=4}"

# Generate node names and associated ports
NODE_NAMES=$(printf "nd[%05i-%05i]" 1 $SLURM_NUMNODES)
NODE_PORTS=$(printf "%i-%i" $NODE_BASEPORT $(($NODE_BASEPORT+$SLURM_NUMNODES-1)))


echo "INFO:"
echo "INFO: Creating $SLURM_CONF with"
echo "INFO: "
column -t <<-EOF
INFO: SLURMCTLD_HOST=$SLURMCTLD_HOST SLURMCTLD_ADDR=$SLURMCTLD_ADDR
INFO: NODE_HOST=$NODE_HOST NODE_ADDR=$NODE_ADDR NODE_BASEPORT=$NODE_BASEPORT
INFO: NODE_HW=$NODE_HW
INFO: SLURM_NUMNODES=$SLURM_NUMNODES
EOF
echo "INFO: "
echo "INFO: Derived values:"
echo "INFO:"
column -t <<-EOF
INFO: NODE_NAMES=$NODE_NAMES
INFO: NODE_PORTS=$NODE_PORTS
EOF
echo "INFO:"
echo "INFO: Override any of the non-derived values by setting the respective environment variable"
echo "INFO: when starting Docker."
echo "INFO:"

export PATH=$SLURM_ROOT/bin:$PATH
export LD_LIBRARY_PATH=$SLURM_ROOT/lib:$LD_LIBRARY_PATH
export MANPATH=$SLURM_ROOT/man:$MANPATH

(
    echo "NodeName=${NODE_NAMES} NodeHostname=${NODE_HOST} NodeAddr=${NODE_ADDR} Port=${NODE_PORTS} State=UNKNOWN ${NODE_HW}"
    echo "PartitionName=dkr Nodes=ALL Default=YES MaxTime=INFINITE State=UP"
) \
    | sed -e "s/SLURMCTLDHOST/${SLURMCTLD_HOST}/" \
          -e "s/SLURMCTLDADDR/${SLURMCTLD_ADDR}/" \
          "$SLURM_CONF_IN" - \
    > "$SLURM_CONF"

NODE_NAME_LIST=$(scontrol show hostnames "$NODE_NAMES")

for n in $NODE_NAME_LIST
do
    echo "$NODE_ADDR $n" >> /etc/hosts
done

echo
echo "Starting Slurm services..."
echo

$SLURM_ROOT/sbin/slurmctld

for n in $NODE_NAME_LIST
do
    $SLURM_ROOT/sbin/slurmd -N $n
done

echo
sinfo
echo
echo

exec "$@"
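
The derived values in the entrypoint (the node name range, the port range, and the expanded host list) can be reproduced outside the container with a short sketch. The function names below are hypothetical. The `printf` formats mirror the script, and `expand_node_names` is only a stand-in for `scontrol show hostnames` that handles this one simple `nd[a-b]` pattern.

```shell
#!/bin/bash
# Sketch of the derived values computed by the entrypoint.
# Function names are hypothetical and not part of the PR.

node_name_range() {    # $1 = number of nodes
    printf 'nd[%05i-%05i]' 1 "$1"
}

node_port_range() {    # $1 = base port, $2 = number of nodes
    printf '%i-%i' "$1" "$(( $1 + $2 - 1 ))"
}

# Stand-in for `scontrol show hostnames`, valid only for the nd[a-b] pattern above.
expand_node_names() {  # $1 = number of nodes
    local i=1
    while [ "$i" -le "$1" ]; do
        printf 'nd%05d\n' "$i"
        i=$(( i + 1 ))
    done
}

# Defaults from the entrypoint: 3 nodes, base port 6001.
node_name_range 3; echo        # nd[00001-00003]
node_port_range 6001 3; echo   # 6001-6003
expand_node_names 3            # nd00001, nd00002, nd00003 (one per line)
```

Each emulated node gets its own port, so the port range always spans exactly as many ports as there are nodes.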
66 changes: 66 additions & 0 deletions ci/slurm_docker/install_slurm.sh
@@ -0,0 +1,66 @@
#!/bin/bash -x
#
# Usage: install_slurm.sh <slurm-version> <install-prefix> <sysconf-dir> [configure-args]
#

SLURM_VERSION=$1
SLURM_ROOT=$2
SLURM_CONFDIR=$3
shift; shift; shift
ARGS=$*

slurm_tar_file=slurm-${SLURM_VERSION}.tar.bz2
slurm_url=https://download.schedmd.com/slurm/${slurm_tar_file}


if [ -z "$SLURM_VERSION" ] || [ -z "$SLURM_ROOT" ] || [ -z "$SLURM_CONFDIR" ];
then
    echo "Usage: install_slurm.sh <slurm-version> <install-prefix> <sysconf-dir> [configure-args]"
    echo "Slurm version, install-prefix, or sysconf-dir missing on command line. Aborting."
    exit 1
fi

#
# Download slurm tarball and unpack it
#
if true; then

    mkdir -p /opt/src || exit 1
    (
        cd /opt/src || exit 1

        if ! stat $slurm_tar_file; then
            echo "=== downloading slurm ${SLURM_VERSION} from ${slurm_url}"
            curl --fail --output ${slurm_tar_file} ${slurm_url} || exit 1
        fi

        echo "=== unpacking $slurm_tar_file"
        tar -xjf ${slurm_tar_file} || exit 1
    ) || exit 1   # an exit inside the subshell only leaves the subshell, so propagate it

fi

if [ "$ARGS" = "NO_BUILD" ];
then
    exit 0
fi

#
# Remove any old build directory.
# Run configure, make, make install
#

stat /opt/build/slurm-${SLURM_VERSION} && rm -rf /opt/build/slurm-${SLURM_VERSION}
mkdir -p /opt/build/slurm-${SLURM_VERSION} || exit 1
(
    cd /opt/build/slurm-${SLURM_VERSION} || exit 1
    # Print the available configure options to the build log (debugging aid).
    /opt/src/slurm-${SLURM_VERSION}/configure --help
    # Select GCC 12 for the configure run that generates the Makefiles.
    CXX=g++-12 CC=gcc-12 /opt/src/slurm-${SLURM_VERSION}/configure \
        --prefix=${SLURM_ROOT} \
        --sysconfdir=${SLURM_CONFDIR} \
        --disable-dependency-tracking \
        $ARGS || exit 1

    make -j4 && make install
)
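
The argument handling at the top of install_slurm.sh can be sketched as a function that validates the three required positional arguments before consuming them. `parse_install_args` is a hypothetical name that does not appear in the PR. The tarball URL pattern is the one used by the script, and the example values are the defaults from Dockerfile.base.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): validate and split the arguments
# of install_slurm.sh, then print the values the script would work with.
parse_install_args() {
    if [ "$#" -lt 3 ]; then
        echo "Usage: install_slurm.sh <slurm-version> <install-prefix> <sysconf-dir> [configure-args]" >&2
        return 1
    fi
    local version=$1 prefix=$2 confdir=$3
    shift 3   # everything left over is passed on to configure
    echo "url=https://download.schedmd.com/slurm/slurm-${version}.tar.bz2"
    echo "prefix=${prefix}"
    echo "confdir=${confdir}"
    echo "configure_args=$*"
}

# Example invocation with the defaults from Dockerfile.base.
parse_install_args 23.02.7 /usr /etc/slurm --enable-multiple-slurmd
```

Checking `$#` before shifting avoids relying on empty variables to detect missing arguments.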
