Skip to content

Commit

Permalink
ci: Use lockfiles to control cached venv (#1238)
Browse files Browse the repository at this point in the history
Closes #1146 .

This PR intends to fix all the dependencies used in the CI in multiple
`test-requirements.txt` files, in a fully automated way.
It ensures that between two runs, the CI can't break because of a
dependency update.
It bases the pipelines cache on these files.

On the other hand, it makes it easier to update the versions of the
dependencies used in the CI.
Currently, the versions are updated only if someone updates the
`pyproject.toml` file, or if the CI cache is purged by hand.
It will be now automatically updated each week by `dependabot`, in a
pull-request.

This PR updates the `dependabot` configuration, but I can't be sure it
works before the merge. If not, we can easily create a `cron` pipeline
calling the new `pip-compile.sh` script.

---

This PR adds a new backend test ensuring that `test-requirements.txt`
files are regenerated when `requirements.in` or `test-requirements.in`
is modified.


![image](https://github.com/user-attachments/assets/310444b3-12de-46a6-8031-ed2d1671850c)
  • Loading branch information
thomass-dev authored Feb 3, 2025
1 parent 0147194 commit b6a2e75
Show file tree
Hide file tree
Showing 21 changed files with 1,685 additions and 83 deletions.
9 changes: 8 additions & 1 deletion .github/dependabot.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,13 @@ updates:
- package-ecosystem: "pip"
directory: "/"
schedule:
interval: "monthly"
interval: "weekly"
commit-message:
prefix: "chore(dependencies): PIP"
groups:
test-dependencies:
patterns:
- "*"

# It uses `skore/ci/requirements/python*/.python-version` to specify the python
# version used to update `skore/ci/requirements/**/test-requirements.txt`
48 changes: 31 additions & 17 deletions .github/workflows/backend.yml
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,27 @@ jobs:
working-directory: skore/
run: pre-commit run --all-files ruff

backend-lockfiles:
runs-on: "ubuntu-latest"
if: ${{ github.event_name == 'pull_request' }}
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 2

- name: Check lockfiles are not obsolete
run: |
changes=$(git diff --name-only HEAD^1 HEAD)
if
(echo "${changes}" | grep -qE 'skore/(test-)?requirements.in') &&
(echo "${changes}" | (! grep -qE "skore/ci/requirements/.*/test-requirements.txt"))
then
echo '::error title=backend-lockfiles:: Lockfiles obsolete, please execute `$ cd skore/ci; bash pip-compile.sh`'
exit 1
fi
backend-test:
strategy:
fail-fast: false
Expand Down Expand Up @@ -77,7 +98,11 @@ jobs:
id: cache-python-venv
with:
path: 'skore/venv'
key: python-venv-${{ matrix.os }}-${{ matrix.python }}-${{ matrix.scikit-learn }}-${{ hashFiles('skore/pyproject.toml') }}
key: >-
python-venv
-${{ matrix.os }}
-${{ matrix.python }}
-${{ hashFiles(format('skore/ci/requirements/python-{0}/scikit-learn-{1}/test-requirements.txt', matrix.python, matrix.scikit-learn)) }}
- name: Setup python-venv
working-directory: "skore/"
Expand All @@ -97,20 +122,11 @@ jobs:
fi
- name: Install dependencies in python-venv
working-directory: "skore/"
working-directory: ${{ format('skore/ci/requirements/python-{0}/scikit-learn-{1}', matrix.python, matrix.scikit-learn) }}
if: steps.cache-python-venv.outputs.cache-hit != 'true'
run: |
python -m pip install --upgrade "pip"
python -m pip install --upgrade "build"
# adding `.*` to the version ensures that we install the latest version of
# scikit-learn that is compatible with the specified version
python -m pip install --upgrade "scikit-learn ==${{ matrix.scikit-learn }}.*"
# Install `skore` and its dependencies
python -m pip install --upgrade --upgrade-strategy=eager ".[test]"
# Uninstall the `skore` package itself
python -m pip uninstall -y "skore"
python -m pip install --upgrade pip build
python -m pip install --requirement test-requirements.txt
- name: Save python-venv
uses: actions/cache/save@v4
Expand All @@ -124,10 +140,8 @@ jobs:
run: python -m build

- name: Install
working-directory: skore/
run: |
# Install `skore` without its dependencies, which are present in the venv
wheel=(dist/*.whl); python -m pip install --force-reinstall --no-deps "${wheel}"
working-directory: skore/dist/
run: wheel=(*.whl); python -m pip install --force-reinstall --no-deps "${wheel}"

- name: Show dependencies versions
working-directory: skore/
Expand Down
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,7 @@ examples/plot_*.png

# Include excluded directories from github-actions
!/.github/**/build/
!/skore/ci/**/.python-version

# Exclude hatch artifacts
skore/LICENSE
48 changes: 48 additions & 0 deletions docs/design/0004-cache-python-dependencies-in-ci.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
---
date: 2025-01-09
decision-makers: ["@thomass-dev"]
---

# Cache python dependencies in CI

## Context and Problem Statement

The choice was made to support the 3 latests versions of `scikit-learn`, jointly with
the OS team.

To date, we already have a CI running `skore` tests with 2 OS, and 4 python versions.
The CI therefore runs the same job 2*4=8 times. To limit the installation time each time
we run the test, we use a native GH mechanism to persist the pip cache. That way, python
packages can be installed without the need to be re-download from PyPI. It's better than
nothing, but installation time is incompressibly slow.

Supporting 3 versions of `scikit-learn` means adding at least 2 additional test runs:
- the latest version is test on each OS/python versions (no additonal test run),
- the 2 older versions of scikit-learn are tested on ubuntu-latest python 3.12 (2
additional test runs)

The CI is becoming very long, we need to find a way to reduce its duration.

## Decision Outcome

Speed-up tests by installing python dependencies in virtual environments and caching it:

* pros: reduction of one minute per job when the cache is already built, otherwise
equivalent.
* cons: cache must be purged manually to install a new version of a dependency. It's not
a big deal for now, since the gain is higher than the constraint.

Dependencies no longer need to be installed at each step, as they are already present in
the cached venv.

We base the construction of the cache on the n-tuple OS, python, scikit-learn, and the
hash of the `pyproject.toml` file. This way, if the dependencies list changes, a new
cache is automatically built.

Each test run knows which cache to use, depending on its context.

Unused cache is automatically purged by GH after 90 days.

## More Information

Implementation in pull-request [#916](https://github.com/probabl-ai/skore/pull/916).
35 changes: 35 additions & 0 deletions docs/design/0005-use-python-lockfiles-in-ci.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
---
date: 2025-01-31
decision-makers: ["@thomass-dev"]
---

# Use python `test-requirements.txt` lockfiles in CI

## Context and Problem Statement

With the pull-request [#916](https://github.com/probabl-ai/skore/pull/916), we cache now
the python dependencies to speed-up CI. However, to update in the cache the version of a
dependency, for example, to take advantage of a bugfix, we have to manually purge the GH
cache from the GH cli. It's not scallable.

Moreover, in state of the CI, we can't control what version of a dependency is used. It
can lead to inconsistent tests, in particular when between two runs, a new version of a
dependency is released.

We therefore want to define the dependencies versions in the CI, while leaving the user
free to install what he wants.

## Decision Outcome

We can't fix dependencies version in the `pyproject.toml` file without impacting users.
Instead, we fix the versions in separate requirement files, intended to be used only by
the CI. These files must be rebuilt after each change in the dependencies list, to stay
in sync at all times with the `pyproject.toml` file.

These files are now used to construct the python cache in the CI.
These files are automatically managed by `dependabot`, to take account of new weekly
bugfixes. It produces pull-requests that must be accepted by maintainers.

## More Information

Implementation in pull-request [#1238](https://github.com/probabl-ai/skore/pull/1238).
71 changes: 71 additions & 0 deletions skore/ci/pip-compile.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
#!/bin/bash
#
# This script compiles all the `test-requirements.txt` files, based on combinations of
# `python` and `scikit-learn` versions. These combinations mirror those defined in the
# GitHub `backend` workflow.
#
# You can pass any piptools parameter:
#
# $ bash pip-compile.sh --upgrade
#

CWD="$PWD"
TMPDIR=$(mktemp -d)

# Make sure that `TMPDIR` is removed on exit, whatever the signal
trap 'rm -rf ${TMPDIR}' 0

# Declare the combinations of `python` and `scikit-learn` versions
declare -a COMBINATIONS

COMBINATIONS[0]='3.9;1.6'
COMBINATIONS[1]='3.10;1.6'
COMBINATIONS[2]='3.11;1.6'
COMBINATIONS[3]='3.12;1.4'
COMBINATIONS[4]='3.12;1.5'
COMBINATIONS[5]='3.12;1.6'

set -eu

(
# Copy everything necessary to compile requirements in `TMPDIR`
cp -r .. "${TMPDIR}/skore"
cp ../../LICENSE "${TMPDIR}/LICENSE"
cp ../../README.md "${TMPDIR}/README.md"

# Move to `TMPDIR` to avoid absolute paths in requirements file
cd "${TMPDIR}"

for combination in "${COMBINATIONS[@]}"
do
IFS=";" read -r -a combination <<< "${combination}"

python="${combination[0]}"
scikit_learn="${combination[1]}"
filepath="${CWD}/requirements/python-${python}/scikit-learn-${scikit_learn}/test-requirements.txt"

echo "Generating requirements: python ==${python} | scikit-learn ==${scikit_learn}"

# Force the `python` version by creating the appropriate virtual environment
pyenv local "${python}"
python -m venv "python-${python}"
source "python-${python}/bin/activate"

# Force the `scikit-learn` version by overloading test requirements
sed -i "s/scikit-learn.*/scikit-learn==${scikit_learn}.*/g" skore/test-requirements.in

# Create the requirements file tree
mkdir -p $(dirname "${filepath}")

# Create the requirements file
python -m pip install --upgrade pip pip-tools --quiet
python -m piptools compile \
--quiet \
--no-strip-extras \
--no-header \
--extra=test \
--output-file="${filepath}" \
"${@:2}" \
skore/pyproject.toml
done
)
1 change: 1 addition & 0 deletions skore/ci/requirements/python-3.10/.python-version
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
3.10
Loading

0 comments on commit b6a2e75

Please sign in to comment.