GH-43951: [CI][Python] Use GitHub Packages for vcpkg cache (#44644)
### Rationale for this change

We only use a Docker-level cache for the vcpkg build used by the wheels, so any vcpkg-related change invalidates that cache and every vcpkg port is rebuilt. That is time-consuming.

### What changes are included in this PR?

Enable a NuGet + GitHub Packages based binary cache. It caches per port, so a vcpkg-related change no longer forces a rebuild of every port.

See also: https://learn.microsoft.com/en-us/vcpkg/consume/binary-caching-github-packages

The NuGet + GitHub Packages based cache isn't enabled for manylinux2014 + aarch64, because EPEL for CentOS 7 + aarch64 provides an old Mono, and that old Mono can't run NuGet on Linux. (FYI: EPEL for CentOS 7 + x86_64 provides a newer Mono.)
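The workflow templates apply this exclusion before calling `archery docker run` (`github.linux.yml` checks `manylinux_version` `2014` with `arch` `arm64`; `java-jars/github.yml` checks `ARCH` `arm64v8`). A minimal bash sketch of the same decision as a reusable function; `vcpkg_binary_sources_for` is a hypothetical helper, and `amd64` in the usage line is an assumed `arch` value:

```shell
#!/usr/bin/env bash
# Sketch only: vcpkg_binary_sources_for is a hypothetical helper,
# not part of Arrow. It prints the VCPKG_BINARY_SOURCES value to use,
# or nothing when the NuGet cache must stay disabled.
vcpkg_binary_sources_for() {
  local manylinux_version="$1"
  local arch="$2"
  if [ "${manylinux_version}" = "2014" ] && [ "${arch}" = "arm64" ]; then
    # EPEL for CentOS 7 + aarch64 only has an old Mono that can't run
    # NuGet, so print nothing and leave the binary cache disabled.
    return 0
  fi
  # Drop the default sources and read/write the NuGet source that
  # install_vcpkg.sh registers under the name "GitHub".
  printf '%s\n' "clear;nuget,GitHub,readwrite"
}

VCPKG_BINARY_SOURCES="$(vcpkg_binary_sources_for 2014 amd64)"
export VCPKG_BINARY_SOURCES
echo "VCPKG_BINARY_SOURCES=${VCPKG_BINARY_SOURCES}"
```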

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* GitHub Issue: #43951

Lead-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
3 people authored Nov 15, 2024
1 parent 736d706 commit df40f7a
Showing 7 changed files with 117 additions and 47 deletions.
43 changes: 27 additions & 16 deletions ci/docker/python-wheel-manylinux.dockerfile
@@ -69,36 +69,47 @@ RUN /arrow/ci/scripts/install_ccache.sh ${ccache} /usr/local
ARG vcpkg
COPY ci/vcpkg/*.patch \
ci/vcpkg/*linux*.cmake \
ci/vcpkg/vcpkg.json \
arrow/ci/vcpkg/
COPY ci/scripts/install_vcpkg.sh \
arrow/ci/scripts/
ENV VCPKG_ROOT=/opt/vcpkg
ARG build_type=release
ENV CMAKE_BUILD_TYPE=${build_type} \
VCPKG_FORCE_SYSTEM_BINARIES=1 \
VCPKG_OVERLAY_TRIPLETS=/arrow/ci/vcpkg \
PATH="${PATH}:${VCPKG_ROOT}" \
VCPKG_DEFAULT_TRIPLET=${arch_short}-linux-static-${build_type} \
VCPKG_FEATURE_FLAGS="manifests"

RUN arrow/ci/scripts/install_vcpkg.sh ${VCPKG_ROOT} ${vcpkg}
ENV PATH="${PATH}:${VCPKG_ROOT}"

COPY ci/vcpkg/vcpkg.json arrow/ci/vcpkg/
# cannot use the S3 feature here because while aws-sdk-cpp=1.9.160 contains
# ssl related fixes as well as we can patch the vcpkg portfile to support
# arm machines it hits ARROW-15141 where we would need to fall back to 1.8.186
# but we cannot patch those portfiles since vcpkg-tool handles the checkout of
# previous versions => use bundled S3 build
RUN vcpkg install \
VCPKG_FEATURE_FLAGS="manifests" \
VCPKG_FORCE_SYSTEM_BINARIES=1 \
VCPKG_OVERLAY_TRIPLETS=/arrow/ci/vcpkg
# For --mount=type=secret: The GITHUB_TOKEN is the only real secret but we use
# --mount=type=secret for GITHUB_REPOSITORY_OWNER and
# VCPKG_BINARY_SOURCES too because we don't want to store them
# into the built image in order to easily reuse the built image cache.
#
# For vcpkg install: cannot use the S3 feature here because while
# aws-sdk-cpp=1.9.160 contains ssl related fixes as well as we can
# patch the vcpkg portfile to support arm machines it hits ARROW-15141
# where we would need to fall back to 1.8.186 but we cannot patch
# those portfiles since vcpkg-tool handles the checkout of previous
# versions => use bundled S3 build
RUN --mount=type=secret,id=github_repository_owner \
--mount=type=secret,id=github_token \
--mount=type=secret,id=vcpkg_binary_sources \
export GITHUB_REPOSITORY_OWNER=$(cat /run/secrets/github_repository_owner); \
export GITHUB_TOKEN=$(cat /run/secrets/github_token); \
export VCPKG_BINARY_SOURCES=$(cat /run/secrets/vcpkg_binary_sources); \
arrow/ci/scripts/install_vcpkg.sh ${VCPKG_ROOT} ${vcpkg} && \
vcpkg install \
--clean-after-build \
--x-install-root=${VCPKG_ROOT}/installed \
--x-manifest-root=/arrow/ci/vcpkg \
--x-feature=azure \
--x-feature=azure \
--x-feature=flight \
--x-feature=gcs \
--x-feature=json \
--x-feature=parquet \
--x-feature=s3
--x-feature=s3 && \
rm -rf ~/.config/NuGet/

# Make sure auditwheel is up-to-date
RUN pipx upgrade auditwheel
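The `RUN --mount=type=secret` step reads each value from `/run/secrets/<id>`, which BuildKit mounts only while that `RUN` executes, so the token is not written into an image layer. Because the three `export` lines are joined with `;` rather than `&&`, a secret that was not supplied should just leave its variable empty (the `export` itself still succeeds), and `install_vcpkg.sh` then skips the NuGet setup. A bash sketch of that read-or-empty behaviour; `read_secret` and the `SECRETS_DIR` override are hypothetical, added only so the idea can be tried outside Docker:

```shell
#!/usr/bin/env bash
# Sketch only: read_secret and SECRETS_DIR are hypothetical. The
# Dockerfile itself simply runs `cat /run/secrets/<id>`.
read_secret() {
  local id="$1"
  local dir="${SECRETS_DIR:-/run/secrets}"
  # Print the secret if it was mounted; print nothing otherwise.
  if [ -r "${dir}/${id}" ]; then
    cat "${dir}/${id}"
  fi
}

GITHUB_TOKEN="$(read_secret github_token)"
export GITHUB_TOKEN
```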
33 changes: 31 additions & 2 deletions ci/scripts/install_vcpkg.sh
@@ -17,7 +17,7 @@
# specific language governing permissions and limitations
# under the License.

set -e
set -eu

if [ "$#" -lt 1 ]; then
echo "Usage: $0 <target-directory> [<vcpkg-version> [<vcpkg-ports-patch>]]"
@@ -42,7 +42,7 @@ pushd ${vcpkg_destination}

git checkout "${vcpkg_version}"

if [[ "$OSTYPE" == "msys" ]]; then
if [[ "${OSTYPE:-}" == "msys" ]]; then
./bootstrap-vcpkg.bat -disableMetrics
else
./bootstrap-vcpkg.sh -disableMetrics
@@ -53,4 +53,33 @@ if [ -f "${vcpkg_ports_patch}" ]; then
echo "Patch successfully applied to the VCPKG port files!"
fi

if [ -n "${GITHUB_TOKEN:-}" ] && \
[ -n "${GITHUB_REPOSITORY_OWNER:-}" ] && \
[ "${VCPKG_BINARY_SOURCES:-}" = "clear;nuget,GitHub,readwrite" ] ; then
if type dnf 2>/dev/null; then
dnf install -y epel-release
dnf install -y mono-complete
curl \
--location \
--output "${vcpkg_destination}/nuget" \
https://dist.nuget.org/win-x86-commandline/latest/nuget.exe
fi
PATH="${vcpkg_destination}:${PATH}"
nuget_url="https://nuget.pkg.github.com/${GITHUB_REPOSITORY_OWNER}/index.json"
nuget="$(vcpkg fetch nuget | tail -n 1)"
if type mono 2>/dev/null; then
nuget="mono ${nuget}"
fi
${nuget} \
sources add \
-source "${nuget_url}" \
-storepasswordincleartext \
-name "GitHub" \
-username "${GITHUB_REPOSITORY_OWNER}" \
-password "${GITHUB_TOKEN}"
${nuget} \
setapikey "${GITHUB_TOKEN}" \
-source "${nuget_url}"
fi

popd
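`install_vcpkg.sh` registers the NuGet source only when `GITHUB_TOKEN` and `GITHUB_REPOSITORY_OWNER` are non-empty and `VCPKG_BINARY_SOURCES` is exactly `clear;nuget,GitHub,readwrite`. Since the last check is a plain string comparison, any other value (for example one that names the feed URL instead of the source name `GitHub`) leaves NuGet unconfigured. A bash sketch that factors the gate and the feed URL into hypothetical helpers `should_configure_nuget` and `github_nuget_url`:

```shell
#!/usr/bin/env bash
# Sketch of the gate in ci/scripts/install_vcpkg.sh.
# should_configure_nuget and github_nuget_url are hypothetical helpers.
should_configure_nuget() {
  [ -n "${GITHUB_TOKEN:-}" ] &&
    [ -n "${GITHUB_REPOSITORY_OWNER:-}" ] &&
    [ "${VCPKG_BINARY_SOURCES:-}" = "clear;nuget,GitHub,readwrite" ]
}

# GitHub Packages NuGet feed for a repository owner (user or org).
github_nuget_url() {
  printf 'https://nuget.pkg.github.com/%s/index.json\n' "$1"
}

if should_configure_nuget; then
  echo "would register $(github_nuget_url "${GITHUB_REPOSITORY_OWNER}") as 'GitHub'"
else
  echo "NuGet binary cache disabled"
fi
```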
10 changes: 10 additions & 0 deletions dev/tasks/java-jars/github.yml
@@ -19,6 +19,9 @@

{{ macros.github_header() }}

permissions:
packages: write

jobs:

build-cpp-ubuntu:
@@ -51,7 +54,14 @@ jobs:
- name: Build C++ libraries
env:
{{ macros.github_set_sccache_envvars()|indent(8) }}
GITHUB_TOKEN: {{ '${{ secrets.GITHUB_TOKEN }}' }}
run: |
if [ "${ARCH}" = "arm64v8" ]; then
# We can't use NuGet on manylinux2014_aarch64 because Mono is old.
:
else
export VCPKG_BINARY_SOURCES="clear;nuget,GitHub,readwrite"
fi
archery docker run \
-e ARROW_JAVA_BUILD=OFF \
-e ARROW_JAVA_TEST=OFF \
16 changes: 15 additions & 1 deletion dev/tasks/python-wheels/github.linux.yml
@@ -19,6 +19,9 @@

{{ macros.github_header() }}

permissions:
packages: write

jobs:
build:
name: "Build wheel for manylinux {{ manylinux_version }}"
@@ -49,7 +52,18 @@ jobs:

- name: Build wheel
shell: bash
run: archery docker run -e SETUPTOOLS_SCM_PRETEND_VERSION={{ arrow.no_rc_version }} python-wheel-manylinux-{{ manylinux_version }}
env:
GITHUB_TOKEN: {{ '${{ secrets.GITHUB_TOKEN }}' }}
run: |
if [ "{{ manylinux_version }}" = "2014" ] && [ "{{ arch }}" = "arm64" ]; then
# We can't use NuGet on manylinux2014_aarch64 because Mono is old.
:
else
export VCPKG_BINARY_SOURCES="clear;nuget,GitHub,readwrite"
fi
archery docker run \
-e SETUPTOOLS_SCM_PRETEND_VERSION={{ arrow.no_rc_version }} \
python-wheel-manylinux-{{ manylinux_version }}
- uses: actions/upload-artifact@v4
with:
21 changes: 5 additions & 16 deletions dev/tasks/python-wheels/github.osx.yml
@@ -37,6 +37,9 @@
VCPKG_OVERLAY_TRIPLETS: {{ "${{ github.workspace }}/arrow/ci/vcpkg" }}
VCPKG_ROOT: {{ "${{ github.workspace }}/vcpkg" }}

permissions:
packages: write

jobs:
build:
name: Build wheel for Python {{ python_version }} on macOS
@@ -69,27 +72,13 @@ jobs:
echo "VCPKG_VERSION=$vcpkg_version" >> $GITHUB_ENV
- name: Install Vcpkg
env:
GITHUB_TOKEN: {{ '${{ secrets.GITHUB_TOKEN }}' }}
run: arrow/ci/scripts/install_vcpkg.sh $VCPKG_ROOT $VCPKG_VERSION

- name: Add Vcpkg to PATH
run: echo ${VCPKG_ROOT} >> $GITHUB_PATH

- name: Setup NuGet Credentials
env:
GITHUB_TOKEN: {{ '${{ secrets.GITHUB_TOKEN }}' }}
run: |
mono $(vcpkg fetch nuget | tail -n 1) \
sources add \
-source "https://nuget.pkg.github.com/$GITHUB_REPOSITORY_OWNER/index.json" \
-storepasswordincleartext \
-name "GitHub" \
-username "$GITHUB_REPOSITORY_OWNER" \
-password "$GITHUB_TOKEN" \
mono $(vcpkg fetch nuget | tail -n 1) \
setapikey "$GITHUB_TOKEN" \
-source "https://nuget.pkg.github.com/$GITHUB_REPOSITORY_OWNER/index.json"
- name: Install Packages
run: |
vcpkg install \
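With the "Setup NuGet Credentials" step removed, macOS presumably goes through the same code in `install_vcpkg.sh`: `dnf` isn't available, so nothing is installed, and NuGet is run through `mono` when `mono` is on `PATH`. The script keeps the command in a plain string (`nuget="mono ${nuget}"`) and relies on word splitting when it expands `${nuget}`, which would break for a path containing spaces. A bash-array variant, sketched as an alternative rather than what the commit does; `set_nuget_cmd` and `NUGET_CMD` are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch only: build the NuGet command as an array so a path with
# spaces stays one argument. set_nuget_cmd/NUGET_CMD are hypothetical.
set_nuget_cmd() {
  local nuget_path="$1"
  if command -v mono >/dev/null 2>&1; then
    # nuget.exe is a .NET Framework executable; run it through Mono.
    NUGET_CMD=(mono "${nuget_path}")
  else
    NUGET_CMD=("${nuget_path}")
  fi
}

# Usage (not executed here), mirroring the script's two calls:
#   set_nuget_cmd "$(vcpkg fetch nuget | tail -n 1)"
#   "${NUGET_CMD[@]}" sources add -source "${nuget_url}" ...
#   "${NUGET_CMD[@]}" setapikey "${GITHUB_TOKEN}" -source "${nuget_url}"
```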
4 changes: 3 additions & 1 deletion dev/tasks/python-wheels/github.windows.yml
@@ -33,7 +33,9 @@ jobs:
# note that we don't run docker build since there wouldn't be a cache hit
# and rebuilding the dependencies takes a fair amount of time
REPO: ghcr.io/ursacomputing/arrow
# BuildKit isn't really supported on Windows for now
# BuildKit isn't really supported on Windows for now.
# NuGet + GitHub Packages based vcpkg cache is also disabled for now.
# Because secret mount requires BuildKit.
DOCKER_BUILDKIT: 0

steps:
37 changes: 26 additions & 11 deletions docker-compose.yml
@@ -53,26 +53,31 @@
#
# See more in cpp/build-support/run-test.sh::print_coredumps

x-common: &common
GITHUB_ACTIONS:

x-ccache: &ccache
CCACHE_COMPILERCHECK: content
CCACHE_COMPRESS: 1
CCACHE_COMPRESSLEVEL: 6
CCACHE_MAXSIZE: 1G
CCACHE_DIR: /ccache

x-common: &common
GITHUB_ACTIONS:

x-cpp: &cpp
ARROW_RUNTIME_SIMD_LEVEL:
ARROW_SIMD_LEVEL:

x-sccache: &sccache
AWS_ACCESS_KEY_ID:
AWS_SECRET_ACCESS_KEY:
SCCACHE_BUCKET:
SCCACHE_REGION:
SCCACHE_S3_KEY_PREFIX: ${SCCACHE_S3_KEY_PREFIX:-sccache}

x-cpp: &cpp
ARROW_RUNTIME_SIMD_LEVEL:
ARROW_SIMD_LEVEL:
x-vcpkg-build-secrets: &vcpkg-build-secrets
- github_repository_owner
- github_token
- vcpkg_binary_sources

# CPU/memory limit presets to pass to Docker.
#
@@ -1123,14 +1128,15 @@ services:
arch: ${ARCH}
arch_short: ${ARCH_SHORT}
base: quay.io/pypa/manylinux2014_${ARCH_ALIAS}:2024-08-03-32dfa47
vcpkg: ${VCPKG}
manylinux: 2014
python: ${PYTHON}
python_abi_tag: ${PYTHON_ABI_TAG}
manylinux: 2014
vcpkg: ${VCPKG}
context: .
dockerfile: ci/docker/python-wheel-manylinux.dockerfile
cache_from:
- ${REPO}:${ARCH}-python-${PYTHON}-wheel-manylinux-2014-vcpkg-${VCPKG}
secrets: *vcpkg-build-secrets
environment:
<<: [*common, *ccache]
volumes:
@@ -1147,14 +1153,15 @@ services:
arch: ${ARCH}
arch_short: ${ARCH_SHORT}
base: quay.io/pypa/manylinux_2_28_${ARCH_ALIAS}:2024-08-03-32dfa47
vcpkg: ${VCPKG}
manylinux: 2_28
python: ${PYTHON}
python_abi_tag: ${PYTHON_ABI_TAG}
manylinux: 2_28
vcpkg: ${VCPKG}
context: .
dockerfile: ci/docker/python-wheel-manylinux.dockerfile
cache_from:
- ${REPO}:${ARCH}-python-${PYTHON}-wheel-manylinux-2-28-vcpkg-${VCPKG}
secrets: *vcpkg-build-secrets
environment:
<<: [*common, *ccache]
volumes:
@@ -1239,8 +1246,8 @@ services:
image: ${REPO}:python-${PYTHON}-wheel-windows-vs2019-vcpkg-${VCPKG}-${PYTHON_WHEEL_WINDOWS_IMAGE_REVISION}
build:
args:
vcpkg: ${VCPKG}
python: ${PYTHON}
vcpkg: ${VCPKG}
context: .
dockerfile: ci/docker/python-wheel-windows-vs2019.dockerfile
# This should make the pushed images reusable, but the image gets rebuilt.
@@ -2119,3 +2126,11 @@ services:
/bin/bash -c "
git config --global --add safe.directory /arrow &&
/arrow/dev/release/verify-release-candidate.sh $${VERIFY_VERSION} $${VERIFY_RC}"
secrets:
github_repository_owner:
environment: GITHUB_REPOSITORY_OWNER
github_token:
environment: GITHUB_TOKEN
vcpkg_binary_sources:
environment: VCPKG_BINARY_SOURCES
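The top-level `secrets:` block makes Compose take each build secret from an environment variable of the calling shell. In CI that is `GITHUB_REPOSITORY_OWNER` (set by GitHub Actions), `GITHUB_TOKEN` (passed via the workflow `env:`), and `VCPKG_BINARY_SOURCES` (exported only outside the manylinux2014 aarch64 case). For a local build, a bash sketch that reports which of those variables are empty before running `archery docker run`; `missing_vcpkg_secrets` is a hypothetical helper, and its id-to-variable pairs are copied from the `secrets:` block:

```shell
#!/usr/bin/env bash
# Sketch only: list compose secret ids whose source env var is empty.
# missing_vcpkg_secrets is hypothetical; the pairs mirror the
# top-level `secrets:` block in docker-compose.yml.
missing_vcpkg_secrets() {
  local pair id var
  for pair in github_repository_owner:GITHUB_REPOSITORY_OWNER \
              github_token:GITHUB_TOKEN \
              vcpkg_binary_sources:VCPKG_BINARY_SOURCES; do
    id="${pair%%:*}"
    var="${pair#*:}"
    # ${!var:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $var, or empty if it is unset.
    if [ -z "${!var:-}" ]; then
      printf '%s (from %s)\n' "${id}" "${var}"
    fi
  done
}

missing_vcpkg_secrets
```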
