Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial commitment of clickhouse official image #15846

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Felixoid
Copy link
Contributor

@Felixoid Felixoid commented Dec 6, 2023

Hello, dear official library team. Here's our attempt to create an official image clickhouse in the scope of #14136 and ClickHouse/ClickHouse#31473

Both the docs and official images PRs are created simultaneously.

docker-library/docs#2397

Checklist for Review

NOTE: This checklist is intended for the use of the Official Images maintainers both to track the status of your PR and to help inform you and others of where we're at. As such, please leave the "checking" of items to the repository maintainers. If there is a point below for which you would like to provide additional information or note completion, please do so by commenting on the PR. Thanks! (and thanks for staying patient with us ❤️)

  • associated with or contacted upstream?
  • available under an OSI-approved license?
  • does it fit into one of the common categories? ("service", "language stack", "base distribution")
  • is it reasonably popular, or does it solve a particular use case well?
  • does a documentation PR exist? (should be reviewed and merged at roughly the same time so that we don't have an empty image page on the Hub for very long) Initial commitment of documentation for clickhouse official image docs#2397
  • official-images maintainer dockerization review for best practices and cache gotchas/improvements (ala the official review guidelines)?
  • 2+ official-images maintainer dockerization review?
  • existing official images have been considered as a base? (ie, if foobar needs Node.js, has FROM node:... instead of grabbing node via other means been considered?)
  • if FROM scratch, tarballs only exist in a single commit within the associated history?
  • passes current tests? any simple new tests that might be appropriate to add? (https://github.com/docker-library/official-images/tree/master/test)

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 6, 2023

This PR is still in draft mode to adjust the automation in ClickHouse/ClickHouse#57203

The branch now is in the WIP mode, so it will be regenerated to the refs/heads/master before the merge

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 6, 2023

Interesting case, I am addressing it in our master, which will regenerate the LDF

@tianon
Copy link
Member

tianon commented Dec 7, 2023

Unfortunately having the Dockerfile in the root of the upstream repository (while useful for local development) doesn't work well as the target for official-images. Since the Dockerfile uses COPY . ..., it essentially means the entire upstream source code is now part of the Docker build context, and thus becomes part of the reviewed artifacts (see diff-pr.sh for how we generate our diff; it is basically taking all docker build contexts from the starting library/foo file and diffing them to the new contexts from the PR).

This is why the "Diff Comment" CI job is failing 🙈

@tianon
Copy link
Member

tianon commented Dec 7, 2023

Apologies, I misread the PR -- I think that might actually be failing because it's backfilling too many versions 😅

@tianon
Copy link
Member

tianon commented Dec 7, 2023

Are all of these major.minor combinations still actively supported? (By which I mean that if they had a problem, there would be a new release/update?)

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 8, 2023

Hi Tianon, thanks for the review!

Indeed, the failure of Diff Comment is about the POST body.

Are all of these major.minor combinations still actively supported? (By which I mean that if they had a problem, there would be a new release/update?)

No, the file for currently supported versions is the following:

# The file is generated by https://github.com/ClickHouse/ClickHouse/blob/e2f425011c589c5e6d8b1d4f67ccd7a8730158a4/tests/ci/official_docker.py

Maintainers: Misha f. Shiryaev <[email protected]> (@Felixoid),
             Max Kainov <[email protected]> (@mkaynov),
             Alexander Sapin <[email protected]> (@alesapin)
GitRepo: https://github.com/ClickHouse/ClickHouse.git
GitFetch: refs/heads/docker-library
GitCommit: 43208ec354e4873f59508c45b81204b9f7bf439e

Tags: alpine, 23-alpine, 23.11-alpine, 23.11.1-alpine, 23.11.1.2711-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.11.1.2711
File: Dockerfile.alpine

Tags: latest, focal, 23, 23-focal, 23.11, 23.11-focal, 23.11.1, 23.11.1-focal, 23.11.1.2711, 23.11.1.2711-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.11.1.2711
File: Dockerfile.ubuntu

Tags: 23.10-alpine, 23.10.5-alpine, 23.10.5.20-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.10.5.20
File: Dockerfile.alpine

Tags: 23.10, 23.10-focal, 23.10.5, 23.10.5-focal, 23.10.5.20, 23.10.5.20-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.10.5.20
File: Dockerfile.ubuntu

Tags: 23.9-alpine, 23.9.6-alpine, 23.9.6.20-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.9.6.20
File: Dockerfile.alpine

Tags: 23.9, 23.9-focal, 23.9.6, 23.9.6-focal, 23.9.6.20, 23.9.6.20-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.9.6.20
File: Dockerfile.ubuntu

Tags: alpine-lts, 23.8-alpine, 23.8.8-alpine, 23.8.8.20-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.8.8.20
File: Dockerfile.alpine

Tags: lts, focal-lts, 23.8, 23.8-focal, 23.8.8, 23.8.8-focal, 23.8.8.20, 23.8.8.20-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.8.8.20
File: Dockerfile.ubuntu

Tags: 23.3-alpine, 23.3.18-alpine, 23.3.18.15-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.3.18.15
File: Dockerfile.alpine

Tags: 23.3, 23.3-focal, 23.3.18, 23.3.18-focal, 23.3.18.15, 23.3.18.15-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.3.18.15
File: Dockerfile.ubuntu

I've generated all the versions that could be built at the moment, so following the next sentence from README:

When a new repository is proposed, it is common to include some older unsupported versions in the initial pull request with the agreement to remove them right after acceptance.

It's pretty easy to rebuild the file to any version, so I am very open to advice on what to use as the minimal one.

@tianon
Copy link
Member

tianon commented Dec 11, 2023

Ah, let's start with just 23.11 (unless there's a significant difference in the Dockerization for older versions) and see if that gets the diff small enough. If we start there and the delta between that and other versions is small, then future diffs will be small too because they'll all diff against the merged version instead of all versions being considered brand new code.

@tianon
Copy link
Member

tianon commented Dec 11, 2023

For a small bit of initial Dockerization review, you'll want to take a look at #10794 (Alpine + glibc is going to be a non-starter).

Additionally, having /bin/sh or /bin/bash as PID1 while the server is running (speaking of the while true loop I see in the entrypoint which is effectively acting as a poor-man's process supervisor) is going to be a very steep uphill battle to convince us it's stable and reliable, especially in the face of receiving signals and error handling / process death edge cases.

Beyond that, please be patient -- new image review requires much higher maintainer bandwidth than update review does, so we do have a backlog.

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 11, 2023

Ah, let's start with just 23.11

Yep, all versions have the same content now

For a small bit of initial Dockerization review, you'll want to take a look at #10794 (Alpine + glibc is going to be a non-starter).

I think it's related to the part COPY --from=glibc-donor? So, it looks like we won't be able to build alpine images. Looks fair to me, we can take them off the discussion

Additionally, having /bin/sh or /bin/bash as PID1 while the server is running (speaking of the while true loop I see in the entrypoint which is effectively acting as a poor-man's process supervisor) is going to be a very steep uphill battle to convince us it's stable and reliable, especially in the face of receiving signals and error handling / process death edge cases.

I feel you are referencing https://github.com/ClickHouse/ClickHouse/blob/master/docker/server/entrypoint.sh#L132-L139. It's the bringing the clickhouse-server up for the initial setup of DB. And it has the fuse to check that the server is actually finished after it. The real server works as the only process in the container after exec command. It's similar to docker_temp_server_start, docker_process_init_files, docker_temp_server_stop

Beyond that, please be patient -- new image review requires much higher maintainer bandwidth than update review does, so we do have a backlog.

Thank you! It's very nice of you to have a transparent process 🙏

This comment has been minimized.

@Felixoid Felixoid marked this pull request as ready for review January 8, 2024 11:48
@Felixoid
Copy link
Contributor Author

Felixoid commented Jan 8, 2024

Happy New Year, dear colleagues!

Is there anything we could do from our site to boost the review, maybe?

@Felixoid
Copy link
Contributor Author

Felixoid commented Feb 9, 2024

Hello, dear @tianon @yosifkit. It's already two months since the latest update. Any feedback?

@Felixoid
Copy link
Contributor Author

Colleagues? It's already three months since the last review round. Is it even possible to add the image to the library?

@melvynator
Copy link

@tianon Is it possible to have an estimate of when this PR can be reviewed?

@whalelines
Copy link
Contributor

We cannot give an estimate, but we will review this PR as soon as we can. Reviewing new DOI submissions is quite time intensive and we have to balance that commitment with ensuring the normal flow of updates to existing DOI.

It can help to speed things up if you ensure your PR adheres to our contribution guidelines, https://github.com/docker-library/official-images?tab=readme-ov-file#contributing-to-the-standard-library .

We apologize for the delay and truly appreciate your patience through this process.

@Felixoid
Copy link
Contributor Author

Kindly hope for the next review round

@tbragin
Copy link

tbragin commented Jun 17, 2024

Docker Team - please let us know if there are any updates, and thank you in advance!

@whalelines
Copy link
Contributor

Performing another review on the current changes is still in our queue. Unfortunately, our queue is quite long at present and reviews of new DOI are quite time consuming. We will get to this as soon as we possibly can. Thanks for being patient!

@tianon
Copy link
Member

tianon commented Nov 7, 2024

Blockers

  • Initial commitment of clickhouse official image #15846 (comment)

    As noted previously, we don't necessarily directly control the clone process, so updating bashbrew is not a solution and this is a hard-blocker. My tests with BuildKit against the current monorepo took three full minutes to error out, failing to find the commit in the end.

    I would suggest a separate repository -- it can auto-sync from the monorepo if that is what you prefer (I'd suggest git filter-branch to accomplish that in a way that preserves the commits, but that's up-to-you also).

  • what does if ! clickhouse local -q "SELECT ''" > /dev/null 2>&1; then do? is the conditional necessary? (it appears to always run this block and thus the conditional is not necessary and should be removed)

  • with the appropriate GnuPG dance in place, is apt-get install --allow-unauthenticated really still necessary?

  • the CLICKHOUSE_DOCKER_RESTART_ON_EXIT functionality is the while true referenced above

Questions

  • setting LANG and LANGUAGE and LC_ALL is weird -- is that really necessary?

  • setting TZ explicitly is odd (we actually have an explicit test we run against all images to verify they're in UTC already, so images don't need to be explicit like that)

  • many parts of the entrypoint script rely on clickhouse being in PATH but then others use /usr/bin/clickhouse explicitly -- is that intentional?

  • apt-get autoremove --purge -yq libksba8 🤔 why this package? what pulled it in that is no longer required? (also the autoremove after that purge is a no-op because this command will remove anything else dangling too)

Recommendations

  • UID/GID 101 is really low, with a high chance of future collisions (with distro updates, package installs, etc)

    we normally suggest something higher than 1000, with over 2000 being less likely to collide with users' systems

  • CLICKHOUSE_UID/CLICKHOUSE_GID is extremely unusual -- for users who need a specific UID/GID combination, we recommend using --user in the runtime/orchestration system (especially combined with --security-opt no-new-privileges) so that they also get the tighter security benefits associated with that

  • it's definitely worth considering adding set -u, especially as all the variable references already appear to handle it appropriately

  • I realize you probably need to have the entrypoint script, but I will tell you that our own experience maintaining scripts like it for PostgreSQL, MySQL, MongoDB, etc has been that it's extremely fragile, so if there's any way you could get away with avoiding it, you'd be doing yourself a favor (in environments like Kubernetes, initialization behavior is typically better handled by an explicit initContainer, for example).

    • related to that, Should failing entrypoint scripts clean up data directory? postgres#159 is a particularly nasty failure mode that's worth being aware of / thinking about (--restart=always where the initialization fails, but only partway through, then the container restarts and comes right up but with an only halfway initialized database and no indication that there is anything wrong unless they look at the restart count on the container, which isn't exactly shown anywhere obvious)
  • apt-transport-https is a transitional package (even in Ubuntu 20.04), and can be removed 👀

@Felixoid
Copy link
Contributor Author

Felixoid commented Nov 7, 2024

Hey, @tianon, many thanks for the comment!

I'll try to answer everything I know now. Some of the points are already addressed in ClickHouse/ClickHouse#71573, and some of them (like dedicated repo) will be improved ASAP.

Blockers

  • As much as I'd like to avoid a different repo, I see your point. I'll make a dedicated repository. It's done.
  • The Dockerfile is a limited version of our Dockerfile.ubuntu, and I try my best not to split it apart. Our CI uses it internally for different building scenarios, including testing and release processes. As you see, it falls back to the non-ARG. I see the point in removing it, although it would make maintaining the DOI much harder for us.
  • No, --allow-unauthenticated is still there by an accident. Consider it's as already removed.
  • It was an interesting hack, but we'll eliminate the CLICKHOUSE_DOCKER_RESTART_ON_EXIT loop.

Questions

  • LANG+LANGUAGE+LC_ALL were introduced a long time ago in Docker fixes (related to #3695) ClickHouse/ClickHouse#3806. I already asked the developer who introduced them. Looks like setting only LANG should be enough.
  • TZ is related to the ClickHouse's feature. It resolved the tzdata symlinks, so without setting it timezone is set internally to Etc/UTC, see the comment Not UTC timezone given when connection is etablished ClickHouse/ClickHouse#6608 (comment)
  • The inconsistency of clickhouse + /usr/bin/clickhouse is rather a bit of a mess in a project with a long history. I'll use an opportunity to clean it in favor of clickhouse.
  • libksba8 was a dependency of dirmngr gnupg2. apt-get autoremove --purge -yq libksba8 and apt-get autoremove --purge -yq dirmngr gnupg2 behaves exactly the same, so the latter looks much better.

Recommendations

  • Ideally, The official library image should be a drop-in replacement for existing clickhouse/clickhouse-server. Since it's rather a recommendation, I feel the value for the community to simply replace clickhouse/clickhouse-server with clickhouse in their k8s/docker-compose/scripts/etc. is very high.
  • The --user scenario is supported, it is the case id -u != 0. I'd say, CLICKHOUSE_UID and CLICKHOUSE_GID are there for historical reason, although we can deprecate it? 🤔 are deprecated, and we'll remove them in a couple of releases. I agree, people should use the --user exclusively.
  • I tried to enable it. Unfortunately, even Shellcheck fails to help identify possible issues there SC2154: Warn for set -u variables that will fail to expand, e.g. also in an if-condition koalaman/shellcheck#2779 (comment)
  • Yeah, it's rather, again, an argument in favor of compatibility with existing images.
  • Oh, I didn't know it's a noop since sometimes. For sure, I'll clean this out.

This comment has been minimized.

@tianon
Copy link
Member

tianon commented Nov 7, 2024

Nice, thank you!

Regarding RUN if ! clickhouse local -q "SELECT ''" > /dev/null 2>&1; then, what if we swapped it to something like if true; then so it still matches really obviously for you, and is then obviously a no-op for us? Would that be an acceptable middle ground?

The last thing I see while re-reviewing (that I thought I'd included in my last round but I guess I missed!) was this block:

        mkdir="clickhouse su ""${USER}:${GROUP}"" mkdir"
    fi
    if ! $mkdir -p "$dir"; then

100% not a blocker, but if you wanted this to quote $USER:$GROUP properly in the actual mkdir invocation you'll want to use eval, something like this instead:

        mkdir='clickhouse su "${USER}:${GROUP}" mkdir'
    fi
    mkdir+=' -p "$dir"'
    if ! eval "$mkdir"; then

or, if you prefer, Bash arrays (which shellcheck might be less loud about):

        mkdir=( mkdir )
...
        mkdir=( clickhouse su "${USER}:${GROUP}" mkdir )
    fi
    if ! "${mkdir[@]}" -p "$dir"; then

Thank you for your patience overall on this -- I'll take a look at the docs PR now. 👀

@Felixoid
Copy link
Contributor Author

Felixoid commented Nov 7, 2024

Regarding RUN if ! clickhouse local -q "SELECT ''" > /dev/null 2>&1; then, what if we swapped it to something like if true; then so it still matches really obviously for you, and is then obviously a no-op for us? Would that be an acceptable middle ground?

Hmm, what about I change it to the other way around? clickhouse local -q "SELECT ''" && exit 0 || : ; \. So, it will work for us, and the RUN block won't be executed for special cases. And it looks a bit less dirty.

mkdir=( clickhouse su "${USER}:${GROUP}" mkdir ) looks like an excellent bashism to me. I'll add it to the review-related PR.

This comment has been minimized.

@Felixoid
Copy link
Contributor Author

Felixoid commented Nov 7, 2024

I've found when and how CLICKHOUSE_UID/CLICKHOUSE_GID was implemented ClickHouse/ClickHouse#11003

Not sure what exactly the issue was with openshift and if we can break it badly.

@tianon
Copy link
Member

tianon commented Nov 7, 2024

OpenShift runs the image with a totally random UID (--user) -- it looks like the use case for that was to run the image as root explicitly to support folks in user namespaces / "rootless mode" where root is mapped to their user on the host, but that's also arguably not a configuration I'd personally recommend (UID 0 in the kernel is still a special case in a lot of places/ways, so non-0 is suggested even in user namespaces). Perhaps a "yes I really want to run the image as root and not switch to the clickhouse user" variable would be sufficient?

Re clickhouse local, I can live with that change, thanks! ❤️

@Felixoid
Copy link
Contributor Author

Felixoid commented Nov 7, 2024

Perhaps a "yes I really want to run the image as root and not switch to the clickhouse user" variable would be sufficient?

So, should we replace CLICKHOUSE_UID/CLICKHOUSE_GID with CLICKHOUSE_RUN_AS_ROOT? That sounds fair and is a pretty solid implementation. Otherwise, people should use the --user argument to set the container UID/GID.

@tianon
Copy link
Member

tianon commented Nov 7, 2024

Yeah, that sounds a lot cleaner IMO (and pushes users to the more secure behavior) 👍 🚀

This comment has been minimized.

Copy link

github-actions bot commented Nov 8, 2024

Diff for fbf8b51:
diff --git a/_bashbrew-arches b/_bashbrew-arches
index 8b13789..e85a97f 100644
--- a/_bashbrew-arches
+++ b/_bashbrew-arches
@@ -1 +1,2 @@
-
+amd64
+arm64v8
diff --git a/_bashbrew-cat b/_bashbrew-cat
index bdfae4a..37c2135 100644
--- a/_bashbrew-cat
+++ b/_bashbrew-cat
@@ -1 +1,9 @@
-Maintainers: New Image! :D (@docker-library-bot)
+Maintainers: Misha f. Shiryaev <[email protected]> (@Felixoid), Max Kainov <[email protected]> (@mkaynov), Alexander Sapin <[email protected]> (@alesapin)
+GitRepo: https://github.com/ClickHouse/docker-library.git
+GitFetch: refs/heads/main
+GitCommit: 8f3278549dd6ac2832287c18c6a35a5f90eb3394
+
+Tags: latest, jammy, 24, 24-jammy, 24.10, 24.10-jammy, 24.10.1, 24.10.1-jammy, 24.10.1.2812, 24.10.1.2812-jammy
+Architectures: amd64, arm64v8
+Directory: server/24.10.1.2812
+File: Dockerfile.ubuntu
diff --git a/_bashbrew-list b/_bashbrew-list
index e69de29..9047fc8 100644
--- a/_bashbrew-list
+++ b/_bashbrew-list
@@ -0,0 +1,10 @@
+clickhouse:24
+clickhouse:24-jammy
+clickhouse:24.10
+clickhouse:24.10-jammy
+clickhouse:24.10.1
+clickhouse:24.10.1-jammy
+clickhouse:24.10.1.2812
+clickhouse:24.10.1.2812-jammy
+clickhouse:jammy
+clickhouse:latest
diff --git a/_bashbrew-list-build-order b/_bashbrew-list-build-order
index e69de29..669c034 100644
--- a/_bashbrew-list-build-order
+++ b/_bashbrew-list-build-order
@@ -0,0 +1 @@
+clickhouse:24.10.1.2812-jammy
diff --git a/clickhouse_24.10.1.2812-jammy/Dockerfile.ubuntu b/clickhouse_24.10.1.2812-jammy/Dockerfile.ubuntu
new file mode 100644
index 0000000..54b9722
--- /dev/null
+++ b/clickhouse_24.10.1.2812-jammy/Dockerfile.ubuntu
@@ -0,0 +1,84 @@
+FROM ubuntu:22.04
+
+# see https://github.com/moby/moby/issues/4032#issuecomment-192327844
+# It could be removed after we move on a version 23:04+
+ARG DEBIAN_FRONTEND=noninteractive
+
+# ARG for quick switch to a given ubuntu mirror
+ARG apt_archive="http://archive.ubuntu.com"
+
+# We shouldn't use `apt upgrade` to not change the upstream image. It's updated biweekly
+
+# user/group precreated explicitly with fixed uid/gid on purpose.
+# It is especially important for rootless containers: in that case entrypoint
+# can't do chown and owners of mounted volumes should be configured externally.
+# We do that in advance at the begining of Dockerfile before any packages will be
+# installed to prevent picking those uid / gid by some unrelated software.
+# The same uid / gid (101) is used both for alpine and ubuntu.
+RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list \
+    && groupadd -r clickhouse --gid=101 \
+    && useradd -r -g clickhouse --uid=101 --home-dir=/var/lib/clickhouse --shell=/bin/bash clickhouse \
+    && apt-get update \
+    && apt-get install --yes --no-install-recommends \
+        ca-certificates \
+        locales \
+        tzdata \
+        wget \
+    && rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*
+
+ARG REPO_CHANNEL="stable"
+ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
+ARG VERSION="24.10.1.2812"
+ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
+
+
+# A fallback to installation from ClickHouse repository
+# It works unless the clickhouse binary already exists
+RUN clickhouse local -q 'SELECT 1' >/dev/null 2>&1 && exit 0 || : \
+    ; apt-get update \
+    && apt-get install --yes --no-install-recommends \
+        dirmngr \
+        gnupg2 \
+    && mkdir -p /etc/apt/sources.list.d \
+    && GNUPGHOME=$(mktemp -d) \
+    && GNUPGHOME="$GNUPGHOME" gpg --batch --no-default-keyring \
+        --keyring /usr/share/keyrings/clickhouse-keyring.gpg \
+        --keyserver hkp://keyserver.ubuntu.com:80 \
+        --recv-keys 3a9ea1193a97b548be1457d48919f6bd2b48d754 \
+    && rm -rf "$GNUPGHOME" \
+    && chmod +r /usr/share/keyrings/clickhouse-keyring.gpg \
+    && echo "${REPOSITORY}" > /etc/apt/sources.list.d/clickhouse.list \
+    && echo "installing from repository: ${REPOSITORY}" \
+    && apt-get update \
+    && for package in ${PACKAGES}; do \
+        packages="${packages} ${package}=${VERSION}" \
+    ; done \
+    && apt-get install --yes --no-install-recommends ${packages} || exit 1 \
+    && rm -rf \
+        /var/lib/apt/lists/* \
+        /var/cache/debconf \
+        /tmp/* \
+    && apt-get autoremove --purge -yq dirmngr gnupg2
+
+# post install
+# we need to allow "others" access to clickhouse folder, because docker container
+# can be started with arbitrary uid (openshift usecase)
+RUN clickhouse-local -q 'SELECT * FROM system.build_options' \
+    && mkdir -p /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client \
+    && chmod ugo+Xrw -R /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client
+
+RUN locale-gen en_US.UTF-8
+ENV LANG en_US.UTF-8
+ENV TZ UTC
+
+RUN mkdir /docker-entrypoint-initdb.d
+
+COPY docker_related_config.xml /etc/clickhouse-server/config.d/
+COPY entrypoint.sh /entrypoint.sh
+
+EXPOSE 9000 8123 9009
+VOLUME /var/lib/clickhouse
+
+ENV CLICKHOUSE_CONFIG /etc/clickhouse-server/config.xml
+
+ENTRYPOINT ["/entrypoint.sh"]
diff --git a/clickhouse_24.10.1.2812-jammy/docker_related_config.xml b/clickhouse_24.10.1.2812-jammy/docker_related_config.xml
new file mode 100644
index 0000000..3025dc2
--- /dev/null
+++ b/clickhouse_24.10.1.2812-jammy/docker_related_config.xml
@@ -0,0 +1,12 @@
+<clickhouse>
+     <!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
+    <listen_host>::</listen_host>
+    <listen_host>0.0.0.0</listen_host>
+    <listen_try>1</listen_try>
+
+    <!--
+    <logger>
+        <console>1</console>
+    </logger>
+    -->
+</clickhouse>
diff --git a/clickhouse_24.10.1.2812-jammy/entrypoint.sh b/clickhouse_24.10.1.2812-jammy/entrypoint.sh
new file mode 100755
index 0000000..2f87008
--- /dev/null
+++ b/clickhouse_24.10.1.2812-jammy/entrypoint.sh
@@ -0,0 +1,222 @@
+#!/bin/bash
+
+set -eo pipefail
+shopt -s nullglob
+
+DO_CHOWN=1
+if [[ "${CLICKHOUSE_RUN_AS_ROOT:=0}" = "1" || "${CLICKHOUSE_DO_NOT_CHOWN:-0}" = "1" ]]; then
+    DO_CHOWN=0
+fi
+
+# CLICKHOUSE_UID and CLICKHOUSE_GID are kept for backward compatibility, but deprecated
+# One must use either "docker run --user" or CLICKHOUSE_RUN_AS_ROOT=1 to run the process as
+# FIXME: Remove ALL CLICKHOUSE_UID CLICKHOUSE_GID before 25.3
+if [[ "${CLICKHOUSE_UID:-}" || "${CLICKHOUSE_GID:-}" ]]; then
+    echo 'WARNING: Support for CLICKHOUSE_UID/CLICKHOUSE_GID will be removed in a couple of releases.' >&2
+    echo 'WARNING: Either use a proper "docker run --user=xxx:xxxx" argument instead of CLICKHOUSE_UID/CLICKHOUSE_GID' >&2
+    echo 'WARNING: or set "CLICKHOUSE_RUN_AS_ROOT=1" ENV to run the clickhouse-server as root:root' >&2
+fi
+
+# support `docker run --user=xxx:xxxx`
+if [[ "$(id -u)" = "0" ]]; then
+    if [[ "$CLICKHOUSE_RUN_AS_ROOT" = 1 ]]; then
+        USER=0
+        GROUP=0
+    else
+        USER="${CLICKHOUSE_UID:-"$(id -u clickhouse)"}"
+        GROUP="${CLICKHOUSE_GID:-"$(id -g clickhouse)"}"
+    fi
+else
+    USER="$(id -u)"
+    GROUP="$(id -g)"
+    DO_CHOWN=0
+fi
+
+# set some vars
+CLICKHOUSE_CONFIG="${CLICKHOUSE_CONFIG:-/etc/clickhouse-server/config.xml}"
+
+# get CH directories locations
+DATA_DIR="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=path || true)"
+TMP_DIR="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=tmp_path || true)"
+USER_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=user_files_path || true)"
+LOG_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=logger.log || true)"
+LOG_DIR=""
+if [ -n "$LOG_PATH" ]; then LOG_DIR="$(dirname "$LOG_PATH")"; fi
+ERROR_LOG_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=logger.errorlog || true)"
+ERROR_LOG_DIR=""
+if [ -n "$ERROR_LOG_PATH" ]; then ERROR_LOG_DIR="$(dirname "$ERROR_LOG_PATH")"; fi
+FORMAT_SCHEMA_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=format_schema_path || true)"
+
+# There could be many disks declared in config
+readarray -t DISKS_PATHS < <(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key='storage_configuration.disks.*.path' || true)
+readarray -t DISKS_METADATA_PATHS < <(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key='storage_configuration.disks.*.metadata_path' || true)
+
+CLICKHOUSE_USER="${CLICKHOUSE_USER:-default}"
+CLICKHOUSE_PASSWORD_FILE="${CLICKHOUSE_PASSWORD_FILE:-}"
+if [[ -n "${CLICKHOUSE_PASSWORD_FILE}" && -f "${CLICKHOUSE_PASSWORD_FILE}" ]]; then
+  CLICKHOUSE_PASSWORD="$(cat "${CLICKHOUSE_PASSWORD_FILE}")"
+fi
+CLICKHOUSE_PASSWORD="${CLICKHOUSE_PASSWORD:-}"
+CLICKHOUSE_DB="${CLICKHOUSE_DB:-}"
+CLICKHOUSE_ACCESS_MANAGEMENT="${CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT:-0}"
+
+function create_directory_and_do_chown() {
+    local dir=$1
+    # check if variable not empty
+    [ -z "$dir" ] && return
+    # ensure directories exist
+    if [ "$DO_CHOWN" = "1" ]; then
+        mkdir=( mkdir )
+    else
+        # if DO_CHOWN=0 it means that the system does not map root user to "admin" permissions
+        # it mainly happens on NFS mounts where root==nobody for security reasons
+        # thus mkdir MUST run with user id/gid and not from nobody that has zero permissions
+        mkdir=( clickhouse su "${USER}:${GROUP}" mkdir )
+    fi
+    if ! "${mkdir[@]}" -p "$dir"; then
+        echo "Couldn't create necessary directory: $dir"
+        exit 1
+    fi
+
+    if [ "$DO_CHOWN" = "1" ]; then
+        # ensure proper directories permissions
+        # but skip it for if directory already has proper premissions, cause recursive chown may be slow
+        if [ "$(stat -c %u "$dir")" != "$USER" ] || [ "$(stat -c %g "$dir")" != "$GROUP" ]; then
+            chown -R "$USER:$GROUP" "$dir"
+        fi
+    fi
+}
+
+create_directory_and_do_chown "$DATA_DIR"
+
+# Change working directory to $DATA_DIR in case there're paths relative to $DATA_DIR, also avoids running
+# clickhouse-server at root directory.
+cd "$DATA_DIR"
+
+for dir in "$ERROR_LOG_DIR" \
+  "$LOG_DIR" \
+  "$TMP_DIR" \
+  "$USER_PATH" \
+  "$FORMAT_SCHEMA_PATH" \
+  "${DISKS_PATHS[@]}" \
+  "${DISKS_METADATA_PATHS[@]}"
+do
+    create_directory_and_do_chown "$dir"
+done
+
+# if clickhouse user is defined - create it (user "default" already exists out of box)
+if [ -n "$CLICKHOUSE_USER" ] && [ "$CLICKHOUSE_USER" != "default" ] || [ -n "$CLICKHOUSE_PASSWORD" ] || [ "$CLICKHOUSE_ACCESS_MANAGEMENT" != "0" ]; then
+    echo "$0: create new user '$CLICKHOUSE_USER' instead 'default'"
+    cat <<EOT > /etc/clickhouse-server/users.d/default-user.xml
+    <clickhouse>
+      <!-- Docs: <https://clickhouse.com/docs/en/operations/settings/settings_users/> -->
+      <users>
+        <!-- Remove default user -->
+        <default remove="remove">
+        </default>
+
+        <${CLICKHOUSE_USER}>
+          <profile>default</profile>
+          <networks>
+            <ip>::/0</ip>
+          </networks>
+          <password><![CDATA[${CLICKHOUSE_PASSWORD//]]>/]]]]><![CDATA[>}]]></password>
+          <quota>default</quota>
+          <access_management>${CLICKHOUSE_ACCESS_MANAGEMENT}</access_management>
+        </${CLICKHOUSE_USER}>
+      </users>
+    </clickhouse>
+EOT
+fi
+
+CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS="${CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS:-}"
+
+# checking $DATA_DIR for initialization
+if [ -d "${DATA_DIR%/}/data" ]; then
+    DATABASE_ALREADY_EXISTS='true'
+fi
+
+# run initialization if flag CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS is not empty or data directory is empty
+if [[ -n "${CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS}" || -z "${DATABASE_ALREADY_EXISTS}" ]]; then
+  RUN_INITDB_SCRIPTS='true'
+fi
+
+if [ -n "${RUN_INITDB_SCRIPTS}" ]; then
+    if [ -n "$(ls /docker-entrypoint-initdb.d/)" ] || [ -n "$CLICKHOUSE_DB" ]; then
+        # port is needed to check if clickhouse-server is ready for connections
+        HTTP_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=http_port --try)"
+        HTTPS_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=https_port --try)"
+
+        if [ -n "$HTTP_PORT" ]; then
+            URL="http://127.0.0.1:$HTTP_PORT/ping"
+        else
+            URL="https://127.0.0.1:$HTTPS_PORT/ping"
+        fi
+
+        # Listen only on localhost until the initialization is done
+        clickhouse su "${USER}:${GROUP}" clickhouse-server --config-file="$CLICKHOUSE_CONFIG" -- --listen_host=127.0.0.1 &
+        pid="$!"
+
+        # check if clickhouse is ready to accept connections
+        # will try to send ping clickhouse via http_port (max 1000 retries by default, with 1 sec timeout and 1 sec delay between retries)
+        tries=${CLICKHOUSE_INIT_TIMEOUT:-1000}
+        while ! wget --spider --no-check-certificate -T 1 -q "$URL" 2>/dev/null; do
+            if [ "$tries" -le "0" ]; then
+                echo >&2 'ClickHouse init process failed.'
+                exit 1
+            fi
+            tries=$(( tries-1 ))
+            sleep 1
+        done
+
+        clickhouseclient=( clickhouse-client --multiquery --host "127.0.0.1" -u "$CLICKHOUSE_USER" --password "$CLICKHOUSE_PASSWORD" )
+
+        echo
+
+        # create default database, if defined
+        if [ -n "$CLICKHOUSE_DB" ]; then
+            echo "$0: create database '$CLICKHOUSE_DB'"
+            "${clickhouseclient[@]}" -q "CREATE DATABASE IF NOT EXISTS $CLICKHOUSE_DB";
+        fi
+
+        for f in /docker-entrypoint-initdb.d/*; do
+            case "$f" in
+                *.sh)
+                    if [ -x "$f" ]; then
+                        echo "$0: running $f"
+                        "$f"
+                    else
+                        echo "$0: sourcing $f"
+                        # shellcheck source=/dev/null
+                        . "$f"
+                    fi
+                    ;;
+                *.sql)    echo "$0: running $f"; "${clickhouseclient[@]}" < "$f" ; echo ;;
+                *.sql.gz) echo "$0: running $f"; gunzip -c "$f" | "${clickhouseclient[@]}"; echo ;;
+                *)        echo "$0: ignoring $f" ;;
+            esac
+            echo
+        done
+
+        if ! kill -s TERM "$pid" || ! wait "$pid"; then
+            echo >&2 'Finishing of ClickHouse init process failed.'
+            exit 1
+        fi
+    fi
+else
+    echo "ClickHouse Database directory appears to contain a database; Skipping initialization"
+fi
+
+# if no args passed to `docker run` or first argument start with `--`, then the user is passing clickhouse-server arguments
+if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
+    # Watchdog is launched by default, but does not send SIGINT to the main process,
+    # so the container can't be finished by ctrl+c
+    CLICKHOUSE_WATCHDOG_ENABLE=${CLICKHOUSE_WATCHDOG_ENABLE:-0}
+    export CLICKHOUSE_WATCHDOG_ENABLE
+
+    # This replaces the shell script with the server:
+    exec clickhouse su "${USER}:${GROUP}" clickhouse-server --config-file="$CLICKHOUSE_CONFIG" "$@"
+fi
+
+# Otherwise, we assume the user want to run his own process, for example a `bash` shell to explore this image
+exec "$@"

Copy link
Member

@tianon tianon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing docker-library/docs#2397 (and we need to merge both together), but this looks solid IMO now ❤️

@Felixoid
Copy link
Contributor Author

Felixoid commented Nov 8, 2024

Should we add more images from the supported and/or historical branches in another PR?

@tianon
Copy link
Member

tianon commented Nov 8, 2024

Yep, let's do anything else in a follow-up; don't jinx it 😂 ❤️

(from our side, I just want @yosifkit to give this another set of eyes before we pull the trigger)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants