Frequently Asked Questions

As stewards of the official images and maintainers of many images ourselves, we often see a lot of questions that surface repeatedly. This repository is an attempt to gather some of those and provide some answers!

Table of Contents

  1. Frequently Asked Questions
    1. Table of Contents
    2. General Questions
      1. What do you mean by "Official"?
      2. An image's source changed in Git, now what?
      3. How are images built? (especially multiarch)
      4. What is bashbrew? Where can I download it?
      5. What do you mean by "Supported"?
      6. What's the difference between "Shared" and "Simple" tags?
    3. Image Building
      1. Why does my security scanner show that an image has CVEs?
      2. Why do so many official images build from source?
      3. HEALTHCHECK
      4. OpenPGP / GnuPG Keys and Verification
      5. Multi-stage Builds
      6. Why isn't there a Windows equivalent of docker-entrypoint.sh?
      7. Can I use a bot to make my image update PRs?
    4. Image Usage
      1. --link is deprecated!

General Questions

What do you mean by "Official"?

The name of this program was chosen in an attempt to reflect our upstream-first focus (although in hindsight, it's clear that it was a choice with some amount of confusion potential for which we're sorry).

See the readme of the github.com/docker-library/official-images repository for a more verbose overview of the program.

An image's source changed in Git, now what?

Let's walk through the full lifecycle of a change to an image to help explain the process better. We'll use the golang image as an example to help illustrate each step.

  1. a change gets committed to the relevant image source Git repository (either via direct commit, PR, or some automated process -- somehow some change is committed to the Git repository for the image source)

  2. a PR to the relevant library/xxx manifest file is created against https://github.com/docker-library/official-images (which is the source-of-truth for the official images program as a whole)

  3. that PR and a full diff of the actual Dockerfile and related build context files are then reviewed by the official images maintainers

  4. a basic build test is produced (by GitHub Actions) on amd64 (to ensure that it will likely build properly on the real build servers if accepted, and to run a small series of official images tests against the built image)

  5. once merged, the official images build infrastructure will pick up the changes and build and push to the relevant per-architecture repositories (amd64/xxx, arm64v8/xxx, etc)

  6. after those jobs push updated artifacts to the architecture-specific repositories (amd64/xxx, arm64v8/xxx, etc), a separate job collects those updates into "index" objects (also known as "manifest lists") under library/xxx (which is the "default" namespace within Docker)

For images maintained by the docker-library team, we typically include a couple of useful scripts in the repository itself, like ./update.sh and ./generate-stackbrew-library.sh, which help with automating simple version bumps via Dockerfile templating, and generating the contents of the library/xxx manifest file, respectively. We also have infrastructure which performs those version bumps along with a build and test and commits them directly to the relevant image repository (which is exactly how the illustrative golang a9171b commit was created).
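
To make the library/xxx manifest file less abstract, here is a minimal sketch of the kind of content such a generator script prints. The field names follow the RFC 2822-style layout used by the files in the official-images repository; the maintainer, tags, commit hash, and directory are placeholders, and generate_library_entry and generate_library_file are hypothetical helpers, not the real ./generate-stackbrew-library.sh.

```shell
#!/bin/sh
# Hypothetical helper: print one entry of a library/xxx manifest file.
# Arguments: comma-separated tags, comma-separated architectures,
# the exact Git commit to build from, and the directory holding the Dockerfile.
generate_library_entry() {
  printf 'Tags: %s\n' "$1"
  printf 'Architectures: %s\n' "$2"
  printf 'GitCommit: %s\n' "$3"
  printf 'Directory: %s\n' "$4"
}

# Global header, a blank line, then one entry.
# Every value below is a placeholder chosen for illustration.
generate_library_file() {
  printf 'Maintainers: Jane Doe <jane@example.com> (@janedoe)\n'
  printf 'GitRepo: https://github.com/docker-library/golang.git\n'
  printf '\n'
  generate_library_entry '1.x, latest' 'amd64, arm64v8' \
    '0000000000000000000000000000000000000000' '1.x/example'
}

generate_library_file
```

Pinning each entry to an explicit GitCommit is what lets reviewers diff exactly the Dockerfile and build context that will be built.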

How are images built? (especially multiarch)

Images are built via a semi-complex Jenkins infrastructure, and the sources for much of that can be found in the github.com/docker-library/oi-janky-groovy repository.

The actual infrastructure is a combination of machines provided by our generous donors.

For a more complete view of the full image change/publishing process, see "An image's source changed in Git, now what?" above.

Inclusion Criteria

See "What are 'Official Images'?" in the main project repository for a high-level overview of the focus and goals of the project in general.

Per the "New Image Checklist" (which is used to roughly track status during "New Image" reviews), one of the primary determinations we try to make is whether the image being proposed is "generally useful" and whether the software itself is "reasonably popular" and/or "solves a particular use case well" (to help focus our review bandwidth on things that will be helpful to as large a set of users as possible).

There are also specific Dockerfile writing guidelines which can be found in the "Review Guidelines" section of our documentation, which is used as a basis for a lot of our review process.

What is bashbrew? Where can I download it?

The bashbrew tool is one built by the official images team for the purposes of building and pushing the images. At a very high level, it's a wrapper around git and docker build in order to help us manage the various library/xxx files in the main official images repository in a simple and repeatable way (especially focused around using explicit Git commits in order to achieve maximum repeatability and Dockerfile source change reviewability).

The source code is in the github.com/docker-library/bashbrew repository. Precompiled artifacts (which are used on the official build servers) can be downloaded from the relevant Jenkins job or the GitHub releases.

What do you mean by "Supported"?

On every image description, there is a section entitled "Supported tags and respective Dockerfile links" (for example, see debian's Hub page).

Within the Official Images program, we use the word "Supported" to mean something like actively maintained. To put that another way, a particular 2.5.6 software release is considered supported if a severe bug being found would cause a 2.5.7 release (and once 2.5.7 is released, 2.5.6 is no longer considered supported, but the Docker Hub tag is typically left available for pulling -- it will simply never get rebuilt after that point given that it is unsupported).

See the "Library definition files" section of our maintainer documentation for more details.

What's the difference between "Shared" and "Simple" tags?

Some images have separated "Simple Tags" and "Shared Tags" sections under "Supported tags and respective Dockerfile links" (see the mongo image for an example).

"Simple Tags" are instances of a "single" Linux or Windows image. It is often a manifest list that can include the same image built for other architectures; for example, mongo:4.0-xenial currently has images for amd64 and arm64v8. The Docker daemon is responsible for picking the appropriate image for the host architecture.

"Shared Tags" are tags that always point to a manifest list which includes some combination of potentially multiple versions of Windows and Linux images across all their respective images' architectures -- in the mongo example, the 4.0 tag is a shared tag consisting of (at the time of this writing) all of 4.0-xenial, 4.0-windowsservercore-ltsc2016, 4.0-windowsservercore-1709, and 4.0-windowsservercore-1803.

The "Simple Tags" enable docker run mongo:4.0-xenial to "do the right thing" across architectures on a single platform (Linux in the case of mongo:4.0-xenial). The "Shared Tags" enable docker run mongo:4.0 to roughly work on both Linux and as many of the various versions of Windows as are supported (such as Windows Server Core LTSC 2016); here the Docker daemon is again responsible for determining the appropriate image based on the host platform and version.
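
As a rough mental model of that selection, the sketch below treats a shared tag as a list of platform entries and picks the one matching the host. This is a toy, not the daemon's actual algorithm: the entry layout and the select_image and shared_tag_4_0 helpers are hypothetical, only one Windows entry is listed, and real selection on Windows also takes the OS version into account.

```shell
#!/bin/sh
# Toy model only: print the first entry whose platform matches the host.
# Entries arrive on stdin as "os/arch tag" lines (a made-up layout).
select_image() {
  host_platform="$1"
  while read -r platform tag; do
    if [ "$platform" = "$host_platform" ]; then
      printf '%s\n' "$tag"
      return 0
    fi
  done
  return 1   # nothing published for this platform
}

# A simplified stand-in for what the shared tag mongo:4.0 points at.
shared_tag_4_0() {
  cat <<'EOF'
linux/amd64 mongo:4.0-xenial
linux/arm64 mongo:4.0-xenial
windows/amd64 mongo:4.0-windowsservercore-ltsc2016
EOF
}

shared_tag_4_0 | select_image linux/arm64
shared_tag_4_0 | select_image windows/amd64
```

The simple tag mongo:4.0-xenial corresponds to only the Linux entries here, while the shared tag spans both operating systems.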

Image Building

Why does my security scanner show that an image has CVEs?

Though not every CVE is removed from the images, we take CVEs seriously and try to ensure that images contain the most up-to-date packages available within a reasonable time frame. For many of the Official Images, a security scanner such as Docker Security Scanning or Clair might show CVEs, which can happen for a variety of reasons:

  • The CVE has not been addressed in that particular image

    • Upstream maintainers don't consider a particular CVE to be a vulnerability that needs to be fixed, and so it won't be fixed.

      • e.g., CVE-2005-2541 is considered a High severity vulnerability, but in Debian is considered “intended behavior,” making it a feature, not a bug.
    • The OS Security team only has so much available time and has to deprioritize some security fixes over others. This could be because the threat is considered low or that it is too intrusive to backport to the version in "stable".

      • e.g., CVE-2017-15804 is considered a High severity vulnerability, but in Debian it is marked as a "Minor issue" in Stretch and no fix is available.

    • Vulnerabilities may not have an available patch, and so even though they've been identified, there is no current solution.

  • The listed CVE is a false positive

    • In order to provide stability, most OS distributions take the fix for a security flaw out of the most recent version of the upstream software package and apply that fix to an older version of the package (known as backporting).

      • e.g., CVE-2020-8169 shows that curl is flawed in versions 7.62.0 through 7.70.0 and so is fixed in 7.71.0. The version that has the fix applied in Debian Buster is 7.64.0-4+deb10u2 (see security-tracker.debian.org and DSA-4881-1).

    Security scanners can't reliably check for CVEs, so they use heuristics to determine whether an image is vulnerable. Those heuristics fail to take some factors into account:

    • Is the image affected by the CVE at all? It might not be possible to trigger the vulnerability at all with this image.
    • If the image is not supported by the security scanner, it uses the wrong checks to determine whether a fix is included.
      • e.g., For RPM-based OS images, the Red Hat package database is used to map CVEs to package versions. This causes severe mismatches on other RPM-based distros.
      • This also leads to not showing CVEs which actually affect a given image.
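
The curl backport case above can be sketched as two version checks. This is an illustration under assumptions: it uses GNU sort -V as an approximation of Debian version ordering (real tooling would more likely use dpkg --compare-versions), and the helper names are hypothetical rather than taken from any scanner.

```shell
#!/bin/sh
# version_lt A B: succeeds when A sorts strictly before B.
# Assumption: `sort -V` (GNU coreutils) approximates Debian ordering
# well enough for these particular version strings.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Naive heuristic: strip the Debian revision, then compare the upstream
# version against the upstream release that contains the fix.
naive_flags_vulnerable() {
  version_lt "${1%%-*}" "$2"
}

# Distro-aware check: compare the full package version against the
# package version in which the distribution shipped the backported fix.
distro_flags_vulnerable() {
  version_lt "$1" "$2"
}

installed='7.64.0-4+deb10u2'   # curl in Debian Buster, from the example above

if naive_flags_vulnerable "$installed" '7.71.0'; then
  echo "naive check: flagged (false positive)"
fi
if ! distro_flags_vulnerable "$installed" '7.64.0-4+deb10u2'; then
  echo "distro-aware check: not flagged"
fi
```

The naive check sees 7.64.0 as older than 7.71.0 and reports the CVE, even though the installed package already carries the backported fix.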

We strive to publish updated images at least monthly for Debian and Ubuntu. We also rebuild earlier if there is a critical security need, e.g. docker-library/official-images#2171. Many Official Images are maintained by the community or their respective upstream projects, like Alpine and Oracle Linux, and are subject to their own maintenance schedule. These refreshed base images also mean that any other image in the Official Images program that is FROM them will also be rebuilt (as described in the project README.md file).

It is up to individual users to determine whether or not a CVE applies to how they are running their service; that determination is beyond the scope of this FAQ.

Parts of this FAQ entry are inspired by a Google Cloud blog post (specifically their "Working with managed base images" section), which has additional information which may be useful or relevant.

Related issues: docker-library/buildpack-deps#46, docker-library/official-images#2740

Why do so many official images build from source?

The tendency for many official images to build from source is a direct result of trying to closely follow each upstream's official recommendations for how to deploy and consume their product/project.

For example, the PostgreSQL project publishes (and recommends the use of) their own official .deb packages, so the postgres image builds directly from those (from http://apt.postgresql.org/).

On the flip side, the PHP project will only officially support users who are using the latest release (https://bugs.php.net/, "Make sure you are using the latest stable version or a build from Git"), which the distributions do not provide. Additionally, their "installation" documentation describes building from source as the officially supported method of consuming PHP.

One common result of this is that Alpine-based images are almost always required to build from source because it is somewhat rare for an upstream to provide "official" binaries, but when they do they're almost always in the form of something linked against glibc and as such it is very rare for Alpine-compatible binaries to be published (hence most Alpine images building from source).

So to summarize, there isn't an "official images" policy one way or the other regarding building from source; we leave it up to each image maintainer to make the appropriate judgement on what's going to be the best representation / most supported solution for the upstream project they're representing.

HEALTHCHECK

Explicit health checks are not added to official images for a number of reasons, some of which include:

  • many users will have their own idea of what "healthy" means and credentials change over time making generic health checks hard to define
  • after upgrading their images, current users will have extra unexpected load on their systems for healthchecks they don't necessarily need/want and may be unaware of
  • Kubernetes does not use Docker's health checks (opting instead for separate liveness and readiness probes)
  • sometimes things like databases will take too long to initialize, and a defined health check will often cause the orchestration system to prematurely kill the container (docker-library/mysql#439 for instance)

The docker-library/healthcheck repository provides example prototypes from which you can derive your own image. They showcase best practices for creating a healthcheck suited to your specific task and needs.
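
As a minimal sketch of what such a derived healthcheck script can look like (not copied from that repository), the function below runs whatever probe you consider meaningful and maps the result onto the exit statuses Docker expects: 0 for healthy, 1 for unhealthy. The probes shown are stand-ins; a real one would typically query the service on localhost with credentials you control.

```shell
#!/bin/sh
# healthcheck PROBE [ARGS...]
# Run the probe quietly and normalize its result to 0 (healthy) or
# 1 (unhealthy), whatever non-zero status the probe itself returned.
healthcheck() {
  if "$@" >/dev/null 2>&1; then
    return 0
  fi
  return 1
}

# Stand-in probes for illustration only.
healthcheck true  && echo "healthy"
healthcheck false || echo "unhealthy"
```

In a derived image, a script like this would be copied in and wired up with a HEALTHCHECK CMD instruction in your own Dockerfile, leaving the official image itself unchanged.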

OpenPGP / GnuPG Keys and Verification

Ideally, images that require downloaded artifacts should use some cryptographic signature to verify that the artifacts are what we expect them to be (mostly from a provenance perspective, but also from a network transmission perspective). Many open source projects publish PGP signatures (typically as a "detached" signature file) which can be used for the purpose of verifying artifact provenance (with the theory being that only the correct publishers of said artifact are in possession of the private key material required to create said signature).

The way we typically recommend image maintainers fetch those public keys to verify said artifacts is via gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys XXXXX (where XXXXX gets replaced with the full key fingerprint, as in 97FC712E4C024BBEA48A61ED3A5CA953F73C700D). This will use the keys.openpgp.org service, which does require additional verification in order to be used in this way. If that additional verification is not possible/desirable for the keys in question, we recommend using keyserver.ubuntu.com.

For non-hkps keyservers (which are often flaky or have varying availability), we use a single keyserver in our Dockerfiles (specifically, keyserver.ubuntu.com), and instead hijack DNS for our builds to point that at a local instance of github.com/tianon/pgp-happy-eyeballs, which is a project that takes incoming HTTP GET requests for keys and in turn forwards them out to several keyservers at once, returning back to the client the fastest successful response (which has been working very successfully for us since early 2018).

Another common solution to this problem is to simply check a KEYS file into Git that contains the public keys content (see Apache Ant's KEYS file for an example). The primary downsides of this are that it's a pain during the Official Images review process (since every added/removed KEYS entry is many lines of what essentially is just noise to the image diff) but more importantly that it becomes much more difficult for users to then verify that the key being checked is one that upstream officially publishes (it's fairly common for upstreams to officially publish key fingerprints, as seen in RabbitMQ's "Signatures" page).

Additionally, any usage of the GnuPG command-line tool (gpg) should include the --batch command-line flag (to enable what is essentially GnuPG's "API" mode).
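
Putting those recommendations together, a verification step might look roughly like the sketch below. The gpg invocation for fetching keys is the one quoted above; the temporary GNUPGHOME, the cleanup, and the helper names is_full_fingerprint and verify_artifact are assumptions made for illustration, as are the file names in the usage note.

```shell
#!/bin/sh
# is_full_fingerprint VALUE: succeeds only for a 40-hex-digit key
# fingerprint, so short key IDs are rejected.
is_full_fingerprint() {
  case "$1" in
    *[!0-9A-Fa-f]*) return 1 ;;
  esac
  [ "${#1}" -eq 40 ]
}

# verify_artifact FINGERPRINT FILE SIGNATURE
# Fetch the public key into a throwaway keyring and check the detached
# signature. Every gpg call uses --batch.
verify_artifact() {
  is_full_fingerprint "$1" || {
    echo "need a full key fingerprint" >&2
    return 1
  }
  GNUPGHOME="$(mktemp -d)" || return 1
  export GNUPGHOME
  status=0
  gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys "$1" &&
    gpg --batch --verify "$3" "$2" || status=$?
  gpgconf --kill all     # stop any agent started for the temporary keyring
  rm -rf "$GNUPGHOME"
  unset GNUPGHOME
  return "$status"
}
```

A call such as verify_artifact 97FC712E4C024BBEA48A61ED3A5CA953F73C700D app.tar.gz app.tar.gz.asc would then fail the build step if the signature does not match.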

Multi-stage Builds

Following docker-library/official-images#5929, multi-stage builds are officially supported by the official-images build tooling, and tentatively approved for use.

The main caveat of that change is outlined in docker-library/official-images#5929 (comment), namely that we don't have a clean way to preserve the cache for the intermediate stages of a proper multi-stage image, and as such they should be used sparingly. With that in mind, we've come up with several guidelines to help image maintainers determine whether their use of multi-stage builds is one that's likely to be accepted during image review:

  1. only a single FROM, but potentially multiple COPY --from=xxx:yyy ... copying from other tagged official images; for example:

    • a tomcat image doing FROM openjdk:XXX-jre followed by COPY --from=tomcat:XXX-jdk /path/to/compiled/tomcat/native ... to get the compiled "Tomcat Native" artifacts for a JRE-based image out of the JDK-based counterpart

    • a Windows Nano Server image copying artifacts from the larger Windows Server Core variant to overcome the lack of PowerShell for downloading/installing artifacts

  2. two-stage build where the necessary artifact does not exist and must be built from source and/or the build process is going to be similarly highly deterministic (thus mitigating the cache concern somewhat); for example:

    • a Go project without official binary releases (although it is highly recommended for something trivial like Go to publish actual official release binaries, especially if the Go version required/supported for building is highly specific given that Go only supports two major releases at a time)

    • using jlink from a JDK 9+ image to create an image with a minimal JRE that contains only the necessary components for the contained application
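
To illustrate the first guideline, the sketch below prints a Dockerfile with a single FROM plus a COPY --from from another tagged image, and counts its build stages. The XXX tags and the /path/to/... source path are the placeholders from the text above, the destination path is hypothetical, and count_stages is a made-up helper rather than part of the official review tooling.

```shell
#!/bin/sh
# Print a Dockerfile shaped like guideline 1: one FROM, artifacts copied
# from another tagged official image. Placeholders are left unfilled.
guideline_one_dockerfile() {
  cat <<'EOF'
FROM openjdk:XXX-jre
COPY --from=tomcat:XXX-jdk /path/to/compiled/tomcat/native /opt/tomcat-native/
EOF
}

# Count build stages on stdin by counting FROM instructions
# (case-insensitive, ignoring leading whitespace).
count_stages() {
  grep -ci '^[[:space:]]*from[[:space:]]'
}

guideline_one_dockerfile | count_stages   # prints 1
```

Because there is only one stage, nothing intermediate needs to be cached, which is why this shape sidesteps the caveat described above.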

It is also worth pointing out moby/moby#37830 (no sticky bits), moby/moby#37123 (no ownership preservation until 19.03+), and moby/moby#36759 (no ADD --from=xxx), so multi-stage builds are not currently supported/useful for "base" images like ubuntu.

Why isn't there a Windows equivalent of docker-entrypoint.sh?

This is an unfortunate design limitation of Windows. On Linux, we have the exec family of system calls (and a Bash built-in by the same name) that allows us to completely replace our current running process with another. This is what allows us to run an ENTRYPOINT script which performs some initialization logic, then replaces itself with the actual server/application process directly (so that Docker can track that process properly). On Windows, that interface doesn't really exist (and is really difficult to emulate properly), which means that in order to even begin to replicate this behavior, we'd have to implement a process monitor as well to sit between Docker and the server/application process for the lifetime of the session (all for some simple initialization behavior, which is kind of a heavy toll).
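
On the Linux side, the pattern described here can be sketched as follows. The entrypoint function stands in for a docker-entrypoint.sh script, and APP_DATA_DIR is a hypothetical setting used only to give the initialization step something to do. The second function shows that the process ID is unchanged across exec, which is what lets Docker track the application directly.

```shell
#!/bin/sh
# Sketch of an entrypoint: run some initialization, then replace this
# shell with the real process instead of running it as a child.
entrypoint() {
  : "${APP_DATA_DIR:=/tmp/app-data}"   # hypothetical setting
  mkdir -p "$APP_DATA_DIR"
  exec "$@"
}

# Print the PID before exec, then again from the program that replaced
# the shell. Both lines show the same number.
pid_before_and_after_exec() {
  sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

pid_before_and_after_exec
```

In an image, the script would be set as the ENTRYPOINT and receive the CMD as its arguments, so that exec "$@" hands control to the server process.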

Can I use a bot to make my image update PRs?

Sure! Just a few simple guidelines/requests:

  1. don't make them "too often"
    • multiple times per week is definitely "too often"
    • once every week is a bit on the boundary, but acceptable
    • serious security issues override this
  2. make sure the PR @-references any/all human maintainers so you see our review comments
  3. only one PR at a time, please (the easiest way to accomplish this is to use the same branch name every time)
  4. base every new PR off the master branch from https://github.com/docker-library/official-images, not from your fork (especially important if we squash-merge your PRs)
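
Guidelines 3 and 4 can be combined in a small helper that a bot runs before each update. This is a sketch under assumptions: prepare_update_branch is a hypothetical name, and the branch name, file name, and commit message in the comments are placeholders.

```shell
#!/bin/sh
# prepare_update_branch UPSTREAM BRANCH
# Recreate BRANCH from the tip of UPSTREAM's master, discarding whatever
# the branch pointed at before, so every PR starts from a clean base and
# reuses the same branch name.
prepare_update_branch() {
  git fetch --quiet "$1" master &&
    git checkout --quiet -B "$2" FETCH_HEAD
}

# Typical use inside a clone of your fork (names are placeholders):
#   prepare_update_branch https://github.com/docker-library/official-images.git update-myimage
#   ...regenerate library/myimage, then:
#   git commit --all --message 'Update myimage'
#   git push --force origin update-myimage
```

Force-pushing to the same branch updates the existing open PR rather than opening a second one.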

Image Usage

--link is deprecated!

The reports of --link's death are greatly exaggerated.

-Mark Twain, probably

The documentation for "legacy container links" (--link) includes a large warning about --link potentially going away at some point, but there is no timeline given and this "soft deprecation" has been going strong for a very long time. Their usage is definitely discouraged, but we expect Docker will continue to support them for quite some time.

Many sources of image documentation use --link in their examples for simplicity, including not needing to detail Docker network management, and --link's feature of inherently exchanging connection information to the linked containers as environment variables.

Several of the official images were updated in docker-library/docs#1441 with the compromise of using --network some-network in an attempt to convey to users that additional effort will be required for them to connect their services successfully (implying that they should go read documentation / learn about Docker's container networking functionality).
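
For readers wondering what that additional effort amounts to, a minimal sketch of the user-defined network approach follows. It is a dry run by default, printing the docker commands rather than running them. The image and container names (some-db-image, some-db, some-client-image) are placeholders, and the DOCKER variable is a convention invented for this example.

```shell
#!/bin/sh
# Dry run by default: prints the commands. Set DOCKER=docker to execute.
network_instead_of_link() {
  docker_cmd="${DOCKER:-echo docker}"
  # Create a user-defined network once.
  $docker_cmd network create some-network
  # Start the service on that network under a known name.
  $docker_cmd run -d --name some-db --network some-network some-db-image
  # Containers on the same user-defined network can reach each other by
  # container name, so the client is simply told to connect to some-db.
  $docker_cmd run --rm --network some-network some-client-image some-db
}

network_instead_of_link
```

Unlike --link, this does not inject connection details as environment variables, so the client has to be given the host name explicitly.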