- Module Introduction
- What is in An Image
- The Mighty Hub Using Docker Hub Registry Image
- Images and Their Layers
- Image Tagging and Pushing to Docker Hub
- Building Images: The Dockerfile Basics
- Building Images: Running Docker Build
- Building Images: Extending Official Images
- Assignment Build Your Own Image
First we're going to discuss the basics of images and the concepts you're going to need.
What is actually in an image and, just as importantly, what isn't in an image.
We're going to talk a little bit about how to find images on the internet, dive into the whole process of finding good images, and look at how to manage those images once we've downloaded them or created them on our own machines.
We'll jump into the fun part of making our own images.
So before we start playing with images and learning how to use them for containers, we probably want to step into what exactly is in an image and what isn't. The way I like to explain it is very simple.
An image is the application binaries and dependencies for your app, plus the metadata on how to run it.
The official definition: "An image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime."
Inside an image there is not actually a complete OS: there's no kernel and no kernel modules like drivers. It's really just the binaries that your application needs, because the host (server) provides the kernel. That's one of the distinct characteristics of containers that makes them different from a virtual machine (hypervisor): it's not booting up a full operating system, it's really just starting an application.
An image can be really small; it can be a single file. If you're, for instance, using Go, one of Go's features is that it can build a static binary, so you can have a single file as your application.
Or you could have a very big image, multiple gigabytes, that uses a distribution like Ubuntu with its own package manager built in, where you've installed Apache, PHP, your source code, and all the added modules you need.
We're going to take a tour around Docker Hub and get to know a few of its basic features.
We'll discuss the difference between an official image and other, just-as-good images, and how to tell a good image from a bad one.
We'll also show a bit about downloading images, how you can see the different tags they use, and the difference between an Alpine image and other image options.
There are several characteristics to consider when figuring out what the right image is. By convention, you'll start using images related to your workflow, you'll get used to specific ones, and you'll come to prefer them. Typically we always start with the official images.
Official images are also the only ones whose name doesn't have a forward slash / in it. When you and I create images on Docker Hub, we have to create them with our account name in front of the image name.
So when you look at all the other images, with an account name like tuanany73/multi-nginx, it might actually be an organization or an individual's repo. The only ones that get to have just the name of the repo are considered official.
Official images are ones where Docker, Inc. actually has a team of people helping take care of them: ensuring they have quality documentation, that they're well tested, and that they're put together properly with Dockerfiles that obey best-practice rules.
That team usually works with the official team behind the software itself to ensure the image is doing all the things it should be doing.
Like I said before, you want to start with the official images; eventually you may find that you want to change one slightly or add a few things to it, and then you'll make your own images.
One of the best things about official images is their documentation. They're always really great at documenting how to make the images work, what options there might be, what environment variables, what's the default port and so on.
Versions are a common feature of official repositories. You don't have to have versions in every image, but official ones do, because for most open-source software there are always at least a few versions out in the wild that are officially maintained and supported.
That's what we have here: 1.19.2 for Nginx, and then what's considered the stable branch of Nginx, which is 1.18; we can see the little name here says stable. So let's break this down.
When we start talking about images, images aren't necessarily named. Images are tagged, and a version of an image can actually have more than one tag. We're going to dive into this a little later when we start making our own images and can play around with tagging.
latest is a special tag. It doesn't necessarily guarantee that it's always the latest commit in the repository; what it usually means is that you're getting the latest version of this product. In the official images it's very well defined and consistent, so if you don't care right now exactly which version you want, you just want the most current one, you could just say docker pull nginx and it would download the latest.
A best practice: when you're going to production and you're actually running software that others are going to use, it's rare that you really want your software to update automatically. You usually want to control that process with some other DevOps tools.
But when you're developing or just testing something locally, it's super easy with official images to type in the name and assume you're going to get the latest.
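For example, a minimal illustration of the difference (the version number here is just the one used in this lesson):

$: docker pull nginx          # grabs nginx:latest, whatever the current stable happens to be
$: docker pull nginx:1.19.2   # pins an exact version, which is what you want for production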
You will notice the other ones, like 1.19.2-alpine: these all have very similar names to the first ones, but they all have the word alpine in them. We're going to get into base images and distributions later, but for now, alpine is actually a distribution of Linux that's very, very small, actually less than 5MB in size. This version means it comes from a base image of alpine, keeping it very small and light, whereas the default, or latest, image actually comes from the Debian distribution and is a little larger in size, probably a little over 100MB.
You'll also notice that the three 1.19 versions I've downloaded all have the same IMAGE ID, because the IMAGE ID is based upon the cryptographic SHA of each image in Docker Hub.
If you ever want to consider using something other than an official repository, what I usually look for is the number of stars and the number of pulls, because a popular repository, to me, tends to establish trust.
I always recommend you download and inspect the software before you use it, and look at the Dockerfile; hopefully they'll have an open-source repository where you can see exactly how they made that image.
This is one of the fundamental concepts of how Docker works: it uses something called a union file system to present a series of file system changes as an actual file system. We're going to dive into the history and inspect commands and see how we can use them to understand what an image is actually made of. We'll also learn a little bit about the copy-on-write concept and how a container runs as an additional layer on top of an image.
What do we mean by image layers? It's actually completely transparent to you when you're using Docker, but when you start digging into certain commands, like the history, inspect, and commit commands, you start to get a sense that an image isn't one big blob of data that comes and goes in one huge chunk.
NOTE:
docker image history
Shows the layers of changes made in an image
If you noticed when we actually did docker pull, certain times you might see words indicating there's something you already have, like you've cached some part of it already. That all comes down to the fact that images are designed using the union file system concept of making layers out of the changes. Let's quickly look at what we have.
The list from docker image history is not a list of things that have actually happened in the container, because this is about an image; it's actually a history of the image layers. Every image starts from the very beginning with a blank layer known as scratch.
Then every set of changes that happens after that on the file system, in the image, is another layer. You might have one layer, you might have dozens of layers, and some layers may contain no change in terms of file size. You'll notice in the output below that some changes are simply metadata changes.
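For reference, the listing below is the sort of output you'd get from running the history command against the image pulled earlier (assuming nginx:latest is already local):

$: docker image history nginx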
IMAGE CREATED CREATED BY SIZE COMMENT
7e4d58f0e5f3 7 days ago /bin/sh -c #(nop) CMD ["nginx" "-g" "daemon… 0B
<missing> 7 days ago /bin/sh -c #(nop) STOPSIGNAL SIGTERM 0B
<missing> 7 days ago /bin/sh -c #(nop) EXPOSE 80 0B
<missing> 7 days ago /bin/sh -c #(nop) ENTRYPOINT ["/docker-entr… 0B
<missing> 7 days ago /bin/sh -c #(nop) COPY file:0fd5fca330dcd6a7… 1.04kB
<missing> 7 days ago /bin/sh -c #(nop) COPY file:1d0a4127e78a26c1… 1.96kB
<missing> 7 days ago /bin/sh -c #(nop) COPY file:e7e183879c35719c… 1.2kB
<missing> 7 days ago /bin/sh -c set -x && addgroup --system -… 63.4MB
<missing> 7 days ago /bin/sh -c #(nop) ENV PKG_RELEASE=1~buster 0B
<missing> 7 days ago /bin/sh -c #(nop) ENV NJS_VERSION=0.4.3 0B
<missing> 7 days ago /bin/sh -c #(nop) ENV NGINX_VERSION=1.19.2 0B
<missing> 7 days ago /bin/sh -c #(nop) LABEL maintainer=NGINX Do… 0B
<missing> 7 days ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 7 days ago /bin/sh -c #(nop) ADD file:e7407f2294ad23634… 69.2MB
The CREATED BY column shows which command was actually run to create each layer; we'll cover those commands in a little bit when we go over the Dockerfile. For now, you can see that the bottom layer added a huge amount of files, 69.2MB worth, and as you go up from the bottom, we have some more data changes.
When we create a new image, we're starting with one layer. Every layer gets its own unique SHA that helps the system identify whether that layer is the same as another layer.
Let's say that at the very bottom of one of your images you have an OS. Then you create a Dockerfile which adds some more files, and that's another layer on top of the image; maybe we use apt-get for that. Then in the Dockerfile you make an env variable change. All of that together is your image.
You might have a different image that starts from Debian, and then on that image you may also use apt-get to install some stuff, and on top of that you create your own env and open a port. Each one of these changes is something you usually make in the Dockerfile, but you can also make them with the docker commit command that we'll check out in a minute. This is also another image, and those layers are all bundled together.
But what happens if I have another image that's also using the same version of the Debian OS? Well, that image can have its own changes on top of the same layer that I have in my cache. This is where the fundamental concept of the image layer cache saves us a whole bunch of time and space, because we don't need to download layers we already have; and remember, it uses a unique SHA for each layer, so it's guaranteed to be the exact layer it needs.
It knows how to match them between Docker Hub and our local cache. As we make changes to our images, they create more layers; if we decide we want the same image to be the base image for more layers, then only one copy of each layer is ever stored.
With this system, really, one of the biggest benefits is that we're never storing the same image data more than once on our file system. It also means that when we're uploading and downloading, we don't need to transfer layers that already exist on the other side.
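If you want to see those layer SHAs for yourself, the inspect command can print them. A quick sketch (the Go-template format string is just one way to pull the field out):

$: docker image inspect --format '{{json .RootFS.Layers}}' nginx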
Say you have a custom image that you made yourself, and then you added, let's say, an Apache server on top of that as another layer in your Dockerfile, then opened up port 80, and at the very end told it to copy in your source code. If you ended up having two different Dockerfiles for two different websites, and every line in the Dockerfiles was the same except for that last little bit where you copy in Website A versus Website B, you would end up with two images that share all of their lower layers.
We'll show that in a minute.
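A minimal sketch of that idea, using the official Apache (httpd) image; the site-a/ and site-b/ directories are hypothetical examples, not from the lesson:

# Dockerfile for Website A
FROM httpd:2.4
EXPOSE 80
COPY site-a/ /usr/local/apache2/htdocs/

# Dockerfile for Website B -- identical except the final COPY line
FROM httpd:2.4
EXPOSE 80
COPY site-b/ /usr/local/apache2/htdocs/

Everything up to the last COPY produces identical layers, so Docker stores those layers once and only the two small COPY layers differ.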
Only the differing layers are actually stored; we're never storing the entire stack of image layers more than once if they're really the same layers.
How does this work with containers?
Let's say we have a NodeJS image and we decide to run a container off of it. All Docker does is create a new read/write layer for that container on top of the NodeJS image. When we're perusing the file system, all the containers and images just look like a regular file system, but underneath, the storage driver used by Docker is actually layering all these changes on top of each other, like a stack of pancakes.
So if I run two containers at the same time off of the same NodeJS image, container A and container B would, in terms of file space, only be storing the difference between what's changed in that live running container and what's in the base image, which is read-only.
When you're running containers and you change files that came through the image, let's say I start container C and actually go in and change a file that was in the image while the container is running, that's known as copy-on-write.
What copy-on-write does is this: the file system takes that file out of the image and copies it into the container's differencing layer, storing a copy of the file there. So now the container is really just the running process plus those files that differ from what was in the NodeJS image.
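You can watch this happen with docker container diff, which lists files added (A) or changed (C) in the container's read/write layer relative to the image. A quick sketch; cow-demo is just an example container name:

$: docker container run -d --name cow-demo nginx
$: docker container exec cow-demo sh -c 'echo hi > /usr/share/nginx/html/index.html'
$: docker container diff cow-demo
# expect entries roughly like: C /usr/share/nginx/html/index.html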
$: docker image history nginx:alpine
IMAGE CREATED CREATED BY SIZE COMMENT
6f715d38cfe0 4 weeks ago /bin/sh -c #(nop) CMD ["nginx" "-g" "daemon… 0B
<missing> 4 weeks ago /bin/sh -c #(nop) STOPSIGNAL SIGTERM 0B
<missing> 4 weeks ago /bin/sh -c #(nop) EXPOSE 80 0B
<missing> 4 weeks ago /bin/sh -c #(nop) ENTRYPOINT ["/docker-entr… 0B
<missing> 4 weeks ago /bin/sh -c #(nop) COPY file:0fd5fca330dcd6a7… 1.04kB
<missing> 4 weeks ago /bin/sh -c #(nop) COPY file:1d0a4127e78a26c1… 1.96kB
<missing> 4 weeks ago /bin/sh -c #(nop) COPY file:e7e183879c35719c… 1.2kB
<missing> 4 weeks ago /bin/sh -c set -x && addgroup -g 101 -S … 16.5MB
<missing> 4 weeks ago /bin/sh -c #(nop) ENV PKG_RELEASE=1 0B
<missing> 4 weeks ago /bin/sh -c #(nop) ENV NJS_VERSION=0.4.3 0B
<missing> 4 weeks ago /bin/sh -c #(nop) ENV NGINX_VERSION=1.19.2 0B
<missing> 4 weeks ago /bin/sh -c #(nop) LABEL maintainer=NGINX Do… 0B
<missing> 3 months ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 3 months ago /bin/sh -c #(nop) ADD file:c92c248239f8c7b9b… 5.57MB
<missing> in the docker history output is actually just a misnomer in the Docker interface. It doesn't mean something is wrong or misconfigured. What it means is that while this nginx:alpine image has IMAGE ID 6f715d38cfe0, the other layers in the image aren't actually images themselves; they're just layers inside the nginx:alpine image, so they don't get their own IMAGE ID there.
Personally I think it's a little misleading in the interface to say that, but that's how they wrote it.
NOTE:
docker image inspect
returns JSON metadata about the image
The inspect command gives us all the details about the image; this is basically the metadata. Remember when we talked about an image being made up of two parts, the binaries and dependencies plus the metadata about that image? Well, inspect gives you back the metadata.
Besides the basic info, like the IMAGE ID and its tags, you get all sorts of details about how this image expects to be run. It actually shows the ExposedPorts option.
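For reference, the (trimmed) JSON below is the sort of output you get from inspecting the official nginx image, e.g.:

$: docker image inspect nginx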
[
{
....
....
"ContainerConfig": {
"Hostname": "f19c7895338a",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": { << Port was open
"80/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [ << Environment
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"NGINX_VERSION=1.19.2",
"NJS_VERSION=0.4.3",
"PKG_RELEASE=1"
],
"Cmd": [
"/bin/sh",
"-c",
"#(nop) ",
"CMD [\"nginx\" \"-g\" \"daemon off;\"]" << command run by default
],
"ArgsEscaped": true,
"Image": "sha256:582c1b5eecdbda3ea7e434da6906b2cc1e0eaea81dca2f493dbb2467704ec650",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": [
"/docker-entrypoint.sh"
],
"OnBuild": null,
"Labels": {
"maintainer": "NGINX Docker Maintainers <docker-maint@nginx.com>"
},
"StopSignal": "SIGTERM"
}
}
...
...
]
So you know, when you want to start it, which ports you need to open on your Docker host if you want it to accept connections.
You can see the Env variables that were passed in, including the version of Nginx it's running and the PATH.
You can also see the Cmd (command) it will run by default when you start up the image.
Again, a lot of these things can actually be changed, like we did earlier with the docker container run command; but these are showing us all the defaults, plus some other interesting information like the author and the architecture, amd64, which is pretty much what all normal PCs and Macs run nowadays. We don't really have too many 32-bit machines around, so this is just a standard 64-bit Intel architecture, designed to run on the Linux OS.
First let's talk about tagging.
Usage: docker image tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]
Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
$: docker image tag
NOTE:
docker image tag
Assigns one or more tags to an image
Images don't technically have a name, even though we kind of refer to them like that when talking casually. If you do docker image ls, you'll notice there's no NAME column. Besides the IMAGE ID, which none of us are going to remember, we have to refer to images by three different pieces of information.
NOTE:
<user>/<repo>:<tag>
Default tag is latest if not specified
In the output of the image ls command, we actually only see two of them: on the left we see the REPOSITORY, and then we see the TAG. The REPOSITORY is actually made up of either the user name or the organization, a slash, and the repository.
Now here we're only dealing with official repository images, so we only see the actual repository name. We mentioned earlier that the special images considered official are the only ones that get the right to be called by just the repository name, not organization-slash-repository.
NOTE: Official Repository
They live at the "root namespace" of the registry, so they don't need an account name in front of the repo name
But if we go over to Docker Hub and do a search for MySQL, we'll find there's not only the official image, which is just referred to as mysql, but also mysql/mysql-server, which seems to be pretty popular as well. It looks like this is actually the same MySQL server, but created by the MySQL team at Oracle. If I download it and do docker image ls again, you'll notice that the REPOSITORY name includes the organization name.
What's the tag for? The tag is not quite a version and it's not quite a branch, but it's a lot like a Git tag: it's really just a pointer to a specific image commit in that repository, and it could be anything. Tags are just labels that point to an actual IMAGE ID, and we can have many of them all pointing to the same one.
REPOSITORY TAG IMAGE ID CREATED SIZE
nginx 1.19 7e4d58f0e5f3 7 days ago 133MB
nginx 1.19.2 7e4d58f0e5f3 7 days ago 133MB
nginx latest 7e4d58f0e5f3 7 days ago 133MB
Well, I could make my own Dockerfile and create my own custom image, but we can also re-tag existing Docker images, so I can just do a docker image tag nginx:
Usage: docker image tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]
$: docker image tag nginx tuanany/nginx
The format of this command is: the image that I'm going to give a new tag to goes first, and then the new tag I want to give it. If you don't specify a tag, it always defaults to latest. Latest doesn't actually always mean latest, because I could technically take some old software and tag it latest; there's nothing special about it, it's just the default.
NOTE:
latest tag
It's just the default tag, but image owners should assign it to the newest stable version.
Really, I wish they would just call it default and not latest, so it is a little confusing. But generally on Docker Hub, especially when you're using official images, you can trust that latest is the latest stable version of the software you want to use. And now the image is labeled with my username and a new repo that doesn't exist yet on Docker Hub.
NOTE:
docker image push
uploads changed layers to an image registry (default is Docker Hub)
You should log in first with the docker login command before you can push your own images.
NOTE:
docker login <server>
Defaults to logging in to Docker Hub, but you can override that by adding a server URL.
So when you have your own free Docker Hub account, you can actually log in from the command line with docker login.
it actually stored an authentication key
that would
allow my local docker CLI to access Docker Hub as me. This is important
point that we'll learn about later on in production is that wherever you login
with Docker CLI, by default, it's going to store the authentication key for your
account in the profile of that user. So just be aware of that.
NOTE:
docker logout
Always log out from shared machines or servers when done, to protect your account.
If you're using a machine that you don't trust, just type docker logout when you're done.
Again, I didn't actually have to create this image from scratch to upload it; I simply gave a new tag to an existing image, and when I uploaded it, it automatically created a new repo based on that tag.
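Putting that whole flow together (tuanany here is just the example account name from above; substitute your own Docker Hub username):

$: docker login
$: docker image tag nginx tuanany/nginx
$: docker image push tuanany/nginx
$: docker logout    # especially on shared machines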
A Dockerfile may look like a shell script, but it's not. It's not a batch file, it's not a shell script; it's a totally different kind of file that's unique to Docker, and the default name is Dockerfile with a capital D.
$: docker build -f some-dockerfile
But on the command line (CLI), when you need to point the docker command at a Dockerfile, you can use -f, which is actually common amongst a lot of the Docker tools: you can use -f to specify a different file than the default.
The Dockerfile source code.
The FROM command is in every Dockerfile; it's required to be there. It's normally a minimal distribution:
FROM debian:stretch-slim
And really, the reason you would use these is to save yourself time and pain, because these minimal distributions are actually much smaller than the CDs you would use to install a virtual machine from. For example, the Ubuntu one doesn't even have curl in it, whereas obviously, if you installed a full Ubuntu on a VM, it would have curl and a lot of other commands already installed.
Because all of these distributions are official images, it means they're always going to be up to date with the latest security patches, and you can trust and depend on them. One of the main benefits of using them in containers is to use their package distribution systems.
NOTE: package manager
Package managers like apt, yum, dnf, or pacman are one of the main reasons to build containers FROM Debian, Ubuntu, Fedora, CentOS, or Arch.
ENV NGINX_VERSION 1.13.6-1~stretch
ENV NJS_VERSION 1.13.6.0.1.14-1~stretch
NOTE:
env variable
One reason environment variables were chosen as the preferred way to inject key/value pairs is that they work everywhere, on every OS and config.
ENV is a way to set environment variables, which are actually very important in containers because they're the main way we set keys and values for image building and for running containers.
In this case it's setting the version of Nginx it would like us to install, and this environment variable will be set so that any subsequent lines can use it.
Now, as a reminder from previous lectures, each one of these stanzas is an actual layer in our Docker image, so their order matters, because it works top down.
RUN apt-get update \
&& apt-get install --no-install-recommends --no-install-suggests -y gnupg1 \
&& \
NGINX_GPGKEY=573BFD6B3D8FBC641079A6ABABF5BD827BD9BF62; \
found=''; \
for server in \
ha.pool.sks-keyservers.net \
hkp://keyserver.ubuntu.com:80 \
hkp://p80.pool.sks-keyservers.net:80 \
pgp.mit.edu \
; do \
echo "Fetching GPG key $NGINX_GPGKEY from $server"; \
apt-key adv --keyserver "$server" --keyserver-options timeout=10 --recv-keys "$NGINX_GPGKEY" && found=yes && break; \
done; \
test -z "$found" && echo >&2 "error: failed to fetch GPG key $NGINX_GPGKEY" && exit 1; \
apt-get remove --purge -y gnupg1 && apt-get -y --purge autoremove && rm -rf /var/lib/apt/lists/* \
&& echo "deb http://nginx.org/packages/mainline/debian/ stretch nginx" >> /etc/apt/sources.list \
&& apt-get update \
&& apt-get install --no-install-recommends --no-install-suggests -y \
nginx=${NGINX_VERSION} \
nginx-module-xslt=${NGINX_VERSION} \
nginx-module-geoip=${NGINX_VERSION} \
nginx-module-image-filter=${NGINX_VERSION} \
nginx-module-njs=${NJS_VERSION} \
gettext-base \
&& rm -rf /var/lib/apt/lists/*
You'll usually see RUN commands when you need to install software from a package repository, or you need to do some unzipping or some file edits inside the container itself. A RUN command can also run shell scripts that you've copied in earlier in the file, or any commands that are accessible from inside the container at that point in the file.
RUN
...
...
&& echo "deb http://nginx.org/packages/mainline/debian/ stretch nginx" >> /etc/apt/sources.list \
...
...
Since we're coming from Debian, this RUN command has access to all the commands and binaries that would have been installed with that release; and this one for Nginx is more or less adding the key and repository entry for the package source where you can get the latest Nginx.
There are a few things here that are really key to making a good Dockerfile, and we'll talk about them more in a later section on best practices.
Two things to note. The reason we're chaining all these commands with && so they run one after the other is that, if you remember, each stanza is its own layer. Chaining them ensures all of these commands fit into one single layer, which saves us a little time and some space; it's so common that you'll probably see it in every Dockerfile on Docker Hub.
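A tiny sketch of the difference, assuming a Debian-based image (the package name is just an example):

# three separate stanzas = three separate layers
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# one chained stanza = one layer, and the apt cache cleanup actually shrinks the image
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*

Back in the official Nginx Dockerfile, the next RUN stanza handles logging: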
# forward request and error logs to docker log collector
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
&& ln -sf /dev/stderr /var/log/nginx/error.log
This RUN command is all about pointing our .log files to stdout and stderr. As we'll see later, the proper way to do logging inside a container is to not log to a log file; there's no syslogd or any other syslog service inside a container.
Docker actually handles all of our logging for us. All we have to do inside the container is make sure that everything we want captured in the logs is spit out to stdout and stderr, and Docker will handle the rest.
There are actually logging drivers we can use in the Docker Engine itself to control the logs for all containers on our host, and that's really what you want to use.
It adds more complexity to your app if the app does the logging itself; and then, if you have to deal with log files in every container, you've got the problem of how to get those files out, searchable, and accessible. Here we're taking the default Nginx logs and linking them to stdout so that Docker can capture them.
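Once everything goes to stdout and stderr, you can read it back with the Docker CLI (webhost here is just a hypothetical container name):

$: docker container logs webhost
$: docker container logs --follow webhost   # stream the logs live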
EXPOSE 80 443
# expose these ports on the docker virtual network
# you still need to use -p or -P to open/forward these ports on host
Next we have the EXPOSE command. By default, no TCP (Transmission Control Protocol) or UDP (User Datagram Protocol) ports are open inside a container; it doesn't expose anything from the container to a virtual network unless we list it with EXPOSE. And, of course, because this is a web and proxy server, it's going to expose 80 and 443.
Now, this EXPOSE command does not mean these ports are going to be opened automatically on our host. That's what the -p / --publish option is for whenever we use docker run.
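For example, a quick sketch (host port 8080 is just an arbitrary choice):

$: docker container run -d -p 8080:80 nginx   # host port 8080 -> container port 80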
The CMD is a required stanza, and it's the final command that will be run every time you launch a new container from the image, or every time you restart a stopped container.
There is some really excellent documentation on all of these stanzas, plus a whole lot more that we'll go into later, on the Docker documentation website at docs.docker.com.
So these five stanzas are pretty normal in every single Dockerfile. Some of them, like FROM and CMD, are required; others, like RUN, ENV, and EXPOSE, are optional, but they're pretty typical for most containers you're going to create images for.
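Putting the five stanzas together, a minimal sketch in the same spirit as the file above (this is a simplified example, not the real official Nginx Dockerfile):

FROM debian:stretch-slim
ENV NGINX_VERSION 1.13.6-1~stretch
RUN apt-get update \
    && apt-get install -y nginx \
    && rm -rf /var/lib/apt/lists/*
EXPOSE 80 443
CMD ["nginx", "-g", "daemon off;"]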
Build locally.
Usage: docker image build [OPTIONS] PATH | URL | -
Build an image from a Dockerfile
-t, --tag list Name and optionally a tag in the 'name:tag' format
--target string Set the target build stage to build.
--ulimit ulimit Ulimit options (default [])
//@NOTE tag must be all lowercase
$: docker image build -t customnginx .
The FROM command in the Dockerfile, when I build this image, is going to pull that debian:stretch-slim image from Docker Hub down to my local cache (/var/lib/docker/), then execute each of those stanzas line by line inside my Docker Engine and cache each of those layers.
Step 1/7 : FROM debian:stretch-slim << Each line = layer file stored
---> 5e45a95672e1 << a hash build cache
Step 2/7 : ENV NGINX_VERSION 1.13.6-1~stretch << Each line = layer file stored
---> Running in 9ca9d90b8325 << a hash build cache
Removing intermediate container 9ca9d90b8325
---> 33aa94a95f39 << a hash build cache
Step 3/7 : ENV NJS_VERSION 1.13.6.0.1.14-1~stretch << Each line = layer file stored
---> Running in 19052df5ae87 << a hash build cache
Removing intermediate container 19052df5ae87
---> 3c7e41f29787
Step 4/7 : RUN apt-get update \
&& apt-get install --no-install-recommends --no-install-suggests -y gnupg1 \
&& \
NGINX_GPGKEY=573BFD6B3D8FBC641079A6ABABF5BD827BD9BF62; \
found=''; \
for server in \
ha.pool.sks-keyservers.net \
hkp://keyserver.ubuntu.com:80 \
hkp://p80.pool.sks-keyservers.net:80 \
pgp.mit.edu \
; do \
echo "Fetching GPG key $NGINX_GPGKEY from $server"; \
apt-key adv --keyserver "$server" --keyserver-options timeout=10 --recv-keys "$NGINX_GPGKEY" && found=yes && break; \
done; \
test -z "$found" && echo >&2 "error: failed to fetch GPG key $NGINX_GPGKEY" && exit 1; \
apt-get remove --purge -y gnupg1 && apt-get -y --purge autoremove && rm -rf /var/lib/apt/lists/* \
&& echo "deb http://nginx.org/packages/mainline/debian/ stretch nginx" >> /etc/apt/sources.list \
&& apt-get update \
&& apt-get install --no-install-recommends --no-install-suggests -y \
nginx=${NGINX_VERSION} \
nginx-module-xslt=${NGINX_VERSION} \
nginx-module-geoip=${NGINX_VERSION} \
nginx-module-image-filter=${NGINX_VERSION} \
nginx-module-njs=${NJS_VERSION} \
gettext-base \
&& rm -rf /var/lib/apt/lists/*
...
...
Removing intermediate container 2b18af1fd3e1
---> 64cbfaa4616f
Step 5/7 : RUN ln -sf /dev/stdout /var/log/nginx/access.log && ln -sf /dev/stderr /var/log/nginx/error.log << Each line = layer file stored
---> Running in 19d26badf043 << a hash build cache
Removing intermediate container 19d26badf043
---> 69ea67de31d6
Step 6/7 : EXPOSE 80 443 << Each line = layer file stored
---> Running in 54e6b6c55e43
Removing intermediate container 54e6b6c55e43 << a hash build cache
---> 3027b06353b4
Step 7/7 : CMD ["nginx", "-g", "daemon off;"] << Each line = layer file stored
---> Running in d07cf0a72728 << a hash build cache
Removing intermediate container d07cf0a72728
---> 8cf1f704414c
Successfully built 8cf1f704414c
Successfully tagged customnginx:latest
Each step is a line in the Dockerfile that it's executing inside this image as it builds it, and there's a little hash at the end, which is the hash it keeps in the build cache. The next time we build this thing, if that line hasn't changed in the Dockerfile, it's not going to rerun it. This is one of the magic pieces of why Docker makes deployment and software building so fast: it's intelligent enough to cache the steps in the build.
So quite often, after you've built an image the first time, if you're really just changing your custom source code and not the application setup itself, all this installation stuff has already happened, and you'll have very short build times.
What if I go back into the Dockerfile and add an additional exposed port, 8080? Now, that doesn't mean Nginx is smart enough to know I'm opening this port, or that this in any way communicates with Nginx; it's just me allowing the container to receive packets on port 8080.
So, let's build it again. This time the build only took a couple of seconds, and you'll notice that each step says Using cache: Steps 2 through 4 are all pulled from the cache. Then, on the step where the EXPOSE line changed, it recognizes that the line is different.
So it actually executes that step in the container, and then it has to rerun every step after it as well, because the minute a line changes, every line after that has to be rebuilt.
This brings up the point about ordering the lines in your Dockerfile. If you get things out of order, for instance, if you copy in the source code you're writing at the very beginning of the file, then every time you change a source file and rebuild, it's going to have to rebuild essentially the entire Dockerfile again.
It's critically important for your sanity and your time that you keep the things that change the least at the top of your Dockerfile, and the things that change the most at the bottom.
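A rough sketch of that ordering idea for a hypothetical Node.js app (the file names and server.js entry point are just examples):

FROM node:14
WORKDIR /app
# dependencies change rarely -- keep them near the top so this layer stays cached
COPY package.json package-lock.json ./
RUN npm install
# source code changes constantly -- keep it at the bottom
COPY . .
CMD ["node", "server.js"]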
We're not building our own Nginx from scratch because, ideally, if you can use an official image from Docker Hub to get the job done, it will be a lot easier for you to maintain your Dockerfile and keep it working well.
In simpler scenarios, quite often the official images just work. But as you grow and add more complexity to your environment and systems, you'll probably find you need to add custom software, change the way the image starts, or add scripts to tweak the configuration.
But when I'm starting a new greenfield project, or converting some old app, I always start with the official images from Docker Hub. Then, once I hit a roadblock and can't use that image anymore, I might go back to Docker Hub and take another look to see if there's a custom image out there that's really popular, that I can trust and look into, and see if it solves my problem.
But there's nothing wrong with building your own. It's just more work and more upkeep over time.
# this shows how we can extend/change an existing official image from Docker Hub
FROM nginx:latest
# highly recommend you always pin versions for anything beyond dev/learn
WORKDIR /usr/share/nginx/html
# change working directory to root of nginx webhost
# using WORKDIR is preferred to using 'RUN cd /some/path'
COPY index.html index.html
# I don't have to specify EXPOSE or CMD because they're in my FROM
So this one is super simple; we've got three stanzas. You've seen FROM before, and this time we also have WORKDIR. WORKDIR is basically like running a cd to change directories. You might be tempted to use the RUN command instead and just type RUN cd /usr/share/nginx/html and then do some things.
But really, the best practice in a Dockerfile is to always use a WORKDIR stanza whenever you're changing directories. If your file gets a little complex and you have to move back and forth in your container while it's building, you always want to use the WORKDIR command, because it makes it a lot easier to see in the Dockerfile what you're doing.
In this case, what we're actually doing is changing to the default Nginx directory for its .html files. In the default configuration on Docker Hub, Nginx acts as just a web server, serving static files right off the container's disk.
The last stanza here is the COPY command, and this is the stanza you'll always be using to copy your source code from your local machine, or your build servers, into your container images. In this case, we're just taking our simple index.html and overwriting the file in the Nginx default directory so that it's our custom home page for the web server.
Before we build this, you'll notice we're missing required stanzas like CMD. How can we get away with that? Well, there's already a CMD specified in the FROM image, and when we use FROM, we inherit everything from the Dockerfile we're FROM-ing.
This is how you can chain Dockerfiles together, so that images depend on other images that depend on other images.
$: cd dockerfile-sample-2
$: docker image build -t nginx-custom-html .
$: docker container run --rm --publish 80:80 nginx-custom-html
Our custom index.html is running with the official Nginx image, and our container works.
Usage: docker image tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]
Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
$: docker image tag nginx-custom-html:latest tuanany73/nginx-custom-html
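And, as before, once the image is tagged with your account name you could push it up to Docker Hub (assuming you're still logged in):

$: docker image push tuanany73/nginx-custom-html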