Whether you're all in on Docker or still considering making the leap, you'll want to read this post. It covers one of the basic issues you're likely to encounter when using Docker in a cluster setup: How do I get a new version of a Docker image to all nodes?
Docker's default mechanism to distribute images is called a registry, a service responsible for hosting and distributing images. You can either use it in a hosted fashion or you can run your own private copy of it. The trade-offs to consider when choosing one over the other are discussed in the rest of this post.
The main functionality of the registry is the lookup of a certain image based on its name. The registry organizes Docker images in repositories, or repos for short (likely inspired by Git). These are collections of related images, typically providing different versions of an app.
Tags are used to distinguish images from each other. A tag is simply an alphanumeric identifier attached to the image. It could read latest, but a good practice is to use a build hash from the CI/CD pipeline to unambiguously identify the image as it travels from the developer's laptop to a production system. A repository offers operations such as pushing (uploading an image into a repo) and pulling (downloading an image from a repo).
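As a minimal sketch of that practice, the following tags an image with a CI-provided build hash instead of latest (the image name, the hash, and the pipeline variable are all hypothetical; adapt them to your setup):

```shell
# Hypothetical build hash as produced by a CI/CD pipeline,
# e.g. BUILD_HASH=$(git rev-parse --short HEAD)
BUILD_HASH="9f8e7d6"

# Hypothetical repo in the user namespace; the tag is the build hash,
# not "latest", so the image is unambiguously identifiable.
IMAGE="mycompany/myapp:${BUILD_HASH}"
echo "${IMAGE}"

# With a Docker daemon available, the pipeline would then run:
#   docker build -t "${IMAGE}" .
#   docker push "${IMAGE}"
```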
The complete address of a Docker image, including its version, has the general form registry/repo:tag.
A Docker address like ubuntu is in the root namespace, which is controlled by Docker Inc. and reserved for official images on the Docker Hub, while names prefixed with a username, such as mhausenblas/ntil, belong to the user namespace. So, for example, if you want to use Nginx in version 1.9 from the Docker Hub, you would say: hub.docker.com/_/nginx:1.9. This could actually be expressed as just nginx:1.9, because the default for the registry is the Docker Hub.
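To make the defaulting concrete, here is a small sketch; docker.io/library/... is the Docker Hub's canonical address form for official images, and both spellings below refer to the same image:

```shell
# Fully-qualified address: registry (docker.io) plus the "library"
# namespace used for official images on the Docker Hub.
FULL="docker.io/library/nginx:1.9"

# Short form: registry and namespace fall back to their defaults.
SHORT="nginx:1.9"

# With a Docker daemon available, these two pulls are equivalent:
#   docker pull "${SHORT}"
#   docker pull "${FULL}"
echo "docker pull ${SHORT}   # same as: docker pull ${FULL}"
```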
The most established hosted Docker registry is the Docker Hub. It comes with a nice and usable UI, and the free plan allows for one private repo and as many public repos as you like. You have two options: either you build locally, say on your laptop, and simply docker push the Docker image. This is for quick-and-dirty cases, when you're learning or playing around, for example.
Every production-grade use should go via Automated Builds. This is a fancy way of saying that the Dockerfile in question lives in a Git repo (GitHub and Bitbucket are supported at time of writing), and whenever changes are detected, the Docker image is built and made available on the Hub. While it takes a few minutes to set up an Automated Build, it is immediately worth it once you realize that without the Git repo backing you don't have any description or proper history.
This popular player on the registry market is part of the Google Cloud Platform, and the company is playing to its strong networking advantage with it. While in my experience it's less straightforward to use than the Docker Hub, it's still convenient enough if you're already using the Google Cloud Platform in its entirety.
In other words: if you want to use the Google registry standalone, you might not be as happy as with the Docker Hub, since there are a few dependencies you'll have to take into consideration (for example, you'll need to install the Google Cloud SDK client) and the images are stored in Google Cloud Storage buckets.
However, if access control and security in general are important to you, this might be the best choice.
This is a free service (for public repositories) with a nice workflow built in. It allows for fine-grained updates based on regexps on Git branches, and offers a couple of notification options (from email to Webhooks), an audit trail, and build logs. The pricing is based on your usage and, aside from the rather limited authentication support (OAuth-only at time of writing), it is an excellent choice.
The most recent addition to the hosted Docker registry market is AWS ECR, a shiny new offering fully managed by AWS. While it is tightly integrated with the Amazon Elastic Container Service, you can also use it as a generic Docker registry.
In many (if not all) production environments it pays to run your own private registry, and doing so is straightforward. In the Mesosphere Datacenter Operating System, for example, using Marathon, you simply run the following commands:
$ cat > private-registry.json <<EOF
{
  "id": "/private-registry",
  "cpus": 0.5,
  "mem": 1024,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "registry:2",
      "network": "HOST"
    }
  }
}
EOF
$ dcos marathon app add private-registry.json
You're in full control, there are no external dependencies in the CD pipeline, and you can't beat the speed of a local network connection; even if you have a fiber-optic connection to one of Google's or AWS's datacenters, this option is faster. Please note that the above example, while complete, should not be considered production-ready; there are certainly more steps necessary to make it resilient against a range of failures.
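Once the private registry is up, pushing to it is a matter of re-tagging the image with the registry's address. A minimal sketch, assuming the registry is reachable at registry.marathon.mesos:5000 (a hypothetical host and port; use whatever address your setup exposes):

```shell
# Hypothetical address of the private registry from the example above.
REGISTRY="registry.marathon.mesos:5000"

# Hypothetical locally built image.
LOCAL="myapp:9f8e7d6"

# The remote name prefixes the local name with the registry address,
# which tells the Docker daemon where to push.
REMOTE="${REGISTRY}/${LOCAL}"
echo "${REMOTE}"

# With a Docker daemon available you would then run:
#   docker tag  "${LOCAL}" "${REMOTE}"
#   docker push "${REMOTE}"
```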
In a properly set-up CI/CD pipeline using Jenkins, for example, the differences between successive Docker images should be small, so the amount of data transmitted per release is not the problem in itself. Rather, it is a matter of scalability. Take for example a 500-node cluster and an average of 10 new Docker images that have to be distributed every two hours (meaning you have a release cycle of 12 new versions per day) to every node in the cluster. If the average difference in image size is only 100 kilobytes, this results in daily ingress network traffic of roughly 6 gigabytes.
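The back-of-the-envelope arithmetic behind that figure:

```shell
# 12 releases/day x 10 images/release x 100 KB delta/image x 500 nodes
KB_PER_DAY=$((12 * 10 * 100 * 500))
echo "${KB_PER_DAY} KB/day"   # 6000000 KB, i.e. roughly 6 GB of daily ingress
```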
However, Docker images usually grow by more than 100 kilobytes between different releases, so the network-related costs of a public cloud-hosted Docker registry might turn out to be significant.
You can also put your Docker images on a storage infrastructure other than the local filesystem. Options include Amazon S3, Microsoft Azure Blob Storage, HDFS or any NFS-mounted network storage system. Especially with larger clusters (in the hundreds of nodes), the combination of a private registry backed by a non-local filesystem is considered a good practice.
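For instance, the open-source registry image can be pointed at an S3 backend through its REGISTRY_* environment-variable configuration; the bucket name and region below are hypothetical placeholders:

```shell
# Hypothetical bucket and region -- substitute your own values.
BUCKET="my-registry-bucket"
REGION="us-east-1"
echo "storage backend: s3://${BUCKET} (${REGION})"

# With a Docker daemon and AWS credentials available, a sketch of the
# corresponding run command (the registry image maps REGISTRY_*
# environment variables onto its configuration file):
#   docker run -d -p 5000:5000 \
#     -e REGISTRY_STORAGE=s3 \
#     -e REGISTRY_STORAGE_S3_REGION="${REGION}" \
#     -e REGISTRY_STORAGE_S3_BUCKET="${BUCKET}" \
#     registry:2
```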
Further, established repository providers such as Artifactory offer Docker registry integration and, indeed, I have already seen a couple of folks using this option.
Last but not least, you can roll your own solution, for example based on what Facebook does: it uses a BitTorrent-based mechanism to distribute its container images.
If there's one thing I hope sticks with you from this post, it's this: Do pay attention to the choice you make concerning Docker image distribution, because it can turn out to be a bottleneck in production.