Docker containers at scale (our take on Docker Swarm)


Feb 26, 2015

For everyone involved with Apache Mesos and the Mesosphere Datacenter Operating System (DCOS), it was a real honor when Solomon Hykes (CTO of Docker) said at DockerCon EU in December that Mesos is the "gold standard" for large-scale production clusters running containers.
And it's a real sign of the maturity of containers — and of what Docker has accomplished — that we are even talking about containers at scale. Modern Linux containers have been around since 2006, when Google contributed cgroups to the Linux kernel, which, together with namespaces, provided the necessary ingredients for containerization. But it was really Docker's work that simplified the creation of containers and made them what they are today: the best way for developers and sysadmins to standardize, configure, and ship applications.
What's the Significance of Docker's Swarm Announcement Today?
Pushing a container into production sounds simple, but there's a lot to it.
Docker makes it really easy to package your app. But you want it to be just as easy on the other side, on the operations side. You want it to be just as easy to throw your app into the cloud — whether on Amazon or your own hardware, private cloud or public cloud — and have it just do what it's supposed to do: run as many times as it needs to, never go down, and never page you. And if you have a new version, the system should let you roll that new version out gracefully. That's what developers care about: pushing code into production and having it run, without having to wear a pager or worry about IT staff. The reality is that today it's simply not that easy.
Docker Swarm was started by Docker as a solution to the problem of seamlessly managing a cluster of containers. So today's unveiling of Swarm is really about Docker having gotten very serious about how enterprises can deploy and manage containers at massive scale.
We think the coolest part of the Docker Swarm announcement is the notion of "batteries included but swappable." Simply put, this means that you can start using Docker Swarm and "swap in" Mesosphere when you need to go to production at scale. We think they made a great community decision there by encouraging customer choice and innovation around scheduling and orchestration, rather than prescribing a single approach.
Mesosphere's Playbook for Managing Containers at Scale
Mesos and the Mesosphere DCOS are specifically designed to manage containers at scale. These high-performance systems have been hardened over many years in production in some of the largest datacenters in the world. Twitter, for example, runs almost entirely on Mesos.
Orchestrating and scheduling containers at scale — on hundreds or thousands of machines — is very different from running containers on a single machine, or even on 100 machines. As you scale past 100 machines, the surface area of possible failures and performance bottlenecks grows dramatically.
At datacenter-scale, machines fail all the time, disks fail all the time, networks fail all of the time. Failure becomes the norm. Managing for failure and making failure transparent requires highly-specialized systems like Mesos and the Mesosphere DCOS. From Mesosphere's standpoint, you want to treat your cluster of computers like a black box — you want to throw your apps at it (such as with Docker Swarm) and have them run reliably, and if something bad happens you want the system to fix itself.
Mesos and the Mesosphere DCOS are specifically designed to manage containers at datacenter scale and solve for the most common challenges faced by enterprises running production-grade applications. For example, the Mesosphere technology is:
Highly available. Production-critical workloads often require five or more nines of reliability and a high degree of automation. Achieving this requires the years of battle-hardening that Mesos has undergone.
Fault-tolerant. When something bad happens — a server, rack, or network goes down — Mesos identifies and mitigates the failure automatically.
Self-healing. When something goes down in a Mesos or Mesosphere DCOS cluster — a rack fails or a set of services dies — those services are automatically restarted and everything is automatically reconnected, like a starfish growing back an arm, with no human intervention.
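The self-healing behavior described above can be sketched as a reconciliation loop: the scheduler compares the number of healthy tasks against the target count and relaunches whatever is missing. This is an illustrative sketch only, not Mesosphere's actual code; the names (`App`, `reconcile`) are hypothetical, and a real scheduler such as Marathon would launch each replacement task on a Mesos resource offer.

```python
from dataclasses import dataclass, field

@dataclass
class App:
    name: str
    target_instances: int
    running: set = field(default_factory=set)  # IDs of healthy tasks

def reconcile(app, next_id):
    """Launch enough tasks to bring the app back to its target count."""
    started = []
    while len(app.running) < app.target_instances:
        task_id = f"{app.name}.{next_id}"
        next_id += 1
        app.running.add(task_id)   # in a real cluster: launch on a Mesos offer
        started.append(task_id)
    return started, next_id

web = App("web", target_instances=3)
started, nid = reconcile(web, next_id=0)      # initial launch: 3 tasks
web.running.discard("web.1")                  # a machine dies, one task is lost
restarted, nid = reconcile(web, next_id=nid)  # self-healing: relaunch just one
```

Because the loop only ever looks at the difference between desired and actual state, the same code path handles the initial deployment and recovery from failure — which is exactly why no human intervention is needed.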
By combining Docker Swarm and Mesosphere, you can use the Docker workflow to build and package your apps and also be confident they can run at scale, reliably, with the Mesosphere DCOS and Mesos managing the orchestration and scheduling.
Datacenter-Scale Multitenancy
In traditional systems, running multiple types of workloads in the same datacenter would require the operations team to dedicate clusters of machines to each workload. So, for example, if you wanted to run Docker Swarm and Spark you would need to create a cluster of machines dedicated to Docker Swarm and another cluster of machines dedicated to Spark. This creates silos in the datacenter where each type of workload requires separate cluster management and cannot easily share data and resources.
Mesos and the Mesosphere DCOS are uniquely capable of running Docker Swarm workloads on the same cluster as other workloads, including Big Data workloads like Spark, Storm, Kafka and Hadoop. This drives up resource utilization while reducing cost and complexity.
With the Mesosphere DCOS, you can install Docker Swarm as a datacenter service with a single command and Docker Swarm workloads will run on the same cluster as other datacenter services, such as Big Data services. With the Mesosphere DCOS, any number of datacenter services can be run simultaneously and Mesos will elastically share resources with Docker Swarm workloads.
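The elastic resource sharing described above comes from Mesos's two-level scheduling model: the master offers free resources to each registered framework (Swarm, Spark, and so on), and each framework accepts only what it needs, declining the rest back to the pool. The following is a heavily simplified sketch of that idea under stated assumptions — the class names and the CPU-only accounting are illustrative, not the Mesos allocator's real logic.

```python
class Framework:
    """A registered datacenter service, e.g. Docker Swarm or Spark."""
    def __init__(self, name, cpus_needed):
        self.name = name
        self.cpus_needed = cpus_needed
        self.cpus_held = 0.0

    def on_offer(self, offered_cpus):
        """Accept part of an offer; decline the remainder to the master."""
        take = min(offered_cpus, self.cpus_needed - self.cpus_held)
        self.cpus_held += take
        return offered_cpus - take  # declined resources go back to the pool

def allocate(total_cpus, frameworks):
    """Offer the cluster's free CPUs to each framework in turn."""
    free = total_cpus
    for fw in frameworks:
        free = fw.on_offer(free)
    return free

swarm = Framework("swarm", cpus_needed=6)
spark = Framework("spark", cpus_needed=10)
leftover = allocate(16, [swarm, spark])
```

Because no framework holds resources it doesn't need, a single 16-CPU cluster serves both workloads with nothing stranded — the opposite of the siloed, one-cluster-per-workload model.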
How it Works: Docker Swarm on Mesos
Docker Swarm on Mesos is written directly against the Mesos API, which also means it's fully compatible with the Mesosphere DCOS. This makes Docker Swarm a first-class citizen in the Mesos and Mesosphere ecosystems, alongside other datacenter services such as Marathon, Chronos, Spark, Storm, Hadoop, and Cassandra.
The benefit of integrating Docker Swarm through the Mesos API is that it provides the most flexibility in using Mesos features while maintaining full compatibility with Docker Swarm. Docker Swarm on Mesos will benefit from the latest advances in Docker Swarm while also leveraging the unique features and scalable architecture of Mesos and the Mesosphere DCOS.
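To give a feel for what "written directly against the Mesos API" means, here is a minimal sketch of the callback shape a Mesos framework scheduler implements. The method names loosely mirror the real `resourceOffers` and `statusUpdate` scheduler callbacks, but this is a stand-alone illustration with hypothetical data structures, not the actual Mesos client library.

```python
class SwarmScheduler:
    """Toy scheduler illustrating the Mesos framework callback pattern."""
    def __init__(self):
        self.launched = []  # agents we have launched a container on

    def resource_offers(self, offers):
        # Called by the Mesos master with available resources; launch a
        # container on each offer with enough capacity.
        for offer in offers:
            if offer["cpus"] >= 1:
                self.launched.append(offer["agent"])

    def status_update(self, agent, state):
        # Called when a task changes state (running, failed, ...).
        if state == "TASK_FAILED" and agent in self.launched:
            self.launched.remove(agent)

sched = SwarmScheduler()
sched.resource_offers([{"agent": "a1", "cpus": 4},
                       {"agent": "a2", "cpus": 0.5}])
```

The key design point is inversion of control: Mesos drives the scheduler through callbacks, so Swarm's own placement logic plugs in unchanged while Mesos handles the cluster-wide concerns.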

Ready to get started?