Platform architecture for the container-centric datacenter

Aug 11, 2016

Karl Isenberg


Whether you call it a stack or a platform or a system, in order to use containers in production you need a set of tools that work together. There is no one tool that solves all problems, so understanding the platform architecture layers is critical to finding or building a well-integrated multi-tool container solution that meets your needs.
Containers and container orchestration are ultimately exercises in distributed computing, so it helps to understand the history of that field. That history is vast: some approaches solved fundamental problems and remain relevant to this day, while others taught us valuable lessons but eventually fell into obscurity. Let's go back to basics and cover how distributed computing has evolved. Then we'll better understand the present, and can even take a glimpse into the future.
Revisionist History of Distributed Computing
In the beginning, there were computers. Then came the internet to give all those computers something useful to do. Chatrooms, dancing babies, cat memes, and all sorts of other important stuff.

Monolithic Web Application
So exciting were these magically connected boxes that everyone went out and created websites and web applications. Pretty soon, users caught on, traffic spiked, and websites everywhere locked up and stopped responding. So, money was thrown at the problem, hardware was purchased, and the world of distributed computing came to the masses.

Distributed Scale and Availability
With distribution came scale and availability. Unburdened load balancers sat in front and distributed load. People everywhere were finally able to get back to catching their Pokémon! But suddenly a lot more system administrators were needed to set up machines and install dependencies and manage OS upgrades and update web apps when they changed. Then the sysadmins leveled up and started automating deployments with scripts.

Service-Oriented Architecture
Then developers learned about Service-Oriented Architecture and broke up their monoliths into front-ends, back-ends, and other services communicating over network-based interfaces. Message brokers, caches, and databases added to the complexity. By now, there's an army of sysadmins—they're writing more code, they're demanding to be called operations engineers, and they're asking for more salary.
Pretty soon, the ops guys are going crazy managing scripts and they start using configuration management tools like Puppet and Chef. They start worrying about data replication and backups and port management and idempotent artifacts. As the services proliferate, demand for new machines increases. Developers and product managers get frustrated that it takes so long to spin up new machines.

Virtualized Infrastructure
Then comes server virtualization and a new virtualized infrastructure layer. Now it takes just minutes to requisition a new machine instead of days. And with it comes greater isolation between services and more flexible machine resource allocation. Virtual machines may be on-premises or they may be hosted by a third party, who's saving money by operating at a larger scale and having customers share machines. Suddenly, the tools market is full of provisioning solutions and machine image builders.
But along with all this flexibility and cost savings comes increased networking complexity, IP provisioning and subnet proliferation. At this point, it becomes viable to give each developer their own production-like development and staging VMs. With that power comes the headache of multitenancy, chargeback and account management as new users flood the provisioning and hosting systems.

By now, developers are clamoring for "agile" and dividing themselves into smaller teams and splitting their large services into many small microservices. The single responsibility principle leads developers to build more flexible, maintainable, scalable and replaceable components, but it also dramatically increases the management, integration and networking overhead of putting these microservices into production. Now there are multiple machine types that need to be provisioned independently and more language runtimes, frameworks and libraries to install. Dependency hell has arrived.

Containers to the Rescue!?
And then along comes containerization and container images to solve all your problems! A new container runtime layer manages these containers with daemon agents on each machine. Now there are service-level container images to use for pre-provisioning, packaging, distribution and rapid deployment. Each team can now manage its own service dependencies independently. As operations work shifts to developers, the difference between development and production environments decreases. Hopefully that means you'll hear less "works on my machine!"
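A service-level container image is typically described by a short build file that bakes in the team's dependencies. As a hypothetical sketch (base image, file names, and paths are all illustrative, not a prescribed layout), one team's Python web service might be packaged like this:

```dockerfile
# Hypothetical build file for one team's microservice; names are illustrative.
FROM python:3-slim                     # language runtime pinned by the service team
COPY requirements.txt .
RUN pip install -r requirements.txt    # dependencies baked into the image
COPY app/ /srv/app/
CMD ["python", "/srv/app/server.py"]   # same artifact runs in dev and in prod
```

Because the image carries its own runtime and libraries, the same artifact runs in development and production, which is exactly what shrinks the "works on my machine!" gap.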
So all your problems are solved, right? Not so fast… As the microservice transition accelerates, developers make more and more services, more and more containers. The poor ops guys start pulling out their hair trying to get these containerized services into production, placing containers on VMs one at a time, planning in shared spreadsheets, and building giant configuration management decks.

Container Orchestration
The manual placement eventually becomes automated scripts. The scripts eventually become software tools. The tools mature, grow, and add functionality, as tools do. Placement becomes scheduling as lifecycle events start to be handled automatically. At this point container orchestration emerges from the primordial ooze of automation, melding a cluster of machines into a single containerized layer. But what is orchestration, exactly?

Container Orchestration Breakdown
At first, scheduling just involved distributing containers and services across machines, but those placement and event-handling decisions become more and more complicated as scale and diversity increase.
Robust resource management quickly becomes a requirement in order to maximize utilization and minimize interference between containers competing for resources. And as placement is automated, services have to change hard-coded assumptions into configuration options and configuration options into deployment definition templates. Yesterday's declarative deployments turn into today's automated definition generation. And all these services still need to be able to automatically discover their remote dependencies and have network channels to talk to them.
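At its core, the placement problem above is a bin-packing exercise: fit containers with resource requests onto machines with finite capacity. Here is a minimal first-fit sketch in Python (all machine names and resource numbers are made up for illustration; real schedulers like Mesos with Marathon also weigh constraints, failure domains, and fairness):

```python
def place(containers, machines):
    """Assign each (id, cpu, mem) container to the first machine that fits it.

    Returns {container_id: machine_id}; raises if capacity runs out.
    Mutates the machines' free-capacity counters as containers are placed.
    """
    placement = {}
    # Place the largest containers first to reduce fragmentation.
    for cid, cpu, mem in sorted(containers, key=lambda c: (-c[1], -c[2])):
        for machine in machines:
            if machine["cpu"] >= cpu and machine["mem"] >= mem:
                machine["cpu"] -= cpu   # reserve resources on this machine
                machine["mem"] -= mem
                placement[cid] = machine["id"]
                break
        else:
            raise RuntimeError(f"no machine can fit {cid}")
    return placement

# Illustrative cluster: two machines with free CPU shares and memory in MB.
machines = [
    {"id": "node-1", "cpu": 4.0, "mem": 8192},
    {"id": "node-2", "cpu": 2.0, "mem": 4096},
]
containers = [
    ("web", 1.0, 1024),
    ("cache", 0.5, 2048),
    ("db", 2.0, 4096),
]
print(place(containers, machines))
```

Even this toy version shows why orchestration is more than scripting: the moment placement is automated, every hard-coded "runs on node-1" assumption in a service has to become configuration.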
You can't fire your ops team just yet...
Container orchestration is still pretty new, but there are already about a dozen competing tools to choose from! Many go beyond orchestration, providing full container platforms or application platforms with integrated container orchestration. Some are even hosted or capable of being hosted as multitenant cloud platforms.

What exactly IS a cloud platform?
On the Shoulders of Giants
"Platform" is a really generic term. Even within the context of computing it really just means a base of technologies on which other technologies or processes are built or run. The computing space is full of higher-level platforms that build on top of lower level platforms, raising the tide that lifts all boats.
Most of these platform layers are easily understood. For example, in the on-premises world you have the following:
  • Infrastructure Platforms (e.g., OpenStack, VMware vSphere)
  • Container Platforms (e.g., Kubernetes, Rancher)
  • Application Platforms (e.g., Cloud Foundry, Red Hat OpenShift, Deis)
All of these platform types also have their own "as a service" equivalents running in the cloud:
  • Cloud Infrastructure Platforms (e.g., AWS, Microsoft Azure, Google Compute Engine)
  • Cloud Container Platforms (e.g., Google Container Engine, Azure Container Service, Amazon EC2 Container Service)
  • Cloud Application Platforms (e.g., Heroku, Google App Engine, Pivotal Web Services, IBM Bluemix)
In the above layer cake, an infrastructure platform combines the management of networking and virtual machines, hiding the hardware from the user. Container platforms simply expose containers to the user and facilitate their management in a distributed system across multiple virtual or physical machines. Application platforms, on the other hand, may or may not use containers and instead present an application interface to the user, often using Heroku-style buildpacks to create packages like container images.
Of course, to add to the complexity and confusion, some layers don't build on or expose lower-level platforms. For example, Cloud Foundry uses containers internally, but hides the container platform layer from the user.
On top of the distinction between platforms and cloud platforms, many solutions also provide users, groups, namespacing and security permissions: the ingredients for multitenancy. These systems are often suffixed with "as-a-service" because they can be hosted and managed by a central group and shared by many user groups, whether the solution itself runs in the cloud, in your company's datacenter, or across a hybrid of the two.
Historically speaking, application platforms tend to predate container platforms, but while application platforms are often great for web applications, they're not always flexible enough to support complex distributed systems or stateful services, and they often lock the user into a specific set of supported programming languages. With the popularity of containers exploding, the latest generation of platforms provide or reveal a lower-level container image abstraction for users to build on. This is the new container platform layer—a truly polyglot platform, flexible enough for both stateless and stateful services, from 12-factor web applications to distributed data storage and processing services.
However, the proliferation of container platforms brings with it new concerns about how to perform application lifecycle management and "Day 2" operations, like debugging and maintenance. The tools used to achieve scalability, availability, flexibility, portability, security and usability at the virtualization layer are simply not designed to work at the containerization level. So a whole new ecosystem of tools is emerging to deal with this, most of which are distributed services themselves!

Distributed Containerized Operating Systems
As an ecosystem of tools and services grows, it also becomes increasingly necessary to have a method of package management and a service catalog from which to download, install and configure third-party services. More and more system services get baked into the container platform until it stops looking like just a distributed kernel and starts looking like a distributed operating system. Just like a machine operating system, a distributed operating system must perform orchestration, support secure multitenancy, and separate service layers into system space and user space for permission and access management.
The Datacenter Operating System
DC/OS—created by Mesosphere and available as an open source project—is the first distributed operating system to be built on top of containerization technology. Being built to manage containers sets DC/OS apart from previous distributed operating systems. DC/OS decouples jobs and services from the system they run on, increasing the portability, flexibility, and fault tolerance of the jobs and services as well as the system itself.
Just as container platforms build on top of infrastructure platforms, DC/OS builds on top of Mesosphere's production-proven container platform to provide a robust distributed operating system. Apache Mesos acts as the cluster resource manager and distributed kernel while Marathon handles container orchestration. These core open-source components are joined by a fleet of plugins, tools, and system services to provide additional higher and lower level capabilities:
  • Resource isolators manage CPU, memory, disk space, volumes, and even GPU to enable colocation of stateless microservices, stateful data services, and high-powered real-time and batch jobs.
  • System services provide virtual networking, container IPs, load balancing, task metrics, system diagnostics, dashboards, and a browser-based control center to facilitate both day one and day two operations.
  • Package management gives the user access to distributed data storage, stream processing, message queues, continuous integration, and other community services.
  • A flexible installer makes it easy to deploy a production-ready system on-premises or in the cloud.
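In day-to-day use, the package management and orchestration described above are driven from the DC/OS command line. A session might look roughly like this (the package and file names are illustrative, and exact commands and flags have varied across releases):

```shell
dcos package search kafka          # browse the service catalog
dcos package install cassandra     # install a distributed data service
dcos marathon app add my-app.json  # hand a container definition to Marathon
dcos task                          # list running tasks across the cluster
```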
While open source DC/OS focuses on getting the distributed operating system right, Mesosphere Enterprise DC/OS provides the advanced functionality that allows DC/OS itself to be managed as a service: user and group management, end-to-end encryption, secrets management, and access controls.
Looking Forward
The platform and infrastructure ecosystem continues to cyclically expand and contract. Each new layer brings a new set of competing tools until a few winners outpace the rest. Eventually those winners gain more features until they become unwieldy, too complex to change or use. Then a new layer emerges, hiding lower level complexity and providing a new platform on which to build. The lower levels gain interoperability, maturity and stability as they move towards standardization, while the higher levels test out new abstractions, providing power and agility to a new generation of software.
While history is always repeating itself, hopefully every iteration gets a little better than the last. As we saw with the Operating System Wars and the Container Wars, multiple container orchestration solutions have now been developed, with container platforms emerging around them. The competition in this space is fierce. The fight isn't just over which solution is the best, but also who can get the most mind share among thought leaders and the tech industry as a whole.
While Silicon Valley may have moved on to the next new thing, the rest of the world is still trying to understand containerization and the business cases that make it compelling. Unfortunately, containerization by itself is more interesting than useful. To get the most out of containers, you need a robust platform that can harness the development, operational, and security benefits into a well-integrated, internally consistent system.
The word is already out, though. There are twice as many orchestration solutions this year as there were last year. If history is any judge of the future, there's unlikely to be just one winner of the Container Orchestration Wars. Instead, solutions will compete until demand for a higher layer forces them to collaborate. In the meantime, development of the next layer of the stack is well underway at Mesosphere. The future of the datacenter looks a lot like a single server, only more containerized and more distributed.
I'm looking forward to the day when container platform vendors collaborate on a common API for the distributed operating system: a POSIX for the datacenter!
