Stateful Services - The Black Sheep of the Container World
For more than five years, DC/OS has enabled some of the largest, most sophisticated enterprises in the world to achieve unparalleled levels of efficiency, reliability, and scalability from their IT infrastructure. But now it is time to pass the torch to a new generation of technology: the D2iQ Kubernetes Platform (DKP). Why? Kubernetes has now achieved a level of capability that only DC/OS could formerly provide and is now evolving and improving far faster (as is true of its supporting ecosystem). That’s why we have chosen to sunset DC/OS, with an end-of-life date of October 31, 2021. With DKP, our customers get the same benefits provided by DC/OS and more, as well as access to the most impressive pace of innovation the technology world has ever seen. This was not an easy decision to make, but we are dedicated to enabling our customers to accelerate their digital transformations, so they can increase the velocity and responsiveness of their organizations to an ever-more challenging future. And the best way to do that right now is with DKP.
With the explosion of microservices and the desire for cloud portability, the containerization of applications has taken the datacenter by storm. Everyone from the smallest startups to the largest enterprises is now experimenting with or adopting containers.
Containers offer significant value to businesses, including increased developer agility and the ability to move applications between virtual machines, cloud instances, bare metal servers, and across data centers. Taking on this new design pattern also provides elasticity for application services (scale up and scale out) as well as high availability. Containers are a significant step forward for IT teams when implemented properly.
Yet, as organizations embark on the journey to containerization, they often isolate stateless workloads in containers from their stateful workloads like data services, building new silos in their infrastructure. This isolation creates additional complexity and challenges, so finding a way to manage these systems on a single platform is incredibly valuable in the age of data-driven, microservices-based applications.
So what exactly is the difference between a stateless and stateful workload?
Stateless applications are usually microservices or containerized applications that don't "store" data. Web services (such as front-end UIs and simple, content-centric experiences) are often great candidates for stateless applications since HTTP is stateless by nature. Any data that flows through a stateless service is usually transitory, and the state (such as a transaction) is stored in a separate "back-end" service such as a database. Stateless containers may still use some form of storage, but that storage is most often ephemeral: if the container restarts, anything stored is lost. If we think about a typical Unix server, processes that store data in tmpfs use a similar model. As organizations look to adopt containers, they will typically start with stateless containers, as these can more easily be adapted to the new architecture, separated from their monolithic application codebase, and scaled independently.
Stateful applications, on the other hand, are services that require backing storage, and keeping state is critical to running the service. Databases such as Cassandra, MongoDB, Postgres, and MySQL are great examples. They require some form of persistent storage that will survive service restarts.
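To make the contrast concrete, here is a minimal Python sketch. The class names and the in-memory dict standing in for a database are purely illustrative, not part of any real framework; the point is that a stateless tier can be destroyed and recreated freely because the state it touches lives in a separate backing store.

```python
class BackendStore:
    """Stands in for a stateful service (e.g. a database):
    the data it holds must survive restarts of the front-end tier."""

    def __init__(self):
        self._rows = {}

    def save(self, key, value):
        self._rows[key] = value

    def load(self, key):
        return self._rows.get(key)


class StatelessHandler:
    """Stands in for a stateless web tier: it keeps no data of its
    own, so any replica (or a restarted container) is interchangeable."""

    def __init__(self, store):
        self._store = store

    def handle_request(self, key, value):
        # All durable state flows through to the backend service.
        self._store.save(key, value)
        return "ok"


store = BackendStore()
handler = StatelessHandler(store)
handler.handle_request("order-42", {"total": 99})

# Simulate a container restart: the stateless tier is recreated,
# yet the transaction is still available from the backing store.
handler = StatelessHandler(store)
print(store.load("order-42"))  # {'total': 99}
```

If the handler had kept the order in its own memory instead, the simulated restart would have lost it, which is exactly why such data belongs in a stateful service with persistent storage.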
So why, within the container world, is running stateful workloads a challenge?
There are a few things to consider in terms of running a stateful workload.
The first challenge is resource isolation. Many container orchestration solutions on the market take a best-effort approach to resource allocation, including memory, CPU, and storage. While this may be acceptable for stateless apps, it can be catastrophic for stateful services, where loss of performance may mean loss of customer transactions or data.
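As a sketch of what explicit guarantees look like in practice, a Marathon app definition on DC/OS reserves CPU, memory (MiB), and disk (MiB) per instance rather than relying on best effort; the app id and figures below are placeholder values for illustration:

```json
{
  "id": "/db/cassandra-node",
  "instances": 3,
  "cpus": 2,
  "mem": 8192,
  "disk": 10240
}
```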
The second challenge is determining what type of backing storage is required. There are many kinds of storage, such as local file systems; volumes; object stores; block devices; and shared, network-attached, or distributed filesystems, to name a few. Each of these storage systems has different characteristics, and each stateful data service may require or support a different storage type.
Given that we are deploying distributed containerized applications, we must consider how we attach services to the storage and how the relationship between the individual service instances and the storage is managed. In some cases, the stateful service is distributed by nature. Most NoSQL databases subscribe to this model and are optimized for a scale-out architecture. In those cases, each instance has its own dedicated storage and the application itself has semantics for synchronizing data. For that use case, local, dedicated, persistent storage optimized for performance and resource isolation is key.
In other cases, stateful services want a shared backend volume. Services that can take advantage of a shared backend storage system are better suited to external storage, which may be network-attached and optimized for sharing between instances. External storage in that case may be implemented as some form of storage fabric, distributed or shared filesystem, object store, or other "storage service".
The platform you deploy your apps on must be adaptable to different storage patterns.
The third challenge of stateful services is the ongoing operation and management of the full lifecycle of the service. While it may be easy to run a single instance of a database container for testing, proper production deployment and operation of stateful services requires highly available deployment, scaling, and error-handling procedures. Most of today's stateful database technologies were originally designed for a non-containerized world. Their operational instructions are very specific to the technology and can sometimes be version-specific. Trying to map the generic primitives of a container orchestration platform onto stateful services is usually a time-consuming and error-prone exercise. Organizations may start out by trying to simply containerize these stateful services. They must then develop specific tooling to coordinate multiple related instances for high availability, or take on other complex strategies to deploy, manage, and operate these services. This results in manual overhead or the development of custom automation for each service, which can pose significant operational risk.
Mesosphere DC/OS simplifies the operation of complex data services
DC/OS is built on top of Apache Mesos, the production-proven distributed systems kernel that powers many of the world's largest web-scale companies and enterprises. Mesos' two-level scheduler architecture encapsulates application-specific logic on top of Mesos resource management. This means you can run different types of services on the same infrastructure, reducing your overall data center footprint while optimizing resource utilization.
Mesos' architecture provides built-in automation to manage the lifecycle of services, including deployment, placement, security, scalability, availability, failure recovery, and in-place upgrades. Many of the other technologies on the market try to provide a subset of that functionality through manual configuration files and external, out-of-band tools. These "fixes" often depend on the skill of the implementation team, increase operational overhead, and are difficult to maintain in the face of employee attrition.
DC/OS simplifies storage management for data services
Mesosphere DC/OS gives customers the ability to run all workloads, stateless and stateful, on a single platform, with the confidence and ease of use required for production deployment. DC/OS provides many benefits when it comes to running stateful services. First and foremost is the ease of deployment of popular distributed data services. Most notable is Apache Cassandra, a distributed database. Cassandra can be deployed with only a few clicks from the DC/OS Universe or a single command from the DC/OS CLI (`dcos package install cassandra`).
To deliver a wide variety of data services, Mesosphere DC/OS provides options for local persistent storage and external volumes, both of which are required by different types of data services. Local persistent storage is "local" to a node within the cluster and is usually the storage resident within the machine: think "internal disks". These disks can be partitioned for specific services and will typically provide the best performance and data isolation. The downside to local persistent storage is that it binds the service or container to a specific node. Distributed services like NoSQL databases work well in this model.
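As an illustration of the local pattern, a Marathon app definition on DC/OS can request a local persistent volume for a task. The sketch below uses placeholder values for the app id, image, and sizes; per the Marathon schema, persistent volume sizes are in MiB and the containerPath is relative to the task's sandbox:

```json
{
  "id": "/postgres",
  "instances": 1,
  "cpus": 1,
  "mem": 2048,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "postgres:9.6" },
    "volumes": [
      {
        "containerPath": "pgdata",
        "mode": "RW",
        "persistent": { "size": 2048 }
      }
    ]
  },
  "residency": { "taskLostBehavior": "WAIT_FOREVER" }
}
```

The residency field pins the task to the node that holds its volume, which is exactly the trade-off described above: the best performance and isolation, at the cost of binding the service to a specific machine.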
External volumes, on the other hand, are typically attached to the container service over the network and can take on various forms. This is ideal for the broadest applicability, as it separates the container from the storage, allowing containers to move freely around the cluster. Cloud providers have storage services that fall into this category, such as Amazon's S3 or EBS; distributed filesystems, such as HDFS, Gluster, or NFS; or storage fabrics, such as Ceph, Portworx, Quobyte, and others. All of these options are available to DC/OS. A basic deployment of DC/OS has a built-in service (REX-Ray) that provides external volume support. REX-Ray integrates directly with a specific set of storage services, giving customers some level of flexibility and choice in a backend storage provider. This includes services like Amazon EBS, EMC ScaleIO, and others. DC/OS also supports a pluggable architecture for Docker volume drivers, which are provided by many other storage services, such as Portworx, HP 3PAR, etc.
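As an illustration of the external pattern, the Marathon schema can instead reference a named external volume through the Docker volume driver interface ("dvdi"); the app id, volume name, and sizes below are placeholder assumptions, and the driver shown is REX-Ray:

```json
{
  "id": "/mysql",
  "instances": 1,
  "cpus": 1,
  "mem": 1024,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "mysql:5.7" },
    "volumes": [
      {
        "containerPath": "/var/lib/mysql",
        "mode": "RW",
        "external": {
          "name": "mysql-data",
          "provider": "dvdi",
          "options": { "dvdi/driver": "rexray" }
        }
      }
    ]
  }
}
```

Because the volume is attached over the network rather than tied to one machine, the task can be rescheduled onto a different agent and reattach the same data.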
Many of these storage services have also been packaged in the DC/OS Universe, just like Cassandra and the other data services, so they can be quickly and easily deployed and operated. This flexibility is what makes it possible for organizations to deploy a single platform on which to deliver modern app stacks with both stateless and stateful workloads.
DC/OS is the only modern platform trusted and supported by many data services vendors
While there are many technologies and frameworks that claim to run databases in containers at small scale or in a PoC, running these services at scale in production is what truly matters. This is what earned Mesosphere DC/OS its stellar reputation, and it is why DC/OS is the only platform that many of the leading data services technology providers trust and support.
The Mesosphere App Ecosystem, "the Universe", now includes more than 100 services, many of which are open source. Organizations also have the freedom to choose supported enterprise offerings from the technology provider, such as Confluent for Apache Kafka, DataStax for Apache Cassandra, Riak from Basho, Redis Labs for Redis, and many other technologies. Mesosphere DC/OS is the only modern platform that includes such supported offerings for data services, a testament to DC/OS's technological and market advantage.
Mesosphere DC/OS provides a broad set of capabilities, options, and freedom of choice for bringing stateful services to the containerized data center, whether by providing purpose-built distributed data services, like Kafka or Cassandra, or by providing options for attaching workloads to various storage backends. This eliminates the silos between stateless container-as-a-service infrastructures and the data services backends that almost any application requires.
DC/OS is the only platform of its kind that can deliver the breadth of services you can simply run (deploy and operate), the security needed by enterprises, the scale, and the flexibility needed for on-premises, hybrid, cloud, and mixed infrastructure models. In doing so, Mesosphere helps organizations achieve the agility, elasticity, and availability they have been looking for in the data center of the next generation.