Challenges in Cycle Counting
CPU Isolation in Apache Mesos and DC/OS
TL;DR: DC/OS 1.10 enforces hard CPU limits with CFS isolation for both the Docker Engine and Universal Container runtimes. This gives more predictable performance across all tasks, but it might slow down tasks (and therefore also deployments) that previously consumed more CPU cycles than they were allocated.
Isolating tasks is an essential function of container schedulers such as DC/OS. In this blog post we discuss the different options for isolating CPU resources and the rationale for a recent change in this behavior.
Container isolation has two goals:
- Isolation should prevent one task from accessing critical information of another task. With containers, this is accomplished using namespaces. Namespaces are a Linux kernel feature that provides a separate view of system resources for each container, akin to each container having its own process ID space.
- Isolation should provide fair and predictable access to resources. This is accomplished with Linux cgroups, a Linux kernel feature that allows you to specify limits such as a maximum number of CPU cycles or a maximum memory usage per container.
This blog post is about the different options for limiting access to CPU cycles in a fair and predictable fashion.
Overview of CPU isolation
As DC/OS uses Apache Mesos at its core, it also uses the Mesos isolation mechanisms, which are enforced by containerizers. Mesos offers two different containerizers: Mesos (referred to as UCR in DC/OS) and Docker.
For isolation, both containerizers rely on Linux cgroups.
For CPU isolation, you can use two different cgroups subsystems:
Both solutions have some limitations when it comes to the goal of flexible but predictable performance. This is where the Completely Fair Scheduler (CFS) comes in. It allows strict CPU limits, i.e., specifying the maximum CPU bandwidth available to a group or hierarchy.
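To make "maximum CPU bandwidth" concrete, here is a minimal Python sketch of the arithmetic behind a CFS limit. The kernel expresses the limit as a quota of CPU time per scheduling period. The 100 ms period and the 1 ms minimum quota used here are assumptions for illustration, and the function name is hypothetical; check your own agent's cgroup settings for the real values.

```python
# Sketch: translate a fractional CPU request into a CFS quota.
# Assumption: a 100 ms period and a 1 ms minimum quota.
CFS_PERIOD_US = 100_000
MIN_QUOTA_US = 1_000


def cfs_quota_us(cpus: float, period_us: int = CFS_PERIOD_US) -> int:
    """Return the CPU time (in microseconds) a task may use per period."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    # round() avoids off-by-one results caused by floating-point error.
    return max(round(cpus * period_us), MIN_QUOTA_US)


if __name__ == "__main__":
    for cpus in (0.1, 0.5, 2):
        print(f"cpus={cpus} -> quota={cfs_quota_us(cpus)} us per {CFS_PERIOD_US} us")
```

Under these assumptions, a task with 0.5 CPUs may run for 50 ms in every 100 ms window. Once it has used that time, it is throttled until the next period starts, even if the host is otherwise idle.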
It might seem to be a disadvantage that containers don't receive idle CPU cycles in CFS. However, in production setups predictable performance is a more desirable characteristic.
How does it affect DC/OS users?
When using Apache Mesos, you can choose between CPU shares and CFS strict CPU limitations. If you use CPU shares and your host system has free CPU cycles, your task can consume more CPU cycles than initially configured in your Marathon app definition. If you use CFS strict CPU limitations, your task can consume at most the CPU time specified in your Marathon configuration.
The default configuration for Mesos is to use CPU shares. CFS strict CPU limitations as the default were introduced in DC/OS a while ago, but until recently this configuration was respected only by the Mesos executor and not by the Docker executor. The fix for MESOS-6134, which is part of the latest Mesos release and also included in DC/OS 1.10, removes this limitation.
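The difference between the two modes can be illustrated with a deliberately simplified Python model. It is a toy, not a reproduction of the kernel scheduler: it looks at a single task, treats the host's idle capacity as a fixed number, and the function name and parameters are hypothetical.

```python
# Toy model: how much CPU a single task can actually use.
# Assumption: idle_cpus is the spare host capacity nobody else wants.
def usable_cpus(requested: float, demand: float, idle_cpus: float,
                strict_cfs: bool) -> float:
    """Return the CPUs the task gets under CPU shares or strict CFS limits."""
    if strict_cfs:
        # Hard cap: never more than the configured amount.
        return min(demand, requested)
    # CPU shares: the task may also soak up idle cycles on the host.
    return min(demand, requested + idle_cpus)


if __name__ == "__main__":
    # A task configured with cpus=0.5 that wants 2 CPUs on a host with 1 idle CPU.
    print("shares:", usable_cpus(0.5, 2.0, 1.0, strict_cfs=False))
    print("cfs:   ", usable_cpus(0.5, 2.0, 1.0, strict_cfs=True))
```

In this model the same task runs on 1.5 CPUs with shares but only 0.5 CPUs with strict CFS limits, which is why a task that silently relied on idle cycles can appear slower after the switch.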
If you recently upgraded to DC/OS 1.10 or configured MESOS_CGROUPS_ENABLE_CFS=true in your Mesos agent configuration and you are now seeing slow-running Docker applications or slow deployments, you probably want to take action!
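One way to check whether CFS throttling is behind a slowdown is to look at the cpu.stat file that the kernel keeps for each cgroup in the cpu subsystem (cgroups v1). It reports nr_periods, nr_throttled, and throttled_time. The following Python sketch parses that format. The sample numbers are made up, and where the file lives for a given container depends on your cgroup layout, so the input is passed in as text rather than read from an assumed path.

```python
# Sketch: estimate how often a cgroup was throttled from cpu.stat contents.
def parse_cpu_stat(text: str) -> dict:
    """Parse 'key value' lines from a cgroup v1 cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict) -> float:
    """Fraction of enforcement periods in which the cgroup hit its quota."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = "nr_periods 1200\nnr_throttled 300\nthrottled_time 4500000000\n"
    stats = parse_cpu_stat(sample)
    print(f"throttled in {throttled_ratio(stats):.0%} of periods")
```

A ratio that stays high while the application is slow suggests that the task wants more CPU than its limit allows.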
If you run into such issues, you should increase the required CPU amount in your Marathon app definition. Your apps and deployments are running slowly because they require more CPU cycles than they are allowed to consume, so the easiest fix is to raise the resource requirements: change the cpus property of your app definition to a higher value and test whether this solves your issues.
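As a small illustration, the following Python sketch raises the cpus value of an app definition and prints the resulting JSON. The app id, command, and sizes are hypothetical placeholders, and the helper function is not part of Marathon. How you submit the updated definition (UI, CLI, or API) is up to your setup.

```python
import copy
import json


def with_more_cpus(app: dict, factor: float) -> dict:
    """Return a copy of a Marathon app definition with 'cpus' scaled up."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1 to raise the limit")
    updated = copy.deepcopy(app)  # leave the original definition untouched
    updated["cpus"] = round(app["cpus"] * factor, 3)
    return updated


# Hypothetical app definition used only for illustration.
app = {
    "id": "/example-service",
    "cmd": "sleep 3600",
    "cpus": 0.5,
    "mem": 128,
    "instances": 1,
}

bumped = with_more_cpus(app, 2)
print(json.dumps(bumped, indent=2))
```

Raising the value in steps and re-testing after each change keeps you from reserving more CPU than the task actually needs.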
In some special cases you may want to change the Mesos agent configuration to not use strict CFS CPU limitations. For example, the majority of your applications may have a CPU peak during startup and lower consumption afterwards, or you may have other unusual CPU load patterns.
If you do not want strict CPU separation, you can change the current default behavior. In this scenario you need to change the configuration for your DC/OS installation as well as your Mesos agent configurations. First change this line in your dcos-config.yaml to MESOS_CGROUPS_ENABLE_CFS=false. Once this is done, you can either perform a DC/OS re-installation or ssh to all Mesos agent nodes, set MESOS_CGROUPS_ENABLE_CFS=false in the agent configuration there, and restart the Mesos agent process with sudo systemctl restart dcos-mesos-slave. If you are considering changing this configuration, you should also have a look at the Mesos oversubscription feature.