Introducing Marathon 1.5
We are excited to announce the release of Marathon 1.5.1. Marathon is the container orchestrator for Apache Mesos and DC/OS. Marathon 1.5 is part of DC/OS 1.10 and is also available for download as a standalone binary. Alongside bug fixes and performance and scalability improvements, this release adds support for file-based secrets, support for multiple networks, a backup and restore mechanism, and a plugin interface to customize offer matching.
Backup & Restore
Marathon 1.5 adds built-in backup and restore functionality. The complete current state of Marathon, which is kept in the persistent data store, can be backed up to an external file or to an external storage provider. Restoring from a backup brings Marathon back to the exact state it was in when the backup was created.
For detailed information, please see the related Marathon docs page.
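As a rough illustration, a backup or restore can be requested when asking the current leader to step down. The sketch below only builds the request URL. It assumes the `backup` and `restore` query parameters on `DELETE /v2/leader` as described in the Marathon docs; the host name and backup location are hypothetical, so please verify the details against the docs page before relying on them.

```python
from urllib.parse import urlencode


def abdication_url(marathon_base, backup_uri=None, restore_uri=None):
    """Build the URL for a DELETE request against Marathon's leader endpoint.

    Assumption: the endpoint accepts optional `backup` and `restore`
    query parameters, each holding a storage location URI.
    """
    params = {}
    if backup_uri:
        params["backup"] = backup_uri
    if restore_uri:
        params["restore"] = restore_uri
    query = "?" + urlencode(params) if params else ""
    return marathon_base.rstrip("/") + "/v2/leader" + query


# Hypothetical host and backup location, for illustration only.
print(abdication_url("http://marathon.example.com:8080",
                     backup_uri="file:///var/backups/marathon-1"))
```

The resulting URL would then be sent with an HTTP DELETE by whatever client you prefer.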
Unreachable Strategy
Recent changes in Apache Mesos introduced the ability to handle a temporarily unavailable agent. In this case, the Marathon tasks running on that agent are placed in the TASK_UNREACHABLE state. This allows a node to disconnect from and reconnect to the cluster without having its tasks replaced. To give a task the chance to reconnect to the cluster, the default configuration waits 75 seconds before restarting that task. Before the TASK_UNREACHABLE state existed, Marathon would usually restart a task in less than a second. To make this behavior flexible, it is now possible to configure unreachableStrategy for apps and pods so that unreachable instances are replaced either instantly or after a custom timeout (during which the task might already have become reachable again).
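To make this more concrete, here is a minimal sketch that attaches an unreachableStrategy to an app definition before it is submitted to Marathon. The field names `inactiveAfterSeconds` and `expungeAfterSeconds` follow the Marathon app schema as we understand it; the app itself and the chosen timeouts are made up for illustration.

```python
import json


def with_unreachable_strategy(app, inactive_after_seconds, expunge_after_seconds):
    """Return a copy of the app definition with an unreachableStrategy set."""
    updated = dict(app)
    updated["unreachableStrategy"] = {
        # How long an instance may be unreachable before a replacement starts.
        "inactiveAfterSeconds": inactive_after_seconds,
        # How long before the unreachable instance is dropped from Marathon's state.
        "expungeAfterSeconds": expunge_after_seconds,
    }
    return updated


# Hypothetical app definition, for illustration only.
app = {"id": "/example/web", "cmd": "python3 -m http.server 8080",
       "cpus": 0.1, "mem": 64, "instances": 2}

replace_instantly = with_unreachable_strategy(app, 0, 0)
wait_five_minutes = with_unreachable_strategy(app, 300, 600)

print(json.dumps(replace_instantly, indent=2))
```

The first variant trades tolerance for fast recovery; the second tolerates short network partitions at the cost of running below the target instance count for a while.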
Networking Improvements
Marathon 1.5 introduces several networking improvements to better support multiple container networks. To that end, the field networkNames has been added to the app container's ContainerPortMapping and to the pod Endpoint.
Additionally, container port discovery has been improved: a pod or app can now specify which container network(s) a port name, protocol, and so on is associated with. Discovery labels are now generated for the container networks associated with ports.
Unfortunately, this introduces some breaking changes: several deprecated networking fields will no longer be generated for app JSON.
Marathon will continue to accept old app JSON containing these fields as it did in 1.4. However, applications that use deprecated fields will be normalized into a canonical representation, so external tooling can no longer rely on these fields and needs to be adjusted.
See the networking documentation for details concerning the new API.
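As an illustration of the new format, the sketch below describes an app attached to two container networks, with each port mapping pinned to one of them via networkNames. The network names, image, and app id are hypothetical, and the small helper is not part of Marathon; it is just a sanity check you might run in your own tooling.

```python
import json

# Hypothetical app joined to two container networks.
app = {
    "id": "/example/multi-net",
    "cpus": 0.1,
    "mem": 64,
    "networks": [
        {"mode": "container", "name": "frontend"},
        {"mode": "container", "name": "backend"},
    ],
    "container": {
        "type": "MESOS",
        "docker": {"image": "nginx"},
        "portMappings": [
            {"containerPort": 80, "name": "http", "protocol": "tcp",
             "networkNames": ["frontend"]},
            {"containerPort": 9090, "name": "metrics", "protocol": "tcp",
             "networkNames": ["backend"]},
        ],
    },
}


def undeclared_network_names(app_def):
    """List network names used by port mappings but missing from `networks`."""
    declared = {n["name"] for n in app_def.get("networks", []) if "name" in n}
    used = set()
    for mapping in app_def.get("container", {}).get("portMappings", []):
        used.update(mapping.get("networkNames", []))
    return sorted(used - declared)


print(json.dumps(app, indent=2))
print(undeclared_network_names(app))
```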
File-based Secrets
Marathon provides a pluggable interface to integrate secret store providers such as Vault.
With Marathon 1.5, this interface has been extended to support file-based secrets, which can be mounted into the Mesos sandbox.
Please note that there is not yet an OSS implementation of this interface. For detailed information, please see the related Marathon docs page.
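For orientation, an app definition that mounts a secret as a file might look roughly like the sketch below. It assumes the `secrets` map and the `secret` volume type described in the Marathon docs; the secret name, source path, and container path are hypothetical, and a secrets plugin must be installed for Marathon to resolve them.

```python
import json

# Hypothetical app that mounts one secret as a file in the sandbox.
app = {
    "id": "/example/secret-file",
    "cmd": "sleep 3600",
    "cpus": 0.1,
    "mem": 32,
    "container": {
        "type": "MESOS",
        "volumes": [
            # The secret's content appears at this path, relative to the sandbox.
            {"containerPath": "secrets/db-password", "secret": "dbPassword"},
        ],
    },
    "secrets": {
        # Maps the local name to a location in the secret store (assumed path).
        "dbPassword": {"source": "/example/db-password"},
    },
}


def unresolved_secret_volumes(app_def):
    """List secret names referenced by volumes but not declared in `secrets`."""
    declared = set(app_def.get("secrets", {}))
    volumes = app_def.get("container", {}).get("volumes", [])
    return sorted({v["secret"] for v in volumes if "secret" in v} - declared)


print(json.dumps(app, indent=2))
print(unresolved_secret_volumes(app))
```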
Customizable offer matching
Marathon now has a pluggable interface for custom logic during offer matching. Such plugins can provide custom filters for offers, for example in these use cases:
- Analytics. If a task fails, for example, 5 times within 5 minutes, we can assume that it will fail again and reject new offers for it.
- Binding to agents. For example, agents can be marked as belonging to a primary or secondary group, and a task can be marked with a group name. A plugin can then schedule the task onto primary agents and, if all primary agents are busy, fall back to secondary agents.
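The decision logic behind these two use cases can be sketched in a few lines. Note that this is not the Marathon plugin API (real plugins are written against Marathon's JVM plugin interface); the function names, group labels, and thresholds below are all hypothetical and only illustrate the kind of filter a plugin could implement.

```python
def failing_repeatedly(failure_times, now, max_failures=5, window_seconds=300):
    """True if at least `max_failures` failures happened in the last window."""
    recent = [t for t in failure_times if now - t <= window_seconds]
    return len(recent) >= max_failures


def accept_offer(agent_group, primary_agents_busy, failure_times, now):
    """Decide whether an offer from an agent in `agent_group` should be used.

    agent_group: "primary" or "secondary" (hypothetical agent labels).
    primary_agents_busy: True if no primary agent can currently take the task.
    failure_times: timestamps (seconds) of the task's previous failures.
    """
    # Analytics: stop matching offers for a task that keeps failing.
    if failing_repeatedly(failure_times, now):
        return False
    # Binding: prefer primary agents, fall back to secondary when needed.
    if agent_group == "primary":
        return True
    return bool(agent_group == "secondary" and primary_agents_busy)


print(accept_offer("secondary", primary_agents_busy=True, failure_times=[], now=1000.0))
```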
Outlook for 1.5.2
With Marathon 1.5.2, Mesos maintenance primitives will become fully usable. These primitives make it possible to configure a scheduled maintenance or unavailability window for Mesos agents.
In Marathon 1.5.2, it will be possible to opt in to this feature so that Marathon respects announced unavailability. During the configurable draining time, no new tasks will be started on the affected agents.
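One way to picture the draining behavior is as a simple time check on each offer. The sketch below is our own illustration, not Marathon's implementation: it assumes an offer carries the start and duration of the agent's maintenance window, and that offers are declined from the start of the draining period until the window ends. All names are hypothetical.

```python
def should_decline_offer(now, window_start, window_duration, draining_seconds):
    """True if no new task should be started on the agent at time `now`.

    All arguments are in seconds. The agent is avoided from
    `draining_seconds` before the maintenance window opens until it closes.
    """
    drain_begins = window_start - draining_seconds
    window_ends = window_start + window_duration
    return drain_begins <= now < window_ends


# Maintenance starts at t=1000 for 600 seconds, with a 300 second draining time.
for t in (600, 700, 1200, 1600):
    print(t, should_decline_offer(t, 1000, 600, 300))
```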