
YARN on Mesos will bridge the world of Mesos and big data


Feb 11, 2015

D2iQ


On August 24, 2014, in a mid-sized lecture hall in a Chicago hotel, Mohit Soni and Renan DelValle of eBay gave a presentation at the first annual MesosCon. Their deceptively straightforward topic? Running multiple resource managers on the same cluster in a datacenter with both YARN and Mesos.

 

Today, Mesosphere and MapR are proud to announce project Myriad, an open source framework for running YARN on Mesos that integrates the two major powerhouses in the datacenter—Mesos and Hadoop—and makes them fully compatible technologies. Project Myriad extends the work by Soni and DelValle and invites a larger community to participate in the project.

 

Project Myriad is currently being shepherded by the Apache Mesos project, but will soon be submitted to the Apache Incubator to become an independent project.

 

Why Myriad is Important

 

For organizations that run YARN, conventional practice has been for operations teams to create a statically partitioned cluster dedicated to YARN workloads. In this siloed model, the YARN cluster would only run Hadoop workloads and nothing else. It would have its own hardware or cloud instances, its own operations team, and could not share resources with other workloads in the datacenter.

 

Project Myriad combines the best of YARN and Mesos, allowing modern Hadoop workloads to run elastically alongside other datacenter and cloud workloads. Hadoop thereby shares resources with all of the organization's Linux applications (e.g., web servers, Java apps) as well as its datacenter services like Cassandra, Kafka, Elasticsearch, and Kubernetes.

 

Running YARN on the same cluster as other datacenter services and applications dramatically increases utilization and agility while reducing operational complexity and cost. With project Myriad, operations teams can meet the needs of data scientists and developers with one set of resources without being hamstrung by the need for YARN to run in isolation, on dedicated clusters.

 

"The implications are exciting for a number of reasons," said Florian Leibert, CEO of Mesosphere. "From a 'fast data' standpoint, Myriad is about bringing the big data jobs closer to the compute and tearing down the silos. Operationally, Myriad increases utilization rates and reduces the complexity of spinning up dedicated clusters."

 

How it Works

 

Myriad works by delegating its resource management to Mesos. Mesos is uniquely suited for this delegation because, unlike YARN, it is a two-level scheduler. As originally documented in the UC Berkeley AMPLab research, a two-level scheduler with a Dominant Resource Fairness (DRF) algorithm allows many different scheduling and allocation algorithms to run side by side, multitenant, in the same datacenter.
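To make the DRF idea concrete, here is a minimal sketch of the allocation loop in Python. It is an illustration of the published algorithm's core step (always serve the framework with the lowest dominant share), not the Mesos allocator itself. The function name `drf_allocate`, the data layout, and the example cluster sizes are assumptions chosen for illustration, and every framework is assumed to request at least one resource in a positive amount.

```python
from fractions import Fraction


def drf_allocate(total, demands):
    """Hand out tasks one at a time using Dominant Resource Fairness.

    total   -- tuple of cluster capacities, e.g. (cpus, memory_gb)
    demands -- dict mapping a framework name to its per-task demand tuple
    Returns a dict mapping each framework name to the number of tasks granted.
    (Hypothetical helper for illustration; not part of Mesos or Myriad.)
    """
    if not demands:
        return {}

    used = [0] * len(total)
    allocated = {name: [0] * len(total) for name in demands}
    tasks = {name: 0 for name in demands}

    def dominant_share(name):
        # A framework's dominant share is its largest fractional use
        # of any single resource in the cluster.
        return max(Fraction(a, t) for a, t in zip(allocated[name], total))

    while True:
        # Serve the framework that currently has the lowest dominant share.
        name = min(demands, key=dominant_share)
        demand = demands[name]
        # Stop as soon as that framework's next task no longer fits.
        if any(u + d > t for u, d, t in zip(used, demand, total)):
            break
        for i, d in enumerate(demand):
            used[i] += d
            allocated[name][i] += d
        tasks[name] += 1

    return tasks


# Example: 9 CPUs and 18 GB shared by a memory-heavy and a CPU-heavy framework.
result = drf_allocate((9, 18), {"A": (1, 4), "B": (3, 1)})
print(result)
```

In this example framework A ends up with three tasks and framework B with two, so each holds two thirds of its own dominant resource (memory for A, CPU for B).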

 

Under the hood, Myriad is a Mesos framework with a scale up/down REST API. It consumes resource offers from Mesos and decides when to launch new NodeManagers to execute YARN tasks. It passes the required configuration and task launch information to Mesos, which forwards it to the Mesos nodes. These nodes launch a Myriad Executor, which manages the lifecycle of the NodeManager.
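The offer-handling step described above can be sketched roughly as follows. This is a simplified illustration in plain Python, not Myriad's actual code and not the Mesos scheduler API: the `Offer` and `NodeManagerProfile` types, the `plan_launches` function, and the one-NodeManager-per-host rule are all hypothetical names and assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A simplified stand-in for a Mesos resource offer (hypothetical type)."""
    offer_id: str
    hostname: str
    cpus: float
    mem_mb: int


@dataclass(frozen=True)
class NodeManagerProfile:
    """Resources one NodeManager needs (hypothetical type)."""
    cpus: float
    mem_mb: int


def plan_launches(offers, profile, pending):
    """Split offers into those to use for NodeManagers and those to decline.

    pending -- how many NodeManagers still need to be started, for example
               after a scale-up request.
    Assumption: at most one NodeManager is placed on any host.
    """
    launch, decline = [], []
    used_hosts = set()
    for offer in offers:
        fits = offer.cpus >= profile.cpus and offer.mem_mb >= profile.mem_mb
        if len(launch) < pending and fits and offer.hostname not in used_hosts:
            launch.append(offer)
            used_hosts.add(offer.hostname)
        else:
            # Declined offers go back to Mesos for other frameworks to use.
            decline.append(offer)
    return launch, decline


offers = [
    Offer("o1", "host-a", 4.0, 8192),
    Offer("o2", "host-b", 1.0, 1024),   # too small for the profile below
    Offer("o3", "host-c", 8.0, 16384),
    Offer("o4", "host-a", 4.0, 8192),   # arrives after both launches are planned
]
launch, decline = plan_launches(offers, NodeManagerProfile(2.0, 4096), pending=2)
print([o.offer_id for o in launch], [o.offer_id for o in decline])
```

A real framework would then turn each accepted offer into a task description for the executor. The sketch stops at the accept-or-decline decision because that is the part the text describes.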

 

Myriad makes it possible to elastically expand or contract YARN on demand, to execute workloads that span thousands of nodes, and to run those workloads on the same cluster, and with the same data, as other workloads in the datacenter, including long-running services and Docker containers. This allows YARN to consume otherwise underutilized resources and increases datacenter utilization and business flexibility. To read more about how Myriad works, see the How it Works section in the GitHub repo.

 

For users of the Mesosphere Datacenter Operating System (DCOS), Myriad will be a datacenter service that can be installed on your DCOS cluster and then used to launch YARN NodeManagers.

 

Join the Community

 

We are looking to build a strong Myriad community. Mesos and Hadoop stakeholders are showing great interest in the project, and we welcome new contributors as we officially submit Myriad to the Apache Software Foundation as an incubator project. If you would like to help us evolve Myriad and grow the combined Mesos and Hadoop open source ecosystems, feel free to visit the Myriad GitHub project to submit tickets, open pull requests, or become a contributor.
