Spark and Mesos: shared history and future [Mesosphere Hackweek]

Jun 23, 2015

Michael Hausenblas

D2iQ

Apache Spark has always been intrinsically connected to Apache Mesos, and the two remain a great pairing as they both rise to prominence among mainstream IT users. During Mesosphere's HackWeek in March, we worked closely with our partner Typesafe to make the two work well together on our Datacenter Operating System (DC/OS), too.

Mesosphere and Typesafe worked together on various aspects of Spark on Mesos, including scheduling and deployment. Spark is now a DC/OS service, and earlier this month Typesafe announced a new DC/OS-specific Spark distribution that has been certified by Databricks and for which Typesafe will provide enterprise support.

For the unfamiliar, Spark is a general-purpose data processing platform, capable of running both batch- and stream-processing workloads. Many consider it the logical successor to Hadoop MapReduce, but it is certainly more than that: it offers a unified way to do SQL, machine learning, graph processing and more in a storage-agnostic way. Spark works with data from sources including, but not limited to, HDFS, Cassandra, S3, Tachyon and Elasticsearch.

It is very popular among data scientists as well as the wider big data crowd.

A less widely known fact is that Spark has its roots in Mesos: it was initially developed at the AMPLab as a proof-of-concept Mesos framework to demonstrate how easy and fast it is to develop a distributed platform on top of Mesos. It became so successful that it was spun out into its own Apache project, and it now has commercial backing from companies such as Databricks, Typesafe and, recently, IBM, which went all in on it.

Spark on Mesos is a natural choice not only because of the projects' joint history, but also because Mesos offers some very practical advantages over other cluster managers. Because Mesos lets you run stateless services, such as web servers or application servers, on the same cluster that runs Spark for analytics, it increases overall cluster utilization and accommodates the effects of data gravity.

You can learn more about data processing with Spark on Mesos and the DC/OS here: