Apache Spark has always been intrinsically connected to Apache Mesos, and the two remain a great pairing as they both rise to prominence among mainstream IT users. During Mesosphere's HackWeek in March, we worked closely with partner Typesafe to make them gel on our Datacenter Operating System, as well.
For the unfamiliar, Spark is a general-purpose data processing platform that can run both batch and stream-processing workloads. Many consider it the logical successor to Hadoop MapReduce, but it is certainly more than that: it offers a unified, storage-agnostic way to do SQL, machine learning, graph processing and more. Spark works with data from sources including, but not limited to, HDFS, Cassandra, S3, Tachyon and Elasticsearch. It is very popular among data scientists as well as the wider big data crowd.
A lesser-known fact is that Spark has its roots in Mesos. It was initially developed at the AMPLab as a proof-of-concept Mesos framework, to demonstrate how easy and fast it is to build a distributed platform on top of Mesos. It became so successful that it was spun out into its own Apache project. It now has commercial backing from companies such as Databricks, Typesafe and, more recently, IBM, which has gone all in on it.
Spark on Mesos is a natural choice not only because of the two projects' shared history, but also because Mesos offers some very practical advantages over other cluster managers. Mesos lets you run stateless services, such as web servers or application servers, on the same cluster as your Spark analytics jobs. This increases overall cluster utilization and helps address data gravity by keeping applications and analytics close to the data they share.
You can learn more about data processing with Spark on Mesos and DC/OS here: