Production-Proven Data Services on DC/OS 1.9 | D2iQ

Mar 14, 2017

Eryn Muetzel




Data is growing at a rate faster than ever before. Every day, 2.5 quintillion bytes of data are created. But how are enterprises taking advantage of this data?

Over the past two to three years, companies have started transitioning from big data, where analytics are processed after the fact in batch mode, to fast data, where analysis runs against data streaming in real time to provide immediate insights. Fast data allows companies to create new business opportunities and serve their customers in new ways, and has become the core of powerful business applications. According to a recent OpsClarity survey, over 92% of companies plan to increase their investment in streaming data in the next year, indicating an increasing shift in the use of data processing to serve customer-facing applications.

It appears enterprises are finally realizing business value from data. But the learning curve is still steep.

Why We Need Data Services Automation

Businesses looking to develop new services (from personalization to the Internet of Things) will often need to build their solution using a combination of data services. Kafka and Spark are common examples. However, the distributed nature of these data services can make them very difficult to deploy and operate. These challenges fall into three areas.

First, installing a production-grade platform service such as Kafka or Cassandra requires specialized operational knowledge; even for an expert, deployment is time-consuming and often requires significant engineering effort. Second, operating these technologies day to day is complex and risky; common tasks such as upgrading software, deploying updates, rolling back in failure scenarios, monitoring health, and managing storage resources are often manual and error-prone. Third, maintaining enough infrastructure to handle data-processing peaks gets expensive. Average datacenter utilization continues to hover around 6-12%, driven by companies' desire to maintain high service quality through peak load periods.
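To make the utilization point concrete, here is a small arithmetic sketch in Python. The demand figures are invented for illustration and are not measurements from any real datacenter; they only show why sizing a separate cluster for each service's own peak leaves most capacity idle, and why sharing infrastructure between services whose peaks do not coincide raises average utilization.

```python
# Hypothetical hourly demand (in CPU cores) for two services over one day.
# All numbers are illustrative assumptions.
batch_analytics = [80 if 0 <= h < 4 else 5 for h in range(24)]   # peaks overnight
web_frontend = [80 if 12 <= h < 16 else 5 for h in range(24)]    # peaks at midday


def utilization(demand, capacity):
    """Average fraction of the provisioned capacity that is actually used."""
    return sum(demand) / (capacity * len(demand))


# Total demand per hour across both services.
combined = [a + b for a, b in zip(batch_analytics, web_frontend)]

# Siloed: each service gets its own cluster, sized for its own peak.
siloed_capacity = max(batch_analytics) + max(web_frontend)
siloed = utilization(combined, siloed_capacity)

# Shared: one cluster, sized for the peak of the combined demand.
shared_capacity = max(combined)
shared = utilization(combined, shared_capacity)

print(f"siloed: {siloed_capacity} cores, {siloed:.0%} average utilization")
print(f"shared: {shared_capacity} cores, {shared:.0%} average utilization")
```

With these made-up numbers, the siloed setup needs 160 cores and averages about 22% utilization, while the shared cluster needs 85 cores and averages about 41%. Real workloads are far less tidy, but the direction of the effect is the same.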

The consequences are severe and include:

  • Longer time to market for new applications
  • Low productivity of developers and operators
  • Increased risk of downtime
  • Diminished ability for data scientists and developers to experiment with new technologies
  • High on-premises or cloud infrastructure costs

Mesosphere DC/OS: Production-Proven Infrastructure for Fast Data

Mesosphere DC/OS is the only production-proven platform that runs both containers and data services on the same infrastructure. DC/OS provides one-click installation of data services such as databases, message queues, and analytics engines, on par with cloud providers such as Amazon Web Services. These services are simple to deploy, operate, and scale, and run on shared infrastructure, dramatically increasing utilization. This is possible because DC/OS uses two-level scheduling with frameworks to implement application-aware lifecycle management, which provides key differentiators versus simply running services in containers on container orchestration platforms such as Kubernetes and Docker Swarm. For example, Mesosphere DC/OS enables:

  1. Single-command install of cloud native data services such as Spark, Cassandra, HDFS, Kafka and Elasticsearch, among many others. DC/OS also dramatically simplifies resizing instances of a data service, as well as adding more instances.
  2. Reduced time and effort involved with operating cloud native data services through simple runtime software upgrades and updates, application-level monitoring and metrics, and managed persistent storage volumes.
  3. Dramatically increased utilization, enabled by multiple data services, containerized applications and traditional applications all running on the same infrastructure.
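The two-level scheduling model mentioned above can be illustrated with a toy sketch. In Apache Mesos, the resource manager offers available resources to registered frameworks, and each framework's own scheduler decides which offers to accept and what to run on them. The Python below is a simplified model of that idea, not the Mesos or DC/OS API; the names `Offer`, `BrokerScheduler`, `StatelessScheduler`, and `run_offer_cycle`, along with all resource figures, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Free resources on one agent, as offered to a framework (hypothetical model)."""
    agent: str
    cpus: float
    mem_gb: float


class BrokerScheduler:
    """Application-aware framework: places at most one broker per agent."""

    def __init__(self, brokers_wanted):
        self.brokers_wanted = brokers_wanted
        self.placed_on = set()

    def consider(self, offer):
        if len(self.placed_on) >= self.brokers_wanted:
            return None                      # already have enough brokers
        if offer.agent in self.placed_on:
            return None                      # keep brokers on distinct agents
        if offer.cpus < 2 or offer.mem_gb < 4:
            return None                      # offer too small, decline it
        self.placed_on.add(offer.agent)
        return ("broker", 2.0, 4.0)


class StatelessScheduler:
    """Simple framework: accepts any offer big enough for one small task."""

    def __init__(self, instances_wanted):
        self.remaining = instances_wanted

    def consider(self, offer):
        if self.remaining == 0 or offer.cpus < 0.5 or offer.mem_gb < 1:
            return None
        self.remaining -= 1
        return ("web", 0.5, 1.0)


def run_offer_cycle(agents, schedulers):
    """Level one: offer each agent's free resources to every framework in turn.
    Level two: each framework decides whether to launch a task on that offer."""
    launched = []
    for agent, (cpus, mem_gb) in agents.items():
        for scheduler in schedulers:
            task = scheduler.consider(Offer(agent, cpus, mem_gb))
            if task is not None:
                name, task_cpus, task_mem = task
                cpus -= task_cpus            # what is left goes to the next framework
                mem_gb -= task_mem
                launched.append((agent, name))
    return launched


agents = {"agent-1": (4.0, 8.0), "agent-2": (4.0, 8.0), "agent-3": (1.0, 2.0)}
launched = run_offer_cycle(agents, [BrokerScheduler(2), StatelessScheduler(3)])
print(launched)
```

The placement rule (one broker per agent) lives in the framework rather than in the resource manager, which is the sense in which lifecycle management can be application-aware, and both frameworks end up sharing the same agents.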

One-Click Data Service Installation with Mesosphere DC/OS

Working with Our Partners to Bring New Data Services to DC/OS

An operating system is only as powerful as the applications that run on it. To that end, we've been incredibly focused on working with our partners to bring new data services into the DC/OS ecosystem. More than a year ago, we integrated the popular open source tools Spark, Kafka, Cassandra, and Akka (which, together with Mesos, are colloquially known as the "SMACK stack"), and in August announced partnerships with the top companies behind these technologies, including Confluent, DataStax, and Lightbend.

But we didn't stop there. DC/OS 1.9 includes production-quality integrations with seven additional data services, including both open source and partner-supported technologies. With these partners, we are bringing incredible value to our customers, who can now install these services datacenter-wide with a single click, operate them with no downtime, and run them elastically across shared infrastructure.

Mesosphere DC/OS Universe: One-Click Installation of Over 100 Platform Services

Interested in learning more? You are invited to our data services webinar series taking place over the next several weeks. We will spend an hour with each partner to dive deeper into their technology, use cases, and demos!

DataStax is the provider of DataStax Enterprise (DSE), the always-on data platform for cloud applications powered by the industry's best distribution of Apache Cassandra™. DSE lets you focus on what matters most to you and makes it easy to distribute your data across datacenters or cloud regions, making your applications always-on, ready to scale, and able to create instant insight and experiences. Your applications are ready for anything – be it enormous growth, handling mixed workloads, or enduring catastrophic failure. With DSE's unique, fully distributed, masterless architecture, your application scales reliably and effortlessly.

Approximately six months ago, we worked with DataStax to bring DataStax Enterprise to DC/OS. Today, we are announcing expanded support for DataStax Enterprise which includes integrated analytics, search, and graph capabilities as well as administration and monitoring, developer tooling, and more. Our customers have been clamoring for this joint offering, and we are excited to make it available.

"DataStax provides data management for cloud applications. Since we announced our partnership with Mesosphere six months ago, we've seen accelerated interest among our customer base," said Kathryn Erickson, Director of Strategic Partnerships at DataStax. "Deploying DataStax Enterprise, the always-on data platform on DC/OS helps customers get the most out of their infrastructure, and critically, it makes their data management highly portable across clouds or data centers. With the most recent integration of DataStax Enterprise, we bring a unified data platform of search, analytics, graph, and monitoring capabilities into to the DC/OS ecosystem. The partnership provides our customers an always-on, scalable solution that can deliver instantly actionable insight for their applications.

Register for the DataStax webinar!

Elasticsearch is a popular open source tool for distributed search and analytics. The complete Elastic Stack (including Elasticsearch, Beats, Logstash, and Kibana) is used to ingest, search, analyze, and visualize data in real time. Users include Goldman Sachs, which uses the Elastic Stack to track and analyze stock trades to provide better financial guidance; Netflix, which uses it to ensure message delivery and operational excellence; and Verizon, which has been able to reduce mean-time-to-resolution by 10x.

We worked with Elastic, the company behind Elasticsearch, to bring the complete Elastic Stack to DC/OS. While historically provisioning and managing the five subcomponents of the Elastic Stack could be manual and error-prone, DC/OS makes it easy to have a highly available Elastic Stack cluster running in five minutes.  

Couchbase delivers a NoSQL database that makes it simple to build adaptable, responsive, always-available applications that scale to meet unpredictable spikes in demand and enable mobile and IoT apps to work offline. Organizations around the world choose Couchbase for its advantages in data model flexibility, elastic scalability, performance, and 24x365 availability to build enterprise web, mobile, and IoT applications. Couchbase customers include industry leaders like Amadeus, Marriott, and United Airlines, as well as hundreds of other household names.

Running Couchbase on DC/OS allows customers to deploy with one click, upgrade and resize instances with no downtime, and run Couchbase alongside other tools such as Spark for a complete analytics solution.

"Couchbase and Mesosphere are both truly distributed technologies," said Narayan Sundareswaran, Global Business Development at Couchbase. "This makes them both cloud native for any type of public, private or hybrid environment. As our customers grow their Couchbase usage to large scale deployments, technologies like Mesosphere can dramatically simplify expansions."

Register for the Couchbase webinar!

Apache Kafka, a high-throughput distributed messaging system, is experiencing tremendous adoption. Confluent, the company founded by the creators of Apache Kafka, builds the Confluent Platform on top of open source Kafka. We announced the integration of Confluent Platform with DC/OS a few months back, and have seen continued demand since. We will continue to work with Confluent to bring the latest and greatest Confluent Platform capabilities to DC/OS.

Register for the Confluent webinar!

Apache Flink is an open source distributed data stream processor. Flink provides efficient, fast, consistent, and robust handling of massive streams of events, as well as batch processing as a special case of stream processing. The engineers at data Artisans wrote the first line of code of what would later become Apache Flink back in 2010, and they continue to push the project forward.

The Apache Flink community integrated Flink with DC/OS because of the large demand for Mesos and DC/OS from their user base: almost a third of Flink deployments were on Mesos, even before the integration. We are excited to be working with data Artisans and the rest of the Flink community to bring a next-generation analytics engine to the DC/OS community.

Register for the Flink webinar!

Alluxio, the world's first system that unifies data at memory speed, enables enterprises to run any application, such as Spark or Presto, on top of any storage system, such as S3, GCS, ECS, HDFS, Ceph, or NFS, at scale on premises and in the cloud. Alluxio, one of the fastest growing open source projects, was originally developed at UC Berkeley's AMPLab (just like Apache Mesos). Alluxio Enterprise Edition, the commercial offering based on its open source memory-speed virtual distributed storage platform, is now available in the DC/OS Universe.

"Alluxio and Mesosphere share a common legacy at the UC-Berkeley AMPLab, where our core technologies were created and incubated alongside other innovative systems such as Spark," says Alluxio VP of Products, Neena Pemmaraju. "These technologies come together again on a powerful and easy-to-use DC/OS. The ability to integrate the Alluxio memory-speed virtual distributed storage system with various computation frameworks such as Spark on DC/OS will change the way organizations build and run data-driven applications."

Register for the Alluxio webinar!

Lightbend is the provider of the world's leading Reactive application development platform. A few months back, Lightbend introduced their Fast Data Platform (FDP), which uses DC/OS as the preferred infrastructure foundation on top of which the FDP components will run. Lightbend Fast Data Platform bundles Apache Kafka, Apache Spark, DC/OS, OpsClarity, Apache Flink, and Lightbend Reactive Platform with Akka, Akka Streams, Play, and Lagom. We have worked with Lightbend to make DC/OS the best possible platform for running applications developed with Spark and the Lightbend Reactive Platform.

Read more from Lightbend's blog post.

Register for the Lightbend webinar!

Redis is an in-memory data structure store, used as a database, cache, and message broker, for high-performance operational, analytics, or hybrid use cases. Redis is the most widely adopted in-memory NoSQL database worldwide, and allows developers to deliver millions of operations per second at sub-millisecond latencies. Redis Labs, the company behind the Redis open source project, provides an enterprise offering enhanced with hassle-free automated scaling, clustering, multi-zone high availability, auto-failover, continuous monitoring, and 24x7 support.

"The partnership between Mesosphere and Redis Labs is a powerful pairing of two incredibly popular technologies, and one that our joint customers and prospects have been asking for," said Rod Hamlin, Vice President of Strategic Partnerships at Redis Labs. "Large enterprises now get the benefit of high performance, seamlessly scaling Redis Enterprise managed by Mesosphere's sophisticated DC/OS technology, providing the premier platform for elastically scaling big data applications."

Read more from Redis Labs' blog post.

Register for the Redis Labs webinar!

Basho's products, Riak KV (Key Value) and Riak TS (Time Series), are highly available, scalable, easy-to-operate distributed NoSQL databases. They automatically distribute data across the cluster to ensure fast performance and fault tolerance. Riak KV targets key-value use cases, while Riak TS focuses on time-series and IoT scenarios. Riak products are used by fast-growing Web businesses and by one-third of the Fortune 50 to power their critical Web, mobile, and social applications. With Riak managing the data tier and Mesosphere DC/OS managing the underlying infrastructure, customers can efficiently and easily scale distributed applications.

"Basho is very excited about Mesosphere DC/OS 1.9, especially the aspects that improve operational simplicity and networking. We believe this release will encourage Riak users to embrace a hybrid-cloud approach and make it easier for them to deploy and scale distributed, real-time applications," said Dave McCrory, Chief Technology Officer at Basho.

Read more from Basho's blog post.

Accelerating the Addition of New Data Services

Moving forward, we are accelerating the pace at which new data services will be integrated with Mesosphere DC/OS. Our newly developed open source SDK provides a high-level interface for building new stateful services on DC/OS. With the SDK, developers can write a stateful service complete with persistent volumes, fault tolerance, and configuration management in about 100 lines of code. This SDK is the product of Mesosphere's experience writing production stateful services for DC/OS such as Kafka, Cassandra and HDFS. Integrations written with the DC/OS SDK will be production quality, will be standardized with a familiar operator experience across partner technologies, and will be developed in days or weeks, not months.
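To give a sense of what a high-level interface for stateful services can look like, the following Python sketch models a declarative service description and derives a serial rollout plan from it. This illustrates the general idea only: it is not the DC/OS SDK's API, and every name in it (`PodSpec`, `ServiceSpec`, `rollout_plan`, and the field names and values) is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PodSpec:
    """One kind of task in the service (hypothetical model, not the SDK's types)."""
    name: str
    count: int
    cpus: float
    memory_mb: int
    volume_mb: int  # persistent volume intended to survive task restarts


@dataclass(frozen=True)
class ServiceSpec:
    name: str
    version: str
    pods: List[PodSpec]


def rollout_plan(spec):
    """Expand the declarative spec into ordered steps, one pod instance at a time,
    so that only one instance is being deployed at any moment."""
    steps = []
    for pod in spec.pods:
        for index in range(pod.count):
            steps.append(
                f"{spec.name}/{pod.name}-{index}: reserve {pod.volume_mb} MB volume, "
                f"launch version {spec.version}, wait until healthy"
            )
    return steps


# Illustrative values only.
spec = ServiceSpec(
    name="datastore",
    version="1.0.0",
    pods=[PodSpec(name="node", count=3, cpus=1.0, memory_mb=2048, volume_mb=10240)],
)
plan = rollout_plan(spec)
for step in plan:
    print(step)
```

The point of the sketch is the division of labor: the service author states what should run (instances, resources, volumes), and a shared library turns that description into an ordered, repeatable deployment.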

We are thrilled to be working with such a world-class group of partners, and are extremely excited to continue to bring new stateful data services to the DC/OS community.
