You’re 4 Words Away From a Complete Big Data System | D2iQ

Aug 20, 2015




$ dcos package install infinity

One command, four words and users will have a next-generation big data system in place, capable of processing the streams of information flowing into their companies every second of every day. That's the promise we're making with the announcement of Mesosphere Infinity, a new product that combines a best-of-breed real-time analytics stack into a single package in our Datacenter Operating System (DCOS).

When DCOS users type the install command, the system will install Spark, Akka, Cassandra and Kafka. Apache Mesos, of course, is already installed as part of the DCOS. This is an ideal environment for handling all sorts of data-processing needs -- from nightly batch-processing tasks to real-time ingestion of sensor data, and from business intelligence to hardcore data science. What the open source LAMP stack did for web applications, the Infinity stack can do for data-based applications.

In late 2014, Alexy Khrabrov, Chief Scientist at Nitro and principal of By the Bay, came up with the idea of creating a complete end-to-end data pipeline training that combined Spark, Mesos, Akka, Cassandra and Kafka, sometimes called the SMACK stack. To test this idea in the real world, he assembled a dream team of partners and companies, including Mesosphere, to deliver a one-day training course during the Scala By the Bay and Big Data Scala events. The training was overwhelmingly successful: more than 100 engineers each built their own end-to-end analytics pipeline in a single day using these components.

Mesosphere is productizing this group of open source components via Infinity because within the Fortune 500 and other mainstream businesses, these types of data-centric workloads aren't merely hypothetical anymore. CEOs, CIOs and other decision-makers know they need to figure out how to capitalize on new opportunities, and meet new customer expectations, around personalization and the Internet of Things. They know the window is closing on becoming early adopters of advanced data analytics for competitive advantage, rather than simply keeping up with the Joneses.

We hear it from our large-enterprise customers, such as Verizon. Our collaborator on Infinity, Cisco, and our technology partners on Infinity -- a list that today includes Confluent, Elodina and Typesafe -- are hearing the same things from their customers. They know what they need to do, and now they want the technologies to do it right -- without spending huge amounts of money and man-hours stitching together open source components.

Data isn't just big, it's everywhere

The challenges -- and opportunities -- that companies are facing are well known by now. Just a few years after the term big data came into vogue, it already feels antiquated. Data isn't just big, fast and diverse anymore, it's also everywhere. It's ubiquitous.

Data is coming at us from servers, sensors, phones, appliances, toothbrushes and even jewelry. And if industry analysts are right, all that ubiquitous data is more valuable than ever. It's the key to changing how business decisions are made and how consumers interact with brands and products.

The downside, however, is that the ease of processing and analyzing data hasn't always kept up with the ease of creating it. The learning curve for using new analytics tools and techniques has been decreasing, especially thanks to technologies such as Spark, but installing and managing environments for processing ubiquitous data remains difficult. It can be very costly in terms of time, money and manpower.

This is the problem Mesosphere wants to solve with Infinity. We're making it easier than ever to install and manage an integrated -- and proven -- data stack. Because it runs on the DCOS platform, everything shares the same set of resources and can be monitored using the same dashboard.

What is the Infinity Stack?

If you're unfamiliar with Mesos, Spark, Akka, Cassandra and Kafka, here is a breakdown of what each Infinity component brings to the table. Essentially, they address the fundamental requirements, and then some, for dealing with big, fast and ubiquitous data:

  • Mesos: Mesos is the resource-management core of the DCOS, responsible for abstracting a cluster of machines into a single resource pool that can run all of the Infinity components (and much more), while providing advanced job scheduling, resource isolation and high availability.
  • Spark: Spark is a popular open source data-processing framework renowned for its speed (in-memory and on-disk) and ease of programming. It's commonly used as an alternative to MapReduce for batch-processing jobs, but various Spark components also address interactive SQL, machine learning, stream processing, statistical computing and graph processing.
  • Akka: Akka is a toolkit for building distributed and concurrent applications that run on the JVM, and supports programming in both Scala and Java. In the case of the Infinity stack, Akka acts as a middleware layer housing the business logic and managing message-passing among the various components.
  • Cassandra: Cassandra is a NoSQL, key-value datastore originally created at Facebook. It's optimized for speed, scalability and reliability, including the ability to span multiple datacenters. Large Cassandra clusters -- running in production environments at major companies -- can comprise tens of thousands of nodes and petabytes of data.
  • Kafka: Kafka is a distributed messaging system originally developed at LinkedIn, and now in use across numerous industries. It's designed to ingest huge volumes of data flowing into a system and deliver it in real-time to other data systems, such as Spark, Cassandra, Hadoop and MySQL.
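
The division of labor described above can be sketched with a toy, in-memory pipeline. This is only an illustration under loud assumptions: every class and function name below is hypothetical, the sensor readings are made up, and nothing here uses the real Kafka, Akka, Spark or Cassandra APIs. Each piece simply stands in for the role that component plays in the stack. Mesos is not modeled at all; in the real stack it would supply the shared pool of machines on which all of these pieces run.

```python
from collections import defaultdict, deque

# Toy stand-ins only: none of these classes use the real Kafka, Akka,
# Spark or Cassandra APIs. All names here are hypothetical.


class Topic:
    """Kafka's role: an ordered log that buffers incoming messages."""

    def __init__(self):
        self._log = deque()

    def publish(self, message):
        self._log.append(message)

    def drain(self):
        # Hand messages to consumers in the order they arrived.
        while self._log:
            yield self._log.popleft()


def validate(reading):
    """Akka's role: business logic applied to each message in flight."""
    temp = reading["temp_c"]
    return temp is not None and -50 <= temp <= 150


def average_by_sensor(readings):
    """Spark's role: aggregate a batch of records by key."""
    totals = defaultdict(lambda: [0.0, 0])  # sensor -> [sum, count]
    for r in readings:
        totals[r["sensor"]][0] += r["temp_c"]
        totals[r["sensor"]][1] += 1
    return {sensor: total / count for sensor, (total, count) in totals.items()}


class Table:
    """Cassandra's role: a keyed store for the computed results."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, value):
        self.rows[key] = value


def run_pipeline(raw_readings):
    # Ingest -> validate -> aggregate -> store.
    topic = Topic()
    for reading in raw_readings:
        topic.publish(reading)
    valid = [r for r in topic.drain() if validate(r)]
    table = Table()
    for sensor, avg in average_by_sensor(valid).items():
        table.upsert(sensor, avg)
    return table.rows


sample = [
    {"sensor": "a", "temp_c": 20.0},
    {"sensor": "a", "temp_c": 22.0},
    {"sensor": "b", "temp_c": 999.0},  # rejected by validate()
    {"sensor": "b", "temp_c": 30.0},
]
result = run_pipeline(sample)
print(result)
```

In a real deployment each stage would be a separate distributed service rather than a function call, which is exactly the installation and management burden the single package is meant to remove.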

The various pieces of this stack are already deployed at massive scale -- often in combination with each other -- across a user base that includes Apple, Twitter, Netflix, LinkedIn, Verizon, Disney and more.

As useful as this big data stack is, though, it's just the core of what should become a much broader platform. Because all the data-processing components run on top of the DCOS, Infinity is very flexible in terms of what it can support. While we think Akka, Cassandra, Kafka and Spark are the right starting point for most users, we envision adding more file systems, databases and other components to the mix as technologies and customer demands evolve.

Seeding a community of users and use cases

But as everyone knows, infrastructure technology is nothing without applications. That's why we expect Mesosphere Infinity to deliver more than just a simple way to deploy and manage a suite of data-processing technologies. We expect it will trigger an avalanche of innovative applications from users that will inspire others to rethink their big data plans.

A reality that often gets glossed over by vendors trying to glom onto new trends with old technology, or trying to sell a small piece of a large system, is that simply calling out buzzwords like IoT or big data isn't enough. What companies trying to act on these new opportunities need are real-world examples of what's possible -- which new data sources to analyze, which physical components to measure and how to build a data pipeline to crunch all this information.

Mesosphere Infinity already has early users with concrete use cases. Verizon, for example, is using the software stack to analyze all sorts of data streaming off customer devices, with the aim of improving network performance and the overall customer experience. When Infinity becomes generally available later this year, and as use cases start piling up, we'll do our part to ensure the collective knowledge of our user base is shared with the broader community. We want businesses of all types to understand not just which technologies are available for harnessing their streams of data, but also how they can actually use those technologies to their fullest advantage.

You've heard about how much revenue will soon be tied to the Internet of Things, how advanced analytics can optimize business processes and decision-making, and how an emerging class of technologies can help power efforts to capitalize on these trends. When Mesosphere Infinity hits the scene, it's all going to get a whole lot easier.

Catch us at MesosCon this week in Seattle to hear more about Infinity and the Datacenter Operating System.

** The original article was amended to include the history behind Mesosphere Infinity.


