"High Performance Spark" and "Cassandra: The Definitive Guide", New Free Book Excerpts from O'Reilly and Mesosphere

Learn more about these critical new open source technologies changing the way businesses build data-rich applications.

Jun 20, 2017

Eryn Muetzel


We are excited to announce the availability of two new books from O'Reilly Media, "High Performance Spark" and "Cassandra: The Definitive Guide". Mesosphere has teamed up with O'Reilly to offer an excerpt from each of the books as a free download.


Data is truly the new oil. Data can power entirely new revenue streams, but it must be extracted and processed effectively, and in real time. The ability to easily access tremendous amounts of computing power has made data the new basis of competition. We've moved beyond traditional big data, where companies gained historical insights using batch analytics, to "fast data". In today's digital economy, businesses must learn to extract value from data and build modern applications that serve customers with personalized services, in real time, and at scale. Whether we're talking about messaging applications, IoT, connected cars, or business applications leveraging machine learning or AI, fast data systems are key.


A de facto architecture is emerging for building and operating these fast data applications. It rests on a set of leading technologies that are scalable and open source and that enable real-time processing. This set of technologies is often referred to as the "SMACK" stack and includes:


  • Apache Spark for large-scale analytics
  • Apache Mesos for running data services and containers elastically
  • Akka for simplifying development of data-driven apps
  • Apache Cassandra for highly available and scalable storage
  • Apache Kafka for capturing message streams


Mesosphere DC/OS (which includes Mesos at its core) makes it easy to deploy, operate, and elastically scale these data services (and dozens of others) natively alongside containerized microservices, on any infrastructure.


The SMACK Stack


"High Performance Spark" was written for data engineers and data scientists who are looking to get the most out of Apache Spark. The book lays out the key strategies for making Spark queries faster, able to handle larger datasets, and less resource-intensive. This free preview edition features three chapters:


  • Introduction to High Performance Spark: An overview of Spark and why performance matters when analyzing data at very large scale.
  • How Spark Works: Introduces the overall design of Spark as well as its place in the big data ecosystem.
  • Tuning, Debugging, and Other Things Developers Like to Pretend Don't Exist: Describes how to leverage the Spark settings that have a significant impact on performance.


"Cassandra: The Definitive Guide" describes how and why to apply Apache Cassandra in your application in a production environment. This free preview edition features four chapters:


  • Beyond Relational Databases: History of relational databases and the recent rise of non-relational database technologies like Cassandra.
  • Introducing Cassandra: What's exciting and different about Cassandra, where it came from, and what its advantages are.
  • The Cassandra Architecture: What happens during read and write operations and how the database accomplishes some of its notable aspects, such as durability and high availability.
  • Deploying and Integrating: Considerations for planning cluster deployments, including cloud deployments. Introduces several technologies that are frequently paired with Cassandra to extend its capabilities.


At Mesosphere, many of our customers are generating significant value from the SMACK stack. The applications we make possible include edge computing for travel and tourism, systems that underpin autonomous cars, AI-based voice control systems, financial trading platforms, and many, many more of the apps you use every day. Given the momentum of the SMACK stack in building modern enterprise applications, we're excited to be sponsoring this series of O'Reilly books. Our platform, Mesosphere DC/OS, allows you to deploy both containers and big data services with push-button ease. DC/OS also stitches together your entire datacenter and all your cloud instances into a single set of compute resources to simplify operations and maximize efficiency.

Ready to get started?