NBCUniversal, the third-largest media company in the world, is breathing new life into traditional TV advertising by combining batch and real-time analytics. Traditionally, ads for linear TV (i.e. scheduled programming) were sold by the show's title, and were not based on any real data about the audience or what they had viewed. In contrast, audience targeting in digital, non-linear TV has been going on for quite some time.
NBCUniversal is now bringing data-driven targeting to linear TV with its Audience Studio offering, so that audiences can be targeted across all delivery platforms. The key component of Audience Studio is a data management platform that marketers can use to combine their first-party data with NBCU's and third-party data.
New Technology Challenges
Building the Audience Studio platform presented some unique technology challenges for the team at NBCU.
First, the platform needed to support both batch (e.g. viewership data from cable companies) and streaming (e.g. data from the NBC Sports app) analytics. While NBCU needed to do very large-scale batch processing in the evenings, the nature of workloads shifted to an analytical mode during the daytime.
Second, the platform needed to take advantage of containers -- the team at NBCU recognized the importance of containerization in making software easy to deploy, repeatable, and scalable.
Finally, NBCU needed to use its infrastructure resources more efficiently to stay cost-effective. That meant moving away from a Hadoop-based architecture, which was expensive to run in the cloud. It also meant avoiding separate clusters of servers for technologies such as Spark and Cassandra, which would waste a large amount of capacity through low utilization.
A New Architecture for Fast Data Applications: the SMACK Stack
To solve these challenges and get to market quickly, NBCUniversal decided to build Audience Studio on Mesosphere DC/OS, using open source data services including Apache Spark, Apache Cassandra, and Apache Kafka, all running on top of Amazon Web Services.
Mesosphere DC/OS: Like many companies, NBCUniversal has a cloud-first initiative. As part of that strategy, the team building Audience Studio was adamant about not getting locked into any single cloud vendor's proprietary APIs, in case they wanted to migrate the application later on. Mesosphere DC/OS was chosen so that the NBCU team could easily deploy and operate the open source data services they wanted, while retaining complete cloud portability. The NBCU team originally started with Mesos and moved to DC/OS for its ease of use, particularly for data services such as Cassandra and Kafka. "External persistent volumes are a godsend. That's how I'm able to run my Postgres database on the DC/OS cluster. It just mounts the volumes, and if the node goes down, it just shifts over," says Thomas Barr, Chief Architect at NBCU. DC/OS also provides NBCU with container orchestration capabilities, as well as CI/CD (i.e. Jenkins) capabilities.
Apache Spark: Spark was a linchpin in the architecture. Spark, which has become the de facto analytics engine, was chosen for its streaming and batch analytics capabilities. NBCU also utilizes Spark notebooks and Tableau to give data scientists and analysts access to raw data.
Apache Cassandra: Cassandra operates as a NoSQL store for the device graph for audience targeting. Cassandra was chosen as it was considered the best of breed for NoSQL technology, and was able to handle the load and scale required by NBCU.
Apache Kafka: Kafka is used as a queuing mechanism. NBCU deals with a large amount of streaming data, which can't be written directly into S3. Instead, data is written to a Kafka queue, picked up from there by Spark Streaming, and then written to S3.
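The queue-then-batch flow described for Kafka can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not NBCU's code and not the Kafka, Spark, or S3 APIs: the deque stands in for the Kafka queue, `drain_in_batches` for the Spark Streaming step, and the dictionary for S3. All function names, keys, and event payloads are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


def drain_in_batches(events: Deque[str], batch_size: int) -> List[List[str]]:
    """Pull events off the queue in arrival order, grouping them into micro-batches."""
    batches: List[List[str]] = []
    while events:
        # The batch length is fixed before any event is popped.
        batch = [events.popleft() for _ in range(min(batch_size, len(events)))]
        batches.append(batch)
    return batches


def write_batches(batches: List[List[str]], store: Dict[str, str], prefix: str) -> None:
    """Write each micro-batch as a single object, keyed by its batch index."""
    for index, batch in enumerate(batches):
        store[f"{prefix}/batch-{index:05d}.txt"] = "\n".join(batch)


# Hypothetical events; the article does not describe the real payloads.
queue: Deque[str] = deque(f"event-{i}" for i in range(5))
object_store: Dict[str, str] = {}  # stand-in for an S3 bucket
write_batches(drain_in_batches(queue, batch_size=2), object_store, prefix="stream")
```

The point of the intermediate queue is that many small events are absorbed as they arrive and then written out as a smaller number of larger objects, which suits an object store better than one write per event.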
As witnessed by NBCU and many other companies that Mesosphere works with, this combination of Spark, Mesos, Cassandra, and Kafka (i.e. the "SMACK" stack) has become a prominent architecture for new data-rich applications, based on its scalability, high availability, simple deployment and operations, open source heritage, and efficient resource utilization.
Cloud Flexibility, Resource Efficiency, Developer Productivity... Oh My!
NBCUniversal has achieved significant benefits by adopting the SMACK stack and DC/OS. As prioritized at the start, the team has complete infrastructure flexibility. Thomas Barr explains, "I do not want to go down that path where I lock myself into a provider. If I was ever told to migrate off a cloud provider, it would only be two days worth of work. And of course, DC/OS runs anywhere."
The team has also been able to increase utilization of its cloud instances by at least 2x, halving its AWS bill. DC/OS allows NBCU to pool its cloud instances into one contiguous cluster, so individual clusters for Spark, Cassandra, and Kafka are no longer required. Operating these data services has also become much simpler.
"Even though I'm at the VP level in my company, I am the tech ops. Because it just doesn't require a half-time staff on this thing. It just works."
- Thomas Barr, Chief Architect, NBCUniversal
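The link between utilization and cost mentioned above comes down to simple arithmetic: for a fixed amount of work, doubling average utilization halves the number of instances needed. The sketch below assumes cost scales with instance count. Every figure in it is hypothetical, since the article gives only the 2x ratio, and `instances_needed` is an illustrative helper rather than anything NBCU uses.

```python
import math


def instances_needed(busy_cores: float, cores_per_instance: int, utilization: float) -> int:
    """Instances required to supply busy_cores at the given average utilization."""
    return math.ceil(busy_cores / (cores_per_instance * utilization))


# All figures are hypothetical; only the 2x utilization ratio comes from the article.
before = instances_needed(busy_cores=400, cores_per_instance=16, utilization=0.25)
after = instances_needed(busy_cores=400, cores_per_instance=16, utilization=0.50)
print(before, after)
```

With these made-up inputs the instance count drops from 100 to 50, which is the halving the utilization gain implies.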
DC/OS has allowed NBCU to increase developer productivity, based on its container orchestration and CI/CD capabilities. "DC/OS allows my developers to concentrate on writing the code that needs to be written as opposed to trying to figure out how to get everything working together. Everything just deploys," says Thomas Barr.
Looking Forward
NBCUniversal continues to expand its Audience Studio offering. As traditional TV advertising is brought forward to meet the expectations of modern advertisers, NBCU is poised to utilize any data it can get its hands on with the help of DC/OS.