With Mesosphere's Datacenter Operating System (DCOS), it's possible to develop and deploy a non-trivial, production-ready distributed application in a matter of days. We were able to prove this during a recent company offsite meeting, by creating—in just three days—a crime-mapping app built on a backend of DCOS, Marathon, Kubernetes, Kafka, Spark, InfluxDB and other components.
Without DCOS, a team attempting the same thing would likely still be installing and configuring the individual components by the time we had completed the entire project.
Mesosphere recently had its company offsite meeting in Mexico. As part of a hackathon during the trip, my team decided to build a DCOS demo using real-world crime data. Crime data (in our case, open data from the City of Chicago) can tell many stories: from real estate planning to police dispatch, there's a lot of value in having online access to the data (which crime is peaking right now) as well as offline access (historically speaking, where the geographical hotspots for crime in the city are).
The team working on the time series demo was Michael Gummelt, Tobi Knaup, Stefan Schimanski, James DeFelice and me. We had about three days to complete the project, from idea to architecture to implementation and documentation. We started with no existing codebase to build on.
However, because DCOS allows for rapid iterations and the architecture is rather modular, we were able to get the demo done in time and still enjoy some fun in the sun.
Here is the architecture we developed:
Our underlying goal was to demonstrate and apply as many good practices as possible, such as:
- By using custom Docker images like mhausenblas/tsdemo-s3-fetcher we demonstrated a simple yet effective CI/CD pipeline within DCOS.
- By using Kubernetes we showed how having the web app and the S3 fetcher in a pod is beneficial in terms of data locality.
- By using secrets in Kubernetes we demonstrated how to pass along AWS credentials in a secure manner.
- By using Kafka to feed both the online and the offline part we showed how easily different workloads benefit from a single, reliable data source.
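The pod-related practices above (co-locating the web app with the S3 fetcher, and passing AWS credentials via a secret) can be sketched as a pod manifest. The following is a minimal illustration, not the manifest we actually deployed: it builds a bare Pod rather than a replication controller, and the web app image, the secret name, the volume names, the mount paths and the port are all hypothetical. Only the mhausenblas/tsdemo-s3-fetcher image name comes from the demo. Kubernetes accepts manifests in JSON as well as YAML, so the sketch simply emits JSON.

```python
import json


def build_offline_reporting_pod(secret_name="aws-credentials"):
    """Build a Pod manifest with two containers that share a scratch volume.

    All names except the fetcher image are hypothetical placeholders.
    """
    # Scratch space that lives as long as the pod; both containers mount it,
    # so the web app reads what the fetcher downloads without a network hop.
    shared_volume = {"name": "shared-data", "emptyDir": {}}
    # AWS credentials are mounted as files from a Kubernetes Secret.
    secret_volume = {"name": "aws-secret", "secret": {"secretName": secret_name}}

    fetcher = {
        "name": "s3-fetcher",
        "image": "mhausenblas/tsdemo-s3-fetcher",
        "volumeMounts": [
            {"name": "shared-data", "mountPath": "/data"},
            {"name": "aws-secret", "mountPath": "/etc/aws", "readOnly": True},
        ],
    }
    webapp = {
        "name": "webapp",
        "image": "example/offline-reporting-webapp",  # hypothetical image
        "ports": [{"containerPort": 8080}],  # hypothetical port
        "volumeMounts": [
            {"name": "shared-data", "mountPath": "/data", "readOnly": True},
        ],
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "offline-reporting",
            "labels": {"app": "offline-reporting"},
        },
        "spec": {
            "containers": [fetcher, webapp],
            "volumes": [shared_volume, secret_volume],
        },
    }


pod = build_offline_reporting_pod()
manifest_json = json.dumps(pod, indent=2)
print(manifest_json)
```

Because both containers are scheduled onto the same node and mount the same volume, the data the fetcher pulls from S3 is local to the web app, which is the data-locality benefit mentioned above.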
What we learned
The Spark Streaming implementation went smoothly, with only a few minor issues, such as ingesting data into InfluxDB via its HTTP API and some JSON serialization challenges. Our implementation puts both the online processing part, which outputs into InfluxDB, and the offline processing part, which writes into a pre-defined S3 bucket, into one single Spark Streaming app. This is not considered good practice; we went with it because of the time constraints of the hackathon.
In our demo, AWS S3 is the main link between the Spark Streaming process and the offline reporting web app. S3 is convenient and straightforward to use from the consuming, downstream web app.
We hit a few bugs with Kubernetes (which we used to implement the offline processing part). All but one of them are "core" bugs; that is, they are present in Kubernetes itself and are not related to how Kubernetes is set up in DCOS:
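To illustrate the kind of JSON serialization issue that tends to come up when pushing records to a time series database over HTTP, here is a minimal sketch. It is not our Spark Streaming code: the record fields, the series name and the payload shape are hypothetical, and the exact request format InfluxDB expects depends on its version. The point is only that timestamps and numeric strings need to be converted into JSON-friendly types before serialization.

```python
import json
from datetime import datetime, timezone


def to_json_point(record):
    """Convert a crime record into a dict that json.dumps can serialize.

    The output shape is illustrative, not an actual InfluxDB request format.
    """
    ts = record["timestamp"]
    # datetime objects are not JSON serializable; treat naive timestamps as
    # UTC and convert them to epoch milliseconds.
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return {
        "series": "crime",  # hypothetical series name
        "time_ms": int(ts.timestamp() * 1000),
        "fields": {
            "primary_type": record["primary_type"],
            # Coordinates often arrive as strings from CSV sources.
            "lat": float(record["lat"]),
            "lon": float(record["lon"]),
        },
    }


record = {
    "timestamp": datetime(2015, 6, 1, 12, 0, 0),
    "primary_type": "THEFT",
    "lat": "41.88",
    "lon": "-87.63",
}
payload = json.dumps([to_json_point(record)])
print(payload)
```

Passing the raw record straight to json.dumps would raise a TypeError because of the datetime value, which is why the conversion step comes first.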
- There is a known issue with imagePullPolicy in that imagePullPolicy: Always does not re-pull an image.
- With emptyDir mounts there is a known issue in that the mounted directory is owned by root:root and its permissions are set to 750 (see also k8s-offlinereporting-rc.yaml).
- There is currently no way to set secrets through environment variables, hence we used Kubernetes Secrets (see install offline reporting Web UI for details).
- If one of the containers in a pod is flapping, the endpoint, and with it the associated service, doesn't come up. For example, at one point the S3 fetcher Docker image had a faulty command in it, so the service using this pod was not available.
A learning experience
Within three days, we managed to assemble the codebase, then test and deploy a working distributed application, end to end. With DCOS, it's straightforward to implement both the stateless parts and the core data pipeline of the app. While we identified some areas for improvement, such as replacing InfluxDB with Cassandra+KairosDB, we were overall really happy with the outcome.
We'd love to hear from you to learn if you had similar experiences or maybe want to try out our demo.