Events

Meet DRAX the Destroyer

For more than five years, DC/OS has enabled some of the largest, most sophisticated enterprises in the world to achieve unparalleled levels of efficiency, reliability, and scalability from their IT infrastructure. But now it is time to pass the torch to a new generation of technology: the D2iQ Kubernetes Platform (DKP). Why? Kubernetes has now achieved a level of capability that only DC/OS could formerly provide and is now evolving and improving far faster (as is true of its supporting ecosystem). That’s why we have chosen to sunset DC/OS, with an end-of-life date of October 31, 2021. With DKP, our customers get the same benefits provided by DC/OS and more, as well as access to the most impressive pace of innovation the technology world has ever seen. This was not an easy decision to make, but we are dedicated to enabling our customers to accelerate their digital transformations, so they can increase the velocity and responsiveness of their organizations to an ever-more challenging future. And the best way to do that right now is with DKP.

Jun 19, 2016

Michael Hausenblas

D2iQ

I assume that you've heard of Netflix's Chaos Monkey? It seems that chaos-based resilience testing is becoming increasingly popular these days, especially in the context of containers and folks have been asking for a Mesos-specific tool as well.

 

Now, the other day I get a mail from Flo, our CEO, asking me what the effort would be and if I could take care of it. Given that both the community and my CEO seems to think that's a good thing to have I thought: well, let's give it a try.

 

The result is now available and I decided to name it after a Guardians of the Galaxy character (because I like Marvel) who exposes certain (in this context desirable) properties, namely his destructive manners. Meet DRAX, a DC/OS-specific resilience testing tool that works mainly on the task/ container level.

 

In a nutshell, DRAX runs as a Marathon app, killing off random tasks of any (non-framework) or specific (non-framework) application running in Marathon. Why this limitation to Marathon? Simple: this sort of testing is mainly interesting for long-running apps, so exactly what Marathon is used for to run.

 

Enough talk, let's see DRAX in action.

 

In action

 

All you have to do to get started is either clone the GitHub repo dcos-labs/drax or simply download DRAX's app spec and you can launch DRAX like so (assuming you've got a DC/OS 1.7 cluster and the CLI installed):

 

dcos marathon app add marathon-drax.json

 

Once that is done you should see DRAX running in Marathon an ready to rampage:

 

DRAX in action.

DRAX in action.

 

A concrete rampage, targeting any app, looks as follows:

 

$ http POST $PUBLIC_AGENT:7777/rampage HTTP/1.1 200 OK Content-Length: 121 Content-Type: application/javascript Date: Mon, 13 Jun 2016 12:15:19 GMT {"success":true,"goners":["webserver.0fde0035-315f-11e6-aad0-1e9bbbc1653f","dummy.11a7c3bb-315f-11e6-aad0-1e9bbbc1653f"]}

 

BTW, no worries, DRAX is smart enough not to target itself.

 

Next steps

 

This is only the beginning. I'd love to hear feedback, either via the DC/OS Community Slack or directly by raising an issue on the repo. What would you like to see next? Node-level rampages ala Chaos Monkey? Or maybe taking down an entire cluster to simulate a DR scenario? Network partitions? I'd appreciate if you use DRAX and let me know what sucks and what rocks. Thanks for your time and hope to hear from you!

Ready to get started?