
Integration testing with Mesos, Chronos and Docker


Mar 26, 2015

Sunil Shah


Mesosphere's software makes it easy to monitor production infrastructure. In this post, we show how to implement a simple monitoring system in your datacenter.




At Mesosphere, we run a handful of internal services that back web applications used by our customers (for example, Mesosphere for Google Cloud Platform). We often want to test our backend services to guard against application-level issues, like external APIs being inaccessible or connectivity problems to cloud providers or repository mirrors. We'd like our monitoring to be frequent and regular, so that our engineers are alerted in a timely fashion if and when things stop working.


In this post, I'll walk through one of the ways we run integration tests against our services. The integration test will be deployed using our internal Mesos cluster and Chronos instance.


If you haven't used Chronos before, think of it as a distributed cron service. You can specify scheduled or one-off jobs that are then executed on a Mesos cluster. It has native support for Docker containers and robust scheduling logic.


We use a handful of other secondary services to help with monitoring - at Mesosphere we're fans of DataDog, an easy way to collect, aggregate and monitor time series data. It has great integration with other alerting services like PagerDuty, although in this example we trigger PagerDuty directly.




The test itself is a locally running Python script that uses the requests and dogapi libraries to interact with our internal service's REST API and DataDog's API, respectively. We attempt an action twice, logging (using DataDog's provided library) the outcome of each action. If both attempts fail, we use the PagerDuty REST API to trigger a page that goes to our on-call team.


The script requires several credentials to access our service's REST API, essentially consisting of a pre-configured key (or OAuth token) along with an SSH public key and the hostname on which to access the internal service. For convenience, these credentials are read in from environment variables using Python's os.environ['MY_VAR'].
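For instance, a minimal sketch of this pattern (the variable names match those used later in the post; the values assigned here are placeholders for demonstration only):

```python
import os

def require_env(name):
    """Return an environment variable's value, failing fast if it is unset."""
    value = os.environ.get(name)
    if value is None:
        raise SystemExit("Missing required environment variable: {}".format(name))
    return value

# Demonstration values only; in practice these are set by the environment.
os.environ['HOST'] = ''
os.environ['SSH_KEY'] = 'ssh-rsa AAAA...'

host = require_env('HOST')
ssh_key = require_env('SSH_KEY')
oauth_token = os.environ.get('OAUTH_TOKEN')  # optional, may be absent
```

Failing fast with a clear message makes misconfigured Chronos jobs much easier to debug from the task logs.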


Our simple integration testing script (with our proprietary calls stubbed out) is shown below:


```python
import json
import os
import requests
import socket
import time

from dogapi import dog_http_api as api

# DataDog settings
api.api_key = '<REPLACEME>'
api.application_key = '<REPLACEME>'


def try_action(ssh_key):
    # access an API
    # code snipped
    pass


def cleanup_action():
    # access an API
    # code snipped
    pass


def send_to_datadog(host, success):
    ts = int(time.time())
    if success:
        result = "success"
    else:
        result = "failure"
    metric_name = "example.integration.{}".format(result)
    api.metric(metric_name, (ts, 1), tags=["host:{}".format(host)])
    print("POSTED to DataDog")


def trigger_pagerduty(host, message):
    trigger = {"service_key": "<REPLACEME>",
               "event_type": "trigger",
               "description": "Integration test failure",
               "client": "Example Integration Test",
               "client_url": socket.gethostname(),
               "details": {"failed_host": host,
                           "provider": provider,
                           "message": message}}
    # PagerDuty Events API endpoint URL omitted here
    requests.post('', json=trigger)
    print("TRIGGERED PagerDuty")


host = os.environ['HOST']
ssh_key = os.environ['SSH_KEY']
oauth_token = os.environ.get('OAUTH_TOKEN')
provider = os.environ['PROVIDER']

overall_success = False
message = ''

for i in range(2):
    (success, message) = try_action(ssh_key)
    overall_success = (success or overall_success)
    send_to_datadog(host, success)
    (success, message) = cleanup_action()

if not overall_success:
    trigger_pagerduty(host, message)
```






Since this script uses multiple libraries and we plan to run it on any one of various hosts in our Mesos cluster, Docker is a must.


We use a minimal python-monitoring Dockerfile published as a public image to the Mesosphere Docker Hub account. This is based upon an Ubuntu base image and has various versions of Python installed, along with the necessary libraries for this application.
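A minimal Dockerfile in that spirit might look like the following (a sketch only; the actual mesosphere/python-monitoring image may be built differently):

```dockerfile
# Ubuntu base image with Python and the libraries the monitoring script needs
FROM ubuntu:14.04

RUN apt-get update && \
    apt-get install -y python python-pip && \
    pip install requests dogapi
```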


To run our application in a python-monitoring container, the following works:


```bash
docker run -t -i --entrypoint=/my-repo/ \
    -e "PROVIDER=$PROVIDER" \
    -e "HOST=$_HOST" \
    -e "OAUTH_TOKEN=$OAUTH_TOKEN" \
    -e "SSH_KEY=$SSH_KEY" \
    --volume=$(pwd):/my-repo \
    mesosphere/python-monitoring:latest
```




This command mounts the current directory at /my-repo inside the container and passes the current values of the environment variables through to it.


Note how the credentials are passed through as environment variables. This makes it considerably simpler to set up a Chronos job later.


An alternative approach is to have your credentials stored in a securely hosted artifact and include this in your job description. When Mesos runs your Chronos job, it'll fetch this artifact into the current working directory (which is mounted into the container).
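As a sketch of that approach (all file and variable names here are hypothetical): suppose the job's uris field lists a secured URL for a credentials.env key=value file, which Mesos fetches into the sandbox before the command runs. The job's command can then source it and export its contents:

```shell
# Simulate the fetched artifact locally; in production, Mesos downloads it
# into $MESOS_SANDBOX from the URL listed in the job's "uris" field.
MESOS_SANDBOX="${MESOS_SANDBOX:-$(mktemp -d)}"
cat > "$MESOS_SANDBOX/credentials.env" <<'EOF'
HOST=
SSH_KEY=ssh-rsa AAAA...
EOF

cd "$MESOS_SANDBOX"
set -a                    # export every variable the file defines
. ./credentials.env
set +a
echo "Credentials loaded for $HOST"
```

This keeps secrets out of the job description itself, at the cost of managing access to the hosted artifact.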


Production Setup


To set this up, we post the JSON job description below to our running Chronos instance. The job description is fairly straightforward.


```json
{
  "schedule": "R/2015-03-13T00:00:00Z/PT1H",
  "name": "Example Integration Test",
  "container": {
    "type": "DOCKER",
    "image": "mesosphere/python-monitoring"
  },
  "cpus": "1.0",
  "mem": "512",
  "uris": [
    ""
  ],
  "command": "cd $MESOS_SANDBOX && ./",
  "environmentVariables": [
    { "name": "PROVIDER", "value": "Google" },
    { "name": "HOST", "value": "" },
    { "name": "OAUTH_TOKEN", "value": "<REPLACEME>" },
    { "name": "SSH_KEY", "value": "<REPLACEME>" }
  ]
}
```




In the JSON above, we:


Configure the schedule to run (see the Chronos README for in-depth information about specifying ISO-8601 schedules). In this example, we run every hour, beginning at midnight on the 13th of March, 2015.

Name our job (in Chronos, names are IDs, so choose carefully)

Specify the Docker container to pull down

List URIs to pull assets from (i.e. our Python script)

Specify the command to run

Specify various environment variables
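To make the schedule format concrete, R/2015-03-13T00:00:00Z/PT1H reads as "repeat forever (R), starting at 2015-03-13T00:00:00Z, every hour (PT1H)". A quick sketch that expands the first few run times (it handles only this simple hourly case, not general ISO-8601 durations):

```python
from datetime import datetime, timedelta

schedule = "R/2015-03-13T00:00:00Z/PT1H"
repeat, start, period = schedule.split("/")

start_time = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ")
hours = int(period[2:-1])          # "PT1H" -> 1; hourly periods only
interval = timedelta(hours=hours)

# The first three run times of this schedule
runs = [start_time + i * interval for i in range(3)]
for run in runs:
    print(run.isoformat() + "Z")   # 2015-03-13T00:00:00Z, then 01:00, 02:00
```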




$MESOS_SANDBOX is a special environment variable that the Mesos Docker executor provides to the running task. The executor mounts the task's sandbox (its working directory on the agent) into the container, and $MESOS_SANDBOX holds the path at which it is mounted.


In this example, we cd into the $MESOS_SANDBOX directory and execute the fetched script.


Environment Variables


The name/value pairs within the environmentVariables array are made available to the job and are also implicitly passed through to the Docker container (similar to the -e name=value flags used when running the container directly).


Networking Rules


Networking requirements are specific to your setup: you may need to ensure the Mesos cluster on which your job will execute has access to the service you're testing against. In our case, we needed to whitelist access from our cluster to the cloud instances hosting the service we were testing.


Using a private Docker Hub image


Whilst we used a public image in this example, you'll often want to schedule containers based on private images. This is easy to accomplish: simply add a URI pointing to a valid .dockercfg file to the uris field of your job description.
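For example (with hypothetical URLs), the uris field would then carry both the Docker credentials and the script:

```json
"uris": [
  "",
  ""
]
```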


Running your job


Using the excellent cURL alternative, httpie, you can easily post your job description to create a new job:


```bash
http -v POST < integration-test.json
```


It'll soon show up in the Chronos UI, where you can force-run it to check that everything is working correctly. If it isn't (the job's status changes to failed), you can access the task logs through the Mesos UI or with the Mesos command line tool.




We find this setup invaluable for automatically checking that our services are up and running correctly. While this is a fairly specific example of how we use our infrastructure, it shows how straightforward it is to set up and run any sort of Dockerized batch job with Chronos on a Mesos cluster.
