Product, Use Cases, Partners

Deploying the Crate distributed database using Mesos and Marathon

Feb 25, 2015

Christian Haudum


14 min read

Editor's Note: This guest post is written by Christian Haudum of Crate. Crate is a venture-funded startup that has built a container-native distributed database for large and real-time datasets. We're proud of the thriving ecosystem of third-party datacenter services being built around Mesos and the Mesosphere DCOS, and we wanted to give Crate an opportunity to share their new product with our audience.
Crate is a distributed database designed to work seamlessly in containerized environments. Due to its shared-nothing architecture it's a perfect fit as a persistence layer for clustered applications. Crate gives you the resilient database cluster with automatic sharding and replication without forcing you to give up the simplicity of SQL as a query language. You will also benefit from the capability of automatic healing if nodes should fail.
Using Mesosphere's Marathon to manage Crate containers makes it easy to deploy a containerized database cluster very efficiently and allocate database resources quickly depending on your demands, and therefore simplifies the management of elastic applications.
In this post we'll quickly to through the necessary steps to set up a scalable Crate cluster. We'll skip over some of the basics, so you should take a look at the Mesosphere documentation and the Crate Docker installation guide first if you aren't familiar with those.
Ok, let's start!
Initial Mesos/Marathon Setup
We'll set up a Mesos cluster on Google Compute Engine with one single master and three slaves following the Mesosphere Install Guide.
> gcloud compute instances create mesos-master \      --image centos-7 --zone us-central1-b --network mesos
> gcloud compute instances create mesos-slave-{1..3} \      --image centos-7 --zone us-central1-b --network mesos
The mesos network has firewall rules applied that prevent un-authorized access to the ports that will be exposed to the pubic otherwise! Crate uses the ports 4200 (HTTP) and 4300 (internal transport) per default.
Node Configuration
Instances within the same network are able to see each other by their hostname, meaning that you don't need to know the internal IP for the Zookeeper configuration, but only the hostname of the master that we chose when we started the node.
As documented, we also need to change the configuration of the mesos-slave service, more specifically the containerizers and the registration timeout.
$ echo 'docker,mesos' > /etc/mesos-slave/containerizers$ echo '10mins' > /etc/mesos-slave/executor_registration_timeout$ echo 'ports(*):[31000-31099, 31101-32000, 4200-4399]' > /etc/mesos-slave/resources
If the mesos-slave service refuses to start, you'd probably need to delete the temporary slave recovery file.
$ sudo rm -f /tmp/mesos/meta/slaves/latest
It is recommended to set the timeout to 10mins, because the first docker run needs to pull the images and that may take a while ;)
Now that everything is ready, we can actually start with the Crate specific setup.
Launching Crate with Marathon
Launching a Crate cluster with Marathon is very simple. All you need to do is defining a few settings in a Marathon configuration file.
We're going to look at possible cluster setups, one with unicast discovery and a second one with multicast discovery (utilizing Weave). These are currently the 2 possible ways how multiple nodes of Crate can find each other and form a cluster.
In both cases Mesos-DNS is a neat way to easily let your application "find" the Crate nodes by a human readable domain name.
Using Unicast Discovery
The easiest way to get started with a Crate cluster using Marathon is to use unicast discovery.
On the mesos-master instance create a Marathon config file and call it for example CrateUnicast.json. This will be submitted to Marathon later to start the Crate instances.
{  "id": "crate-mesos-unicast",  "container": {    "type": "DOCKER",    "docker": {      "image": "crate/crate:0.47.3",      "privileged": true,      "network": "HOST",      "parameters": []    },    "volumes": [      {        "containerPath": "/tmp",        "hostPath": "/tmp",        "mode": "RO"      },      {        "containerPath": "/data",        "hostPath": "/data/crate",        "mode": "RW"      }    ]  },  "instances": 3,  "cpus": 1,  "mem": 2014,  "uris": [],  "cmd": "crate -Des.zen.discovery.minimum_master_nodes=2",  "env": {    "CRATE_HOSTS": "mesos-slave-1:4300,mesos-slave-2:4300,mesos-slave-3:4300",    "CRATE_HEAP_SIZE": "1024m"  },  "constraints": [    [      "hostname",      "UNIQUE"    ]  ],  "ports": []}
We set the key network for the container to HOST, so Mesos automatically binds Crate's internal ports (4200/4300) to the host network.
Additionally we specify the environment variables CRATE_HOSTS (all available slaves at port 4300) and CRATE_HEAP_SIZE (half of the host's available memory).
The constraint ["hostname", "UNIQUE"] guarantees that only one single Crate container is running on a slave.
In total we'll have 3 nodes. In order to have an unambiguous quorum we need to provide the minimum_master_nodes in the run command (cmd key) and set it to 2 (=num_nodes/2+1, see Multi Node Setup).
For the volumes, we can either create a re-usable data directory - in our case /data/crate - on the root disk (e.g. if you have an SSD root disk) or mount a separate disk. Then just adopt the path accordingly.
Now let's submit this config file to Marathon, which is running on port 8080.
$ curl -s -XPOST http://localhost:8080/v2/apps -d@CrateUnicast.json -H "Content-Type: application/json"
In the Marathon web UI (which runs on port 8080) you can see that the app is popping up. Once the instances are running, we can verify the cluster with a curl command:
$ curl -s -XPOST mesos-slave-1:4200/_sql?pretty -d '{    "stmt":"select name, hostname from sys.nodes"  }'
The cluster is of course reachable on every single slave:
$ curl -s -XPOST mesos-slave-2:4200/_sql?pretty -d '{    "stmt":"select name, hostname from sys.nodes"  }'$ curl -s -XPOST mesos-slave-3:4200/_sql?pretty -d '{    "stmt":"select name, hostname from sys.nodes"  }'
So far, it seems that there haven't been that many advantages over a conventional docker deployment of Crate - except the overhead of setting up the Mesos cluster. However, Marathon is much more than just a deployment tool, it is a "cluster-wide init and control system for services or Docker containers". That means, that you can easily start, stop and manage application clusters by just writing the app configuration and submitting it to Marathon.
To test Marathon's capability to retain the application cluster contraints that we've specified in the CrateUnicast.json file, you can manually stop a docker container and watch the web UI restarting the lost instance. If you have access to the Crate Admin UI (MESOS_SLAVE_IP:4200/admin/) you should import a few thousand tweets using the Get Started Tutorial and set the number of replicas for the tweets table to 1-2 first. Then you're able to see Crate's "self-healing" capacity.
Note: It is also possible to scale your Crate cluster using the Marathon web UI, but you need to be careful, because the minimum_master_nodes setting is applied to each newly created Crate node.
Using Multicast Discovery
We've already written a blog post on how to utilize Weave for multicast discovery for Crate.
This time we'll go a little further and adopt the concept of automatic IP assignment of Docker containers laid out in the great blog post Adventures with Weave and Docker by Stefan Schimanski.
The idea is that Weave creates a virtual network and each host defines its own subnet, in which docker can assign IPs automatically to containers.
So we're going to install Weave on the slaves, create a bridge and tell the docker daemon to use a fixed IP range for its containers.
mesos-master  bridge cidr  container cidr
mesos-slave-1  bridge cidr  container cidr
mesos-slave-2  bridge cidr  container cidr
mesos-slave-3  bridge cidr  container cidr
This will allow us to run 255 Mesos slaves and 255 containers on each slave. That should be enought for now, but if you want you can use wider IP ranges.
In order to do multicast discovery we need to first install Weave on the Mesos master and all the nodes:
$ sudo yum install bridge-utils -y$ sudo curl --silent --location --output /usr/sbin/weave$ sudo chmod +x /usr/sbin/weave
If you haven't installed Docker on mesos-master you'd need to do that now, because Weave depends on it. Additionally, we need to tell the Docker to used a fixed IP range for the containers. This can be achieved by adding --bridge and --fixed-cidr to the launch options in /etc/sysconfig/docker.
# /etc/sysconfig/dockerOPTIONS=--selinux-enabled -H fd:// --bridge=weave --fixed-cidr= DOCKER_TMPDIR=/var/tmp
Create the weave bridge and expose it to the host:
$ sudo weave create-bridge$ sudo weave expose
(Re)start the Docker service so it uses the new launch options and start the weave network:
$ sudo service docker restart$ sudo weave launch
For the slaves it is basically the same procedure, you can use this bash script:
#!/bin/bash -xsudo yum install bridge-utils -ysudo curl --silent --location --output /usr/sbin/weave chmod +x /usr/sbin/weave# $1 .. unique id of the slave (1..254)HOST_ID="$1"BRIDGE_CIDR="192.168.0.${HOST_ID}/16"NETWORK_CIDR="192.168.${HOST_ID}.0/24"echo "# /etc/sysconfig/docker# Modify these options if you want to change the way the docker daemon runsOPTIONS=--selinux-enabled -H fd:// --bridge=weave --fixed-cidr=${NETWORK_CIDR}" | sudo tee /etc/sysconfig/docker > /dev/nullsudo weave create-bridgesudo weave expose ${BRIDGE_CIDR}sudo service docker restart# we connect to the already existing network on mesos-mastersudo weave launch mesos-master 
Execute the script with the slave id, 1 for mesos-slave-1, and so on, on each node.
$ ./ 1
Verify the network status:
$ sudo weave status
weave router 0.9.0Encryption offOur name is 7a:13:3b:07:3d:61Sniffing traffic on &{8 65535 ethwe ba:11:3a:61:d6:31 up|broadcast|multicast}MACs:02:42:c0:a8:03:01 -> 7a:ff:6c:20:7f:78 (2015-02-12 17:30:46.714657091 +0000 UTC)ba:11:3a:61:d6:31 -> 7a:13:3b:07:3d:61 (2015-02-12 17:22:53.45073163 +0000 UTC)02:42:c0:a8:ff:01 -> 7a:13:3b:07:3d:61 (2015-02-12 17:22:59.07080863 +0000 UTC)7a:13:3b:07:3d:61 -> 7a:13:3b:07:3d:61 (2015-02-12 17:30:08.585423533 +0000 UTC)02:42:c0:a8:01:01 -> 7a:74:e5:fe:bf:fe (2015-02-12 17:30:08.759828667 +0000 UTC)02:42:c0:a8:02:01 -> 7a:0c:69:e3:13:7c (2015-02-12 17:30:08.585252474 +0000 UTC)Peers:Peer 7a:0c:69:e3:13:7c (v2) (UID 1084274921423039471)   -> 7a:13:3b:07:3d:61 []Peer 7a:ff:6c:20:7f:78 (v2) (UID 3372143863024318277)   -> 7a:13:3b:07:3d:61 []Peer 7a:13:3b:07:3d:61 (v6) (UID 1676291112665720550)   -> 7a:74:e5:fe:bf:fe []   -> 7a:0c:69:e3:13:7c []   -> 7a:ff:6c:20:7f:78 []Peer 7a:74:e5:fe:bf:fe (v2) (UID 5496559290939719523)   -> 7a:13:3b:07:3d:61 []Routes:unicast:7a:13:3b:07:3d:61 -> 00:00:00:00:00:007a:74:e5:fe:bf:fe -> 7a:74:e5:fe:bf:fe7a:0c:69:e3:13:7c -> 7a:0c:69:e3:13:7c7a:ff:6c:20:7f:78 -> 7a:ff:6c:20:7f:78broadcast:7a:ff:6c:20:7f:78 -> [7a:74:e5:fe:bf:fe 7a:0c:69:e3:13:7c]7a:13:3b:07:3d:61 -> [7a:74:e5:fe:bf:fe 7a:0c:69:e3:13:7c 7a:ff:6c:20:7f:78]7a:74:e5:fe:bf:fe -> [7a:0c:69:e3:13:7c 7a:ff:6c:20:7f:78]7a:0c:69:e3:13:7c -> [7a:74:e5:fe:bf:fe 7a:ff:6c:20:7f:78]Reconnects:
Now we can launch the cluster with multicast discovery. The app specification (make a new file names CrateWeave.json) becomes even simpler, because the hosts for unicast discovery do not need to be specified.
{  "id": "crate-mesos-weave",  "container": {    "type": "DOCKER",    "docker": {      "image": "crate/crate:0.47.3",      "privileged": true,      "network": "BRIDGE",      "portMappings": [        { "containerPort": 4200, "hostPort": 4200, "protocol": "tcp" },        { "containerPort": 4300, "hostPort": 4300, "protocol": "tcp" }      ],      "parameters": [      ]    }  },  "instances": 3,  "cpus": 0.2,  "mem": 2014,  "uris": [],  "cmd": "crate",  "env": {    "CRATE_HEAP_SIZE": "1024m"  },  "constraints": [    ["hostname", "UNIQUE"]  ],  "ports": []}
Note that in this case we define a port mapping and using the BRIDGE network. We also need to tell the crate binary to use _local_ for the publish_host which will be resolved to the local IP address of the container.
Then submit it to Marathon:
$ curl -s -XPOST http://localhost:8080/v2/apps -d@CrateWeave.json -H "Content-Type: application/json"
The nodes of the Crate cluster will discover each through multicast which is enabled by the Weave network.
And again, the cluster can be verified:
$ curl -s -XPOST mesos-slave-1:4200/_sql?pretty -d '{"stmt":"select id, name, hostname from sys.nodes"}'{  "cols" : [ "id", "name", "hostname" ],  "duration" : 12,  "rows" : [ [ "e3r05rFVTaebqdTIidoLCQ", "Lupo", "b60ad27ae040" ], [ "lXVvEVcYS5GqHY6Y1R7omA", "Valinor", "f3f1e64f843e" ], [ "Yn1M2L1BS9KDMFvBZVw4tA", "Gardener", "5d169e51b2c4" ] ],  "rowcount" : 3}
Using Mesos DNS to Find Nodes
Mesos-DNS supports service discovery in Apache Mesos clusters. It allows applications and services running on Mesos to find each other through the domain name system (DNS), similarly to how services discover each other throughout the Internet.
Unfortunately Crate does not have node discovery via DNS (yet), but Mesos-DNS really makes it easy to find the individual nodes of the existing Crate cluster. This is especially useful when connecting with a client application.
The Mesos-DNS service is written in Go, which needs to be installed first. We will install the service on mesos-master, but it could be also installed on any Mesos node, or even on a dedicated host. It also requires git, if not already installed.
Configure The Master
Mesosphere again provides a very good installation guide, so we can skip it here and proceed with the configuration file.
{  "masters": ["mesos-master:5050"],  "refreshSeconds": 30,  "ttl": 60,  "domain": "crate",  "port": 53,  "resolvers": ["...", ""],  "timeout": 5,  "email": "root.mesos-dns.crate"}
Since we have only a single Mesos master and the DNS serive is running on the same host, you could also set masters to ["localhost:5050"]. If you have more masters you'd need to extend the list accordingly. You should also add the nameserver IPs from /etc/resolv.conf and append them before Google's nameserver in the resolvers attribute.
Now start the service in the background:
$ sudo mesos-dns -config=config.json &
For debugging purposes it makes sense to start the process in the foreground first and use the -v=true option to obtains debugging information:
$ sudo mesos-dns -config=config.json -v=true
Configure The Slaves
Add nameserver <IP_MESOS_MASTER> as first line to /etc/resolv.conf so it looks similar like this:
# Generated by NetworkManagernameserver c.crate-gce.internal. google.internal.nameserver
Mesos tasks will be exposed via DNS using the task.framework.domain notation. Our task is named crate-mesos-unicast, the framework we're using is marathon and as DNS domain we specified crate, so the full qualified name is crate-mesos-unicast.marathon.crate.
It can be tested using dig (requires bind-utils to be installed):
$ dig crate-mesos-unicast.marathon.crate A | grep 'IN A'
crate-mesos-unicast.marathon.crate. 60  IN  A 60  IN  A 60  IN  A
We can now access the Crate cluster from any slave host with its full qualified domain name.
$ curl -s -XPOST crate-mesos.marathon.crate:4200/_sql?pretty -d '{    "stmt":"select name, hostname from sys.nodes"  }'
{  "cols" : [ "name", "hostname" ],  "duration" : 307,  "rows" : [ [ "Cadaver", "mesos-slave-1" ], [ "Svarog", "mesos-slave-2" ], [ "Valkyrie", "mesos-slave-3" ] ],  "rowcount" : 3}
Marathon makes it really easy to deploy a resilient Crate cluster thanks to it's capability to retain the application cluster constraints by automatically healing any failures. Together with Crate's replication it creates a very durable database backend for large distributed applications.
In addition to that, Mesos-DNS allows applications and clients to easily find and connect to nodes of the Crate cluster.
Update (March 3rd, 2015):
In a previous version of the post we used weave expose <CIDR> to expose the bridge IP to the host. However, this issue on Github points out that weave expose can cause routing loops, because it "will add additional iptables masquerading rules which can cause the tunneled data packets to be routed again via the weave interface." - Thanks Ilya for letting us know!
Create the weave bridge and ~~expose it to the host~~ assign the IP address directly to the network device:
$ sudo weave create-bridge$ sudo ip addr add dev weave
For the slaves it is basically the same procedure, you can use this bash script:
#!/bin/bash -xsudo yum install bridge-utils -ysudo curl --silent --location --output /usr/sbin/weave chmod +x /usr/sbin/weave# $1 .. unique id of the slave (1..254)HOST_ID="$1"BRIDGE_CIDR="192.168.0.${HOST_ID}/16"NETWORK_CIDR="192.168.${HOST_ID}.0/24"echo "# /etc/sysconfig/docker# Modify these options if you want to change the way the docker daemon runsOPTIONS=--selinux-enabled -H fd:// --bridge=weave --fixed-cidr=${NETWORK_CIDR}" | sudo tee /etc/sysconfig/docker > /dev/nullsudo weave create-bridgesudo ip addr add dev weave ${BRIDGE_CIDR}sudo service docker restart# we connect to the already existing network on mesos-mastersudo weave launch mesos-master 
^ change also the 4th to last line to reflect the update

Ready to get started?