Cassandra on DC/OS Tutorial
Learn how to configure, deploy, and start using the Apache Cassandra DC/OS Certified Service.
Cassandra is an open-source distributed data store that is highly available, decentralized, fault-tolerant, and easily scaled. It is a popular data store for big data pipelines, specifically the SMACK stack. The Cassandra package was designed to work optimally with DC/OS and includes many features. Additionally, you will find the other SMACK stack technologies as certified packages in DC/OS. This further simplifies the installation of a big data pipeline.
Now that you have installed a test cluster, it is time to start exploring the capabilities of DC/OS! But where should you begin? We recommend installing one of Mesosphere's certified data services! If you have not yet read our introduction to certified packages, we recommend reading it first. Let's begin by installing Cassandra as our first certified package.
In this post we would like to help you quickly install a Cassandra cluster, or two, in a development environment. We will provide guidance for interacting with the cluster and cluster management.
DC/OS Environment Setup for Cassandra
For a small Cassandra installation with all the defaults unchanged, you will need three private DC/OS agents. By default, the Cassandra package allows only one Cassandra node per DC/OS agent. One agent will need a minimum of 1.5 CPU and 5 GB of memory, and the other two agents will each need a minimum of 0.5 CPU and 4 GB of memory. The scheduler itself also requires resources, which is why one agent needs more than the others.
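The sizing above can be turned into a rough estimate for other node counts. The sketch below assumes 0.5 CPU and 4 GB per Cassandra node, plus about 1 CPU and 1 GB for the scheduler. The scheduler figures are inferred from the difference between the two agent sizes in the text, not taken from the package definition. The variable names are illustrative only.

```shell
# Rough resource estimate for a default Cassandra install.
# Assumption: the scheduler overhead (1 CPU, 1 GB) is inferred from the
# difference between the larger agent and the others.
NODES="${NODES:-3}"

# LC_ALL=C keeps the decimal separator a dot regardless of locale.
TOTAL_CPU="$(LC_ALL=C awk -v n="$NODES" 'BEGIN { printf "%.1f", n * 0.5 + 1.0 }')"
TOTAL_MEM_GB="$(LC_ALL=C awk -v n="$NODES" 'BEGIN { printf "%d", n * 4 + 1 }')"

echo "nodes=$NODES total_cpu=$TOTAL_CPU total_mem_gb=$TOTAL_MEM_GB"
```

Set NODES to the node count you plan to run to see how the totals grow. Remember that, by default, each node still needs its own private agent.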
If you install only three nodes to start and later decide that you need more resources, no worries. It is simple to scale out your Cassandra cluster with the DC/OS CLI.
Finally, this guide is based on the most current version of DC/OS.
Cassandra can be installed through the DC/OS Dashboard or with the DC/OS CLI. The only notable difference is that installing via the DC/OS Dashboard requires the additional step of installing the dcos cassandra subcommands, whereas they are automatically installed when installing via the DC/OS CLI. We will first walk you through installing Cassandra, with all the defaults, with the DC/OS Dashboard and then with the DC/OS CLI.
In the DC/OS Dashboard, navigate to the Catalog page and search for the Cassandra package; it will be at the top of the results. Click the Cassandra tile to begin the installation process. On the installation page you will see a purple button labeled "Review & Run". To install Cassandra, click the "Review & Run" button and follow the prompts. You can monitor the deployment on the "Services" page in the Dashboard. To install the dcos cassandra subcommands, go to the server with the DC/OS CLI installed and enter:
dcos package install --cli cassandra
Alternatively, you can install Cassandra with the DC/OS CLI. On the server with the DC/OS CLI configured, enter:
dcos package install cassandra
You can now use the dcos cassandra subcommands to monitor the deployment:
dcos cassandra plan show deploy
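If you would rather script the CLI installation than watch the plan by hand, a small polling helper can wait for the deploy plan to finish. This is only a sketch. It assumes that the first line of the plan output contains the word COMPLETE once deployment is done, which may differ between package versions. The helper name wait_for_complete and the variables APPLY, MAX_ATTEMPTS, and POLL_SECONDS are hypothetical.

```shell
# wait_for_complete CMD...: rerun CMD until the first line of its output
# contains the word COMPLETE, or give up after MAX_ATTEMPTS tries.
wait_for_complete() {
  attempts=0
  max_attempts="${MAX_ATTEMPTS:-60}"
  while [ "$attempts" -lt "$max_attempts" ]; do
    if "$@" 2>/dev/null | head -n 1 | grep -qw "COMPLETE"; then
      return 0
    fi
    attempts=$((attempts + 1))
    sleep "${POLL_SECONDS:-10}"
  done
  return 1
}

# Opt-in guard: nothing touches the cluster unless APPLY=1 is set.
if [ "${APPLY:-0}" = "1" ]; then
  dcos package install cassandra --yes
  wait_for_complete dcos cassandra plan show deploy && echo "deploy plan complete"
fi
```

Run it with APPLY=1 on the server where the DC/OS CLI is configured. Without that variable the snippet only defines the helper.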
Interacting with your Cassandra Cluster
In this section, we will review a couple of dcos cassandra subcommands that can be used to discover information about the Cassandra cluster. They are describe and endpoints.
The dcos cassandra describe command returns configuration information about the service, including Cassandra settings, node information, and service information. The output will be in JSON and should resemble the following:
The dcos cassandra endpoints node command returns networking information about your cluster. For each cluster node it will return the IP Address and DNS information. It will also return the VIP load balancing information for the service. The output should resemble the following:
For example, you can use the DNS name of any node to connect to the cluster from a Cassandra Docker container.
On the master node, run the following to start a container running cqlsh:
docker run -it cassandra:3.0.7 cqlsh node-0-server.cassandra.autoip.dcos.thisdcos.directory
You can now view existing data or create tables to store data with the Cassandra query language.
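As a starting point, the sketch below writes a few CQL statements to a file and, when explicitly enabled, runs them through cqlsh in the same Docker image. The keyspace demo, the table events, and the RUN_CQL variable are hypothetical names chosen for illustration. The replication factor of 3 assumes the default three-node cluster.

```shell
# Write a small CQL script to a temporary directory.
WORKDIR="$(mktemp -d)"
CQL_FILE="$WORKDIR/demo.cql"
cat > "$CQL_FILE" <<'EOF'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
CREATE TABLE IF NOT EXISTS demo.events (
  id uuid PRIMARY KEY,
  payload text
);
INSERT INTO demo.events (id, payload) VALUES (uuid(), 'hello');
SELECT * FROM demo.events;
EOF

CQL_HOST="node-0-server.cassandra.autoip.dcos.thisdcos.directory"

# Opt-in guard: only contact the cluster when RUN_CQL=1 is set.
if [ "${RUN_CQL:-0}" = "1" ]; then
  docker run --rm -v "$CQL_FILE:/tmp/demo.cql:ro" cassandra:3.0.7 \
    cqlsh "$CQL_HOST" -f /tmp/demo.cql
else
  echo "CQL script written to $CQL_FILE (set RUN_CQL=1 to execute it)"
fi
```

Mounting the script read-only and passing it with -f keeps the statements in one reviewable file instead of typing them into an interactive session.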
Multiple Cassandra Clusters
In this section, we provide a brief overview on installing multiple Cassandra clusters on the same DC/OS cluster.
When installing multiple clusters, there is one firm requirement enforced by DC/OS: every installed package must have a unique name. If you want to run multiple clusters, you will need to name them differently, for example Cassandra, Cass, CassDev, CassTest, or CassProd.
The easiest way to install multiple clusters is to use DC/OS Virtual Networks. Virtual Networks will prevent any port conflicts. However, they are not enabled in the default installation settings. To enable them, check the `VIRTUAL_NETWORK_ENABLED` box in the service section of the package definition. Go ahead and install a second cluster named "cass" with the `VIRTUAL_NETWORK_ENABLED` box checked.
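The same second cluster could also be installed from the CLI with an options file. The sketch below is an assumption, not the package's documented schema. It guesses that the Dashboard's service section maps to a "service" object with "name" and "virtual_network_enabled" keys, so check the keys against the config.json for your package version. APPLY is a hypothetical opt-in variable.

```shell
# Write an options file for a second cluster named "cass".
# Assumption: the key names mirror the service section shown in the Dashboard.
WORKDIR="$(mktemp -d)"
OPTIONS_FILE="$WORKDIR/cass-options.json"
cat > "$OPTIONS_FILE" <<'EOF'
{
  "service": {
    "name": "cass",
    "virtual_network_enabled": true
  }
}
EOF

# Opt-in guard: install only when APPLY=1 is set.
if [ "${APPLY:-0}" = "1" ]; then
  dcos package install cassandra --options="$OPTIONS_FILE" --yes
else
  echo "Options written to $OPTIONS_FILE (set APPLY=1 to install)"
fi
```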
You can still install multiple Cassandra clusters without DC/OS Virtual Networks. However, you will need to review all the default networking settings manually to avoid port conflicts.
Cassandra Cluster Management
In this final section, we review four dcos cassandra subcommands for cluster management and the uninstall process. All of the following processes must be completed on the server with the DC/OS CLI configured.
One common management task is adding cluster nodes. To scale out a Cassandra cluster in DC/OS 1.10, use the dcos cassandra subcommand update start with a configuration file:
dcos cassandra update start --options=config.json
In the configuration file, under the nodes section, increase the count. If you need the default configuration file, you can download it from the Cassandra installation page by clicking "DOWNLOAD CONFIG.JSON".
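A minimal sketch of the scale-out step follows. It writes a configuration file that contains only the nodes section with a count of 4, which is an illustrative value. Whether a partial file like this is merged with the existing settings is an assumption. If in doubt, start from the downloaded config.json and change only the count. APPLY is a hypothetical opt-in variable.

```shell
# Write a configuration file that raises the node count.
# Assumption: a file containing only the changed section is accepted.
WORKDIR="$(mktemp -d)"
CONFIG="$WORKDIR/config.json"
cat > "$CONFIG" <<'EOF'
{
  "nodes": {
    "count": 4
  }
}
EOF

# Opt-in guard: start the update only when APPLY=1 is set.
if [ "${APPLY:-0}" = "1" ]; then
  dcos cassandra update start --options="$CONFIG"
else
  echo "Config written to $CONFIG (set APPLY=1 to start the update)"
fi
```

Because the default placement allows one Cassandra node per agent, a count of 4 also needs a fourth private agent with enough free resources.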
Configuration changes are also initiated with the update start subcommand. Once the JSON configuration file is updated, run:
dcos cassandra update start --options=config.json
A few settings you may want to update are log_level, backup_restore_strategy, placement_constraint, concurrent_writes, and concurrent_reads.
Another common management task is upgrading the service. The update start subcommand handles this task as well. Update the package version, placing the target version between the quotes, with:
dcos cassandra update start --package-version=""
This will initiate a rolling upgrade of your Cassandra nodes. To list the available versions, run dcos cassandra update package-versions.
Backup and Restore
You can also use the dcos cassandra subcommands to back up and restore your data. Initiate a backup with:
dcos cassandra plan start backup
and a restore with:
dcos cassandra plan start restore
Before initiating a backup, you will need to configure the storage container in AWS or Azure. Read the disaster recovery documentation to learn more.
Uninstalling the Cassandra Cluster
Finally, you may need to uninstall your test cluster. Uninstalling a Cassandra cluster is a straightforward process. To uninstall the default package, run:
dcos package uninstall cassandra --app-id=/cassandra
In the DC/OS Dashboard, the package should no longer be listed on the "Services" page.
To ensure that all resources are released and the package is fully cleaned up, run the ZooKeeper cleanup container for Cassandra. This will also clean out the ZooKeeper settings. On your master node, run:
sudo docker run mesosphere/janitor /janitor.py -r cassandra-role -p cassandra-principal -z dcos-service-cassandra
Now that you have installed a Cassandra cluster and learned some basic management tasks, you should be able to use the DC/OS Catalog and its packages with confidence. Next, we recommend that you keep testing Cassandra or install another data service. You can also choose to build a data pipeline on top of your freshly installed Cassandra service.