Creating a Cassandra Database Cluster for Local Development
Background
Cassandra is an open source NoSQL database in large-scale production use at companies such as Apple, Netflix and eBay. One of the main advantages of Cassandra is that it can be run as a multi-node cluster in which data is replicated across several nodes. The system is therefore highly resilient: nodes can be added to and removed from the cluster without downtime or data loss.
In many cases, it is useful to experiment and test your code before deploying to a company-wide development environment (if you have one). This blog post describes how to set up a local 2-node Cassandra cluster on your own machine using VirtualBox.
Creating the first node
If you don’t already have it, start by downloading VirtualBox from the downloads page.
To create the first virtual machine (VM), open up VirtualBox and click the “new” button on the menu bar. In the dialog box that appears, fill in the fields for the VM you want to create.
Go through the various configuration options:
- 2GB should be sufficient for RAM
- “Create a virtual hard disk now”
- VDI
- Dynamically allocated
- Name the hard disk (default of the vm name and)
- 8GB should be sufficient for size
You should then be left with a new “powered off” virtual machine.
Install the operating system
You will also need to download an ISO for a GNU/Linux operating system to run inside the virtual machine. Lubuntu is often a good choice because it is relatively lightweight but has enough out-of-the-box functionality to be used immediately. Lubuntu can be downloaded here.
Double click on the newly created VM; you will be presented with a box to select which operating system to install. Using the folder icon next to the drop-down menu, navigate to and select the ISO you have downloaded.
Select install Lubuntu (or whichever OS you have selected) and follow the instructions. It is safe to “Erase disk and install Lubuntu” as this only happens within the VM, not on your computer.
Remember the username and password you select as we will need these to log in later.
Once the installation has finished and the VM rebooted, we should have a working GNU/Linux system.
Installing and configuring Cassandra
Log into the VM and open up the terminal by clicking ‘Start’ → ‘System Tools’ → ‘LXTerminal’. Perform the following steps to install Cassandra:
Update package lists to latest:
sudo apt-get update
Install Java 8 (required for Cassandra):
sudo apt-get install openjdk-8-jdk
Install Cassandra (docs):
echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key A278B781FE4B2BDA
sudo apt-get update
sudo apt-get install cassandra
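Since the steps above rely on Java 8 being the active runtime, it can be worth checking before moving on. The following is a minimal sketch, not part of the official install steps: is_java8 is a hypothetical helper name, and it assumes the usual OpenJDK 8 version string of the form 1.8.x.

```shell
# Sketch only: is_java8 is a hypothetical helper, not a Cassandra or Java tool.
# Succeeds if `java -version` output supplied on stdin reports a 1.8.x runtime.
is_java8() {
  grep -q 'version "1\.8\.'
}
```

Because java -version writes to stderr, it would be used as java -version 2>&1 | is_java8 && echo "Java 8 is active". The state of the Cassandra service itself can be checked with systemctl is-active cassandra.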
Copying the VirtualBox image
We should now have a working Cassandra installation on the virtual machine. To create more nodes in the cluster, power down the machine we have just created, right click on the name and select clone.
Select a new name for the VM and perform a “full clone”.
Configuring networking
For each of the VMs, a network must be created to allow communication both between the VMs and with the host computer. To do this, click ‘Settings → Network → Adapter 2 → Enable Network Adapter’ and select ‘Host-only adapter’ from the ‘Attached to’ dropdown. Perform this operation for both the VMs.
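The cloning and networking steps can also be scripted with VirtualBox's VBoxManage command-line tool instead of the GUI. The sketch below is one possible way to do it, under stated assumptions: the vbox, clone_node and enable_host_only function names are made up for illustration, and the host-only interface name (vboxnet0 in the usage note) depends on your host. By default it only prints the commands so they can be reviewed first.

```shell
# Sketch only: function names are hypothetical helpers around VBoxManage.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

vbox() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "VBoxManage $*"
  else
    VBoxManage "$@"
  fi
}

# Full clone of a powered-off VM, registered under a new name.
clone_node() {
  vbox clonevm "$1" --name "$2" --register
}

# Attach adapter 2 of a VM to the given host-only interface.
enable_host_only() {
  vbox modifyvm "$1" --nic2 hostonly --hostonlyadapter2 "$2"
}
```

For example, clone_node cassandra_demo1 cassandra_demo2 followed by enable_host_only cassandra_demo1 vboxnet0 and enable_host_only cassandra_demo2 vboxnet0. Setting DRY_RUN=0 runs the commands for real; both VMs need to be powered off.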
Configuring Cassandra
To configure Cassandra, the following steps are required on each of the VMs.
- Log into the VM
- Get the IP address assigned to the VM with ip addr show | grep 192 and take a note of the IP address just after ‘inet’. I have cassandra_demo1: 192.168.99.100 and cassandra_demo2: 192.168.99.101
- Open the Cassandra configuration file with sudo nano /etc/cassandra/cassandra.yaml
- Update listen_address to the IP address of the VM obtained in the previous step
- Update rpc_address to the IP address of the VM obtained in the previous step
- Update seeds with the IP address of both VMs, comma separated
- Save and close the file
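If you would rather not edit the file by hand, the same changes can be sketched with sed. This is an illustration only: first_host_only_ip and configure_node are hypothetical helper names, and the patterns assume GNU sed and the stock cassandra.yaml layout, where listen_address and rpc_address start at the beginning of a line and seeds appears as an indented "- seeds:" entry.

```shell
# Sketch only: helper names are hypothetical; assumes GNU sed and the
# default layout of cassandra.yaml.

# Print the first 192.x IPv4 address found in `ip addr show` output on stdin.
first_host_only_ip() {
  awk '$1 == "inet" && $2 ~ /^192\./ { split($2, a, "/"); print a[1]; exit }'
}

# Set listen_address, rpc_address and seeds in a cassandra.yaml file.
# Usage: configure_node <yaml-file> <node-ip> <comma-separated-seed-ips>
configure_node() {
  local file="$1" ip="$2" seeds="$3"
  sed -i \
    -e "s/^listen_address:.*/listen_address: ${ip}/" \
    -e "s/^rpc_address:.*/rpc_address: ${ip}/" \
    -e "s/^\([[:space:]]*- seeds:\).*/\1 \"${seeds}\"/" \
    "$file"
}
```

On the first VM this might be run as root, for example sudo bash -c with the functions loaded, as configure_node /etc/cassandra/cassandra.yaml 192.168.99.100 "192.168.99.100,192.168.99.101". Take a backup copy of the file first, since sed -i edits it in place.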
As we cloned the VM, the Cassandra nodes will appear identical on startup and cause errors in the ring joining process. To avoid these errors, clear the cloned data by running the following on the second VM:
sudo rm -rf /var/lib/cassandra/data/*
sudo rm -rf /var/lib/cassandra/commitlog/*
Starting up Cassandra
Start up Cassandra on the first node:
sudo systemctl restart cassandra
Follow the logs:
tail -f /var/log/cassandra/system.log
There should be a message JOINING: Finished joining ring to indicate success.
Do the same on the second node; again, a message with JOINING: Finished joining ring should appear in the logs. You can quickly test for this by using the command:
grep JOINING /var/log/cassandra/system.log
To confirm both nodes are in the ring, check that both IP addresses are present in the output of:
nodetool status
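These checks can be wrapped into small helpers if you start the cluster often. The sketch below is illustrative only: has_joined_ring and count_up_nodes are hypothetical names, and the second function assumes that nodetool status marks each healthy node with UN (Up/Normal) in the first column.

```shell
# Sketch only: both helper names are hypothetical.

# Succeeds if the given log file shows the node finished joining the ring.
has_joined_ring() {
  grep -q 'JOINING: Finished joining ring' "$1"
}

# Count nodes reported as Up/Normal (UN) in `nodetool status` output on stdin.
count_up_nodes() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}
```

On a node, has_joined_ring /var/log/cassandra/system.log && echo joined reports the log state, and nodetool status | count_up_nodes should print 2 once both nodes are up.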
Accessing the cluster from the host machine
To access the cluster from the host machine, you can use either of the IP addresses of the nodes. For example,
cqlsh 192.168.99.100
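To see replication across the two nodes in action, you could create a keyspace with a replication factor of 2. The following is a small sketch under assumptions: keyspace_cql is a hypothetical helper, the keyspace name demo is made up, and SimpleStrategy is chosen because it is the simplest option for a single-datacenter development cluster.

```shell
# Sketch only: keyspace_cql is a hypothetical helper that prints a CQL statement.
# Usage: keyspace_cql <keyspace-name> <replication-factor>
keyspace_cql() {
  local name="$1" rf="$2"
  printf "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {'class': 'SimpleStrategy', 'replication_factor': %s};\n" "$name" "$rf"
}
```

The statement can then be passed to cqlsh with its -e option, for example cqlsh 192.168.99.100 -e "$(keyspace_cql demo 2)".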
Conclusion
This blog post has outlined how to get a Cassandra cluster up and running for local development. If you have any comments about the post or would like to know more about Cassandra, please contact us.