Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
In a cluster, Cassandra nodes exchange information about one another using a mechanism called Gossip, so every node needs to learn about the others. Nodes designated as "seeds" are the contact points for this mechanism: a node that starts up contacts the seeds to discover the rest of the cluster. It's customary to pick a small number of relatively stable nodes to serve as your seeds, and each seed should also know of at least one other seed. Two seed nodes is the preferred setup.
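To make the role of the seeds more concrete, here is a toy simulation in Python. It is only a simplified model loosely inspired by Cassandra's gossip, not its real implementation: every node starts out knowing just the seeds, and in each round it swaps membership lists with one random node it knows and with one seed. The function name, the round count and the IP addresses are made up for illustration.

```python
import random


def simulate_gossip(nodes, seeds, rounds=5, rng=None):
    """Toy membership gossip (not Cassandra's Gossiper).

    Returns {node: set of nodes it knows about} after `rounds` rounds.
    """
    rng = rng or random.Random(42)
    # Every node starts out knowing only itself and the seeds.
    views = {n: {n, *seeds} for n in nodes}

    def exchange(a, b):
        # Both sides end up with the union of what they knew.
        merged = views[a] | views[b]
        views[a], views[b] = set(merged), set(merged)

    for _ in range(rounds):
        for node in nodes:
            peers = sorted(views[node] - {node})
            if peers:
                exchange(node, rng.choice(peers))  # gossip with a random known node
            other_seeds = [s for s in seeds if s != node]
            if other_seeds:
                exchange(node, rng.choice(other_seeds))  # and with a seed
    return views


if __name__ == "__main__":
    cluster = ["10.0.0.%d" % i for i in range(1, 7)]  # hypothetical addresses
    views = simulate_gossip(cluster, seeds=cluster[:2])
    print(all(v == set(cluster) for v in views.values()))  # prints True
```

Because every node keeps talking to a seed, the seeds quickly collect the full membership and hand it back out, which is why a couple of stable seeds are enough.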
Let's have a look at how to bring up a Cassandra cluster with Cassandra 0.7.x on Ubuntu 10.04.
First of all, you have to install Java (a JDK). As that is out of scope for this discussion, please do it on your own, and let's start with Cassandra.
Add the following repositories to your APT sources list (/etc/apt/sources.list):
[bash]deb http://www.apache.org/dist/cassandra/debian 07x main
deb-src http://www.apache.org/dist/cassandra/debian 07x main[/bash]
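If you prefer to keep the repository entries out of the main sources.list, a small Python sketch can write them to their own file. The directory /etc/apt/sources.list.d/ is APT's standard location for extra source files; the file name cassandra.list and the helper names are our own choices, and writing to the default directory needs root.

```python
from pathlib import Path

REPO_URL = "http://www.apache.org/dist/cassandra/debian"
SERIES = "07x"  # the 0.7.x release series used in this article


def apt_source_lines(url=REPO_URL, series=SERIES):
    """Return the deb and deb-src entries for the given release series."""
    return ["%s %s %s main" % (kind, url, series) for kind in ("deb", "deb-src")]


def write_source_file(directory="/etc/apt/sources.list.d", name="cassandra.list"):
    """Write both entries to <directory>/<name> and return the file path."""
    path = Path(directory) / name
    path.write_text("\n".join(apt_source_lines()) + "\n")
    return path
```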
Import the following keys and add them to apt-key:
[bash]gpg --keyserver keyserver.ubuntu.com --recv-keys 4BD736A82B5C1B00
gpg --export --armor 4BD736A82B5C1B00 | sudo apt-key add -
gpg --keyserver keyserver.ubuntu.com --recv-keys F758CE318D77295D
gpg --export --armor F758CE318D77295D | sudo apt-key add -[/bash]
Then run apt-get update and make sure that no errors are reported while fetching the packages.
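When the same setup has to be repeated on several machines, the key import can also be scripted. The Python sketch below builds the gpg and apt-key commands shown above as argument lists and runs them with subprocess, feeding gpg's output to apt-key the way the shell pipe does. The helper names are hypothetical, and actually running it requires gpg, sudo rights and network access.

```python
import subprocess

KEYSERVER = "keyserver.ubuntu.com"
KEY_IDS = ["4BD736A82B5C1B00", "F758CE318D77295D"]  # the two key IDs listed above


def key_import_steps(key_id, keyserver=KEYSERVER):
    """Return the (recv, export, apt_key) argument lists for one key."""
    recv = ["gpg", "--keyserver", keyserver, "--recv-keys", key_id]
    export = ["gpg", "--export", "--armor", key_id]
    apt_key = ["sudo", "apt-key", "add", "-"]  # "-" makes apt-key read the key from stdin
    return recv, export, apt_key


def import_key(key_id):
    """Fetch one key from the keyserver and hand it to apt-key."""
    recv, export, apt_key = key_import_steps(key_id)
    subprocess.run(recv, check=True)
    armored = subprocess.run(export, check=True, capture_output=True).stdout
    subprocess.run(apt_key, input=armored, check=True)  # same as "| sudo apt-key add -"
```

A possible usage is calling import_key for each entry of KEY_IDS and then running apt-get update.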
Install Cassandra on all the nodes (machines) that will make up the cluster:
[bash]apt-get install cassandra --yes[/bash]
Now edit Cassandra's configuration file (the Debian package installs it as /etc/cassandra/cassandra.yaml).
Here I will discuss the important directives that have to be edited for the cluster to work.
initial_token (for example, initial_token: 136112946768375385385349842972707284582)
This parameter determines the position of each node in the Cassandra ring. The initial token of the first seed node should be 0. Here is a simple Python script that calculates evenly spaced token values over the 0 to 2**127 token range of the default RandomPartitioner:
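[python]import sys
# Take the node count from the first argument, otherwise ask for it interactively
if len(sys.argv) > 1:
    num = int(sys.argv[1])
else:
    num = int(input("How many nodes are in your cluster? "))
for i in range(num):
    print('node %d: %d' % (i, i * 2**127 // num))[/python]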
When run without an argument, the script prompts you for the number of nodes in your cluster (you can also pass the number as the first argument). It then prints the initial token for each node.
For example, for a two-node cluster the tokens are:
node 0: 0
node 1: 85070591730234615865843651857942052864
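The same calculation as a reusable Python function, which is handy if you generate configuration files from a script. It assumes the 0 to 2**127 token range that the script above uses; the function name is our own. As a sanity check, the example initial_token value shown earlier is exactly the token this formula gives node 4 of a five-node cluster.

```python
TOKEN_SPACE = 2 ** 127  # RandomPartitioner tokens lie in [0, 2**127)


def initial_tokens(num_nodes):
    """Evenly spaced initial_token values, one per node, starting at 0."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    # Integer arithmetic keeps the 39-digit tokens exact.
    return [i * TOKEN_SPACE // num_nodes for i in range(num_nodes)]
```

initial_tokens(2) returns [0, 85070591730234615865843651857942052864], the same two values as the output above.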
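auto_bootstrap: false
Set this to false, as we are starting the cluster for the first time: bootstrapping makes a new node stream data from nodes that are already running, and a brand-new cluster has no data yet.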
seeds:
    - <ip address>
As mentioned earlier, the seeds listed here are the contact points through which the nodes find one another.
List the IP addresses of the two nodes to which you assigned the first two initial tokens generated by the script above.
These seed entries should be the same on all nodes of the cluster.
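listen_address and rpc_address
You can leave both empty; Cassandra then uses the address that the node's hostname resolves to, so make sure each hostname resolves to the node's real IP address.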
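Because only initial_token differs from node to node, the settings discussed above can be generated. The Python sketch below returns the relevant cassandra.yaml lines for one node. The function name is hypothetical, and the directive names auto_bootstrap, seeds, listen_address and rpc_address are the Cassandra 0.7 cassandra.yaml keys we assume this section refers to; listen_address and rpc_address are left empty as suggested above.

```python
TOKEN_SPACE = 2 ** 127  # RandomPartitioner token range, as in the token script


def node_settings(index, num_nodes, seed_ips):
    """Return the cassandra.yaml lines discussed above for node number `index`."""
    token = index * TOKEN_SPACE // num_nodes
    lines = [
        "initial_token: %d" % token,
        "auto_bootstrap: false",  # first start of a brand-new cluster
        "seeds:",
    ]
    lines += ["    - %s" % ip for ip in seed_ips]  # identical list on every node
    lines += ["listen_address:", "rpc_address:"]  # empty: derived from the hostname
    return "\n".join(lines)


if __name__ == "__main__":
    seeds = ["10.0.0.1", "10.0.0.2"]  # hypothetical addresses of the first two nodes
    print(node_settings(1, 2, seeds))
```

You would then merge these lines into each node's existing cassandra.yaml, leaving the other settings as they are.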
Starting Cassandra
To start Cassandra, you can use either the init script or the cassandra command. Here I will use the second option.
Because the package starts the Cassandra service during installation, data created with the default configuration is already stored in the /var/lib/cassandra/data directory. So before starting Cassandra with the new settings, follow these steps:
1) /etc/init.d/cassandra stop
2) rm -rf /var/lib/cassandra/data
3) mkdir /var/lib/cassandra/data
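The same three steps as a Python sketch, meant to be run as root on each node. It assumes the Debian package's default data path and init script used above; the function name is hypothetical, and you can pass a different path (for example a temporary directory) with stop_service=False to try it safely.

```python
import shutil
import subprocess
from pathlib import Path


def reset_data_dir(data_dir="/var/lib/cassandra/data", stop_service=True):
    """Stop Cassandra, then delete and recreate its data directory."""
    if stop_service:
        subprocess.run(["/etc/init.d/cassandra", "stop"], check=True)  # step 1
    path = Path(data_dir)
    if path.exists():
        shutil.rmtree(path)  # step 2: rm -rf
    path.mkdir(parents=True)  # step 3: mkdir
    return path
```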
After doing these steps on all the nodes, run the following command to start Cassandra on each node, beginning with the first seed node:
[bash]# cassandra &[/bash]
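Starting the seeds first matters because the other nodes contact them when they join. Below is a hedged Python sketch of that ordering: start_node is a hypothetical callable you supply (for example one that runs cassandra on the host over ssh), and the pause between nodes is our own assumption, not a documented requirement.

```python
import time


def startup_order(nodes, seeds):
    """Seeds first (in the order given), then every other node."""
    return list(seeds) + [n for n in nodes if n not in seeds]


def start_cluster(nodes, seeds, start_node, pause_seconds=30):
    """Call start_node(host) for each host in startup order, pausing in between."""
    order = startup_order(nodes, seeds)
    for i, host in enumerate(order):
        start_node(host)
        if i < len(order) - 1:
            time.sleep(pause_seconds)  # give gossip time to settle (assumed value)
    return order
```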
After starting Cassandra on all the nodes, you can check the cluster status with nodetool (8080 is the JMX port used by Cassandra 0.7):
[bash]nodetool -h <ip of the node> -p 8080 ring[/bash]
For example, when running it on one of the nodes itself:
[bash]nodetool -h localhost -p 8080 ring[/bash]
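In the ring, each node owns the token range from the previous node's token (exclusive) up to its own token, wrapping around at 0. The Python sketch below computes the share of the ring each node should own from the tokens you configured, which is a quick way to sanity-check the balance nodetool ring reports (assuming your version prints an ownership column). It assumes RandomPartitioner's 0 to 2**127 token range; the function name is our own.

```python
TOKEN_SPACE = 2 ** 127  # RandomPartitioner token range


def ownership(tokens):
    """Map each token to the percentage of the ring its node owns.

    A node owns the range (previous token, its own token], wrapping around 0.
    """
    ring = sorted(tokens)
    if len(ring) == 1:
        return {ring[0]: 100.0}  # a single node owns everything
    shares = {}
    for i, token in enumerate(ring):
        previous = ring[i - 1]  # for i == 0 this is the largest token (wrap-around)
        span = (token - previous) % TOKEN_SPACE
        shares[token] = 100.0 * span / TOKEN_SPACE
    return shares
```

With the two-node tokens from the script, each node should own 50% of the ring.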