• Call: +1 (858) 429-9131

Posts Tagged ‘Cloud computing’

Cassandra Cluster on AWS EC2 with Cassandra 7.x and ubuntu 10.04

Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together  Dynamo’s fully distributed design  and Bigtable’s ColumnFamily-based data model.

In a cluster, Cassandra nodes exchange information about one another using a mechanism called Gossip. The nodes in a cluster needs to know one another.  Nodes named “seed”s are the centre of this communication mechanism. It’s customary to pick a small number of relatively stable nodes to serve as your seeds. Do make sure that each seed also knows of at least one other. Having two nodes is what is preferred.

Lets have a look at how we can bring a Cassandra cluster up with Cassandra 7.x on ubuntu 10.04

First of all you have to install the java/jdk .  As that is out of scope for our discussion please do it on your own and let’s start with cassandra.

Add the following repositories to your apt sources list

vim /etc/apt/sources.list.d/cassandra.list

[bash]deb http://www.apache.org/dist/cassandra/debian 07x main
deb-src http://www.apache.org/dist/cassandra/debian 07x main[/bash]

Import the following keys and add it to apt-key

[bash]

gpg –keyserver keyserver.ubuntu.com –recv-keys 4BD736A82B5C1B00

gpg –export –armor 4BD736A82B5C1B00 | sudo apt-key add –

gpg –keyserver keyserver.ubuntu.com –recv-keys F758CE318D77295D

gpg –export –armor F758CE318D77295D | sudo apt-key add –

[/bash]

Execute

[bash]apt-get update[/bash]

and make sure that no error is there with accessing the packages.

Installing cassandra on all nodes(machines) with  which we intend to build the cluster.

[bash]apt-get install cassandra  –yes[/bash]

Now edit the configuration file for Cassandra

vim /etc/cassandra/cassandra.yaml

Here  I will discuss the important directives that has to be edited for the cluster to take effect

initial_token:

eg:  initial_token:  136112946768375385385349842972707284582

This parameter determines the position of each node in the Cassandra ring. Initial token for the first seed node should be ‘0’.Here is a simple Python script that helps to calculate the token values.

[bash]

#! /usr/bin/python

import sys

if (len(sys.argv) > 1):

num=int(sys.argv[1])

else:

num=int(raw_input(“How many nodes are in your cluster? “))

for i in range(0, num):

print ‘node %d: %d’ % (i, (i*(2**127)/num))

[/bash]

executing this script will prompt you for the no. of nodes in your cluster. Then it will output the initial tokens for each node.

For eg: Consider a 2 node cluster, the tokens will be

node 0: 0

node 1: 85070591730234615865843651857942052864

auto_bootstrap: false

You can set this to false as we are just going to start the cluster for the first time.

seeds:

-< ip address >

As I told you earlier, the seeds mentioned here will control the communication between the nodes.

You can give the ips of the two nodes here  for which you assigned the first two initial tokens generated by the script above.

Example:

Seeds:

-192.168.1.10

-192.168.1.13

This seed entries should be the same on all nodes of the cluster.

listen_address:

&

rpc_address:

You can leave both empty.

Starting  the Cassandra

For starting Cassandra you can either use an init script/ or the command “cassandra”. Here I will use the second option.

As Cassandra service was started during the installation some values will be stored in /var/lib/cassandra/data directory. So Before starting Cassandra follow these steps.

[bash]

1)      /etc/init.d/cassandra stop

2)      rm –rf  /var/lib/cassandra/data

3)      mkdir /var/lib/cassandra/data

[/bash]

After doing these steps on all the nodes please run the following  command to start Cassandra on each node starting from the seed node 1

[bash]# cassandra &[/bash]

After starting Cassandra on all the nodes you can check the cluster status using the following command

[bash]nodetool -h <ip of the node >  -p 8080 ring[/bash]

or

[bash]nodetool -h localhost -p 8080 ring[/bash]

Achieving HIPAA on AWS / EC2 with Windows Server 2008

When you are creating a HIPAA compliant system on cloud service like AWS / EC2 / S3, you have to carefully examine the different levels of data security provided by the Cloud Service provider

At a minimum level, the following should be ascertained:

i) Where is the Cloud provider’s data center physically located. In some countries, HIPAA restricts Protected Health Information ( PHI ) to be stored on servers located outside of the country.

ii) Whether the cloud provider contractually obligated to protect the customer’s data at the same level as the customer’s own internal policies?

iii) Cloud provider’s Backup and Recovery policies

iv) What are the provider’s policies on data handling/management and access control? Do adequate controls exist to prevent impermissible copying or removal of customer data by the provider, or by unauthorized employees of the company?

v) What happens to data when it is deleted? This is very important as customers will be storing data on virtual Machines. Also What happens to cloud hardware when the hardware is replaced?

In this blog we are only looking at the different security levels to be taken by the application developer to make sure that a web application built on AWS / EC2 using Windows Server 2008 / .NET / MSSQL / IIS 7 / is HIPAA compliant. The basic requirement is to encrypt all the data at rest and transit

1. Encrypting Data in transit between the user ( clients ) and the server ( Webserver )

SSL over HTTP ( HTTPS )

Steps used to Implement SSL on IIS are the following:

[bash]
1.Open IIS Manager.
2.Click on the server name.
3.Double-click the “Server Certificates” button in the “Security” section
4.Click on self-signed certificate
5.Enter certificate name and click ok
6. Select the name of the server to which the certificate was installed.

7. From the “Actions” menu (on the right), click on “Bindings.” This will open the “Site Bindings” window

8. In the “Site Bindings” window, click “Add” This will open the “Add Site Binding” window

9. Under “Type” choose https. The IP address should be the IP address of the site , and the port over which traffic will be secured by SSL is usually 443. The “SSL Certificate” field should specify the certificate that was installed in step 5.

10.Click “OK.” . SSL is now installed .
[/bash]

2 ) Encrypting Data at Rest ( Document Root )

EFS with IIS

You can use EFS ( Encrypted File System ) in Windows 2008 Server to automatically encrypt your data when it is stored on the hard disk.

Encrypt a Folder:

[bash]
1. Open Windows Explorer.
2. Right-click the folder that you want to encrypt , and then click Properties.
3. On the General tab, click Advanced.
4. Under Compress or Encrypt attributes, select the Encrypt contents to secure data check box and then click OK.
5. Click OK.
6. In the Confirm Attribute Changes dialog box that appears, use one of the following steps:
i) If you want to encrypt only the folder, click Apply changes to this folder only, and then click OK.
ii) If you want to encrypt the existing folder contents along with the folder, click Apply changes to this folder, subfolders and files, and then click OK.
[/bash]

The folder becomes an encrypted folder. New files that you create in this folder are automatically encrypted


3 ) Encrypting MSSQL Database ( Data at Rest )

TDE ( Transparent Data Encryption )

TDE is a new feature inbuilt in MSSQL Server 2008 Enterprise Edition . Data is encrypted before it is written to disk; data is decrypted when it is read from disk. The “transparent” aspect of TDE is that the encryption is performed by the database engine and SQL Server clients are completely unaware of it. There is absolutely no code that needs to be written to perform the encryption and decryption .So there is no need for changing any code ( Database Queries ) in the Application .

STEPS

i) Create a Master Key

A master key is a symmetric key that is used to create certificates and asymmetric keys. Execute the following script to create a master key:

[bash]
USE master;
CREATE MASTER KEY
ENCRYPTION BY PASSWORD = ‘Pass@word1’;
GO
[/bash]

ii)Create Certificate

Certificates can be used to create symmetric keys for data encryption or to encrypt the data directly. Execute the following script to create a certificate:

[bash]
CREATE CERTIFICATE TDECert
WITH SUBJECT = ‘TDE Certificate’
GO
[/bash]

iii) Create a Database Encryption Key and Protect it by the Certificate

[bash]
1.Go to object explorer in the left pane of the MSSQL SERVER Management Studio
2.Right Click on the database on which TDE Requires
3.Click Tasks and Navigate to Manage Database Encryption
4. Select the encrytion algorithm (AES 128/192/256) and select the certificate you have created
5.Then Mark the check Box for Set Database Encryption On
[/bash]

You can query the is_encrypted column in sys.databases to determine whether TDE is enabled for a particular database.

[bash]
SELECT [name], is_encrypted FROM sys.databases
GO
[/bash]


4 ) Encrypting Data in transit between the Webserver and the MSSQL Database

MSSQL secure connection using SSL

i) Creating a self-singned cert using makecert
[bash]
makecert -r -pe -n “CN=YOUR_SERVER_FQDN” -b 01/01/2000 -e 01/01/2036 -eku 1.3.6.1.5.5.7.3.1 -ss my -sr localMachine -sky exchange -sp “Microsoft RSA SChannel Cryptographic Provider” -sy 12 c:\test.cer
[/bash]

ii) Install this cert

[bash]
Copy c:\test.cer into your client machine, run c:\test.cer from command window, select “Install Certificate”. -&gt; click “Next” -&gt; select “Place all certificates in the following store” –&gt; click “Browser” -&gt; select “Trusted Root Certification Authorities” -&gt; select OK and Finish
[/bash]

iii) Open SQL Server Configuration Manager

[bash]
Expand SQL Server Network Configuration, right-click “Protocols for MSSQLSERVER” then click “properties”. On the “Certificate” tab select the certificate just installed . On the “Flags” tab, set “ForceEncryption” YES.
[/bash]

Now SSL is ready to be used on the server. The only modification needed in the .NET code is connection string. It will be

[bash]
connectionString=”Data Source=localhost;Initial Catalog=mydb;User ID=user1;Password=pas@123;Encrypt=true;TrustServerCertificate=true”
[/bash]

Small to Medium Sized Data Center with Applogic Cloud

We use Applogic (A Grid Operating System) for Grid Computation.The Virtual Private Data Center with Applogic, provides much flexibilities to manage resources of the commodity servers, similar to physical access to the commodity servers. Applogic functions with the help of Xen Hypervisor, Distributed kernel, Logical Connection Manager and other back bone utilities. By using Applogic group, all these resources from different physical servers(commodity server) can be shifted into a single pool of corresponding resources(CPU,RAM,B/w,etc.). Read more…

Deploying a load balanced e-commerce portal in Amazon EC2

Update: NFS should not be used as that will be a SPOF. One should use S3 or other object stores. An alternative could be multi-node GlusterFS if someone needs volumes shared across nodes.

When building an infrastructure for an eCommerce portal on Cloud, it is important to note that it should be available all the time, that it is fail safe with outages like the one we had recently in AWS EU and U.S. East Regions, survive Hardware failure or any other issues like bug in the system or deployment errors. We built an infrastructure on AWS Cloud that address all these issues with LAMP using various AWS Cloud services like EC2, S3, RDS, EBS etc. It is described in detail below:

 

Achieving High Availability & Fail over across Datacenters

Elastic Load Balancer (ELB)

The Elastic Loadbalancer ( ELB ) service provided by AWS tries to achieve the following:

(i) Spans across Datacenters: Loadbalance traffic across mulitple datacenters (AZ )thus providing high availability even if one datacenter goes down. So you should always make sure that when you launch instances under an ELB, you should launch it in different Availability zones. You can also launch instances in the same AZ but by default ELB will redirect request across multiple AZ in a Round Robin way.

(ii) Failover: ELB will periodically monitor the health of the instances and if any of the instance or monitored service ( e.g. Http ) goes down, ELB will stop redirecting requests to that instance and all the request will be redirected to the remaining number of instances registered under ELB. When the instance comes backup, it will again start redirecting requests to that instance.

(iii) Handling root domain ( apex / main domain ) and subdomains: ELB can loadbalance only those requests coming to alias / subdomain( www ). It cannot handle request coming to root domain. This is because when you configure DNS for enabling ELB, you can only set CNAME to ELB for subdomains. There are 2 reasons for this. One is when you configure ELB, you will only get a Public DNS name for the ELB like the following instead of a Public IP.

[bash]Test-1736333854.us-east-1.elb.amazonaws.com [/bash]

This is because AWS changes the Public IP of the ELB periodically for providing scalability for ELB itself. Another reason why you cannot redirect main domain request to ELB is that DNS protocol itself restricts the usage of CNAME or anything other than “A” record for a root domain. So you cannot CNAME root domain to ELB DNS name.

So for serving root domain requests with ELB , there are only work arounds like mentioned below:

a) We have to assign an elastic IP for an instance under ELB. But what if this instance goes down? Set heartbeat to switch EIP? This is a bit complicated setup as switching EIP to instances present across AZ takes time.

b)The other option is to have the root domain point to the IP addresses of the destination by configuring one or more “A” records (address records) for root domain. You can do that if you know the destination IP addresses are fixed, such as if you are using EC2 Elastic IP addresses. We wouldn’t recommend this because IP addresses will be cached at the client end for long time even if you set low value of TTL at the nameservers. This is because TTL value can also be configured at the the client end overriding the TTL value provided by the nameserver of the domain. e.g. with nscd ( Nameserver Caching Daemon) you can set the TTL value manually in its configuration file.

c) You can keep a separate web server not under ELB with a Redirect Rule for redirecting root domain requests to www. You should make sure that this webserver is highly available as well.

d) A better solution is to go for Domain Registrars ( DNS service providers ) who provide this feature of redirecting root domain requests to www. So this can be handled at the DNS itself. The DNS service provided by AWS “Route53” can be used for this ‘Zone apex’ ( root domain ) redirection.

(iv) SSL Termination

There is support for “SSL termination” in ELB which means you can use ELB to loadbalance HTTPS requests too. You just need to buy the SSL certificate and simply upload it to ELB. ELB will redirect all the HTTPS request to the backend servers. So you can make an eCommerce portal highly secure and highly available with ELB.

(v) Persistent Session

You can enable Sticky Session with ELB but the problem is users will be logged out if any of the instance / webserver goes down and ELB will redirect the subsequent requests from the same user to a different instance and it will prompt the user to login again. To tackle this there were few options we had considered –
a)You can either setup distributed failover memcached server or
b)You can use RDS for storing Session.

We went for RDS as our Session Management store since RDS is an excellent choice for Database Administration as well if you are using MySQL as the Database.

Your application must be configured to write session data to an RDS database. So when an instance / webserver goes down and when the ELB redirects the user request to a different instance, the user will not be asked to login again as all servers are reading session data from the same place that is RDS. The user won’t notice anything at all, even though they’ve now started talking to another server. We recommend using a Multi-AZ RDS instance and write session data into this. So if one of your EC2 instances goes down, the other instances will still have access to the RDS database, and likewise if an RDS zone goes down, Amazon fail this over to the second AZ internally, transparently to you and your application.

So the easiest and most reliable way to share sessions for failover on a multi-server environment is to use RDS, since Amazon handle the database layer’s failover for you.

So basically you can achieve two things by using RDS – Session management and Database Management.

 

AutoScaling

The Autoscaling service provided by AWS allows you to scale horizontally up / down with CPU usage, RAM, Disk I/O etc.

Ideally you should use a Base AMI with Autoscaling that will pull the required packages from a Centralized location like Chef Platform and code from the Version Control System or S3. You can write a startup script to run on instance bootup for this purpose. So when Autoscaling launches a new instance it will pull all the latest updated versions of the packages, code and also any other required custom configurations from a centralized location. This will also make it easier to manage all the configuration details, code updates from a centralized location using tools like Chef Platform, Version Control System or S3 respectively.

Apart from Centralised Configuration / Code management, the reason for using Base ami with Autoscaling is that it is not possible to change the ami configured with Autoscaling service dynamically.

 

Storage for Application Files

We came across lot of options for storing the application files. However you have to consider your priorities before you select a storage service for the code. Following are the points to consider for your application file storage system:

(i)Latency issues: All shared storage systems like NFS / GlusterFS / EBS / S3 etc have latency issues when compared to Instance store (Ephemeral Storage)

(ii)High availability: If you are using a shared storage service like NFS, it should never go down for the entire system to be available all the time.

(iii)Access to the code: How to get the latest code during incremental roll out of a new instance because if you are using a shared storage, it becomes difficult to gives access to the shared storage system when a new instance is launched

We went for instance store / ephemeral store that gives you better I/O performance. You can keep your own highly available SVN repository or go for publicly available Version Control Systems like GitHub. At the same time you can also keep a copy in S3 and sync to it whenever there is a code update. This will make it more redundant.

The problem with using shared storage service like NFS / GlusterFS with EBS / S3 is it becomes difficult to avoid single point failure for NFS / GlusterFS service. But if your site doesn’t have much hits and your priority becomes redundancy, you can go for mounting S3 as filesystem using tools like s3cmd and use that as a shared storage with NFS for multiple instance. The problem with S3 is that it is not intended to be used as a filesystem and there have been issues reported with speed and caching. Or you can use EBS volume for code storage if you have only a single instance serving the request. Even using NFS with EBS volumes ( with frequent snapshots to S3 ) gives better performance than using S3 as shared storage for files.

Not only does instance store gives you better performance, error rates very rare. with EBS volumes error rates are reported frequently. Recent outages with AWS EU & US East Regions shows that the down time was made worse due to increase in time taken to recover from EBS errors.

 

Code Deployment

For automating code deployment, you can configure deployment tools like Capistrano. This will become very handy when you have multiple servers to update simultaneously. Capistrano uses Ruby language and is built for Ruby code deployment but with little changes, you can automate deployment of PHP / Perl / Python / JAVA based application.

chef-deploy is another tool that comes with chef for automating code deployment. Continuous Integration tools like Hudson / Cruise Control are excellent tools when you want to automate the Build, Deployment, Test and Rollback process.

For code deployment, we follow a Release Management process where we keep a staging environment that is an exact replica of the production environment. We push code to the production environment only when it’s been completely tested in the staging environment and approved by the Release Manager. This will further reduce the errors / bugs / and downtime time caused due to the code release.

 

Database Server

We went for RDS across AZ for High availability. AWS will take care of Redundancy, Performance Optimisation, Scalability and Backup. You can avoid the hassle of managing a Database Server by using RDS. RDS is as an excellent distributed highly available Session Management System. You can also take regular backup from RDS and keep it in S3.

You can also use Master–Slave Replication setup instead of RDS. This is also a good option for achieving high availability for Database server. The challenging part will be to manually configure failover for both master and slave servers, achieving scalability with traffic, backup configuration and performance optimization with increasing load. With RDS, all these will be managed by AWS.

 

Log handling

Keep all the important logs like Application logs, Syslogs, SSH log etc in EBS volume. You can either schedule regular snapshots of these EBS volume to S3 or you can even sync these log files to an S3 bucket periodically using tools like s3sync.

 

Configuration Management

If you have more than one server or are planning to scale up in future or would like to automate a lot of administration / coding stuffs, you should definitely use one of the Open Source freely available Configuration Management tools like Chef / puppet / Cfengine

Chef is new and has default support for AWS / EC2. We use Chef extensively for managing our infrastructure in AWS. Chef provide a lot of readily available cookbooks ( recipes / roles ) for LAMP, JAVA app, Cassandra, Hadoop, Nagios etc which can be used readily ( or with minimum customization ) to automate the infrastructure setup and configuration. Chef also comes with a tool called Chef-deploy for automating deployment of code.

So using Chef along with tools like Hudson / Cruisecontrol, you can automate the entire setup from infrastructure setup to configuration management to building, deployment and testing of your application.

 

Performance

To improve performance you can implement the following:

(i)Use caching mechanisms like Memcache(DB scaling) / aiCache / Varnish.

(ii)CDN ( Content Delivery Network ) is a must if you want to provide better end-user response time. There are lot of CDN providers but we recommend AWS CloudFront or Akamai for serving static files and images. For start-up and small business, CDN might be costly but as your target audience grows larger and becomes more global, a CDN is necessary to achieve fast response times.

 

Monitoring & Alert

For monitoring, go for open source monitoring tools along with a SaaS based monitoring application.

(i)There are lot of free and open source option available in the market – Nagios, Zenoss,Zabbix etc. This can be automated with Chef in such a way that when a new server is launched in to the cluster, it will be automatically added to the Nagios list of monitored servers.

(ii)You can also use excellent SaaS based monitoring apps like Pingdom, mon.itor.us, site24x7.com etc for monitoring and alerting via email, SMS or Twitter.

(iii)Custom scripts or tools like Munin & Monit for monitoring and restarting services if it crashes.

 

Backup

You can keep copies of code in an S3 Bucket and sync it with tools like s3sync with every update. For DB Backup, in addition to automated RDS Backup, you can take periodical standard DB backups using mysqldump and store it in S3 bucket.You can also use EBS volumes for keeping replica of code and DB Backup with periodical snapshots to S3.

An important thing to note about S3 storage is it is only a Highly available Storage System. It is not backed up automatically. That means if you delete anything manually from s3, it will be forever gone unless you have manually backed it up with multiple copies in S3. So make sure that you have enough backups available in S3.

Resolving the degraded instance scenario of AWS ec2

There are times when you might receive certain warning messages from amazon.

“Hello,

We have noticed that one or more of your instances are running on a host degraded due to hardware failure.

i-111111

The host needs to undergo maintenance and will be taken down. Your instances will be terminated at this point. We recommend that you launch replacement instances and start migrating to them.” Read more…

Bundling an Amazon EC2 instance with cPanel : HowTo

In this Article we will explain how to bundle an instance which already has cPanel installed in it. The only thing you need to consider is cPanel licensing as cPanel provides license for the Elastic IP address. The article assumes that cPanel is licensed to an Elastic IP and the IP is still with us so that it can be reassigned to the new instance once its launched.

Why should we bundle a cPanel EC2  instance:

Many times we have seen issues like instances becoming unresponsive or not reachable by SSH etc. In such cases if we have a an AMI bundled and ready to start, we can go live in less than 5 minutes. We could save our clients by launching the AMIs with cPanel in couple of instances where the EC2 instances became unresponsive. Read more…

How to configure Memcached on AWS EC2: A Starter’s Guide

Memached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load and session management. Lets focus on session management first and build up a caching daemon to store PHP sessions in a load balanced environment.  In this post I will explain how you can easily install it and make it available in LAMP. Read more…