
Posts Tagged ‘Cloud computing’

Upgrading Rocket Chat

This is a very quick post. Many organizations are using Rocket.Chat as a Slack / IRC alternative.

One important point that is not often documented is the upgrade process. We ourselves ran into Node.js errors, Meteor errors, etc. multiple times.

Here is how we can upgrade without breaking anything.

First, take a backup of the installation directory:

cd /home/rocket
su -l rocket
cp -rfp Rocket.chat backups/$(date +%F)

Assuming that backups are kept in /home/rocket/backups and db backups are in /home/rocket/backups/db

cd backups/db
mongodump
rm -rf /home/rocket/Rocket.chat

#get the new version

cd /home/rocket
curl -L https://rocket.chat/releases/latest/download -o rocket.chat.tgz
tar xf rocket.chat.tgz
mv bundle Rocket.chat

Install the dependencies:

(cd /home/rocket/Rocket.chat/programs/server && npm install )

All set to start the new server, migrate the database, etc.

cd /home/rocket/Rocket.chat
node main.js

The last step performs the upgrade and maintenance of the database schema. This method works with most versions.
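
If the new version misbehaves, rolling back is essentially restoring the copies taken above. A minimal sketch, assuming the backup paths used earlier (mongodump writes its output into a dump/ directory, which mongorestore reads back):

cd /home/rocket
rm -rf Rocket.chat
cp -rfp backups/<YYYY-MM-DD> Rocket.chat    # the dated copy made before the upgrade
(cd backups/db && mongorestore --drop dump/)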

It is assumed that Rocket.Chat is installed under the user rocket.

Further, the server runs as a service, which was set up with:

sudo forever-service install -s main.js -e "ROOT_URL=https://chat.agileblaze.net/ MONGO_URL=mongodb://localhost:27017/rocketchat PORT=3000" rocketchat
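
Since the service was created with forever-service, it is cleaner to stop it before swapping the bundle and start it again afterwards instead of keeping node main.js running by hand. A sketch, assuming forever-service generated the usual init script for the rocketchat service name used above:

sudo service rocketchat stop     # before replacing the bundle
sudo service rocketchat start    # after the upgrade and schema migration
sudo service rocketchat status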

Mosh aka mobile-shell

[Screenshot: Mosh demo]

Stumbled upon Mobile shell (Mosh), which allows persistent connections over intermittent links, VPN / WiFi / network roaming, etc. It is quite useful, especially when we have tons of nodes across multiple cloud providers, all grabbing for attention. Automating a few things in a CRM made us end up setting up a Postfix mail server after something like more than a decade, and we got frustrated over the nearly non-existent internet provided by Asianet DSL; all credit for this post goes to them.

 

TL;DR Here is a quick guide to get Mosh working on Mac & GNU/Linux flavours.

 

Mosh uses UDP. Yes, you heard it right.

By default it uses UDP ports 60000 to 61000 for establishing connections. We need to open some of these ports, say a subset of this range, in whatever firewall mechanism we use to get access to the servers.
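
For example, on an Ubuntu server a small slice of that range can be opened with ufw or plain iptables. This is only a sketch; pick whichever subset you decided on:

sudo ufw allow 60000:60010/udp
# or, with iptables
sudo iptables -A INPUT -p udp --dport 60000:60010 -j ACCEPT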

There is a client & a server

I missed this part out! We need to install Mosh on both the client and the server (i.e. apt-get install mosh or yum install mosh on the servers too, BOFH).

On Amazon / AWS / EC2 cloud,

Open up a few UDP ports in the security groups. We opened up 10 ports.

On DigitalOcean or any other provider open the ports in your firewall.

Client side installation:

On Mac,  we ran into issues with libprotobuf

> mosh migrate2cloud.com
dyld: Library not loaded: /usr/local/lib/libprotobuf.7.dylib
  Referenced from: /usr/local/bin/mosh-client
  Reason: image not found
Died at /usr/local/bin/mosh line 201. 

The solution is to upgrade Brew (well why shouldn’t one use brew ?)

brew update ; brew upgrade ;  brew remove libprotobuf ; brew install libprotobuf

will do the magic. If not,  we can try

         brew remove mosh ; brew install mosh

as well. If it still doesn’t work, RTFM & the FAQ 🙂

Another issue we ran into was the locale & UTF-8 encoding. We fixed it by installing the locale on the client and server and exporting the following environment variables in the bash profile.

# for mosh
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
export MM_CHARSET=utf8
export LC_COLLATE="en_US.UTF-8"
export LC_TIME="en_US.UTF-8"
export LC_NUMERIC="en_US.UTF-8"
export LC_MONETARY="en_US.UTF-8"
export LC_MESSAGES="en_US.UTF-8"

You load the environment variables by doing the following in bash.

. ~/.bash_profile   # don’t miss the dot at the beginning

Firewalls, Tunnelling, NAT

It may not work just yet as you may have to deal with NAT traversal and other nasty things. Creating an SSH tunnel can solve these issues.

mosh --ssh="ssh -4 -R 2222:localhost:22 -i /Users/migrate2cloud/keys/ssh-key" root@server.com

Reattaching to a detached Mosh session is not possible, but we can run screen inside the mobile shell.

To get rid of stale sessions, do a pkill mosh-server instead, or pgrep mosh-server and then kill the stale PIDs.
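
A sketch of cleaning up old sessions without killing the one you are typing in, assuming your shell was started directly by mosh-server (so $PPID is your own session):

pgrep -l mosh-server                                    # list all mosh-server PIDs
pgrep mosh-server | grep -v "^$PPID$" | xargs -r kill   # keep the current session alive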

PS: if you use CIRU.org, things may be different for you.

That’s it. The DigitalOcean folks have come up with a nice write-up here which is very helpful. There are also an Android client and an iOS client in the making. On GNU/Linux I use KDE Konsole and on Mac iTerm; these are two good tools that are very useful IMHO.

Web 2.0 application architecture Template

Application created for a Startup based in Chicago

The term ‘Load Balancer’ is quite self-explanatory: it balances the load across the application servers behind it. There can be any number of application servers behind the Load Balancer (LB), and they do not directly face the end users.

Read more…

Openstack Cloud Software

OpenStack : The Mission

“ To produce the ubiquitous Open Source Cloud Computing platform that will meet the needs of public and private cloud providers regardless of size, by being simple to implement and massively scalable.”

OpenStack is a collection of open source software projects that enterprises/service providers can use to set up and run their cloud compute and storage infrastructure. Rackspace and NASA are the key initial contributors to the stack. Rackspace contributed their “Cloud Files” platform (code) to power the Object Storage part of OpenStack, while NASA contributed their “Nebula” platform (code) to power the Compute part. The OpenStack consortium has managed to gather more than 150 members, including Canonical, Dell, Citrix, etc.

There are 5 main service families under OpenStack

Nova         –   Compute Service

Swift         –    Storage Service

Glance      –    Imaging Service

Keystone  –    Identity Service

Horizon    –    UI Service

OpenStack Compute Infrastructure (Nova)

Nova is the Computing Fabric controller for the OpenStack Cloud. All activities needed to support the life cycle of instances within the OpenStack cloud are handled by Nova. This makes Nova a Management Platform that manages compute resources, networking, authorization, and scalability needs of the OpenStack cloud. But, Nova does not provide any virtualization capabilities by itself; instead, it uses libvirt API to interact with supported hypervisors. Nova exposes all its capabilities through a web services API that is compatible with the EC2 API of Amazon Web Services.

Functions and Features:

• Instance life cycle management

• Management of compute resources

• Networking and Authorization

• REST-based API

• Asynchronous eventually consistent communication

• Hypervisor agnostic : support for Xen, XenServer/XCP, KVM, UML, VMware vSphere and Hyper-V

OpenStack Storage Infrastructure (Swift)

Swift provides a distributed, eventually consistent virtual object store for OpenStack. It is analogous to Amazon Web Services – Simple Storage Service (S3). Swift is capable of storing billions of objects distributed across nodes. Swift has built-in redundancy and fail-over management and is capable of archiving and media streaming. It is extremely scalable in terms of both size (several petabytes) and capacity (number of objects).

Functions and Features

• Storage of large number of objects

• Storage of large sized objects

• Data Redundancy

• Archival capabilities – Work with large datasets

• Data container for virtual machines and cloud apps

• Media Streaming capabilities

• Secure storage of objects

• Backup and archival

• Extreme scalability
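
To make the object model concrete, here is a rough sketch of pushing and pulling objects with the swift command line client; the client install, credentials and the container/object names are assumptions for illustration, not part of the walkthrough below:

swift upload backups db-dump.tar.gz      # creates the 'backups' container if needed and stores the object
swift list backups                       # list objects in the container
swift download backups db-dump.tar.gz    # fetch it back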

OpenStack Imaging Service (Glance)

OpenStack Imaging Service is a lookup and retrieval system for virtual machine images. It can be configured to use any one of the following storage backends:

• Local filesystem (default)

• OpenStack Object Store to store images

• S3 storage directly

• S3 storage with Object Store as the intermediate for S3 access.

• HTTP (read-only)

Functions and Features

• Provides imaging service

OpenStack Identity Service (Keystone)

Keystone provides identity and access policy services for all components in the OpenStack family. It implements its own REST-based API (the Identity API). It provides authentication and authorization for all components of OpenStack including (but not limited to) Swift, Glance, and Nova. Authentication verifies that a request actually comes from who it says it does. Authorization is verifying whether the authenticated user has access to the services he/she is requesting.

Keystone provides two ways of authentication. One is username/password based and the other is token based. Apart from that, keystone provides the following services:

• Token Service (that carries authorization information about an authenticated user)

• Catalog Service (that contains a list of available services at the users’ disposal)

• Policy Service (that lets Keystone manage access to specific services by specific users or groups).
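
As a rough sketch of the username/password flow, a token can be requested from the Keystone v2.0 Identity API with a plain HTTP call; the endpoint, tenant and credentials below are placeholders:

curl -s -X POST http://<keystone-ip>:5000/v2.0/tokens \
  -H "Content-Type: application/json" \
  -d '{"auth": {"tenantName": "demo", "passwordCredentials": {"username": "demo", "password": "secret"}}}'
# the response carries a token id, which the other services accept via the X-Auth-Token header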

OpenStack Administrative Web-Interface (Horizon)

Horizon, the web-based dashboard, can be used to manage/administer OpenStack services. It can be used to manage instances and images, create keypairs, attach volumes to instances, manipulate Swift containers, etc. Apart from this, the dashboard even gives the user access to the instance console and can connect to an instance through VNC. Overall, Horizon features the following:

• Instance Management – Create or terminate instance, view console logs and connect through VNC, Attaching volumes, etc.

• Access and Security Management – Create security groups, manage keypairs, assign floating IPs, etc.

 • Flavor Management – Manage different flavors or instance virtual hardware templates.

 • Image Management – Edit or delete images.

 • View service catalog.

 • Manage users, quotas and usage for projects.

 • User Management – Create user, etc.

 • Volume Management – Creating Volumes and snapshots.

 • Object Store Manipulation – Create, delete containers and objects.

 • Downloading environment variables for a project.

INSTALLING OPENSTACK

We can install OpenStack Essex very easily using the StackGeek script. Log in to your box and install git with apt-get. We’ll become root and do an update first.

sudo  su
apt-get update
apt-get install git

Now checkout the StackGeek scripts from Github:

git clone git://github.com/StackGeek/openstackgeek.git   
cd openstackgeek

Install the Base Scripts

Be sure to take a look at the scripts before you run them. Keep in mind the scripts will periodically prompt you for input, either for confirming installation of a package, or asking you for information for configuration.

Start the installation by running the first script:

./openstack_base_1.sh

When the script finishes you’ll see instructions for manually configuring your network. You can edit the interfaces file by doing a:

vim /etc/network/interfaces

Copy and paste the network code provided by the script into the file and then edit:

auto eth0
iface eth0 inet static
  address 192.168.1.48
  network 192.168.1.0
  netmask 255.255.255.0
  broadcast 192.168.1.255
  gateway 192.168.1.124
  dns-nameservers 8.8.8.8
auto eth1

Change the settings for your network configuration and then restart networking and run the next script:

/etc/init.d/networking restart

Then run the second script :

./openstack_base_2.sh

After the second script finishes, you’ll need to set up a logical volume for Nova to use for creating snapshots and volumes. Nova is OpenStack’s compute controller process.

Here’s the output from the format and volume creation process:

root@manager-System-Product-Name:/openstackgeek# fdisk /dev/sda
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xb39fe7af.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 1): 3
First sector (2048-62914559, default 2048):
Using default value 2048
Last sector (2048-62914559, default 62914559):
Using default value 62914559

Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table. Syncing disks.

root@manager-System-Product-Name:/openstackgeek# pvcreate -ff /dev/sda3
 Physical volume "/dev/sda3" successfully created
root@manager-System-Product-Name:/openstackgeek# vgcreate nova-volumes /dev/sda3
 Volume group "nova-volumes" successfully created

Note: Your device names may vary.

Installing MySQL

The OpenStack components use MySQL for storing state information. Start the install script for MySQL by entering the following:

./openstack_mysql.sh

You’ll be prompted for a password used for each of the components to talk to MySQL:
Enter a password to be used for the OpenStack services
to talk to MySQL (users nova, glance, keystone): redhat
Note: here “redhat” is the password given to nova, glance, and keystone.

During the installation process you will be prompted for a root password for MySQL. In our install example we use the same password, ‘redhat’. At the end of the MySQL install you’ll be prompted for your root password again.

mysql start/running, process 8796
################################################################################ 
Creating OpenStack databases and users. 
Use your database password when prompted. 
 Run './openstack_keystone.sh' when the script exits. 
################################################################################
Enter password:
After MySQL is running, you should be able to login with any of the OpenStack 
users and/or the root admin account by doing the following:

mysql -u root -predhat
mysql -u nova -predhat nova
mysql -u keystone -predhat keystone
mysql -u glance -predhat glance

Installing Keystone

Keystone is OpenStack’s identity manager. Start the install of Keystone by doing:

./openstack_keystone.sh

You’ll be prompted for a token, the password you entered for OpenStack’s services, and your email address. The email address is used to populate the user’s information in the database.

Enter a token for the OpenStack services to auth with keystone: redhattoken
Enter the password you used for the MySQL users (nova, glance, keystone): redhat
Enter the email address for the accounts (nova, glance, keystone): user@company.com

You should be able to query Keystone at this point. You’ll need to source the “stackrc” file before you talk to Keystone:

. ./stackrc
keystone user-list

Keystone should return a list of users:
+----------------------------------+---------+------------------------+--------+
|                id                | enabled |         email          |  name  |
+----------------------------------+---------+------------------------+--------+
| b32b9017fb954eeeacb10bebf14aceb3 | True    | user@company.com       | demo   |
| bfcbaa1425ae4cd2b8ff1ddcf95c907a | True    | user@company.com       | glance |
| c1ca1604c38443f2856e3818c4ceb4d4 | True    | user@company.com       | nova   |
| dd183fe2daac436682e0550d3c339dde | True    | user@company.com       | admin  |
+----------------------------------+---------+------------------------+--------+

Installing Glance

Glance is OpenStack’s image manager. Start the install of Glance by doing:

./openstack_glance.sh

The script will download an Ubuntu 12.04 LTS cloud image from StackGeek’s S3 bucket. Once it’s done, you should be able to get a list of images:

glance index

Here’s the expected output:

ID              :- 71b8b5d5-a972-48b3-b940-98a74b85ed6a 
Name            :- Ubuntu 12.04 LTS
Disk Format     :- qcow2 
Container Format:- ovf 
Size            :- 226426880

Installing Nova

We’re almost done installing! The last component is the most important one as well. Nova is OpenStack’s compute and network manager. It’s responsible for starting instances, creating snapshots and volumes, and managing the network. Start the Nova install by doing:

./openstack_nova.sh

You’ll immediately be prompted for a few items, including your existing network interface’s IP address, the fixed network address, and the floating pool addresses:

######################################################
The IP address for eth0 is probably 192.168.1.48.
Keep in mind you need an eth1 for this to work.
######################################################
Enter the primary ethernet interface IP: 192.168.1.48
Enter the fixed network (eg. 10.0.2.32/27): 192.168.1.0/24
Enter the fixed starting IP (eg. 10.0.2.33): 192.168.1.1
############################################################################
The floating range can be a subset of your current network. 
Configure your DHCP server to block out the range before you choose it here. 
An example would be 10.0.1.224-255
############################################################################
Enter the floating network (eg. 10.0.1.224/27):  
Enter the floating netowrk size (eg. 32):

The fixed network is a set of IP addresses which will be local to the compute nodes. Think of these addresses as being held and routed internally inside any of the compute node instances.

The floating network is a pool of addresses which can be assigned to the instances you are running. For example, you could start a web server and map an external IP to it for serving a site on the Internet.
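
For instance, once the cloud is up, an address from the floating pool can be allocated and attached to a running instance with the nova client; the instance name and address below are placeholders, and the sub-commands assume the nova client of that era:

nova floating-ip-create                         # allocate an address from the floating pool
nova add-floating-ip my-instance 10.0.1.225     # attach it to a running instance
nova remove-floating-ip my-instance 10.0.1.225  # detach it again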


Finish Installing Nova

Nova should finish installing after you enter all the network information. When it’s done, you should be able to get a list of images from Glance via Nova:

 nova image-list

And get the expected output we saw earlier from Glance:

root@manager-System-Product-Name:/openstackgeek# nova image-list
+--------------------------------------+------------------+--------+--------+
|                  ID                  |       Name       | Status | Server |
+--------------------------------------+------------------+--------+--------+
| 71b8b5d5-a972-48b3-b940-98a74b85ed6a | Ubuntu 12.04 LTS | ACTIVE |        |
+--------------------------------------+------------------+--------+--------+

Installing Horizon

Horizon is the UI and dashboard controller for OpenStack. Install it by doing:

./openstack_horizon.sh

When it’s done installing, you’ll be given a URL to access the dashboard. You’ll be able to log in with the user ‘admin’ and whatever you entered earlier for your password. If you’ve forgotten it, simply grep for it in your environment:

env |grep OS_PASSWORD

The URL will be: http://192.168.1.48

You can log in to the OpenStack dashboard with the following credentials:

USER : admin

PASSWORD : redhat

DevOps on EC2 using Capistrano

DevOps is the combination of development and operations processes. Cloud with your DevOps offers some fantastic properties: the ability to leverage all the advancements made in software development around repeatability and testability for your infrastructure, the ability to scale up in real time as needed, and, among other things, the power of self-healing systems.

The process piece of devops is about taking the principles behind Agile to the entire continuous software development process. The obvious step is bringing Agile ideas to the operations team, which is sorely needed. Traditionally in the enterprise, the application development team is in charge of gathering business requirements for a software program and writing code. The development team tests their program in an isolated development environment for quality assurance which is later handed over to the operations team. The operations team is tasked with deploying and maintaining the program. The problem with this paradigm is that when the two teams work separately, the development team may not be aware of operational roadblocks that prevent the program from working as anticipated.

Capistrano

Capistrano is a developer tool for running scripts on multiple servers, mainly used for deploying web applications. It is typically installed on a workstation and used to deploy code from your source code management system to one or more servers. Capistrano was originally called “SwitchTower”; the name was changed to Capistrano in March 2006 because of a trademark conflict. It is a time-saving command line tool, and it is very useful for AWS/EC2 servers because we can deploy code to thousands of AWS servers with a single command. For server security we commonly use AWS SSH key authentication, and in Capistrano we use this SSH key to deploy web applications to the AWS servers.

In Cloud Computing, deploying applications to production/live servers is always a delicate task. The whole process needs to be quick to minimize downtime. Automating the deployment process helps run repetitive tasks while minimizing the possibility of human error. It is also a good idea to have a proven and easy way to roll back to a previous version if something goes wrong.

It is a standalone utility that also integrates nicely with Rails. We simply provide Capistrano with a deployment “recipe” or “formula” that describes our various servers and their roles. It is single-command deployment; it even allows us to roll a bad version out of production and revert to the previous release very easily.

Capistrano Deployment

The main functionality of Capistrano is to deploy the Rails application which we have already developed, with the code managed in SVN or Git. It transfers all the files of our Rails application from our local host to the AWS servers directly, by simply executing a single command at the command prompt.

Steps to deploy a rails application

[shell]gem install capistrano[/shell]

Now, we need to Capistranize our Rails application using the following commands:

[shell]capify .[/shell]

It will create two files:

[shell]

config/deploy.rb
Capfile

[/shell]

How to set up deploy.rb file

[shell]

require 'rubygems'
require 'activesupport'
set :application, "<application name>"
set :scm_username, "<username>"
set :use_sudo, false
set :repository, "http://#{scm_username}@www.example.com/svn/trunk"
set :deploy_to, "/var/www/#{application}"
set :deploy_via, :checkout
set :scm, :git
set :user, "root"
role :app, "<domain_name>"
role :web, "<domain_name>"
role :db, "<domain_name>", :primary => true
namespace :migrations do
  desc "Run the Migrations"
  task :up, :roles => :app do
    run "cd #{current_path}; rake db:auto:migrate;"
  end
  task :down, :roles => :app do
    run "cd #{current_path}; rake db:drop; rake db:create"
  end
end

[/shell]

where,

‘scm_username’ is your user name
‘application’ is an arbitrary name you create to identify your application on the server
‘use_sudo’ specifies to Capistrano that it does not need to append ‘sudo’ before all the commands it will run
‘repository’ identifies where your Subversion repository is located

If we aren’t deploying to the server’s default path, we need to specify the actual location by using the ‘deploy_to’ variable as given below:

[shell]
set :deploy_to, "/var/www/#{application}"
set :deploy_via, :checkout
[/shell]

If we are using Git to manage our source code, we specify the SCM by using the ‘scm’ variable as given below:

[shell]
set :scm, :git
set :user, "root"
role :app, "<domain_name>"
role :web, "<domain_name>"
role :db, "<domain_name>", :primary => true
[/shell]

Since most Rails users will have the same domain name for their web, app and database servers, we can simply use the domain variable we set earlier.

[shell]
namespace :migrations do
  desc "Run the Migrations"
  task :up, :roles => :app do
    run "cd #{current_path}; rake db:auto:migrate;"
  end
  task :down, :roles => :app do
    run "cd #{current_path}; rake db:drop; rake db:create"
  end
end

[/shell]

After completing our settings in the deploy.rb file, we need to commit the application using the “svn commit” command if we use SVN.

Then we need to run the following command:

[shell]

cap deploy:setup

[/shell]

It is used to create the directory structure on the server.

[shell]cap deploy:check[/shell]

It checks all the dependencies, such as directory permissions and the utilities necessary to deploy the application with Capistrano.

If everything is successful, you should see a message like:
You appear to have all necessary dependencies installed
And finally deploy the application by using the following command:

[shell]cap deploy[/shell]

Command finished successfully

To clean up the releases directory, leaving the five most recent releases:

[shell]cap deploy:cleanup[/shell]

To print the difference between what was last deployed and what is currently in our repository:

[shell]cap diff_from_last_deploy[/shell]

To roll back to the previously deployed version:

[shell]cap deploy:rollback:code[/shell]

Amazon’s EC2 cloud cuts the requisition time of the order & delivery stages down to just minutes, which is already a 75% saving in deployment time. But without automated deployment, you’ll still need a week to get your application installed.

DevOPS on AWS Cloud using Opscode Chef

‘Rule the Cloud’ with Chef
Chef is Infrastructure as Code, an API for your entire infrastructure. Assuming you are well versed with cloud (if not, you should at least have heard of cloud computing), it is still an evolving paradigm, and cloud computing companies are the newest buzz in the IT sector. Chef is used in conjunction with the cloud from providers such as Amazon’s AWS. If the software being developed is a mix of interdependent technologies working in perfect harmony, then why not the people behind it? This thought led to the emergence of a new cultural trend called DevOps. Now, if you set up a number of instances on the cloud, what next? New instances on the cloud are just like bare metal servers, and the configuration has to be done from scratch. It would be feasible to do that manually for a couple of them, but what if the count gets bigger, say 100 live instances with different Unix distros? A script could be written, but it will not suffice in the long run, considering management too. Here Chef comes into play.

“chef is sysadmin robot performing configuration tasks automatically and much more quickly than a single admin could ever hope to” – Jesse Robbins, Opscode CEO.

Chef is an open source configuration management tool that uses a pure-Ruby domain-specific language for writing system configuration (recipes and cookbooks).

Chef brings a new feel with its interesting naming conventions drawn from cookery, like Cookbooks (they contain the code for a software package’s installation and configuration, in the form of Recipes), Knife (the command line API tool), Data Bags (which act like global variables), etc.

Although there are many configuration management tools prevailing in the industry, Chef has been able to secure its position in the race.

“Chef takes a step farther past Puppet and CFEngine, like doing a ‘live search’ within configuration management: a load balancer can call out to get a list of the app servers it needs to balance, or an application server can call out to get a reference to the master database server, etc. The centralised Chef server is indexing all the information about your infrastructure, so that you can search it in real time from the command line using knife, and applications can leverage that data,” says Seth Chisamore from Opscode.

A technology peak that isn’t fluffy – Cloud
For those folks new to cloud: it is a whole bunch of activities which began as innovation, were recently given out as products, and have now become so widespread and so feature complete that they are suitable for utility services.

So if you don’t want cloud in your business, it is like saying you don’t want to use electricity and would rather build your own generator and use it according to your needs. What we lose if we continue with that is the competitive edge, i.e. you are under pressure to keep your stuff upgraded in order to hold your place relative to the others in the ecosystem.

Cloud is API oriented; everything you see in cloud is ultimately programmable.

Virtualization is the foundation of Cloud, but virtualization is not Cloud by itself. It certainly enables many of the things we talk about when we talk Cloud, but it is not necessarily sufficient to be a cloud. Google App Engine is a cloud that does not incorporate virtualization. One of the reasons virtualization is great is that you can automate the procurement of new boxes.

A culture that is on a path to revolutionize IT – DevOps
DevOps is something that originated predominantly in web shops, and it requires a kind of tooling that was really not available except as home-grown tools which the big web shops built over and over again. So the organisations that wanted to adopt DevOps started using the tools that enable this transition. As most organisations depend on the web as a source of revenue in a variety of different ways, even the enterprise desires to be as agile as the web shops. This has begun a revolution that permeates from the websites into the enterprise base more and more frequently.

Consider a real-life example of DevOps, say Facebook, the most popular social networking site. Between developers, QA and operations there is a lot of communication and cross talk: the developers have to write code, QA has to make sure good code goes out, and the operations team has to make sure it stays up and running. Finally, all of this has to be on record, which altogether can be inefficient; this led to the evolution of the entire system. According to conventional practice, the developers write the code and throw it over to testing; once the testing is done, it moves to operations, and so on. Contrary to that, in DevOps the developers and the operations team are all involved in the entire lifecycle of the project as one team. This creates a symbiotic relationship: the operations people understand what the engineers need most, and the developers see the value the operations people bring as they make architecture decisions.

Cloud with your DevOps offers some fantastic properties: the ability to leverage all the advancements made in software development around repeatability and testability for your infrastructure, and the ability to scale up in real time as needed (autoscaling), among other things harnessing the power of self-healing systems. DevOps works better with Cloud.

Configuration management, say Chef, is one of the most fundamental elements enabling DevOps in the cloud. It allows you to have different VMs that have just enough OS that they can be provisioned automatically through virtualization, and then, through configuration management, be assigned a distinct purpose within the cloud. The CM system handles turning the lightly provisioned VM into the type of server it is intended to be.

DevOps & Chef
DevOps is nothing but a cultural movement where everybody, say the developers, QA, operations, testing, etc., gets along: a project group with a mixed skill set that blurs the line between, say, a developer and a sysadmin. This helps the project meet its deadlines and avoid unexpected situations. Cloud computing acts as a catalyst to this movement, and thereby Chef also hops in.

Chef forms a critical layer in the DevOps stack. Thanks to the concept of infrastructure as code and virtualization, we can define and build our infrastructure from text files. Those files can be version-controlled and tested like regular code. The artifact (AMI, image) can then be deployed on an infrastructure. The following image gives you an overview of the similarities.

Inadvertently, issues like “what about the application?” or “what about the infrastructure?” get resolved; the fact is that the application is the infrastructure and the infrastructure is the application, and we are here to enable the business. It also helps bring people in the team into better alignment across the board.

Chef configuration is written in pure Ruby.

Devops == Ruby

For those who think Bash is enough as a scripting language: Bash becomes a liability, not an asset, once your script exceeds 100 lines, and a total nightmare if you need to parse or output HTML, CSV, XML, JSON, etc. A significant point to note is that Chef uses Ruby in its recipes, unlike Puppet, which uses its own configuration language based on Ruby (although Chef is heavily inspired by Puppet). If you choose Chef, then you are effectively scripting your infrastructure with Ruby.

Though Chef was only released on January 15th, 2009, it has seen rapid adoption and gained a large number of contributors. According to the Opscode wiki, there are 545 approved contributors to Opscode projects and 106 companies. Beyond that, the #chef IRC channel is typically attended by over 100 users and Opscode staff: signs of a healthy, growing open source community.

The SpringSource division of VMware has signed on to contribute to the project. They are even being very public about it, as seen in this endorsement:

“We are excited about the open source contributions the Springsource Division of VMware has made to Opscode Chef.” said Javier Soltero, CTO of Springsource Management Products at VMware. “Chef is an important tool for automating infrastructure management and we look forward to its continued growth and success.”

Moreover, in my experience of using Chef, I really enjoyed the quick responses I got from the Opscode support team for all my queries, and they were always able to direct me towards a solution.

Automation using Chef: creating an instance on Amazon (AWS) with an Apache web server configured on it.

Memo:
chef-workstation – the place where we customize our cookbooks and maintain the chef-repo
chef node – the node that we manage with Chef; it configures itself based on its run list and the downloaded cookbooks

The really cool thing with Chef is that you can rerun cookbooks against a node and it will not redo anything it has already done, i.e. it will not change the end result on the target node as defined by the recipes being run against it. So you will always get the same outcome no matter what state the node is in: actions will not be taken if they are already done (and conversely will be run if Chef detects they have not been). When reading about Chef you will see this described as being idempotent (there, I’ve saved you looking it up).
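
A quick way to see this for yourself once a node has been bootstrapped is to run the client twice and compare what it reports (the exact wording varies by Chef version):

#sudo chef-client    # first run converges the node (installs and configures apache2, etc.)
#sudo chef-client    # second run should report the resources as already up to date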

Prerequisites – an AWS account, EC2 API configured, OS – Ubuntu.

1. Sign up for an account at http://www.opscode.com/hosted-chef/#. Here we use OHC (Opscode Hosted Chef), where we get to create up to 5 nodes for free!!

2. Verify your Opscode account.

3. Download the files

Create an organization in the Console page at www.manage.opscode.com, and then download the following files:

  • Your Organization validation key. This is used to automatically register new Chef Clients (like servers you manage).
  • The Knife configuration file.
  • Your User key. This is used to authenticate your user with Hosted Chef.
  • Edit knife.rb to add your AWS access key and secret access key:
  • knife[:aws_access_key_id]     = "Your AWS Access Key"
  • knife[:aws_secret_access_key] = "Your AWS Secret Access Key"

At this stage I have a Chef-ready user environment and an Opscode organisation set up, and now I want to start by spinning up an EC2 instance. I will not be going into any depth regarding the EC2 specifics, as that would make this post far too long.

4. Setting up the chef-workstation

Install Ruby and Development Tools

#sudo apt-get update
#sudo apt-get install ruby ruby-dev libopenssl-ruby rdoc ri irb build-essential wget ssl-cert git-core
#sudo gem update --system

Install RubyGems

#cd /tmp
#wget http://production.cf.rubygems.org/rubygems/rubygems-1.8.10.tgz
#tar zxf rubygems-1.8.10.tgz
#cd rubygems-1.8.10
#sudo ruby setup.rb --no-format-executable

Install Chef

#sudo gem install chef

5. To verify the Chef installation

#chef-client -v

6. Build the Chef repository

#cd ~
#git clone https://github.com/opscode/chef-repo.git

Knife reads its configuration files from .chef, so we need to create that directory as well:

#mkdir -p ~/chef-repo/.chef

Copy the keys and knife configuration you downloaded earlier into this directory:

#cp USERNAME.pem ~/chef-repo/.chef
#cp ORGANIZATION-validator.pem ~/chef-repo/.chef
#cp knife.rb ~/chef-repo/.chef

Run the following command to confirm knife is working with the Hosted Chef API.

#cd ~/chef-repo
#knife client list

output : “ORGANIZATION-validator”

7. Now I need to download the apache2 cookbook onto my workstation, customize it if required, and then upload it to my account on the Opscode platform.

#knife cookbook site install apache2

This will notify git and also pull down the desired cookbook.

8. Upload the cookbook using the following command:

#knife cookbook upload apache2

9. Enter the following command, sit back and enjoy the show!!!

#knife ec2 server create -G default -I ami-1212ef7b -f m1.small -S <aws ssh key id> -i <ssh identity file> -x root -r 'recipe[apache2]'
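
Once the instance has converged, a couple of knife commands run from the workstation give a quick sanity check that the node registered with Hosted Chef and picked up its run list (use whatever node name knife reports):

#knife node list
#knife node show <node name>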


Before proceeding, it would probably be a good idea to take time out and read the Opscode Chef Recipe wiki, which has a nice, clear explanation of cookbook name spaces. Also remind yourself of the components that make up a cookbook; it is worth noting that recipes manage resources, and those resources will be executed in the order they occur.

HADOOP Cluster on AWS EC2 with hadoop-0.20 and ubuntu-10.04

Let’s start with a small introduction: what is Hadoop? Hadoop is an open-source project administered by the Apache Software Foundation. Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers.

Technically, Hadoop consists of two key services: reliable data storage using the Hadoop Distributed File System (HDFS) and high-performance parallel data processing using a technique called MapReduce.

Dealing with big data requires two things:

  • Inexpensive, reliable storage; and
  • New tools for analyzing unstructured and structured data.

Hadoop creates clusters of machines and coordinates work among them. Clusters can be built with inexpensive computers. If one fails, Hadoop continues to operate the cluster without losing data or interrupting work, by shifting work to the remaining machines in the cluster.

HDFS manages storage on the cluster by breaking incoming files into pieces, called “blocks,” and storing each of the blocks redundantly across the pool of servers.
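
Once a cluster is up (the build steps follow below), this block placement can be inspected with fsck. A sketch:

[bash]

sudo -u hdfs hadoop fsck / -files -blocks -locations

[/bash]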

The main services running in a hadoop cluster will be

1)namenode

2)jobtracker

3)secondarynamenode

These three will be running only on a single node (machine); that machine is the central machine which controls the cluster.

4)datanode

5)tasktracker

These two services will be running on all other nodes in the cluster.

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on.

Above the file systems comes the MapReduce  engine, which consists of one Job Tracker, to which client applications submit MapReduce jobs. The Job Tracker pushes work out to available Task Tracker nodes in the cluster, striving to keep the work as close to the data as possible.

The only purpose of the secondary name-node is to perform periodic checkpoints. The secondary name-node periodically downloads current name-node image and edits log files, joins them into new image and uploads the new image back to the (primary and the only) name-node.

Now let us have a look at how to build a Hadoop cluster using Cloudera hadoop-0.20 on Ubuntu 10.04.

You should install the Sun JDK first. Then add the following repositories to the apt sources list:

vim /etc/apt/sources.list.d/cloudera.list

[bash]

deb http://archive.cloudera.com/debian lucid-cdh3u0 contrib

deb-src http://archive.cloudera.com/debian lucid-cdh3u0 contrib

[/bash]

Import key

[bash]curl -s http://archive.cloudera.com/debian/archive.key | apt-key add -[/bash]

Then run

[bash]apt-get update[/bash]

For Namenode/Jobtracker ( These two services should run only on a single central machine in the cluster)

[bash]

apt-get install hadoop --yes

apt-get install hadoop-0.20-namenode

apt-get install hadoop-0.20-jobtracker

apt-get install hadoop-0.20-secondarynamenode

[/bash]

Configuration

vim /etc/hadoop/conf/hadoop-env.sh

Append these

[bash]

export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.24/   # your Java home comes here

export HADOOP_CONF_DIR=/etc/hadoop/conf

export HADOOP_HOME=/usr/lib/hadoop-0.20

export HADOOP_NAMENODE_USER=hdfs

export HADOOP_SECONDARYNAMENODE_USER=hdfs

export HADOOP_DATANODE_USER=hdfs

export HADOOP_JOBTRACKER_USER=mapred

export HADOOP_TASKTRACKER_USER=mapred

export HADOOP_IDENT_STRING=hadoop

[/bash]

vim /etc/hadoop/conf/core-site.xml

[bash]

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://< ip address of this machine >:8020</value>

</property>

</configuration>

[/bash]

vim /etc/hadoop/conf/hdfs-site.xml

 

[bash]

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>dfs.name.dir</name>

<value>/var/lib/hadoop-0.20/name</value>

</property>

<property>

<name>dfs.data.dir</name>

<value>/var/lib/hadoop-0.20/data</value>

</property>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

</configuration>

[/bash]

vim /etc/hadoop/conf/mapred-site.xml

[bash]

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>< ip address of this machine >:8021</value>

</property>

<property>

<name>mapred.system.dir</name>

<value>/var/lib/hadoop-0.20/system</value>

</property>

<property>

<name>mapred.local.dir</name>

<value>/var/lib/hadoop-0.20/mapred</value>

</property>

</configuration>

[/bash]

——————————————————————————————————————————————

[bash]

mkdir /var/lib/hadoop-0.20/name

mkdir /var/lib/hadoop-0.20/data

mkdir /var/lib/hadoop-0.20/system

mkdir /var/lib/hadoop-0.20/mapred

chown -R hdfs /var/lib/hadoop-0.20/name

chown -R hdfs /var/lib/hadoop-0.20/data

chown -R mapred /var/lib/hadoop-0.20/mapred

[/bash]

Now format NameNode

[bash]yes Y | /usr/bin/hadoop namenode -format[/bash]

Start namenode

[bash]/etc/init.d/hadoop-0.20-namenode start[/bash]

Check the log Files for error:

less /usr/lib/hadoop-0.20/logs/hadoop-hadoop-namenode-<ip>.log

Also you can check whether the Namenode process is up or not using the command

[bash]# jps[/bash]

Start the SecondaryNamenode

[bash]/etc/init.d/hadoop-0.20-secondarynamenode start[/bash]

Log: less /usr/lib/hadoop-0.20/logs/hadoop-hadoop-secondarynamenode-<ip>.log

[bash]

sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-0.20/system

sudo -u hdfs hadoop fs -chown mapred /var/lib/hadoop-0.20/system

[/bash]

Now Start the JobTracker

[bash]/etc/init.d/hadoop-0.20-jobtracker start[/bash]

Log : less /usr/lib/hadoop-0.20/logs/hadoop-hadoop-jobtracker-ip-10-108-39-34.log

Now the jps command will show the three processes up:

# jps

19233 JobTracker

18994 SecondaryNameNode

18871 NameNode

For Datanode/Tasktracker ( These two services should be running on all the other machines in the cluster )

[bash]

apt-get install hadoop-0.20-datanode

apt-get install hadoop-0.20-tasktracker

[/bash]

Configuration

vim /etc/hadoop/conf/core-site.xml

 

[bash]

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://< ip address of the namenode >:8020</value>

</property>

</configuration>

[/bash]

vim /etc/hadoop/conf/hdfs-site.xml

[bash]

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>dfs.name.dir</name>

<value>/var/lib/hadoop-0.20/name</value>

</property>

<property>

<name>dfs.data.dir</name>

<value>/var/lib/hadoop-0.20/data</value>

</property>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

</configuration>

[/bash]

vim /etc/hadoop/conf/mapred-site.xml

[bash]

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>< ip address of jobtracker  >:8021</value>

</property>

<property>

<name>mapred.system.dir</name>

<value>/var/lib/hadoop-0.20/system</value>

</property>

<property>

<name>mapred.local.dir</name>

<value>/var/lib/hadoop-0.20/mapred</value>

</property>

</configuration>

[/bash]

———————————————————————————————————————————————

[bash]

mkdir  /var/lib/hadoop-0.20/data/

chown -R hdfs /var/lib/hadoop-0.20/data

mkdir /var/lib/hadoop-0.20/mapred

chown -R mapred /var/lib/hadoop-0.20/mapred

[/bash]

Start the DataNode

[bash]/etc/init.d/hadoop-0.20-datanode start[/bash]

Log : less /usr/lib/hadoop-0.20/logs/hadoop-hadoop-datanode-<ip>.log

Start the TaskTracker

[bash]/etc/init.d/hadoop-0.20-tasktracker start[/bash]

Log: less /usr/lib/hadoop-0.20/logs/hadoop-hadoop-tasktracker-<ip>.log

You can now check the web interfaces:

http://< namenode-ip >:50070   – for HDFS overview

and

http://< jobtracker-ip >:50030  – for MapReduce overview
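
As a quick smoke test of the cluster, push a small file into HDFS and run one of the bundled example jobs. This is a sketch; the examples jar path is the usual CDH3 location and may differ on your install:

[bash]

sudo -u hdfs hadoop fs -mkdir /tmp/smoketest

sudo -u hdfs hadoop fs -put /etc/hosts /tmp/smoketest/

sudo -u hdfs hadoop fs -ls /tmp/smoketest

sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar pi 2 100

[/bash]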