
Posts Tagged ‘Company: Amazon.com’

Mosh aka mobile-shell


We stumbled upon Mobile Shell (Mosh), which keeps a session alive over intermittent connections: VPN, WiFi, network roaming and so on. It is quite useful, especially when we have tons of nodes across multiple cloud providers, all of them demanding attention. Automating a few things in a CRM ended with us setting up a Postfix mail server for the first time in more than a decade, and we got frustrated with the nearly non-existent internet provided by Asianet DSL; all credit for this goes to them.

 

TL;DR Here is a quick guide to get Mosh working on Mac & GNU/Linux flavours.

 

Mosh uses UDP. Yes, you heard it right.

By default it uses UDP ports 60000 to 61000 for establishing connections. We need to open some of these ports, say a subset of the range, in whatever firewall sits in front of the servers.

There is a client & a server

I missed this part at first! We need to install Mosh on both the client and the server (i.e. apt-get install mosh or yum install mosh on the servers too, BOFH).

On Amazon / AWS / EC2 cloud,

Open up a few UDP ports in the security groups. We opened up 10 ports.

On DigitalOcean or any other provider open the ports in your firewall.

Client side installation:

On Mac,  we ran into issues with libprotobuf

> mosh migrate2cloud.com
dyld: Library not loaded: /usr/local/lib/libprotobuf.7.dylib
  Referenced from: /usr/local/bin/mosh-client
  Reason: image not found
Died at /usr/local/bin/mosh line 201. 

The solution is to upgrade Brew (well why shouldn’t one use brew ?)

brew update ; brew upgrade ;  brew remove libprotobuf ; brew install libprotobuf

will do the magic. If not,  we can try

         brew remove mosh ; brew install mosh

as well. If it still doesn’t work, RTFM & the FAQ 🙂

Another issue we ran into was the locale and UTF-8 encoding. We fixed it by installing the locale on both the client and the server and adding the following environment variables to the bash profile.

# for mosh
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
export MM_CHARSET=utf8
export LC_COLLATE="en_US.UTF-8"
export LC_TIME="en_US.UTF-8"
export LC_NUMERIC="en_US.UTF-8"
export LC_MONETARY="en_US.UTF-8"
export LC_MESSAGES="en_US.UTF-8"

You load the environment variables by doing the following in bash.

. ~/.bash_profile   # don’t miss the dot at the beginning

Firewalls, Tunnelling, NAT

It may not work just yet as you may have to deal with NAT traversal and other nasty things. Creating an SSH tunnel can solve these issues.

mosh --ssh="ssh -4 -R 2222:localhost:22 -i /Users/migrate2cloud/keys/ssh-key" root@server.com

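A detached Mosh (mobile shell) session cannot be reattached, but we can run screen inside the mobile shell and reattach to that.

To get rid of a detached session, kill its server process instead: pkill mosh-server, or look up the PID with pgrep mosh-server and kill it.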

PS: if you use CIRU.org, things may be different for you.

That’s it. The DigitalOcean guys have come up with a nice write-up which is very helpful. There is also an Android client, and an iOS client in the making. On GNU/Linux I use KDE Konsole and on Mac iTerm; both are good tools that are very useful IMHO.

“IAM user, who can write to the S3 bucket”

Here we look at what an “IAM user, who can write to the S3 bucket” is, using a CloudFront distribution and S3 objects that are world readable.

 

1. Create a bucket in S3 (my-bucket)

1. Log in to the AWS Management Console

2. Click on the S3 tab

3. Create a new bucket

4. Create a custom/AWS bucket policy to make it world readable


Web 2.0 application architecture Template

Application created for a Startup based in Chicago

The term ‘Load Balancer’ is quite self-explanatory: it balances the load across the application servers behind it. There can be any number of application servers behind the Load Balancer (LB), none of which face the end users directly.


Apache on the Cloud – The things you should know

LAMP forms the base of most web applications. As the load on a server increases, the bottlenecks in the underlying infrastructure become more apparent in the form of slow responses to user requests.

To overcome this slow response, most people’s first choice is to add more hardware resources (in the case of AWS, moving to a larger instance type). This will definitely increase performance, but it will cost you more money. The web server and the database eat most of the resources. The most commonly used web server is Apache and the most commonly used database is MySQL, so if we can optimize these two we can improve performance.

Apache optimization techniques can often provide significant speed-ups even when other acceleration techniques, such as a CDN, are in use. mod_pagespeed is a module from Google for the Apache HTTP Server that can improve the page load times of your website. If you want to deploy a PHP app on the AWS cloud, it is better to use some kind of caching mechanism, which we have already discussed on our blog.

We once found ourselves in a situation where we had to use a micro instance for a web server with fewer than 500 hits a day.

When the site went live, we were disappointed. When accessing the website, it would sometimes pause for several seconds before serving the requested page. It took hours to figure out what was going on. Finally we ran top and quickly discovered that when the site was being accessed by a certain number of users the CPU would spike, but the spike was not the typical user or system CPU. To test what was happening on the server we used the Apache benchmark tool ‘ab’ and ran the following command on localhost.

                                             #ab -n 100 -c 10 http://mywebserver.com/

This will show how fast our web server can handle 100 requests, with a maximum of 10 requests running concurrently. Meanwhile, we monitored the output of top on the web server.

For further investigation we started with sar, the Linux command to collect, report, or save system activity information.

  #sar 1

According to the Amazon documentation, “Micro instances (t1.micro) provide a small amount of consistent CPU resources and allow you to increase CPU capacity in short bursts when additional cycles are available”.

If you use 100% CPU for more than a few minutes, Amazon will “steal” CPU time from the instance, meaning that they throttle it. This lasts as long as five minutes, then you get a few seconds of 100% again, and then the restrictions are back. This will affect your website, making it slow and even timing out requests. Steal time basically means the physical hardware is busy and the hypervisor can’t give the VM the CPU cycles it wants.
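One way to put a number on the “steal” described above is to compare two samples of the aggregate cpu line in /proc/stat, whose eighth counter is steal time. The following is a minimal Python sketch, assuming a Linux kernel recent enough to report that counter; the function name and the sample lines are made up for illustration and are not output from the server discussed here.

```python
def steal_percent(before: str, after: str) -> float:
    """Share of CPU time stolen by the hypervisor between two samples.

    Each argument is the aggregate "cpu ..." line from /proc/stat.
    Counters are, in order: user nice system idle iowait irq softirq steal
    (guest counters follow but are already included in user/nice).
    """
    def counters(line: str) -> list:
        # Drop the leading "cpu" label and keep the first eight counters.
        return [int(value) for value in line.split()[1:9]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[7] / total


# Hypothetical samples taken a short time apart.
sample_1 = "cpu  100 0 50 800 10 0 5 35 0 0"
sample_2 = "cpu  150 0 70 900 10 0 5 65 0 0"
print(steal_percent(sample_1, sample_2))
```

On a live machine the two lines would be read from /proc/stat a few seconds apart; sar and top report the same figure as %steal and st.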

The real tuning is required on prefork. This is where we tell Apache to spawn only so many processes. The default values are high and cannot be handled by a micro instance. Suppose you get 10 concurrent requests for a PHP page, each requiring around 64MB of RAM (you have to make sure that the PHP memory_limit is above that value). That’s around 640MB of RAM on a micro instance with 613MB. And that is with just 10 connections; Apache is configured to allow 256 clients by default. We need to scale this down, normally to a MaxClients of 10-12. In our case even that is too high, because 10-12 concurrent connections would use all our memory. If you want to be really cautious, make sure that your maximum memory usage stays below 613MB. Something like a 64M PHP memory limit and 8 max clients keeps you under the limit with space to spare, which helps ensure that the MySQL process still has memory to work with when the server is under load.

MaxClients is an important tuning parameter for the performance of the Apache web server. We can calculate its value for a t1.micro instance.

Theoretically,

MaxClients =(Total Memory – Operating System Memory – MySQL memory) / Size Per Apache process.

A t1.micro has 613MB of total memory. Suppose we are using RDS instead of a local MySQL server, so the MySQL term is zero.

Stop apache and run

#ps aux | awk '{sum1 += $6}; END {print sum1/1024}'

This gives the amount of memory, in MB, used by processes other than Apache (column 6 of ps aux is the resident set size in KB).

Suppose we get a value around 30.

From the top command we can check the average memory that each Apache process uses.

Suppose it is 60MB.

MaxClients = (613 - 30) / 60 = 9.72 ~ 10 approx.
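The same arithmetic can be written as a small helper so that it is easy to redo with your own measurements. This is a sketch in Python rather than anything Apache provides: the function name is made up, and rounding down instead of up to 10 is an assumption chosen so that the worst case stays inside the available memory.

```python
import math


def max_clients(total_mb: float, os_mb: float, mysql_mb: float,
                per_process_mb: float) -> int:
    """MaxClients = (total - OS - MySQL) / size per Apache process, rounded down."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    available = total_mb - os_mb - mysql_mb
    return max(0, math.floor(available / per_process_mb))


# Figures from the text: 613MB total, about 30MB for everything else,
# no local MySQL (RDS is used), about 60MB per Apache process.
print(max_clients(613, 30, 0, 60))  # 9
```

Rounding down gives 9 rather than the approximate 10 above; with 10 processes at 60MB each plus 30MB for the rest, the instance would already need 630MB.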

Micro instances are awesome, especially when cost is a major concern; however, they are not right for all applications. A simple website with only a few hundred hits a day will do just fine, since it will only need CPU in short bursts.

For servers that serve dynamic content, a better approach is to employ a reverse proxy. This can be done with Apache’s mod_proxy or with Squid. The main advantages of this configuration are content caching, load balancing and so on. The easy method is to use mod_proxy and the ProxyPass directive to pass requests to another server. mod_proxy supports a degree of caching that can offer a significant performance boost. Another advantage is that, since the proxy server and the web server are likely to have a very fast interconnect, the web server can quickly serve up large content, freeing up an Apache process, while the proxy slowly feeds the content out to clients.

If you are using Ubuntu, you can enable the modules with

                                        #a2enmod proxy

                                        #a2enmod proxy_http    

and in apache2.conf

                                         ProxyPass  /  http://192.168.1.46/

                                         ProxyPassReverse  /   http://192.168.1.46/

The ProxyPassReverse directive rewrites the Location, Content-Location and URI headers in the web server’s responses so that redirects point at the proxy rather than at the backend, hiding the identity and location of the web server. This is a good security practice, since an attacker won’t be able to learn the IP of our web server.

Caching with Apache2 is another important consideration. We can configure Apache to set the Expires HTTP header and the max-age directive of the Cache-Control HTTP header for static files, such as images, CSS and JS files, to a date in the future so that these files will be cached by your visitors’ browsers. This saves bandwidth and makes the web site appear faster: if a user visits your site a second time, static files will be fetched from the browser cache.

                                      #a2enmod expires

  edit  /etc/apache2/sites-available/default

  <IfModule mod_expires.c>
               ExpiresActive On
               ExpiresByType image/gif "access plus 4 weeks"
               ExpiresByType image/jpeg "access plus 4 weeks"
</IfModule>

This tells browsers to cache .jpg and .gif files for four weeks.
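To make “access plus 4 weeks” concrete, the sketch below computes the two response headers a browser would see for a file requested at a given moment. It is a Python illustration of the arithmetic only, not how mod_expires is implemented; the function name and the access time are made up.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(access_time: datetime, weeks: int = 4) -> dict:
    """Headers for a resource that expires `weeks` after it was accessed."""
    lifetime = timedelta(weeks=weeks)
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(access_time + lifetime, usegmt=True),
        "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
    }


# Hypothetical access time, in UTC.
accessed = datetime(2013, 2, 15, 12, 0, 0, tzinfo=timezone.utc)
for name, value in expiry_headers(accessed).items():
    print(f"{name}: {value}")
```

Four weeks is 2,419,200 seconds, which is the max-age value you should see when checking the headers of a cached image with a tool such as curl -I.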

If your server performs a large number of read/write operations, you might consider Provisioned IOPS EBS volumes. This is really effective if you run a database server on EC2 instances. You can use iostat on the command line to look at your reads/sec and writes/sec. You can also use CloudWatch metrics to determine read and write operations.

Once we move to the security side of Apache, our major concern is DDoS attacks. If a server is under a DDoS attack, it is quite difficult to detect the attack before the damage is done. Attack packets usually have spoofed source IP addresses, so it is more difficult to trace them back to their real source. The limit on the number of simultaneous requests that Apache will serve is decided by the MaxClients directive, and is set to a safe limit by default. Any connection attempts over this limit will normally be queued up.

If you want to protect Apache against DoS and DDoS attacks, use the mod_evasive module. This module is designed specifically as a remedy for Apache DoS attacks. It allows you to specify a maximum number of requests from the same IP address. If the limit is reached, the IP address is blacklisted for the time period you specify.
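The idea of counting requests per IP and blacklisting an address for a while can be sketched in a few lines of Python. This only illustrates the general technique under simplified assumptions; it is not mod_evasive’s actual implementation, and the class name, thresholds and addresses are made up.

```python
class RequestLimiter:
    """Block an IP for `block_seconds` once it exceeds `max_requests`
    within a sliding window of `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float,
                 block_seconds: float) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.block = block_seconds
        self._hits = {}            # ip -> timestamps of recent requests
        self._blocked_until = {}   # ip -> time at which the block ends

    def allow(self, ip: str, now: float) -> bool:
        # Requests from a blacklisted address are refused outright.
        if self._blocked_until.get(ip, 0.0) > now:
            return False
        # Keep only the requests that still fall inside the window.
        recent = [t for t in self._hits.get(ip, []) if now - t < self.window]
        recent.append(now)
        self._hits[ip] = recent
        if len(recent) > self.max_requests:
            self._blocked_until[ip] = now + self.block
            return False
        return True


limiter = RequestLimiter(max_requests=3, window_seconds=1.0, block_seconds=10.0)
for moment in (0.0, 0.1, 0.2, 0.3, 5.0, 11.0):
    print(moment, limiter.allow("192.0.2.10", moment))
```

The fourth request inside one second trips the limit, the address stays blocked until the ten seconds are over, and it is served again afterwards. Passing the time in as an argument keeps the sketch easy to test.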

PHP Caching : The way to speed up PHP sites.

There are many sites built in PHP. PHP provides the power to simply ‘pull’ content from an external source; it could just as easily be a MySQL database or an XML file.

The downside to this is processing time: each request for one page can trigger multiple database queries, processing of the output, and formatting it for display. This can be quite slow on complex sites (or slower servers). Yet dynamic sites often have very little changing content; this page, for example, will almost never be updated after the day it is written. Even so, each time someone requests it the scripts go and fetch the content, apply various functions and filters to it, then output it to you.

This is where caching can help us out. Instead of regenerating the page every time, the scripts running this site generate it the first time they’re asked to, then store a copy of what they send back to your browser. The next time a visitor requests the same page, the script will know it has already generated one recently, and simply send that to the browser without all the hassle of re-running database queries or searches.

Different caching mechanisms are discussed below.
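The generate-once, serve-from-the-copy behaviour described above can be sketched as a tiny in-memory page cache with an expiry time. This is a Python illustration of the pattern only, not the code behind any of the tools discussed below; the class name, the URL and the render function are made up.

```python
import time


class PageCache:
    """Return a stored copy of a page until it is older than `ttl_seconds`."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expiry time, rendered page)

    def get_or_render(self, key: str, render):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]           # cache hit: skip the expensive work
        page = render()               # cache miss: build the page once
        self._store[key] = (now + self.ttl, page)
        return page


def render_post() -> str:
    # Stand-in for the database queries and formatting a real page needs.
    print("expensive render")
    return "<html>post</html>"


cache = PageCache(ttl_seconds=60)
cache.get_or_render("/post/1", render_post)  # renders the page
cache.get_or_render("/post/1", render_post)  # served from the cache
```

Only the first call prints "expensive render"; the second is answered from the stored copy until the 60 seconds have passed.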

APC

APC stands for Alternative PHP Cache, and is a free and open opcode cache for PHP. It provides a robust framework for caching and optimizing PHP performance. APC also provides a user cache for storing application data. Use APC for caches that do not change often and will not grow too big, to avoid fragmentation. The default setting of APC allows you to store 32 MiB for the opcode cache and the user cache combined.

Installing apc on ubuntu

#apt-get install php-apc

Edit apc.ini; the default location on newer php5 packages is /etc/php5/conf.d/20-apc.ini

extension = apc.so    ; uncomment this line
apc.enabled = 1       ; enabled by default; set to 0 to disable

You can customize your values here. To see the default values, install php5-cli and run the following from the command line:

#php -i | grep apc

To monitor APC cache hits and misses, APC provides a PHP script, located at /usr/share/doc/php-apc/apc.php. Copy this file to your document root and you will be able to monitor your APC status.

http://localhost/apc.php

For performance benchmarking we created two PHP files.

test1.php

<?php
         $start = microtime(true);
         for ($i = 0; $i < 500000; $i++)
             {
                include('test_include.php');
             }
         $end = microtime(true);
         echo "Start: " . $start . "<br />";
         echo "End: " . $end . "<br />";
         echo "Diff: ". ($end-$start) . "<br />";
?>

test_include.php

<?php
          $t =    "migrate2cloud";
 ?>

Without apc…

Start: 1360937874.8965 
End: 1360937883.1506 
Diff: 8.2541239261627

With apc..

Start: 1360937935.5746 
End: 1360937936.7291 
Diff: 1.1545231342316

Without APC it took about 8.25 seconds to complete the request; with APC, 1.15 seconds.

Memcached

The Memcached system uses a client–server architecture. The servers maintain a key–value associative array; the clients populate this array and query it. Keys are up to 250 bytes long and values can be at most 1 megabyte in size. Clients use client-side libraries to contact the servers which, by default, expose their service at port 11211. Amazon provides a service called Amazon ElastiCache, through which we can configure Memcached clusters for caching purposes.

installation and configuration

apt-get install memcached 
apt-get install php5-memcached

enable memcache module in /etc/php5/apache2/conf.d/20-memcached.ini  or in php.ini

edit php.ini 
session.save_handler = memcached 
extension=memcache.so
extension=memcached.so

Restart apache and memcache..

php script used for memcache testing..

 
<?php
//memcached simple test
$memcache = new Memcache;
$memcache->connect('localhost', 11211) or die ("Could not connect");
$key = md5('42data');  //something unique
for ($k=0; $k<5; $k++) {
    $data = $memcache->get($key);
    if ($data === false) {  //Memcache::get returns false on a miss
       $data = array();
       //generate an array of dummy data
       echo "expensive query";
       for ($i=0; $i<100; $i++) {
           for ($j=0; $j<10; $j++) {
               $data[$i][$j] = 42;  //placeholder value
           }
       }
       $memcache->set($key,$data,0,3600);
    } else {
       echo "cached";
    }
}

You can monitor memcache using phpmemcacheadmin

http://code.google.com/p/phpmemcacheadmin/

Varnish – Cache

Varnish has a concept of “backend” or “origin” servers. A backend server is the server providing the content Varnish will accelerate. Our first task is to tell Varnish where it can find its content. Open the Varnish default configuration file. If you installed from a package it is probably /etc/varnish/default.vcl.

Somewhere near the top there will be a section that looks a bit like this:

backend default { .host = "127.0.0.1"; .port = "80"; }

Change the port number to the port of your Apache (or whatever web server you are using).

This piece of configuration defines a backend in Varnish called default. When Varnish needs to get content from this backend it will connect to port 80 on localhost (127.0.0.1).

# varnishd -f /etc/varnish/default.vcl -s malloc,1G -T 127.0.0.1:2000 -a 0.0.0.0:80

The -f option specifies which configuration file varnishd should use.

The -s option chooses the storage type Varnish should use for storing its content.

-T 127.0.0.1:2000: Varnish has a built-in text-based administration interface, and this is the address it listens on.

-a 0.0.0.0:80: specifies that Varnish should listen on port 80.

For logging, type varnishlog in the terminal window where you started Varnish.

When someone accesses your page you will get a log like

#varnishlog
11 SessionOpen c 127.0.0.1 58912 0.0.0.0:80 
11 ReqStart c 127.0.0.1 58912 595005213 
11 RxRequest c GET 
11 RxURL c / 
11 RxProtocol c HTTP/1.1 
11 RxHeader c Host: localhost:80
11 RxHeader c Connection: keep-alive

Where not to use Caching

Caching should not be used for things like search results and forums, where the content has to be up to date and changes depending on the user’s input. It’s also advisable to avoid it for things like a flash news page. In general, don’t use it on any page that you wouldn’t want the end user’s browser or proxy to cache.

DevOps on EC2 using Capistrano

DevOps is the combination of development and operations processes. Combining the cloud with DevOps offers some fantastic properties: the ability to apply all the advances made in software development around repeatability and testability to your infrastructure, the ability to scale up in real time as needed and, among other things, the ability to harness the power of self-healing systems.

The process piece of devops is about taking the principles behind Agile to the entire continuous software development process. The obvious step is bringing Agile ideas to the operations team, which is sorely needed. Traditionally in the enterprise, the application development team is in charge of gathering business requirements for a software program and writing code. The development team tests their program in an isolated development environment for quality assurance which is later handed over to the operations team. The operations team is tasked with deploying and maintaining the program. The problem with this paradigm is that when the two teams work separately, the development team may not be aware of operational roadblocks that prevent the program from working as anticipated.

Capistrano

Capistrano is a developer tool for running scripts on multiple servers, mainly used for deploying web applications. It is typically installed on a workstation and used to deploy code from your source code management system to one or more servers. Capistrano was originally called “SwitchTower”; the name was changed to Capistrano in March 2006 because of a trademark conflict. It is a time-saving command line tool and it is very useful for AWS/EC2 servers because we can deploy code to thousands of AWS servers with a single command. For the security of the servers we commonly use AWS SSH key authentication, and in Capistrano we use this SSH key to deploy web applications to the AWS servers.

In cloud computing, deploying applications to production/live servers is always a delicate task. The whole process needs to be quick to minimize downtime. Automating the deployment process helps with running repetitive tasks and minimizes the possibility of human error. It is also a good idea to have a proven and easy way to roll back to a previous version if something goes wrong.

Capistrano is a standalone utility that also integrates nicely with Rails. We simply provide Capistrano with a deployment “recipe” or “formula” that describes our various servers and their roles. Deployment is a single command. It even allows us to roll a bad version out of production and revert to the previous release very easily.

Capistrano Deployment

The main function of Capistrano is to deploy a Rails application that we have already developed and whose code we manage with SVN or Git. It puts the code of our Rails application onto the AWS servers with a single command at the command prompt.

Steps to deploy a rails application

[shell]gem install capistrano[/shell]

Now, we need to capistranize our Rails application using the following command

[shell]capify .[/shell]

It will create two files

[shell]

config/deploy.rb
Capfile

[/shell]

How to set up deploy.rb file

[shell]

require 'rubygems'
require 'activesupport'
set :application, "<application name>"
set :scm_username, "<username>"
set :use_sudo, false
set :repository, "http://#{scm_username}@www.example.com/svn/trunk"
set :deploy_to, "/var/www/#{application}"
set :deploy_via, :checkout
set :scm, :git
set :user, "root"
role :app, "<domain_name>"
role :web, "<domain_name>"
role :db, "<domain_name>", :primary => true
namespace :migrations do
  desc "Run the Migrations"
  task :up, :roles => :app do
    run "cd #{current_path}; rake db:auto:migrate;"
  end
  task :down, :roles => :app do
    run "cd #{current_path}; rake db:drop; rake db:create"
  end
end

[/shell]

where,

‘scm_username’ is your user name
‘application’ is an arbitrary name you create to identify your application on the server
‘use_sudo’ tells Capistrano whether to prepend ‘sudo’ to the commands it runs; false means it will not
‘repository’ identifies where your Subversion repository is located

If we aren’t deploying to the server’s default path, we need to specify the actual location by using the ‘deploy_to’ variable as given below

[shell]
set :deploy_to, "/var/www/#{application}"
set :deploy_via, :checkout
[/shell]

If we are using Git to manage our source code, we specify the SCM by using the ‘scm’ variable as given below

[shell]
set :scm, :git
set :user, "root"
role :app, "<domain_name>"
role :web, "<domain_name>"
role :db, "<domain_name>", :primary => true
[/shell]

Since most Rails users will have the same domain name for their web, app and database servers, we can simply use the same domain name for all three roles.

[shell]
namespace :migrations do
  desc "Run the Migrations"
  task :up, :roles => :app do
    run "cd #{current_path}; rake db:auto:migrate;"
  end
  task :down, :roles => :app do
    run "cd #{current_path}; rake db:drop; rake db:create"
  end
end

[/shell]

After completion of our settings in the deploy.rb file, we need to commit the application by using “svn commit” command if we use svn.

Then we need to run the following command:

[shell]

cap deploy:setup

[/shell]

It creates the directory structure on the server.

[shell]cap deploy:check[/shell]

It checks all the dependencies, such as directory permissions and the utilities needed to deploy the application with Capistrano.

If everything is successful, you should see a message like:
You appear to have all necessary dependencies installed
And finally deploy the application by using the following command:

[shell]cap deploy[/shell]

Command finished successfully

To clean up the releases directory, leaving the five most recent releases

[shell]cap deploy:cleanup[/shell]

To print the difference between what was last deployed and what is currently in our repository

[shell]cap diff_from_last_deploy[/shell]

To roll back to the previously deployed version

[shell]cap deploy:rollback:code[/shell]
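The cleanup step above keeps the five most recent releases and removes the rest. The selection logic can be sketched in Python as follows, assuming release directories are named by timestamp (YYYYMMDDHHMMSS) so that sorting by name also sorts by age; the function name and the sample directory names are made up, and this is not Capistrano’s own code.

```python
def releases_to_remove(release_names, keep: int = 5) -> list:
    """Return the release directories that a cleanup would delete,
    keeping the `keep` most recent ones."""
    ordered = sorted(release_names)  # oldest first for timestamp names
    if keep <= 0:
        return ordered
    return ordered[:-keep]


# Hypothetical contents of a releases directory.
releases = [
    "20130201090000", "20130203101500", "20130207113000",
    "20130210140000", "20130212160000", "20130214093000",
    "20130215120000",
]
print(releases_to_remove(releases))
```

With seven releases and keep=5, only the two oldest are selected; with five or fewer, nothing is removed.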

Amazon’s EC2 cloud cuts the requisition time of the order & delivery stages down to just minutes. This is already a 75% savings in deployment time! But, without automated deployment, you’ll still need a week to get your application installed.

DevOPS on AWS Cloud using Opscode Chef

‘Rule the Cloud’ with Chef
Chef is Infrastructure as Code, an API for your entire infrastructure. You are probably well versed in cloud computing, or have at least heard of it; it is still an evolving paradigm, and cloud computing companies are the newest buzz in the IT sector. Chef is used in conjunction with clouds from providers such as Amazon’s AWS. If the software being developed is a mix of interdependent technologies working in perfect harmony, then why not the people behind it? This thought has led to the emergence of a new cultural trend called DevOps. Now, if you set up a number of instances on the cloud, what’s next? New instances on the cloud are just like bare metal servers, and the configuration has to be done from scratch. That is feasible to do manually for a couple of them, but what if the count gets bigger, say 100 live instances with different Unix distros? A script could be written, but it will not suffice in the long run once you consider management too. This is where Chef comes into play.

“chef is sysadmin robot performing configuration tasks automatically and much more quickly than a single admin could ever hope to” – Jesse Robbins, Opscode CEO.

Chef is an open source configuration management tool that uses pure Ruby as its domain-specific language for writing system configuration (recipes and cookbooks).

Chef brings a new feel with its interesting cookery-themed naming conventions, like Cookbooks (which contain code for installing and configuring a software package, in the form of Recipes), Knife (API tool), Data bags (which act like global variables) etc.

Although there are many configuration management tools in the industry, Chef has been able to secure its position in the race.

“Chef takes a step further past Puppet and CFEngine, like doing “LIVE SEARCH” within configuration management: a load balancer can call out to get a list of the app servers it needs to balance, or an application server can call out and get a reference to the master database server, etc. The centralised Chef server is indexing all the information about your infrastructure, so that you can search it from the command line using knife in real time, and so that applications can leverage that data.” said Seth Chisamore from Opscode.

A technology peak that isn’t fluffy – Cloud
For those new to cloud: it’s a whole bunch of activities which began as innovations, were then offered as products, and have now become so widespread and so feature-complete that they are suitable as utility services.

So saying you don’t want cloud in your business is like saying you don’t want to use electricity from the grid, and instead build your own generator and use it according to your need. What we lose if we continue that way is the competitive edge: you are under pressure to keep your stuff upgraded in order to hold your place relative to the others in the ecosystem.

Cloud is API oriented; everything you see in the cloud is ultimately programmable.

Virtualization is the foundation of Cloud, but virtualization is not Cloud by itself. It certainly enables many of the things we talk about when we talk Cloud, but it is not necessarily sufficient to be a cloud. Google App Engine is a cloud that does not incorporate virtualization. One of the reasons that virtualization is great is that you can automate the procurement of new boxes.

A culture that’s on the path to revolutionize IT – DevOps
DevOps originated predominantly in web shops, and it requires a kind of tooling that was not really available apart from the home-grown tools the big web shops built over and over again. Organisations that wanted to adopt DevOps therefore started using tools that enable this transition. As most organisations depend on the web as a source of revenue in a variety of ways, even enterprises want to be as agile as the web shops. This has started a revolution that is spreading from web sites into the enterprise.

Consider a real-life example of DevOps, say Facebook, the most popular social networking site. There is a lot of communication and cross-talk between developers, QA and operations: developers have to write code, QA has to make sure only good code goes out, and the operations team has to make sure it is up and running. On top of that, all of this has to be recorded, which altogether seems inefficient, and this led to the evolution of the entire system. In the conventional practice, developers write the code and throw it over to testing; once testing is done, it moves to operations, and so on. In contrast, with DevOps the developers and the operations team are all involved in the entire lifecycle of the project as one team. This creates a symbiotic relationship: the operations people understand what the engineers need most, and the developers see the value that the operations people bring as they make architecture decisions.

Combining the cloud with DevOps offers some fantastic properties: the ability to apply all the advances made in software development around repeatability and testability to your infrastructure, the ability to scale up in real time as needed (autoscaling) and, among other things, the ability to harness the power of self-healing systems. DevOps is better with Cloud.

Configuration management, say Chef, is one of the most fundamental elements enabling DevOps in the cloud. It allows you to have different VMs with just enough OS that they can be provisioned automatically through virtualization, and then assigned a distinct purpose within the cloud through configuration management. The CM system handles turning the lightly provisioned VM into the type of server that it is intended to be.

DevOps & Chef
DevOps is nothing but a cultural movement in which everybody, say the developers, QA, operations and testing, gets along. A project group is formed with a mixed skill set that blurs the line between, say, a developer and a sysadmin. This helps the project meet its deadlines and avoid unexpected situations. Cloud computing acts as a catalyst for this movement, and that is where Chef hops in.

Chef forms a critical layer in the DevOps stack. Thanks to the concept of infrastructure as code and virtualization, we can define and build our infrastructure based on text files. Those files can be version-controlled and tested like regular code. The artifact (AMI, image) can then be deployed on an infrastructure. The following image gives you an overview of the similarities.

Inadvertently, issues like “what if the application” or “what if the infrastructure” are resolved: the fact is that the application is the infrastructure and the infrastructure is the application, and we are here to enable business. It has also helped bring the people in the team into better alignment across the board.

Chef configuration is written in pure ruby.

Devops == Ruby

For those who think Bash is enough as a scripting language – Bash becomes a liability not an asset once your script exceeds 100 lines and a total nightmare if you need to parse or output HTML, CSV, XML, JSON, etc. A significant point to be noted is that Chef uses Ruby in its recipes unlike puppet where it uses its own configuration language that is based on Ruby although chef is heavily inspired from puppet. If you chose chef then you are effectively scripting your infrastructure with ruby.

Though Chef was only released on January 15th, 2009, it has seen rapid adoption and gained a large number of contributors. According to the Opscode wiki, there are 545 approved contributors to Opscode projects and 106 companies. Beyond that, the #chef IRC channel is typically attended by over 100 users and Opscode staff, all signs of a healthy, growing open source community.

The SpringSource division of VMware has signed on to contribute to the project. They are even being very public about it, as seen in this endorsement:

“We are excited about the open source contributions the SpringSource Division of VMware has made to Opscode Chef,” said Javier Soltero, CTO of SpringSource Management Products at VMware. “Chef is an important tool for automating infrastructure management and we look forward to its continued growth and success.”

Moreover, in my experience of using Chef, I really enjoyed the quick responses I got from the Opscode support team to all my queries; they were always able to direct me towards a solution.

Automation using Chef: creating an instance on the Amazon cloud with the Apache web server configured on it.

Memo
chef-workstation – the place where we customize our cookbooks and maintain the chef-repo
chef node – the managed node that we create using Chef; it configures itself based on its run list and downloaded cookbooks

The really cool thing with Chef is that you can rerun cookbooks against a node and it will not redo anything it has already done, i.e. it will not change the end result on the target node as defined by the recipes being run against it. You always get the same outcome no matter what state the node starts in: actions are skipped if they have already been applied and, conversely, run if Chef detects that they have not. When reading about Chef you will see this described as being idempotent (there, I’ve saved you looking it up).
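As a rough illustration of what idempotent means here, the following plain-Ruby sketch checks the current state before acting and reports whether anything changed. The `ensure_file` helper is hypothetical and is not Chef's implementation; it only mimics the "do nothing if already done" behaviour.

```ruby
require 'tmpdir'

# Hypothetical helper: make sure `path` contains exactly `content`.
# Returns :updated when it had to write, :up_to_date when nothing was needed.
def ensure_file(path, content)
  return :up_to_date if File.exist?(path) && File.read(path) == content

  File.write(path, content)
  :updated
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'index.html')
  puts ensure_file(path, "Hello from Chef\n") # first run writes the file
  puts ensure_file(path, "Hello from Chef\n") # second run changes nothing
end
```

Running the same step twice leaves the file in the same state, which is the property described above.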

Prerequisites – an AWS account, EC2 API configured, OS – Ubuntu.

1. Sign up for an account at http://www.opscode.com/hosted-chef/# . Here we use OHC (Opscode Hosted Chef), where we get to create up to 5 nodes for free!!

2. Verify your Opscode account.

3. Download the files

Create an organization in the Console page at www.manage.opscode.com, and then download the following files:

  • Your Organization validation key. This is used to automatically register new Chef Clients (like servers you manage).
  • The Knife configuration file.
  • Your User key. This is used to authenticate your user with Hosted Chef.

Then edit knife.rb to add your AWS access key and secret access key:

knife[:aws_access_key_id]     = "Your AWS Access Key"
knife[:aws_secret_access_key] = "Your AWS Secret Access Key"

At this stage I have a Chef-ready user environment and an Opscode organisation set up, and now I want to start by spinning up an EC2 instance. I will not go into any depth regarding the EC2 specifics, as that would make this post far too long.

4. Setting up the chef-workstation

Install Ruby and Development Tools

#sudo apt-get update
#sudo apt-get install ruby ruby-dev libopenssl-ruby rdoc ri irb build-essential wget ssl-cert git-core
#sudo gem update --system

Install RubyGems

#cd /tmp
#wget http://production.cf.rubygems.org/rubygems/rubygems-1.8.10.tgz
#tar zxf rubygems-1.8.10.tgz
#cd rubygems-1.8.10
#sudo ruby setup.rb --no-format-executable

Install Chef

#sudo gem install chef

5. Verify the Chef installation

#chef-client -v

6. Build the chef repository

#cd ~
#git clone https://github.com/opscode/chef-repo.git

Knife reads its configuration files from .chef, so we need to create that directory as well.

#mkdir -p ~/chef-repo/.chef

Copy the keys and knife configuration you downloaded earlier into this directory:

#cp USERNAME.pem ~/chef-repo/.chef
#cp ORGANIZATION-validator.pem ~/chef-repo/.chef
#cp knife.rb ~/chef-repo/.chef
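Before talking to Hosted Chef, it can help to confirm that all three files actually landed in the directory. The following is a minimal sketch in plain Ruby; `missing_knife_files` is a hypothetical helper, and USERNAME and ORGANIZATION are the same placeholders used in the commands above.

```ruby
# Hypothetical helper: list which of the expected knife files are
# missing from chef_dir. An empty array means everything is in place.
def missing_knife_files(chef_dir, username, organization)
  expected = ["#{username}.pem", "#{organization}-validator.pem", 'knife.rb']
  expected.reject { |name| File.file?(File.join(chef_dir, name)) }
end
```

Calling `missing_knife_files(File.join(Dir.home, 'chef-repo', '.chef'), 'USERNAME', 'ORGANIZATION')` should return an empty array once the three copies have succeeded.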

Run the following command to confirm knife is working with the Hosted Chef API.

#cd ~/chef-repo
#knife client list

output : “ORGANIZATION-validator”

7. Now I need to download the apache2 cookbook onto my workstation, customize it if required, and then upload it to my account on the Opscode platform.

#knife cookbook site install apache2

This pulls down the desired cookbook and records it in the local git repository.

8. Upload the cookbook using the following command

#knife cookbook upload apache2

9. Enter the following command, sit back and enjoy the show!!!

#knife ec2 server create -G default -I ami-1212ef7b -f m1.small -S <aws ssh key id> -i <ssh identity file> -x root -r 'recipe[apache2]'


Before proceeding, it is probably a good idea to take time out and read the Opscode Chef Recipe wiki, which has a nice, clear explanation of cookbook namespaces. Also remind yourself of the components that make up a cookbook. It is worth noting that recipes manage resources, and that those resources are executed in the order in which they occur.
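The ordering point can be illustrated with a toy stand-in for a recipe, written in plain Ruby. `ToyRecipe` and its `package`, `service` and `converge` methods are hypothetical and only mimic the shape of a recipe; this is not how Chef itself is implemented.

```ruby
# Toy stand-in for a recipe: resources are collected as they are declared
# and then handled in exactly that order.
class ToyRecipe
  attr_reader :resources

  def initialize(&block)
    @resources = []
    instance_eval(&block)
  end

  # Record a package resource in declaration order.
  def package(name)
    @resources << [:package, name]
  end

  # Record a service resource in declaration order.
  def service(name)
    @resources << [:service, name]
  end

  # "Apply" the resources in order; here we only return their labels.
  def converge
    @resources.map { |type, name| "#{type}[#{name}]" }
  end
end

recipe = ToyRecipe.new do
  package 'apache2' # declared first, so it is handled first
  service 'apache2' # can rely on the package already being handled
end

puts recipe.converge
```

Because the package is declared before the service, it is handled first, which is why the order of resources inside a recipe matters.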