Installation of MongoDB and its performance test

Friday, February 15th, 2013 | Posted in Cloud computing

Why MongoDB?

Document-oriented
- Documents (objects) map nicely to programming language data types
- Embedded documents and arrays reduce need for joins
- Dynamically-typed (schemaless) for easy schema evolution
- No joins and no multi-document transactions for high performance and easy scalability
High performance
- No joins and embedding makes reads and writes fast
- Indexes including indexing of keys from embedded documents and arrays
- Optional streaming writes (no acknowledgements)
High availability
- Replicated servers with automatic master failover
Easy scalability
- Automatic sharding (auto-partitioning of data across servers)
- Reads and writes are distributed over shards
- No joins or multi-document transactions make distributed queries easy and fast
- Eventually-consistent reads can be distributed over replicated servers

Mongo data model

A Mongo system (see deployment above) holds a set of databases
A database holds a set of collections
A collection holds a set of documents
A document is a set of fields
A field is a key-value pair
A key is a name (string)
A value is a
- basic type like string, integer, float, timestamp, binary, etc.,
- a document, or
- an array of value
Mongo query language
To retrieve certain documents from a db collection, you supply a query document containing the fields the desired documents should match. For example, {name: {first: 'John', last: 'Doe'}} will match all documents in the collection with name of John Doe. Likewise, {name.last: 'Doe'} will match all documents with last name of Doe. Also, {name.last: /^D/} will match all documents with last name starting with ‘D’ (regular expression match).
Queries will also match inside embedded arrays. For example, {keywords: 'storage'} will match all documents with ‘storage’ in its keywords array. Likewise, {keywords: {$in: ['storage', 'DBMS']}} will match all documents with ‘storage’ or ‘DBMS’ in its keywords array.
If you have lots of documents in a collection and you want to make a query fast then build an index for that query. For example, ensureIndex({name.last: 1}) or ensureIndex({keywords: 1}). Note, indexes occupy space and slow down updates a bit, so use them only when the tradeoff is worth it.

Install MongoDB on Ubuntu 10.04

Configure Package Management System (APT)

The Ubuntu package management tool (i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG Key:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

Create a /etc/apt/sources.list.d/10gen.list file and include the following line for the 10gen repository.

deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen

Now issue the following command to reload your repository:

sudo apt-get update

Install Packages

Issue the following command to install the latest stable version of MongoDB:

sudo apt-get install mongodb-10gen

When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.

Configure MongoDB

These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. You will find the control script is at /etc/init.d/mongodb.

This MongoDB instance will store its data files in the /var/lib/mongodb and its log files in /var/log/mongodb, and run using the mongodb user account.

Note

If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.

Controlling MongoDB

Starting MongoDB

You can start the mongod process by issuing the following command:

sudo service mongodb start

You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.

Stopping MongoDB

As needed, you may stop the mongod process by issuing the following command:

sudo service mongodb stop

Restarting MongoDB

You may restart the mongod process by issuing the following command:

sudo service mongodb restart

Controlling `mongos`

As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically do not run on the same systems where mongod runs. You can use the mongodb script referenced above to derive your own mongos control script.

Using MongoDB

Among the tools included with the MongoDB package, is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:

mongo
> show dbs (); —> To show your databases
> use <databasename> —-> To switch database
> db.createCollection(“collectionname”) —> To create collection
> db.collectionname.find(); —> To see the contents in the collection
> db.addUser(“theadmin”, “anadminpassword”) —> To create user and password

Mongodb performance test :-

To monitor database system we can use Mongotop

Mongotop tracks and reports the current read and write activity of a MongoDB instance.
Mongotop provides per-collection visibility into use.
Use mongotop to verify that activity and use match expectations.
Mongotop returns time values specified in milliseconds (ms.)
Mongotop only reports active namespaces or databases, depending on the –locks option.
If you don’t see a database or collection, it has received no recent activity.

By default mongotop connects to the MongoDB instance running on the localhost port 27017. However,mongotop can optionally connect to remote mongod instances

Next, we can use Mongostat

Mongostat captures and returns counters of database operations. Mongostat reports operations on a per-type (e.g. insert, query, update, delete, etc.) basis. This format makes it easy to understand the distribution of load on the server. Use Mongostat to understand the distribution of operation types and to inform capacity planning.
The Mongostat utility provides a quick overview of the status of a currently running mongod or Mongos instance. Mongostat is functionally similar to the UNIX/Linux file system utility vmstat, but provides data regarding mongod and Mongos instances.

Use db.serverStatus()
It provides an overview of the database process’s state.

Then REST interface

MongoDB provides a REST interface that exposes a diagnostic and monitoring information in a simple web page. Enable this by setting rest to true, and access this page via the local host interface using the port numbered 1000 more than that the database port. In default configurations the REST interface is accessible on 28017. For example, to access the REST interface on a locally running mongod instance: http://localhost:28017

These are a few basic tips on making your application better/faster/stronger without knowing anything about indexes or sharding.

Connecting

Connecting to the database is a (relatively) expensive operation. Try to minimize the number of times you connect and disconnect: use persistent connections or connection pooling (depending on your language).

there are some side effects with the PHP connection code.

$connection = new Mongo ( );

$connection->connect( );

In this code it appears the user wants to create a new connection. However, under the hood the following is happening:

The constructor connects to the database.
connect( ) sees that you’re already connected, assumes you want to reset the connection.
Disconnects from the database.
Connects again.

The result is that you have doubled your execution time.

ObjectIds

ObjectIds seem to be uncomfortable, so they convert their ObjectIds into strings. The problem is, an ObjectId takes up 12 bytes but its string representation takes up 29 bytes (almost two and a half times bigger).

Numbers vs. Strings

MongoDB is type-sensitive and it’s important to use the correct type: numbers for numeric values and strings for strings.

If you have large numbers and you save them as strings (“1234567890″ instead of 1234567890), MongoDB may slow down as it strcmps the entire length of the number instead of doing a quicker numeric comparison. Also, “12″ is going to be sorted as less than “9″, because MongoDB will use string, not numeric, comparison on the values. This can lead to some errors.

Driver-specific
Find out if you’re driver is particularly weaknesses (or strengths). For instance, the Perl driver is one of the fastest drivers, but it is not good at decoding Date types (Perl’s DateTime objects take a long time to create).
MongoDB adopts a documented-oriented format, so it is more similar to RDBMS than a key-value or column oriented format.

MongoDB operates on a memory base and places high performance above data scalability.Mongo DB uses BSON for data storage

Mongo uses memory mapped files, which means that a lot of the memory reported by tools such as top may not actually represent RAM usage. Check mem[“resident”], which tells you how much RAM Mongo is actually using.

“mem” : {
“resident” : 2,
“virtual” : 2396,

“supported” : true,
“mapped” : 0
},

Backup

There are basically two approaches to backing up a Mongo database:

Mongodump and Mongorestore are the classic approach. Dumps the contents of the database to files. The backup is stored in the same format as Mongo uses internally, so is very efficient. But it’s not a point-in-time snapshot.
To get a point-in-time snapshot, shut the database down, copy the disk files (e.g. with cp) and then start mongod up again. Alternatively, rather than shutting mongod down before making your point-in-time snapshot, you could just stop it from accepting writes:

> db._adminCommand({fsync: 1, lock: 1})
{
“info” : “now locked against writes, use db.$cmd.sys.unlock.findOne() to unlock”,

“ok” : 1
}

To unlock the database again, you need to switch to the admin database and then unlock it

> use admin
switched to db admin
> db.$cmd.sys.unlock.findOne()
{ “ok” : 1, “info” : “unlock requested” }

Replication
Start your master and slave up like this:

$ mongod –master –oplogSize 500

$ mongod –slave –source localhost:27017 –port 3000 –dbpath /data/slave

When seeding a new slave server from master use the –fastsync option.

You can see what’s going on with these two commands:
> db.printReplicationInfo() # tells you how long your oplog will last
> db.printSlaveReplicationInfo() # tells you how far behind the slave is

If the slave isn’t keeping up,Check the mongo log for any recent errors. Try connecting with the mongo
console. Try running queries from the console to see if everything is working. Run the status commands
above to try and find out which database is taking up resources.
Timeout

Connection timeout in milliseconds. Defaults to 20000

Connection::query_timeout.

How many milliseconds to wait for a response from the server. Set to 30000 (30 seconds) by default. -1 waits forever (or until TCP times out, which is usually a long time).

Default pool

The default pool has a maximum of 10 connections per mongodb host. This value is controlled by the variable “connectionsPerHost” within the class

MongoDB Server Connections

The MongoDB server has a property called “maxConns” that is the max number of simultaneous connections. The
default number for maxConns is 80% of the available file descriptors for connections. One way to check the number of connections is by opening the mongo shell and executing:

>db.serverStatus() and in the previous mail I have send the screen shot of this.

The standard format of the MongoDB connection URI used to connect to a MongoDB database server.

mongodb://[username:password@]host1[:port1][,host2[:port2],…[,hostN[:portN]]][/[database][?options]]

Finding the Min and Max values in MongoDB

In MongoDB, the min() and max() functions work as limitors – essentially the same as “gte” (>=) and “lt” (<).

To find the highest (maximum) value in MongoDB, you can use this command;

db.thiscollection.find().sort({“thisfieldname”:-1}).limit(1)

This essentially sorts the data by the fieldname in decending and takes the first value.

The lowest (minimum) value can be determined in a similar way.

db.thiscollection.find().sort({“thisfieldname”:1}).limit(1)

Memory Mapped Storage Engine :-

This is the current storage engine for MongoDB, and it uses memory-mapped files for all disk I/O. Using this strategy, the operating system’s virtual memory manager is in charge of caching. This has several implications:

There is no redundancy between file system cache and database cache: they are one and the same.
MongoDB can use all free memory on the server for cache space automatically without any configuration of a cache size.
Virtual memory size and resident size will appear to be very large for the mongod process.

This is benign: virtual memory space will be just larger thanthe size of the datafiles open and mapped; resident size will vary depending on the amount of memory not used by other processes on the machine.

This command shows the memory usage information :- db.serverStatus().mem

For example :-

> db.serverStatus().mem
{
    “bits” : 64,
    “resident” : 31,
    “virtual” : 146,
    “supported” : true,
    “mapped” : 0,
    “mappedWithJournal” : 0
}

We can verify there is no memory leak in the mongod process by comparing the mem.virtual and mem.mapped values (these values are in megabytes). If you are running with journaling disabled, the difference should be relatively small compared to total RAM on the machine. If you are running with journaling enabled, compare mem.virtual to 2*mem.mapped. Also watch the delta over time; if it is increasing consistently, that could indicate a leak.

Also we can use to check what percent of memory is being used for memory mapped files by the free command:

Here 2652mb of memory is being used to memory map files

root@manager-desktop:~# free -tm

total used free shared buffers cached
Mem: 3962 3602 359 0 411 2652

-/+ buffers/cache: 538 3423

Swap: 1491 52 1439

Total: 5454 3655 1799

Garbage collection handling :-

When we remove an object from MongoDB collection, the space it occupied is not automatically garbage collected and new records are only appended to the end of data files, making them grow bigger and bigger.MongoDB maintains lists of deleted blocks within the datafiles when objects or collections are deleted. This space is reused by MongoDB but never freed to the operating system.

To shrink the amount of physical space used by the datafiles themselves, by reclaiming deleted blocks, we must rebuild the database by using the command “db.repairDatabase( )” . repairDatabase copies all the database records to new files.

We will need enough free disk space to hold both the old and new database files while the repair is running, the repairDatabase will take a long time to complete.Also rather than compacting an entire database,

you can compact just a single collection by using “db.runCommand({compact:’collectionmname;})”

This does not shrink any datafiles,however; it only defragments deleted space so that larger objects might reuse it.

The compact command will never delete or shrink database files, and in general requires extra space to do its work.

Thus, it is not a good option when you are running critically low on disk space.

Make your websites run faster, automatically — try mod_pagespeed for Apache

Tuesday, October 30th, 2012 | Posted in Cloud computing

Google just released the first stable version of mod_pagespeed, the company’sopen-source module for Apache that can automatically optimize your web pages to improve download and rendering speeds. With this release, Google is declaring this tool ready for broader adoption, though it’s worth noting that a number of large hosting providers like DreamHost, Go Daddy and content delivery network EdgeCast have already been using it in production for quite a while now.

“mod_pagespeed” speeds up your site and reduces page load time. This open-source Apache HTTP server module automatically applies web performance best practices to pages, and associated assets (CSS, JavaScript, images) without requiring that you modify your existing content or workflow.

FEATURES:-

1. Automatic website and asset optimization

2. Latest web optimization techniques

3. 40+ configurable optimization filters

4. Free, open-source, and frequently updated

5. Deployed by individual sites, hosting providers, CDN’s

How does mod_pagespeed speed up web-sites?

“mod_pagespeed” improves web page latency and bandwidth usage by changing the resources on that web page to implement web performance best practices. Each optimization is implemented as a custom filter in mod_pagespeed, which are executed when the Apache HTTP server serves the website assets. Some filters simply alter the HTML content, and other filters change references to CSS, JavaScript, or images to point to more optimized versions.

“mod_pagespeed” implements custom optimization strategies for each type of asset referenced by the website, to make them smaller, reduce the loading time, and extend the cache lifetime of each asset. These optimizations include combining and minifying JavaScript and CSS files, inlining small resources, and others. mod_pagespeed also dynamically optimizes images by removing unused meta-data from each file, resizing the images to specified dimensions, and re-encoding images to be served in the most efficient format available to the user.

“mod_pagespeed” ships with a set of core filters designed to safely optimize the content of your site without affecting the look or behavior of your site. In addition, it provides a number of more advanced filters which can be turned on by the site owner to gain higher performance improvements.

“mod_pagespeed” can be deployed and customized for individual web sites, as well as being used by large hosting providers and CDN’s to help their users improve performance of their sites, lower the latency of their pages, and decrease bandwidth usage.

Installing mod_pagespeed on CentOS (cPanel/WHM)

root@server1# cd /usr/src
root@server1[/usr/src]# mkdir mod_pagespeed/
root@server1[/usr/src]# cd mod_pagespeed
root@server1[/usr/src/mod_pagespeed]# wget https://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-beta_current_x86_64.rpm
root@server1[/usr/src/mod_pagespeed]# rpm2cpio mod-pagespeed-beta_current_x86_64.rpm | cpio -idmv
root@server1[/usr/src/mod_pagespeed]# cp usr/lib/httpd/modules/mod_pagespeed.so /usr/local/apache/modules
root@server1[/usr/src/mod_pagespeed]# chmod 755 /usr/local/apache/modules/mod_pagespeed.so
root@server1[/usr/src/mod_pagespeed]# mkdir -p /var/mod_pagespeed/{cache,files} —–> Create pagespeed directories.
root@server1[/usr/src/mod_pagespeed]# chown nobody:nobody /var/mod_pagespeed/*
root@server1[/usr/src/mod_pagespeed]# /usr/local/apache/bin/apxs -c -i /home/cpeasyapache/src/httpd-2.2.22/modules/filters/mod_deflate.c —-> Enable mod_deflate (required for mod_pagespeed)
root@server1[/usr/src/mod_pagespeed]# vim /usr/local/apache/conf/pagespeed.conf —>edit the mod_pagespeed configuration file

In this file include

1. LoadModule pagespeed_module modules/mod_pagespeed.so

2. LoadModule deflate_module modules/mod_deflate.so

3. ModPagespeedFileCachePath “/var/mod_pagespeed/cache/”

4. ModPagespeedGeneratedFilePrefix “/var/mod_pagespeed/files/”

And then open /usr/local/apache/conf/includes/pre_main_global.conf and add:

Include conf/pagespeed.conf

# Rebuild Apache config and restart apache.

/scripts/buildhttpdconf

/etc/init.d/httpd restart

Once your web server fires up, it’ll be mod_pagespeed-enabled.

You can verify it by using any web-page test tool. Here I am using Pingdom tool. I have share the screenshots of with and without mod_pagespeed module.

Website without mod_pagespeed module

Website with mod_pagespeed module

Openstack Cloud Software

Wednesday, October 17th, 2012 | Posted in Cloud computing

OpenStack : The Mission

“ To produce the ubiquitous Open Source Cloud Computing platform that will meet the needs of public and private cloud providers regardless of size, by being simple to implement and massively scalable.”

OpenStack is a collection of open source software projects that enterprises/service providers can use to setup and run their cloud compute and storage infrastructure.Rackspace and NASA are the key initial contributors to the stack. Rackspace contributed their “Cloud Files” platform (code) to power the Object Storage part of the OpenStack, while NASA contributed their “Nebula” platform (code) to power the Compute part. OpenStack consortium has managed to have more than 150 members including Canonical, Dell, Citrix etc.

There are 5 main service families under OpenStack

• Nova – Compute Service

• Swift – Storage Service

• Glance – Imaging Service

• Keystone – Identity Service

• Horizon – UI Service

Open Stack Compute Infrastructure (Nova)

Nova is the Computing Fabric controller for the OpenStack Cloud. All activities needed to support the life cycle of instances within the OpenStack cloud are handled by Nova. This makes Nova a Management Platform that manages compute resources, networking, authorization, and scalability needs of the OpenStack cloud. But, Nova does not provide any virtualization capabilities by itself; instead, it uses libvirt API to interact with supported hypervisors. Nova exposes all its capabilities through a web services API that is compatible with the EC2 API of Amazon Web Services.

Functions and Features:

• Instance life cycle management

• Management of compute resources

• Networking and Authorization

• REST-based API

• Asynchronous eventually consistent communication

• Hypervisor agnostic : support for Xen, XenServer/XCP, KVM, UML, VMware vSphere and Hyper-V

OpenStack Storage Infrastructure (Swift)

Swift provides a distributed, eventually consistent virtual object store for OpenStack. It is analogous to Amazon Web Services – Simple Storage Service (S3). Swift is capable of storing billions of objects distributed across nodes. Swift has built-in redundancy and fail-over management and is capable of archiving and media streaming. It is extremely scalable in terms of both size (several petabytes) and capacity (number of objects).

Functions and Features

• Storage of large number of objects

• Storage of large sized objects

• Data Redundancy

• Archival capabilities – Work with large datasets

• Data container for virtual machines and cloud apps

• Media Streaming capabilities

• Secure storage of objects

• Backup and archival

• Extreme scalability

OpenStack Imaging Service (Glance)

OpenStack Imaging Service is a lookup and retrieval system for virtual machine images. It can be configured to use any one of the following storage backends:

• Local filesystem (default)

• OpenStack Object Store to store images

• S3 storage directly

• S3 storage with Object Store as the intermediate for S3 access.

• HTTP (read-only)

Functions and Features

• Provides imaging service

OpenStack Identity Service (Keystone)

Keystone provides identity and access policy services for all components in the OpenStack family. It implements it’s own REST based API (Identity API). It provides authentication and authorization for all components of OpenStack including (but not limited to) Swift, Glance, Nova. Authentication verifies that a request actually comes from who it says it does. Authorization is verifying whether the authenticated user has access to the services he/she is requesting for.

Keystone provides two ways of authentication. One is username/password based and the other is token based. Apart from that, keystone provides the following services:

• Token Service (that carries authorization information about an authenticated user)

• Catalog Service (that contains a list of available services at the users’ disposal)

• Policy Service (that let’s keystone manage access to specific services by specific users or groups).

Openstack Administrative Web-Interface (Horizon)

Horizon the web based dashboard can be used to manage /administer OpenStack services. It can be used to manage instances and images, create keypairs, attach volumes to instances, manipulate Swift containers etc. Apart from this, dashboard even gives the user access to instance console and can connect to an instance through VNC. Overall, Horizon

Features the following:

• Instance Management – Create or terminate instance, view console logs and connect through VNC, Attaching volumes, etc.

• Access and Security Management – Create security groups, manage keypairs, assign floating IPs, etc.

• Flavor Management – Manage different flavors or instance virtual hardware templates.

• Image Management – Edit or delete images.

• View service catalog.

• Manage users, quotas and usage for projects.

• User Management – Create user, etc.

• Volume Management – Creating Volumes and snapshots.

• Object Store Manipulation – Create, delete containers and objects.

• Downloading environment variables for a project.

INSTALLATING OPEN STACK

We can install open stack ESSEX very easily using StackGeek script. Login to your box and install git with apt-get. We’ll become root and do an update first.

sudo su
apt-get update
apt-get install git

Now checkout the StackGeek scripts from Github:

git clone git://github.com/StackGeek/openstackgeek.git   
cd openstackgeek

Install the Base Scripts

Be sure to take a look at the scripts before you run them. Keep in mind the scripts will periodically prompt you for input, either for confirming installation of a package, or asking you for information for configuration.

Start the installation by running the first script:

./openstack_base_1.sh

When the script finishes you’ll see instructions for manually configuring your network. You can edit the interfaces file by doing a:

vim /etc/network/interfaces

Copy and paste the network code provided by the script into the file and then edit:

auto eth0 
iface eth0 inet static
  address 192.168.1.48		
  network 192.168.1.0		
  netmask 255.255.255.0
 broadcast 192.168.1.255
  gateway 192.168.1.124			
  dns-nameservers 8.8.8.8  
auto eth1

Change the settings for your network configuration and then restart networking and run the next script:

/etc/init.d/networking restart

Then run the second script :

./openstack_base_2.sh

After the second script finishes, you’ll need to set up a logical volume for Nova to use for creating snapshots and volumes. Nova is OpenStack’s compute controller process.

Here’s the output from the format and volume creation process:-

root@manager-System-Product-Name:/openstackgeek# fdisk /dev/sda
Device contains neither a valid DOS partition table,nor Sun,SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xb39fe7af.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p Partition number (1-4, default 1): 3  
First sector (2048-62914559, default 2048): 
 Using default value 2048 Last sector,(2048-62914559,default 62914559): 
Using default value 62914559 
Command (m for help): w The partition table has been altered! 
Calling ioctl() to re-read partition table. Syncing disks.
root@manager-System-Product-Name:/openstackgeek# pvcreate -ff /dev/sda3
 Physical volume "/dev/sda3" successfully created
root@manager-System-Product-Name:/openstackgeek# vgcreate nova-volumes /dev/sda3
 Volume group "nova-volumes" successfully created

Note: Your device names may vary.

Installing MySql

The OpenStack components use MySQL for storing state information. Start the install script for MySQL by entering the following:

./openstack_mysql.sh

You’ll be prompted for a password used for each of the components to talk to MySQL:
Enter a password to be used for the OpenStack services
to talk to MySQL (users nova, glance, keystone): redhat
Note(Here “redhat” is the password given to nova,glance,keystone)

During the installation process you will be prompted for a root password for MySQL. In our install example we use the same password, ‘redhat’. At the end of the MySQL install you’ll be prompted for your root password again.

mysql start/running, process 8796
################################################################################ 
Creating OpenStack databases and users. 
Use your database password when prompted. 
 Run './openstack_keystone.sh' when the script exits. 
################################################################################
Enter password:
After MySQL is running, you should be able to login with any of the OpenStack 
users and/or the root admin account by doing the following:

mysql -u root -predhat
mysql -u nova -predhat nova
mysql -u keystone -predhat keystone
mysql -u glance -predhat glance

Installing Keystone

Keystone is OpenStack’s identity manager. Start the install of Keystone by doing:

./openstack_keystone.sh

You’ll be prompted for a token, the password you entered for OpenStack’s services, and your email address. The email address is used to populate the user’s information in the database.

Enter a token for the OpenStack services to auth wth keystone: redhattoken 
Enter the password you used for the MySQL users (nova, glance, keystone):redhat 
Enter the email address for accounts(nova,glance,keystone):user@company.com
You should be able to query Keystone at this point. 
You’ll need to source the“stackrc” file before you talk to Keystone:
 . ./stackrc   
 keystone user-list    
 Keystone should return a list of users:
+----------------------------------+---------+------------------------+--------+
|                id                | enabled |         email          |  name  |
+----------------------------------+---------+------------------------+--------+
| b32b9017fb954eeeacb10bebf14aceb3 | True    | user@company.com       | demo   |
| bfcbaa1425ae4cd2b8ff1ddcf95c907a | True    | user@company.com       | glance |
| c1ca1604c38443f2856e3818c4ceb4d4 | True    | user@company.com       | nova   |
| dd183fe2daac436682e0550d3c339dde | True    | user@company.com       | admin  |
+----------------------------------+---------+------------------------+--------+

Installing Glance

Glance is OpenStack’s image manager. Start the install of Glance by doing:

./openstack_glance.sh

The script will download an Ubuntu 12.04 LTS cloud image from StackGeek’s S3 bucket.Once it’s done, you should be able to get a list of images:

glance index

Here’s the expected output:

ID              :- 71b8b5d5-a972-48b3-b940-98a74b85ed6a 
Name            :- Ubuntu 12.04 LTS
Disk Format     :- qcow2 
Container Format:- ovf 
Size            :- 226426880

Installing Nova

We’re almost done installing! The last component is the most important one as well. Nova is OpenStack’s compute and network manager. It’s responsible for starting instances, creating snapshots and volumes, and managing the network. Start the Nova install by doing:

./openstack_nova.sh

You’ll immediately be prompted for a few items, including your existing network interface’s IP address, the fixed network address, and the floating pool addresses:

######################################################
The IP address for eth0 is probably 192.168.1.48.
Keep in mind you need an eth1 for this to work.
######################################################
Enter the primary ethernet interface IP: 192.168.1.48
Enter the fixed network (eg. 10.0.2.32/27): 192.168.1.0/24
Enter the fixed starting IP (eg. 10.0.2.33): 192.168.1.1
############################################################################
The floating range can be a subset of your current network. 
Configure your DHCP server to block out the range before you choose it here. 
An example would be 10.0.1.224-255
############################################################################
Enter the floating network (eg. 10.0.1.224/27):  
Enter the floating netowrk size (eg. 32):

The fixed network is a set of IP addresses which will be local to the compute nodes. Think of these addresses as being held and routed internally inside any of the compute node instances.

The floating network is a pool of addresses which can be assigned to the instances you are running. For example, you could start a web server and map an external IP to it for serving a site on the Internet.

Finish Installing Nova

Nova should finish installing after you enter all the network information. When it’s done, you should be able to get a list of images from Glance via Nova:

 nova image-list

And get the expected output we saw earlier from Glance:

root@manager-System-Product-Name:/openstackgeek# nova image-list
+--------------------------------------+------------------+--------+--------+
|                  ID                  |       Name       | Status | Server |
+--------------------------------------+------------------+--------+--------+
| 71b8b5d5-a972-48b3-b940-98a74b85ed6a | Ubuntu 12.04 LTS | ACTIVE |        |
+--------------------------------------+------------------+--------+--------+

Installing Horizon

Horizon is the UI and dashboard controller for OpenStack. Install it by doing:

./openstack_horizon.sh

When it’s done installing, you’ll be given a URL to access the dashboard. 
You’ll be able to login with the user ‘admin’ 
and whatever you entered earlier for your password. 
If you’ve forgotten it, simply grep for it in your environment:

env |grep OS_PASSWORD

The URL will be : http://192.168.1.48

You can login the Openstack dashboard by the following credentials

USER : admin

PASSWORD : redhat

Managing Multiple Branches in Subversion

Friday, May 11th, 2012 | Posted in Cloud computing

Subversion (SVN) is a centralized storage for sharing information. The Subversion repository is like a time machine. It keeps all the records of every changes ever committed and allows you to explore this history by examining previous versions of files and directories as well as the metadata that accompanies them. With a single Subversion command, you can check out the repository exactly as it was at any date or revision number in the past.

At the core of the Subversion is a repository, which is the central store of that system’s data. Any number of Users connect to the repository, and then read or write to these files according to their permissions. The first Subversion tool we will use is svnadmin. This tool is for administration tasks, like creating repositories, making backup dumps, and the like.

Creating the Repository

To create a repository, open the command line, change the current directory to where you want to create it, and run svnadmin:

[shell]$ cd /usr/local
$ svnadmin create repo[/shell]

Where,
repo = repository name.You can call it whatever you like.

Now the SVN URL is like “svn://localhost:/usr/local/repo“. Subversion uses this repository to store information about your projects, like file revisions.

Importing Projects

The svn import command is a quick way to copy an unversioned tree of files into a repository, creating intermediate directories as necessary. svn import doesn’t require a working copy, and your files are immediately committed to the repository. You typically use this when you have an existing tree of files that you want to begin tracking in your Subversion repository. Each project have a recognizable project root in the repository, a directory under which all of the versioned information for that project—and only that project—lives.To create a project under our repository use the following command:

[shell]$ svn mkdir myproject1 svn://localhost:/usr/local/repo[/shell]

Most projects have a recognizable “main line”, or trunk, of development; some branches, which are divergent copies of development lines; and some tags, which are named, stable snapshots of a particular line of development.

Creating trunk,branches and tags

we suggest that each project root contain a trunk subdirectory for the main development line, a branches subdirectory in which specific branches (or collections of branches) will be created, and a tags subdirectory in which specific tags (or collections of tags) will be created.

[shell]$ svn mkdir svn://localhost:/usr/local/repo/myproject1/trunk -m “trunk created”
$ svn mkdir svn://localhost:/usr/local/repo/myproject1/branches -m “branches created”
$ svn mkdir svn://localhost:/usr/local/repo/myproject1/tags -m “tags created”[/shell]

Here “-m” option is using to add the label. Commonly the label is related to the changes or bug fixes which we made in that commit. It is very important to recognize that particular change which we made in each commit. In the above commands we are creating directories, so we gave the label as “trunk created”.

Then we can list the myproject as showing below:

[shell]$ svn list svn://localhost:/usr/local/repo/myproject1[/shell]

Output:

[shell]trunk/
branches/
tags/[/shell]

We can import our project code to the repository like:

[shell]$ svn import mycode svn://localhost:/usr/local/repo/myproject1/trunk -m “Initial Import”[/shell]

Output:

[shell]Adding mycode/file1
Adding mycode/file2
Adding mycode/file3
Adding mycode/file4

Committed revision 1.[/shell]

Checking out the code

Most of the time, you will start using a Subversion repository by performing a checkout of your project. Checking out a directory from a repository creates a working copy of that directory on your local machine. We can check out the repository by using the following command:

[shell]$ svn checkout svn://localhost:/usr/local/repo/myproject/trunk[/shell]

Output:

[shell]A trunk/file1
A trunk/file2
A trunk/file3
A trunk/file4

Checked out revision 1.[/shell]

Subversion chose to create a working copy in a directory named for the final component of the checkout URL. This occurs only as a convenience to the user when the checkout URL is the only bit of information provided to the svn checkout command. Subversion’s command-line client gives you additional flexibility, though, allowing you to optionally specify the local directory name that Subversion should use for the working copy it creates.

[shell]$ svn checkout svn://localhost:/usr/local/repo/myproject/trunk code[/shell]

Output:

[shell]A code/file1
A code/file2
A code/file3
A code/file4

Checked out revision 1.[/shell]

If there are more developers working in our project, it is better to use the branches. This allows you to save your not-yet-completed work frequently without interfering with others’ changes and while still selectively sharing information with your collaborators.

To Create branches

Creating a branch is very simple. You make a copy of your project tree in the repository using the svn copy command. Since your project’s source code is rooted in the /repo/myproject/trunk directory, it’s that directory that you’ll copy. You can give the branch name as you wish. Here we are using names of the users.

[shell]$ svn copy svn://localhost:/usr/local/repo/myproject/trunk svn://localhost:/usr/local/repo/myproject/branches/sam -m “branch 1.0.0″[/shell]

Output:

[shell] Committed revision 2[/shell]

This command causes a near-instantaneous commit in the repository, creating a new directory in revision 2. The new directory is a copy of /repo/myproject/trunk. So each user can copy the /repo/myproject/trunk as their own branch and checkout it like

[shell]$ svn checkout svn://localhost:/usr/local/repo/myproject/branches/sam mycode-sam[/shell]

Output:

[shell]A    mycode-sam/file1
A    mycode-sam/file2
A    mycode-sam/file3
A    mycode-sam/file4

Checked out revision 2.[/shell]

There’s nothing special about this working copy; it simply mirrors a different directory in the repository. When you commit changes, however, the other users won’t see them when they updates, because they are using their own working copy. Whenever someone needs to make a long-running change that is likely to disrupt the trunk, a standard procedure is to create a private branch and commit changes there until all the work is complete. Subversion gives you the ability to selectively “copy” changes between branches. And when you’re completely finished with your branch, your entire set of branch changes can be copied to the staging branch and checkout this to the staging server for testing. If the code is ready to move to the production server then merge into the production branch and then to production server. If we are doing this first time there is no staging and production branches, so we need to copy the trunk as staging and production branches. After that we have to move the changes in the user branches.

Merging

In Subversion terminology, the general act of replicating changes from one branch to another is called merging, and it is performed using the command svn merge.

cd "directory you merge into"

[shell]$ svn switch svn://localhost:/usr/local/repo/myproject/staging
$ svn up
$ svn merge -rAA:BB svn://localhost:/usr/local/repo/myproject/branches/sam
$ svn ci -m ‘Merged branch sam into staging'[/shell]

Where,

AA = the revision number of the staging branch

BB = the revision number of the user branch (sam)

Now the staging branch is updated with the user branch, and we can test this code by checking out to the staging server. For this we need to login to the staging server and checkout the code in the correct location.

[shell]$ svn co svn://localhost:/usr/local/repo/myproject/staging /path/to/code[/shell]

Now the code in the staging branch is checked out to the staging server. We have to test this and if all are fine then merge the staging branch to the production branch and then checkout it to the production server as we done with the staging. Also update the trunk with the latest code in the production by merging it with the production branch, because we need to create branches from trunk for the next release. once we have merged the trunk we can retire from the user branch.

Taging

Now the trunk is same as the branch “production” repository. Then we have to tag this for the future reference. The tag is used to save the releases which are going to the production. So we can easily track the revisions which we updated to the production.

[shell]$ svn copy svn://localhost:/usr/local/repo/myproject/trunk svn://localhost:/usr/local/repo/myproject/tags/first-release[/shell]

You can give the tag name as you wish. Here we gave the tag name as first-release. This is the flow of branching, you have to repeat these process on the future releases too. Because of using the tag we can easily track the releases. So if any issue occurs it is easy to revert the production.

Some useful commands in svn are given below

svn add – To add files and directories to the repository

svn mkdir – Create a directory in the repository

svn copy – Copy from one location to other

svn move – To rename

svn delete – Delete from the repository

svn ci – Commit the changes in the working copy

svn diff – Shows line-level details of a particular change

svn log – Shows you broad information: log messages with date and author information attached to revisions and which paths changed in each revision

svn cat – Retrieves a file as it existed in a particular revision number and displays it on your screen

svn list – Displays the files in a directory for any given revision

svn log – Examine merge-sensitive history

Subversion has commands to help you maintain parallel branches of your files and directories. It allows you to create branches by copying your data, and remembers that the copies are related to one another. The main advantage of branching is that the users can work independently with their own repository and so it will not affect the other users and also easy to maintain.

From CAP, Puppet Now Chef, Evolution of Configuration Management Tools

Tuesday, May 8th, 2012 | Posted in Cloud computing

CHEF, PUPPET & CAPISTRANO are used basically for two purposes :

Application Deployment is all of the activities that make a software system available for use.

Configuration Management is software configuration management is the task of tracking and controlling changes in the software. Configuration management practices include revision control and the establishment of baselines.

Let me enlighten on how we evolved from the beginning when we were using tools like ssh, scp to the point where we began to abstract and began to equip our-self with these sophisticated yet simple to use tools. Earlier the following tools like

ssh which is used as a configuration management solution for admins.
scp act as a secure channel for application deployment.

The need for any other tools was out of question until things got complicated!!!

HISTORY

Earlier an Application Deployment was just a few steps away such as

scp app to production box
restart server (optional)
profit

And these software refreshing/updates were done

Manual (ssh)
with shell scripts living on the servers
or not done at all

CAPISTRANO
(Introduced by Jamis Buck, written in Ruby, initially for Rails project)

Capistrano is a developer tool for deploying web applications. It is typically installed on a workstation, and used to deploy code from your source code management (SCM) to one, or more servers.In its simplest form, Capistrano allows you to copy code from your source control repository (SVN or Git) to your server via SSH, and perform pre & post-deploy functions like restarting a webserver, busting cache, renaming files, running database migrations and so on.

Nice things cap introduced :

Automate deploys with one set of files
The files don’t have to live on the production server
The language (Ruby) allows some abstraction

Now application deployment step can be coded and tested like rest of the project. It has also become the de facto way to deploy the Ruby on Rails applications. It has also had tools like webistrano build on top of it to provide a graphical interface to the command line tool.

Drawback : The tool seems to be widely used but not well supported.

PUPPET

(Written in Ruby and evolved from cfengine)

Luke Kanies came up with the idea for Puppet in 2003 after getting fed up with existing server-management software in his career as a systems administrator. In 2005 he quit his job at BladeLogic, a maker of data-center management software, and spent the next 10 months writing code to automate the dozens of steps required to set up a server with the right software, storage space, and network configurations. The result: scores of templates for different kinds of servers, which let systems administrators become, in Kanies’s metaphor, puppet masters, pulling on strings to give computers particular personalities and behaviors. He formed Puppet Labs to begin consulting for some of the thousands of companies using the software—the list includes Google, Zynga, and Twitter etc

Puppet is typically used in a client server formation, with all your clients talking to one or more servers. Each client contacts the servers periodically (every half an hour by default), downloads the latest configuration and makes sure it is sync with that configuration.

The Server in Puppet is called Puppet Master.
Puppet Manifests contains all the configuration details which are declarative as opposed to imperative.

The DSL is not Ruby as you are not writing scripts you are writing definitions, Install order is determined through dependencies.
The Puppet Master is idempotent which will make sure the client machines match the definitions.This is good as you can implement changes across machines automatically just by updating the manifest in the Puppet Master.

CHEF
(written in ruby evolved from puppet)

CHEF is an open source configuration management tool using pure-Ruby, the chef domain specific language for writing system configuration related stuff (recipes and cookbook)
CHEF brings a new feel with its interesting naming conventions relating to cookery like Cookbooks (they contain codes for a software package installation and configuration in the form of Recipes), Knife (API tool), Databags (act like global variables) etc

Chef Server – deployment scripts called Cookbooks and Recipes, configuration instructions called Nodes, security details etc. The clients in the chef infrastructure are called Nodes. Chef recipes are imperative as opposed to declarative. The DSL is extended Ruby so you can write scripts as well as definitions. Install order is script order NO dependency checking.

CHEF & PUPPET

Chef and Puppet automatically set up and tweak the operating systems and programs that run in massive data centers and the new-age “cloud” services, designed to replace massive data centers.

Chef Recipes is more programmer friendly as it is easily understood by a developer unlike a Puppet Manifest.

And when it comes to features in comparison to puppet, chef is rather more intriguing .
For example “Chef’s ability to search an environment and use that information at run time is very appealing.

Knife is Chef’s powerful command line interface. Knife allows you to interact with your entire infrastructure and Chef code base. Use knife to bootstrap a server, build the scaffolding for a new cookbook, or apply a role to a set of nodes in your environment. You can use knife ssh to execute commands on any number of nodes in your environment. knife ssh + search is a very powerful combination.

The part of defining dependencies in Puppet was overly verbose and cumbersome. With Chef, order matters and dependencies would be met if we specified them in the proper order.

We can deploy additional software applications on virtual machine instances without dealing with the overhead of doing everything manually,” Stowe explains. “We can do it with code — recipes that define how various applications and libraries are deployed and configured.” According to Stowe, creating and deploying a new software image now takes minutes or hours rather than hours or weeks. They call this technique DevOps because it applies traditional programming techniques to system administration tasks. “It’s just treating IT operations as a software development problem, – Stowe, CEO of Cycle Computing, a Greenwich, Connecticut-based start-up that uses Chef to manage the software underpinning the online “supercomputing” service it offers to big businesses and academic outfits. “Before this, there were ways of configuring servers and managing them, but DevOps has gotten it right.”

Lets CATEGORIZE

Let me help you to know onto which buckets does the above tools fell into and other similar tools…

App Deploy	Capistrano, ControlTier, Fabric, Fun, mCollective
SysConfig	Chef, Puppet, cfengine, Smart Frog, Bcfg2
Cloud/VM	Xen, Ixc, openVZ, Eucalyptus, KVM
OS Install	Kickstart, Jumpstart, Cobbler, OpenQRM, xCAT

DevOps on EC2 using Capistrano

Thursday, April 5th, 2012 | Posted in Cloud computing

DevOps is the combination of development and operation processes. Cloud with your DevOps offers some fantastic properties. The ability to leverage all the advancements made in software development around repeatability and testability with your infrastructure. The ability to scale up as need be real time and among other things being able to harness the power of self healing systems.

The process piece of devops is about taking the principles behind Agile to the entire continuous software development process. The obvious step is bringing Agile ideas to the operations team, which is sorely needed. Traditionally in the enterprise, the application development team is in charge of gathering business requirements for a software program and writing code. The development team tests their program in an isolated development environment for quality assurance which is later handed over to the operations team. The operations team is tasked with deploying and maintaining the program. The problem with this paradigm is that when the two teams work separately, the development team may not be aware of operational roadblocks that prevent the program from working as anticipated.

Capistrano

Capistrano is a developer tool for running scripts on multiple servers, mainly used for deploying web applications on to the servers. It is typically installed on a workstation, and used to deploy code from your source code management to one, or more servers. Capistrano is originally called “SwitchTower”, the name was changed to Capistrano in March 2006 because of some trademark conflict. It is a time saving command line tool and it is very useful to AWS/EC2 servers because we can deploy the code to 1000’s of aws servers by using a single command. For the security of servers we are commonly using aws ssh key authentication. In capistrano we use this aws ssh key to deploy the web applications to the aws servers.

In Cloud Computing, deploying applications to production/live servers is always a delicate task. The whole process needs to be quick to minimize downtime. Automating the deployment process helps running repetitive tasks minimizing the possibility human error. It is also a good idea to have a proven and easy way to rollback to a previous version if something goes wrong.

It is a standalone utility that can also integrate nicely with Rails. We simply provide Capistrano with a deployment “recipe” or “formula” that describes our various servers and their roles. It is a single-command deployment. it even allows us to roll a bad version out of production and it revert back to the previous release very easily.

Capistrano Deployment

The main functionality of the Capistrano is to Deploy the rails application which we have already developed and we are using the “SVN” or “GIT” to manage the code. It will transfer all the files of our rails application which we have developed in our local host to aws servers directly by simply executing a simple command in our command prompt.

Steps to deploy a rails application

[shell]gem install capistrano[/shell]

Now,we need to capistranize our rails application using the following commands

[shell]capify .[/shell]

It will create two files

[shell]

config/deploy.rb
capfile .

[/shell]

How to set up deploy.rb file

[shell]

require ‘rubygems’
require ‘activesupport’
set :application, “<application name>”
set :scm_username/ “<username>”
set :use_sudo, false
set :repository, “http://#{scm_username}@www.example.com/svn/trunk”
set :deploy_to, “/var/www/#{application}”
set :deploy_via, :checkout
set :scm, :git
set :user, “root”
role :app, “<domain_name>”
role :web, “<domain_name>”
rold :db, “<domain_name>”, :primary => true
namespace :migrations do
desc “Run the Migrations”
task :up, :roles => :app do
run “cd #{current_path}; rake db:auto:migrate;”
end
task :down, :roles => :app do
run “cd #{current_path}; rake db:drop; rake
db:create”
end
end

[/shell]

where,

‘scm_username’ is your user name
‘application’ is an arbitrary name you create to identify your application on the server
‘use_sudo’ specifies to capistrano that it does not need to append ‘sudo’ before all the commands it will run
‘repository’ identifies where your subversion repository is located

If we aren’t deploying to server’s default path, we need to specify the actual location by using the ‘deploy_to’ variable as given below

[shell]
set :deploy_to, “/var/www/#{application}”
set :deploy_via, :checkout
[/shell]

If we are using the git to manage our source code, specify the SCM by using the ‘scm’ variable as given below

[shell]
set :scm, :git
set :user, “root”
role :app, “<domain_name>”
role :web, “<domain_name>”
rold :db, “<domain_name>”, :primary => true
[/shell]

Since most rails users will have the same domain name for their web,app and database, we can simply use our domain variable we set earlier.

[shell]
namespace :migrations do
desc “Run the Migrations”
task :up, :roles => :app do
run “cd #{current_path}; rake db:auto:migrate;”
end
task :down, :roles => :app do
run “cd #{current_path}; rake db:drop; rake
db:create”
end
end

[/shell]

After completion of our settings in the deploy.rb file, we need to commit the application by using “svn commit” command if we use svn.

Then we need to run the following command:

[shell]

cap deploy:setup

[/shell]

It is used to create the directory structure in server.

[shell]cap deploy:check[/shell]

It checks all the dependencies/things like directory permission and necessary utilities to deploy the application by using capistrano.

If everything is successful, you should see a message like:
“You appear to have all necessary dependencies installed”
And finally deploy the application by using the following command:

[shell]cap deploy[/shell]

Command finished successfully

To Clean up the releases directory, leaving the five most recent releases

[shell]Cap cleanup[/shell]

Prints the difference between what was last deployed, and what is currently in our repository

[shell]cap diff_from_last_deploy[/shell]

To Rolls back to the previously deployed version

[shell]cap deploy:rollback:code[/shell]

Amazon’s EC2 cloud cuts the requisition time of the order & delivery stages down to just minutes. This is already a 75% savings in deployment time! But, without automated deployment, you’ll still need a week to get your application installed.

DevOPS on AWS Cloud using Opscode Chef

Friday, January 13th, 2012 | Posted in Cloud computing

‘Rule the Cloud‘ with Chef
Chef is Infrastructure as Code,an API for your entire infrastructure. Assuming that you are well versed with cloud if not still you should have atleast heard of cloud computing and it is still an evolving paradigm and Cloud computing companies are the newest buzz in the IT sector. Chef is used in conjunction with cloud from cloud providers say Amazon’s AWS. If a software thats being developed is a mix of technology which is interdependent and works in perfect harmony then why not the people behind it, this thought has led to the emergence of a new cultral trend called DevOPS. Now if you setup a number of instances on the cloud then whats next – new instances on cloud are just like bare metal server and the configuration has to be done from scratch and it would be feasible to do so manually for couple of them what if the count just got bigger say 100 live instances with different unix distros, although a script could be written but still it will not suffice, in the long run considering management too. Here the CHEF comes into play

“chef is sysadmin robot performing configuration tasks automatically and much more quickly than a single admin could ever hope to” – Jesse Robbins, Opscode CEO.

CHEF is an open source configuration management tool using pure-Ruby,the chef domain specific language for writting system configuration related stuff (recipes and cookbook)

CHEF brings a new feel with its interesting naming conventions relating to cookery like Cookbooks (they contain codes for a software package installation and configuration in the form of Recipes), Knife (API tool), Databags (act like global variables) etc

Although there are many configuration management tools prevailing in the industry CHEF was able to secure its position in the race.

“CHEF take a step farther passes puppet and cfengine — like doing “LIVE SEARCH” within configuration management like loadbalancer can call out to get a list of the app servers you need to balance or an applicaton server can call out, get a reference to the master database server etc …..the centralised chef server is indexing all the information about your infrasturctre so that you could search in the command line using knife you know in real time so that application could lever that data..” by Seth Chisamore from the OPSCODE.

A techonology peak that isnt fluffy – Cloud
For those folks new to cloud- Its a whole bunch of activites which began as an innovation, recently given out as products and now they have become so widespread and so feature complete that they became suitable for utility services.

So if you dont want cloud in your business its like saying you dont want to use the electricity instead you built your own generator and use it according to your need. Now what do we loose if we continue with that is the competitive edge ie you get the pressure to keep your stuff upgraded inorder to find your place relative to the others in the ecosystem.

Cloud is API oriented, everything you see in cloud is ulitmately programmable.

Virtualization is the foundation of Cloud but virtualization is not Cloud by itself. It certainly enables many of the things we talk about when we talk Cloud but it is not necessary sufficient to be a cloud. Google app engine is a cloud that does not incorporate virtualization. One of the reasons that virtualization is great is because you can automate the procurement of new boxes.

A Culture thats on path to revolutionize IT – DevOPS
Devops is something that orginated in webshops predominantly and it require a kind of tools thats really not available except for home grown tools which the big webshops built over and over again. So the organisation who wanted to use devops started using the tools that enable this transition as most organisations depends on web as a source of revenue in a variety of different ways, even the enterprise desire to be as agile as the webshops. This has begun a revolution from the website permeate into the enterprise base more frequently.

Considering a real life example for Devops say facebook, the most popular social networking site here the developers/QA/operations – there is alot of communications, cross talk happening between them like the developers has to write codes, QA who has to make sure the good code goes out, the operations team has to make sure its up and running. Finally all of these has to be in records which altogether seems to be inefficient, this led to the evolving of the entire system. According to the conventional practices where the developers writes the code and throws it off to the testing. Once the testing is done then it moves to the operations etc. Contrary to that the developers , operations team are all involved in the entire lifecycle of the project as a team. This creates a symbiotic relationship. Now the operations people could understand what the engineers needs the most and the developers are able to see the value that operation people brings as they make architecture decisions.

Cloud with your DevOps offers some fantastic properties. The ability to leverage all the advancements made in software development around repeatability and testability with your infrastructure. The ability to scale up as need be real time (autoscaling) and among other things being able to harness the power of self healing systems. DevOps better with Cloud.

Configuration management say CHEF is one of the most fundamental elements allowing DevOps in the cloud. It allows you to have different VMs that have just enough OS that they can be provisioned, automatically through virtualization, and then through configuration management can be assigned to a distinct purpose within the cloud. The CM system handles turning the lightly provisioned VM into the type of server that it is intended to be.

DevOps & Chef
DevOps is nonthing but a cultural movement where everybody say the developers, QA, Operations, Testing etc get along. A project group formation with a mixed skillset that blurs the line between say a developer and sysadmin. This helps the project to meet its deadlines
and avoid unexpected situations. Cloud computing act like a catalyst to this movement. Thereby the CHEF also hops in.

Chef forms a critical layer in the Devops stack.Thanks to the concept of infrastructure as code and virtualization, we can define and build our infrastructure based on text files. Those files can be version-controlled and tested like regular code. The artifact (ami, image), can then be deployed on an infrastructure. The following image gives you an overview on the similarities.

Inadvertently the issues like “what if the application” or “what if the infrasturcture” are resolved, the fact is that application is the infrastructure and infrastructure is the application and we are here to enable business, also it helped bring peoples in the team into better alignment across the board.

Chef configuration is written in pure ruby.

Devops == Ruby

For those who think Bash is enough as a scripting language – Bash becomes a liability not an asset once your script exceeds 100 lines and a total nightmare if you need to parse or output HTML, CSV, XML, JSON, etc. A significant point to be noted is that Chef uses Ruby in its recipes unlike puppet where it uses its own configuration language that is based on Ruby although chef is heavily inspired from puppet. If you chose chef then you are effectively scripting your infrastructure with ruby.

Though Chef was only released on January 15th , 2009 it has gotten rapid adoption and gained a large number of contributors. According to the Opscode wiki there are 545 approved contributors to Opscode projects and 106 companies. Beyond that the #chef IRC channel is typically attended by over 100 users and Opscode staff, signs of a healthy, growing open source community.

Springsource division of VMware have signed on to contribute to the project. They are even being very public about it as seen in this endorsement:

“We are excited about the open source contributions the Springsource Division of VMware has made to Opscode Chef.” said Javier Soltero, CTO of Springsource Management Products at VMware. “Chef is an important tool for automating infrastructure management and we look forward to its continued growth and success.”

Moreover on my experience of using chef I really enjoyed the quick response I could get from the Opscode Support Team for all my queries and they had always being able to direct me towards a solution.

Automation Using Chef to create an Instance on Amazon Cloud Service Provider with Apache webserver configured in it.

Memo
chef-workstation – is the place where we customize our cookbooks and maintains the chef-repo
chef node – is the management node that we create using chef, it configures itself based on its runlist and downloaded cookbooks

The really cool thing with Chef is that you can rerun cookbooks against a node and it will not do anything it has already done i.e it will not change the end result on the target node as defined by the recipes being run against it. So you will always get the same outcome no matter what state the node and actions will not be taken if already done (and conversely run if detected it has not been run). When reading about Chef you will see this described as being idempotent (There I’ve saved you looking it up).

Prerequisites – an AWS account, EC2 API configured, OS – Ubuntu.

1. Sign up an account at http://www.opscode.com/hosted-chef/# , Here we use the OHC (opscode hosted chef) where we get to create upto 5 nodes for free!!

2.Verify your opscode account.

3.Download the files

Create an organization in the Console page at www.manage.opscode.com, and then download the following files:

Your Organization validation key. This is used to automatically register new Chef Clients (like servers you manage).
The Knife configuration file.
Your User key. This is used to authenticate your user with Hosted Chef.
Edit knife.rb to add aws access key and secret access key
knife[:aws_access_key_id] = “Your AWS Access Key”
knife[:aws_secret_access_key] = “Your AWS Secret Access Key”

At this stage I have a chef ready user environment, an OpsCode organisation set up and now I want to start by spinning up an ec2 instance. I will not be going into any depth regarding the ec2 specifics as that would make this post far too long.

4.Setting Up chef-Workstation

Install Ruby and Development Tools

#sudo apt-get update
#sudo apt-get install ruby ruby-dev libopenssl-ruby rdoc ri irb build-essential wget ssl-cert git-core
#sudo gem update –system

Install RubyGems

#cd /tmp
#wget http://production.cf.rubygems.org/rubygems/rubygems-1.8.10.tgz
#tar zxf rubygems-1.8.10.tgz
#cd rubygems-1.8.10
#sudo ruby setup.rb –no-format-executable

Install Chef

#sudo gem install chef

5.To verify chef installation

#chef-client -v

6.Build the chef repository

#cd ~
#git clone https://github.com/opscode/chef-repo.git

Knife reads configuration files in .chef. so we need to create those as well

#mkdir -p ~/chef-repo/.chef

Copy the keys and knife configuration you downloaded earlier into this directory:

#cp USERNAME.pem ~/chef-repo/.chef
#cp ORGANIZATION-validator.pem ~/chef-repo/.chef
#cp knife.rb ~/chef-repo/.chef

Run the following command to confirm knife is working with the Hosted Chef API.

#cd ~/chef-repo
#knife client list

output : “ORGANIZATION-validator”

7.Now i need to download the apache2 cookbook on to my workstation, customize if required and then upload it to my account on the opscode platform

#knife cookbook site install apache2

this will notify git and also pulls down the desired cookbook

8.Upload the cookbook using the following command

#knife cookbook upload apache2

9.Enter the following command, sit back and enjoy the show!!!

#knife ec2 server create -G default -I ami-1212ef7b -f m1.small -S <aws ssh key id> -i <ssh identity file> -x root -r ‘recipe[apache2]’

Before proceeding it would probably be a good idea to take time out and read the Opscode Chef Recipe wiki which has a nice clear explanation on cookbook name spaces. Also remind yourself of the components that make up a cookbook it’s worth noting that recipes manage resources and those resources will be executed in the order they occur.

Migrate to Cloud

Installation of MongoDB and its performance test

Why MongoDB?

Mongo data model

Install MongoDB on Ubuntu 10.04

Configure Package Management System (APT)

Install Packages

Configure MongoDB

Controlling MongoDB

Starting MongoDB

Stopping MongoDB

Restarting MongoDB

Controlling `mongos`

Mongodb performance test :-

Managing Multiple Branches in Subversion

Creating the Repository

Importing Projects

Creating trunk,branches and tags

Checking out the code

To Create branches

Merging

Taging

Case studies

Cloud based Voip solution enjoying 50% increase in performance

PostgreSQL migration to Amazon EC2

Cpanel migration to the cloud

Categories

Archives

Recent posts from the blog

Why MongoDB?

Mongo data model

Install MongoDB on Ubuntu 10.04

Configure Package Management System (APT)

Install Packages

Configure MongoDB

Controlling MongoDB

Starting MongoDB

Stopping MongoDB

Restarting MongoDB

Controlling mongos

Mongodb performance test :-

How does mod_pagespeed speed up web-sites?

OpenStack : The Mission

There are 5 main service families under OpenStack

Open Stack Compute Infrastructure (Nova)

Functions and Features:

OpenStack Storage Infrastructure (Swift)

Functions and Features

OpenStack Imaging Service (Glance)

Functions and Features

OpenStack Identity Service (Keystone)

Openstack Administrative Web-Interface (Horizon)

Features the following:

INSTALLATING OPEN STACK

Install the Base Scripts

Installing MySql

Installing Keystone

Installing Glance

Installing Nova

Finish Installing Nova

Installing Horizon

Creating the Repository

Importing Projects

Creating trunk,branches and tags

Checking out the code

To Create branches

Merging

Taging

Case studies

Categories

Archives

Controlling `mongos`