• Call: +1 (858) 429-9131

Posts Tagged ‘Server & Database Software’

It’s Phab! That makes your life easier

We have been using plenty of different tools for tracking bugs/product management/project management/to do lists/code review; such as ClearCase, ClearQuest, Bugzilla, Github, Asana, Pivotal Tracker, Google Drive etc. We found Phabricator as a “Too Good To Be True” software engineering web application platform originally developed at Facebook. It has code review, wiki, repository browsing,tickets and a lot more to make Phab more fabulous.

Phabricator is an open source collaboration of web applications which help software companies to build better software. It is a suite of applications. Following are the most important tools in phabricator :
Maniphest – Bug tracker/task management tracker
Diffusion- source code browser
Differential – code review tool that allows developers to easily submit reviews to one another via command line tool when they check in code using Git or Subversion
Phriction – wiki tool

How to setup and configure the code review and project management tool – Phabricator

Installation

Server – 4GB Digital ocean droplet
OS – Ubuntu 14.04

1. Install dependencies

apt-get install mysql-server apache2 dpkg-dev php5 php5-mysql php5-gd php5-dev php5-curl php-apc php5-cli php5-json

2. Get code

#cd /var/www/codereview

git clone https://github.com/phacility/libphutil.git

git clone https://github.com/phacility/arcanist.git

git clone https://github.com/phacility/arcanist.git

3. Configure virtual host entry

#add below lines

#######################################################################

DocumentRoot /var/www/codereview/webroot
RewriteEngine on
RewriteRule ^/rsrc/(.*) – [L,QSA]
RewriteRule ^/favicon.ico – [L,QSA]
RewriteRule ^(.*)$ /index.php?__path__=$1 [B,L,QSA]
Order allow,deny
allow from all
#######################################################################
4. Enable the virtual host entry for phabricator.

# a2ensite phabricator.conf
# service apache2 reload

5. Configure the MySQL database configuration for phabricator

– create database
# /var/www/codereview/phabricator/bin/config set mysql.user mysql_username
# /var/www/codereview/phabricator/bin/config get mysql.pass mysql_password
# /var/www/codereview/phabricator/bin/config get mysql.host mysql_host
# /var/www/codereview/phabricator/bin/config storage upgrade
-tweak mysql

Open /etc/mysql/my.cnf and add the following line under [mysqld] section:

sql-mode = STRICT_ALL_TABLES

#service mysql restart

Set the Base URI of Phabricator install

# /var/www/codereview/phabricator/bin/config set phabricator.base-uri

(eg: phabricator.your-domain.com)

Configure Outbound Email – External SMTP (Google Apps)

Set the following configuration keys using /var/www/codereview/phabricator/bin/config set value

– metamta.mail-adapter -> PhabricatorMailImplementationPHPMailerAdapter
– phpmailer.mailer -> smtp
– phpmailer.smtp-host -> smtp.gmail.com
– phpmailer.smtp-port -> 465
– phpmailer.smtp-user -> Your Google apps mail id
– phpmailer.smtp-password -> set to your password used for authentication
– phpmailer.smtp-protocol -> ssl

Start the phabricator daemons

You can start all the phabricator deamons using the script
# /var/www/codereview/phabricator/bin/phd start
To start daemons at the boot time, add this entry to the file /etc/rc.local

/var/www/codereview/phabricator/bin/phd start

Diffusion repository hosting with git

1. Install git

#apt-get install git

2. Create a local repository directory:

#mkdir -p /data/repo

3. Edit the repository.default-local-path key to the new local repository directory.

Go to the Config -> Repositories -> repository.default-local-path

4. Configure System user accounts

Phabricator uses as many as three user accounts. These are system user accounts on the machine Phabricator runs on, not Phabricator user accounts.

* daemon-user – The user the daemons run as

We will configure the root user to run the daemons

* www-user – The user the web server run as

We will use www-data to be the web user

* vcs-user – The user that users will connect over SSH as

We will configure git user to the vcs-user

To enable SSH access to repositories, edit /etc/sudoers file using visudo to contain:

#includedir /etc/sudoers.d
git ALL=(root) SETENV: NOPASSWD: /usr/bin/git-upload-pack, /usr/bin/git-receive-pack, /usr/bin/git

Since we are going to enable SSH access to the repository, ensure the following holds good.

– Open /etc/shadow and find the line for vcs-user, git.

The second field (which is the password field) must not be set to !!. This value will prevent login. If it is set to !!, edit it and set it to NP (“no password”) instead.

– Open /etc/passwd and find the line for the vcs-user, git.
The last field (which is the login shell) must be set to a real shell. If it is set to something like /bin/false, then sshd will not be able to execute commands. Instead, you should set it to a real shell, like /bin/sh.

– Use phd.user as our daemon user;
# /var/www/phab/phabricator/bin/config phd.user root
# /var/www/phab/phabricator/bin/config set diffusion.ssh-user git

5. Configuring SSH

We will move the normal sshd daemon to another port, say 222. We will use this port to get a normal login shell. We will run highly restrictive sshd on port 22 managed by Phabricator.

Move Normal SSHD

– make a backup of sshd_config before making any changes.

#cp /etc/ssh/sshd_config /etc/ssh/sshd_config.backup

– Update /etc/ssh/sshd_config, change the port to some othert port like 222.

Port 222

– Restart sshd and verify that you are able to connect to the new port

ssh -p 222 user@host

Configure and start Phabricator SSHD

We now configure and start a second SSHD instance which will run on port 22. This instance will use special locked down configuration that uses Phabricator to handle the authentication and command execution.

– Create a phabricator-ssh-hook.sh file

– Create a sshd_phabricator config file

– Start a copy of sshd using the new configuration

Create phabricator-ssh-hook.sh: Copy the template in phabricator/resources/sshd/ phabricator-ssh-hook.sh to somewhere like /usr/lib/phabricator-ssh-hook.sh and edit it to have the correct settings

##############################################################

#!/bin/sh

# NOTE: Replace this with the username that you expect users to connect with.
VCSUSER=”git”

# NOTE: Replace this with the path to your Phabricator directory.
ROOT=”/var/www/codereview/phabricator”

if [ “$1” != “$VCSUSER” ];
then
exit 1
fi

exec “$ROOT/bin/ssh-auth” $@
##############################################################

Make it owned by root and restrict editing;

#sudo chown root /usr/lib/phabricator-ssh-hook.sh
#chmod 755 /usr/lib/phabricator-ssh-hook.sh

Create sshd_config for Phabricator: Copy the template in /phabricator/sshd/sshd_config.phabricator.example to somewhere like /etc/ssh/sshd_config.phabricator

Start Phabricator SSHD

#sudo /usr/sbin/sshd -f /etc/ssh/sshd_config.phabricator

Note:-
Add this entry to the /etc/rc.local to start the daemon on startup.

If you did everything correctly, you should be able to run this;

#echo {} | ssh git@phabricator.your-company.com conduit conduit.ping

and get a response like this;

{“result”:”phab-server”,”error_code”:null,”error_info”:null}

You should now be able to access your instance over ssh on port 222 for normal login and administrative purposes. Phabricator SSHD runs on port 22 to handle authentication and command execution.

6. To create a git repository

Go to Diffusion -> New Repository -> Create a New Hosted Repository

Upgrade Phabricator

Since phabricator is under development, you should update frequently. To update phabricator:

– Stop the web server
– Run git pull in libphutil/, arcanist/, and phabricator.
– Run phabricator/bin/storage upgrade.
– Restart the web server.
Also you can use a script similar to this one to automate the process:
http://www.phabricator.com/rsrc/install/update_phabricator.sh

Installation of MongoDB and its performance test

Why MongoDB?

  • Document-oriented
    • Documents (objects) map nicely to programming language data types
    • Embedded documents and arrays reduce need for joins
    • Dynamically-typed (schemaless) for easy schema evolution
    • No joins and no multi-document transactions for high performance and easy scalability
  • High performance
    • No joins and embedding makes reads and writes fast
    • Indexes including indexing of keys from embedded documents and arrays
    • Optional streaming writes (no acknowledgements)
  • High availability
    • Replicated servers with automatic master failover
  • Easy scalability
    • Automatic sharding (auto-partitioning of data across servers)
    • Reads and writes are distributed over shards
    • No joins or multi-document transactions make distributed queries easy and fast
    • Eventually-consistent reads can be distributed over replicated servers

Mongo data model

  • A Mongo system (see deployment above) holds a set of databases
  • A database holds a set of collections
  • A collection holds a set of documents
  • A document is a set of fields
  • A field is a key-value pair
  • A key is a name (string)
  • A value is a
    • basic type like string, integer, float, timestamp, binary, etc.,
    • a document, or
    • an array of value

    Mongo query language

  • To retrieve certain documents from a db collection, you supply a query document containing the fields the desired documents should match. For example, {name: {first: 'John', last: 'Doe'}} will match all documents in the collection with name of John Doe. Likewise, {name.last: 'Doe'} will match all documents with last name of Doe. Also, {name.last: /^D/} will match all documents with last name starting with ‘D’ (regular expression match).
  • Queries will also match inside embedded arrays. For example, {keywords: 'storage'} will match all documents with ‘storage’ in its keywords array. Likewise, {keywords: {$in: ['storage', 'DBMS']}} will match all documents with ‘storage’ or ‘DBMS’ in its keywords array.
  • If you have lots of documents in a collection and you want to make a query fast then build an index for that query. For example, ensureIndex({name.last: 1}) or ensureIndex({keywords: 1}). Note, indexes occupy space and slow down updates a bit, so use them only when the tradeoff is worth it.

Install MongoDB on Ubuntu 10.04

Configure Package Management System (APT)

The Ubuntu package management tool (i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG Key:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

Create a /etc/apt/sources.list.d/10gen.list file and include the following line for the 10gen repository.

deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen

Now issue the following command to reload your repository:

sudo apt-get update

Install Packages

Issue the following command to install the latest stable version of MongoDB:

sudo apt-get install mongodb-10gen

When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.

Configure MongoDB

These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. You will find the control script is at /etc/init.d/mongodb.

This MongoDB instance will store its data files in the /var/lib/mongodb and its log files in /var/log/mongodb, and run using the mongodb user account.

Note

If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.

Controlling MongoDB

Starting MongoDB

You can start the mongod process by issuing the following command:

sudo service mongodb start

You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.

Stopping MongoDB

As needed, you may stop the mongod process by issuing the following command:

sudo service mongodb stop

Restarting MongoDB

You may restart the mongod process by issuing the following command:

sudo service mongodb restart

Controlling mongos

As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically do not run on the same systems where mongod runs. You can use the mongodb script referenced above to derive your own mongos control script.

Using MongoDB

Among the tools included with the MongoDB package, is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:

mongo
> show dbs (); —> To show your databases
> use <databasename> —-> To switch database
> db.createCollection(“collectionname”) —> To create collection
> db.collectionname.find(); —> To see the contents in the collection
> db.addUser(“theadmin”, “anadminpassword”) —> To create user and password

Mongodb performance test :-

To monitor database system we can use Mongotop

Mongotop tracks and reports the current read and write activity of a MongoDB instance.
Mongotop provides per-collection visibility into use.
Use mongotop to verify that activity and use match expectations.
Mongotop returns time values specified in milliseconds (ms.)
Mongotop only reports active namespaces or databases, depending on the –locks option.
If you don’t see a database or collection, it has received no recent activity.

By default mongotop connects to the MongoDB instance running on the localhost port 27017. However,mongotop can optionally connect to remote mongod instances

Next, we can use Mongostat

Mongostat captures and returns counters of database operations. Mongostat reports operations on a per-type (e.g. insert, query, update, delete, etc.) basis. This format makes it easy to understand the distribution of load on the server. Use  Mongostat to understand the distribution of operation types and to inform capacity planning.
The Mongostat utility provides a quick overview of the status of a currently running mongod or Mongos instance. Mongostat is functionally similar to the UNIX/Linux file system utility vmstat, but provides data regarding mongod and Mongos instances.

Use  db.serverStatus()
It provides an overview of the database process’s state.

Then REST interface

MongoDB provides a REST interface that exposes a diagnostic and monitoring information in a simple web page. Enable this by setting rest to true, and access this page via the local host interface using the port numbered 1000 more than that the database port. In default configurations the REST interface is accessible on 28017. For example, to access the REST interface on a locally running mongod instance: http://localhost:28017

These are a few basic tips on making your application better/faster/stronger without knowing anything about indexes or sharding.

Connecting

Connecting to the database is a (relatively) expensive operation. Try to minimize the number of times you connect and disconnect: use persistent connections or connection pooling (depending on your language).

there are some  side effects with the PHP connection code.

$connection = new Mongo ( );

$connection->connect( );

In this code it appears the user wants to create a new connection. However, under the hood the following is happening:

The constructor connects to the database.
connect( ) sees that you’re already connected, assumes you want to reset the connection.
Disconnects from the database.
Connects again.

The result is that you have doubled your execution time.

ObjectIds

ObjectIds seem to be uncomfortable, so they convert their ObjectIds into strings. The problem is, an ObjectId takes up 12 bytes but its string representation takes up 29 bytes (almost two and a half times bigger).

Numbers vs. Strings

MongoDB is type-sensitive and it’s important to use the correct type: numbers for numeric values and strings for strings.

If you have large numbers and you save them as strings (“1234567890″ instead of 1234567890), MongoDB may slow down as it strcmps the entire length of the number instead of doing a quicker numeric comparison. Also, “12″ is going to be sorted as less than “9″, because MongoDB will use string, not numeric, comparison on the values. This can lead to some errors.

Driver-specific
Find out if you’re driver is particularly weaknesses (or strengths). For instance, the Perl driver is one of the fastest drivers, but it is not good at decoding Date types (Perl’s DateTime objects take a long time to create).
MongoDB adopts a documented-oriented format, so it is more similar to RDBMS than a key-value or column oriented format.

MongoDB operates on a memory base and places high performance above data scalability.Mongo DB uses BSON for data storage

Mongo uses memory mapped files, which means that a lot of the memory reported by tools such as top may not actually represent RAM usage. Check mem[“resident”], which tells you how much RAM Mongo is actually using.

“mem” : {
    “resident” : 2,
    “virtual” : 2396,

    “supported” : true,
    “mapped” : 0
},

Backup

There are basically two approaches to backing up a Mongo database:

Mongodump and Mongorestore are the classic approach. Dumps the contents of the database to files. The backup is stored in the same format as Mongo uses internally, so is very efficient. But it’s not a point-in-time snapshot.
To get a point-in-time snapshot, shut the database down, copy the disk files (e.g. with cp) and then start mongod up again. Alternatively, rather than shutting mongod down before making your point-in-time snapshot, you could just stop it from accepting writes:

> db._adminCommand({fsync: 1, lock: 1})
{
        “info” : “now locked against writes, use db.$cmd.sys.unlock.findOne() to unlock”,

        “ok” : 1
}

To unlock the database again, you need to switch to the admin database and then unlock it

> use admin
switched to db admin
> db.$cmd.sys.unlock.findOne()
{ “ok” : 1, “info” : “unlock requested” }

Replication
Start your master and slave up like this:

$ mongod –master –oplogSize 500

$ mongod –slave –source localhost:27017 –port 3000 –dbpath /data/slave

When seeding a new slave server from master use the –fastsync option.

You can see what’s going on with these two commands:
> db.printReplicationInfo() # tells you how long your oplog will last
> db.printSlaveReplicationInfo() # tells you how far behind the slave is

If the slave isn’t keeping up,Check the mongo log for any recent errors. Try connecting with the mongo
console. Try running queries from the console to see if everything is working. Run the status commands
above to try and find out which database is taking up resources.
Timeout

Connection timeout in milliseconds. Defaults to 20000

Connection::query_timeout.

How many milliseconds to wait for a response from the server. Set to 30000 (30 seconds) by default. -1 waits forever (or until TCP times out, which is usually a long time).

Default pool

The default pool has a maximum of 10 connections per mongodb host. This value is controlled by the variable  “connectionsPerHost” within the class

MongoDB Server Connections

The MongoDB server has a property called “maxConns” that  is the max number of simultaneous connections. The
default number for maxConns is 80% of the available file descriptors for connections. One way to check the number of connections is by opening the mongo shell and executing:

>db.serverStatus() and in the previous mail I have send the screen shot of this.

The standard format of the MongoDB connection URI used to connect to a MongoDB database server.

mongodb://[username:password@]host1[:port1][,host2[:port2],…[,hostN[:portN]]][/[database][?options]]

Finding the Min and Max values in MongoDB

In MongoDB, the min() and max() functions work as limitors – essentially the same as “gte” (>=) and “lt” (<).

To find the highest (maximum) value in MongoDB, you can use this command;

db.thiscollection.find().sort({“thisfieldname”:-1}).limit(1)

This essentially sorts the data by the fieldname in decending and takes the first value.

The lowest (minimum) value can be determined in a similar way.

    db.thiscollection.find().sort({“thisfieldname”:1}).limit(1)

Memory Mapped Storage Engine :-

This is the current storage engine for MongoDB, and it uses memory-mapped files for all disk I/O.  Using this strategy, the operating system’s virtual memory manager is in charge of caching.  This has several implications:

There is no redundancy between file system cache and database cache: they are one and the same.
MongoDB can use all free memory on the server for cache space automatically without any configuration of a cache size.
Virtual memory size and resident size will appear to be very large for the mongod process.

This is benign: virtual memory space will be just larger thanthe size of the datafiles open and mapped; resident size will vary depending on the amount of memory not used by other processes on the machine.

This command shows the memory usage information :- db.serverStatus().mem

For example :-

> db.serverStatus().mem
{
    “bits” : 64,
    “resident” : 31,
    “virtual” : 146,
    “supported” : true,
    “mapped” : 0,
    “mappedWithJournal” : 0
}

We can verify there is no memory leak in the mongod process by comparing the mem.virtual and mem.mapped values (these values are in megabytes).  If you are running with journaling disabled, the difference should be relatively small compared to total RAM on the machine. If you are running with journaling enabled, compare mem.virtual to 2*mem.mapped.   Also watch the delta over time; if it is increasing consistently, that could indicate a leak.

Also we can use to check what percent of memory is being used for memory mapped files by the free command:

Here 2652mb of memory is being used to memory map files

root@manager-desktop:~# free -tm

             total       used       free     shared    buffers     cached
Mem:          3962       3602        359          0        411       2652

-/+ buffers/cache:        538       3423

Swap:        1491        52       1439

Total:        5454       3655   1799

Garbage collection handling :-

When we remove an object from MongoDB collection, the space it occupied is not automatically garbage collected and new records are only appended to the end of data files, making them grow bigger and bigger.MongoDB maintains lists of deleted blocks within the datafiles when objects or collections are deleted.  This space is reused by MongoDB but never freed to the operating system.

To shrink the amount of physical space used by the datafiles themselves, by reclaiming deleted blocks, we must rebuild the database by using  the command “db.repairDatabase( )” . repairDatabase copies all the database records to new files.

We will need enough free disk space to hold both the old and new database files while the repair is running, the repairDatabase  will take a long time to complete.Also rather than compacting an entire database,

you can compact just a single collection by using  “db.runCommand({compact:’collectionmname;})

This does not shrink any datafiles,however; it only defragments deleted space so that larger objects might reuse it.

The compact command will never delete or shrink database files, and in general requires extra space to do its work.

Thus, it is not a good option when you are running critically low on disk space.

Openstack Cloud Software

OpenStack : The Mission

“ To produce the ubiquitous Open Source Cloud Computing platform that will meet the needs of public and private cloud providers regardless of size, by being simple to implement and massively scalable.”

OpenStack is a collection of open source software projects that enterprises/service providers can use to setup and run their cloud compute and storage infrastructure.Rackspace and NASA are the key initial contributors to the stack. Rackspace contributed their “Cloud Files” platform (code) to power the Object Storage part of the OpenStack, while NASA contributed their “Nebula” platform (code) to power the Compute part. OpenStack consortium has managed to have more than 150 members including Canonical, Dell, Citrix etc.

There are 5 main service families under OpenStack

Nova         –   Compute Service

Swift         –    Storage Service

Glance      –    Imaging Service

Keystone  –    Identity Service

Horizon    –    UI Service

Open Stack Compute Infrastructure (Nova)

Nova is the Computing Fabric controller for the OpenStack Cloud. All activities needed to support the life cycle of instances within the OpenStack cloud are handled by Nova. This makes Nova a Management Platform that manages compute resources, networking, authorization, and scalability needs of the OpenStack cloud. But, Nova does not provide any virtualization capabilities by itself; instead, it uses libvirt API to interact with supported hypervisors. Nova exposes all its capabilities through a web services API that is compatible with the EC2 API of Amazon Web Services.

Functions and Features:

• Instance life cycle management

• Management of compute resources

• Networking and Authorization

• REST-based API

• Asynchronous eventually consistent communication

• Hypervisor agnostic : support for Xen, XenServer/XCP, KVM, UML, VMware vSphere and Hyper-V

OpenStack Storage Infrastructure (Swift)

Swift provides a distributed, eventually consistent virtual object store for OpenStack. It is analogous to Amazon Web Services – Simple Storage Service (S3). Swift is capable of storing billions of objects distributed across nodes. Swift has built-in redundancy and fail-over management and is capable of archiving and media streaming. It is extremely scalable in terms of both size (several petabytes) and capacity (number of objects).

Functions and Features

• Storage of large number of objects

• Storage of large sized objects

• Data Redundancy

• Archival capabilities – Work with large datasets

• Data container for virtual machines and cloud apps

• Media Streaming capabilities

• Secure storage of objects

• Backup and archival

• Extreme scalability

OpenStack Imaging Service (Glance)

OpenStack Imaging Service is a lookup and retrieval system for virtual machine images. It can be configured to use any one of the following storage backends:

• Local filesystem (default)

• OpenStack Object Store to store images

• S3 storage directly

• S3 storage with Object Store as the intermediate for S3 access.

• HTTP (read-only)

Functions and Features

• Provides imaging service

OpenStack Identity Service (Keystone)

Keystone provides identity and access policy services for all components in the OpenStack family. It implements it’s own REST based API (Identity API). It provides authentication and authorization for all components of OpenStack including (but not limited to) Swift, Glance, Nova. Authentication verifies that a request actually comes from who it says it does. Authorization is verifying whether the authenticated user has access to the services he/she is requesting for.

Keystone provides two ways of authentication. One is username/password based and the other is token based. Apart from that, keystone provides the following services:

• Token Service (that carries authorization information about an authenticated user)

• Catalog Service (that contains a list of available services at the users’ disposal)

• Policy Service (that let’s keystone manage access to specific services by specific users or groups).

Openstack Administrative Web-Interface (Horizon)

Horizon the web based dashboard can be used to manage /administer OpenStack services. It can be used to manage instances and images, create keypairs, attach volumes to instances, manipulate Swift containers etc. Apart from this, dashboard even gives the user access to instance console and can connect to an instance through VNC. Overall, Horizon

Features the following:

• Instance Management – Create or terminate instance, view console logs and connect through VNC, Attaching volumes, etc.

• Access and Security Management – Create security groups, manage keypairs, assign floating IPs, etc.

 • Flavor Management – Manage different flavors or instance virtual hardware templates.

 • Image Management – Edit or delete images.

 • View service catalog.

 • Manage users, quotas and usage for projects.

 • User Management – Create user, etc.

 • Volume Management – Creating Volumes and snapshots.

 • Object Store Manipulation – Create, delete containers and objects.

 • Downloading environment variables for a project.

INSTALLATING OPEN STACK

We can install open stack ESSEX very easily using StackGeek script. Login to your box and install git with apt-get. We’ll become root and do an update first.

sudo  su
apt-get update
apt-get install git

Now checkout the StackGeek scripts from Github:

git clone git://github.com/StackGeek/openstackgeek.git   
cd openstackgeek

Install the Base Scripts

Be sure to take a look at the scripts before you run them. Keep in mind the scripts will periodically prompt you for input, either for confirming installation of a package, or asking you for information for configuration.

Start the installation by running the first script:

./openstack_base_1.sh

When the script finishes you’ll see instructions for manually configuring your network. You can edit the interfaces file by doing a:

vim /etc/network/interfaces

Copy and paste the network code provided by the script into the file and then edit:

auto eth0 
iface eth0 inet static
  address 192.168.1.48		
  network 192.168.1.0		
  netmask 255.255.255.0
 broadcast 192.168.1.255
  gateway 192.168.1.124			
  dns-nameservers 8.8.8.8  
auto eth1

Change the settings for your network configuration and then restart networking and run the next script:

/etc/init.d/networking restart

Then run the second script :

./openstack_base_2.sh

After the second script finishes, you’ll need to set up a logical volume for Nova to use for creating snapshots and volumes. Nova is OpenStack’s compute controller process.

Here’s the output from the format and volume creation process:-

root@manager-System-Product-Name:/openstackgeek# fdisk /dev/sda
Device contains neither a valid DOS partition table,nor Sun,SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xb39fe7af.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p Partition number (1-4, default 1): 3  
First sector (2048-62914559, default 2048): 
 Using default value 2048 Last sector,(2048-62914559,default 62914559): 
Using default value 62914559 
Command (m for help): w The partition table has been altered! 
Calling ioctl() to re-read partition table. Syncing disks.
root@manager-System-Product-Name:/openstackgeek# pvcreate -ff /dev/sda3
 Physical volume "/dev/sda3" successfully created
root@manager-System-Product-Name:/openstackgeek# vgcreate nova-volumes /dev/sda3
 Volume group "nova-volumes" successfully created 

Note: Your device names may vary.

Installing MySql

The OpenStack components use MySQL for storing state information. Start the install script for MySQL by entering the following:

./openstack_mysql.sh

You’ll be prompted for a password used for each of the components to talk to MySQL:
Enter a password to be used for the OpenStack services
to talk to MySQL (users nova, glance, keystone): redhat
Note(Here “redhat” is the password given to nova,glance,keystone) 

During the installation process you will be prompted for a root password for MySQL. In our install example we use the same password, ‘redhat’. At the end of the MySQL install you’ll be prompted for your root password again.

mysql start/running, process 8796
################################################################################ 
Creating OpenStack databases and users. 
Use your database password when prompted. 
 Run './openstack_keystone.sh' when the script exits. 
################################################################################
Enter password:
After MySQL is running, you should be able to login with any of the OpenStack 
users and/or the root admin account by doing the following:

mysql -u root -predhat
mysql -u nova -predhat nova
mysql -u keystone -predhat keystone
mysql -u glance -predhat glance

Installing Keystone

Keystone is OpenStack’s identity manager. Start the install of Keystone by doing:

./openstack_keystone.sh

You’ll be prompted for a token, the password you entered for OpenStack’s services, and your email address. The email address is used to populate the user’s information in the database.

Enter a token for the OpenStack services to auth wth keystone: redhattoken 
Enter the password you used for the MySQL users (nova, glance, keystone):redhat 
Enter the email address for accounts(nova,glance,keystone):user@company.com
You should be able to query Keystone at this point. 
You’ll need to source the“stackrc” file before you talk to Keystone:
 . ./stackrc   
 keystone user-list    
 Keystone should return a list of users:
+----------------------------------+---------+------------------------+--------+
|                id                | enabled |         email          |  name  |
+----------------------------------+---------+------------------------+--------+
| b32b9017fb954eeeacb10bebf14aceb3 | True    | user@company.com       | demo   |
| bfcbaa1425ae4cd2b8ff1ddcf95c907a | True    | user@company.com       | glance |
| c1ca1604c38443f2856e3818c4ceb4d4 | True    | user@company.com       | nova   |
| dd183fe2daac436682e0550d3c339dde | True    | user@company.com       | admin  |
+----------------------------------+---------+------------------------+--------+

Installing Glance

Glance is OpenStack’s image manager. Start the install of Glance by doing:

./openstack_glance.sh

The script will download an Ubuntu 12.04 LTS cloud image from StackGeek’s S3 bucket.Once it’s done, you should be able to get a list of images:

glance index

Here’s the expected output:

ID              :- 71b8b5d5-a972-48b3-b940-98a74b85ed6a 
Name            :- Ubuntu 12.04 LTS
Disk Format     :- qcow2 
Container Format:- ovf 
Size            :- 226426880

Installing Nova

We’re almost done installing! The last component is the most important one as well. Nova is OpenStack’s compute and network manager. It’s responsible for starting instances, creating snapshots and volumes, and managing the network. Start the Nova install by doing:

./openstack_nova.sh

You’ll immediately be prompted for a few items, including your existing network interface’s IP address, the fixed network address, and the floating pool addresses:

######################################################
The IP address for eth0 is probably 192.168.1.48.
Keep in mind you need an eth1 for this to work.
######################################################
Enter the primary ethernet interface IP: 192.168.1.48
Enter the fixed network (eg. 10.0.2.32/27): 192.168.1.0/24
Enter the fixed starting IP (eg. 10.0.2.33): 192.168.1.1
############################################################################
The floating range can be a subset of your current network. 
Configure your DHCP server to block out the range before you choose it here. 
An example would be 10.0.1.224-255
############################################################################
Enter the floating network (eg. 10.0.1.224/27):  
Enter the floating netowrk size (eg. 32):

The fixed network is a set of IP addresses which will be local to the compute nodes. Think of these addresses as being held and routed internally inside any of the compute node instances.

The floating network is a pool of addresses which can be assigned to the instances you are running. For example, you could start a web server and map an external IP to it for serving a site on the Internet.


Finish Installing Nova

Nova should finish installing after you enter all the network information. When it’s done, you should be able to get a list of images from Glance via Nova:

 nova image-list

And get the expected output we saw earlier from Glance:

root@manager-System-Product-Name:/openstackgeek# nova image-list
+--------------------------------------+------------------+--------+--------+
|                  ID                  |       Name       | Status | Server |
+--------------------------------------+------------------+--------+--------+
| 71b8b5d5-a972-48b3-b940-98a74b85ed6a | Ubuntu 12.04 LTS | ACTIVE |        |
+--------------------------------------+------------------+--------+--------+

Installing Horizon

Horizon is the UI and dashboard controller for OpenStack. Install it by doing:

./openstack_horizon.sh

When it’s done installing, you’ll be given a URL to access the dashboard. 
You’ll be able to login with the user ‘admin’ 
and whatever you entered earlier for your password. 
If you’ve forgotten it, simply grep for it in your environment:

env |grep OS_PASSWORD

The URL will be : http://192.168.1.48

You can login the Openstack dashboard by the following credentials

USER : admin

PASSWORD : redhat

Deploying a load balanced e-commerce portal in Amazon EC2

Update: NFS should not be used as that will be a SPOF. One should use S3 or other object stores. An alternative could be multi-node GlusterFS if someone needs volumes shared across nodes.

When building an infrastructure for an eCommerce portal on Cloud, it is important to note that it should be available all the time, that it is fail safe with outages like the one we had recently in AWS EU and U.S. East Regions, survive Hardware failure or any other issues like bug in the system or deployment errors. We built an infrastructure on AWS Cloud that address all these issues with LAMP using various AWS Cloud services like EC2, S3, RDS, EBS etc. It is described in detail below:

 

Achieving High Availability & Fail over across Datacenters

Elastic Load Balancer (ELB)

The Elastic Loadbalancer ( ELB ) service provided by AWS tries to achieve the following:

(i) Spans across Datacenters: Loadbalance traffic across mulitple datacenters (AZ )thus providing high availability even if one datacenter goes down. So you should always make sure that when you launch instances under an ELB, you should launch it in different Availability zones. You can also launch instances in the same AZ but by default ELB will redirect request across multiple AZ in a Round Robin way.

(ii) Failover: ELB will periodically monitor the health of the instances and if any of the instance or monitored service ( e.g. Http ) goes down, ELB will stop redirecting requests to that instance and all the request will be redirected to the remaining number of instances registered under ELB. When the instance comes backup, it will again start redirecting requests to that instance.

(iii) Handling root domain ( apex / main domain ) and subdomains: ELB can loadbalance only those requests coming to alias / subdomain( www ). It cannot handle request coming to root domain. This is because when you configure DNS for enabling ELB, you can only set CNAME to ELB for subdomains. There are 2 reasons for this. One is when you configure ELB, you will only get a Public DNS name for the ELB like the following instead of a Public IP.

[bash]Test-1736333854.us-east-1.elb.amazonaws.com [/bash]

This is because AWS changes the Public IP of the ELB periodically for providing scalability for ELB itself. Another reason why you cannot redirect main domain request to ELB is that DNS protocol itself restricts the usage of CNAME or anything other than “A” record for a root domain. So you cannot CNAME root domain to ELB DNS name.

So for serving root domain requests with ELB , there are only work arounds like mentioned below:

a) We have to assign an elastic IP for an instance under ELB. But what if this instance goes down? Set heartbeat to switch EIP? This is a bit complicated setup as switching EIP to instances present across AZ takes time.

b)The other option is to have the root domain point to the IP addresses of the destination by configuring one or more “A” records (address records) for root domain. You can do that if you know the destination IP addresses are fixed, such as if you are using EC2 Elastic IP addresses. We wouldn’t recommend this because IP addresses will be cached at the client end for long time even if you set low value of TTL at the nameservers. This is because TTL value can also be configured at the the client end overriding the TTL value provided by the nameserver of the domain. e.g. with nscd ( Nameserver Caching Daemon) you can set the TTL value manually in its configuration file.

c) You can keep a separate web server not under ELB with a Redirect Rule for redirecting root domain requests to www. You should make sure that this webserver is highly available as well.

d) A better solution is to go for Domain Registrars ( DNS service providers ) who provide this feature of redirecting root domain requests to www. So this can be handled at the DNS itself. The DNS service provided by AWS “Route53” can be used for this ‘Zone apex’ ( root domain ) redirection.

(iv) SSL Termination

There is support for “SSL termination” in ELB which means you can use ELB to loadbalance HTTPS requests too. You just need to buy the SSL certificate and simply upload it to ELB. ELB will redirect all the HTTPS request to the backend servers. So you can make an eCommerce portal highly secure and highly available with ELB.

(v) Persistent Session

You can enable Sticky Session with ELB but the problem is users will be logged out if any of the instance / webserver goes down and ELB will redirect the subsequent requests from the same user to a different instance and it will prompt the user to login again. To tackle this there were few options we had considered –
a)You can either setup distributed failover memcached server or
b)You can use RDS for storing Session.

We went for RDS as our Session Management store since RDS is an excellent choice for Database Administration as well if you are using MySQL as the Database.

Your application must be configured to write session data to an RDS database. So when an instance / webserver goes down and when the ELB redirects the user request to a different instance, the user will not be asked to login again as all servers are reading session data from the same place that is RDS. The user won’t notice anything at all, even though they’ve now started talking to another server. We recommend using a Multi-AZ RDS instance and write session data into this. So if one of your EC2 instances goes down, the other instances will still have access to the RDS database, and likewise if an RDS zone goes down, Amazon fail this over to the second AZ internally, transparently to you and your application.

So the easiest and most reliable way to share sessions for failover on a multi-server environment is to use RDS, since Amazon handle the database layer’s failover for you.

So basically you can achieve two things by using RDS – Session management and Database Management.

 

AutoScaling

The Autoscaling service provided by AWS allows you to scale horizontally up / down with CPU usage, RAM, Disk I/O etc.

Ideally you should use a Base AMI with Autoscaling that will pull the required packages from a Centralized location like Chef Platform and code from the Version Control System or S3. You can write a startup script to run on instance bootup for this purpose. So when Autoscaling launches a new instance it will pull all the latest updated versions of the packages, code and also any other required custom configurations from a centralized location. This will also make it easier to manage all the configuration details, code updates from a centralized location using tools like Chef Platform, Version Control System or S3 respectively.

Apart from Centralised Configuration / Code management, the reason for using Base ami with Autoscaling is that it is not possible to change the ami configured with Autoscaling service dynamically.

 

Storage for Application Files

We came across lot of options for storing the application files. However you have to consider your priorities before you select a storage service for the code. Following are the points to consider for your application file storage system:

(i)Latency issues: All shared storage systems like NFS / GlusterFS / EBS / S3 etc have latency issues when compared to Instance store (Ephemeral Storage)

(ii)High availability: If you are using a shared storage service like NFS, it should never go down for the entire system to be available all the time.

(iii)Access to the code: How to get the latest code during incremental roll out of a new instance because if you are using a shared storage, it becomes difficult to gives access to the shared storage system when a new instance is launched

We went for instance store / ephemeral store that gives you better I/O performance. You can keep your own highly available SVN repository or go for publicly available Version Control Systems like GitHub. At the same time you can also keep a copy in S3 and sync to it whenever there is a code update. This will make it more redundant.

The problem with using shared storage service like NFS / GlusterFS with EBS / S3 is it becomes difficult to avoid single point failure for NFS / GlusterFS service. But if your site doesn’t have much hits and your priority becomes redundancy, you can go for mounting S3 as filesystem using tools like s3cmd and use that as a shared storage with NFS for multiple instance. The problem with S3 is that it is not intended to be used as a filesystem and there have been issues reported with speed and caching. Or you can use EBS volume for code storage if you have only a single instance serving the request. Even using NFS with EBS volumes ( with frequent snapshots to S3 ) gives better performance than using S3 as shared storage for files.

Not only does instance store gives you better performance, error rates very rare. with EBS volumes error rates are reported frequently. Recent outages with AWS EU & US East Regions shows that the down time was made worse due to increase in time taken to recover from EBS errors.

 

Code Deployment

For automating code deployment, you can configure deployment tools like Capistrano. This will become very handy when you have multiple servers to update simultaneously. Capistrano uses Ruby language and is built for Ruby code deployment but with little changes, you can automate deployment of PHP / Perl / Python / JAVA based application.

chef-deploy is another tool that comes with chef for automating code deployment. Continuous Integration tools like Hudson / Cruise Control are excellent tools when you want to automate the Build, Deployment, Test and Rollback process.

For code deployment, we follow a Release Management process where we keep a staging environment that is an exact replica of the production environment. We push code to the production environment only when it’s been completely tested in the staging environment and approved by the Release Manager. This will further reduce the errors / bugs / and downtime time caused due to the code release.

 

Database Server

We went for RDS across AZ for High availability. AWS will take care of Redundancy, Performance Optimisation, Scalability and Backup. You can avoid the hassle of managing a Database Server by using RDS. RDS is as an excellent distributed highly available Session Management System. You can also take regular backup from RDS and keep it in S3.

You can also use Master–Slave Replication setup instead of RDS. This is also a good option for achieving high availability for Database server. The challenging part will be to manually configure failover for both master and slave servers, achieving scalability with traffic, backup configuration and performance optimization with increasing load. With RDS, all these will be managed by AWS.

 

Log handling

Keep all the important logs like Application logs, Syslogs, SSH log etc in EBS volume. You can either schedule regular snapshots of these EBS volume to S3 or you can even sync these log files to an S3 bucket periodically using tools like s3sync.

 

Configuration Management

If you have more than one server or are planning to scale up in future or would like to automate a lot of administration / coding stuffs, you should definitely use one of the Open Source freely available Configuration Management tools like Chef / puppet / Cfengine

Chef is new and has default support for AWS / EC2. We use Chef extensively for managing our infrastructure in AWS. Chef provide a lot of readily available cookbooks ( recipes / roles ) for LAMP, JAVA app, Cassandra, Hadoop, Nagios etc which can be used readily ( or with minimum customization ) to automate the infrastructure setup and configuration. Chef also comes with a tool called Chef-deploy for automating deployment of code.

So using Chef along with tools like Hudson / Cruisecontrol, you can automate the entire setup from infrastructure setup to configuration management to building, deployment and testing of your application.

 

Performance

To improve performance you can implement the following:

(i)Use caching mechanisms like Memcache(DB scaling) / aiCache / Varnish.

(ii)CDN ( Content Delivery Network ) is a must if you want to provide better end-user response time. There are lot of CDN providers but we recommend AWS CloudFront or Akamai for serving static files and images. For start-up and small business, CDN might be costly but as your target audience grows larger and becomes more global, a CDN is necessary to achieve fast response times.

 

Monitoring & Alert

For monitoring, go for open source monitoring tools along with a SaaS based monitoring application.

(i)There are lot of free and open source option available in the market – Nagios, Zenoss,Zabbix etc. This can be automated with Chef in such a way that when a new server is launched in to the cluster, it will be automatically added to the Nagios list of monitored servers.

(ii)You can also use excellent SaaS based monitoring apps like Pingdom, mon.itor.us, site24x7.com etc for monitoring and alerting via email, SMS or Twitter.

(iii)Custom scripts or tools like Munin & Monit for monitoring and restarting services if it crashes.

 

Backup

You can keep copies of code in an S3 Bucket and sync it with tools like s3sync with every update. For DB Backup, in addition to automated RDS Backup, you can take periodical standard DB backups using mysqldump and store it in S3 bucket.You can also use EBS volumes for keeping replica of code and DB Backup with periodical snapshots to S3.

An important thing to note about S3 storage is it is only a Highly available Storage System. It is not backed up automatically. That means if you delete anything manually from s3, it will be forever gone unless you have manually backed it up with multiple copies in S3. So make sure that you have enough backups available in S3.