• Call: +1 (858) 429-9131

Archive for February, 2013

Migrate Mysql database to Mongodb

In recent years, we have seen a growing interest in database management systems that differ from the traditional relational model. At the heart of this is the concept of NoSQL, a term used collectively to denote database software that does not use the Structured Query Language (SQL) to interact with the database. One of the more notable NoSQL projects out there is MongoDB, an open source document-oriented database that stores data in collections of JSON-like documents. What sets MongoDB apart from other NoSQL databases is its powerful document-based query language, which makes the transition from a relational database to MongoDB easy because the queries translate quite easily.

This new class of databases seems to solve many of the bottlenecks in MySql and other relational databases. It will give you shear performance, self replication and scalability at not cost because it open source. MongoDB has plenty of drivers for other scripting and high-level languages I use PHP so I download the PHP driver. You can see the supported list here: http://www.mongodb.org/display/DOCS/Drivers. In this blog I convert  a MySQL database using PHP to MongoDB.

First you install MongoDB, you can do it by checking the previous blog.  Check this link

Then we run the script to convert a Mysql DB to Mongodb.

create a new file called MySqltoMongodb.php , In that file please copy paste the below contants (please give your Mysql DB details as well as your Mongodb details)

  1. <?php
  2. // mysql settings
  3. $mydb = “database”;
  4. $myconn = mysql_connect(‘localhost’,’user’,’password’);
  5. $setmydb = mysql_select_db( $mydb );
  6. $mytables = getMyTables( $mydb );
  7.  //mongo db settings
  8. $modb = “database”;
  9. $moConnect=”mongodb://user:password@localhost”;
  10.  function getMyTables( $dbname ) {
  11. $tables = array();
  12. $sql = mysql_query(“SHOW TABLES FROM $dbname “) or die(“Error getting tables from $dbname”);
  13.  if( mysql_num_rows( $sql ) > 0 ) {
  14. while( $table = mysql_fetch_array( $sql ) ) {
  15. $explain = explainMyTable( $table[0] );
  16. $tables[$table[0]] = $explain;
  17. }
  18. }
  19. return $tables;
  20. }
  21.  function explainMyTable( $tbname ) {
  22. $explain = array();
  23. $sql = mysql_query(“EXPLAIN $tbname”) or die(“Error getting table structure”);
  24. $i = 0;
  25.  while( $get = mysql_fetch_array( $sql ) ) {
  26. array_push( $explain, $get[0] );
  27. $i++;
  28. }
  29. return $explain;
  30. }
  31.  function checkEncode($string) {
  32. if( !mb_check_encoding($string,’UTF-8′)) {
  33. return mb_convert_encoding($string,’UTF-8′,’ISO-8859-1′);
  34. } else {
  35. return $string;
  36. }
  37.  }
  38. try {
  39. $moconn = new Mongo($moConnect);
  40. $modb = $moconn->selectDB( $modb );
  41. } catch(MongoConnectionException $e) {
  42. die($e.”Problem during mongodb initialization. Please start mongodb server.”);
  43. }
  44.  foreach( $mytables as $table => $struct ) {
  45. $sql = mysql_query(“SELECT * FROM $table LIMIT 0 , 500000″) or die( mysql_error() );
  46. $count = mysql_num_rows( $sql );
  47.  // Starts new collection on mongodb
  48. $collection = $modb->$table;
  49.  // If it has content insert all content
  50. if( $count > 0 ) {
  51. while( $info = mysql_fetch_array( $sql, MYSQL_NUM )) {
  52. $infosize = count( $info );
  53. $mosql = array();
  54.  for( $i=0; $i < $infosize; $i++ ) {
  55. if(!empty($struct[$i]))
  56. $mosql[$struct[$i]] = checkEncode($info[$i]);
  57. }
  58.  $collection->insert($mosql);
  59. }
  60. // Only create a new entry empty
  61. } else {
  62.  for( $i=0; $i < $infosize; $i++ ) {
  63. if(!empty($struct[$i]))
  64. $mosql[$struct[$i]] = ”;
  65.  }
  66. $collection->insert($mosql);
  67. }
  68. }
  69. echo “Done! Please, check your MongoDB collection!”;
  70. ?>

Now fire up your browser and launch the page. If all goes well you should see
“Done! Please, check your MongoDB collection!”

After running this script check your Mongo db collection, in that you can see your Mysql Table.  However we haven’t done it on a large system, we are planning to do the same on a huge Postgres Sql system soon.

Installation of MongoDB and its performance test

Why MongoDB?

  • Document-oriented
    • Documents (objects) map nicely to programming language data types
    • Embedded documents and arrays reduce need for joins
    • Dynamically-typed (schemaless) for easy schema evolution
    • No joins and no multi-document transactions for high performance and easy scalability
  • High performance
    • No joins and embedding makes reads and writes fast
    • Indexes including indexing of keys from embedded documents and arrays
    • Optional streaming writes (no acknowledgements)
  • High availability
    • Replicated servers with automatic master failover
  • Easy scalability
    • Automatic sharding (auto-partitioning of data across servers)
    • Reads and writes are distributed over shards
    • No joins or multi-document transactions make distributed queries easy and fast
    • Eventually-consistent reads can be distributed over replicated servers

Mongo data model

  • A Mongo system (see deployment above) holds a set of databases
  • A database holds a set of collections
  • A collection holds a set of documents
  • A document is a set of fields
  • A field is a key-value pair
  • A key is a name (string)
  • A value is a
    • basic type like string, integer, float, timestamp, binary, etc.,
    • a document, or
    • an array of value

    Mongo query language

  • To retrieve certain documents from a db collection, you supply a query document containing the fields the desired documents should match. For example, {name: {first: 'John', last: 'Doe'}} will match all documents in the collection with name of John Doe. Likewise, {name.last: 'Doe'} will match all documents with last name of Doe. Also, {name.last: /^D/} will match all documents with last name starting with ‘D’ (regular expression match).
  • Queries will also match inside embedded arrays. For example, {keywords: 'storage'} will match all documents with ‘storage’ in its keywords array. Likewise, {keywords: {$in: ['storage', 'DBMS']}} will match all documents with ‘storage’ or ‘DBMS’ in its keywords array.
  • If you have lots of documents in a collection and you want to make a query fast then build an index for that query. For example, ensureIndex({name.last: 1}) or ensureIndex({keywords: 1}). Note, indexes occupy space and slow down updates a bit, so use them only when the tradeoff is worth it.

Install MongoDB on Ubuntu 10.04

Configure Package Management System (APT)

The Ubuntu package management tool (i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG Key:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

Create a /etc/apt/sources.list.d/10gen.list file and include the following line for the 10gen repository.

deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen

Now issue the following command to reload your repository:

sudo apt-get update

Install Packages

Issue the following command to install the latest stable version of MongoDB:

sudo apt-get install mongodb-10gen

When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.

Configure MongoDB

These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. You will find the control script is at /etc/init.d/mongodb.

This MongoDB instance will store its data files in the /var/lib/mongodb and its log files in /var/log/mongodb, and run using the mongodb user account.

Note

If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.

Controlling MongoDB

Starting MongoDB

You can start the mongod process by issuing the following command:

sudo service mongodb start

You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.

Stopping MongoDB

As needed, you may stop the mongod process by issuing the following command:

sudo service mongodb stop

Restarting MongoDB

You may restart the mongod process by issuing the following command:

sudo service mongodb restart

Controlling mongos

As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically do not run on the same systems where mongod runs. You can use the mongodb script referenced above to derive your own mongos control script.

Using MongoDB

Among the tools included with the MongoDB package, is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:

mongo
> show dbs (); —> To show your databases
> use <databasename> —-> To switch database
> db.createCollection(“collectionname”) —> To create collection
> db.collectionname.find(); —> To see the contents in the collection
> db.addUser(“theadmin”, “anadminpassword”) —> To create user and password

Mongodb performance test :-

To monitor database system we can use Mongotop

Mongotop tracks and reports the current read and write activity of a MongoDB instance.
Mongotop provides per-collection visibility into use.
Use mongotop to verify that activity and use match expectations.
Mongotop returns time values specified in milliseconds (ms.)
Mongotop only reports active namespaces or databases, depending on the –locks option.
If you don’t see a database or collection, it has received no recent activity.

By default mongotop connects to the MongoDB instance running on the localhost port 27017. However,mongotop can optionally connect to remote mongod instances

Next, we can use Mongostat

Mongostat captures and returns counters of database operations. Mongostat reports operations on a per-type (e.g. insert, query, update, delete, etc.) basis. This format makes it easy to understand the distribution of load on the server. Use  Mongostat to understand the distribution of operation types and to inform capacity planning.
The Mongostat utility provides a quick overview of the status of a currently running mongod or Mongos instance. Mongostat is functionally similar to the UNIX/Linux file system utility vmstat, but provides data regarding mongod and Mongos instances.

Use  db.serverStatus()
It provides an overview of the database process’s state.

Then REST interface

MongoDB provides a REST interface that exposes a diagnostic and monitoring information in a simple web page. Enable this by setting rest to true, and access this page via the local host interface using the port numbered 1000 more than that the database port. In default configurations the REST interface is accessible on 28017. For example, to access the REST interface on a locally running mongod instance: http://localhost:28017

These are a few basic tips on making your application better/faster/stronger without knowing anything about indexes or sharding.

Connecting

Connecting to the database is a (relatively) expensive operation. Try to minimize the number of times you connect and disconnect: use persistent connections or connection pooling (depending on your language).

there are some  side effects with the PHP connection code.

$connection = new Mongo ( );

$connection->connect( );

In this code it appears the user wants to create a new connection. However, under the hood the following is happening:

The constructor connects to the database.
connect( ) sees that you’re already connected, assumes you want to reset the connection.
Disconnects from the database.
Connects again.

The result is that you have doubled your execution time.

ObjectIds

ObjectIds seem to be uncomfortable, so they convert their ObjectIds into strings. The problem is, an ObjectId takes up 12 bytes but its string representation takes up 29 bytes (almost two and a half times bigger).

Numbers vs. Strings

MongoDB is type-sensitive and it’s important to use the correct type: numbers for numeric values and strings for strings.

If you have large numbers and you save them as strings (“1234567890″ instead of 1234567890), MongoDB may slow down as it strcmps the entire length of the number instead of doing a quicker numeric comparison. Also, “12″ is going to be sorted as less than “9″, because MongoDB will use string, not numeric, comparison on the values. This can lead to some errors.

Driver-specific
Find out if you’re driver is particularly weaknesses (or strengths). For instance, the Perl driver is one of the fastest drivers, but it is not good at decoding Date types (Perl’s DateTime objects take a long time to create).
MongoDB adopts a documented-oriented format, so it is more similar to RDBMS than a key-value or column oriented format.

MongoDB operates on a memory base and places high performance above data scalability.Mongo DB uses BSON for data storage

Mongo uses memory mapped files, which means that a lot of the memory reported by tools such as top may not actually represent RAM usage. Check mem[“resident”], which tells you how much RAM Mongo is actually using.

“mem” : {
    “resident” : 2,
    “virtual” : 2396,

    “supported” : true,
    “mapped” : 0
},

Backup

There are basically two approaches to backing up a Mongo database:

Mongodump and Mongorestore are the classic approach. Dumps the contents of the database to files. The backup is stored in the same format as Mongo uses internally, so is very efficient. But it’s not a point-in-time snapshot.
To get a point-in-time snapshot, shut the database down, copy the disk files (e.g. with cp) and then start mongod up again. Alternatively, rather than shutting mongod down before making your point-in-time snapshot, you could just stop it from accepting writes:

> db._adminCommand({fsync: 1, lock: 1})
{
        “info” : “now locked against writes, use db.$cmd.sys.unlock.findOne() to unlock”,

        “ok” : 1
}

To unlock the database again, you need to switch to the admin database and then unlock it

> use admin
switched to db admin
> db.$cmd.sys.unlock.findOne()
{ “ok” : 1, “info” : “unlock requested” }

Replication
Start your master and slave up like this:

$ mongod –master –oplogSize 500

$ mongod –slave –source localhost:27017 –port 3000 –dbpath /data/slave

When seeding a new slave server from master use the –fastsync option.

You can see what’s going on with these two commands:
> db.printReplicationInfo() # tells you how long your oplog will last
> db.printSlaveReplicationInfo() # tells you how far behind the slave is

If the slave isn’t keeping up,Check the mongo log for any recent errors. Try connecting with the mongo
console. Try running queries from the console to see if everything is working. Run the status commands
above to try and find out which database is taking up resources.
Timeout

Connection timeout in milliseconds. Defaults to 20000

Connection::query_timeout.

How many milliseconds to wait for a response from the server. Set to 30000 (30 seconds) by default. -1 waits forever (or until TCP times out, which is usually a long time).

Default pool

The default pool has a maximum of 10 connections per mongodb host. This value is controlled by the variable  “connectionsPerHost” within the class

MongoDB Server Connections

The MongoDB server has a property called “maxConns” that  is the max number of simultaneous connections. The
default number for maxConns is 80% of the available file descriptors for connections. One way to check the number of connections is by opening the mongo shell and executing:

>db.serverStatus() and in the previous mail I have send the screen shot of this.

The standard format of the MongoDB connection URI used to connect to a MongoDB database server.

mongodb://[username:password@]host1[:port1][,host2[:port2],…[,hostN[:portN]]][/[database][?options]]

Finding the Min and Max values in MongoDB

In MongoDB, the min() and max() functions work as limitors – essentially the same as “gte” (>=) and “lt” (<).

To find the highest (maximum) value in MongoDB, you can use this command;

db.thiscollection.find().sort({“thisfieldname”:-1}).limit(1)

This essentially sorts the data by the fieldname in decending and takes the first value.

The lowest (minimum) value can be determined in a similar way.

    db.thiscollection.find().sort({“thisfieldname”:1}).limit(1)

Memory Mapped Storage Engine :-

This is the current storage engine for MongoDB, and it uses memory-mapped files for all disk I/O.  Using this strategy, the operating system’s virtual memory manager is in charge of caching.  This has several implications:

There is no redundancy between file system cache and database cache: they are one and the same.
MongoDB can use all free memory on the server for cache space automatically without any configuration of a cache size.
Virtual memory size and resident size will appear to be very large for the mongod process.

This is benign: virtual memory space will be just larger thanthe size of the datafiles open and mapped; resident size will vary depending on the amount of memory not used by other processes on the machine.

This command shows the memory usage information :- db.serverStatus().mem

For example :-

> db.serverStatus().mem
{
    “bits” : 64,
    “resident” : 31,
    “virtual” : 146,
    “supported” : true,
    “mapped” : 0,
    “mappedWithJournal” : 0
}

We can verify there is no memory leak in the mongod process by comparing the mem.virtual and mem.mapped values (these values are in megabytes).  If you are running with journaling disabled, the difference should be relatively small compared to total RAM on the machine. If you are running with journaling enabled, compare mem.virtual to 2*mem.mapped.   Also watch the delta over time; if it is increasing consistently, that could indicate a leak.

Also we can use to check what percent of memory is being used for memory mapped files by the free command:

Here 2652mb of memory is being used to memory map files

root@manager-desktop:~# free -tm

             total       used       free     shared    buffers     cached
Mem:          3962       3602        359          0        411       2652

-/+ buffers/cache:        538       3423

Swap:        1491        52       1439

Total:        5454       3655   1799

Garbage collection handling :-

When we remove an object from MongoDB collection, the space it occupied is not automatically garbage collected and new records are only appended to the end of data files, making them grow bigger and bigger.MongoDB maintains lists of deleted blocks within the datafiles when objects or collections are deleted.  This space is reused by MongoDB but never freed to the operating system.

To shrink the amount of physical space used by the datafiles themselves, by reclaiming deleted blocks, we must rebuild the database by using  the command “db.repairDatabase( )” . repairDatabase copies all the database records to new files.

We will need enough free disk space to hold both the old and new database files while the repair is running, the repairDatabase  will take a long time to complete.Also rather than compacting an entire database,

you can compact just a single collection by using  “db.runCommand({compact:’collectionmname;})

This does not shrink any datafiles,however; it only defragments deleted space so that larger objects might reuse it.

The compact command will never delete or shrink database files, and in general requires extra space to do its work.

Thus, it is not a good option when you are running critically low on disk space.