Thursday 11 September 2014

Enable Replication on MongoDB Database with very little downtime

In this blog, I want to share my experience of how we enabled replication on a 3.5TB MongoDB database with very little downtime. You may wonder why we couldn't just enable replication by setting 'replSet' in the MongoDB configuration and letting the new member sync on its own, which would involve no downtime. The fact is that, with a database this large, the initial sync never succeeds: the data set is too big for the sync to complete within the window the oplog covers. So our only remaining option was to manually synchronize the database files, bring up the secondary server and then enable the replication. Stopping the primary server for the copy would have required a huge downtime, given the size of the database and the available bandwidth, so we first had to build a strategy to minimize the database downtime during the copy process.

We had to choose the right tool for copying the database files. We tried multiple options such as NFS, RSYNC etc., but finally settled on SCP. We first created a list of all the files in the source folder and split it across 4 scripts that copy the files one by one. We named these scripts scpcp1.sh, scpcp2.sh, scpcp3.sh and scpcp4.sh, each containing plain SCP commands. Four scripts were created so that 4 concurrent instances of SCP could run in parallel, simply to increase the number of copy threads. A rough sketch of how these scripts can be generated is shown below.
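Here is a rough sketch of how such a file list can be generated and distributed round-robin across the four scripts, run from the destination (secondary) server. The host address and paths match the ones used in this post; the awk one-liner is just one possible way to do the split.

$ ssh root@192.168.1.150 "ls /mongodb/database" > filelist.txt
$ awk '{ print "scp -p -c arcfour128 root@192.168.1.150:/mongodb/database/" $0 " /mongodb/database/." >> ("scpcp" (NR % 4 + 1) ".sh") }' filelist.txt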

So the procedure is as follows. Before you begin, you need to make sure that the primary MongoDB server is running in standalone mode. To do so, comment out the 'replSet' line in /etc/mongod.conf and restart mongod.
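For example, assuming the classic INI-style mongod.conf and a replica set named rs0 (the set name here is just an assumption), the relevant line should look like this after commenting it out, followed by a restart of the service:

$ grep replSet /etc/mongod.conf
#replSet = rs0
$ service mongod restart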

Here is an example SCP command from these scripts, run on the destination (secondary) server.

 scp -r -p -c arcfour128 root@192.168.1.150:/mongodb/database/mgdbfile.1 /mongodb/database/.

-p preserves the file modification and access times (and modes). Without this argument, the subsequent rsync would copy all the files again.
-c arcfour128 changes the cipher to a much lighter one, which drastically speeds up the copy.

Please note that you will need to add your SSH public key to the source server to avoid password prompts for every copy. A typical setup is sketched below.
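A minimal sketch of the usual key-based setup, run from the destination (secondary) server; it assumes OpenSSH with root login permitted on the source:

$ ssh-keygen -t rsa                   # accept the defaults; use an empty passphrase
$ ssh-copy-id root@192.168.1.150      # appends the public key to authorized_keys on the source
$ ssh root@192.168.1.150 hostname     # should now connect without a password prompt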

Once the SSH keys are added, we can run the scripts as background processes: append an ampersand to start each one in the background, and then disown them.
$ sh scpcp1.sh &
$ sh scpcp2.sh &
$ sh scpcp3.sh &
$ sh scpcp4.sh &
$ disown -a

While the copy was in progress, we had to plan for RSYNC. I am generally not in favor of RSYNC here because, if misused, it can make changes to the source files, and we had to be sure there would be no writes to the source while doing RSYNC. To address this concern, we decided to create a new user on the source server with read-only access to the source path.

First, we created a normal user on the source server called rsyncuser.
$ useradd rsyncuser
$ passwd rsyncuser

Since the source path is owned by the mongod user and the data files have 664 permissions, the newly created rsyncuser only has read access to them. Now we are ready to do RSYNC.
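A quick way to confirm that the new user really cannot modify the data files is to attempt a write as that user (the file name here is just an example; note also that the directories along the path must be readable and executable by others for rsync to work):

$ su - rsyncuser
$ touch /mongodb/database/mgdbfile.1
touch: cannot touch `/mongodb/database/mgdbfile.1': Permission denied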

The copy of 3.5TB through SCP took around 8 hours to complete over a 1Gbps network.
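As a rough sanity check: 3.5TB is about 28 terabits, and 28 terabits over 8 hours (28,800 seconds) works out to roughly 0.97Gbps, so the four parallel SCP streams were effectively saturating the 1Gbps link.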

Now we issued an RSYNC dry-run command to see which files had been modified at the source during those 8 hours.

rsync -avzn --del -e "ssh -c arcfour128" --log-file="/home/rsyncuser/rsync2.log" rsyncuser@192.168.1.150:/mongodb/database/ /mongodb/  

This command listed the files that were modified at the source while the copy was in progress. Remember, we started copying the database files while MongoDB was still running; this was to avoid a big downtime.
Once the RSYNC dry run listed the file names, we copied them individually using scp (command given above). We repeated this two times to bring the delta down to a minimum. Once the number of changed files was small, we issued db.fsyncLock() on MongoDB so that it stops writes to the database. Caution: this command will lock the database and make it non-operational, so please be sure to plan this downtime. You need to keep the mongo shell open until the rsync completes.
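A minimal sketch of the locking step, run in a mongo shell on the primary; keep this shell open until the final rsync below has finished:

$ mongo
> db.fsyncLock()      // flushes pending writes to disk and blocks further writes
> // leave this shell open; db.fsyncUnlock() will be issued here later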

rsync -avz --del -e "ssh -c arcfour128" --log-file="/home/rsyncuser/rsync2.log" rsyncuser@192.168.1.150:/mongodb/database/ /mongodb/  

Once the RSYNC is completed, you can start the MongoDB service in replica set mode on the secondary server, and then issue db.fsyncUnlock() on the source server in the shell that is still open. Remember to re-enable the replSet setting on the primary as well (reversing the change we made at the beginning) so that both nodes are part of the replica set.
Check the replication status by issuing rs.status() on any of the nodes.
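For reference, here is a minimal sketch of this last phase. The replica set name rs0 and the secondary's address 192.168.1.151 are assumptions, and the exact sequence (in particular whether you need rs.initiate()) depends on whether the replica set was already initiated on the primary.

# on the secondary: make sure replSet = rs0 is set in /etc/mongod.conf, then
$ service mongod start

# on the primary, in the mongo shell still open from the fsyncLock step:
> db.fsyncUnlock()

# re-enable the replSet line on the primary, restart it, then from its mongo shell:
> rs.initiate()
> rs.add("192.168.1.151:27017")
> rs.status()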
Please like my post if this helped you.
