Wednesday 19 October 2016

Web Based (Django) Password Change tool for Microsoft Active Directory

pyadselfservice (short for Python Active Directory Self Service) is a tool built with Python 3.10.12 & Django 5.0. The project aims to help IT support teams automate AD password changes by providing a self-service portal to end users. There are many commercial tools on the market, but this is a free alternative.

The tool authenticates users with two-factor authentication: the first factor is a secret and the second is an OTP.

How it works:
The secret is any piece of information stored in an AD attribute, and the second factor is a One Time Password (OTP). After successful validation of the first factor, an OTP is sent to the user's email address.
Only after successful authentication against both factors can a user change their password.
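The OTP half of the flow can be sketched in a few lines of Python. This is a minimal illustration using only the standard library (the project itself uses the pyotp package); the 6-digit length and 5-minute validity window are assumptions, not the tool's actual values:

```python
import secrets
import time

OTP_DIGITS = 6          # assumed length; the real tool may differ
OTP_TTL_SECONDS = 300   # assumed 5-minute validity window

def generate_otp():
    """Return a random numeric OTP plus its expiry timestamp."""
    code = "".join(secrets.choice("0123456789") for _ in range(OTP_DIGITS))
    return code, time.time() + OTP_TTL_SECONDS

def verify_otp(submitted, code, expires_at):
    """Accept the OTP only if it matches and has not expired."""
    return time.time() < expires_at and secrets.compare_digest(submitted, code)

code, expires_at = generate_otp()
# In the real tool the code would be emailed to the user at this point.
print(verify_otp(code, code, expires_at))  # True: correct and still fresh
```

Passing this check only proves possession of the mailbox; the first factor (the AD-attribute secret) is what ties the request to the account.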

For support and feedback on this software, please create an issue on GitHub.

To give you an overview of this software before you begin with the installation, below are a few screenshots from a successful deployment.

1. Home Page. 



2. First-factor authentication using a secret stored in an AD attribute. Captcha is enabled for security.
In this example I have configured the 'mobile' AD attribute, so the user's mobile number is the secret. You can choose any other attribute you deem fit and configure it in settings.py.




3. Second factor authentication is based on OTP.



4. Below is an example OTP email. By default, the OTP is sent to the email address defined in the "mail" AD attribute. If you have MS Exchange or another email platform integrated with AD authentication, that mailbox may not be accessible to the user. In such deployments, you might want to send the OTP to the user's alternate email. You may need to repurpose an existing attribute to hold the alternate email address, or add a new attribute to AD for it. Once you decide on the right attribute, change PYADSELFSERVICE_ATTR2 in settings.py.

5. After successful two-factor validation, the final page for entering a new password.


6. After a successful password change, the confirmation page below will appear.



Installation & Configuration
Section 1:
The tool has been tested on Ubuntu 22.04.3 LTS with Python 3.10.12 & Django 5.0, against MS Active Directory on Windows Server 2022.

sudo apt-get update
sudo apt-get install apache2 libapache2-mod-wsgi-py3 python3-pip git
sudo pip install django ldap3 django-simple-captcha PyCrypto pyotp
sudo git clone https://github.com/kanayak123/pyadselfservice.git /opt/pyadselfservice

Section 2:

You will need to create a user account with the minimal access required to perform password resets and account unlocks. After the account is created, assign it the necessary permissions. Below are the steps:
  • Open 'Active Directory Users and Computers'.
  • Go to the View menu and enable "Advanced Features".
  • Right-click the Domain or Organizational Unit that you want to grant this permission on.
  • Click the Security tab.
  • Click Advanced.
  • Click Add, then click 'Select Principal' and specify the user account that you created above.
  • Select "Type: Allow" and "Applies to: Descendant User objects".
  • Under "Permissions", check 'Read all properties', 'Reset password', 'Read lockoutTime' and 'Write lockoutTime'. Click OK.





Important: By design, MS Active Directory does not allow writes to password attributes unless the request is sent over LDAPS (port 636). You will need to enable SSL over LDAP on your domain controller if you haven't already; refer to Microsoft's documentation on enabling LDAPS, or use an alternative method.


Section 3:
Next, configure settings.py for authenticating Active Directory Domain Controller.

sudo vi /opt/pyadselfservice/pyadselfservice/settings.py

The settings you will need to edit are as below:
STATIC_URL = 'static/'
OTP_LOGIN_URL= '/' #Redirect page if OTP fails.

SMTP relay configuration for sending OTP emails. Note: if you are using Gmail and have enabled 2FA, use an app password.
EMAIL_USE_TLS = True
EMAIL_HOST = 'smtp.gmail.com'
EMAIL_HOST_USER = 'urID@gmail.com'
EMAIL_HOST_PASSWORD = 'Your Password'
EMAIL_PORT = 587
DEFAULT_FROM_EMAIL="(Sender) Sender Name <urID@gmail.com>"
PYADSELFSERVICE_DCFQDN= #IP/FQDN of the Domain Controller
PYADSELFSERVICE_DCPORT = '636' #It should always be LDAPs.
PYADSELFSERVICE_DOMAINFQDN= #FQDN of the Active Directory Domain
PYADSELFSERVICE_USERNAME= #Username from section2 above
PYADSELFSERVICE_PASS= #Password for the same username
PYADSELFSERVICE_BASEDN= #Base DN of the domain. Ex: DC=example,DC=local
PYADSELFSERVICE_ATTR2 = 'mail'
In this example, the email address stored in the 'mail' attribute is used as the recipient of the OTP email. If you have MS Exchange or another email platform integrated with AD authentication, you might want to send the OTP to an alternate email the user can still reach, such as their Gmail or Yahoo account. You may need to repurpose an existing attribute to hold the alternate email address, or add a new attribute to AD for it.

PYADSELFSERVICE_LOGPATH='/var/log/pyadselfservice/'
PYADSELFSERVICE_STOUT= #Session time-out in seconds. DO NOT include quotes.
PYADSELFSERVICE_ATTR3 = #An AD attribute of your choice for validation, e.g. postalCode or mobile
Note: The information stored in this AD attribute is used for first factor authentication.


Please create the log path manually and assign necessary permissions. Run this command
sudo mkdir /var/log/pyadselfservice/ && sudo chown -R www-data:www-data /var/log/pyadselfservice/

Lastly, before you run the server, please run this command.
sudo python3 /opt/pyadselfservice/manage.py migrate


Section 4:
You may use Django's runserver command to start the server.
sudo python3 /opt/pyadselfservice/manage.py runserver 0.0.0.0:8000


Or 

You may configure Apache with mod_wsgi.

Refer to the Django documentation on how to use Django with Apache and mod_wsgi; you may use the configuration file below as a reference.
sudo vi /etc/apache2/sites-enabled/000-default.conf
WSGIScriptAlias / /opt/pyadselfservice/pyadselfservice/wsgi.py
WSGIPythonPath /opt/pyadselfservice/

<VirtualHost *:80>
        Alias /static/ /opt/pyadselfservice/pyadselfservice/static/
        <Directory /opt/pyadselfservice/pyadselfservice/static/>
            Require all granted
        </Directory>
        <Directory /opt/pyadselfservice/pyadselfservice/>
           <Files wsgi.py>
             Require all granted
           </Files>
        </Directory>
        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined

</VirtualHost>

Restart the Apache service and you are good to go. Remember to always restart Apache after making any changes in settings.py.

How to analyze the logs:
There are two log files in the configured log path, i.e., debug.log and django_request.log.
debug.log stores all LDAP transaction logs, whereas django_request.log stores HTTP exceptions.

During the password reset process, if you get an error message that says "Your password could not be changed. The password you entered does not comply with the password policy. Please go back, enter a valid password and try again", then look through debug.log.


1. The debug.log shows  2017-02-27 12:51:13,411 DEBUG log 7701 139735603336960 log PROTOCOL:MODIFY response <[{'description': 'unwillingToPerform', 'type': 'modifyResponse', 'referrals': None, 'dn': '', 'result': 53, 'message': '0000052D: SvcErr: DSID-031A12D2, problem 5003 (WILL_NOT_PERFORM), data 0\n\x00'}]> received via <ldaps://10.xx.xx.xxx:636 - ssl - user: prevuser@domain.local - not lazy - bound - open - <local: 10xx.xx.xx:44486 - remote: 10.xxx.xx.xx:636> - tls not started - listening - SyncStrategy - internal decoder>
The error 'unwillingToPerform' appears if a user submits a password that does not comply with the password policy applied to your AD through a Group Policy Object.
It also appears if the account whose password is being changed has more privileges than the service account configured in Section 2 above, e.g. when you try to change the password of an administrator account with this tool but the service account does not have administrator privileges. This behavior is by Active Directory design, for better security.

2. The debug.logs show 2017-02-27 12:50:11,321 DEBUG log 7701 139735720654592 log PROTOCOL:MODIFY response <[{'description': 'constraintViolation', 'type': 'modifyResponse', 'referrals': None, 'dn': '', 'result': 19, 'message': '0000052D: AtrErr: DSID-03191083, #1:\n\t0: 0000052D: DSID-03191083, problem 1005 (CONSTRAINT_ATT_TYPE), data 0, Att 9005a (unicodePwd)\n\x00'}]> received via <ldaps://10.xxx.xx.xx:636 - ssl - user: username@domain.local - not lazy - bound - open - <local: 10xx.xx.xx:44464 - remote: 10.xxx.xx.xx:636> - tls not started - listening - SyncStrategy - internal decoder>
The error 'constraintViolation' appears if a user tries to reuse an existing password and "Enforce password history" under Password Policy in the AD Group Policy is configured to remember previous passwords. This behavior is by AD design, for security.
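Both failure modes surface as result codes in debug.log, so a small script can summarize them. A hedged sketch; the log line format assumed here matches the samples above:

```python
import re
from collections import Counter

# Pulls the result description out of ldap3's MODIFY response lines,
# e.g. ... 'description': 'unwillingToPerform', ... 'result': 53, ...
PATTERN = re.compile(r"'description': '(\w+)'.*?'result': (\d+)")

def summarize(log_text):
    """Count the LDAP result descriptions seen in a debug.log dump."""
    return Counter(m.group(1) for m in PATTERN.finditer(log_text))

sample = (
    "2017-02-27 12:51:13,411 DEBUG ... 'description': 'unwillingToPerform', "
    "'type': 'modifyResponse', 'result': 53, ...\n"
    "2017-02-27 12:50:11,321 DEBUG ... 'description': 'constraintViolation', "
    "'type': 'modifyResponse', 'result': 19, ...\n"
)
counts = summarize(sample)
print(counts["unwillingToPerform"], counts["constraintViolation"])  # 1 1
```

A spike of one description in the counts tells you immediately whether users are hitting the password policy or the permission boundary.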

Tuesday 22 March 2016

Active/Passive Cluster with Pacemaker and GFS2 on Centos 7

This time I would like to share the procedure I followed to set up a Tomcat cluster on CentOS 7. It is a bit tricky, but an hour's job if you know the right procedure.

The environment I used was two CentOS 7 VMs running on VMware vSphere.
Node1: 192.168.1.10
Node2: 192.168.1.11
VMWare vSphere Server: 192.168.1.15

Here I have used CLVM with GFS2 to store application data that needs to be accessed from both nodes for load balancing or fail-over. For this to work, you need shared raw storage such as a SAN. I don't have a SAN in my test lab, so I used DRBD instead. You may skip Sections 2 and 3 if you have shared storage.

Section 1: DNS
Set the host name of each server as per the cluster configuration; here we use node1 and node2. Set /etc/hostname with the node name on each server and reboot after the change.
Before you begin with the cluster setup, make sure the /etc/hosts file has the right entries. Pacemaker is highly dependent on name resolution, so correct entries in /etc/hosts are key to a successful configuration. Here is what my /etc/hosts looks like:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.10 node1
192.168.1.11 node2


Section 2: Add an additional hard disk to both nodes. These disks will be used by DRBD. Do this on both nodes.


Here is my configuration for the new Hard disk. I am using 16GB in each node.


Section 3: Setup DRBD
Run these commands on both nodes.

$ rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org 
$ rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm 
$ yum install -y kmod-drbd84 drbd84-utils bash-completion
$ fdisk -l
In my case the newly added Hard Disk is detected as /dev/sdb

$ vi /etc/drbd.d/clusterdisk.res
resource clusterdisk {
        protocol C;
        startup {
                become-primary-on both;
        }
        disk {
                fencing resource-and-stonith;
                resync-rate 500M;
        }
        handlers {
                fence-peer              "/usr/lib/drbd/crm-fence-peer.sh";
                after-resync-target     "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "233hgfghfGHFHGF5665465465465";
                timeout 180;
                ping-int 3;
                ping-timeout 9;
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        on node1 {
                device /dev/drbd1;
                disk /dev/sdb;
                address 192.168.1.10:7788;
                meta-disk internal;
        }
        on node2 {
                device /dev/drbd1;
                disk /dev/sdb;
                address 192.168.1.11:7788;
                meta-disk internal;
        }
}

$ drbdadm create-md clusterdisk
$ drbdadm up clusterdisk
$ service drbd restart 
Run this command on node1 only.
$ drbdadm primary --force clusterdisk
$ chkconfig drbd on
$ service drbd status
Wait until the status shows UpToDate on both nodes. Note that one node shows "Secondary"; this means you cannot perform any file operation on this disk on node2 until it becomes Primary.
[root@node1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
m:res       cs         ro               ds                 p  mounted  fstype
1:clusterdisk  Connected  Primary/Secondary  UpToDate/UpToDate  C
Now go to Node2 and run this command
$ drbdadm primary --force clusterdisk
[root@node1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
m:res       cs         ro               ds                 p  mounted  fstype
1:clusterdisk  Connected  Primary/Primary  UpToDate/UpToDate  C

Section 4: Configure Pacemaker and CLVM
Run these commands on both the nodes.
$ yum install -y pacemaker pcs psmisc policycoreutils-python lvm2-cluster gfs2-utils fence-agents-all
It is important that we disable SELinux and iptables during the setup; any network obstruction will create problems for the cluster. Note that the commands below disable SELinux and flush iptables only temporarily. You will need to create proper exceptions or disable them permanently later.
$ setenforce 0
$ iptables --flush
$ systemctl start pcsd.service
$ systemctl enable pcsd.service
Set the password for the hacluster account on both nodes. Use the same password on both.
$ passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
$ pcs cluster auth node1 node2
Username: hacluster
Password:
node1: Authorized
node2: Authorized
$ pcs cluster setup --name mycluster node1 node2
[root@node1 ~]# pcs cluster start --all
node1: Starting Cluster...
node2: Starting Cluster...

Now you may start creating the resources. First we will create a fencing device for STONITH. You may create any fencing device depending on the available resources; here I have created a VMware SOAP fencing device. Run these commands on node1 only:

$ pcs stonith create vmware_soap fence_vmware_soap ipaddr=192.168.1.15 ipport=443 ssl_insecure=1 inet4_only=1 login="root" passwd="vmwareorootpass" action=reboot pcmk_host_list="VM1_node1,VM2_node2" power_wait=3 op monitor interval=60s
In the above command, ipaddr is the IP of the vSphere server. For login= I have used the root login of vSphere; it is recommended that you create a separate user account with the minimum permissions possible. pcmk_host_list= contains the names of the VMs in vSphere.

[root@node1 ~]# pcs status
Cluster name: mycluster
Last updated: Tue Mar 22 16:08:14 2016          Last change: Fri Mar 18 11:33:07 2016 by root via cibadmin on gfs2
Stack: corosync
Current DC: gfs2 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 1 resource configured

Online: [ node1 node2 ]

Full list of resources:

 vmware_soap    (stonith:fence_vmware_soap):    Started node1

PCSD Status:
  node1: Online
  node2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
Once the vmware_soap resource has started, you may proceed with creating the rest of the resources. Run these commands on node1 only.

$ pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
$ pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence clone interleave=true ordered=true
Run the 'pcs status' command on any node and wait until the resources have started. Once the dlm and clvmd resources are running on both nodes, you may create the clustered volume.
$ pvcreate /dev/drbd1
$ vgcreate -Ay -cy cluster_vg /dev/drbd1
$ lvcreate -L5G -n cluster_lv cluster_vg
$ mkfs.gfs2 -j2 -p lock_dlm -t mycluster:fs-data /dev/cluster_vg/cluster_lv
The next steps depend on the purpose of the cluster and the application you want to configure. A large number of applications are supported on Pacemaker; I am using Tomcat in my test lab.
For the resource below, make sure you use a free IP from the same subnet as your server LAN.

$ pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.1.100 cidr_netmask=32 op monitor interval=30s
Here I have created a mount point for storing the variable files of my Tomcat application. You can use this mount point to store the website home folder if you are planning to use Apache.
$ pcs resource create fs-data Filesystem device="/dev/cluster_vg/cluster_lv" directory="/var/apphome/application-data/varfiles/" fstype="gfs2" "options=noatime" op monitor interval=10s on-fail=fence clone interleave=true
$ pcs resource create tomcat ocf:heartbeat:tomcat params java_home="/opt/java/jre/" catalina_home="/opt/tomcat/" tomcat_user="tomcat" catalina_pid="/opt/tomcat/work/catalina.pid" op monitor interval="30s" on-fail=fence
In the above command, note java_home: change the paths to match your environment. If you need Tomcat to run on both nodes, just add 'clone interleave=true ordered=true' at the end of the command. Ensure that you have installed Tomcat and a JRE before creating this resource.

Now let's create constraints so that the resources start in the right order and on the right node.

$ pcs constraint order start dlm-clone then clvmd-clone
$ pcs constraint order start clvmd-clone then fs-data-clone
$ pcs constraint order start fs-data-clone then tomcat

If you need Apache, just run the command below. If you need Apache to run on both nodes, add 'clone interleave=true ordered=true' at the end of the command.

$ pcs resource create Apachehttpd ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf op monitor timeout="1m" interval="10"
Now we need to tell Pacemaker to start the Apache service on the same node as Tomcat (this doesn't apply if you are using an Active/Active cluster).

$ pcs constraint colocation add Apachehttpd with tomcat
If everything goes well, you should be able to see something like this in 'pcs status'.
[root@node1 ~]# pcs status
Cluster name: mycluster
Last updated: Tue Mar 22 16:36:08 2016          Last change: Fri Mar 18 11:33:07 2016 by root via cibadmin on gfs2
Stack: corosync
Current DC: gfs2 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 9 resources configured

Online: [ node1 node2 ]

Full list of resources:
 
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started node1
 vmware_soap    (stonith:fence_vmware_soap):    Started node1
 Clone Set: dlm-clone [dlm]
     Started: [ node1 node2 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ node1 node2 ]
 Clone Set: JiraFS-clone [JiraFS]
     Started: [ node1 node2 ]
 JiraService    (ocf::heartbeat:tomcat):        Started node1
 Apachehttpd    (ocf::heartbeat:apache):        Started node1
PCSD Status:
  node1: Online
  node2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
The above setup with DRBD is for testing, not production. If you are planning a production deployment, make sure you have decent shared storage; the design is not very stable with DRBD.
To front Tomcat with Apache you have two options, mod_proxy and mod_jk. Pick the one that fits your requirements and configure it accordingly.
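If you go with mod_proxy, the Apache side is only a few lines. This is an untested sketch with assumed values: Tomcat listening on 127.0.0.1:8080, and mod_proxy plus mod_proxy_http already enabled:

```apache
<VirtualHost *:80>
    # Forward everything to the local Tomcat instance
    ProxyPreserveHost On
    ProxyPass        / http://127.0.0.1:8080/
    ProxyPassReverse / http://127.0.0.1:8080/
</VirtualHost>
```

Since Pacemaker colocates Apachehttpd with tomcat, the proxy target can stay 127.0.0.1 on whichever node is active.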

Friday 8 January 2016

Automate backup with python script

Automate Backup of a Folder

You may have an important folder on your server that you want to back up regularly. In this blog, I will explain how you can automate the backup with a simple Python script. I have used Ubuntu Linux here, but you can use this method on any flavor of Linux, and on Windows as well; for Windows most of the steps change but the logic remains the same.

Step 1:
Download python script pyCompry.py from https://sourceforge.net/projects/pycompry/

Step 2:
You will need Python 3.3 or later. Install Python 3 (if it is not installed already) on the server. Type the command below in the shell.

apt-get install python3

To confirm that Python is installed correctly, type "python3 --version" on the CLI; it should return the Python version number.

Step 3:
Now locate the folder that you would like to back up regularly. Ensure that you have the necessary permissions on the source and destination paths.

python3 ~/pyCompry.py -h
(This shows the help.)

Run below command to manually execute the backup (please change -i and -o with actual paths)

python3 ~/pyCompry.py -i /var/somesourcepath/ -o /mnt/somemountedremotepath/



Step 4:
Now you may use crontab to schedule this script and automate the backup. Type:

sudo crontab -e

Now in the crontab, enter the below line. Edit the line to correct the timing you want to schedule and the actual paths.

45 04 * * * python3 ~/pyCompry.py -i /var/somesourcepath/ -o /mnt/somemountedremotepath/

The above line configures crontab to run the script at 04:45 every day. Edit the first two fields (minute and hour) to whatever time you want it to run.


You may mount a remote path from your DR server on this server and schedule backups to it. Since this script compresses the data, it might save you some time on transfers over slow networks.
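For reference, the core of such a compress-and-copy backup can be written in a few lines of standard-library Python. This is a generic sketch, not the actual pyCompry implementation; paths and naming are placeholders:

```python
import tarfile
import time
from pathlib import Path

def backup_folder(source_dir, dest_dir):
    """Compress source_dir into a timestamped .tar.gz archive under dest_dir."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(str(source), arcname=source.name)
    return archive

# Example (placeholder paths, as in the cron line above):
# backup_folder("/var/somesourcepath", "/mnt/somemountedremotepath")
```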

For more details, refer to the wiki:
https://sourceforge.net/p/pycompry/wiki/Home/

Tuesday 16 September 2014

Install OTRS on Centos with Oracle as Backend Database

In this blog, I want to share my experience of deploying OTRS with Oracle as the backend database. During this deployment I initially ran into multiple errors and obstructions, but I managed to fix everything by referring to various guides and forums. I am writing this blog to share the correct method of installing OTRS with Oracle as the backend database, based on my learnings.

Initially I couldn't get OTRS running. I was getting these errors in /var/log/httpd/error_log:

"[error] install_driver(Oracle) failed: Can't load '/usr/local/lib64/perl5/auto/DBD/Oracle/Oracle.so' for module DBD::Oracle: libocci.so.12.1: cannot open shared object file: No such file or directory at /usr/lib64/perl5/DynaLoader.pm line..."

AND

"[error] install_driver(Oracle) failed: Attempt to reload DBD/Oracle.pm aborted.\nCompilation failed in require at (eval 169) line 3.\n\n at /opt/otrs//Kernel/System/DB.pm"

The correct method of deploying OTRS with Oracle Database.

The packages you will need here are Oracle Instant Client 12.1.0.2, OTRS 3.3, httpd and required httpd perl modules. Refer to OTRS deployment guide for information on required httpd modules.

Download and install the Oracle Instant Client packages. You will need an active Oracle login to download them.

oracle-instantclient12.1-basic-12.1.0.2.0-1.x86_64.rpm
oracle-instantclient12.1-devel-12.1.0.2.0-1.x86_64.rpm
oracle-instantclient12.1-sqlplus-12.1.0.2.0-1.x86_64.rpm

To Install these packages, login as root and run this command
$ rpm -Uvh oracle-instantclient12*.rpm

Create oracle.sh under /etc/profile.d/, with below environment variable.
$ vi /etc/profile.d/oracle.sh 
#Add these environment variables.
export ORACLE_HOME=/usr/lib/oracle/12.1/client64
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin
export PATH

Execute this script to set the required environment variables in your current session.
$ sh /etc/profile.d/oracle.sh

Verify if the Oracle Instant Client is functioning.
$ su - root
$ echo $ORACLE_HOME
#It should show you the path /usr/lib/oracle/12.1/client64
$ sqlplus /nolog

If you see a SQL prompt, the Oracle client is working. Otherwise, you probably downloaded the wrong Oracle Instant Client.

Next step is to install DBD::Oracle module.
$ yum install perl-DBI
$ wget http://search.cpan.org/CPAN/authors/id/P/PY/PYTHIAN/DBD-Oracle-1.74.tar.gz
$ tar -xvf DBD-Oracle-1.74.tar.gz
$ cd DBD-Oracle-1.74
$ perl Makefile.PL -V 12.1
$ make install

After the installation of DBD::Oracle, create necessary links and cache for DBD libraries.
$ vi /etc/ld.so.conf.d/oracle.conf
#Insert this line
/usr/lib/oracle/12.1/client64/lib

Then run this command
$ ldconfig -v
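With the driver linked, point OTRS at Oracle in Kernel/Config.pm. A rough sketch; the host, SID, and credentials below are placeholders for your own values:

```perl
# Kernel/Config.pm -- Oracle backend settings (placeholder values)
$Self->{DatabaseHost} = 'oradb.example.local';
$Self->{Database}     = 'XE';              # the Oracle SID
$Self->{DatabaseUser} = 'otrs';
$Self->{DatabasePw}   = 'some-password';
$Self->{DatabaseDSN}  = "DBI:Oracle:sid=$Self->{Database};host=$Self->{DatabaseHost};port=1521;";
```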

Now restart/start httpd
$ /etc/init.d/httpd restart

This should bring up the OTRS. Please like my post if this helped you.

Thursday 11 September 2014

Enable Replication on MongoDB Database with very little downtime

In this blog, I want to share how we enabled replication on a 3.5TB MongoDB database with very little downtime. You may wonder: why not just enable replication by setting a replica set in the MongoDB configuration, which would need no downtime at all? The problem is that the initial synchronization would never succeed, given the huge data size and the limited oplog. So our only option was to manually synchronize the database files, bring up the secondary server, and then enable replication. Stopping the primary server for the copy would have required huge downtime, given the size of the database and the bandwidth, so we first had to build a strategy to minimize downtime during the copy process.

We had to choose the right tool for copying the database files. We tried multiple options such as NFS, rsync, etc., but finally chose SCP. We first created a list of all the files in the source folder and then created 4 scripts to copy them file by file. We named these scripts scpcp1.sh, scpcp2.sh, scpcp3.sh and scpcp4.sh, each containing SCP commands. Four scripts were created so that 4 concurrent instances of SCP could run, simply to increase the number of threads.

The procedure is as follows. Before you begin, make sure the primary MongoDB server is running in standalone mode; to do so, comment out the replica set (replSet) line in /etc/mongod.conf.
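With the old INI-style mongod.conf this just means commenting out the replica-set line, e.g. (the replica-set name rs0 is a placeholder):

```ini
# /etc/mongod.conf -- run standalone while the files are copied
# replSet = rs0
```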

Here is one example SCP command in these scripts. 

 scp -r -p -c arcfour128 root@192.168.1.150:/mongodb/database/mgdbfile.1 /mongodb/database/.

-p preserves the file's creation, modification and access times. Without this argument, the next rsync would copy all the files again.
-c arcfour128 changes the cipher to a faster one, which drastically speeds up the copy.

Please note that you will need to add your SSH key to the source server to avoid password prompts; you can do this with ssh-copy-id.

Once the SSH keys are added, we can run the scripts as background processes: start each with an ampersand, then disown them.
$ sh scpcp1.sh &
$ sh scpcp2.sh &
$ sh scpcp3.sh &
$ sh scpcp4.sh &
$ disown -a

While the copy was in progress, we had to plan for rsync. I am generally not in favor of rsync because it may make changes to the source files, and we had to be sure there would be no writes to the source during the rsync. To address this concern, we decided to create a new user on the source server with read-only access to the source path.

First we created a normal user in source server as rsyncuser.
$ useradd rsyncuser
$ passwd rsyncuser

Since the source path is owned by the mongod user and its permissions are 664, the newly created rsyncuser had read-only access. Now we were ready to rsync.

Copying 3.5TB over SCP took around 8 hours on a 1Gbps network.

Next we issued an rsync dry-run command to see which files had been modified at the source during those 8 hours.

rsync -avzn --del -e "ssh -c arcfour128" --log-file="/home/rsyncuser/rsync2.log" rsyncuser@192.168.1.150:/mongodb/database/ /mongodb/  

This command listed the files that were modified at the source while the copy was in progress. Remember, we started copying the database files while MongoDB was running; this was to avoid a big downtime.
Once the rsync dry run listed the file names, we copied them individually using SCP (command given above). We repeated this twice to bring the delta down to a minimum. Once the number of changed files was small, we issued db.fsyncLock() in MongoDB so that it stopped writes to the database. Caution: this command locks the database and makes it non-operational, so be sure to plan this downtime. You need to keep the mongo shell open until the rsync completes.

rsync -avz --del -e "ssh -c arcfour128" --log-file="/home/rsyncuser/rsync2.log" rsyncuser@192.168.1.150:/mongodb/database/ /mongodb/  

Once the rsync has completed, you may start the MongoDB service in replica-set mode on the secondary server, and then issue db.fsyncUnlock() on the source server in the shell that is still open.
Check the replication status by issuing rs.status() on any node.
Please like my post if this helped you.

Wednesday 10 September 2014

V2V Migrate Windows VMs from Citrix XenServer to RHEV

In this blog, I describe the procedure for migrating a Windows 2008 R2 guest VM from Citrix XenServer to the Red Hat Enterprise Virtualization (RHEV) platform. It is essentially a XenServer V2V, so this method can be used for any V2V. It is a rather safe method since no changes are made to the source VM. I have used disk cloning to transfer the VM from XenServer to RHEV. However, after the VM is transferred, the Windows guest will initially throw a blue screen error; I describe the workaround for the blue screen so that the guest then runs normally. The cloning tool used here is Clonezilla. The Clonezilla live CD can have a blank-screen issue on XenServer, so I used the Arch Linux live CD, which is bundled with Clonezilla.

The step-by-step procedure to convert a Windows 2008 R2 guest VM from Citrix XenServer to Red Hat RHEV is as follows.

Step 1: Download the Arch Linux live CD ISO image and upload it to your XenServer ISO storage. Shut down the guest VM you want to migrate and boot it from the Arch Linux live CD. If you don't have an ISO mount created in XenServer, you may follow these steps to add one. Note: choose "Boot Arch Linux (i686)" while booting; booting into x86_64 might fail or end in a blank screen.

Step 2: Once booted into the live CD, type clonezilla and press Enter. This will take you through the disk-cloning process. Follow these steps to create a clone image of your Windows guest. Skip the initial steps in the Clonezilla guide and start from 'Choose "device-image" option'. It is too tedious to map an external image disk in the hypervisor and then mount it in Arch Linux, so avoid using a local disk for storing the image. Clonezilla gives you multiple options to store the image on the network; I chose ssh-server and saved it on the RHEV Manager. Creating the disk image might take several minutes depending on the size of the disk and the data inside.

Step 3: Create a new VM in RHEV. Keep the guest disk size the same as the source guest VM in XenServer. E.g., if the disk you cloned above is 500GB, create a 500GB disk in the guest VM in RHEV.

Step 4: Upload the same Arch Linux ISO image that you downloaded above to your ISO repository in RHEV. Select "Run Once", choose the Arch Linux ISO image, and boot from it.

Step 5: Once booted into the Arch Linux live CD, type clonezilla and press Enter. Follow these steps to restore the disk image created in Step 2 to the guest VM newly created in Step 3. Skip the initial steps in the Clonezilla guide and start from 'Choose "device-image" option'. Choose the image that you created in Step 2 and restore it. The restore might take several minutes.

Step 6: This step is important. After you restore the image and boot the Windows guest in RHEV, it will throw a blue screen. Don't panic: it's just that the guest VM doesn't have VirtIO SCSI drivers, so it can't detect the hard disk.

Now shut down the guest VM in RHEV. You need to start the VM again by clicking on 'Run Once'.


Mount the 'virtio-drivers' floppy image.

Step 7: Boot the guest VM into recovery mode using "Launch Startup Repair". If you don't see that option, press F8 immediately after POST and select "Repair Your Computer".

Once booted into recovery mode, select the keyboard layout and click on next.

Step 8: In "System Recovery Options", click "Load Drivers".
Select the floppy drive and browse to A:\amd64\Win2008R2\.
This will list the available drivers on the floppy disk. Select viostor.inf.

In the next screen, Choose "Red Hat VirtIO SCSI Controller" and click on "Add Drivers...".

This should detect the hard disk and show you "Windows Server 2008 R2". Select "Restore your computer ..." and click "Next".
The next step will try to detect a system restore image. Ignore the error and click "Cancel" to exit the image-restore process.
In the next screen, click on "Command Prompt" to open command window.

Step 9: In the command window, type 'diskpart' and press Enter, then type 'list volume' to see the drive letter of your Windows guest disk. The Dism command below assumes the Windows volume is E:; substitute the drive letter that diskpart reports.
Now you need to load the VirtIO drivers into your Windows guest. To do so, type the command below and press Enter.
Dism /image:E:\ /Add-Driver /driver:A:\amd64\Win2008R2 /recurse
You will see a success message once the drivers are loaded into Windows guest.

Step 10: Now exit the "Command Prompt" and reboot the Windows guest VM. After the reboot, boot the Windows guest into normal mode.
This workaround worked fine in my environment: a Windows 2008R2 guest VM with a single 100GB disk running on Citrix XenServer 6.2. This procedure should work for V2V and P2V conversions from other virtualization software to Red Hat RHEV, and it might also work for other flavors of Windows 2008 and Windows 2012; you just need to change the driver path in the command in Step 9.

Please like my post if this helped you.