Recognia, Ottawa, Canada
October 2011 to July 2012
- Self-taught the open-source technologies used at Recognia in a short span of time. These technologies included Linux CentOS, PostgreSQL, MySQL, pgbouncer, pgfouine, SkyTools (pgq and londiste), Cacti, Munin and Nagios.
- Reviewed the existing monitoring graphs in Cacti and identified key missing ones. Set up CPU and I/O monitoring for the entire server farm and trained the development team on Cacti.
- Produced a number of documents, including Server Inventory, Production Recovery Procedures, PostgreSQL Environment Setup, Database Backup and Recovery, and Setting up Standby Databases. Archived all obsolete documents.
- Reverse-engineered the databases and created the first draft of the company’s data models.
- Identified a major gap in the database backup strategy and coded a new set of scripts to close it. Deployed these scripts to Production and RC (Release Candidate) on a new weekly schedule.
- Revamped the process of copying production databases to the RC and Test environments, reducing downtime from 1 day to 1 hour.
- Successfully migrated an old production database server to a new blade server within the allowed 4-hour window. The blade server was built from scratch and the databases were migrated through a backup and restore operation.
- Proposed a new warm standby and disaster recovery strategy and got it approved by the architect team. Set up a test version of the solution in RC.
- Performed general DBA tasks such as diagnosing database errors, analyzing the top SQL statements, projecting database growth, resolving database replication issues with londiste, and fine-tuning database configuration to fix various performance problems.
- Played a lead role in migrating the DR databases to the Amazon Cloud. Used RightScale as the cloud management tool and built the cloud servers through RightScripts.
- Provided on-call support on a weekly rotation. On a number of occasions, helped the team resolve production issues while not on call.