• Installed and configured the Hadoop ecosystem which includes Pig, Hive, HCatalog, Cloudera Manager, Pentaho.
Hands on experience with setting up clusters on different distributions of hadoop such as, Cloudera, Hortonworks, MapR and IBM BigInsights.
• Writing Oozie workflow to schedule jobs according to the business logic and resolving dependencies and also implemented scheduler.
• Kerberized the ssh communication between the nodes in the cluster to prevent unauthorized access to the cluster.
• Installed the monitoring systems such as Nagios to checks the health of the cluster. To keep checks on crucial reports I wrote bash scripts to monitor critical components and deliver instantaneous messages.
• Used to migrate log data from server to Hadoop cluster using Flume.
• Dumped the data from one cluster to other cluster by using DISTCP, and automated the dumping procedure using shell scripts.
• I had the responsibility to filter out the noise from the obtained data and transfer the required fields to the Hive tables