Worked closely with the Big Data Platform team and Data Scientists to improve risk modeling.
Technologies used: Hadoop, Java 8, Spark Web Framework, Dr. Elephant, Python, MySQL, PHP.
•Implemented a tool (soon to be open-sourced) for analyzing the current state of the HDFS file system.
•Added error handling and performance tuning for fetching state from a large cluster (around 2,500 data nodes).
•Used the Java 8 Stream API and lambda expressions for parallelism.
•Implemented a benchmarking test in place of around 3,000 individual JUnit test cases.
•Developed a delete framework to remove empty files and directories on a schedule.
•Created a patch for File Crusher to merge sequences of small and tiny files into a single large file.
•Designed the architecture for a SQL connector that tracks each NameNode transaction.
•Designed a PHP dashboard that fetches details from ServiceNow, Jira, and Zabbix via Python.
•Automated Kerberos setup for new batch accounts and the ETL flow for loading Hadoop role owners.
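The Java 8 Stream API work mentioned above can be sketched as follows. This is a minimal, hypothetical example (the class and field names are illustrative, not from the actual tool): it aggregates per-host disk usage across a simulated 2,500-node cluster with a parallel stream and lambda expressions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NodeScan {
    // Hypothetical per-node stat record; field names are illustrative.
    static class NodeStat {
        final String host;
        final long usedBytes;
        NodeStat(String host, long usedBytes) {
            this.host = host;
            this.usedBytes = usedBytes;
        }
    }

    // Aggregate usage per host in parallel using the Stream API.
    static Map<String, Long> usageByHost(List<NodeStat> stats) {
        return stats.parallelStream()
                .collect(Collectors.groupingBy(s -> s.host,
                        Collectors.summingLong(s -> s.usedBytes)));
    }

    public static void main(String[] args) {
        // Simulate 2,500 data nodes spread across 10 hosts, 1 byte each.
        List<NodeStat> stats = IntStream.range(0, 2500)
                .mapToObj(i -> new NodeStat("node" + (i % 10), 1L))
                .collect(Collectors.toList());
        System.out.println(usageByHost(stats).get("node0")); // 250
    }
}
```

Switching `parallelStream()` back to `stream()` gives the sequential version; the collector is unchanged, which is what makes the Stream API convenient for this kind of fan-out aggregation.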