I enjoy building scalable and reliable distributed systems, and have extensive experience working with Apache Spark, Apache Flink, Trino, and more. Currently, I'm leading an effort on compute convergence at LinkedIn using Apache Flink.

Experience

LinkedIn, Staff Software Engineer

2020 — Now

Mountain View, CA

Qubole, Staff Engineer

2019 — 2020

Qubole, Member of Technical Staff

2016 — 2019

Mountain View

● Fixed Spark CBO statistics estimation for the Aggregate and Sort operators, giving a speedup of almost 2x on select queries such as TPC-DS Q83. Contributed back to open source (commit id b1857a).

● Fixed a deadlock in Spark’s UnsafeExternalSorter that affected one of the largest Qubole customer workloads. Contributed back to open source (commit id 6c4552c6).

● Spark S3 Select connector for Qubole Spark that automatically pushes down projections and filters for CSV and JSON; on TPC-DS benchmarks the geometric-mean speedup was 2.9x and the maximum speedup 5x (Blog: https://www.qubole.com/blog/amazon-s3-select-integration/)

● Serverless Spark on AWS Lambda: Spark executors run entirely as Lambda functions, with S3 as the external storage for shuffle data (Blog: https://www.qubole.com/blog/spark-on-aws-lambda/)

● Worked on Qubole Spark autoscaling based on stage progress, with support for defining pluggable, custom autoscaling policies.

● Implemented workload-based scaling limits using Apache YARN’s Fair Scheduler queue limits.

● Implemented HDFS autoscaling, which scales up nodes based on DFS disk capacity and incoming data velocity.

● Mentored new grads and interns; reviewed PRs; handled Spark version upgrades, on-call duty, and customer issue troubleshooting.

MapR Technologies, Software Engineer

2014 — 2015

San Jose

● Worked on a real-time performance monitoring and troubleshooting dashboard for large MapR Hadoop clusters; contributed from the POC stage.

● Worked on Apache projects like Drill, Spark and Hive; contributed to Apache Drill; participated in user/dev groups of other open source projects like Apache Samza.

● Developed real-time log analysis for MapR Hadoop clusters using the ELK stack.

Intel Corporation, Software Engineer Intern

2013 — 2014

Chandler

● Designed and developed a proof of concept for "Machine Learning as a Service", a cloud-based framework.

● The framework exposes machine learning algorithms available in various packages (Weka, SciPy, NumPy, Mahout, etc.) as web services.

● Developed visualizations using D3.js to analyze data samples over a timeline graph.

● Set up a single node Apache Hadoop cluster to demonstrate the idea.

Education

Anna University Chennai

Bachelor of Engineering (B.E.)

Arizona State University

Masters in Computer Science