Principal Engineer and data infrastructure leader with 15 years of experience building and scaling high-performance distributed data systems.

Experience

Pinterest, Principal Engineer

2017 — Now

San Francisco Bay Area

Led cross-functional strategy for Pinterest’s Big Data Compute and Storage Platforms: 500+ PB on S3, 100K+ tables, 400K+ daily compute jobs (20K+ Spark, 1K+ Trino nodes).

Apache Iceberg Adoption (0 to 1)

Championed and led Iceberg platform adoption, securing investment and scaling the team (1 to 5+ engineers) by demonstrating high business value (improved quality/governance).

* Engineered integration tooling: Custom Spark Catalog, Thrift support, and Hive-to-Iceberg auto-migration service.

* Enhanced Iceberg: Added support for two-level Parquet lists/maps and long-running HMS transactions.

* Scalable Deletions: Enabled Row-Level Data Deletions, boosting capacity by 10x and cutting compute cost while ensuring compliance.

Cross-Platform Initiatives & ML Acceleration

* Governance: Led table/workflow governance to reduce cost and improve data lake quality.

* Modernization: Contributed to platform modernization via Moka Project (Spark on EKS) for large-scale data processing efficiency.

* ML Enablement: Contributed Fast Feature Backfill support, drastically accelerating ML feature iterations and model time-to-market.

Spark SQL Platform Leadership & Growth (0 to 70K+ Jobs)

Founded and led the Spark SQL platform (0 to 1), establishing it as the primary data processing engine. Drove adoption to over 70,000 jobs per day and scaled the team (1 to 10+ engineers).

* Engineered E2E infrastructure (Terraform/Puppet, monitoring) with key features: Scalable direct S3 committer, split-splitting for compressed codecs, custom Thrift schemas, auto-tuning, and Apache Livy integration.

* Security: Built a custom, high-scale Fine-Grained Access Control (FGAC) system on the Big Data Platform (BDP) using STS tokens to meet stringent security and scale requirements.

Big Data Platform (BDP) Foundation & Scaling

Founding member of the team that created the in-house BDP, transitioning off third-party vendors. Specifically led the build and scale of the SQL platform (Hive, Parquet, Presto).

The Apache Software Foundation, Apache Gravitino PMC member

2025 — Now

The Apache Software Foundation, Apache Sentry committer

2015 — Now

I am a committer on Apache Sentry, a highly modular system for fine-grained, role-based authorization.

Cloudera, Software Engineer

2014 — 2017

San Francisco Bay Area

Worked on CDH, an open source platform that helps enterprises meet their Big Data needs. With a focus on real-time use cases, I worked mostly on Apache Kafka, a distributed publish-subscribe messaging system. Security is central to compliance and enterprise readiness, which led me to Apache Sentry, a role-based authorization system. Also contributed in part to a couple of other open source projects, such as Apache Hive and Apache Parquet.

Schlumberger, Software Engineer

2012 — 2014

Adapted high-volume seismic data computations for Intel's Xeon Phi architecture.

Evaluated Intel Xeon Phi and NVIDIA GPUs for running the adapted computations on Schlumberger's High Performance Computing clusters. The evaluation focused primarily on performance gain versus the cost of additional hardware.

Developed ToolScope Framework, a pluggable software framework for Schlumberger tools.

Education

The Ohio State University

M.S.

Indian Institute of Technology (Indian School of Mines), Dhanbad

B.Tech.