# Ashish Singh

> Principal Engineer @ Pinterest

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/ashishsingh

Principal Engineer and data infrastructure leader with 15 years of experience building and scaling high-performance distributed data systems. At Pinterest, I spearheaded the development of the Big Data query and storage platforms from inception (0-to-1), leveraging technologies like Apache Iceberg, Apache Spark, Apache Parquet and Trino. I maximize team impact by mentoring and scaling senior technical leads, growing organizational capabilities multiple-fold. An active industry voice and dedicated open-source contributor (Apache Gravitino PMC member/committer), I frequently share insights at major forums, including AWS re:Invent, Iceberg Summit, Subsurface, Data + AI Summit, ApacheCon, PrestoCon and several popular Bay Area meetups.

## Work Experience
### Principal Engineer @ Pinterest
Jan 2017 – Present | San Francisco Bay Area
Led cross-functional strategy for Pinterest’s Big Data Compute and Storage Platforms: 500+ PB on S3, 100K+ tables, 400K+ daily compute jobs (20K+ Spark, 1K+ Trino nodes).

---
Apache Iceberg Adoption (0 to 1)

Championed and led Iceberg platform adoption, securing investment and scaling the team (1 to 5+ engineers) by demonstrating high business value (improved quality/governance).
* Engineered integration tooling: Custom Spark Catalog, Thrift support, and Hive-to-Iceberg auto-migration service.
* Enhanced Iceberg: Added support for two-level Parquet lists/maps and long-running HMS transactions.
* Scalable Deletions: Enabled Row-Level Data Deletions, boosting capacity by 10x and cutting compute cost while ensuring compliance.
---
Cross-Platform Initiatives & ML Acceleration

* Governance: Led table/workflow governance to reduce cost and improve data lake quality.
* Modernization: Contributed to platform modernization via Moka Project (Spark on EKS) for large-scale data processing efficiency.
* ML Enablement: Contributed Fast Feature Backfill support, drastically accelerating ML feature iterations and model time-to-market.
---
Spark SQL Platform Leadership & Growth (0 to 70K+ Jobs)

Founded and led the Spark SQL platform (0 to 1), establishing it as the primary data processing engine. Drove adoption to over 70,000 jobs per day and scaled the team (1 to 10+ engineers).
* Engineered E2E infrastructure (Terraform/Puppet, monitoring) with key features: Scalable direct S3 committer, split-splitting for compressed codecs, custom Thrift schemas, auto-tuning, and Apache Livy integration.
* Security: Custom-built high-scale Fine-Grained Access Control (FGAC) on BDP using STS tokens to meet stringent security and scale requirements.
---
Big Data Platform (BDP) Foundation & Scaling

Founding member of the team that created the in-house BDP, transitioning off third-party vendors. Specifically led the build and scale of the SQL platform (Hive, Parquet, Presto).

### Apache Gravitino PMC member @ The Apache Software Foundation
Jan 2025 – Present

### Apache Sentry committer @ The Apache Software Foundation
Jan 2015 – Present
I am a committer on Apache Sentry, a highly modular system for providing fine-grained, role-based authorization.

### Software Engineer @ Cloudera
Jan 2014 – Jan 2017 | San Francisco Bay Area
Worked on CDH, an open source platform that helps enterprises meet their Big Data needs. With a focus on real-time use cases, I mostly worked on Apache Kafka, a distributed pub-sub messaging system. Security is of utmost importance for compliance and enterprise readiness, and that exposed me to Apache Sentry, a role-based authorization control system. Also contributed to a couple of other open source projects, Apache Hive and Apache Parquet.

### Software Engineer @ Schlumberger
Jan 2012 – Jan 2014
Adapted high volume seismic data computations for Intel's Xeon Phi architecture.
Evaluated Intel Xeon Phi and NVIDIA GPUs for the adapted computations on Schlumberger's High Performance Computing clusters. The evaluation focused primarily on performance gain versus the cost of additional hardware.
Developed ToolScope Framework, a pluggable software framework for Schlumberger tools.

### Graduate Research Assistant @ The Ohio State University
Jan 2010 – Jan 2012
Worked on MVAPICH2, a high-performance MPI implementation over InfiniBand, 10GigE/iWARP and RoCE, in the Network-Based Computing Laboratory.

Software developed as part of my research at OSU is included in the MVAPICH/MVAPICH2 open source software packages and is installed on some of the largest InfiniBand clusters in the world. My M.S. thesis, Optimizing All-to-all and Allgather Communications on GPGPU Clusters, was featured on hgpu.org.

### Software Engineer @ Samsung India Software Operations
Jan 2010 – Jan 2010 | Bangalore
Worked on the LTE protocol stack for Samsung mobile phones.

### Visiting Researcher @ Yuan Ze University
Jan 2009 – Jan 2009 | Zhong Li, Taiwan
Researched and studied the performance of relay stations in MMR networks (IEEE 802.16j).

### Visiting Research Associate @ Curtin University
Jan 2008 – Jan 2008 | Perth, Australia
As a visiting research associate, developed a Research Management System using Automated Literature Review. Parsed research papers to extract authors' opinions about the papers they cited, and classified those opinions as positive or negative.


## Education
### M.S. in Computer Science and Engineering
The Ohio State University

### B.Tech. in Computer Science and Engineering
Indian Institute of Technology (Indian School of Mines), Dhanbad


## Contact & Social
- LinkedIn: https://linkedin.com/in/singhkashish

---
Source: https://flows.cv/ashishsingh
JSON Resume: https://flows.cv/ashishsingh/resume.json
Last updated: 2026-04-12