Experience
2020 — Now
Menlo Park, California, United States
Technical lead in the AI Infra team:
• Context data services for GenAI apps
• Feature engineering infrastructure
• ML developer experience
• Data science notebook platform
Technical lead in the data platform team:
• Rearchitected events ingestion pipelines to handle heavy spikes in trading activity
• Evolved Spark execution backends into multi-cluster architecture with autoscaling for high availability, reliability, scalability and cost efficiency
• Built an orchestration service for Spark batch pipelines to improve integration and security and to enable infrastructure evolution; migrated the entire batch workload onto this service
• Onboarded backend teams to use our batch processing infrastructure, with stronger availability and SLAs than typical analytical workloads
• Designed, and am currently leading, the migration of all Spark workload execution to Kubernetes infrastructure
• Jump-started work on building aggregation analytics infrastructure based on Apache Pinot
• Worked on cost savings initiatives and helped significantly reduce Spark infrastructure costs
• Spearheading the effort to modernize Robinhood's Airflow-based workflow platform; presented our work at Airflow Summit 2024
• Led multiple vendor evaluations for infrastructure solutions
2017 — 2020
San Francisco Bay Area
Technical lead in the data infrastructure team:
• Built a self-service streaming compute platform based on Apache Flink and SQL. Subject matter expert in Apache Flink internals and usage recommendations. Presented the work at Flink Forward 2019
• Managed the Apache Druid ecosystem as the backend for a self-service metrics framework; details presented in an Airbnb blog post
• Apache Spark infrastructure: worked on unifying batch compute on the Apache Spark platform; delivered a query auditing feature; worked on a metrics platform for all Spark jobs, optimization of resource-intensive Spark jobs, and solutions to enhance data engineer productivity
2016 — 2017
Los Altos, CA
Data infrastructure, web crawling & data extraction
2013 — 2016
Redwood City, CA
Developed and maintained big data and real-time data infrastructure.
Projects I have led or worked on include cross-cluster Hive dataset replication, authorization, performance instrumentation and improvements in our Hive query infrastructure, and writing modeling data pipelines in Spark.
Most recently led the development and architecture of an Apache Spark based data pipeline framework, from inception and prototype through delivery, as its technical lead. The framework is now the main engine powering all data pipelines in our model building infrastructure. Key elements I was responsible for include:
• Defined the roadmap for the project
• Built the prototype and defined the overall architecture and design of the infrastructure
• All things performance: identified performance bottlenecks and added features and optimizations to the framework to fix them
• Wrote large parts of the framework as well as data pipelines on top of it
• Grew and supported the engineering team that delivered this to production
• Planned and tracked project milestones toward delivery
• Served as the go-to person for all Apache Spark related issues across the engineering team
2007 — 2013
Redmond
Core developer in the SQL Server Manageability team, working on various layers of the stack including web services, framework, and UI. Worked on SQL Server and SQL Azure releases.
Implemented web services in C# that provide database management functionality for SQL Azure. In particular, implemented the OData web service protocol, an authentication protocol and its associated encryption scheme, and worked on service instrumentation and deployment.
Education
Cornell University
BS