Experience
2024 — Now
San Francisco, California, United States
I help build the data infrastructure and ingestion systems that power Slack's data warehouse, query engines, and data lake.
I also lead data governance and streaming efforts.
2021 — 2024
San Francisco, California, United States
Doing a few different things in this role:
1. Building out the Streaming Platform from scratch - I've written about this here: https://medium.com/p/4c3ee2568a76
2. Leading the Data Lake effort
3. Leading the Data Discovery and Governance effort for all classes of data at Chime
2020 — 2022
San Francisco Bay Area
SpaceML is an offshoot of NASA's Frontier Development Lab, focussed on solving the problem of climate change and facilitating space research with AI at scale.
Started by a group of citizen scientists and industry professionals such as myself, the team is working to automatically detect important weather phenomena such as hurricanes, wildfires, polar vortexes, melting ice caps, and others from petabytes of unlabelled data.
I lead the team that is productionizing a self-supervised reverse image search, built with deep learning, over Earth imagery collected by NASA satellites. The goal is to eventually make this product open source and available for any kind of dataset, not only earth science.
In the role of a technical advisor and lead, I'm helping design the overall system and providing technical guidance to a group of citizen scientists.
2020 — 2021
San Francisco, California, United States
I work in the Realtime Data Infrastructure team at Netflix. In this role I'm helping build the next-generation, self-service data movement and processing platform using Kafka, Flink, Mantis, and Iceberg.
Currently focussed on pioneering the next-generation, engine-agnostic SQL abstraction over Flink, Mantis, and Kafka.
Also working to improve user confidence in the platform through reliability improvements, data quality audit frameworks, end-to-end latency tracking, anomaly detection, error attribution, etc.
2017 — 2020
San Francisco, California, United States
I work in the Streaming Platform team at Lyft with a focus on improving Machine Learning through the use of realtime streaming data.
With the hope of making streaming generally accessible across the company for all kinds of use cases, I have been building a self-service platform that makes it easy for users to specify complex aggregations on streaming data declaratively. The platform takes care of all the heavy lifting behind the scenes (completely abstracted away from the user), such as data discovery, resource provisioning, scale up, scale down, bootstrapping, and schema management. This platform is currently used mainly for Machine Learning feature generation, but there are several use cases that leverage its event-driven programming capabilities.
I have presented my work at several conferences, including QCon, Flink Forward, Beam Summit, Women Who Code, and Scale by the Bay.
Scheduled to speak at Strata New York in September (although, because of COVID-19, the conference may be conducted virtually).
Education
University of Florida