• Built a petabyte-scale data lakehouse platform on GCP using Apache Spark, Iceberg, and BigLake, ingesting hundreds of millions of records daily across thousands of datasets and serving data scientists, analysts, and product teams across the organization.
• Architected a high-throughput stream processing platform on Apache Flink for low-latency workloads at scale; four teams now build and operate their own pipelines end-to-end with no dependency on the platform team.
• Designed a self-service dataset ingestion platform that cut onboarding from weeks to hours, with schema evolution, deduplication, and late-arriving data handled out of the box.
• Solved hard distributed-systems problems at scale, including spatial indexing, stream deduplication, and partition-aware data reconciliation.
• Shipped two Generative AI tools into production: a conversational onboarding agent and an LLM-powered debugging tool for on-call engineers.
• Own the technical roadmap and partner with product, leadership, and cross-functional stakeholders to align platform direction with business priorities.
• Lead design reviews, mentor engineers, and work closely with leadership on hiring and team growth.