• Designed and built Whatnot's PlanetScale CDC pipeline, a real-time Change Data Capture system replicating production MySQL (Vitess) data to Kafka and Snowflake via Debezium and Kafka Connect. Authored the core RFC and technology evaluation, deployed the Strimzi-based Kafka Connect framework on Kubernetes with Helm and Argo CD, designed a declarative Data Contract configuration system embedded in schema definitions, and built connector deployment automation — enabling production tables to be replicated for analytics, search indexing, and business intelligence.
• Engineered a Flink-based real-time event distributor that routes analytics events to dedicated Kafka topics per downstream consumer, so each consumer processes only the events it needs, eliminating event delivery delays, improving reliability through infrastructure-level isolation, and reducing Kafka network egress by 50%.
• Re-architected the analytics events pipeline to shift event preprocessing upstream into a real-time Flink application, reducing event ingestion delays into Snowflake by up to 20 minutes and unifying preprocessing logic across all event consumers.
• Scaled real-time event pipelines to support 50x current throughput and cut Kafka infrastructure costs by 80% by implementing message compression and partition scaling.
• Designed and implemented a partitioned, declarative framework for building reusable ML training datasets, reducing compute overhead, eliminating data duplication via Snowflake external tables, and accelerating ML feature iteration.
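The Kafka cost reduction above comes largely from batch-level message compression. A minimal sketch of why this works, using hypothetical analytics events (payload shape and the resulting ratio are illustrative only; in a real Kafka producer this is enabled via the `compression.type` setting rather than manual gzip):

```python
import gzip
import json

# Hypothetical analytics events; real payloads and savings will differ.
events = [
    {"event": "product_view", "user_id": i, "listing_id": i * 7, "ts": 1700000000 + i}
    for i in range(1000)
]

# Kafka producers compress whole record batches; the repetitive JSON keys
# shared across a batch compress very well.
raw = "\n".join(json.dumps(e) for e in events).encode()
compressed = gzip.compress(raw)

ratio = len(compressed) / len(raw)
print(f"raw={len(raw)}B compressed={len(compressed)}B ratio={ratio:.2f}")
```

Because network egress and broker storage scale with the compressed batch size, even a modest compression ratio translates directly into infrastructure cost savings.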