California, United States
Currently enhancing the Flink SQL ecosystem so that analysts and ML engineers can easily build and operate streaming pipelines, integrating Kafka and REST catalogs for greater flexibility and expressiveness. Improving observability and data lineage tools to strengthen job traceability and debugging. Coordinating and assisting with the tuning of large-scale Flink + Iceberg jobs to improve performance and scalability.
Built scalable ML platforms and tools to accelerate anti-fraud development on the Trust and Safety team. Designed real-time Flink-based sequence models to detect account takeover patterns in iMessage, optimizing Flink + RocksDB state (~5 TB) and creating reusable Flink SQL frameworks for streaming aggregates. Set up JupyterLab environments to improve experimentation and streamlined data access from Snowflake, HDFS, and S3. Led the migration of Spark batch jobs from Hive to Iceberg format, achieving faster runtimes through predicate pushdown and optimized partitioning and storage layouts.