Distributed data systems, sports fan.
Experience
2024 — Now
New York, NY
• Worked on a predicate-level market data indexing platform with day-zero and incremental indexers and a mission-critical gRPC service; deployed across SQL Server (on-prem) and GCP Postgres with multi-region support.
• Led development of Archive read APIs for a bitemporal, versioned data subsystem, enabling current-state and historical access to archived market data and driving migration from legacy systems to a unified next-generation data platform.
• Designed and migrated the semantic data quality (DQ) framework to the R4B platform, eliminating silent failures and reducing evaluation time for hundreds of SPARQL rules by 5x; authored internal best practices for SPARQL/LMDB optimization and built tooling to support weekly production releases and 24/7 DQ monitoring.
• Built an entity-to-entity relationship history indexing system, reducing storage footprint by 80% and enabling advanced historical link analysis that was previously infeasible under predicate-level indexing.
2022 — 2024
New York, New York, United States
Responsible for the database schema object metadata that supports SQL queries in CockroachDB, a scalable, highly available, distributed SQL database.
Key contributor to CockroachDB's next-generation schema change infrastructure, which provides an online, transactional schema change experience, an industry-first effort.
2020 — 2021
Austin, Texas, United States
CS347: Data Management (Fall 20), CS343: Artificial Intelligence (Spring 20, Spring 21, Fall 21)
2021 — 2021
Built a Spanner-backed datastore for an internal service that makes ML research artifacts discoverable, reusable, and reproducible through effortless lineage tracking.
This storage backend greatly outperformed the existing approach: benchmark tests showed a 2-3 orders of magnitude improvement in query latency (mean, max, standard deviation, and p90 through p99) while sustaining 40 QPS.
2020 — 2020
Seattle, Washington, United States
Automated the process of ensuring data consistency (known as ‘backfilling’) for Kindle library series grouping, reducing backfill operation time from an average of 45 minutes to less than 5 seconds.