2026 — Now
San Francisco, CA
Integrations @ Clay
2024 — 2026
San Francisco, CA
Worked on building Ray Data (https://docs.ray.io/en/latest/data/data.html):
Improved batch inference scalability, performance, and cost. Designed and implemented the download expression API (https://www.anyscale.com/blog/ray-data-scalable-data-processing-for-ai-workloads#reducing-block-size-explosion-with-download-expression) to efficiently load data from URIs; added large-file chunking for JSONL and Parquet files (2x speedup and 6x cost reduction: https://www.anyscale.com/blog/announcing-anyscale-runtime-powered-by-ray#image-batch-inference); and improved actor pool scaling to support thousands of actors. Further multimodal inference benchmarks: https://www.anyscale.com/blog/ray-data-daft-benchmarking-multimodal-ai-workloads.
Led Ray Data’s contribution to the cross-team Ray Data Dashboard initiative, partnering with the Observability team to design and implement detailed runtime metrics, dataset metadata export APIs, and per-operator performance breakdowns, improving data pipeline visibility and reducing debugging time for customers. Product release blog: https://www.anyscale.com/blog/ray-train-data-dashboard.
Designed and implemented the Issue Detection Framework (https://github.com/ray-project/ray/tree/master/python/ray/data/_internal/issue_detection), enabling proactive detection of and alerting on performance regressions. This work is foundational for future observability features and is now integrated into the Ray Data Dashboard.
Standardized the Anyscale Jobs (https://docs.anyscale.com/platform/jobs/manage-jobs) and Job Schedules (https://docs.anyscale.com/platform/jobs/schedules) SDKs and CLI, adding support for projects and clouds and establishing consistent API semantics.
Collaborated across teams and with enterprise customers, served as release manager for multiple patch releases, and provided on-call and summit support.
2020 — 2022
San Francisco, CA
Designed and implemented the internal Remediation Engine, a system for further remediating healthcare data already loaded into a Snowflake data warehouse using Snowflake external functions. The external functions were connected through an AWS API Gateway to an auto-scaling de-identification service hosted on a Kubernetes cluster.
Onboarded a new data source to the Mortality Data product. Added Python type annotations to existing code and used Python Abstract Base Classes to make the system more polymorphic and extensible. Product revenue grew to $2M ARR, with more than half of users receiving the new source.
Created the initial infrastructure for a new Routing service to fulfill requests for patient charts: an Azure Kubernetes Service cluster, a Flask server, a PostgreSQL database, and PySpark connections. Managed infrastructure as code with Terraform and Kubernetes manifests.
Assisted in Datavant Match algorithm development by testing various pre-tokenization transformations of first and last names. Extended the Match application (Java) to compute patient record matches using demographic patient data.
Developed and standardized a debugging interview question, extending the existing Python question with Java and JavaScript versions. Interviewed 150+ candidates using this and other questions.
Member of the Diversity, Equity, and Inclusion Council and the Culture Committee to provide feedback to the executive team.
Berkeley, CA
Served as a co-instructor for the Data Structures course (CS 61BL). Hired, managed, and led a team of 27 staff members. Developed and ran the first fully online version of the course, responsible for giving weekly lectures, holding office hours, creating new quizzes and exams, and updating 24 labs and 4 projects.
Education
2016 — 2020
UC Berkeley College of Engineering
Bachelor of Science (BS)
2012 — 2016
Glenelg High School