On the Hail (hail.is) team, I contributed to an open-source analytical database system for petabyte-scale genomic data. Specifically, I worked on the compute infrastructure that gives scientists a serverless, multi-tenant analysis platform that is fast, cheap, and easy to use. Some of the fun projects I got to work on include:
• Digging into container runtime internals to achieve <80ms container startup times for interactive pipeline prototyping
• Developing infrastructure as code to allow anyone to deploy the Hail system on both GCP and Azure
• Adding a monitoring stack and establishing practices for performance work on the system
• Integrating the Hail system into data platforms like Terra for easier deployment and broader adoption
• Mentoring co-op students and computational biologists as they contributed to our project at all levels of the stack, from monitoring and performance work to adding GPU support