Overview:
•Architected and implemented a novel end-to-end data pipeline and visualization platform to analyze off-target gene editing effects of proprietary Cas proteins.
Key Contributions:
•Built ETL workflows on AWS to transform CRISPResso2 mutation output into clean, actionable data. Established comprehensive data infrastructure using AWS and Terraform.
•Built a web-based analytics platform with Streamlit, integrating interactive Plotly visualizations and a customized genome browser (Gosling). Used by scientists to analyze off-target effects and editing efficiency metrics, with views of amplicons from genomic scale down to base-pair resolution.
•Provided key support to scientists, including Jupyter workflows and D3 visualizations for tracking experimental sample metadata.
•Automated deployment workflows using GitHub Actions, ensuring reliable, scalable updates to the data processing infrastructure and maintaining code quality.
Technologies Used:
•React/Python/SQL, AWS (S3, Athena, Glue, CloudFormation), Docker, Terraform, GitHub Actions, Streamlit, Material UI, Plotly, D3.js, Gosling, Boto3, Pydantic, CRISPResso2, batch processing, Step Functions