•Produced 17 pipelines to ingest R&D experimental data and Real World Evidence data, enabling Roivant to scale the number of ongoing projects without a large increase in headcount and to reduce the number of licenses for a commercial drug discovery database (CDD Vault) from 101 to 23.
•Created a containerized end-user data ingestion tool that automates the ingestion of computational data directly from an on-premises high-performance computing cluster.
•Created an end-user tool with extensive guardrails that registers new compounds from generative computational workflows via API calls to a commercial compound registry. New compound registrations increased from hundreds per week to several thousand.
•Built an automated cloud-based workflow to ingest experimental assay data into the data warehouse. The solution is triggered automatically by new file uploads to Google Cloud Storage and uses Cloud Functions for data processing, curve fitting, and data validation.
•Designed data pipelines to parse and load fixed-width files into Google BigQuery, processing terabytes of government-collected medical records containing PII and PHI from the Centers for Medicare & Medicaid Services (CMS). Pipelines ran on an in-house dag-workflows framework, with storage optimized for Google BigQuery.
•Cleaned the GOSTAR structure-activity relationship database and ingested it into Google BigQuery.
•Resolved complex compound-inventory problems, such as extracting pooled plate components from the Titian Mosaic sample management software through its REST API.
•Benchmarked the utility of open-source synthetic feasibility tools for drug discovery projects.