•Developed and improved ETL pipelines for claims and attribution data, handling diverse data sources (including BCDA [Beneficiary Claims Data API]) and data formats such as NDJSON (FHIR), Excel workbooks, CSVs, and fixed-width files (see the NDJSON sketch following this list)
•Migrated the existing Airflow-based ingestion processes to Databricks using PySpark, employing batch jobs for attribution data and Spark Structured Streaming for raw claims data (see the streaming sketch following this list)
•Collaborated with internal stakeholders and SMEs to integrate a new ACO REACH (Accountable Care Organization Realizing Equity, Access, and Community Health) attribution data ingestion pipeline into an existing MSSP (Medicare Shared Savings Program) pipeline, ensuring the data was captured and interpreted correctly and in alignment with the stakeholders' requirements and objectives
•Created a FHIR-inspired (Fast Healthcare Interoperability Resources) intermediate data format to standardize raw data from any payor. Implemented the solution as Python dataclasses with format conversion and validation (Python, PostgreSQL); a dataclass sketch follows this list
•Utilized collaborative visualization software (Miro) for system design and planning, project management, visualization of process challenges, and feedback gathering
•Recommended a more comprehensive QA process to increase scrutiny of deployed work and foster a culture of continuous learning
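
A minimal sketch of the NDJSON (FHIR) handling referenced in the first bullet. The file name and the resource-type counting are illustrative assumptions, not the production pipeline:

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def read_ndjson(path: Path) -> Iterator[dict]:
    """Yield one parsed FHIR resource per line of an NDJSON file."""
    with path.open() as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines in exported files
                yield json.loads(line)


if __name__ == "__main__":
    # "ExplanationOfBenefit.ndjson" is a placeholder for a BCDA export file.
    counts = Counter(
        resource.get("resourceType", "Unknown")
        for resource in read_ndjson(Path("ExplanationOfBenefit.ndjson"))
    )
    print(counts.most_common())
```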
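A sketch of the Spark Structured Streaming ingestion mentioned in the migration bullet, assuming a Databricks environment with Delta Lake; the schema, input path, and checkpoint location are all hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType

spark = SparkSession.builder.appName("claims-ingest").getOrCreate()

# Minimal schema for illustration; real FHIR claims carry many more fields.
schema = StructType([
    StructField("resourceType", StringType()),
    StructField("id", StringType()),
])

# Stream newline-delimited JSON claims from a landing directory
# ("/landing/claims" is a placeholder path).
claims = (
    spark.readStream
    .schema(schema)
    .json("/landing/claims")
)

# Append incoming records to a Delta table (Databricks sink).
query = (
    claims.writeStream
    .format("delta")
    .option("checkpointLocation", "/chk/claims")
    .outputMode("append")
    .start("/bronze/claims")
)
```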
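A sketch of the FHIR-inspired intermediate format from the dataclass bullet; the field names, CSV column names, and validation rules are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Claim:
    """FHIR-inspired intermediate claim record (illustrative fields only)."""
    claim_id: str
    patient_id: str
    service_date: date
    billed_amount: float
    payor: str

    def __post_init__(self) -> None:
        # Validate on construction so downstream stages can trust the shape.
        if not self.claim_id:
            raise ValueError("claim_id is required")
        if self.billed_amount < 0:
            raise ValueError("billed_amount must be non-negative")

    @classmethod
    def from_csv_row(cls, row: dict) -> "Claim":
        # Convert one raw payor CSV row (hypothetical column names)
        # into the standardized intermediate format.
        return cls(
            claim_id=row["CLAIM_NO"],
            patient_id=row["MBR_ID"],
            service_date=date.fromisoformat(row["SVC_DT"]),
            billed_amount=float(row["BILLED_AMT"]),
            payor=row.get("PAYOR", "unknown"),
        )
```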