*** Machine Learning Enablement ***
•Deployed an offline machine learning (ML) model to production to improve the payment processor's decision-making capability
•Re-skilled in the gRPC API framework to deploy online ML models to production
*** Distributed Systems: Cloud Computing Infrastructure and Observability Engineering ***
•Architected and built a multi-threaded Java application, run on AWS EMR with Kubernetes, that monitored data freshness to identify bottlenecks in the data pipeline and measured SLOs to verify that the team’s SLAs were being met
•Managed AWS EMR assets with Terraform; developed infrastructure tooling and managed security with Terraform, Jenkins, and Docker
•Practiced test-driven development in Java, Python, and Ruby
•Engineered cloud computing and data infrastructure with AWS, Spark, Hive, Hadoop, and Airflow, including cron job and Airflow operator development
*** ETL Development ***
•Built and tested flexible frameworks for running ETL on petabytes of data using a lambda architecture (a mix of batch and streaming)
•Created self-service tooling and abstractions to help analysts, engineers, and business users build lightweight ETLs, dashboards, and monitoring
*** Team Enablement ***
•Mentored and onboarded engineers of varying experience levels to the data platform domain
•Increased capacity across teams and within the Airflow open-source community by creating tooling that improved productivity