•Architected and led migration from Mesos/Aurora to serverless Lambda-based orchestration, eliminating the operational burden of managing ZooKeeper, Mesos, and Aurora while raising system reliability to 99.9% uptime and reducing platform operational incidents by 70%
•Spearheaded comprehensive AWS EMR evaluation, combining production benchmarking across 1000+ jobs with cost-performance analysis, that informed the strategic decision not to adopt EMR, avoiding an estimated $75K per day in cost increases
•Designed and implemented Spot-to-On-Demand fallback capability that automatically switches from Spot to On-Demand instances during capacity shortages, protecting 1200+ business-critical clusters from Spot market volatility and reducing on-call incidents by 60%
•Established technical mentorship program across distributed teams (Berlin, São Paulo), mentoring 3 engineers to promotion into senior roles and creating bi-weekly knowledge-sharing sessions that became standard practice across the data platform organization