# Sujith Nair

> Lead Software Engineer | Data Infrastructure | US 🇺🇸

Location: Greater Boston, United States
Profile: https://flows.cv/sujith

I architect data platforms that scale with hypergrowth. Over the past decade, I've built systems that powered Nubank's expansion from 3M to 120M customers across four countries, processing 300,000+ datasets daily.

My work centers on making complex infrastructure invisible to users. At Nubank, I've led platform evolution through 40x growth: eliminating Mesos/Zookeeper/Aurora in favor of serverless orchestration, unlocking Python workloads for data scientists through Spark Connect, building self-serve anomaly detection that now protects 100,000+ datasets for 50+ engineering teams, and designing auto-remediation that cut incident resolution from hours to minutes.

Before that, I designed data systems supporting vaccine distribution for 50M+ citizens across India, Zambia, Myanmar, and Indonesia.

I am a contributor to Apache Spark, Apache DataFusion Comet, and Apache Iceberg, among other open-source projects.

I love to write, both fiction and on technology, and I publish some of my writing on my website, sujithjay.com. I also occasionally speak about data infrastructure; a recent talk is on Software Engineering Daily.

## Work Experience

### Lead Software Engineer, Data Platform @ Nubank
Jan 2026 – Present | McLean, VA

• Led Spark Connect integration enabling Python-native data processing, architecting a remote execution model that lets data scientists submit PySpark jobs without a local Spark installation while maintaining security isolation and resource governance across 300K+ existing Scala workloads

### Senior Software Engineer, Data Platform @ Nubank
Jan 2023 – Jan 2026 | McLean, VA

• Architected and led migration from Mesos/Aurora to serverless Lambda-based orchestration, eliminating the operational burden of managing Zookeeper, Mesos, and Aurora while improving system reliability to 99.9% uptime and reducing platform operational incidents by 70%
• Spearheaded a comprehensive AWS EMR evaluation involving production benchmarking across 1000+ jobs and a cost-performance analysis that informed the strategic decision not to adopt EMR, avoiding an estimated $75K daily in potential cost increases
• Designed and implemented a fallback-to-On-Demand capability that automatically switches from spot to on-demand instances during capacity shortages, protecting 1200+ business-critical clusters from spot market volatility and reducing on-call incidents by 60%
• Established a technical mentorship program across distributed teams (Berlin, São Paulo), promoting 3 engineers to senior roles and creating bi-weekly knowledge-sharing sessions that became standard practice across the data platform organization

### Senior Software Engineer, Data Platform @ Nubank
Jan 2021 – Jan 2022 | Berlin, Germany

• Architected the orchestration platform evolution that enabled a 6x increase in daily pipeline executions (from 50K to 300K+) while keeping the end-to-end execution time of each daily run under 24 hours through optimized resource scheduling and autoscaling strategies
• Architected a self-serve anomaly detection platform adopted by 50+ teams across 100,000+ daily jobs, reducing data quality incidents by 70%
• Designed and implemented a diagnosis and auto-remediation system that reduced mean time to resolution for data pipeline failures from 90 minutes to 15 minutes, improving platform SLAs and enabling the platform to scale without proportional headcount growth
• Led a cross-functional LGPD (Brazil's counterpart to GDPR) compliance initiative spanning data platform, security, legal, and product engineering teams, architecting a privacy-compliant caching layer affecting 60M+ customer records

### Software Engineer, Data Platform @ Nubank
Jan 2019 – Jan 2020 | Berlin, Germany

### Data Engineer II @ Logistimo
Jan 2018 – Jan 2018 | Bengaluru, Karnataka, India

• Architected and led migration from an RDBMS to a distributed data lake built on Apache Spark, YARN, and Cassandra, enabling the platform to process immunization supply chain data for 50M+ citizens across 4 countries and ensuring high data availability for time-sensitive vaccine distribution decisions
• Collaborated with product and field teams to define data requirements for new markets, translating complex public health needs into technical specifications and ensuring the data platform could adapt to different regulatory requirements across countries

### Data Engineer @ Logistimo
Jan 2016 – Jan 2018 | Bengaluru, Karnataka, India

### Software Engineer @ WeAreHolidays
Jan 2014 – Jan 2016 | Gurgaon, India

• Co-architected and built a full-stack travel marketplace platform from the ground up, delivering an end-to-end booking system spanning supplier integrations, payment processing, user management, and inventory management within 8 months to first customer transaction
• Collaborated directly with founders on product roadmap and technical strategy, translating business requirements into technical solutions and making pragmatic architectural trade-offs to balance speed-to-market with scalability

## Education

### Master of Technology in Systems Engineering, Electrical Engineering
Indian Institute of Technology (Banaras Hindu University), Varanasi

### Bachelor of Engineering in Computer Engineering
University of Mumbai

## Contact & Social

- LinkedIn: https://linkedin.com/in/suj1th
- Portfolio: https://sujithjay.com
- GitHub: https://github.com/sujithjay

---

Source: https://flows.cv/sujith
JSON Resume: https://flows.cv/sujith/resume.json
Last updated: 2026-03-31