• Owned multiple Python microservices and APIs end to end, conducted low-level design reviews, and delivered across both microservice and monolithic architectures.
• Led large-scale optimization of FCO’s Python microservices (ArangoDB, Kafka, Kubernetes), resolving bottlenecks to increase system scalability from 5K to 50K+ telemetry endpoints and cut infrastructure costs through efficient resource utilization.
• Drove production hardening through observability (Prometheus metrics, health probes, alerting), dedicated debug interfaces, and proactive monitoring, reducing customer-found defects.
• Prototyped AI-driven self-healing agents, demonstrating potential for autonomous monitoring and cost reduction.
• Designed and implemented a relay agent for embedded systems that forwards telemetry from devices to the cloud and dispatches results back to the edge device.
• Increased system load capacity 10× (from 2K to 20K+ telemetry endpoints) by identifying scale bottlenecks and delivering the first scalable iteration of the core agent.