Built a Java Micronaut microservice on AWS that served as the entry point for OpenTelemetry MELT data into Cisco’s Full Stack Observability platform, handling ~500M telemetry signals per day from ~3,000 enterprise customers.
Helped design and implement core APIs using contract-first schemas, tuned autoscaling with the Kubernetes Horizontal Pod Autoscaler (HPA), wrote unit, integration, and performance tests, and shipped changes through CI/CD while meeting security and operational standards.
Built a fault-tolerant Kafka Streams microservice to process and aggregate distributed trace data at scale, enabling real-time trace analysis and simplifying the downstream trace pipeline.
Led the rollout of an upgraded in-house CI/CD setup for both microservices, moving to engineer-owned releases, reducing cross-team merge conflicts, and cutting bug-fix time-to-prod from hours to minutes.
Developed a custom OpenTelemetry collector that ingests, filters, and routes telemetry from platform services to five monitoring destinations, improving the efficiency of both internal and customer-facing data flows.
Led the monitoring and alerting strategy across three services using Grafana dashboards, Prometheus metrics, and PagerDuty, helping reduce MTTR for production incidents by ~40%.
Took ownership of customer escalations and production issues, shipping hotfixes and monthly releases with critical bug fixes and enhancements.