I have worked on two teams within IBM Cloud:
Private DNS:
Took Global Load Balancing with health checks from POC to GA across 21 data centers. Built several internal services (including one that manages VPC infrastructure), designed microservices in Go/gRPC/etcd, and owned the CI/CD pipelines that deployed our services worldwide. My work spanned backend development, DevOps, networking (BGP, systemd, DNS proxies), and production automation.
Cloud Databases (current):
I build and operate the automation that runs tens of thousands of customer databases across 100+ Kubernetes clusters. That includes controllers/operators for renewing certificates, rotating credentials, applying allowlists, managing backups, performing image updates, and coordinating releases at global scale.
I focus heavily on reliability and operations: building observability dashboards, tuning alerting (Prometheus/Alertmanager/PagerDuty), and debugging customer-impacting issues during on-call. I also analyze alert patterns and cluster metrics to reduce noise and improve stability.
On the platform side, I’ve led several multi-cluster engineering efforts:
Profitability & efficiency: Drove initiatives that increased revenue and cut infrastructure costs by $13M+ (autoscaling fixes, worker/node consolidation, removing orphaned storage, and redesigning CPU provisioning models).
Release modernization: Rebuilt our monolithic release system into independent ArgoCD pipelines, reducing release cycles from weeks to hours.
In-place upgrades: Helped architect and deliver the platform that enables in-place DB upgrades, now used for 10K+ upgrades/month.
Compliance & security: Designed credential-rotation and IAM/network-based access control systems to meet PCI 4.0, C5, FSCloud, and ENS-High requirements.
Across both teams, I’ve worn many hats (backend engineer, infrastructure engineer, SRE, and now team lead for platform-level initiatives), always with a focus on reliability, automation, and measurable business impact.