•Architected and deployed a high-availability Thanos stack on Kubernetes (AKS) using Helm charts to aggregate Prometheus metrics, centralizing observability for 7 clusters.
•Created interactive dashboards with Grafana for multi-cluster metrics, enabling engineers to measure latency, error rate, and SLI trends across all regions
•Provisioned Azure Blob Storage into the Thanos stack using Terraform, enabling long-term, scalable, and persistent metric storage that is optimized to meet organizational retention policies
•Hardened multi-region metric egress with zero-trust authentication by utilizing Azure KeyVault CSI and Istio to dynamically rotate and inject Bearer tokens, securing Thanos ingestion across a multi-regional distributed system.