# Zhao (Jack) Zhang

> Software Engineer at CTC - Messaging Infrastructure

Location: New York, New York, United States

Profile: https://flows.cv/zhaojackzhang

Passionate about building cloud infrastructure and developing reliable distributed web services.

## Work Experience

### Software Engineer @ Chicago Trading Company

Jan 2023 – Present | New York, New York, United States

### Senior Site Reliability Engineer @ Indeed.com

Jan 2022 – Jan 2023 | Austin, Texas, United States

Cloud Infrastructure and Services (AWS / Terraform / Kubernetes)

- Seamlessly transitioned 150+ on-prem services to AWS with zero downtime, coordinating delivery across 40 product and infrastructure teams.
- Built cloud infrastructure and projects drawing on core AWS services (e.g., EC2, VPC, S3, SQS, ElastiCache), configuration and deployment tools (Terraform, Puppet, and Kubernetes), and cloud governance applications to manage Indeed’s AWS Organizations footprint.
- Built successive iterations of production Kubernetes clusters, configured load balancers both on-prem and in AWS, and contributed to automation tools for deploying applications.

Multi-region Deployment for Indeed Interview Platform (MongoDB / Atlas)

- Designed and delivered a better user experience for a business-critical interview platform, with higher reliability, multi-region read/write availability, and 50% lower latency for APAC clients.
- Wrote a guide, established best practices, and built consensus among cross-functional teams on MongoDB geo-sharding in Atlas for distributed services requiring data migration.

AWS Migration Toolkit (Spring Boot / React / Vault)

- Designed and led the development of a self-service web app to analyze and migrate deployment and configuration data between data centers in Indeed's distributed configuration system.
- Automated and accelerated deploying and migrating systems to AWS, cutting the process from days to under one hour, and promoted the toolkit as a critical part of the engineering workflow across the entire organization.

Engineering Leadership

- Mentored other software engineers on technical details, increasing their impact and advancing their career goals.
- Provided planning and leadership for improving SRE on-call support, security, and other reliability goals.
- Drafted, led, and participated in 50+ design reviews for both product applications and infrastructure solutions.

### Site Reliability Engineer II @ Indeed.com

Jan 2021 – Jan 2022 | Austin, Texas, United States

Incident Impact Analysis Tool (Python Flask / React / Jira)

- Designed and led the development of an interactive web app to analyze the predicted impact of system outages on critical business KPIs.
- Visualized outage impact and reduced the analysis from days to minutes by implementing APIs backed by real-time data from Datadog.

CI/CD Improvement and Support

- Wrote Dockerfiles and used build automation tools (Ant and Gradle) and CI/CD pipelines (Jenkins and GitLab) to deploy applications to Kubernetes clusters.
- Integrated Datadog synthetic testing into the GitLab CI/CD pipeline to catch UI errors earlier, at the deployment stage, for frontend web applications.

Operational Support for Indeed’s System Infrastructure and Applications (Terraform / Puppet / Kubernetes)

- Managed, monitored, and supported networking (load balancers, DNS, and routing rules) for employers.indeed.com and its subdomains.
- Provisioned, configured, and iterated on on-prem and cloud infrastructure (server nodes/EC2 instances, in-memory caches such as Memcached/Redis, Kubernetes clusters, etc.) with Terraform and Puppet.

Observability Improvement and Support (Datadog)

- Configured a large number of Datadog dashboards and developed reusable Terraform modules to manage SLO monitoring for system infrastructure, applications, databases, and message queues.
- Implemented and iterated on Datadog synthetic tests and developed an internal status page to inform customer service teams of system health.

### Site Reliability Engineer @ Indeed.com

Jan 2018 – Jan 2021 | Austin, Texas Area

On-call Support and Reliability Best Practices (SLOs / PagerDuty)

- Contributed to on-call mitigation, investigation, and remediation of major company-wide events.
- Guided product teams to create proper SLOs and establish on-call processes by reviewing and improving their reliability checklists and documentation.
- Developed a self-service process with Terraform to configure team-specific on-call schedules and escalation policies in PagerDuty.
- Held weekly operational reviews with production teams to identify and improve observability and reliability.

Chaos Testing

- Prepared, executed, and monitored different types of chaos testing.
- Identified and resolved generic issues, and verified the ability of 20+ critical services to fail over between data centers.

Dependency API Development (Spring Boot / MySQL)

- Designed and developed APIs providing insights into transitive service dependencies.
- Developed a cron job that gathers data and saves it to a MySQL database over time.

### Co-op @ ProQuest

Jan 2016 – Jan 2016 | Ann Arbor

- Processed historical newspaper images using OpenCV to apply automatic decomposition programs.
- Implemented and refined a vision algorithm to perform segmentation and classification of newspaper images.
- Implemented an evaluation system to score segmentation and classification results against ground-truth data.
## Education

### Master's degree in Electrical and Computer Engineering

University of Michigan

### Master's degree in Civil Engineering

University of Michigan

### Bachelor of Engineering - BE in Traffic Engineering

Tongji University

## Contact & Social

- LinkedIn: https://linkedin.com/in/zhao-jack-zhang-226a9811a

---

Source: https://flows.cv/zhaojackzhang

JSON Resume: https://flows.cv/zhaojackzhang/resume.json

Last updated: 2026-04-13