Implemented a solution to detect stuck jobs in the Apache Kafka content processing pipeline and notify the responsible source system teams
•
Built a REST API endpoint in Golang that retrieves stuck job details filtered by specified parameters, improving monitoring and speeding up resolution
•
Scheduled Kubernetes CronJobs with integrated logging to regularly assess job status, using Kibana for log analysis
•
Generated stuck job count metrics and visualized them in Grafana, enabling real-time tracking
•
Configured PagerDuty alerts to proactively notify source system teams when stuck job thresholds were exceeded, ensuring timely intervention
•
Worked on the Global Metadata Service team of 10+ engineers