•Architected and operated 6 production Ray clusters (detections, headcount, occupancy, care, room motion,
visualizer) across multi-AZ AWS, supporting 100+ concurrent worker nodes and autoscaling from zero to 1,000
workers via Ray Autoscaler. Deployed Ray Serve for low-latency HTTP inference, running continuously in
production for 2+ years.
•Designed a distributed stream processing framework using Ray’s actor model to ingest NATS message
streams, implement priority-based queuing, and dynamically schedule distributed processors, automatically
scaling concurrent workers to handle data from thousands of devices.
•Built versioned Ray cluster deployment pipelines with templated configs, automated job submission, and
rollback support. Integrated full-stack observability using Prometheus, Ray’s metrics API, Grafana, and
CloudWatch with custom counters, gauges, and latency histograms.
•Developed models for detection, tracking, motion detection, and pose estimation, and deployed them
end-to-end across ~5,000 sensors.
•Achieved 92% accuracy in person detection and counting on low-resolution thermal data using OpenCV.