# Jeffrey (Yu-Che) Wang

> LLM inference @ Anyscale

Location: San Francisco, California, United States

Profile: https://flows.cv/jeffreyyuchewang

Jeffrey is a software engineer focusing on building scalable distributed systems for machine learning. His recent work centers on ML systems internals and system-level optimizations, with experience building low-latency inference microservices and high-throughput offline batch pipelines.

## Work Experience

### Software Engineer @ Anyscale

Jan 2025 – Present | San Francisco, California, United States

Building Ray (Ray Serve LLM, Ray Serve, Ray Data LLM) and vLLM!

### Open Source Contributor @ Ray Open Source

Jan 2024 – Present

Actively contributing to the open-source Ray project. Explore some of my contributions: Ray LLM (distributed inference), Ray Core (compiled graphs, NCCL).

### ML Software Engineer II @ Amazon

Jan 2024 – Jan 2025 | Seattle, WA

Prime Video Personalization ML Platform team

1. Developed low-latency online inference microservices utilizing gRPC, TensorRT, and Triton on EKS with seamless MLOps CI/CD.
2. Architected distributed training cluster to scale out model training with Ray on EKS.
3. Built and scaled high-throughput data and ML processing pipelines with Spark, PyTorch, EMR, SageMaker, and Airflow.
4. Maintained scalable microservices based on AWS Fargate and ECS with 100k+ TPS.

### Software Engineer II @ Microsoft

Jan 2023 – Jan 2024

### Software Engineer @ Microsoft

Jan 2021 – Jan 2023 | Redmond, Washington, United States

### Deep Learning Research Assistant @ University of Illinois at Urbana-Champaign

Jan 2020 – Jan 2020 | Urbana-Champaign Area

- Self-supervised convolutional autoencoder networks for image domain translation.
- End-to-end self-supervised deep autoencoder networks with adaptive frontends and attention mechanism for speech enhancement, acoustics matching, and bandwidth extension.
### Software Engineer Intern @ Microsoft

Jan 2020 – Jan 2020 | Redmond, Washington, United States

### Deep Learning Research Intern @ National Center for Supercomputing Applications

Jan 2019 – Jan 2020 | Urbana-Champaign, Illinois Area

- Developed spatial-temporal graph convolutional network to analyze time-series data, investigate the diffusion of mosquito-borne diseases, and predict future disease diffusion paths in human travel networks with Keras and PyTorch.

## Education

### Bachelor's degree in Computer Science

University of Illinois Urbana-Champaign

## Contact & Social

- LinkedIn: https://linkedin.com/in/jeffrey-yu-che-wang
- GitHub: https://github.com/jeffreyjeffreywang

---

Source: https://flows.cv/jeffreyyuchewang

JSON Resume: https://flows.cv/jeffreyyuchewang/resume.json

Last updated: 2026-04-11