Software Engineer specializing in scalable LLM infrastructure and high-throughput serving platforms. I build auto-scaling, asynchronous systems that serve self-hosted models (Qwen, Flux, Seedance) with optimized GPU utilization and integrate third-party APIs (GPT, Gemini).
Led the end-to-end development of automated manufacturing systems, robotic arm simulations in ROS, and scalable testing programs, significantly reducing costs and improving production efficiency.