LLM Inference
Intra-GPU PD Disaggregation: https://arxiv.org/abs/2507.06608
MoE Inference Expert Load Balancing: https://arxiv.org/pdf/2510.03293
LLM Agent RL System and Training
rLLM: https://pretty-radio-b75.notion.site/rLLM-A-Framework-for-Post-Training-Language-Agents-21b81902c146819db63cd98a54ba5f31
DeepSWE: https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art-Coding-Agent-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33
LLM Agent Inference
Autellix: An Efficient Serving Engine for LLM Agents as General Programs: https://arxiv.org/abs/2502.13965
Verified Tensor Code Transpilation
Tenspiler: A Verified Lifting-Based Compiler for Tensor Operations: https://arxiv.org/abs/2404.18249