Scaled LLM Reasoning (Loong Project): Engineered an open-source framework for synthesizing and verifying long-form Chain-of-Thought (CoT) data at scale. Applied Reinforcement Learning to logic data, reproducing the state-of-the-art reasoning results of DeepSeek-R1 and Logic-RL and inducing emergent behaviors such as self-reflection and verification.
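The RL setup above relies on verifiable, rule-based rewards over logic data. A minimal sketch of such a reward function (illustrative only; the exact format tags and reward weights are assumptions, not the project's actual values):

```python
import re

def logic_reward(completion: str, gold_answer: str) -> float:
    """Rule-based reward in the style of Logic-RL / DeepSeek-R1 training:
    a small bonus for emitting the expected <think>/<answer> format,
    plus a larger bonus when the final answer matches the gold label.
    Tag names and weights here are illustrative assumptions."""
    reward = 0.0
    # Format reward: reasoning must appear inside <think> tags,
    # followed by a committed final answer inside <answer> tags.
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL):
        reward += 0.1
    # Answer reward: exact match against the verifiable gold answer.
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m and m.group(1).strip() == gold_answer:
        reward += 1.0
    return reward
```

Because logic puzzles have machine-checkable answers, this kind of reward needs no learned reward model, which is what makes RL on such data scalable.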
Reinforcement Learning (RL): Contributed to the development of the ReaL-TG framework, which uses RL to optimize language models for explainable link forecasting on temporal graphs, including the design of reward signals that prioritize transparency and logical consistency in model predictions.
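One way a reward can prioritize transparency is to score the explanation's grounding alongside prediction correctness. A hedged sketch of that idea (the function, weights, and fact representation are illustrative assumptions, not ReaL-TG's actual reward):

```python
def forecast_reward(predicted, gold, cited_facts, graph_facts) -> float:
    """Illustrative reward for explainable link forecasting:
    correctness of the predicted link plus a grounding term checking
    that every fact cited in the explanation actually exists in the
    temporal graph. Weights (0.8 / 0.2) are arbitrary for the sketch."""
    correctness = 1.0 if predicted == gold else 0.0
    if cited_facts:
        # Fraction of cited evidence that is verifiably in the graph.
        grounded = sum(1 for f in cited_facts if f in graph_facts) / len(cited_facts)
    else:
        grounded = 0.0  # no explanation -> no transparency credit
    return 0.8 * correctness + 0.2 * grounded
```

Penalizing ungrounded citations pushes the policy toward explanations that are logically consistent with the graph rather than merely plausible-sounding.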
Supervised Fine-Tuning (SFT): Leveraged SFT workflows to distill tool-use knowledge into compact models via back-translated traces, enabling high-performance autonomous agent capabilities with significantly reduced inference overhead.
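Back-translating traces for SFT means converting raw tool-execution logs from a strong teacher into chat-format training examples a compact student can imitate. A minimal sketch, assuming a hypothetical trace schema (the field names are illustrative):

```python
def trace_to_sft_example(trace: dict) -> dict:
    """Back-translate a raw tool-execution trace into a chat-format SFT
    example. Schema is hypothetical: trace = {"task", "steps", "final_answer"},
    each step = {"tool", "args", "result"}. The teacher's tool calls become
    assistant turns the student model is fine-tuned to reproduce."""
    messages = [{"role": "user", "content": trace["task"]}]
    for step in trace["steps"]:
        # The tool invocation the student should learn to emit.
        messages.append({
            "role": "assistant",
            "content": f"Call {step['tool']} with {step['args']}",
        })
        # The environment's observation, given back as context.
        messages.append({"role": "tool", "content": step["result"]})
    messages.append({"role": "assistant", "content": trace["final_answer"]})
    return {"messages": messages}
```

Distilling from such traces lets the small model skip exploratory detours the teacher took, which is where the reduced inference overhead comes from.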
Temporal Planning & Benchmarking: Designed and published the TCP Benchmark, a specialized evaluation suite for measuring LLM performance on temporal constraint-based planning, bridging a critical gap in multi-step reasoning assessment.
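Evaluating temporal constraint-based planning reduces to checking whether a model's proposed plan satisfies a set of ordering and timing constraints. A hedged sketch of such a verifier (the constraint encoding is an illustrative assumption, not the TCP Benchmark's actual format):

```python
def satisfies_constraints(schedule: dict, constraints: list) -> bool:
    """Check a candidate plan against temporal constraints, in the spirit
    of a TCP-style verifier (encoding is illustrative). `schedule` maps
    task -> start time; ("before", a, b) requires a to start before b,
    ("at", a, t) pins task a to time t."""
    for c in constraints:
        if c[0] == "before":
            _, a, b = c
            if a not in schedule or b not in schedule or schedule[a] >= schedule[b]:
                return False
        elif c[0] == "at":
            _, a, t = c
            if schedule.get(a) != t:
                return False
    return True
```

Because satisfaction is checked mechanically, the benchmark can score multi-step plans objectively instead of relying on an LLM judge.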
Multi-Agent Orchestration: Contributed to the CAMEL-AI open-source ecosystem, focusing on autonomous communication protocols and the deployment of "societies" of LLM agents to solve complex, distributed tasks.
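The core of an agent "society" is a turn-taking loop in which each agent reads the shared transcript and contributes a message. A minimal sketch of that orchestration pattern (this is not the actual CAMEL-AI API; agent and function names are illustrative):

```python
def run_society(agents, task: str, rounds: int = 2) -> list:
    """Minimal round-robin orchestration in the spirit of CAMEL-style
    agent societies (illustrative, not the CAMEL API). Each agent is a
    (name, policy) pair; a policy maps the running transcript to a reply."""
    transcript = [("user", task)]
    for _ in range(rounds):
        for name, policy in agents:
            # Each agent sees the full shared history before replying.
            reply = policy(transcript)
            transcript.append((name, reply))
    return transcript
```

In a real deployment each policy would wrap an LLM call with a role-specific system prompt; the communication protocol is what keeps the agents' exchanges structured enough to decompose a distributed task.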