I am committed to pioneering research on large reasoning models (LRMs) and agentic alignment, exploring the progression from AI through AGI toward ASI. My work aims to bridge specialized intelligence and general-purpose, self-evolving agentic systems by developing mathematically grounded ethical frameworks and designing efficient, adaptive superalignment methodologies.

I am advancing research on LRMs and agentic alignment through scalable reinforcement learning methods, with the aim of strengthening their complex reasoning capabilities. My work focuses on developing algorithms and frameworks that leverage high-quality long chain-of-thought (CoT) data, including R1/O1-related scalable RL alignment algorithms, post-training methods such as reinforcement learning from diverse feedback (RLXF), and broader AI alignment strategies. I am also actively involved in research on multimodal interaction and have a keen interest in controllable AI-generated content (AIGC).

In my prior work, I have contributed to reinforcement learning and multi-agent systems, particularly through the development of reward tuning, off-policy and on-policy RL algorithms, and evaluation frameworks, as well as algorithms for cooperative and competitive multi-agent learning. My research integrating preference learning has also been widely applied to practical domains such as ranking, pricing, marketing, and recommendation systems.