I am an undergraduate student at Yuanpei College, Peking University. My research interests include AI alignment, value alignment, agent safety, reinforcement learning, and natural language processing.

I work with the PKU Alignment Group on building safe, reliable, and human-aligned AI systems.

🌙 Something To Say

I am a science student who loves Chinese, its music, its silences, and the way a single phrase can hold both moonlight and measurement. I hope to be a researcher with an interesting soul, one who does not merely ask whether to be, but tries to become more awake, more useful, and more humane.

There are more things in intelligence than our present theories can name. So I want to do research I love, and research that is worth loving: to take arms against confusion, to seek value amid uncertainty, and to use the rough magic of machines gently, so that the work may leave the world a little safer, clearer, and kinder.

I often, almost naively, imagine myself inside the closing scene of Romain Rolland’s Jean-Christophe: crossing the river between the long night and the rushing current, carrying on my shoulders a child both heavy and bright, step by step toward the farther shore. And when Christophe asks, “Enfant, qui donc es-tu?”, the Child answers, “Je suis le jour qui va naître.” I want to keep walking toward that newborn tomorrow, reborn for the next battle, doing the research I believe in with love, freedom, value, and an unyielding soul.

🔥 News

2026.06: Our paper A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs was selected as an ACL 2026 Main Conference Oral Presentation.
2026.06: We released SafeMCP, an ACL 2026 Main Conference paper on LLM agent defense.
2026.05: We released MiraBench, a benchmark for action-conditioned reliability in robotic world models.
2026.03: We released VISA and "Stable Reasoning, Unstable Responses" on arXiv.

📝 Publications

SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning
Lichao Wang, Zhaoxing Ren, Tianzhuo Yang, Jiaming Ji, Chi Harold Liu, Yaodong Yang, Juntao Dai. ACL 2026 Main Conference.
MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models
Tianzhuo Yang, Zihan Shen, Zirui Mi, Zhaoyi Zhang, Jiayi Zhou, Jiaming Ji, Juntao Dai, Jiawei Chen, Boyuan Chen, Yaodong Yang. arXiv 2026.
Stable Reasoning, Unstable Responses: Mitigating LLM Deception via Stability Asymmetry
Guoxi Zhang, Jiawei Chen, Tianzhuo Yang, Lang Qin, Juntao Dai, Yaodong Yang, Jingwei Yi. arXiv 2026.
VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment
Jiawei Chen, Tianzhuo Yang, Guoxi Zhang, Jiaming Ji, Yaodong Yang, Juntao Dai. arXiv 2026.
AI Deception: Risks, Dynamics, and Controls
Boyuan Chen, Sitong Fang, Jiaming Ji, Yanxu Zhu, Pengcheng Wen, Jinzhou Wu, Yingshui Tan, Boren Zheng, Mengying Yuan, Wenqi Chen, Donghai Hong, Alex Qiu, Xin Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Borong Zhang, Tianzhuo Yang, et al. arXiv 2025.
A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs
Guoxi Zhang, Jiawei Chen, Tianzhuo Yang, Jiaming Ji, Yaodong Yang, Juntao Dai. ACL 2026 Main Conference, Oral Presentation.

📖 Education

2024 - Present, Undergraduate student, Yuanpei College, Peking University.

💻 Internships

Research Intern, Beijing Academy of Artificial Intelligence (BAAI / 智源研究院), Beijing, China.

Tianzhuo Yang