How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives
Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank
ACL, 2023
arxiv /
code /
We showed that using lower teacher layers to pre-load the student model gives a significant performance improvement compared to using higher layers.
We also studied the robustness of different distillation objectives under various initialisation choices.
SceneFormer: Indoor Scene Generation with Transformers
Xinpeng Wang, Chandan Yeshwanth, Matthias Nießner
3DV, 2021
oral
arxiv /
video /
code /
We proposed a transformer model for scene generation conditioned on the room layout and a text description.
Projects
These include projects from coursework and practical courses.
Domain Specific Multi-Lingually Aligned Word Embeddings
Machine Learning for Natural Language Processing Applications
2021-07
report /
Curiosity-Driven Learning
Advanced Deep Learning in Robotics
2021-03
report /
Evaluated and compared count-based and prediction-based curiosity-driven learning across different Atari game environments.
Introduction to Deep Learning (IN2346)
SS 2020, WS 2020/2021
Teaching Assistant
website /