The First Workshop on the Application of LLM Explainability to Reasoning and Planning
@ COLM 2025
We are thrilled to announce the First Workshop on the Application of LLM Explainability to Reasoning and Planning at COLM 2025 to be held on October 10, 2025.
Enabling large language models (LLMs) to reason (e.g., arithmetic, symbolic, and commonsense reasoning) and plan (e.g., path-finding, tool use, web navigation, computer use, etc.) has been a popular research topic in the past few years. Despite these exciting achievements, there have been growing concerns about the safety and trustworthiness of such LLM applications, owing to how little we still know about how LLMs achieve these capabilities and where they could fail. On the other hand, LLM explainability (broadly including any research that explains or interprets LLMs) has also attracted increasing attention, but existing research has mostly focused on simplified tasks and rarely yields insights that can be directly applied to realistic reasoning and planning tasks. This discrepancy has consequently raised doubts about the practical value of LLM explainability research.
In this workshop, we aim to bring together researchers from various perspectives to discuss the potential and practical applications of model explainability for advancing LLM reasoning and planning. Specifically, the workshop welcomes submissions on the following topics (non-exhaustive):
Join our Google Group (https://groups.google.com/g/xllm-reasoning-planning-workshop) for workshop updates and Q&A, and contact us at xllmreasoningplanningworkshop AT gmail DOT com for other inquiries!
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee, Lihao Sun, Chris Wendler, Fernanda Viégas, Martin Wattenberg

Attributing Response to Context: A Jensen–Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
Ruizhe Li, Chen Chen, Yuchen Hu, Yanjun Gao, Xi Wang, Emine Yilmaz

When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
Quan Shi, Carlos E Jimenez, Shunyu Yao, Nick Haber, Diyi Yang, Karthik R Narasimhan

Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data
Jiaming Zhou, Abbas Ghaddar, Ge Zhang, Liheng Ma, Yaochen Hu, Soumyasundar Pal, Bin Wang, Jianye HAO, Mark Coates, Yingxue Zhang

Disambiguate First, Parse Later: Generating Interpretations for Ambiguity Resolution in Semantic Parsing
Irina Saparina, Mirella Lapata

Beyond Autocomplete: Designing CopilotLens Towards Transparent and Explainable AI Coding Agents
Runlong Ye, Zeling Zhang, Boushra Almazroua, Michael Liut

Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
Wenquan Lu, Yuechuan Yang, Kyle Lee, Yanshu Li, Enqi Liu

Rethinking (Human) Preference Evaluation of LLM Rationales
Ziang Li, Manasi Ganti, Zixian Ma, Helena Vasconcelos, Qijia He, Ranjay Krishna

HYBRIDMIND: Meta Selection of Natural Language and Symbolic Language for Enhanced LLM Reasoning
Simeng Han, Tianyu Liu, Chuhan Li, Xuyuan Xiong, Arman Cohan

Angular Steering: Behavior Control via Rotation in Activation Space
Hieu M. Vu, Tan Minh Nguyen

Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation
Ziling Cheng, Meng Cao, Leila Pishdad, Yanshuai Cao, Jackie CK Cheung

Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models
Simeng Han, Stephen Xia, Grant Zhang, Howard Dai, Chen Liu, Lichang Chen, Hoang H Nguyen, Hongyuan Mei, Jiayuan Mao, R. Thomas McCoy

Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction
Guangyi Liu, Yongqi Zhang, Xunyuan Liu, Quanming Yao

Are General-Purpose LLMs Ready for Planning? A Large-Scale Evaluation in PDDL
Kaustubh Vyas, Damien Graux, Sebastien Montella, Pavlos Vougiouklis, Jeff Z. Pan

ReCalibrate: RL for Uncertainty-Aware Reasoning in LLMs
Mehul Damani, Isha Puri, Stewart Slocum, Idan Shenfeld, Jacob Andreas

Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones
Daking Rai, Samuel Miller, Kevin Moran, Ziyu Yao

From Indirect Object Identification to Syllogisms: Exploring Binary Mechanisms in Transformer Circuits
Karim Saraipour, Shichang Zhang

How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Yizhou Sun, Himabindu Lakkaraju, Shichang Zhang

Everything is Plausible: Investigating the Impact of LLM Rationales on Human Notions of Plausibility
Shramay Palta, Peter A. Rankel, Sarah Wiegreffe, Rachel Rudinger

Before You 〈think/〉, Monitor: Implementing Flavell's Metacognitive Framework in LLMs
Nick Oh

Reasoning Riddles: How Explainability Reveals Cognitive Limits in Vision-Language Models
Prahitha Movva
We welcome both long (up to 9 pages of main content, plus unlimited references) and short (up to 5 pages of main content, plus unlimited references) paper submissions, following the official COLM template. Long papers are expected to present completed, full-scope work, while short papers may present preliminary or ongoing work. All submissions will be non-archival. We also allow dual submissions that are under review or have recently been accepted at other venues; for the former, authors should make sure to follow the other venue's dual submission policies, and for the latter, we ask authors to indicate the venue of acceptance.
The workshop will announce one Best Paper Award, open to all submissions, and one Special Recognition Award for papers whose first author(s) are junior researchers and/or members of underrepresented groups. Authors submitting to our workshop will be asked to clarify the status of the first author(s) so that eligibility can be confirmed.
We sincerely thank the program committee for their considered and thoughtful reviews!