Similar Items: OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning
- VecCISC: Improving Confidence-Informed Self-Consistency with Reasoning Trace Clustering and Candidate Answer Selection
- Abductive Reasoning with Probabilistic Commonsense
- Rubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoning
- The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning
- NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research
- Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners