Similar Items: LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
- Misaligned by Reward: Socially Undesirable Preferences in LLMs
- Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs
- Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs
- SERE: Structural Example Retrieval for Enhancing LLMs in Event Causality Identification
- Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
- mdok-style at SemEval-2026 Task 10: Finetuning LLMs for Conspiracy Detection