Similar Items: Rubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoning
- RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses
- SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering
- Abductive Reasoning with Probabilistic Commonsense
- RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
- The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning
- Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners