Similar Items:
- Reward Hacking in Rubric-Based Reinforcement Learning
- Rubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoning
- Semantic Reward Collapse and the Preservation of Epistemic Integrity in Adaptive AI Systems
- Discrete Flow Matching for Offline-to-Online Reinforcement Learning
- RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses
- SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering
- Learning Multimodal Energy-Based Model with Multimodal Variational Auto-Encoder via MCMC Revision