Similar Items:
- Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment
- Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization
- Misaligned by Reward: Socially Undesirable Preferences in LLMs
- Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards
- PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization
- CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation
- Why Expert Alignment Is Hard: Evidence from Subjective Evaluation