Similar Items:
- Negation Neglect: When models fail to learn negations in training
- A Note on Non-Negative $L_1$-Approximating Polynomials
- Exploration Hacking: Can LLMs Learn to Resist RL Training?
- Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring
- Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
- When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels
- Robust and Fast Training via Per-Sample Clipping