Similar Items:
- Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation
- KL for a KL: On-Policy Distillation with Control Variate Baseline
- Step Rejection Fine-Tuning: A Practical Distillation Recipe
- Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR
- Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation
- Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs
- Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key