Similar Items:
- Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR
- Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation
- KL for a KL: On-Policy Distillation with Control Variate Baseline
- Step Rejection Fine-Tuning: A Practical Distillation Recipe
- CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation
- Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs
- Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning