Similar Items: KL for a KL: On-Policy Distillation with Control Variate Baseline