KL for a KL: On-Policy Distillation with Control Variate Baseline