Text this: PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning