Text this: Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization