Similar Items: Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization