Similar Items: Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards