Similar Items: Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models