Similar Items: PhyGround: Benchmarking Physical Reasoning in Generative World Models