PhyGround: Benchmarking Physical Reasoning in Generative World Models