Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue