Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation