VisAULa: Advancing Facial Action Unit Recognition With Token-Level Alignment