Visually grounded speech models for low-resource languages and cognitive modelling