Quantifying the human visual exposome with vision language models