Text this: Deep learning–driven image captioning: Progress through transformers and large language models