Text this: A bilingual analysis of multi-head attention mechanism for image captioning based on morphosyntactic information