Text this: Robust Vision-Language Alignment Using Multi-Modal Large Language Models for Open-Vocabulary Semantic Segmentation