Multimodal vision-language framework for text-guided leukemia classification using advanced deep learning architectures