Lightweight multimodal large language model enabling efficient one shot industrial visual anomaly detection