A General Framework for Multimodal LLM-Based Multimedia Understanding in Large-Scale Recommendation Systems