Text this: Multimodal Learning on Low-Quality Data with Conformal Predictive Self-Calibration