A Multimodal Framework for Vibration Signals via Knowledge-Guided Preprocessing (O-XSTFT) and Reconstruction-Contrastive Tokenization (ReCoFormer)