Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone