When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels