Similar Items:
- A multilingual hallucination benchmark: MultiWikiQHalluA
- The First Token Knows: Single-Decode Confidence for Hallucination Detection
- Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
- Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals
- MCJudgeBench: A Benchmark for Constraint-Level Judge Evaluation in Multi-Constraint Instruction Following
- Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments
- mdok-style at SemEval-2026 Task 9: Finetuning LLMs for Multilingual Polarization Detection