Similar Items: MCJudgeBench: A Benchmark for Constraint-Level Judge Evaluation in Multi-Constraint Instruction Following
- ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks
- CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers
- Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation
- TriBench-Ko: Evaluating LLM Risks in Judicial Workflows
- A multilingual hallucination benchmark: MultiWikiQHalluA
- SC-Taxo: Hierarchical Taxonomy Generation under Semantic Consistency Constraints using Large Language Models