Similar Items:
- CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers
- ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks
- MCJudgeBench: A Benchmark for Constraint-Level Judge Evaluation in Multi-Constraint Instruction Following
- TriBench-Ko: Evaluating LLM Risks in Judicial Workflows
- Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future
- FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios
- Segmenting Human-LLM Co-authored Text via Change Point Detection