Similar Items: ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks
- CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers
- WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation
- MCJudgeBench: A Benchmark for Constraint-Level Judge Evaluation in Multi-Constraint Instruction Following
- Mapping Discourse Reframing: A Multi-Layer Network Approach to Italian HPV Vaccine Discourse on X (2010-2024)
- TriBench-Ko: Evaluating LLM Risks in Judicial Workflows
- FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios