Similar Items:
- MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge
- SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures
- mdok-style at SemEval-2026 Task 10: Finetuning LLMs for Conspiracy Detection
- mdok-style at SemEval-2026 Task 9: Finetuning LLMs for Multilingual Polarization Detection
- UFAL-CUNI at SemEval-2026 Task 11: An Efficient Modular Neuro-symbolic Method for Syllogistic Reasoning
- A multilingual hallucination benchmark: MultiWikiQHalluA
- Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs