Similar Items: MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering
- Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering
- ASTRA-QA: A Benchmark for Abstract Question Answering over Documents
- Context Convergence Improves Answering Inferential Questions
- Pt-HotpotQA: Evaluating Multi-Hop Question Answering on Original and Portuguese-translated Datasets Using LLMs
- Neural at ArchEHR-QA 2026: One Method Fits All: Unified Prompt Optimization for Clinical QA over EHRs
- Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring