Similar Items: CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing
- Foundation Models to Unlock Real-World Evidence from Nationwide Medical Claims
- FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios
- Benchmarking Parameter-Efficient Fine-Tuning of Large Language Models for Low-Resource Tajik Text Generation with the Tajik Web Corpus
- A multilingual hallucination benchmark: MultiWikiQHalluA
- Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs
- ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks