CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing

Similar Items: CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing

Foundation Models to Unlock Real-World Evidence from Nationwide Medical Claims
FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios
Benchmarking Parameter-Efficient Fine-Tuning of Large Language Models for Low-Resource Tajik Text Generation with the Tajik Web Corpus
A multilingual hallucination benchmark: MultiWikiQHalluA
Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs
ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks
GazeVLM: Active Vision via Internal Attention Control for Multimodal Reasoning
CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers
When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition
MCJudgeBench: A Benchmark for Constraint-Level Judge Evaluation in Multi-Constraint Instruction Following
MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge
Shadow-Loom: Causal Reasoning over Graphical World Model of Narratives
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors
Synthetic Users, Real Differences: an Evaluation Framework for User Simulation in Multi-Turn Conversations
DPN-LE: Dual Personality Neuron Localization and Editing for Large Language Models
Tibetan-TTS: Low-Resource Tibetan Speech Synthesis with Large Model Adaptation
Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals
Towards Emotion Consistency Analysis of Large Language Models in Emotional Conversational Contexts
SC-Taxo: Hierarchical Taxonomy Generation under Semantic Consistency Constraints using Large Language Models
Atomic Fact-Checking Increases Clinician Trust in Large Language Model Recommendations for Oncology Decision Support: A Randomized Controlled Trial
Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir
Natural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHF
Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future