Similar Items: Agent-Agnostic Evaluation of SQL Accuracy in Production Text-to-SQL Systems
- SynSQL: Synthesizing Relational Databases for Robust Evaluation of Text-to-SQL Systems
- FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
- PolySQL: Scaling Text-to-SQL Evaluation Across SQL Dialects via Automated Backend Isomorphism
- From Intent to Execution: Composing Agentic Workflows with Agent Recommendation
- NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research
- What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design