
Similar Items: Agent-Agnostic Evaluation of SQL Accuracy in Production Text-to-SQL Systems

SynSQL: Synthesizing Relational Databases for Robust Evaluation of Text-to-SQL Systems
FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
PolySQL: Scaling Text-to-SQL Evaluation Across SQL Dialects via Automated Backend Isomorphism
From Intent to Execution: Composing Agentic Workflows with Agent Recommendation
NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research
What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design
Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents
FINER-SQL: Boosting Small Language Models for Text-to-SQL
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI
Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes
Learning CLI Agents with Structured Action Credit under Selective Observation
AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development
SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration
AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
To Build or Not to Build? Factors that Lead to Non-Development or Abandonment of AI Systems
Contextual Multi-Objective Optimization: Rethinking Objectives in Frontier AI Systems
Rose-SQL: Role-State Evolution Guided Structured Reasoning for Multi-Turn Text-to-SQL
RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses
Characterizing the Consistency of the Emergent Misalignment Persona
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis
AI and Open-data Driven Scalable Solar Power Profiling