Similar Items:
- SmartEval: A Benchmark for Evaluating LLM-Generated Smart Contracts from Natural Language Specifications
- FlowEval: Reference-based Evaluation of Generated User Interfaces
- A Language for Describing Agentic LLM Contexts
- Governing What the EU AI Act Excludes: Accountability for Autonomous AI Agents in Smart City Critical Infrastructure
- Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents
- LLM-Foraging: Large Language Models for Decentralized Swarm Robot Foraging
- RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution