Similar Items: FlowEval: Reference-based Evaluation of Generated User Interfaces
- MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents
- Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation
- RoadMapper: A Multi-Agent System for Roadmap Generation of Solving Complex Research Problems
- Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents
- Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning
- SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind