Similar Items: SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies
- CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs
- CodeClinic: Evaluating Automation of Coding Skills for Clinical Reasoning Agents
- Causal Foundations of Collective Agency
- Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation
- Emergent Communication for Co-constructed Emotion Between Embodied Agents via Collective Predictive Coding
- Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition