Similar Items: EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents
- SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind
- Emergent Communication for Co-constructed Emotion Between Embodied Agents via Collective Predictive Coding
- FitText: Evolving Agent Tool Ecologies via Memetic Retrieval
- Evolving Idea Graphs with Learnable Edits-and-Commits for Multi-Agent Scientific Ideation
- Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents
- Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games