APA (7th ed.) Citation
(2026). What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design. arXiv cs.AI Recent Papers.
Chicago Style (17th ed.) Citation
"What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design." arXiv cs.AI Recent Papers 2026.
MLA (9th ed.) Citation
"What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design." arXiv cs.AI Recent Papers, 2026.