Search Results - (evolution OR evaluation)

  1. Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents

    Published in ArXiv cs.MA Recent Papers (2026)
    Online Article RSS Article
  2. Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs

    Published in ArXiv cs.CL Recent Papers (2026)
    Online Article RSS Article
  3. Distributed Quantum Circuit Optimisation: Evaluating Global and Local encodings

    Published in ArXiv cs.DC Recent Papers (2026)
    Online Article RSS Article
  4. Implementing True MPI Sessions and Evaluating MPI Initialization Scalability

    Published in ArXiv cs.DC Recent Papers (2026)
    Online Article RSS Article
  5. TriBench-Ko: Evaluating LLM Risks in Judicial Workflows

    Published in ArXiv cs.CL Recent Papers (2026)
    Online Article RSS Article
  6. FlowEval: Reference-based Evaluation of Generated User Interfaces

    Published in ArXiv cs.MA Recent Papers (2026)
    Online Article RSS Article
  7. Why Expert Alignment Is Hard: Evidence from Subjective Evaluation

    Published in ArXiv cs.CL Recent Papers (2026)
    Online Article RSS Article
  8. Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning

    Published in ArXiv cs.MA Recent Papers (2026)
    Online Article RSS Article
  9. Ex Ante Evaluation of AI-Induced Idea Diversity Collapse

    Published in ArXiv cs.GT Recent Papers (2026)
    Online Article RSS Article
  10. MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol

    Published in ArXiv cs.IR Recent Papers (2026)
    Online Article RSS Article
  11. FLAM: Evaluating Model Performance with Aggregatable Measures in Federated Learning

    Published in ArXiv cs.DC Recent Papers (2026)
    Online Article RSS Article
  12. SAGE: Scalable Agentic Grounded Evaluation for Crop Disease Diagnosis

    Published in ArXiv cs.MA Recent Papers (2026)
    Online Article RSS Article
  13. From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

    Published in ArXiv cs.CR Recent Papers (2026)
    Online Article RSS Article
  14. Position: LLM Inference Should Be Evaluated as Energy-to-Token Production

    Published in ArXiv cs.DC Recent Papers (2026)
    Online Article RSS Article
  15. PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior

    Published in ArXiv cs.CR Recent Papers (2026)
    Online Article RSS Article
  16. An empirical evaluation of clustering processes for early detection of university dropout

    Published in JDSA (2026)
    Online Article RSS Article
  17. Evaluation of cerebrovascular reactivity using transcranial Doppler in patients with influenza

    Published in PLOS ONE (2026)
    Online Article RSS Article
  18. Design and Evaluation of an Integrated BI Solution with Centralized Data Architecture

    Online Article RSS Article
  19. Evaluation of grafting technique in three genotypes of Cedrela odorata L.

    Online Article RSS Article
  20. Design toolkits for campus open spaces from post-occupancy evaluations of federal universities in South-west Nigeria

    Published 2018
    Subjects: “…Post-occupancy evaluation…”
    Full Text Available
    Article