Similar Items: MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents
- CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios
- Agentic Vulnerability Reasoning on Windows COM Binaries
- How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection
- Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches
- The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code
- ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models