Similar Items:
- LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
- CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios
- Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing
- Re-Triggering Safeguards within LLMs for Jailbreak Detection
- ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming
- SoK: Robustness in Large Language Models against Jailbreak Attacks
- LoopTrap: Termination Poisoning Attacks on LLM Agents