Similar Items:
- Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis
- ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection
- AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use
- You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation
- Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks
- No More, No Less: Task Alignment in Terminal Agents
- Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution