Models
March 12, 2026

Designing AI agents to resist prompt injection

OpenAI outlines strategies for designing AI agents that resist prompt injection attacks. The guide explains risks, safe system design, and layered defenses that help agents follow user intent instead of malicious instructions.

OpenAI explains how developers can design AI agents that resist prompt injection attacks, a security threat where hidden instructions manipulate an AI to ignore user intent. These attacks often appear in emails, webpages, or documents that the agent processes.

To reduce risk, OpenAI recommends layered defenses such as separating trusted system instructions from external content, restricting tool access, validating inputs, and continuously testing agents through automated red teaming.

The company also uses reinforcement learning based automated attackers to discover new vulnerabilities before they appear in real world deployments. While defenses are improving, prompt injection remains a long term security challenge for AI agents that interact with external data.

#
OpenAI

Read Our Content

See All Blogs
Gen AI

WebMCP and AI orchestration: how the web is finally catching up to enterprise AI agents

Deveshi Dabbawala

March 10, 2026
Read more
Gen AI

OpenAI just released GPT-5.4: here’s what you need to know

Deveshi Dabbawala

March 6, 2026
Read more