Back

Indirect prompt injection: The hidden AI security threat

Deveshi Dabbawala

July 6, 2025
Table of contents

What if the very AI systems designed to make your business more efficient could become the gateway for cybercriminals to access your most sensitive data?

A November 2024 report by The Alan Turing Institute highlights growing risks, stating that 75% of business employees use Gen AI, with 46% adopting it within the past six months. Yet, as organizations rapidly embrace AI-powered applications, a silent threat is emerging that could undermine the entire foundation of AI security of indirect prompt injection attacks.

What is indirect prompt injection?

Indirect prompt injection is a sophisticated cyberattack that exploits AI systems by embedding malicious instructions in external content like websites, documents, or databases. Unlike direct attacks, these hidden commands are disguised within seemingly legitimate information that AI systems process, making them incredibly difficult to detect.

With 75% of business employees now using generative AI according to The Alan Turing Institute's 2024 report, understanding indirect prompt injection has become critical for organizational security.

How does indirect prompt injection work?

Indirect prompt injection attacks follow a simple but dangerous process:

  1. Content poisoning: Attackers embed malicious instructions in external sources your AI accesses
  1. AI processing: Your AI system reads the contaminated content as part of its normal operations
  1. Command execution: The AI unknowingly follows the hidden malicious instructions
  1. Data compromise: Sensitive information gets exposed or unauthorized actions are performed

Common types of indirect prompt injection attacks

1. Hidden web content attacks

Attackers use white text on white backgrounds to hide malicious prompts on websites. While invisible to humans, AI systems can read and execute these commands.

2. Document poisoning

Malicious indirect prompt injection instructions are embedded in PDFs, Word files, or other documents that AI systems process during normal operations.

3. Database contamination

Attackers inject harmful prompts into databases or knowledge repositories that AI systems regularly access for information.

Why is indirect prompt injection dangerous?

OWASP's 2025 AI Security Project ranks prompt injection as the #1 AI security risk because:

  • Stealth nature: Attacks are hidden within legitimate content
  • Persistent effects: Can manipulate AI behavior across multiple sessions
  • Wide Attack surface: Any external content source becomes a potential threat vector
  • Difficult detection: Traditional security tools often miss these attacks

Real cases of indirect prompt injection

Imagine your AI customer service bot pulls information from your company knowledge base. An attacker plants indirect prompt injection commands in a document, disguised as normal text. When the AI processes this document to answer customer questions, it secretly executes the hidden commands, potentially exposing customer data or corporate secrets.

Google Gemini vulnerability (February 2025)

Ars Technica reported vulnerabilities in Google's Gemini AI where indirect prompt injection attacks successfully manipulated the system's long-term memory, demonstrating persistent attack capabilities.

ChatGPT memory exploit (2024)

A persistent indirect prompt injection attack manipulated ChatGPT's memory feature, enabling long-term data theft across multiple conversations.

What are the enterprise risks from indirect prompt injection?

Organizations face serious consequences from successful attacks:

  • Data breaches: Unauthorized access to customer records, databases, and internal documents
  • Operational disruption: AI systems providing incorrect information or making unauthorized changes
  • Compliance violations: Potential regulatory fines for data protection failures
  • Reputation damage: Loss of customer trust when AI systems behave unexpectedly

Remember, some AI systems are more vulnerable to indirect prompt injection attacks. Some examples are modern AI-powered applications with access to:

  • Customer relationship management (CRM) systems
  • Internal databases and knowledge repositories
  • Email systems and communication platforms
  • Financial records and transaction systems
  • External APIs and web services

How to prevent indirect prompt injection attacks?

1. Input validation and sanitization

Implement robust filtering to identify suspicious prompts before they reach your AI system. However, this is challenging since legitimate inputs often resemble potential attack vectors.

2. Limit AI system privileges

Apply the principle of least privilege, give AI systems access only to the minimum data and functions necessary for their purpose.

3. Continuous monitoring

Deploy systems that detect unusual AI behavior patterns, such as unexpected data access requests or anomalous outputs.

4. Regular security audits

Conduct periodic assessments of AI systems to identify potential vulnerabilities and attack vectors.

5. Content source verification

Verify the integrity of external content sources your AI systems access regularly.

Emerging indirect prompt injection attack vectors in 2025

As AI adoption accelerates, indirect prompt injection attacks are becoming more sophisticated and widespread. The threat landscape is evolving rapidly in 2025, with new vulnerabilities emerging in critical AI infrastructure:

Anthropic's MCP inspector vulnerability (july 2025)

Cybersecurity researchers have discovered a critical security vulnerability in artificial intelligence (AI) company Anthropic's Model Context Protocol (MCP) Inspector project that could result in remote code execution (RCE) and allow an attacker to gain complete access to the hosts.  

Enterprise protocol security risks

The increased number of interacting components and potential third-party services expands the attack surface and introduces new security vulnerabilities.  

Ensuring secure authentication, authorization, and data handling across multiple agents and tools is critical. As Google's A2A and Anthropic's MCP protocols gain adoption across enterprise systems, the potential impact of indirect prompt injection attacks multiplies.

Multi-agent system vulnerabilities

The rise of interconnected AI agent systems creates new opportunities for indirect prompt injection attacks.  

Since its launch last November, the Model Context Protocol from Anthropic has gone viral, generating buzz as a simple, standardized way to connect language models with tools and data. Now, Google has introduced another protocol to further agentic AI, called Agent2Agent (A2A), designed to facilitate communication among AI agents.

Cross-platform attack surfaces

An indirect prompt injection vulnerability (also known as cross-domain prompt injection or XPIA) is a security exploit targeting generative AI systems where malicious instructions are embedded in external content, such as documents, web pages, or emails.  

Microsoft's documentation highlights how these attacks are expanding beyond single systems to cross-platform vulnerabilities.

Industry response to indirect prompt injection vectors

Major tech companies are responding with significant security investments:

  • Google's layered defense strategy: Google Gen AI Security Team With the rapid adoption of generative AI, a new wave of threats is emerging across the industry has led to dedicated security teams focusing on prompt injection mitigation.
  • OWASP's updated framework: Indirect prompt injections occur when an LLM accepts input from external sources, such as websites or files. The content may have in the external content data that when interpreted by the model, alters the behavior of the model in unintended or unexpected ways. OWASP's 2025 guidelines reflect the growing sophistication of these attacks.
  • Enhanced risk assessment: Companies are developing new methodologies to estimate and quantify indirect prompt injection risks across AI systems before deployment.

Enterprise security implications for indirect prompt injection vectors

The rapid evolution of indirect prompt injection techniques means traditional security approaches are insufficient. Organizations must prepare for:

  • Multi-vector attacks targeting interconnected AI systems
  • Protocol-level vulnerabilities in foundational AI infrastructure
  • Cross-domain exploitation across different AI platforms and services
  • Persistent threats that can remain dormant in AI systems until triggered

The challenge is no longer just protecting individual AI models, but securing entire ecosystems of interconnected AI agents and protocols

Summary

Indirect prompt injection represents a fundamental shift in cybersecurity. As businesses increasingly rely on AI-powered applications, they must simultaneously invest in understanding and mitigating these sophisticated attack vectors.

The question isn't whether your organization will encounter indirect prompt injection attempts, it's whether you'll be prepared when they occur. By implementing comprehensive security measures and maintaining a proactive security posture, organizations can harness AI's power while protecting against its potential misuse.

  • Indirect prompt injection is the #1 AI security threat according to OWASP
  • Attacks hide malicious commands in legitimate content
  • Any external content source can become an attack vector
  • Multi-layered security approaches are essential for protection
  • Regular monitoring and auditing are crucial for early detection

Ready to secure your AI-powered applications against indirect prompt injection and other emerging threats? Get an executive AI briefing to understand how our AI security practices and guardrails can protect you.