AI agents are being handed AWS access at scale. Infrastructure management, code deployment, operations automation agents are doing it all. But there's a question the industry hasn't answered yet:
"What can this AI agent actually do in your AWS account and how much damage could it cause?"
Existing IAM auditing tools AWS Access Analyzer, IAM policy simulators tell you what permissions are attached. That's not the same thing. An AI agent doesn't just use permissions one at a time. It chains them. It orchestrates. It operates at machine speed across multiple services simultaneously. The blast radius of an AI agent is categorically different from a human using the same role.
This article documents the experiment we ran to answer that question: building the Rogue Agent Impact Visualizer which is the first security tool designed specifically for the era of AI agents with AWS access.
Why AWS MCP server changes everything
In May 2026, AWS made its MCP (Model Context Protocol) Server generally available. This is a managed remote server that gives AI agents standardized access to all AWS APIs every service, every action through a single, IAM-audited interface.
This changes the threat model for AI agents fundamentally. Before MCP, agents needed custom integrations per service. Now a single agent with a single IAM role can reach everything that role allows and it can do so autonomously, recursively, and at scale.
What AWS MCP server actually gives an agent:
• Access to 15,000+ AWS API actions through one tool endpoint
• Built-in SigV4 auth via IAM roles (no credential exposure)
• CloudWatch logging + CloudTrail auditing of every API call
• Agent SOPs - pre-built multi-step AWS task patterns
The security tooling for this access pattern doesn't exist yet. Access Analyzer tells you what a role can do. Nothing tells you what an AI agent operating under that role can orchestrate. That's the gap this experiment targets.
Static permissions vs. Agent blast radius
A developer configures an IAM role thinking: "this agent needs S3 access, Lambda invocation, and DynamoDB reads." They attach the appropriate policies. What they don't account for is how an AI agent actually uses those permissions:
- It calls lambda:CreateFunction then configures the new function to write to S3 via s3:PutBucketPolicy
- It uses iam:SimulatePrincipalPolicy to probe what else it can reach
- It chains sts:AssumeRole if any attached policy allows it escalating to a higher-privilege role
- It disables CloudTrail logging using cloudtrail:StopLogging covering its tracks
None of these individual actions requires admin access. All of them combined produce admin-equivalent impact. The blast radius is the full graph of what the agent can orchestrate not just what the role says.
The Rogue agent impact visualizer
The Rogue Agent Impact Visualizer is a full-stack application that does one thing: takes an IAM Role ARN and produces a complete picture of what an AI agent operating under that role can actually do.
Input → Output
How the agent works
The agent operates in a recursive reasoning loop not a hardcoded sequence of API calls. It thinks, decides, acts, observes, and repeats:
1. User provides a Role ARN via the React frontend
2. FastAPI backend spins up the agent with the target ARN
3. Agent initializes an MCP session (SSE protocol, SigV4 auth)
4. Agent asks Claude: "What should I call next to understand this role?"
5. Claude responds with an AWS CLI command agent executes it via MCP
6. Result returned to Claude reasoning continues until all data gathered
7. Risk scoring engine calculates blast radius across all collected policy data
8. Agent generates a least-privilege replacement policy
9. Results persisted to DynamoDB, full report to S3, frontend displays graph
System design
The system has three layers: a React frontend, a FastAPI backend with the agent core, and AWS services accessed exclusively through the MCP proxy.
Backend Stack
Frontend Stack
MCP protocol implementation
The AWS MCP Server communicates over Server-Sent Events (SSE) a long-lived HTTP connection that streams JSON-RPC messages. This was one of the first real implementation challenges: the mcp-proxy npm package returns responses in SSE format, not standard JSON.
SSE Parsing
Each SSE response contains multiple data: lines. The MCP client must extract these, concatenate them, and parse as JSON-RPC. The session ID is captured from the Mcp-Session-Id response header and sent in all subsequent requests.
Key MCP implementation details:
• Protocol: JSON-RPC 2.0 over SSE (not standard REST)
• Auth: SigV4 signed requests via local proxy (localhost:3000)
• Session: Mcp-Session-Id header must be preserved across calls
• Tool: aws___call_aws single tool for all AWS CLI commands
• Timeout: 30s per tool call (IAM enumeration can be slow)
Model: Claude Sonnet 4.6
We upgraded from Claude Sonnet 4.5 to 4.6 mid-experiment. The inference profile approach on Bedrock is notable: the model self-identifies as Claude Sonnet 4.5 in its responses (training data), but the inference profile ARN routes to the actual 4.6 weights.
How we score blast radius
The risk scoring engine is the core research output. It takes the raw policy data enumerated by the agent and produces a 0–100 score based on three factors:
- Factor 1: Service sensitivity weight (0–10) — IAM and STS score 10, CloudWatch scores 4
- Factor 2: Action scope multiplier — wildcard action (*) doubles the weight
- Factor 3: Resource ARN multiplier — wildcard resource (*) adds 50% to the score
Admin access (Action: *, Resource: *) automatically scores 100 regardless of other factors. The score is normalized across all services and capped at 100.
Why this matters now
The timing is not accidental. AWS MCP Server went GA. Agents now have a standardized, production-grade path to every AWS API. Enterprises are already asking: how do we govern this?
The security tooling hasn't kept pace. Access Analyzer was built for human principals. IAM policy simulators were built for static analysis. Neither was built to answer: "If this AI agent goes rogue or gets compromised what's the actual damage surface?"
That's the question this tool answers. And the answer, as Part 2 will show with real scan data, is almost always larger than what was intended.


.jpg)


