CloudWatch for Agentic AI observability

Sharan Sundar Sankaran

December 3, 2025

I love CloudWatch. I have personally integrated it into many of our recent implementations. Quite frankly, there is no reason not to.

In this deep dive, I will discuss three dimensions of CloudWatch observability that we will be using quite a bit at GoML.

  1. Gen AI Observability
  2. CloudTrail and RUM
  3. CloudWatch Investigations

1. Gen AI Observability

Amazon CloudWatch's generative AI observability feature, which includes capabilities for monitoring agentic AI workloads, launched in preview in July 2025 and became generally available in October 2025.

On November 26, 2025, AWS's Cloud Operations team published its re:Invent top announcements, headlined by generative AI observability in CloudWatch, with updates like native LLM tracing, token and latency metrics, and compatibility with AgentCore as well as external frameworks such as LangChain, LangGraph, and CrewAI.

The post also previewed MCP servers for CloudWatch and a GitHub Action that pulls observability into PRs.

Why Gen AI Observability?

Conventional monitoring systems provide only surface-level insights (CPU, memory, p95 latency), leaving developers to manually stitch together logs or build custom instrumentation. These tools fall short of providing cross-component visibility for agentic AI, forcing a trade-off between monitoring depth and operational efficiency. They also lack AI-native visibility, missing cues like cross-tool prompt flows, token consumption, hallucination-risk paths, retrieval misses, rate-limit retries, and model-switch decisions.

What is it?

The feature delivers built-in views and full end-to-end tracing for LLMs, agents, knowledge bases, and tools. CloudWatch now gives developers deep visibility into performance and accuracy, with the ability to inspect specific traces and resolve issues across the whole AI workflow.

This was clearly illustrated by Jeff Barr (VP) as part of INV2026 (Ops in AI Age) at re:Invent 2025 on December 1.

[Figure 1] Amazon CloudWatch Generative AI dashboard

Key Features

CloudWatch generative AI observability provides two pre-built capabilities:

  • Model Invocations – a comprehensive dashboard showing model usage and token spend, along with a curated invocation log table that surfaces the exact inputs and outputs for every inference.
  • Amazon Bedrock AgentCore agents – performance and decision metrics for Amazon Bedrock AgentCore primitives such as Agents, Memory, Built-in Tools, Gateways, and Identity.

Key metrics available in these dashboards include:

  • Total and average invocations
  • Token usage (total, average per query, input, output)
  • Latency (average, P90, P99)
  • Error rates and throttling events
  • Cost attribution by application, user role, or specific user
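Under the hood, most of these are standard aggregations over per-invocation records. Here is a minimal sketch of how such dashboard numbers could be derived; the record shape and field names are hypothetical, not CloudWatch's actual schema:

```python
from statistics import mean

def quantile(values, q):
    """Nearest-rank quantile, as used for P90/P99 latency."""
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]

# Hypothetical invocation records; in practice these numbers come
# from the CloudWatch generative AI observability dashboards.
invocations = [
    {"input_tokens": 420, "output_tokens": 180, "latency_ms": 950,  "error": False},
    {"input_tokens": 300, "output_tokens": 220, "latency_ms": 1200, "error": False},
    {"input_tokens": 510, "output_tokens": 90,  "latency_ms": 3400, "error": True},
]

latencies = [r["latency_ms"] for r in invocations]
summary = {
    "total_invocations":    len(invocations),
    "total_tokens":         sum(r["input_tokens"] + r["output_tokens"] for r in invocations),
    "avg_tokens_per_query": mean(r["input_tokens"] + r["output_tokens"] for r in invocations),
    "avg_latency_ms":       mean(latencies),
    "p90_latency_ms":       quantile(latencies, 0.90),
    "p99_latency_ms":       quantile(latencies, 0.99),
    "error_rate":           sum(r["error"] for r in invocations) / len(invocations),
}
print(summary)
```

The value of the managed feature is that CloudWatch computes these rollups for you, at scale, and joins them with the actual prompt and response payloads in the invocation log table.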

Reception and Responses

Since the feature is only a few weeks past general availability, there is not much real-world experience to draw from.

While the built-in GenAI insights and natural-language query generation for logs and metrics are clear pros, there is still some apprehension about cost at scale (when log volumes and metric counts swell). As with AgentCore, configuration complexity is a challenge for developers, with customization taking significant time.
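One way teams keep custom-metric costs predictable is CloudWatch's embedded metric format (EMF): you emit a single structured JSON log line and CloudWatch extracts the metrics from it, so you control exactly which metrics and dimensions exist. A minimal sketch; the namespace and dimension names here are my own illustrative choices, not AWS defaults:

```python
import json
import time

def genai_emf_record(agent_name, input_tokens, output_tokens, latency_ms):
    """Build a CloudWatch embedded-metric-format (EMF) log line.

    Written to a CloudWatch log group, a line like this is turned into
    real metrics by CloudWatch, with no PutMetricData calls needed.
    """
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "GoML/GenAI",      # hypothetical namespace
                "Dimensions": [["AgentName"]],  # keep cardinality low to control cost
                "Metrics": [
                    {"Name": "InputTokens",  "Unit": "Count"},
                    {"Name": "OutputTokens", "Unit": "Count"},
                    {"Name": "LatencyMs",    "Unit": "Milliseconds"},
                ],
            }],
        },
        "AgentName": agent_name,
        "InputTokens": input_tokens,
        "OutputTokens": output_tokens,
        "LatencyMs": latency_ms,
    })

line = genai_emf_record("intake-agent", 420, 180, 950)
print(line)
```

Because only the declared metric/dimension combinations become billable metrics, this pattern gives you a lever on the cost-at-scale concern above.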

2. CloudTrail and RUM

In addition to the new generative-AI observability features in CloudWatch, AWS rolled out upgrades to CloudTrail and RUM on November 19–20. CloudTrail now offers five-minute data event rollups and new Insights for anomaly detection, while CloudWatch RUM added mobile support for iOS and Android via OpenTelemetry. Combined, these updates close a major visibility gap for teams running AI in production.

Alongside these came two new announcements: Model Context Protocol (MCP) servers for Amazon CloudWatch and for CloudWatch Application Signals, which help AI agents interact naturally with your observability data.

MCP servers provide standardized access to metrics, logs, alarms, traces, and service health data, allowing you to build autonomous operational workflows. CloudWatch Application Signals integrates directly into developer workflows with a new GitHub Action that provides observability insights during pull requests and in CI/CD pipelines.

While these developments are incremental on their own, they reaffirm the direction AWS is heading: trace the AI and harden the audit trail.

3. CloudWatch Investigations

CloudWatch Investigations, which reached general availability in June 2025, represents a significant leap forward in operational intelligence. At re:Invent 2025, AWS added major AI-powered enhancements: incident report generation and "5 Whys" analysis.

CloudWatch Investigations leverages generative AI to perform sophisticated root cause analysis and offer guided, context-aware troubleshooting, streamlining how DevOps teams diagnose and resolve high-severity incidents. The October update built on this foundation with interactive incident report generation, enabling organizations to move beyond reactive firefighting toward a more systematic, knowledge-led approach to resolving issues and driving continuous improvement.

At re:Invent 2025, Nandini Ramani introduced an integrated AI-powered "5 Whys" analysis workflow that implements the systematic methodology AWS teams use internally to drive to root cause. We could not be more excited about this.

[Figure 5] 5 Whys Analysis in the CloudWatch investigations Incident Report

Strong observability builds trust in agentic systems. I expect many more of our customers will ask about agent performance and observability in 2026 than they did in 2025.

In any case, this is a theme that we continuously speak to our customers about. Here is one of my favorite examples of building in observability from the ground up for a clinical intake automation workflow.

This article is part of our comprehensive guide to AWS AI. Explore the guide to know more about AWS AI tools like Bedrock and SageMaker, AWS AI LLMs like Nova, and why AWS AI infrastructure is the best way to build gen AI based solutions that can scale in production.