AWS has enhanced Amazon Bedrock by extending prompt caching duration to one hour, a significant upgrade for developers building production-scale generative AI applications.
Prompt caching enables reuse of previously processed context, cutting repeated computation, reducing response latency, and lowering inference costs. It is particularly valuable for agentic workflows, RAG systems, and conversational applications with stable system prompts.
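As a minimal sketch of how this looks in practice: Bedrock's Converse API lets you insert a `cachePoint` content block after a stable prefix (such as a long system prompt), marking everything before it as cacheable. The model ID and prompts below are illustrative placeholders, and the request is only constructed here, not sent.

```python
def build_cached_request(system_prompt: str, user_message: str) -> dict:
    """Build kwargs for bedrock_runtime.converse() with a cache checkpoint."""
    return {
        "modelId": "anthropic.claude-3-5-sonnet-20241022-v2:0",  # placeholder
        "system": [
            {"text": system_prompt},
            # Content above this marker is eligible for caching and can be
            # reused on subsequent calls instead of being reprocessed.
            {"cachePoint": {"type": "default"}},
        ],
        "messages": [
            {"role": "user", "content": [{"text": user_message}]},
        ],
    }

request = build_cached_request(
    "You are a support agent. <long, stable policy document here>",
    "How do I reset my password?",
)
# To invoke: boto3.client("bedrock-runtime").converse(**request)
```

With the extended duration, the cached prefix now survives up to an hour between calls, so intermittent traffic (e.g. agent loops with tool-use pauses) still hits the cache.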
The update signals AWS’s focus on inference optimization rather than just model access, positioning Bedrock as a more cost-efficient, enterprise-ready platform for scalable generative AI deployments.
