Building a Production-Grade AI Platform for Healthcare Denial Management

Paushigaa S

April 29, 2026
Denial management in healthcare faces a data problem, not because data is missing, but because systems fail to use it effectively. Teams rely on manual reviews, rigid rule engines, and disconnected tools that do not scale. This post explores an alternative approach built on a production-grade AI system that predicts denials and generates appeals.

The objective goes beyond exposing a model's endpoint. The system operates as a structured, end-to-end workflow.

Architecture at a glance

Stack selection took into account performance and traceability. FastAPI with Uvicorn powers the API layer, providing type-safe contracts, native async operations, and high-performance request handling. AI inference runs on AWS Bedrock, using Claude Sonnet at several pipeline steps.

- API layer: FastAPI + Uvicorn
- Inference: AWS Bedrock / Claude Sonnet
- Persistence: PostgreSQL (RDS)
- Storage: Amazon S3
- Messaging: Amazon SQS
- Access control: AWS IAM

PostgreSQL manages job state, while S3 stores payloads, prompts, and templates separately, so larger artifacts scale independently of the database. SQS lets downstream services such as analytics pipelines and workflow automation consume events asynchronously without tying up the core infrastructure.

The two core pipelines

Denial prediction follows a structured flow. A claim payload arrives, passes validation, and creates a job record in PostgreSQL. A background worker picks up the task, retrieves prompt definitions from S3, invokes the model, and validates the output against a strict Pydantic schema that includes decision, probability, confidence, and reasoning.
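The output contract described above can be sketched with Pydantic. The field names come from the post (decision, probability, confidence, reasoning); the allowed decision values and bounds are illustrative assumptions, not the platform's actual schema.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class DenialPrediction(BaseModel):
    # "deny"/"approve" are assumed labels for illustration.
    decision: Literal["deny", "approve"]
    probability: float = Field(ge=0.0, le=1.0)
    confidence: float = Field(ge=0.0, le=1.0)
    reasoning: str


def parse_model_output(raw: str) -> DenialPrediction:
    # Reject malformed model output before any job state is written.
    # Raises pydantic.ValidationError on missing or out-of-range fields.
    return DenialPrediction.model_validate_json(raw)
```

Validating at this boundary means a model response that drifts from the contract fails loudly instead of propagating bad data into PostgreSQL.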

On success, the system updates the job state to complete and publishes an event. The client polls using a tracking ID and receives one of three states: not found, in progress, or completed.
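The three-state polling contract can be sketched as a plain function, with a dict standing in for the PostgreSQL job table. In production this would be a FastAPI route backed by a real query; the status names here are assumptions.

```python
# In-memory stand-in for the job table, keyed by tracking ID.
# Values mimic internal job statuses (hypothetical names).
JOBS: dict[str, str] = {}


def get_prediction_status(tracking_id: str) -> dict:
    """Resolve a tracking ID to one of the three client-visible states."""
    status = JOBS.get(tracking_id)
    if status is None:
        return {"state": "not_found"}
    if status in ("queued", "running"):
        return {"state": "in_progress"}
    return {"state": "completed"}
```

Collapsing internal statuses into three client-facing states keeps the polling API stable even if the worker's internal state machine grows.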

Appeal generation follows a similar flow but requires deeper signal extraction. The system first normalizes remittance data, including denial codes, adjustment codes, financial context, and service line attributes, into a compact format before any model call runs.
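A minimal sketch of that normalization step, assuming hypothetical remittance field names (real 835 remittance payloads are considerably richer):

```python
def normalize_remittance(remit: dict) -> str:
    """Flatten remittance signals into a compact, token-efficient summary
    suitable for inclusion in a model prompt."""
    lines = [
        f"denial_codes: {', '.join(remit.get('denial_codes', []))}",
        f"adjustment_codes: {', '.join(remit.get('adjustment_codes', []))}",
        f"billed: {remit.get('billed_amount')} paid: {remit.get('paid_amount')}",
    ]
    for svc in remit.get("service_lines", []):
        lines.append(
            f"line {svc.get('line')}: {svc.get('procedure')} denied={svc.get('denied')}"
        )
    return "\n".join(lines)
```

Normalizing before any model call keeps prompts small and makes the signal extraction step unit-testable on its own.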

An eligibility check runs first. When an appeal is not needed, the workflow exits early. When the appeal qualifies, the system fetches templates from S3, identifies subtype intent, filters template candidates through a registry, and generates the final letter with associated metadata.

Instead of a single monolithic inference call, both pipelines use a multi-stage strategy: decisioning → classification → generation. This keeps each stage testable, observable, and independently tunable.

Runtime governance over hardcoded logic

The team made a key architectural decision to store prompts and templates in managed S3 storage instead of hardcoding them into the codebase. This setup allows teams to improve appeal quality or expand the template taxonomy without redeploying the service.

A registry layer manages subtype-to-template mappings, detects duplicates, and selects templates consistently. This keeps the system flexible while maintaining predictable behavior.
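A registry with those properties can be sketched in a few lines. The class and method names are hypothetical; the key behaviors (duplicate detection, deterministic selection) come from the description above.

```python
class TemplateRegistry:
    """Maps appeal subtypes to template object keys in S3.
    Rejects duplicate registrations and resolves deterministically."""

    def __init__(self) -> None:
        self._templates: dict[str, str] = {}

    def register(self, subtype: str, s3_key: str) -> None:
        if subtype in self._templates:
            raise ValueError(f"duplicate template for subtype {subtype!r}")
        self._templates[subtype] = s3_key

    def resolve(self, subtype: str) -> str:
        # Same subtype always yields the same template key.
        return self._templates[subtype]
```

Because templates live in S3 and the mapping lives in the registry, teams can add or swap templates without a redeploy while selection stays predictable.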

Reliability and observability

The system handles failures as visible and recoverable events, and not as silent errors. It persists every job state, including failures, and validates model outputs against strict schemas before writing any data. The SQS integration pushes events to downstream systems, so they do not need to poll the core platform.
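The event publish can be sketched with the transport injected, so the payload logic is testable without AWS. The field names in the event body are assumptions; in production `send` would wrap the SQS SendMessage call.

```python
import json
from typing import Callable


def publish_job_event(send: Callable[[str], None], job_id: str, state: str) -> str:
    """Serialize a job-state event and hand it to the messaging transport.
    In production: sqs.send_message(QueueUrl=..., MessageBody=body)."""
    body = json.dumps({"job_id": job_id, "state": state, "source": "denial-platform"})
    send(body)
    return body
```

Publishing every terminal state, including failures, is what lets downstream consumers react without polling the core platform.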

Observability focuses on end-to-end visibility. The system correlates logs across requests and job execution, measures latency at each stage, and triggers alerts for stalled jobs. Teams treat operational visibility as a core requirement, not a secondary concern.
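The stalled-job check can be sketched as a scan over job records. The 15-minute threshold and the field names are illustrative assumptions; in production this would be a SQL query driven by a scheduler.

```python
from datetime import datetime, timedelta, timezone

STALL_THRESHOLD = timedelta(minutes=15)  # assumed alerting threshold


def find_stalled_jobs(jobs: list[dict], now: datetime) -> list[str]:
    """Return IDs of jobs stuck in progress past the threshold."""
    return [
        j["id"]
        for j in jobs
        if j["status"] == "in_progress" and now - j["updated_at"] > STALL_THRESHOLD
    ]
```

A check like this turns a silently hung worker into an actionable alert, consistent with treating failures as visible, recoverable events.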

What this demonstrates

AI delivers the most value in production when it operates inside a structured workflow, not as a standalone endpoint. Healthcare denial management demands this approach due to high stakes, large volumes, and frequent edge cases that require clear audit trails.

Asynchronous orchestration, governed prompts, durable job state, and event-driven integration work together to create a system that remains predictable and reliable in real-world operations.

GoML’s AI Matic extends this approach by providing a managed layer for building, orchestrating, and scaling such AI workflows, with built-in support for prompt governance, pipeline automation, and production monitoring.

Explore our website for more technical content in the world of AI and ML engineering.