Business Problem
- Manual Classification Bottleneck: Ledgebrook's underwriting team had to manually classify documents, leading to inefficiencies and delays.
- High Risk of Misclassification: Errors in categorization disrupted underwriting workflows and impacted decision-making.
- Lack of Centralized Tracking: There was no system in place to track document groups and their relationships, making retrieval inefficient.
- Scalability Issues: As the volume of documents increased, manual processing became a bottleneck, limiting operational scalability.
About Ledgebrook
Ledgebrook receives submission and policy documents by email. Each document must be classified as loss run or non-loss run, processed, and stored so that it is available for underwriting and claims processing. goML automated this document processing workflow, eliminating the manual classification effort.
Solution
goML developed an automated document classification and storage system leveraging AWS cloud services:
Document Ingestion & Storage: Documents received via email were automatically stored in an AWS S3 bucket, ensuring scalability and security. The pipeline starts by ingesting files from S3.
Database Integration: Processed document details were stored in a PostgreSQL RDS database, while the documents themselves were vectorized and stored in OpenSearch.
AI-Based Classification: AWS Textract was used to extract text from the files, and AWS Bedrock was leveraged to classify documents into loss run and non-loss run categories.
Search & Retrieval: Indexed metadata in OpenSearch Serverless allowed underwriters to quickly retrieve documents based on classification.
Token-Based Document Grouping: A unique aiDocumentSessionToken was assigned to each group of related files, enabling seamless tracking.
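As an illustration of the AI-Based Classification step above, the following is a minimal sketch of how text that has already been extracted could be mapped to one of the two categories. It is not goML's implementation. The prompt wording, the label strings `loss_run` and `non_loss_run`, and the `invoke_model` callable (a stand-in for whatever function sends a prompt to a model on AWS Bedrock and returns its text reply) are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical label values; the source only names the two categories.
LOSS_RUN = "loss_run"
NON_LOSS_RUN = "non_loss_run"


def build_prompt(document_text: str, max_chars: int = 4000) -> str:
    """Build a classification prompt from extracted text (wording is illustrative)."""
    excerpt = document_text[:max_chars]  # keep the prompt bounded for long documents
    return (
        "Classify the insurance document below as 'loss run' or 'non-loss run'. "
        "Answer with the label only.\n\n" + excerpt
    )


def parse_label(model_output: str) -> str:
    """Map free-form model output onto one of the two categories."""
    normalized = model_output.strip().lower().replace("-", " ").replace("_", " ")
    normalized = " ".join(normalized.split())
    # Check the negative label first, because "non loss run" contains "loss run".
    if "non loss run" in normalized:
        return NON_LOSS_RUN
    if "loss run" in normalized:
        return LOSS_RUN
    raise ValueError(f"Unrecognized label: {model_output!r}")


def classify_document(document_text: str, invoke_model: Callable[[str], str]) -> str:
    """Classify extracted text using an injected model-calling function."""
    return parse_label(invoke_model(build_prompt(document_text)))
```

Passing the model call in as a parameter keeps the parsing logic testable without AWS credentials. In a real deployment, an unrecognized reply would more likely be reported as a document issue than raised directly.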
Architecture
- API Layer
FastAPI POST endpoint (/ai/documents/) handles requests.
Generates or retrieves aiDocumentSessionToken.
Returns the aiDocumentSessionToken to the user.
- Background Processing
A background task is initiated to process the documents.
Connects to AWS S3 bucket and PostgreSQL Aurora RDS.
- Session Management
Checks if aiDocumentSessionToken exists in PostgreSQL RDS:
If Yes → Retrieves existing file information and filters out already processed files.
If No → Creates a new session record in the database.
- Document Classification
Uses AWS Textract to extract text.
Uses AWS Bedrock for classification into Loss Run and Non-Loss Run categories.
Updates PostgreSQL RDS with classified files.
Indexes Non-Loss Run documents in AWS OpenSearch.
- Integration with Other Services
Triggers the Loss Run Service for extracting key loss run data.
Calls the Liability Code Service to extract Class Codes, Class Code description, NAICS, NAICS description, SIC, and exposure amount.
Sends a document issues webhook if any document processing errors occur.
- Storage & Retrieval
Stores documents securely in AWS S3.
Updates metadata in PostgreSQL Aurora RDS.
Enables fast retrieval and analytics through AWS OpenSearch.
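The session management and routing steps described in this architecture can be sketched in a few lines. This is a simplified model under stated assumptions, not the production code. An in-memory dictionary stands in for the session table in PostgreSQL, `classify` is a hypothetical callable wrapping the Textract and Bedrock steps, and the `Session` and `Result` types and the `process_documents` function are names invented for the example. The three result lists represent what would be passed to the Loss Run Service, indexed in OpenSearch, and reported through the document issues webhook.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Session:
    token: str  # plays the role of aiDocumentSessionToken
    # Maps file key -> classification label for files already processed.
    processed: Dict[str, str] = field(default_factory=dict)


@dataclass
class Result:
    token: str
    loss_run: List[str] = field(default_factory=list)      # to forward to the Loss Run Service
    non_loss_run: List[str] = field(default_factory=list)  # to index for search
    issues: List[str] = field(default_factory=list)        # to report via webhook


def get_or_create_session(store: Dict[str, Session], token: Optional[str]) -> Session:
    """Reuse the session for a known token, otherwise create a new record."""
    if token is not None and token in store:
        return store[token]
    session = Session(token=token or uuid.uuid4().hex)
    store[session.token] = session
    return session


def process_documents(
    store: Dict[str, Session],
    token: Optional[str],
    file_keys: List[str],
    classify: Callable[[str], str],
) -> Result:
    """Classify files not yet handled in this session and group them by outcome."""
    session = get_or_create_session(store, token)
    result = Result(token=session.token)
    for key in file_keys:
        if key in session.processed:
            continue  # already processed under this session token
        try:
            label = classify(key)
        except Exception as exc:
            # Failed files stay unprocessed so a later request can retry them.
            result.issues.append(f"{key}: {exc}")
            continue
        session.processed[key] = label
        if label == "loss_run":
            result.loss_run.append(key)
        else:
            result.non_loss_run.append(key)
    return result
```

Calling `process_documents` a second time with the token from the first result skips every file that was classified successfully, which mirrors the "filters out already processed files" branch of the session check.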