Automating Document Classification & Storage for Ledgebrook

Deveshi Dabbawala

February 24, 2025
Business Problem

  • Manual Classification Bottleneck: Ledgebrook's underwriting team had to manually classify documents, leading to inefficiencies and delays. 
  • High Risk of Misclassification: Errors in categorization disrupted underwriting workflows and impacted decision-making. 
  • Lack of Centralized Tracking: There was no system in place to track document groups and their relationships, making retrieval inefficient. 
  • Scalability Issues: As the volume of documents increased, manual processing became a bottleneck, limiting operational scalability.  

About Ledgebrook

Ledgebrook receives a variety of submission and policy documents via email, which need to be classified as loss run or non-loss run, processed, and stored efficiently for underwriting and claims processing. goML automated the document processing workflow, eliminating manual effort.

Solution

goML developed an automated document classification and storage system leveraging AWS cloud services: 

Document Ingestion & Storage: Documents received via email were automatically stored in an AWS S3 bucket, ensuring scalability and security; the pipeline begins by ingesting these files from S3.
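
The ingestion step above can be sketched as a small helper that lists the keys waiting in the intake bucket. The client is injected so the sketch runs without credentials; in production it would typically be `boto3.client("s3")`, and the `incoming/` prefix is an illustrative assumption, not Ledgebrook's actual layout.

```python
def list_incoming_documents(s3_client, bucket: str, prefix: str = "incoming/") -> list[str]:
    """Return the keys of documents waiting under the intake prefix.

    `s3_client` is expected to expose the boto3 S3 client interface
    (get_paginator / list_objects_v2); the prefix is a hypothetical layout.
    """
    keys: list[str] = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent on empty pages, so default to an empty list.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```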

Database Integration: Processed document details were stored in a PostgreSQL RDS database, while the documents themselves were vectorized and stored in OpenSearch. 

AI-Based Classification: AWS Textract was used to extract text from the files, and AWS Bedrock was leveraged to classify documents into loss run and non-loss run categories. 
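
The extract-then-classify step might look like the sketch below, again with clients injected so it runs standalone. The Bedrock request/response payload shape is model-specific; the `prompt`/`completion` fields here are illustrative assumptions, as is the 4,000-character truncation.

```python
import json

LOSS_RUN = "loss_run"
NON_LOSS_RUN = "non_loss_run"

def extract_text(textract_client, bucket: str, key: str) -> str:
    """Collect the LINE blocks from a Textract detect_document_text response."""
    resp = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    lines = [b["Text"] for b in resp["Blocks"] if b["BlockType"] == "LINE"]
    return "\n".join(lines)

def classify_document(bedrock_client, text: str, model_id: str) -> str:
    """Ask a Bedrock model for a loss-run / non-loss-run label.

    The JSON body and "completion" field are an assumed payload shape;
    real Bedrock models each define their own request schema.
    """
    prompt = (
        "Classify the following insurance document as either 'loss_run' or "
        "'non_loss_run'. Reply with the label only.\n\n" + text[:4000]
    )
    resp = bedrock_client.invoke_model(
        modelId=model_id,
        body=json.dumps({"prompt": prompt}),
    )
    answer = json.loads(resp["body"].read())["completion"].strip().lower()
    return NON_LOSS_RUN if "non" in answer else LOSS_RUN
```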

Search & Retrieval: Indexed metadata in OpenSearch Serverless allowed underwriters to quickly retrieve documents based on classification.
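
A retrieval query against the OpenSearch index could be built as below. The field names (`classification`, `ingested_at`) are assumptions for illustration, not the actual index mapping.

```python
def build_classification_query(category: str, size: int = 20) -> dict:
    """Build an OpenSearch query body filtering indexed documents by category.

    Field names are hypothetical; a real index mapping would define them.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # term filter: exact match on the classification keyword field
                "filter": [{"term": {"classification": category}}]
            }
        },
        # newest documents first
        "sort": [{"ingested_at": {"order": "desc"}}],
    }
```

A query body like this would be passed to the OpenSearch `search` API; keeping it as a plain dict makes it easy to unit-test without a cluster.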

Token-Based Document Grouping: A unique aiDocumentSessionToken was assigned to each group of related files, enabling seamless tracking. 
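
Minting such a token can be as simple as a UUID; the `ads-` prefix below is an illustrative convention, not Ledgebrook's actual format.

```python
import uuid

def new_session_token() -> str:
    """Mint an aiDocumentSessionToken for a new group of related files.

    The "ads-" prefix is a hypothetical naming convention.
    """
    return f"ads-{uuid.uuid4().hex}"
```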

Architecture

  • API Layer 
    FastAPI POST endpoint (/ai/documents/) handles requests. 
    Generates or retrieves aiDocumentSessionToken. 
    Returns the aiDocumentSessionToken to the user. 
  • Background Processing 
    A background task is initiated to process the documents. 
    Connects to AWS S3 bucket and PostgreSQL Aurora RDS. 
  • Session Management 
    Checks if aiDocumentSessionToken exists in PostgreSQL RDS: 
    If Yes → Retrieves existing file information and filters out already processed files. 
    If No → Creates a new session record in the database. 
  • Document Classification 
    Uses AWS Textract to extract text. 
    Uses AWS Bedrock for classification into Loss Run and Non-Loss Run categories. 
    Updates PostgreSQL RDS with classified files. 
    Indexes Non-Loss Run documents in AWS OpenSearch. 
  • Integration with Other Services 
    Triggers the Loss Run Service for extracting key loss run data. 
    Calls the Liability Code Service to extract class codes and their descriptions, NAICS codes and descriptions, SIC codes, and exposure amounts. 
    Sends a document issues webhook if any document processing errors occur. 
  • Storage & Retrieval 
    Stores documents securely in AWS S3. 
    Updates metadata in PostgreSQL Aurora RDS. 
    Enables fast retrieval and analytics through AWS OpenSearch. 
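
The session-management branch described above (existing token → filter out processed files; no token → create a record) can be sketched as a pure function. A plain dict stands in for the PostgreSQL table keyed by aiDocumentSessionToken; this is a simplification of the real database step.

```python
import uuid

def resolve_session(token, sessions: dict, incoming: list[str]):
    """Return (token, files_to_process), creating a session record if needed.

    `sessions` stands in for the RDS table: token -> set of already
    processed file names. This in-memory stand-in is illustrative only.
    """
    if token and token in sessions:
        # Existing session: skip files that were already processed.
        processed = sessions[token]
        return token, [f for f in incoming if f not in processed]
    # New session: mint a token if none was supplied and record it.
    token = token or uuid.uuid4().hex
    sessions[token] = set()
    return token, list(incoming)
```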

Outcomes

  • 80% Enhanced Efficiency – Automates the extraction and classification of documents, reducing manual effort and processing time. 
  • 90% Improved Accuracy – Minimizes human errors in document retrieval and classification, ensuring reliable data handling. 
  • 70% Faster Decision-Making – Enables quick access to submission and policy documents, accelerating underwriting and claims processing.