How Ledgebrook used AI for liability code extraction to achieve 95% classification accuracy

Deveshi Dabbawala

June 16, 2025

Ledgebrook underwriters rely on accurate business classification using NAICS, SIC, and Class Codes to assess insurance risk. Extracting this information manually from ACORD forms was time-consuming and error-prone, and it ultimately slowed down underwriting.

To address this, Ledgebrook partnered with GoML to automate liability code extraction with AI on AWS.

The problem: slow underwriting and high misclassification risk

As submission volumes grew, underwriters at Ledgebrook spent hours manually extracting business classification data from ACORD forms (125, 126, 827). These forms often came in unstructured formats, requiring repetitive and error-prone human effort to identify relevant fields like NAICS and Class Codes.  

Any mistakes in classification led to inaccurate risk profiling and policy mispricing. The lack of centralized automation slowed down operations, limited scalability, and exposed the company to compliance risks.

The solution: AI liability code extraction engine

GoML developed a fully automated pipeline using AWS services to extract, classify, and store liability code data from ACORD documents.

Automated Text and Data Extraction

  • Amazon Textract extracted business classification information from scanned, unstructured ACORD forms, enabling accurate field-level parsing.
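
Field-level parsing of this kind usually means turning the form-analysis output into a flat map of labels to values. The sketch below is a minimal illustration, not GoML's implementation. It assumes the blocks come from Textract's AnalyzeDocument operation run with the FORMS feature, which returns KEY_VALUE_SET and WORD blocks linked by relationships. The field label and value in SAMPLE_BLOCKS are made up for the example.

```python
def _text_of(block, by_id):
    """Join the WORD children of a block into a single string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks):
    """Map each KEY block's text to the text of the VALUE block(s) it points to."""
    by_id = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        values = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                values.extend(_text_of(by_id[vid], by_id) for vid in rel["Ids"])
        pairs[_text_of(block, by_id)] = " ".join(values)
    return pairs


# Hand-written sample in the shape of a FORMS analysis response (illustrative only).
SAMPLE_BLOCKS = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "NAICS"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Code"},
    {"Id": "w3", "BlockType": "WORD", "Text": "238220"},
]

fields = key_value_pairs(SAMPLE_BLOCKS)
```

In a real pipeline the blocks would come from the service response rather than a hand-built list, and fields with no detected value would need their own handling.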

RAG-Powered Liability Code Retrieval

  • A Retrieval-Augmented Generation (RAG) pipeline built on Amazon Bedrock and OpenSearch extracted addresses and business types from each document.
  • It then matched them against a pre-defined codebook (Excel files in S3) containing NAICS, SIC, and Class Codes.
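
The retrieval step can be pictured as ranking codebook rows by how closely they match the extracted business description. The sketch below is a toy stand-in under stated assumptions: it swaps the embedding search that Bedrock and OpenSearch would provide for a bag-of-words cosine similarity so that it runs with the standard library alone. The three-row CODEBOOK, the function names, and the cutoff `k` are illustrative and do not come from Ledgebrook's codebook.

```python
import math
import re
from collections import Counter

# Tiny illustrative codebook; a real one would be loaded from the Excel files.
CODEBOOK = [
    {"naics": "238220", "description": "Plumbing, Heating, and Air-Conditioning Contractors"},
    {"naics": "238160", "description": "Roofing Contractors"},
    {"naics": "722511", "description": "Full-Service Restaurants"},
]


def tokens(text):
    """Lowercased word counts, used here in place of an embedding vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, codebook, k=2):
    """Return up to k codebook rows with a non-zero similarity, best match first."""
    q = tokens(query)
    scored = [(cosine(q, tokens(row["description"])), row) for row in codebook]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for score, row in scored[:k] if score > 0]


candidates = retrieve("Residential plumbing and heating contractor", CODEBOOK)
```

In a RAG setup the retrieved candidates would then be passed to the language model as context, so that the final code is chosen from rows that actually exist in the codebook.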

Structured Data Processing

  • The extracted codes were validated and structured before being stored in PostgreSQL RDS.
  • The system ensured each form’s classification data was consistent and immediately retrievable.
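
Validation before storage can be sketched as a format check followed by an idempotent write keyed on the form. The example below is an assumption-laden illustration rather than the production schema: it uses an in-memory SQLite database in place of PostgreSQL RDS so it can run anywhere, and the table name, column names, and format rules (six-digit NAICS, four-digit SIC, numeric class code) are hypothetical. The class code value is a placeholder.

```python
import re
import sqlite3
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Classification:
    form_id: str
    naics: str
    sic: str
    class_code: str


def validate(record):
    """Return a list of problems; an empty list means the record can be stored."""
    problems = []
    if not re.fullmatch(r"[0-9]{6}", record.naics):
        problems.append("naics must be six digits")
    if not re.fullmatch(r"[0-9]{4}", record.sic):
        problems.append("sic must be four digits")
    if not re.fullmatch(r"[0-9]+", record.class_code):
        problems.append("class_code must be numeric")
    return problems


def store(conn, record):
    """Insert the record, or overwrite the existing row for the same form."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    conn.execute(
        "INSERT INTO classification (form_id, naics, sic, class_code) "
        "VALUES (?, ?, ?, ?) "
        "ON CONFLICT(form_id) DO UPDATE SET "
        "naics = excluded.naics, sic = excluded.sic, class_code = excluded.class_code",
        astuple(record),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL instance
conn.execute(
    "CREATE TABLE classification ("
    "form_id TEXT PRIMARY KEY, naics TEXT NOT NULL, "
    "sic TEXT NOT NULL, class_code TEXT NOT NULL)"
)
store(conn, Classification("form-001", "238220", "1711", "12345"))
```

Keying the table on the form identifier keeps one row per form, which is one simple way to get the consistency described above. With a PostgreSQL driver the placeholder syntax would differ from SQLite's `?`.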

Webhook Execution for Real-Time Integration

  • Once liability code extraction was complete, a webhook triggered downstream systems to act on the updated classification in real time.
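
A webhook of this kind is typically a JSON POST to a URL that the downstream system registers. The sketch below shows one plausible shape under assumptions: the event name, payload fields, and example URL are hypothetical, and the source does not describe the actual payload or any authentication scheme.

```python
import json
import urllib.request


def build_webhook_request(url, form_id, codes):
    """Build (but do not send) a JSON POST announcing an updated classification."""
    payload = {
        "event": "classification.updated",  # hypothetical event name
        "form_id": form_id,
        "codes": codes,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5.0):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


request = build_webhook_request(
    "https://example.com/hooks/classification",  # placeholder URL
    "form-001",
    {"naics": "238220"},
)
```

Separating request construction from sending keeps the payload easy to test, and a production version would likely add retries for receivers that are temporarily unavailable.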

On-Demand Data Access

  • Underwriters could query liability code data through a REST API using an aiDocumentSessionToken, with structured responses returned within seconds.
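
From the caller's side, a lookup like this comes down to building a request that carries the session token and reading back a structured record. The sketch below is illustrative only: the source names the aiDocumentSessionToken but not the endpoint, so the base URL, the `/liability-codes` path, the choice to pass the token as a query parameter, and the response field names are all assumptions.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, session_token):
    """Compose the lookup URL, passing the session token as a query parameter."""
    query = urlencode({"aiDocumentSessionToken": session_token})
    return f"{base_url.rstrip('/')}/liability-codes?{query}"


def parse_codes(response_body):
    """Pick the classification fields out of a JSON response body."""
    data = json.loads(response_body)
    return {key: data.get(key) for key in ("naics", "sic", "class_code")}


url = build_query_url("https://api.example.com/v1/", "abc-123")
codes = parse_codes('{"naics": "238220", "sic": "1711", "class_code": "12345", "status": "done"}')
```

If the real API expects the token in a header instead, only `build_query_url` would change. The parsing side stays the same.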

The impact: faster underwriting with AI liability code extraction

  • 80% faster classification process, reducing manual labor significantly
  • 95% accuracy in business classification, minimizing mispriced policies
  • 60% improvement in underwriting efficiency, allowing faster quote generation and risk evaluation

Lessons for other insurance and insurtech companies

Common pitfalls to avoid

  • Relying on manual reading of unstructured forms for code classification
  • Skipping the use of AI-powered matching pipelines
  • Keeping classification logic outside of a central system

Advice for teams facing similar challenges

  • Use retrieval-based methods to dynamically map form content to codebooks
  • Invest in a structured classification database from the beginning
  • Automate webhook and API workflows to ensure real-time integration

Want to reach 95% classification accuracy?

Let GoML automate your liability code extraction process, just like Ledgebrook.

Outcomes

  • 80% faster underwriting decisions
  • 95% accuracy in business classification
  • 60% improvement in operational efficiency