Back

Enhancing Data Insights and Efficiency: GroupSolver's Journey with GenAI-Driven Data Summarization

Deveshi Dabbawala

December 9, 2024
Table of contents

Business Problem

GroupSolver faced key challenges in improving its data handling capabilities:

  • Needed AI-enhanced data extraction and summarization to generate efficient insights.
  • Sought to improve data summarization quality to provide more actionable insights.
  • Required a streamlined, efficient information processing solution to enhance research workflows.

Solution

To help GroupSolver overcome these challenges, GoML designed a 5-week Proof of Concept (POC) leveraging AWS infrastructure. Key components of the solution included:

AI-Driven Data Collection: Developed serverless data collection functions using AWS Lambda to fetch relevant data from GroupSolver’s repository, ensuring data accuracy and relevance.

System Integration: Integrated AI models seamlessly within GroupSolver’s AWS environment, enabling smooth operation within their existing platform.

Deployment: Implemented a streaming API for efficient data flow management, with Python for scripting and automation.

LLM Model Development: Fine-tuned Large Language Models (LLMs) using AWS Bedrock with Claude V3 to analyze data and produce insightful summaries.

Testing and Validation: Conducted comprehensive testing to validate the accuracy and effectiveness of AI-generated insights, ensuring high-quality output.

Architecture

  • AWS Cloud Infrastructure: CloudWatch: Monitors and logs application and infrastructure metrics, providing insights to detect performance issues and improve reliability.
  • Networking and Security: VPC (Virtual Private Cloud):
    Public Subnet
    Web Interface: Hosts the publicly accessible web application for end-users.
    API Gateway: Acts as the main entry point for HTTP requests, routing incoming traffic to internal services within the VPC and ensuring secure access control.
    Private Subnet: Contains backend services that are not directly accessible from the public internet, enhancing security and access control.
  • Processing and Data Management:
    Lambda: Executes serverless functions for backend processing, automatically scaling to meet demand while reducing infrastructure management overhead.
    ECR (Elastic Container Registry): Stores Docker container images, enabling seamless deployment of containerized applications and services within the architecture.
    Prompt Engineering Module: Manages and processes input prompts for the AI system to optimize interaction with the language model.
  • AI Model and API:
    Anthropic Claude v3: An advanced large language model (LLM) integrated within the architecture, capable of processing natural language prompts generated by the prompt engineering module, handling complex tasks, and generating responses.
    FastAPI: A lightweight and high-performance Python framework used to build RESTful APIs, enabling interaction between different components of the application.
  • External Integration and Source Control:
    Bitbucket: Manages source code and integrates with the deployment pipeline, enabling version control, collaboration, and streamlined code deployment.
  • Security and Access Control:
    WAF (Web Application Firewall): Protects the application from common web threats like SQL injection and cross-site scripting (XSS) by filtering and monitoring HTTP requests between users and the web interface.

Outcomes

40%
Faster data processing
35%
Improvement in quality
30%
Increased operational efficiency