Business Problem
Genpact faced several challenges in handling large-scale data related to CAT (Catastrophe) modeling:
- Manual Data Processing: Traditional data handling required significant manual effort to clean, transform, and interpret Excel-based datasets.
- Error-Prone Workflows: Human error in data entry, validation, and structuring led to inconsistent insights.
- Inefficient Decision-Making: Lack of automation in data visualization and analytics delayed actionable insights.
- Scalability Issues: The existing approach was not scalable, requiring high resource allocation for data management tasks.
About Genpact
Genpact is a global professional services firm specializing in digital transformation, analytics, and AI-driven solutions. With a strong focus on operational efficiency, Genpact partners with businesses to streamline complex processes and drive data-driven decision-making.
Solution
goML partnered with Genpact to develop an automated CAT modeling solution that streamlined data ingestion, processing, visualization, and interactive analysis through AI-driven automation.
Automated Data Preprocessing
To eliminate the need for manual data cleansing, goML implemented automated preprocessing pipelines using Python (Pandas, NumPy) for data transformation, AWS Lambda for serverless execution, and AWS RDS for structured data storage. This ensured standardized, high-quality data processing while reducing human effort.
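To make the pipeline concrete, the sketch below shows one plausible shape for such a preprocessing step: an AWS Lambda handler triggered by a raw exposure file landing in S3, cleaned with Pandas and written back for downstream loading into RDS. The bucket names, column names, and handler structure are illustrative assumptions, not the deployed code.

```python
"""Minimal sketch of an automated preprocessing Lambda, assuming it is
triggered by new exposure files arriving in S3. Bucket, key, and column
names are hypothetical."""
import io

import boto3
import pandas as pd

s3 = boto3.client("s3")

CLEAN_BUCKET = "cat-model-clean"  # hypothetical output bucket


def handler(event, context):
    # The S3 put event carries the bucket/key of the uploaded raw file
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    df = pd.read_excel(io.BytesIO(raw))  # raw Excel-based CAT dataset

    # Standardize column names and drop fully empty rows
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.dropna(how="all")

    # Coerce numeric fields; invalid entries become NaN for later review
    for col in ("total_insured_value", "latitude", "longitude"):
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")

    # Persist the cleaned dataset; a later step loads it into AWS RDS
    out = io.BytesIO()
    df.to_csv(out, index=False)
    s3.put_object(Bucket=CLEAN_BUCKET, Key=f"clean/{key}.csv", Body=out.getvalue())
    return {"rows": len(df), "output_key": f"clean/{key}.csv"}
```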
Efficient Data Validation & Mapping
Automated data validation and mapping processes were implemented using Python (Geopandas), AWS S3, and AWS RDS, ensuring accurate geocoding validation, air_code checks, and standardized data mapping through predefined templates. This significantly improved data accuracy and consistency.
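A rough sketch of what this validation and mapping step could look like with GeoPandas follows. The template mapping, the allowed air_code values, and the boundaries file are hypothetical stand-ins used only to illustrate the coordinate checks, code checks, and template-driven renaming described above.

```python
"""Minimal sketch of validation and mapping, assuming cleaned data arrives as
a pandas DataFrame. TEMPLATE, AIR_CODES, and boundaries_path are hypothetical."""
import geopandas as gpd
import pandas as pd

# Predefined template: raw column name -> standardized column name
TEMPLATE = {"tiv": "total_insured_value", "lat": "latitude",
            "lon": "longitude", "aircode": "air_code"}
AIR_CODES = {"101", "102", "103"}  # hypothetical allowed codes


def validate_and_map(df: pd.DataFrame, boundaries_path: str) -> gpd.GeoDataFrame:
    df = df.rename(columns=TEMPLATE)

    # Basic geocoding sanity checks on coordinate ranges
    df["valid_coords"] = (df["latitude"].between(-90, 90)
                          & df["longitude"].between(-180, 180))

    # air_code check against the allowed code list
    df["valid_air_code"] = df["air_code"].astype(str).isin(AIR_CODES)

    # Spatial join against reference boundaries to confirm geocoding
    points = gpd.GeoDataFrame(
        df,
        geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
        crs="EPSG:4326",
    )
    boundaries = gpd.read_file(boundaries_path)
    joined = gpd.sjoin(points, boundaries, how="left", predicate="within")
    joined["geocode_match"] = joined["index_right"].notna()
    return joined
```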
Dynamic Data Visualization
The processed data was converted into interactive, real-time visualizations using Highcharts, allowing analysts to dynamically filter dimensions and measures. This enabled quick data exploration and pattern identification, significantly improving decision-making efficiency.
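One way a backend could feed those visualizations is to aggregate the processed data into a Highcharts-compatible payload for whichever dimension and measure the analyst selects. The sketch below assumes hypothetical column names and that the React front end renders the returned configuration with Highcharts.

```python
"""Minimal sketch of shaping processed records into a Highcharts-style
series payload; column names and chart options are illustrative."""
import pandas as pd


def build_chart_payload(df: pd.DataFrame, dimension: str, measure: str) -> dict:
    # Aggregate the selected measure by the selected dimension
    grouped = df.groupby(dimension)[measure].sum().sort_values(ascending=False)

    # Shape the result the way a Highcharts column chart expects it
    return {
        "chart": {"type": "column"},
        "title": {"text": f"{measure} by {dimension}"},
        "xAxis": {"categories": grouped.index.tolist()},
        "series": [{"name": measure, "data": grouped.round(2).tolist()}],
    }


# Example: total insured value by region
# payload = build_chart_payload(df, dimension="region", measure="total_insured_value")
```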
Seamless Workspace Management
Users could create and manage dedicated workspaces using AWS EC2, React with TypeScript, and Node.js, making it easier to organize, retrieve, and track processed datasets. This enhanced operational efficiency and streamlined data accessibility across multiple projects.
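The workspace service itself was built with React, TypeScript, and Node.js; purely to illustrate the underlying data model for organizing and tracking processed datasets, here is a minimal Python sketch using sqlite3 as a stand-in for AWS RDS. All table and column names are hypothetical.

```python
"""Illustrative workspace registry only; sqlite3 stands in for AWS RDS and
the schema is hypothetical."""
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS workspaces (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS datasets (
            id           INTEGER PRIMARY KEY,
            workspace_id INTEGER REFERENCES workspaces(id),
            s3_key       TEXT NOT NULL,          -- location of the processed file
            status       TEXT DEFAULT 'processed'
        );
    """)


def add_dataset(conn: sqlite3.Connection, workspace: str, s3_key: str) -> None:
    # Create the workspace on first use, then register the dataset under it
    conn.execute("INSERT OR IGNORE INTO workspaces(name) VALUES (?)", (workspace,))
    ws_id = conn.execute("SELECT id FROM workspaces WHERE name = ?",
                         (workspace,)).fetchone()[0]
    conn.execute("INSERT INTO datasets(workspace_id, s3_key) VALUES (?, ?)",
                 (ws_id, s3_key))


conn = sqlite3.connect(":memory:")
init_db(conn)
add_dataset(conn, "hurricane-2024", "clean/exposures_q1.csv")
```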
Conversational AI for Data Interaction
A chatbot-driven interface was integrated using GPT-3.5, LangChain, and AWS Lex to assist users in querying and interpreting data through natural language interactions. This eliminated the need for manual report generation and provided real-time, AI-driven insights.
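As a rough illustration of the natural-language layer, the sketch below uses LangChain's OpenAI chat wrapper with a GPT-3.5 model to answer questions grounded in a summary of the processed dataset. In the deployed solution AWS Lex sat in front of this flow; the prompt wording and the use of a statistical summary as context are assumptions made for the example.

```python
"""Minimal sketch of the natural-language query layer, assuming LangChain and
GPT-3.5; prompt text and the dataset summary passed in are hypothetical."""
import pandas as pd
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a CAT-modeling data assistant. Answer only from the "
               "dataset summary provided; say so if the answer is not in it."),
    ("human", "Dataset summary:\n{summary}\n\nQuestion: {question}"),
])
chain = prompt | llm


def ask(df: pd.DataFrame, question: str) -> str:
    # A compact statistical summary keeps the prompt small while grounding answers
    summary = df.describe(include="all").to_string()
    return chain.invoke({"summary": summary, "question": question}).content


# Example: ask(df, "Which region has the highest total insured value?")
```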
Architecture
- Data Processing & Storage
CRM (Customer Relationship Management System): Source of raw customer data.
AWS Glue: Performs data cleansing and preprocessing.
Amazon S3: Stores the cleaned data for further processing.
- Model Training & Deployment
Amazon SageMaker Training: Performs hierarchical clustering to train the model.
Trained Model: Stored for inference after training.
- Model Inference & API Integration
API Gateway: Receives customer data and forwards it for processing.
AWS Lambda: Processes API requests and interacts with the inference system.
Amazon RDS: Stores metadata and predicted results from inference.
SageMaker Inference Endpoint: Deploys the trained model for making predictions (see the sketch after this list).
- Monitoring & Continuous Learning
SageMaker Monitoring: Observes model performance and triggers re-training.
AWS Glue (Re-Training): Re-trains the model if required based on monitored feedback.
- Security & Monitoring
AWS IAM: Manages access control.
AWS KMS: Ensures encryption and security of data.
Amazon CloudWatch: Monitors system logs and performance.
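To make the inference path concrete, the sketch below shows how a Lambda function behind API Gateway might invoke the SageMaker inference endpoint and return the prediction; writing the result and its metadata to Amazon RDS would follow the call. The endpoint name and payload format are hypothetical.

```python
"""Minimal sketch of the API Gateway -> Lambda -> SageMaker inference path
described above; the endpoint name and payload shape are hypothetical."""
import json

import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "cat-clustering-endpoint"  # hypothetical endpoint name


def handler(event, context):
    # API Gateway proxy integration delivers the request body as a JSON string
    payload = json.loads(event["body"])

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    prediction = json.loads(response["Body"].read())

    # In the full architecture, the prediction and its metadata are written to RDS here

    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```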
