Facio is a Brazilian fintech focused on micro-loans, serving over 4 million customers and backed by a strong financial data infrastructure. The company processes loan applications using one year of bank statement data per user, stored as Parquet files on AWS S3, alongside bureau scores and customer metadata.
Problem: manual workflows limit scalability of credit scoring model
Manual workflows limited the scalability of the credit scoring model at Facio. The entire pipeline relied on Jupyter notebooks, which led to inconsistent training processes and poor reproducibility. Teams lacked a structured way to compare different versions of the model, making it difficult to identify the best-performing approach. Monitoring was minimal, so the team had little visibility into data drift and performance degradation of the deployed credit scoring model.
Manual intervention also increased the time required to retrain and update the credit scoring model, slowing down loan decision cycles. In addition, model governance was weak, with no standardized versioning or approval workflows. These gaps reduced the reliability of the credit scoring model and directly impacted the speed and quality of loan approvals.
Solution: automated MLOps pipeline for credit scoring model
Facio partnered with GoML to build an automated MLOps system for the credit scoring model, covering data ingestion, training, benchmarking, and monitoring. The solution uses GoML’s Data Analytics Accelerator to standardize data pipelines and automate model development.
It converts raw financial data into structured datasets, trains and compares XGBoost credit scoring models, and selects the best version with proper versioning. The system also monitors performance using drift detection and triggers retraining when required, enabling faster and more reliable model updates.
End-to-end pipeline automation
The system automates the complete workflow for the credit scoring model:
- Data ingestion from S3 parquet files and Athena queries
- Feature engineering from bank transaction data
- Automated dataset merging and temporal train/validation/test splits
- XGBoost-based credit scoring model training and evaluation
- Model registration and deployment to SageMaker endpoints
- Versioning of all artifacts related to the credit scoring model
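The temporal splitting mentioned above can be sketched as a simple chronological partition, so that validation and test sets always come after the training window. This is a minimal illustration; the field name and cutoff dates are assumptions, not Facio's actual schema.

```python
from datetime import date

def temporal_split(rows, train_end, val_end, date_key="statement_date"):
    """Split records chronologically: train < train_end <= val < val_end <= test.

    Hypothetical helper illustrating a temporal train/validation/test split;
    the date_key field name is an illustrative assumption.
    """
    train = [r for r in rows if r[date_key] < train_end]
    val = [r for r in rows if train_end <= r[date_key] < val_end]
    test = [r for r in rows if r[date_key] >= val_end]
    return train, val, test

# Toy records standing in for aggregated bank-statement features
rows = [
    {"statement_date": date(2023, 1, 15), "amount": 120.0},
    {"statement_date": date(2023, 6, 10), "amount": 80.0},
    {"statement_date": date(2023, 11, 2), "amount": 45.5},
]
train, val, test = temporal_split(rows, date(2023, 4, 1), date(2023, 9, 1))
```

A chronological split avoids the look-ahead leakage that a random split would introduce into a credit scoring model.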
The pipeline follows a structured, multi-step architecture:
- Step 1: Feature extraction from Athena
- Step 2: Data consolidation and splitting
- Step 3: Credit scoring model training with hyperparameter optimization
- Step 4: Model registration in SageMaker Model Registry
- Step 5: Automated performance reporting
This ensures repeatable and scalable development of the credit scoring model.
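The five steps above can be sketched as an ordered chain of stages, each passing its outputs forward through a shared context. The stage bodies here are stubs standing in for the real Athena, XGBoost, and SageMaker calls; all names and values are illustrative assumptions.

```python
def extract_features(ctx):
    # Step 1: in the real pipeline this would query Athena; stubbed here
    ctx["features"] = [{"avg_balance": 1200.0, "label": 0},
                       {"avg_balance": -50.0, "label": 1}]
    return ctx

def consolidate_and_split(ctx):
    # Step 2: merge sources and split (temporal logic omitted in this stub)
    ctx["train"], ctx["test"] = ctx["features"][:1], ctx["features"][1:]
    return ctx

def train_model(ctx):
    # Step 3: stand-in for XGBoost training with hyperparameter optimization
    ctx["model"] = {"threshold": 0.0}
    return ctx

def register_model(ctx):
    # Step 4: stand-in for registration in the SageMaker Model Registry
    ctx["model_version"] = 1
    return ctx

def report(ctx):
    # Step 5: automated performance reporting
    ctx["report"] = f"v{ctx['model_version']} trained on {len(ctx['train'])} rows"
    return ctx

PIPELINE = [extract_features, consolidate_and_split, train_model,
            register_model, report]

def run_pipeline():
    ctx = {}
    for step in PIPELINE:
        ctx = step(ctx)
    return ctx
```

Keeping each stage as a pure function of the context makes individual steps easy to test and rerun, which is what turns a notebook workflow into a repeatable pipeline.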
Model benchmarking and selection
A benchmarking framework enables systematic evaluation of the credit scoring model:
- Automated evaluation metrics generation
- Comparison across multiple credit scoring model versions
- Statistical testing for performance validation
- Hyperparameter tuning for optimizing the credit scoring model
- Selection of the best-performing credit scoring model for deployment
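Champion/challenger selection of the kind listed above can be sketched as a comparison over candidate metrics, where a challenger only replaces the champion if it beats it by a minimum margin (a simple stand-in for the statistical validation step; the metric values and margin are illustrative assumptions).

```python
def select_best(candidates, metric="auc", min_improvement=0.005):
    """Pick the best model version: a challenger replaces the current
    champion only if it improves the metric by at least min_improvement,
    a crude guard against promoting noise-level gains."""
    best, *challengers = candidates
    for c in challengers:
        if c[metric] - best[metric] >= min_improvement:
            best = c
    return best

# Hypothetical benchmark results for three credit scoring model versions
candidates = [
    {"version": "v1", "auc": 0.861},
    {"version": "v2", "auc": 0.872},
    {"version": "v3", "auc": 0.874},
]
best = select_best(candidates)
```

Here v3's 0.002 gain over v2 falls below the margin, so v2 remains the champion; in a production system the margin would come from a proper statistical test rather than a fixed constant.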
Monitoring and evaluation
The system tracks the health of the credit scoring model in production:
- Data drift detection using statistical metrics
- Population Stability Index (PSI) thresholds
- Outcome drift monitoring
- Automated alerts using CloudWatch and SNS
- Explainability reports for the credit scoring model using SHAP
These capabilities improve trust and transparency in the credit scoring model.
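The Population Stability Index referenced above is a standard drift metric: bin the baseline and current distributions, then sum (actual% − expected%) × ln(actual% / expected%) over the bins. A minimal sketch, with the bin count and smoothing constant as assumptions:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a current sample.

    PSI = sum((a - e) * ln(a / e)) over shared bins; values above 0.2 are
    a common heuristic for significant drift (thresholds vary by team).
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # fall back if all values are equal

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # floor empty bins at a small constant to avoid log(0)
        return [max(c / len(values), 1e-4) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical distributions score 0; a shifted population scores well above the 0.2 alert threshold, which is the condition that would feed the CloudWatch/SNS alerting described above.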
Automated retraining loop
The pipeline supports continuous improvement of the credit scoring model:
- Monitoring jobs evaluate drift on a scheduled basis
- Retraining is triggered automatically when thresholds are exceeded
- Latest data is used to retrain the credit scoring model
- New versions are registered and evaluated before promotion
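The trigger condition for the retraining loop can be sketched as a threshold check over the monitored drift metrics; in the described architecture this logic would run on a schedule (for example via EventBridge and Lambda) and publish an alert before kicking off retraining. The metric names and thresholds here are assumptions.

```python
PSI_THRESHOLD = 0.2  # common industry heuristic; Facio's actual value is unknown

def should_retrain(drift_metrics, psi_threshold=PSI_THRESHOLD,
                   outcome_drift_threshold=0.05):
    """Decide whether a scheduled monitoring job should trigger retraining.

    drift_metrics is a dict such as {"psi": 0.25, "outcome_drift": 0.01};
    missing metrics are treated as no drift.
    """
    return (drift_metrics.get("psi", 0.0) > psi_threshold
            or drift_metrics.get("outcome_drift", 0.0) > outcome_drift_threshold)
```

Keeping the decision in one pure function makes the retraining policy auditable and easy to tune, which matters for the governance gaps the manual workflow had.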
Infrastructure and deployment
The solution uses a scalable AWS based architecture:
- Amazon SageMaker for training, hosting, and managing the credit scoring model
- AWS S3 for storing datasets and model artifacts
- Amazon Athena and Glue for feature querying
- AWS Lambda and EventBridge for automation
- CloudWatch and SNS for monitoring and alerts
- Python based pipeline components
The system supports both real-time inference and batch scoring for the credit scoring model.
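Real-time scoring against a SageMaker endpoint typically goes through the `sagemaker-runtime` `invoke_endpoint` API. A hedged sketch, where the endpoint name, feature names, and payload format are all assumptions rather than Facio's actual contract:

```python
import json

def to_payload(features):
    """Serialize one feature vector for a JSON-accepting endpoint.

    The {"instances": [...]} shape is an illustrative assumption; the real
    payload format depends on the deployed model's inference container.
    """
    return json.dumps({"instances": [features]})

def score(features, endpoint_name="credit-scoring-model"):  # hypothetical name
    """Invoke a SageMaker real-time endpoint (requires AWS credentials)."""
    import boto3  # imported lazily so the helper above stays dependency-free
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=to_payload(features),
    )
    return json.loads(response["Body"].read())
```

For batch scoring the same serialized records would instead be written to S3 and processed by a SageMaker Batch Transform job rather than a live endpoint.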
Quality assurance
Validation ensures reliability of the credit scoring model:
- End-to-end pipeline testing
- Data validation and schema checks
- Model evaluation on unseen datasets
- Inference testing for prediction accuracy
- System validation for artifacts and outputs
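The data validation and schema checks listed above can be sketched as a per-record type check that reports every violation instead of failing on the first one. The field names and types are illustrative assumptions, not Facio's actual schema.

```python
# Illustrative schema; real field names and types are assumptions
EXPECTED_SCHEMA = {"customer_id": str, "avg_balance": float, "bureau_score": int}

def validate_row(row, schema=EXPECTED_SCHEMA):
    """Return a list of schema violations for one record (empty = valid)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in row:
            errors.append(f"missing field: {field}")
        elif not isinstance(row[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(row[field]).__name__}")
    return errors
```

Collecting all violations per record gives the pipeline a complete picture of bad data in one pass, which is more useful for alerting than fail-fast validation.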
Impacts
- 60-70% reduction in credit scoring model training time
- 2-3X faster loan decision cycles
- 80% improved accuracy and consistency of the credit scoring model
- Better visibility into model performance through monitoring
- Reduced manual effort for ML teams
Before MLOps and after MLOps
“Facio transformed its credit scoring model pipeline into a scalable system that improves speed, consistency, and decision quality.”
Prashanna Rao, Head of Engineering, GoML
Key takeaways for fintech companies
Common challenges
- Manual workflows slow down credit scoring model updates
- Lack of monitoring reduces trust in credit scoring models
- Difficulty in benchmarking model performance
Practical guidance
- Automate the lifecycle of the credit scoring model
- Implement benchmarking before deployment
- Monitor drift and retrain proactively
- Use model registries for governance
Ready to scale your credit scoring model with automated MLOps?
Partner with GoML and build production-grade ML systems using AI Matic.




