Chalo, one of India’s largest transit-tech players, wanted to improve how commuters access real-time bus arrival times across 20+ cities. While Chalo’s live bus tracking and digital ticketing systems had already digitized much of the experience, accuracy in ETA predictions remained a persistent challenge, especially in unpredictable urban environments.
To solve this, Chalo partnered with GoML to build a scalable bus ETA prediction AI powered by AWS-native infrastructure and real-time machine learning. The goal: a city-wide pilot that could scale across regions with high model accuracy, low latency, and seamless integration into existing transit systems.
The problem: inaccurate ETAs and poor scalability across cities
Chalo had already digitized public transport with live tracking and mobile ticketing. But commuters still faced unreliable bus arrival times, especially during peak hours or city-wide disruptions. The challenge compounded as the network scaled across routes, cities, and diverse operating conditions.
ETA prediction often relied on manual tuning or on static models that couldn't adapt to real-world changes. Chalo needed a robust, scalable bus ETA prediction AI system to automate predictions, reduce manual intervention, and deliver real-time insights with minimal latency.
The solution: real-time, ML-powered bus ETA prediction system on AWS
GoML and Chalo co-developed a bus ETA prediction AI system using the following architecture and workflows:
Data ingestion and processing
- Sources: Route definitions, GPS logs, stop times, and trip metadata from Chalo's systems.
- Storage: Amazon RDS and S3 for structured and historical data.
- ETL Pipeline: AWS Glue for data quality checks, normalization, and cleaning.
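As a rough illustration of the kind of data quality checks such an ETL step might apply to GPS logs, here is a minimal sketch in plain Python. The `GpsPing` fields and the `clean_pings` helper are hypothetical names chosen for this example, not Chalo's actual schema or Glue job; the rules shown (coordinate range checks, de-duplication, ordering) are assumptions about what "cleaning" could involve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpsPing:
    """Hypothetical GPS record; field names are assumptions for this sketch."""
    bus_id: str
    timestamp: float  # Unix time in seconds
    lat: float
    lon: float


def clean_pings(pings):
    """Drop invalid coordinates and duplicate readings, then order by bus and time."""
    seen = set()
    cleaned = []
    for ping in pings:
        # Quality check: discard coordinates outside the valid lat/lon range.
        if not (-90.0 <= ping.lat <= 90.0 and -180.0 <= ping.lon <= 180.0):
            continue
        # De-duplicate: keep only the first reading per bus and timestamp.
        key = (ping.bus_id, ping.timestamp)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(ping)
    # Normalize ordering so downstream steps see each bus's pings in time order.
    cleaned.sort(key=lambda p: (p.bus_id, p.timestamp))
    return cleaned
```

In a managed pipeline the same rules would typically run as a distributed job over the stored logs; the single-process version above is only meant to make the checks concrete.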
Machine learning pipeline
- Model training: Machine learning model trained in Amazon SageMaker using over 6 months of city-specific data.
- Features used: 40+ primary and secondary features, including:
  - Distance, speed, number of stops
  - Traffic flow, signal count, road diversions
  - Historical and real-time travel time, average speed, wait time
- Model monitoring: Continuous evaluation with SageMaker Model Monitor.
- Auto-retraining: Triggered via AWS Step Functions when performance thresholds are violated.
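The retraining trigger described above can be pictured as a threshold check over recent prediction errors. The sketch below is illustrative only: the `RetrainTrigger` class, the choice of mean absolute error as the metric, and the default threshold, window, and sample counts are all assumptions, since the source does not state which metric or thresholds were used. In a deployed setup, a `True` result would be the point at which a retraining workflow is started.

```python
from collections import deque


class RetrainTrigger:
    """Hypothetical rolling-error monitor; all defaults are illustrative."""

    def __init__(self, max_mae_s=90.0, window=500, min_samples=50):
        self.max_mae_s = max_mae_s      # assumed threshold on mean absolute error
        self.min_samples = min_samples  # avoid reacting to a handful of trips
        self._errors = deque(maxlen=window)  # keeps only the most recent errors

    def record(self, predicted_s, actual_s):
        """Store the absolute error (in seconds) for one observed arrival."""
        self._errors.append(abs(predicted_s - actual_s))

    def mean_abs_error(self):
        """Return the rolling mean absolute error, or None if nothing is recorded."""
        if not self._errors:
            return None
        return sum(self._errors) / len(self._errors)

    def should_retrain(self):
        """True when enough samples exist and the rolling error exceeds the threshold."""
        if len(self._errors) < self.min_samples:
            return False
        return self.mean_abs_error() > self.max_mae_s
```

Using a bounded window means old errors age out, so the check reflects current conditions rather than the model's lifetime average.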
Real-time inference and API delivery
- SageMaker inference endpoint: Serverless prediction with millisecond latency.
- API Gateway + Lambda: For scalable request handling and preprocessing.
- Amazon ElastiCache: Caching for high-frequency queries.
- Output format: Predicted ETA per stop, in seconds (e.g., S2: 300s, S4: 400s...).
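To show how caching can sit in front of a prediction endpoint for high-frequency queries, here is a minimal cache-aside sketch. It uses an in-memory dictionary as a stand-in for a managed cache, and `EtaCache`, `predict_fn`, the key shape, and the 15-second TTL are hypothetical choices for this example rather than details from the Chalo deployment.

```python
import time


class EtaCache:
    """Cache-aside wrapper around a prediction function (illustrative only)."""

    def __init__(self, predict_fn, ttl_s=15.0, clock=time.monotonic):
        self._predict_fn = predict_fn  # stand-in for a call to the model endpoint
        self._ttl_s = ttl_s            # assumed freshness window for an ETA
        self._clock = clock            # injectable so expiry can be tested
        self._entries = {}             # (route_id, bus_id) -> (expires_at, etas)

    def get_etas(self, route_id, bus_id):
        """Return per-stop ETAs in seconds, e.g. {"S2": 300, "S4": 400}."""
        key = (route_id, bus_id)
        now = self._clock()
        entry = self._entries.get(key)
        # Cache hit: reuse the stored prediction while it is still fresh.
        if entry is not None and entry[0] > now:
            return entry[1]
        # Cache miss or expired entry: call the model and store the result.
        etas = self._predict_fn(route_id, bus_id)
        self._entries[key] = (now + self._ttl_s, etas)
        return etas
```

A short TTL is the main design lever here: it bounds how stale an ETA can be while still absorbing bursts of identical requests for the same bus.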

The impact: accurate, scalable, and real-time transit predictions
- 75%+ of predicted ETAs within 60 seconds of the actual bus arrival
- Zero manual intervention for route-specific tuning
- Ready to scale across all 20+ cities
- Significant reduction in commuter ETA complaints
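One plausible way to read the accuracy figure above is as the share of predictions that land within a fixed tolerance of the observed arrival. The sketch below computes that share; the function name `within_tolerance_rate` and the sample numbers in the usage note are made up for illustration and are not Chalo's evaluation code or data.

```python
def within_tolerance_rate(predicted_s, actual_s, tolerance_s=60.0):
    """Fraction of predictions whose absolute error is at most tolerance_s seconds."""
    if len(predicted_s) != len(actual_s):
        raise ValueError("predicted_s and actual_s must have the same length")
    if not predicted_s:
        raise ValueError("at least one prediction is required")
    hits = sum(
        1 for p, a in zip(predicted_s, actual_s) if abs(p - a) <= tolerance_s
    )
    return hits / len(predicted_s)
```

For example, with invented predictions of 300, 400, 500, and 600 seconds against observed arrivals of 310, 470, 495, and 540 seconds, three of the four errors (10, 5, and 60 seconds) fall within the 60-second tolerance, giving a rate of 0.75.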
Lessons for city transit
Common pitfalls to avoid
- Relying on static ETA models in dynamic city environments
- Building ML models without live feedback loops and retraining triggers
- Failing to integrate predictions into real-time commuter touchpoints
- Skipping explainability and validation with transit operations teams
Advice for city transit authorities and tech teams
- Start with a high-impact pilot city but architect for scale across networks
- Use real-world features like traffic, weather, and event data, not just GPS
- Pair model intelligence with systems intelligence like APIs, caching, and ops monitoring
- Think beyond ETAs: optimize for routes, schedules, and fleet productivity
Ready to transform public transit with AI that’s built for the real world?
GoML can help you build a production-grade ETA prediction system with AI, just like we did with Chalo.
Reach out to explore how AI can power smarter, commuter-first operations in your city.