Enabling Master Data Management and Enhancing AI-Driven Insights for RTA

Deveshi Dabbawala

February 21, 2025

Business Problem

  • Inefficient data ingestion and integration across multiple applications.
  • Lack of AI-powered data processing and transformation capabilities.
  • Limited ability to query and analyze structured data using NLP.
  • Need for stringent data security, privacy, and compliance measures.

About RTA

Dubai’s Roads and Transport Authority (RTA) manages a vast transportation network and a complex application ecosystem. To enhance data-driven decision-making, RTA launched a Proof of Concept (PoC) focused on improving data ingestion, preprocessing, and analytics while ensuring security and compliance. The challenge lay in integrating data from diverse sources—processes, applications, contracts, and more. By leveraging AI and cloud technologies, the PoC successfully established relationships between these sources, enabling seamless access and natural language-based insights.

Solution

To address RTA’s data management challenges, goML designed an AI-driven solution focused on efficient data ingestion, transformation, and analytics. The implementation leveraged advanced cloud and machine learning technologies to ensure seamless data processing, security, and real-time insights.

AI-Driven Data Processing: Built a data pipeline using AWS Glue, Python, and SageMaker for automated data ingestion, preprocessing, and transformation, including establishing relationships between data sources.
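To illustrate what relationship establishment between sources can look like, here is a minimal sketch in plain Python. It is not the pipeline built for RTA: the table names, column names, and the `min_overlap` threshold are all hypothetical, and the value-overlap heuristic is only one simple way to propose candidate links between datasets.

```python
from itertools import permutations

# Hypothetical sample data; the real sources and schemas are not described in the case study.
tables = {
    "applications": [
        {"app_id": "A1", "name": "Fleet Tracker"},
        {"app_id": "A2", "name": "Permit Portal"},
    ],
    "contracts": [
        {"contract_id": "C1", "app_id": "A1", "vendor": "Vendor X"},
        {"contract_id": "C2", "app_id": "A2", "vendor": "Vendor Y"},
        {"contract_id": "C3", "app_id": "A1", "vendor": "Vendor X"},
    ],
}


def candidate_relationships(tables, min_overlap=0.9):
    """Propose (child_table, child_col, parent_table, parent_col, overlap) links.

    A child column is a candidate foreign key when at least `min_overlap`
    of its distinct values appear in a unique column of another table.
    """
    results = []
    for (child_name, child_rows), (parent_name, parent_rows) in permutations(tables.items(), 2):
        if not child_rows or not parent_rows:
            continue
        for parent_col in parent_rows[0]:
            parent_values = [r.get(parent_col) for r in parent_rows if r.get(parent_col) is not None]
            # Only a column with unique values can act as a key on the parent side.
            if not parent_values or len(set(parent_values)) != len(parent_values):
                continue
            parent_set = set(parent_values)
            for child_col in child_rows[0]:
                child_set = {r.get(child_col) for r in child_rows if r.get(child_col) is not None}
                if not child_set:
                    continue
                overlap = len(child_set & parent_set) / len(child_set)
                if overlap >= min_overlap:
                    results.append((child_name, child_col, parent_name, parent_col, round(overlap, 2)))
    return results
```

Calling `candidate_relationships(tables)` on the sample data proposes a single link from `contracts.app_id` to `applications.app_id`. In practice a data steward would review such proposals before they are accepted.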

Data Stewardship: Enabled stewardship and visualization of data drawn from the different sources.

LLM Used: Llama 3, deployed via AWS SageMaker, enabled natural language querying (RAG-based interactions) over structured data. This made it possible to define relationships across the many data sources and let users extract insights conversationally while maintaining data security and compliance.

GenAI-Powered Query Engine: Integrated Llama 3 to enable NLP-based relationship establishment and data querying for intuitive insights.
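The RAG-style flow mentioned above can be sketched roughly as follows: retrieve the schema descriptions most relevant to a question, then assemble them into a prompt for the model. This is an illustrative stand-in only. The schema descriptions are invented, the bag-of-words cosine similarity replaces the vector search that OpenSearch would provide, and the call to the deployed Llama 3 endpoint is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical schema descriptions; a real system would index actual metadata.
documents = {
    "contracts": "Table contracts: contract id, vendor, start date, linked application id",
    "applications": "Table applications: application id, name, owning department",
    "processes": "Table processes: process id, description, responsible agency",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Return the ids of up to k documents most similar to the question."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(text))), doc_id) for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(question, documents, k=2):
    """Assemble retrieved context and the question into a single prompt string."""
    context = "\n".join(f"- {doc_id}: {documents[doc_id]}" for doc_id in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )
```

For the question "Which vendor holds the contract for each application?", `retrieve` ranks the `contracts` description first and `applications` second, and `build_prompt` places both in the context section. The resulting string is what would be sent to the model.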

User-Friendly Interface: Developed an intuitive Plotly dashboard for seamless interaction and data management.

Real-Time Analytics: Utilized OpenSearch to analyze and visualize data dynamically.

Data Security & Compliance: Ensured compliance with Dubai Electronic Security Center (DESC) policies, including role-based access controls and secure data handling.
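As a rough illustration of role-based access control, the sketch below filters a record down to the fields a role is allowed to see. The role names, field names, and permission mapping are hypothetical. The case study does not describe the actual roles or how DESC policies were mapped to them.

```python
# Hypothetical role-to-field mapping; not taken from the RTA implementation.
ROLE_PERMISSIONS = {
    "data_steward": {"contract_id", "vendor", "value_aed", "owner_email"},
    "analyst": {"contract_id", "vendor"},
}


def filter_record(record, role):
    """Return only the fields of `record` that `role` may read."""
    allowed = ROLE_PERMISSIONS.get(role)
    if allowed is None:
        # Deny by default: an unrecognized role gets no data at all.
        raise PermissionError(f"unknown role: {role}")
    return {field: value for field, value in record.items() if field in allowed}
```

Denying unknown roles outright, rather than returning an empty record, makes misconfiguration visible early.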

Architecture

  • On-Premises Components:
    RTA SIEM: Security Information and Event Management system.
    RTA Data Store: Primary data storage for RTA.
    Secondary Data Store: Additional storage for data redundancy and mapped data storage.
    RTA AD Connect: Active Directory integration for user authentication.
    User Access: Users accessing the system through AD Connector.
  • Network & Security Components:
    VPN Gateway: Secure connection between on-premises and cloud infrastructure.
    GuardDuty: Monitors and detects security threats.
    EventBridge: Manages event-driven processes.
    SNS (Simple Notification Service): Sends alerts and notifications.
    Email Integration: Notifies stakeholders of critical events.
  • AWS Cloud Infrastructure:
    Virtual Private Cloud (VPC): Secure cloud environment for application deployment.
    Customer Application Subnet: Hosts application components within the VPC.
  • Data Processing & Storage:
    AWS Glue: ETL (Extract, Transform, Load) processing.
      Detects sensitive data.
      Maps schemas.
      Establishes relationships between data sources.
      Performs other transformation tasks.
      Stores metadata and the schema of the processed data in S3.
    RDS (Postgres): Relational database for business information and other metadata storage.
  • AI & GenAI Capabilities:
    AWS SageMaker: Hosts machine learning models, including the Llama 3 LLM deployment.
    Llama 3 LLM: Enables NLP-based queries and responses.
    OpenSearch (Vector DB): Facilitates semantic search.
    FMEval: Evaluates LLM-generated responses.
    CloudWatch: Monitors application performance and security logs.
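The sensitive-data detection step listed under AWS Glue could be approximated, for illustration, by a pattern scan over column values. The sketch below is plain Python and is not Glue's own detection mechanism. The two regular expressions and the sample column names are assumptions chosen to keep the example short.

```python
import re

# Illustrative patterns only; production detection would need a broader, vetted set.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def sensitive_columns(rows):
    """Map each column name to the set of sensitive-data labels found in its values."""
    flagged = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # only string values are scanned in this sketch
            for label, pattern in PATTERNS.items():
                if pattern.search(value):
                    flagged.setdefault(column, set()).add(label)
    return flagged
```

Given a row such as `{"name": "Fleet Tracker", "contact": "ops@example.com", "hotline": "+971 4 123 4567"}`, the function flags `contact` as containing an email address and `hotline` as containing a phone number. Columns flagged this way could then be masked or restricted before the data is stored.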

Outcomes

  • 70% improvement in data processing efficiency
  • 60% reduction in manual effort for data mapping
  • 80% enhancement in decision-making speed