Back

Enhancing Desktop Activity Descriptions for OpenAdapt.AI

Deveshi Dabbawala

November 19, 2024
Table of Content

Business Problem

  • OpenAdapt.AI needs to automate describing recorded desktop activities, such as videos, screenshots, audio, keystrokes, and mouse movements.
  • Manual processes are time-consuming and inefficient, affecting overall productivity and operational efficiency.
  • Improving data usability and efficient information processing are required to streamline workflows.

Solution

GoML is helping OpenAdapt.AI address these challenges by developing a Proof of Concept (POC) using AWS infrastructure to implement AI-driven description automation for recorded desktop activities:

Fetch data to the GoML repository using AWS S3 and EC2 for data storage and computation.

Use Python for scripting and automation.

Conduct comprehensive testing to validate the effectiveness of AI-generated descriptions.

Fine-tune LLM models with AWS Bedrock (Claude V3) to analyze and describe the collected data.

Integrate AI models with OpenAdapt.AI’s existing desktop applications for seamless operation.

Stream data to the API and principal repository for real-time updates.

Architecture

  • Recording Module: Captures user actions (videos, screenshots, audio, keystrokes, mouse movements) on the OpenAdapt.AI platform.
  • Data Handling: Data is passed directly in the API along with the video recording, not stored.
  • API Gateway: Entry point for user interactions, forwarding requests to the EC2 instance.
  • EC2 Instance: Processes incoming data, performs prompt engineering, and uses the AI model for analysis.
  • Prompt Engineering: Formats data for analysis by the fine-tuned LLM model.
  • AWS Bedrock (Claude V3): Fine-tuned LLM model generating rich descriptions and summaries.
  • S3 Bucket: Stores the output HTML files; users can fetch the desired output HTML by entering the file name.
  • Docker and ECR: Manages application and dependencies.
  • GIT: Version control and code management.
  • IAM: Manages user permissions and access controls.

Outcomes

20%
Automation of generated contextually rich descriptions
10%
Enhanced data usability and improved efficiency
40%
More comprehensive user and technical documentation