Google introduces Gemini Omni for next-generation multimodal AI creation

Google has unveiled Gemini Omni, a new multimodal AI model capable of generating and editing video, audio, images, and text from a single conversational interface.

Gemini Omni is a family of models rather than a single release. The first, Gemini Omni Flash, lets users create videos from text prompts, existing videos, images, or audio, and edit the results directly through natural-language instructions.

Google says the model combines Gemini’s reasoning capabilities with advanced media generation to support “anything from any input.” Gemini Omni is rolling out across the Gemini app, Google Flow, and YouTube Shorts as part of Google’s broader push toward agentic and multimodal AI experiences.

Google