Google has introduced Gemini Omni, a new family of multimodal AI models designed to generate and edit content across video, audio, images, and text from a unified conversational interface.
The first release, Gemini Omni Flash, lets users create videos from text prompts, existing videos, images, or audio, and edit the outputs directly through natural-language instructions.
Google says the model combines Gemini's reasoning capabilities with advanced media generation to support "anything from any input." Gemini Omni is rolling out across the Gemini app, Google Flow, and YouTube Shorts as part of Google's broader push toward agentic and multimodal AI experiences.
