Models
August 29, 2025

Microsoft announces MAI-Voice-1, its first speech generation model

MAI-Voice-1 can currently produce one minute of natural, expressive voice in under a second on a single GPU.

Microsoft describes MAI-Voice-1 as the most expressive and natural AI voice generation model yet, designed for efficiency and scale.

Because it can generate a full minute of humanlike audio in less than one second on a single GPU, it runs well beyond real-time speed for speech synthesis. It is now live in Copilot Daily and Podcasts, where it voices conversations, narration, and storytelling.
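
To put the speed figure in perspective, it can be expressed as a real-time factor: seconds of audio produced per second of compute. The sketch below is only illustrative arithmetic. The function name `real_time_factor` is hypothetical, and because the announcement gives only an upper bound on compute time (under one second), the 1.0-second value is a worst-case assumption rather than a measured number.

```python
def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Return seconds of audio produced per second of compute."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


# Assumption: 1.0 s is the worst case implied by "under a second";
# the announcement does not publish an exact timing.
lower_bound = real_time_factor(audio_seconds=60.0, compute_seconds=1.0)
print(f"At least {lower_bound:.0f}x faster than real time")
```

Under that assumption the model generates audio at least 60 times faster than it takes to play back; any compute time shorter than one second pushes the factor higher.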

Users can also experiment hands-on in Copilot Labs, exploring new ways to create immersive voice experiences. MAI-Voice-1 marks a breakthrough in speed, realism, and accessibility for next-generation AI applications.

#Microsoft
