August 29, 2025

Microsoft announces MAI-Voice-1, its first speech generation model

MAI-Voice-1 can currently produce one minute of natural, expressive voice in under a second on a single GPU.

Microsoft describes MAI-Voice-1 as a highly expressive and natural speech generation model, designed for efficiency and scale.

The model can generate a full minute of humanlike audio in less than one second on a single GPU, which is more than 60 times faster than real time. It is now live in Copilot Daily and Podcasts, where it voices conversations, narration, and storytelling with clear, expressive delivery.

Users can also try the model hands-on in Copilot Labs and explore new ways to create immersive voice experiences. Microsoft positions MAI-Voice-1 as a step forward in speed, realism, and accessibility for next-generation AI applications.
