<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Microsoft VibeVoice on The Coders Blog</title><link>https://thecodersblog.com/tag/microsoft-vibevoice/</link><description>Recent content in Microsoft VibeVoice on The Coders Blog</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Tue, 28 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://thecodersblog.com/tag/microsoft-vibevoice/index.xml" rel="self" type="application/rss+xml"/><item><title>Microsoft VibeVoice: Open-Source Frontier Models for Next-Gen Expressive Long-Form Voice AI</title><link>https://thecodersblog.com/microsoft-vibevoice-open-source-frontier-models-for-next-gen-expressive-long-form-voice-ai/</link><pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate><guid>https://thecodersblog.com/microsoft-vibevoice-open-source-frontier-models-for-next-gen-expressive-long-form-voice-ai/</guid><description>&lt;h2 id="introduction-the-evolving-landscape-of-voice-ai"&gt;Introduction: The Evolving Landscape of Voice AI&lt;/h2&gt;
&lt;p&gt;Demand for natural, expressive, and scalable voice interactions in software applications continues to accelerate. From sophisticated conversational agents to dynamic content-creation platforms, the ability to generate and recognize human speech seamlessly is paramount. Traditional Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) systems have historically struggled with long-form audio, multi-speaker dynamics, and nuanced emotional expression; these limitations often force laborious post-processing or yield synthetic-sounding, unnatural output.&lt;/p&gt;</description></item></channel></rss>