
OpenAI and Microsoft Introduce New Voice Models
OpenAI and Microsoft Corp. today separately unveiled new artificial intelligence models optimized for speech generation.
OpenAI's new model, gpt-realtime, is described as its most advanced voice model. It produces more natural-sounding speech than previous models and can change tone and language mid-sentence. According to OpenAI, gpt-realtime excels at following instructions, allowing developers to customize it for specific tasks.
A software team building a technical support assistant could instruct gpt-realtime to cite knowledge base articles in certain responses. Developers applying the model to technical support use cases also have access to a new image upload tool, enabling users to upload screenshots of malfunctioning applications for troubleshooting.
Developers can access gpt-realtime through the OpenAI Realtime API, which provides programmatic access to the ChatGPT developer's voice and multimodal models. As part of today's product update, OpenAI moved the API into general availability with several new features.
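As a rough illustration of the customization described above, the sketch below builds a `session.update` event of the kind a developer might send over the Realtime API to steer a support assistant toward citing knowledge base articles. The event type and field names follow OpenAI's published Realtime API event format, but the instruction text and voice name are placeholder values; consult the current API reference before relying on this shape.

```python
import json


def build_session_update(instructions: str, voice: str = "alloy") -> str:
    """Serialize a Realtime API session.update event as JSON.

    The "session.update" type and nested "session" fields follow the
    documented Realtime API event format; the instructions text and
    voice name here are illustrative placeholders, not fixed values.
    """
    event = {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "voice": voice,
            "modalities": ["audio", "text"],
        },
    }
    return json.dumps(event)


# A technical support assistant could be steered to cite internal articles:
payload = build_session_update(
    "You are a technical support assistant. When answering, cite the "
    "relevant knowledge base article by its ID."
)
```

In practice this JSON would be sent over the API's WebSocket connection after the session opens; authentication and audio streaming are omitted here for brevity.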
Microsoft detailed a voice AI model called MAI-Voice-1, initially available in its Microsoft Copilot assistant. The model powers features that enable the assistant to summarize updates like weather forecasts and generate podcasts from text.
Microsoft claims MAI-Voice-1 is one of the industry's most hardware-efficient voice models, capable of generating one minute of audio in under a second using a single graphics processing unit. The company also introduced MAI-1-preview, a second new AI model trained using 15,000 Nvidia H100 accelerators.