- OpenAI introduced the public beta of its Realtime API, allowing for low-latency, AI-generated voice responses with six distinct voices.
- OpenAI also launched a vision fine-tuning feature, enabling developers to improve the performance of GPT-4o for tasks involving visual understanding.
Amid a week of executive changes and fundraising news, OpenAI made headlines at its 2024 DevDay, unveiling new tools for AI app developers. The highlight was the public beta of the "Realtime API," which allows for low-latency, AI-generated voice responses. While not identical to ChatGPT’s Advanced Voice Mode, the Realtime API offers six distinct voices for developers to integrate into their applications.
In a demo, the API was shown powering a trip-planning app that delivered real-time speech-to-speech responses and mapped restaurant locations based on user input.
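For developers wondering what integration might look like, the following is a minimal sketch of requesting a spoken reply over the Realtime API's WebSocket interface. The endpoint, beta header, model name, voice name, and event types shown here are assumptions based on the beta documentation at launch and should be checked against the current docs.

```python
# Sketch: request a spoken response from the Realtime API over WebSocket.
# Endpoint, headers, model name, and event shapes are assumptions from the
# beta docs at launch; verify them before relying on this.
import asyncio
import json
import os

import websockets  # pip install websockets

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def ask_for_audio(prompt: str) -> None:
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",  # beta opt-in header
    }
    async with websockets.connect(REALTIME_URL, extra_headers=headers) as ws:
        # Ask for both audio and a text transcript, using one of the preset voices.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "voice": "alloy",
                "instructions": prompt,
            },
        }))
        # Server events stream back; audio arrives as base64-encoded chunks.
        async for message in ws:
            event = json.loads(message)
            if event.get("type") == "response.audio.delta":
                pass  # decode and buffer/play event["delta"] here
            elif event.get("type") == "response.done":
                break

asyncio.run(ask_for_audio("Suggest three restaurants near the venue."))
```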
Chief Product Officer Kevin Weil assured that the recent exits of CTO Mira Murati and Chief Research Officer Bob McGrew wouldn’t impact the company’s progress.
“Bob and Mira have been awesome leaders... but we’re not going to slow down,” said Weil.
Additionally, OpenAI announced vision fine-tuning for GPT-4o, allowing developers to improve performance on tasks requiring visual understanding. Developers can also fine-tune smaller models using the outputs of larger ones via a new model distillation feature, which promises cost savings.
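To illustrate roughly what vision fine-tuning data could look like, here is a hedged sketch of a single JSONL training example and the upload and job-creation calls in the `openai` Python SDK. The base model name, the image-URL message format, and the example file name are assumptions for illustration, not a definitive recipe.

```python
# Sketch: build one vision fine-tuning example and start a job with the openai SDK.
# The base model name and image-URL message format are assumptions to verify
# against the official fine-tuning docs.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each JSONL line is a complete chat example; images are passed as image_url parts.
example = {
    "messages": [
        {"role": "system", "content": "Identify the dish shown in the photo."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is this?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photos/ramen.jpg"}},
            ],
        },
        {"role": "assistant", "content": "Tonkotsu ramen with a soft-boiled egg."},
    ]
}

with open("vision_train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

# Upload the training file and kick off the fine-tuning job.
training_file = client.files.create(file=open("vision_train.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id,
                                     model="gpt-4o-2024-08-06")
print(job.id)
```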
The Realtime API can integrate with services such as Twilio for tasks like placing food orders over the phone, though it is up to developers to disclose AI involvement during those calls, a requirement under new California laws. While the much-anticipated GPT Store remains unreleased, DevDay continues to position OpenAI as a key player in an AI landscape where it faces fierce competition from Meta and Google.
No new AI models were announced at this year’s event, with developers eagerly awaiting further updates on OpenAI o1 and the video generation model Sora.
Edited by Harshajit Sarmah