Even as OpenAI reshuffles its leadership yet again, it is moving to reassure developers that it remains the top platform for building AI applications. This year’s DevDay focused on refining and extending its existing AI suite.
The event marked a shift in focus toward empowering developers and highlighting community success stories, a strategic change that reflects the growing competition in the AI landscape, where OpenAI is striving to hold on to its lead. And boy, is it working hard.
OpenAI introduced four major additions at the San Francisco event: the Realtime API, Vision Fine-Tuning, Prompt Caching, and Model Distillation. All four are about giving developers sharper tools and keeping the platform competitive.
1. Realtime API
The headline feature is the Realtime API, which enables developers to create voice apps that process audio input and output directly, eliminating the need for separate transcription, inference, and text-to-speech components.
The API offers direct audio processing, low latency, and support for both audio and text input. It is built on GPT-4o and supports complex interactions and function calling: developers can define custom functions within their apps and trigger them through voice commands, opening up possibilities for tasks ranging from ordering pizza to diagnosing car issues. With image and video support promised for the API as well, there is plenty to look forward to.
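To make this concrete, here is a minimal sketch of opening a Realtime API session over WebSocket and registering a voice-triggered function. It assumes the Python `websockets` package and the endpoint, model name, and event types from the beta documentation at launch; the `order_pizza` tool is a hypothetical stand-in for an app’s own function.

```python
import asyncio
import json
import os

import websockets  # pip install websockets

# Endpoint and headers per the Realtime API beta docs at launch;
# verify against current documentation before relying on them.
URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}


async def main():
    # extra_headers is the pre-v13 websockets argument; newer releases
    # renamed it to additional_headers.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Configure the session: audio in and out, plus a custom function
        # the model may call when the user asks for a pizza.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "voice": "alloy",
                "tools": [{
                    "type": "function",
                    "name": "order_pizza",  # hypothetical app-defined function
                    "description": "Order a pizza for the user.",
                    "parameters": {
                        "type": "object",
                        "properties": {"size": {"type": "string"}},
                    },
                }],
            },
        }))
        # Ask the model to respond; a production app would first stream
        # microphone audio up as input_audio_buffer.append events.
        await ws.send(json.dumps({"type": "response.create"}))
        async for message in ws:
            event = json.loads(message)
            print(event["type"])  # e.g. response.audio.delta carries audio chunks


asyncio.run(main())
```

In a real app, the `response.audio.delta` chunks would be decoded and played back to the user as they arrive.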
The Realtime API has the potential to revolutionize industries from customer support to personalized fitness coaching. It offers significant advantages, though at a higher cost than text-based interactions because of the heavier processing involved.
“Whenever we design products, we essentially look at like both startups and enterprises,” explained Olivier Godement, OpenAI’s head of product for the platform. “And so in the alpha, we have a bunch of enterprises using the APIs, the new models of the new products as well.”
Companies like Healthify and Speak have already integrated the Realtime API into their products, demonstrating its potential to enhance user experiences.
2. Vision Fine-tuning
Another significant advancement in the OpenAI API is vision fine-tuning. This feature allows developers to use images, in addition to text, to fine-tune GPT-4o, which should improve its performance on tasks involving visual understanding.
Grab, a prominent Southeast Asian company, has already successfully employed vision fine-tuning to enhance its mapping services. By using just 100 examples, Grab achieved notable improvements in lane count accuracy and speed limit sign localization.
To use vision fine-tuning, developers host their images online and reference them within the training messages, following the same format as the Chat Completions API. OpenAI has set a reasonable price for the feature.
It's important to note that developers cannot upload copyrighted imagery, images depicting violence, or content that violates OpenAI's safety policies.
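As a rough sketch of the workflow, assuming the official `openai` Python SDK and the chat-style JSONL training format: each example pairs a hosted image URL with the expected answer (the URL and the lane-count label below are made up for illustration).

```python
import json

from openai import OpenAI  # pip install openai

client = OpenAI()

# One training example in chat-style JSONL. The image is referenced by a
# hosted URL; both the URL and the label are hypothetical.
example = {
    "messages": [
        {"role": "system", "content": "You count traffic lanes in street imagery."},
        {"role": "user", "content": [
            {"type": "text", "text": "How many lanes does this road have?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/road_001.jpg"}},
        ]},
        {"role": "assistant", "content": "3"},
    ]
}

with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")  # one line per example; ~100 sufficed for Grab

# Upload the dataset and start a vision fine-tuning job on GPT-4o.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # the snapshot cited in OpenAI's announcement
)
print(job.id, job.status)
```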
The introduction of vision fine-tuning also reflects how competitive the AI model market has become, with rivals shipping comparable features at a rapid pace.
3. Prompt Caching
Prompt caching is a feature that lets developers reuse frequently repeated context between API calls, reducing costs and improving latency. Godement highlighted the significant cost reduction it delivers and its efficiency compared with rival implementations.
Prompt caching was first introduced by Google earlier this year at Google I/O, and Anthropic followed with caching support for its Claude models. OpenAI’s implementation offers a distinct advantage in its pricing model: caching happens automatically, and cached input tokens are billed at half price, whereas Google charges a separate fee for caching. Anthropic advertises a steeper discount, up to 90% versus OpenAI’s 50%, but requires developers to mark cacheable content explicitly.
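Because OpenAI applies caching automatically to long, repeated prompt prefixes, the calling code does not change; what matters is keeping the static context at the front of the prompt. A minimal sketch, assuming the official `openai` Python SDK and the usage fields documented at launch:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()

# Caching kicks in automatically once a prompt prefix is long enough
# (roughly 1,024+ tokens), so the big static block goes first.
LONG_SYSTEM_PROMPT = "You are a support agent for ExampleCorp. ... " * 200  # placeholder

for question in ["How do I reset my password?", "How do I close my account?"]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": LONG_SYSTEM_PROMPT},  # identical prefix every call
            {"role": "user", "content": question},              # only this part varies
        ],
    )
    usage = response.usage
    # cached_tokens reports how much of the prompt was served from cache
    # and billed at the discounted rate.
    print(usage.prompt_tokens, usage.prompt_tokens_details.cached_tokens)
```

The second call should report a large `cached_tokens` count, since its system prompt shares the same prefix as the first.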
“We’ve been pretty busy,” Godement said at a small press conference kicking off the developer conference at the company’s San Francisco headquarters. “Just two years ago, GPT-3 was winning. Now, we’ve reduced [those] costs by almost 1000x. I was trying to come up with an example of technologies who reduced their costs by almost 1000x in two years—and I cannot come up with an example.”
4. Model Distillation
The last update, and perhaps one of the most interesting, is model distillation: the practice of creating smaller, faster, and more cost-effective versions of large language models. By distilling a larger model, developers can build smaller models that retain a significant portion of the original’s capabilities.
Meta recently demonstrated the effectiveness of model distillation by releasing the 1B and 3B versions of Llama 3.2, distilled from the larger 8B and 70B Llama models. While these smaller models are not as powerful as their larger counterparts, they offer a viable option for those seeking more efficient and affordable solutions.
The distillation process involves fine-tuning a smaller model on the outputs (completions) of a larger one, allowing it to absorb the larger model’s knowledge and improve its performance.
OpenAI’s new tooling supports the full distillation loop: stored completions to capture a large model’s outputs, evals to measure quality, and fine-tuning to train the smaller model. Together these let developers create smaller models tailored to specific needs and constraints, as the sketch below shows.
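Here is a sketch of how those pieces fit together with the `openai` Python SDK; the metadata tag and file ID are placeholders, and selecting stored completions and running evals happens in OpenAI’s dashboard rather than in code.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()

# Step 1: capture the large "teacher" model's answers as stored completions,
# tagged with metadata so they can be filtered later.
response = client.chat.completions.create(
    model="gpt-4o",                       # the large teacher model
    store=True,                           # persist this completion for distillation
    metadata={"task": "support-triage"},  # hypothetical tag for filtering
    messages=[
        {"role": "system", "content": "Classify the ticket as billing, bug, or other."},
        {"role": "user", "content": "I was charged twice this month."},
    ],
)
print(response.choices[0].message.content)

# Step 2 (in the dashboard): filter the stored completions, evaluate them,
# and export the good ones as a fine-tuning dataset.

# Step 3: fine-tune the smaller "student" model on that dataset.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",  # placeholder ID of the exported dataset
    model="gpt-4o-mini",          # the smaller student model
)
print(job.id)
```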
In The End
While many were eagerly anticipating news from OpenAI about Sora, it seems the wait will be a little longer. There were no new AI models this time, but for developers, things have picked up pace considerably.