Over the past decade, Spotify has expanded its audio entertainment offering by hosting podcasts and even buying exclusive rights to them. Now, with generative AI technology on the rise, the company plans to use OpenAI voice generation to translate podcasts into other languages in the original hosts’ voices.
Advancements in Online Streaming Entertainment
Like social media and other streaming services, Spotify is one of those companies made possible through the internet. As more people flocked to online leisure activities, sites and apps like YouTube, Netflix and Spotify carved a viable business model out of a rapidly changing entertainment landscape.
They weren’t alone: iGaming is another industry that developed online. This era of digital entertainment, built on people watching video, listening to podcasts and engaging in live casino gambling through the internet, laid the groundwork for today’s AI innovation. Naturally, these businesses were facilitated by technological progress. Smartphones put the internet in people’s hands, and 4G was a game-changer for mobile bandwidth, making livestreaming practical.
Now, AI is set to change digital entertainment in a big way by generating images, powering smart chatbots and cloning the voices of individuals. It’s the last of these that Spotify is pursuing with its new pilot voice translation software, which could cut the cost of delivering content at a global scale.
Spotify’s OpenAI Collaboration
Of all the names that have emerged from the rise of generative AI, OpenAI is the biggest. One of the foremost AI organisations in America, it began as a non-profit and added a for-profit subsidiary later. Its early backers include tech figures such as Sam Altman and Elon Musk, whose funding helped seed the organisation in 2015. Seven years later, in late 2022, it unveiled ChatGPT, which was at the time the world’s most advanced chatbot, personal assistant and search engine all in one.
In 2022, OpenAI also released Whisper, its speech recognition and transcription model, which laid the groundwork for the 2023 Spotify collaboration. Just as image generation had become possible, companies like OpenAI and ElevenLabs were venturing into voice generation too. Using OpenAI’s voice cloning technology, in which AI learns from and recreates the voice of an individual, Spotify has created a proprietary Voice Translation tool. It allows pre-recorded podcasts to be translated while keeping the voice of the host, so the same episode can be delivered in other languages.
Ziad Sultan, Spotify’s VP of Personalisation, said: “By matching the creator’s own voice, Voice Translation gives listeners around the world the power to discover and be inspired by new podcasters in a more authentic way than ever before.”
Podcasters – what if I told you could offer your pod to any listener around the world, in their own local language but still keep it in your own voice? That’s the pilot we’re launching @Spotify!
It’s called Voice Translation and using AI, translates podcasts episodes into… pic.twitter.com/kYq0bgxJYq
— Daniel Ek (@eldsjal) September 25, 2023
If it proves viable, this could replace dubbing and closed captions as a way to deliver podcasts to any country in the world. Spotify is starting small, rolling the feature out to certain podcasts and only in a handful of Germanic and Romance languages such as German, Spanish and French. At present, supported creators include Lex Fridman, Dax Shepard, Steven Bartlett, Bill Simmons and Monica Padman, with Trevor Noah’s new podcast waiting in the wings.
This new Spotify tool represents a huge step forward for AI-generated entertainment, and one that could break down long-standing language barriers that have segmented the internet. However, translation isn’t as simple as a 1:1 word replacement. Sometimes, human translators must use context, culture and their own discretion to choose the words that best fit the meaning of the original. With so-called untranslatable words out there, only time will tell if AI can tackle niche, long-form podcast discussions in a way that pleases international audiences.
This article was provided to Verge by a third party.