Last year I narrated a fascinating book for MIT Press that explores various applications of machine learning in creative work, The Artist in the Machine: Inside the World of AI-Powered Creativity, by Arthur I. Miller. The author discusses the nature of creativity and how we know it when we see it, but then goes on to give numerous examples of how computers are now composing original music, generating expressive visual art, even writing surprisingly interesting poetry, prose, and plays.
One thing that wasn’t included in Miller’s book was the fact that machine learning and AI are now being used to create synthetic models of the human voice, too. Incredibly good ones.
How good? Good enough that yours truly completely failed to select the human voice during recent demonstrations of technology from VocaliD and Scribe Audio. And these ears have been trained for the past fifteen years to pick up on all sorts of incredibly subtle nuances of speech!
Unlike previous generations of text-to-speech that were merely reassembling little recorded slices from a large bank of various phonemes to form words and phrases, this new technology is actually generating audio from scratch. Essentially, it does this by analyzing many hours of recording and creating an intricate set of rules regarding all of the little nuances of a person’s voice tone and texture, patterns of speech, and emotional voice modulation.
Some of the potential pitfalls of this sort of technology have come up in the news recently thanks to how it was (mis)used to deepfake Anthony Bourdain posthumously and to narrate many thousands of TikTok videos in my pal Bev Standing’s voice without her knowledge.
These two high-profile cases are just the beginning, however. The quality and availability of such technology are increasing rapidly. It will not only radically alter the voiceover industry that I know and love, but it will also pose some unique challenges for us socially, politically, and psychologically. The comfort and assurance we get from voices we know and trust are wired deeply into our psyche. What happens when such voices can be operated by another?
To help explore this wild frontier and advocate for the interests of our fellow voice talent, several of my colleagues and I are currently part of a study group within the Open Voice Network. OVON is an arm of the Linux Foundation “dedicated to the communal development and adoption of industry standards and usage guidelines, development, and documentation of voice-centric value propositions, and education and advocacy initiatives.” We are helping to produce a white paper that will inform policymakers and executives alike of the myriad impacts of synthetic voice to consider as they decide how to implement and regulate the use of these technologies.
I have plenty more thoughts but that’s probably enough for now. This may be the first time you’ve heard me speak on this subject, but I’m quite sure it won’t be the last!
What do I do when visions of the AI voice apocalypse start to get me down? Get out there and take a ride! Whether on two wheels or four legs, we have had some really fabulous summer adventures on the roads and trails around Piedmont over the past few weeks, specifically around Torino and Asti. Many, many thanks to my kind and generous father-in-law, who surprised me on my birthday this year by shipping my beloved motorcycle, Herr von Grau, to Italy for me!!! Total game changer as we take our explorations of our new home further afield.
Speaking of Italian explorations, today I want to celebrate the launch of a new project by my incredibly talented wife, Ashlinn Romagnoli, called Utopia: An Italian Study. It’s a series consisting of written essays with corresponding podcasts detailing some of our more interesting experiences here in Italy. Or as she puts it:
“A somewhat futile attempt to make sense of life in Italy by pinning bizarre happenings down like butterflies* for observation and further study.
*No butterflies were harmed during the making of this series.”
I’m so, so proud of her. I love how she sees the world around her in such an insightful and entertaining way. And honestly, after listening to the quality of narration in the first installment of her podcast this morning, I am quite convinced that the student has surpassed her teacher! Felt like I was listening to NPR….