by Becky Parker Geist
I’ve been hearing recently about companies that specialize in automatic translation of printed text into audio, for use in audiobooks and other purposes. Many authors would like to do audiobook versions of their books, but have been put off by the cost of the process. I asked my colleague and fellow BAIPA board member Becky Parker Geist to look into this idea. Becky is a “book audiologist” who produces audiobooks and coaches actors, and who always impresses me with her short, memorable, and beautiful-sounding announcements at our meetings. Here’s her report.
As authors, we choose our words carefully and craft the way we put them together to convey, as clearly as possible, exactly what we mean and what we want our readers to receive. There is no automation in the process. Yet we live in a fast-paced technological world that calls out to us to find ways to work faster, more cheaply, and with more advanced technology.
Those of us in the self-publishing world face questions involving “tree books”, ebooks, audiobooks, apps, enhanced ebooks, and whatever is coming next. Joel invited me to look into text-to-speech companies like iSpeech.org about their ability to produce computer-generated audiobooks. As a voiceover professional, a large part of my work is creating audiobooks and soundtracks for enhanced ebooks, so I wondered what iSpeech might be bringing to the table.
iSpeech has been around since 2007 and provides a cloud-based speech technology that creates “human quality” text to speech (TTS). They also have a patent pending on a voice cloning technology.
The idea of voice cloning is to be able to clone a voice that can then be applied to any text. So if you want to hear Grandma read to little Billy, this technology would make that possible. Interesting concept, but let’s dig deeper.
Aside from the Grandma example, the benefits of voice cloning are said to include being able to revise an audiobook at a later time without having to find the narrator who created the original, or someone whose voice matches closely enough to work.
iSpeech says that there is demand for this technology in e-learning, particularly among instructional designers who are looking for a quicker and cheaper alternative to hiring voice actors.
TTS in E-Learning
I found evidence online to support this idea in a blog post showing that e-learning programs may benefit from TTS.
One example is Pearson, which experimented with the iSpeech technology and found that younger listeners’ response to the computerized voices was “good enough” to warrant going ahead with a project.
This technological approach allows them to have an audiobook completed in just hours instead of weeks, at about one-tenth the cost of hiring a narrator. iSpeech provides voices of both genders in most of the 27 languages included in their system.
What about non-fiction, where there may be greater application than in fiction? My experience suggests that non-fiction can be even more dependent on the clarity and emphasis an actor brings to make sense of complex content for listeners.
While applications such as TTS conversion might work well as a fast and inexpensive way to deal with materials an individual needs for study, broader audiences will expect and demand, I believe, the clarity only a human can deliver.
I return to my opening: if we take such care with the crafting of our text, is it in the best interest of the author or the listener to then deliver an audiobook through the voice of a computer?
Checking It Out
Listening to samples of TTS technology is not reassuring. The human voice conveys so much more than just words: words form the skeleton that carries the flesh and blood of feeling and of meaning.
And isn’t it really that—feelings and meanings—that the words are there for in the first place? The flesh requires the skeleton as the feelings require the words. An actor brings life and understanding to the text through means as mysterious as the life force itself.
It may be that iSpeech “opens many opportunities in application development,” and the company is working to get more emotion into its computer-generated voices.
I’m uncertain, however, how the computer will know which emotion is called for, when to switch from one to another, how to mix emotions and in what quantities, or even how the implantation of emotion will be managed.
With the growing audiobook market and hundreds of thousands of new titles being published each year in print, is “good enough” good enough to be effective in selling the finished audiobook?
Consider going to the theatre: we will go see Romeo and Juliet more than once, even though we know the story. It is in the telling that the magic happens.
Agnieszka Szarkowska’s study with the blind and visually impaired, reported in The Journal of Specialised Translation (Issue 15, Jan 2011), says that its results agreed with a report stating, “…‘listeners prefer natural sounding speech, both in comparing natural speech to synthetic speech and in comparing different synthetic voices’ (Cryer and Home 2008: 7). It is worth noting, however, that while the visually impaired viewers in this study find natural speech preferable, many of them would find synthetic speech acceptable.”
I’m certain there are some great applications for TTS programs, especially in light of the fact that about 11 million people have visual impairments and 1.5 million are totally blind in the U.S. alone.
I, personally, remain unconvinced that audiobooks, in general, would profit from this technology. You’ll have to decide for yourself whether your books would.
Are there ways you would use an automatic text-to-speech converter? I’d love to hear them. Please let me know in the comments.
Becky Parker Geist is a professional actor, voiceover talent, director, producer, writer, solo performer, and acting coach. She has toured stages internationally with Chaucer Theatre and served as its Executive Director from 1997 to 2013. After receiving her MFA in Acting from the University of Illinois in 1981, she began narrating Talking Books for the Blind for the Library of Congress, narrating over 70 titles in two years before moving to the San Francisco Bay Area. She is the founder of Pro Audio Voices, narrating and producing audiobooks full time. Becky is also a self-published author (Game Plan for Educators) and a produced playwright (Joy with Wings: A Daughter’s Tale) and is currently working on a series of children’s books and her first novel. For more info see Pro Audio Voices.