Stuttering towards accessibility

Andy TheyersFounder

The Stack

“Hello, I’m Andy and I have a stammer.”

While this is true, thankfully very few people notice it nowadays. Like many older stammerers I’ve developed complex and often convoluted strategies to avoid triggering it. But still, if you were to put me on stage and ask me to say that sentence we’d be there all week.

Over the last ten years or so as I’ve aged and gained more control over my stammer I’ve not given it much thought, barring politely turning down the occasional invitation to speak in public. Recently though, I’ve been forced to reassess both it and my coping strategies in the light of the rapid increase in voice interfaces for everything from phones to cars. And that’s made accessibility a very personal issue.

Like many stammerers I struggle with the start of my own name, and sounds similar to it. In the world of articulatory phonetics the sounds that trip me up are called “open vowels”. That is, sounds that are generated at the back of the throat with little or no involvement from the lips or tongue. In English that’s words starting with vowels or the letter H. So the first seven words of the sentence “Hello, I’m Andy and I have a stammer” are pretty much guaranteed to stop me in my tracks (unless I’m drunk or singing – coping strategies!).

We recently got an Amazon Echo for the office and wired it up to a bunch of things, including Spotify. Colleagues tell me it’s amazing, but because the only way I can wake it up is by saying “Alexa!” it’s absolutely useless to me.

And it gets worse. Even if a stammerer is usually able to overcome their problem sounds other factors will increase their likelihood of stammering in a particular situation.

One is over-rehearsal, where the brain has time to analyse the sentence, spot the potentially difficult words and start to worry about them, exacerbating the problem. This can be caused by reading aloud – even bedtime stories for the kids (don’t get me started on Harry and Hermione or Hiccup Horrendous Haddock the Third) – but anything where the words are predetermined can be a problem; be that a sales presentation, giving your name as everyone shakes hands as they walk into a meeting, performing lines from a play, making the vows at your wedding, literally anything where you have time to think about what you’re going to say and can’t change the words.

Speech interfaces currently fall firmly into the realm of over-rehearsal. You’re forced to plan carefully what you’re going to say, and then say it. “Alexa! Play The Stutter Rap by Morris Minor and the Majors” (yeah, that was a childhood high point, let me tell you) is a highly structured sentence and despite Alexa’s smarts it’s the only way you’re going to get that track played. So it’s not only a problematic sound, but it’s over-rehearsed… Doubly bad.

The other common trigger for stammering is often loosely defined as social anxiety, but is anywhere where the stammerer is drawing attention to themselves, either from being the focus of an activity (on stage, say) or from disturbing the normal flow of activity around them (for example, by trying to attract someone’s attention across a crowded room).

If I want to talk to the Echo in our office I know that saying “Alexa!” is going to disturb my colleague’s flow and cause them to involuntarily prick up their ears, which brings it right into the category of social anxiety… As well as already being a trigger sound and over-rehearsed… Triply bad.

However good my coping strategies might normally be I can’t use any of them when speaking to Alexa, and speaking to Alexa is exactly when I would normally be employing them all. Even when I’m in the office on my own it’s useless to me, because trigger sound and over-rehearsal is enough to stop me.

And the Echo isn’t alone. There’s “Hey, Siri!”, “Hey, Cortana!”, “OK Google!”, and “Hi TV!”. All of them, in fact. Right now all of the major domestic voice controls use wake words that start with an open vowel. Gee. Thanks everyone.

Google recently announced that 20% of mobile searches use voice rather than text. More than half of iOS users use Siri regularly. Amazon and Microsoft are doubling down on Echo and Cortana, respectively. Tesla are leading the way in automotive, but all the major manufacturers offer some form of voice control for at least some of their models. It makes absolute sense for them to do so – speech is such a natural interface, right? And it’s futuristic – it’s the stuff of Star Trek. Earl Grey, Hot! and all that. But just as screen readers have constantly struggled to keep up with web technologies we’re seeing developers doomed to repeat those same mistakes with voice interfaces, as they leap ahead without consideration for those that can’t use them.

To give some numbers and put this in context there are approximately twice as many stammerers in the UK (1% of the population) as there are registered visually impaired or blind (0.5% of the population). That’s a whole chunk of people. And while colleagues would say that me not being able to choose music for the stereo is a benefit not a drawback, it makes light of the fact that a technology we generally think of as assistive is not a panacea for all.

Currently Siri, Cortana, Samsung TVs and Alexa can only be addressed with sentences that start with an open vowel (Siri, Cortana and Samsung can’t be changed, Alexa can, but only to one of “Alexa”, “Echo” and “Amazon”). Google on Android can thankfully be changed to any phrase the user likes, even if the process is a little convoluted. Interestingly for me, though, is that the Amazon Echo offers no alternative interface at all. It is voice control only, and has to be woken with an open vowel. It is the worst offender.

For me this has been an object lesson in checking my privilege. Yes, I’m short sighted, but contact lenses give me 20/20 vision. I had a bad back for a while, but I was still mobile. This is the first piece of technology that I’ve actually been unable to use. And it’s not a nice experience. As technologists we know that accessibility is important – not just for the impaired but for everyone – yet we rarely feel it. I’m sure feeling it now.

Voice control is still in its infancy. New features and configurations are being introduced all the time. Parsing will get smarter so that wake words can be changed and commands can be more loosely structured. All of these things will improve accessibility for those of us with speech impediments, who are non-verbal, have a throat infection, or are heavily accented.

But we’re not there yet, and right now I’ve got to ask Amazon… Please, let me change Alexa’s name.

I was thinking Jeff?

Back to The Stack