Voice Technology – Challenges and Opportunities


Alumni: Kieron Gurner

UX & Design Lead

Digital Insights

Last week the Government Digital Service announced that information about public services can now be accessed through voice-activated smart speakers like the Amazon Echo. The services available include applying for a passport and accessing details about the national minimum wage. This means that people can access over 12,000 pieces of information simply by asking – using voice technology.

As designers and developers of digital services, the team at Calvium thought that we’d get together to explore voice technology interactions and the implications for our clients.

To start, we reminded ourselves of the recent key moments in tech history. Undoubtedly, the launch of the iPhone in 2007 was seminal; it’s defined our relationship with personal technology ever since. Everything from washing machines to car radios has a touch screen now (for better or worse), and touch screens have been accepted as the mobile interface of choice for the last decade.

Perhaps an equally crucial tech moment happened in 2011 with the launch of Apple’s Siri – the first voice-enabled assistant on a smartphone. Apple was not the first to introduce voice assistance technology but they certainly popularised it, as only Apple can. Today, we’re experiencing an explosion of voice user interfaces (VUIs) and a wealth of new ways to think about our interactions with and through digital tech. A reported 9.5 million people in the UK used a smart speaker in 2018, and that number is projected to grow by almost one-third in 2019.

While the purpose of a VUI is quite simple (letting people speak to their device and accomplish things through command or conversation) and voice interaction has quickly become commonplace, it seems that consumers and manufacturers alike are still figuring out the best ways to use it.

Team Calvium recognise the vast opportunity for voice technology, but we also acknowledge its challenges and limitations – in terms of UX, ethics, functionality and more. Here’s a summary of our ruminations:

Voice Technology has Ambiguity

Ben Clayton, Technical Director, kicked off the conversation:

“Unlike visual technologies, voice tech is limited by a lack of immediate clarity – it’s not obvious what users can and can’t do with voice-enabled products. Visual screen interfaces are good at revealing the possibilities with buttons, labels and images.”

Designers and users have to update the familiar affordances when voice is the primary mode of interaction.

Ben was on a roll:

“Voice UIs are generally pretty hopeless when there’s the potential for multiple functions – because a user can’t understand those available options in the same way as seeing a list. Should all the options be read out as a list? How can the user review the first option if they’re being spoken? For instance, what if there are thousands of requests you can make? We’ll need to be designing ways to give users only the most relevant options to them, and not overloading people with choice – which is another question entirely.”
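One way to picture Ben’s point about relevance is a small sketch that reads out only the top few options instead of the whole list. This is an illustration under our own assumptions, not how any particular assistant works: the `Option` type, its `relevance` score and the `spoken_prompt` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Option:
    label: str
    relevance: float  # hypothetical score; higher means more relevant to this user


def spoken_prompt(options: list[Option], max_spoken: int = 3) -> str:
    """Build a short spoken prompt from the most relevant options only."""
    # Rank every option, but only read the first few aloud.
    ranked = sorted(options, key=lambda o: o.relevance, reverse=True)
    top = [o.label for o in ranked[:max_spoken]]
    if not top:
        return "Sorry, there is nothing I can do with that right now."
    if len(top) == 1:
        listed = top[0]
    else:
        listed = ", ".join(top[:-1]) + " or " + top[-1]
    prompt = f"You can ask for {listed}."
    # Tell the listener that more exists without reading it all out.
    if len(ranked) > max_spoken:
        prompt += " Say 'more' to hear other options."
    return prompt


options = [
    Option("the weather", 0.9),
    Option("a timer", 0.4),
    Option("the news", 0.7),
    Option("a recipe", 0.2),
]
print(spoken_prompt(options))
```

Capping the list at three is an arbitrary choice here; the harder design question, as Ben says, is where the relevance scores come from in the first place.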

What we are seeing at the moment are amazing functions such as sophisticated text-to-speech, speech recognition, machine learning and adaptive VUIs that will underpin voice-powered services and experiences. What we’re only just beginning to explore is how the experiences that we currently have with our screens can be accommodated through voice-activated devices. Some interactions won’t be suitable – and we shouldn’t try to force them. After all, what matters is the quality of the experience for the user, as we can see from Ben’s children:

“The only thing my children (aged three and six) use Siri for is to ask questions in the form ‘What’s the biggest/smallest/fastest/best ____ in the world?’. Probably as it’s the only question style that – for them – has given consistently good results.”

Privacy and ethical issues with voice technology

It wasn’t long before the topic of privacy came up. Following numerous public data leaks from Facebook last year and changes to the data-protection laws here in Europe, it’s something we’re all becoming more aware of – and rightly so!

Ben Clayton, our Technical Director, offered:

“Current voice technologies I have used don’t know who is talking to them, only that someone is. For example, with Siri, only I can trigger ‘Hey Siri,’ as it’s trained to my voice, but anyone in the family can ask a question once Siri is listening.

“Why does this matter? Well, for one reason: permissions. One of my friends’ kids ‘accidentally’ ordered a bunch of Amazon stuff via Alexa, without Alexa needing permission from their parents. This is possibly a useful feature of Alexa but definitely not if anyone within earshot can order things using your credit card!

“However, if Alexa or Siri or another device can recognise each individual’s voice then personalised recommendations would be much easier. Of course this makes the privacy implications even worse – now Amazon can store voice recordings alongside the actual name of the speaker!”

With every exciting new opportunity provided by new technologies and applications, there will be potential drawbacks and misuses – the unintended (or intended) consequences. These anecdotes describe experiences that are frustrating for users and possibly damaging to the trust they place in their digital assistants. When it comes to interactions (with anything), trust is core to delivering a successful service.

Kieron Gurner, our UX & Design Lead, said:

“Smart speaker products are already on the market and are being used by large numbers of consumers – 9.5 million people in the UK (which is roughly 14% of the population). We’ve now seen that after decades of people using the internet, users haven’t been made aware how important and valuable their data is, or how they can protect it from misuse. I hope we can learn from this situation and empower people to be more aware, and in turn, more comfortable with how their voice data is being used – whilst these interfaces are still in their consumer infancy.”

Jo Morrison, Calvium’s Director of Digital Innovation & Research, took the smart product theme and expanded it to the smart city location:

“We are already living in an era where networked ‘smart’ technologies are being used to mediate many aspects of our lives, e.g. city services, work and consumption. What concerns me is that the loudest voices have been and still seem to be those who are championing the techno-economic agenda. As we know from the horrible (but actually very useful as a common reference point) Cambridge Analytica/Facebook scandal, what we don’t want to see playing out in our homes, places of work, public realm etc. is surveillance and manipulation. We can’t opt out of the smart city. There are no GDPR rights in the smart home. So, bringing it back to voice input, and echoing Kieron’s point, it seems sensible that people are made aware of the implications of voice technologies, and that governments and policy makers gen up swiftly on socio-tech such that they can have conversations on an informed and thus equal footing with the big – and the not so big – tech companies. Voice input has a wealth of positive implications for a multitude of people, but we mustn’t allow the oh so predictable negative consequences to take hold.”

Photo from above, of Alexa on stack of books
Photo by Andres Urena on Unsplash

Voice technology and accessibility

We learned a lot about accessible design and development from our work with UCAN, which has led us towards our new explorations for train station users with invisible impairments. We’ve found that designing for specific needs of a user group can lead to solutions that also appeal to the majority.

Every major shift in technology gives us the opportunity to reimagine how the future will look, and how we want to approach designing for a new medium. Will we replicate what we’ve known before, or find new avenues to explore that were impossible in an existing medium?

Vitor Alves Quintãos, from our software team, gave his thoughts:

“I like thinking that the motive for the development of this kind of technology is to help people with disabilities. How could VUIs make technology, and therefore contemporary life, more accessible for people with motor impairments?”

Almost all interfaces in the past have required some kind of physical interaction. It’s not just digital interfaces under their glass surfaces which need to be touched, but the physical world around us: door handles, push buttons and computer mice are all driven by touch of some kind. For users with limited motor capabilities, these interactions can be difficult – or painful – so implementing options for voice controls when possible makes sure more people can take part. VUIs don’t solve these problems, but open up new ways for a wider variety of people to interact with a system without reliance on touch.

Users with limited vision could also benefit from VUIs, since much of our digital world has been designed for sighted people. Text and images are the primary forms of communicating information, which presents challenges to those with visual impairments. VUIs begin to introduce a mode of interaction that steps away from requiring a certain level of vision to engage with that information. Instead of having to find alternative solutions (e.g. adapting text into speech), people who prefer speech can now start from there, with an interface that is designed intuitively around their preferred approach.

Because VUIs need to be designed for clarity of communication, unlike a human conversation, they could also benefit neurodivergent people by presenting concise and unambiguous options whose intent is easier to interpret than visual choices conveyed through colour or label.

Of course, we also need to consider users who could be hindered by VUIs, such as those with hearing impairments. If a system is entirely designed around voice input and output, with no alternative options available, then segments of society could be excluded from accessing your service (in the UK, around 16% of the population has some form of hearing loss). By ensuring that we use these advances in voice tech as additions, rather than replacements, we can build upon what already exists and open up access to more people.

Societal Impacts of Scaling Voice Technology

As any technology grows, its daily usage inevitably affects our behaviour in some way, both personally and socially. We spent some time musing on how this might play out with voice in particular.

Kieron Gurner, Calvium’s UX & Design Lead, said:

“One question is: how will voice interfaces be used in public? As audio messaging becomes more popular, and voice assistants increase in their sophistication – we’re already seeing more people talking into their devices in public. I’m curious how this will impact the aural landscape of cities, cafés and workplaces – particularly in terms of ambient noise and further down the line, changes to the way humans talk to one another, as behaviours shift.”

In a short space of time, society has already shifted in response to mobile devices. Most people wouldn’t have dreamed of using a phone or tablet in a meeting or at dinner, but ‘digital natives’ interact with one another much more naturally through digital, so using a device becomes part of their social behaviour – not an interruption to it. We’re likely to see similar changes in response to voice, but at this stage, we can only speculate.

Vitor Alves Quintãos, one of our Developers, added:

“I wonder if the development of voice technology might lead to a point where people will consider that reading and writing are ‘things from the past’. I could imagine that people will eventually just stop directly speaking to one another and just listen to recordings instead – conversations would become really slow.”

And so our conversation concluded, for now.


The opportunities for Voice Technology

As the capabilities of voice user interfaces develop and consumers become more comfortable with using the technology for themselves, the possibilities for improving virtually every user experience, from shopping to commuting, grow too. What’s more, quickly advancing machine learning will make interacting with these interfaces more personalised, offering more relevant answers to common questions, as well as curated choices for music, recipes and many other kinds of content.

There is also the massive benefit of drawing users away from so much screen time, which has been shown to have negative impacts on health and sociability. VUIs allow users to access information or perform actions without being as physically dependent on handling their devices.

Understanding the challenges posed by this evolving technology helps us better manage expectations, anticipate issues and help users discover features relevant to their lives.
