Research: What’s Next for Voice Assistants?

Voice (or virtual) assistants continue to feature strongly within consumer electronics devices and are now penetrating further into the automotive sector. Indeed, just over 2.1 billion products shipped globally with a built-in voice assistant platform in 2022. This represents a modest market expansion of 1 percent, following a period of considerable turbulence during the Covid-19 pandemic, when many consumers brought forward purchases of TVs, tablets, and laptops, many of which integrated voice technology as standard.

The market for products featuring voice technology looks set to maintain an upward trajectory. Futuresource anticipates that shipments could exceed 3.1 billion units in 2026, representing a 2022 to 2026 CAGR of 10 percent. Alongside this expansion, the installed base will rise from 6.6 billion units to around 8.4 billion units worldwide over the same period.
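
As a quick check on the arithmetic, the growth rates implied by these figures can be reproduced with the standard compound annual growth rate formula. The sketch below is illustrative only: it uses the rounded figures quoted above (in billions of units), and the function name `cagr` is an assumption for this example, not something taken from Futuresource's methodology.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# Rounded figures quoted in the article, in billions of units.
shipments_cagr = cagr(2.1, 3.1, 2026 - 2022)
installed_base_cagr = cagr(6.6, 8.4, 2026 - 2022)

print(f"Shipments CAGR 2022-2026: {shipments_cagr:.1%}")
print(f"Installed base CAGR 2022-2026: {installed_base_cagr:.1%}")
```

With these rounded inputs, the shipment figures work out to roughly 10 percent a year, in line with the forecast, while the installed base grows at roughly 6 percent a year.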

Virtual assistants have existed for over a decade now, and rudimentary voice interfaces go back even further. Yet we’re only around halfway through the potential development lifecycle. Apple introduced Siri to the world in 2011, arguably the commencement of this era for voice technology, and it’s been almost nine years since Amazon unveiled Alexa.

Intelligence Beyond Voice

The first platforms were based around “command and control” functions. Over time these voice interfaces steadily acquired improved and extended language support. They also began to use other inputs, including cameras on smart displays, alongside voice biometrics to identify who is talking, taking steps toward multi-modal operation.

Today, the industry is partway through achieving conversational ability. Assistants are usually quiescent until summoned for a task, but platform vendors are working on methods to elicit a conversation with users, something that Futuresource calls “intelligent interjection.” This ability is useful within the automotive segment, where the assistant can proactively converse with the driver at appropriate moments. It is also feasible in smart speakers, but most acknowledge that it can be disturbing for users when a virtual assistant opens dialog unexpectedly.

Nevertheless, the capability for an assistant to start a conversation is a development that platform vendors want to pursue. The assistant can remind users of upcoming events, but it can also go further and influence consumer thoughts and behavior, increasing the opportunity to drive engagement. Contextual and situational awareness is the key to success here, so assistant platforms will need to build historic context by maintaining knowledge of past conversations. This is difficult to pursue, given the privacy concerns and the regulation that could follow, either of which could thwart the potential here. But by drawing on a database of previous interactions, the underlying AI in virtual assistants would be able to construct far better reasoning and situational awareness.

The next phase of development is likely to concentrate on extending that multi-modal operation. Combining natural language processing with object recognition in the visual domain, newer devices will be better able to derive that vital missing contextual information. They are also likely to take nuance and voice intonation into account and discern meaning from the natural pauses in human speech. For example, emotion can be derived from inflexion and tone in voice utterances and used to tailor responses.

Diverging Pathways

From the consumer standpoint, voice-enabled devices might appear to be mature products. There is, of course, an expectation that the assistant will evolve constantly through frequent and free software updates. Meanwhile, assistant platform owners are investing millions of dollars in research and development, as well as in training and optimizing language models, so it is frustrating for them that consumers rarely explore the deeper capabilities of virtual assistants. This is giving rise to new thinking about the way these platforms might evolve, and so development pathways are diverging.

One parallel pathway offers an alternative, targeted solution for consumer electronics devices that promises to extend voice usage. Application-specific assistants can run entirely offline and exist purely to handle specific intents in a single domain. They have a limited vocabulary and phraseology, so they are unsuitable for answering general queries; instead, they are trained for competency in specific tasks or “command and control” applications. These types of assistants are expected to become more widespread, since users are more likely to interact verbally with several distinct assistant platforms daily, each of which is skilled in handling different tasks. Futuresource is monitoring advances here and expects further use cases to emerge. Primary examples include Microsoft’s shift of Cortana to become an office productivity assistant, and last year’s announcement of Sonos Voice Control for its range of speakers and soundbars.
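
To make the idea of a limited-vocabulary, single-domain assistant concrete, here is a minimal sketch of intent matching for a hypothetical speaker-control domain. It is not how any vendor's product works: the intent names, phrases, and the `match_intent` function are all assumptions made for illustration, and the sketch operates on already-transcribed text, whereas a real device would first need on-device speech recognition.

```python
import re
from typing import Optional

# Hypothetical intents and phrasings for one domain (speaker control).
# A real product would ship a larger, carefully tuned phrase set.
INTENT_PATTERNS = {
    "play": re.compile(r"^(play|resume)( music)?$"),
    "pause": re.compile(r"^(pause|stop)( music)?$"),
    "volume_up": re.compile(r"^(turn it up|volume up|louder)$"),
    "volume_down": re.compile(r"^(turn it down|volume down|quieter)$"),
}


def match_intent(utterance: str) -> Optional[str]:
    """Return the matching intent name, or None if the request is out of domain."""
    # Normalize: lowercase, drop punctuation, collapse repeated spaces.
    text = re.sub(r"[^a-z ]", "", utterance.lower())
    text = re.sub(r"\s+", " ", text).strip()
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.match(text):
            return intent
    return None  # General queries are deliberately not handled.


for phrase in ["Pause music!", "Volume up", "What's the weather tomorrow?"]:
    print(phrase, "->", match_intent(phrase))
```

Because everything is a fixed lookup, a matcher like this needs no network connection, which is the trade-off described above: dependable handling of a handful of commands in exchange for no ability to answer general questions.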

Consumer Engagement

Consumer usage behavior with virtual assistants had been largely invariable year on year. However, in 2022 there was a slight but definite broadening of engagement. Interaction with entertainment services dominates the top use cases, which is unsurprising given the utility of these devices and their associated ecosystems tailored around audio and visual streaming services. Command and control of the smart home makes up the broad secondary use cases, given the simplicity of using voice instead of dedicated apps: controlling smart home accessories, thermostats, and smart lighting all increased in frequency during 2022. But commercial and monetizable activities, such as booking services or adding items to a shopping cart, have been consistently near the bottom of frequent usage behaviors. This is because there is a perception of financial consequence if an assistant misinterprets a request and orders erroneously.

The Covid-19 pandemic forced people to spend more time at home, and therefore more time with their voice-enabled devices. Survey results illustrate that consumers are beginning to explore the capabilities of assistant platforms. Yet commitment is not ubiquitous amongst the population: 28 percent of those surveyed said they do not use any voice assistant platform.

Bits, Qubits, and AI

Going into the new decade (2030 and beyond), the intersection between the disciplines of machine learning and quantum machines looks likely to deliver a monumental shift in AI compute performance. This is arguably a crucial stage in enabling the “ambient intelligent” world that companies including Amazon, Baidu, and Google are now positioning as the future.

Constructed using entirely new machine-learned architectures, future virtual assistants offer potential for emotional intelligence, coupled with voices that employ intonation to convey meaning and are convincing enough that conversations with them will largely be indistinguishable from real human conversations. Google has already demonstrated how the introduction of “ums” and “ahs” creates verbal interactions that are more appealing to human listeners and are therefore apparently better received.

In becoming more human, virtual assistants will require self-awareness, understanding the situation and environment to quantify when and where voice interjections should occur, and how to handle them. They will need an ability to monitor and respond to both verbal and non-verbal communication, so advanced multi-modality is necessary to deliver this new level of interaction. Future assistants are also likely to have a degree of self-learning, with an ability to improve and expand assistant capabilities without human intervention. Advanced AI could extract unknown concepts in queries, phrase questions that seek qualification, and derive the most likely results. Beyond this, the performance level of quantum AI could create potential for virtual assistants that can form opinions and hypotheses from knowledge. These are currently in the realms of science fiction, yet it is now conceivable that this type of assistant could become science fact within ten years.

Simon Forrest is a principal technology analyst at Futuresource Consulting.

Futuresource Consulting is a market research and consulting company, providing its clients with expertise in professional AV, consumer electronics, education technology, content and entertainment, professional broadcast, and automotive. Combining strong methodologies and unsurpassed data refinement with in-depth market knowledge and forecasting, Futuresource delivers the latest insights and technological developments to drive business decision-making.