
Breaking the behaviour barrier with voice technology – a ways to go

Sérgio Brodsky October 5, 2018

Armed with a clear understanding of the triggers that drive behaviour, good marketers will be able to weave voice tech into their brands in new and innovative ways. By Sérgio Brodsky.

This article is part of Marketing’s special focus on audio and voice in marketing.

At some point in the mid-1500s, upon finishing his Moses sculpture, Michelangelo violently hit the knee of the statue with a hammer, shouting, “Why don’t you speak to me?!” This anecdote documents what was possibly the last time in history that a person voice-commanded an inanimate object expecting a conscious, voluntary response.

Now, technology is trying to make this kind of interaction the new normal. Will it succeed?

Innovation can be broadly divided into two types: push and pull.

Push: a breakthrough occurs, leading to a new tech being developed and then commercialised, hoping for market relevance and subsequent financial success. For example, Henry Ford’s Model T: “If I asked what they wanted they would’ve said a faster horse.”

Pull: through research, an express market need or unarticulated desire is identified, leading to the development of a new technology – whether invented or just repurposed – aimed at meeting preconceived market projections. For example, every qualitative/quantitative research department and/or agency trying to drive sustained innovation, such as P&G’s Consumer Market Knowledge.

Nespresso machines succeeded as a ‘pull’ innovation because the brand understood that simplifying coffee-making did not mean dumbing the experience down. You actually feel sophisticated inserting a coffee capsule in exchange for a standardised ristretto. In fact, Nespresso’s buyers almost ritualistically invite their guests to make their own coffees and experience the brand.

On the other hand, the Segway, great invention that it is, never really became the shortcut through congested urban intersections. It’s mainly a toy: a playful means of transport used by tour guides or cops at Venice Beach and other liberal places.

Where does voice come in?

The point is that speaking to your voice assistant in public, say on a daily commute, to check your appointments or order food is not yet the norm.

The common behaviour is to indulge in visual interfaces, a private space where we can be left alone yet still communicate with the world via text, images, videos and animations.

Voice may gradually replace digital, visual interfaces and deliver the promise of ‘freeing our hands and sights’ as portrayed in Spike Jonze’s film Her. Yet we’re all still whispering in this reality and, despite property developers designing new homes wired for Google Home and Alexa, marketers still need to find ways to develop many new behaviours to take sales from early adopters to the late majority.

Enter persuasive technology. This is technology designed to change the attitudes or behaviours of users through persuasion and social influence, but not through coercion. The study of persuasive technology began at Stanford University in the 1990s.

BJ Fogg, a Stanford doctoral student from 1993 to 1997, used methods from experimental psychology to demonstrate that computers can change people’s thoughts and behaviours in predictable ways. His thesis was entitled ‘Charismatic Computers’ and he went on to establish the Persuasive Tech Lab in 1998.

In today’s world, we don’t need scientific data to believe that computing technology influences people. From websites to mobile applications, the evidence is all around us. Persuasive technology is part of our ordinary experience.

Take, for example, the fuel meters installed in hybrid cars like the Toyota Prius or Ford Fusion, which inform drivers about their fuel consumption. The system’s real-time feedback acts as a personal driving coach on how to maximise fuel efficiency, so drivers learn over time how to change the way they drive to improve their score.

In the online world, we meet persuasion attempts at every click. Virtually every website has a persuasive purpose, with its creators intending to affect user attitudes or behaviours in some way: ‘like and share for your chance to win’, ‘recommend a friend and receive’, ‘enter your email address’, and so on.

How can we create new behaviours?

The Fogg Behaviour Model (FBM) shows that three elements must converge at the same moment for a behaviour to occur (if the behaviour doesn’t occur, at least one of them is missing):

motivation: to perform the target behaviour,
ability: to perform the target behaviour aided by the interfacing technology, and
trigger: to urge the target behaviour to be performed.
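
As a rough illustration, the convergence rule above can be sketched in a few lines of Python. This is a simplification that treats each element as simply present or absent. The `Moment` class, its field names and the commuter scenario are hypothetical and are not part of Fogg's own formulation.

```python
from dataclasses import dataclass


@dataclass
class Moment:
    """One user at one point in time (all fields are illustrative assumptions)."""
    motivated: bool   # the user wants to perform the target behaviour
    able: bool        # the interface makes the behaviour easy enough to perform
    triggered: bool   # something prompts the user to act right now


def missing_elements(moment: Moment) -> list[str]:
    """Return the elements absent at this moment (an empty list means none are missing)."""
    checks = {
        "motivation": moment.motivated,
        "ability": moment.able,
        "trigger": moment.triggered,
    }
    return [name for name, present in checks.items() if not present]


def behaviour_occurs(moment: Moment) -> bool:
    """The behaviour occurs only when all three elements converge."""
    return not missing_elements(moment)


# Hypothetical commuter: keen to order food by voice, but uncomfortable
# speaking to a device in public and never prompted to try it.
commuter = Moment(motivated=True, able=False, triggered=False)
print(behaviour_occurs(commuter))   # False
print(missing_elements(commuter))   # ['ability', 'trigger']
```

Listing the missing elements, rather than returning only true or false, mirrors the diagnostic use of the model: when a behaviour fails to occur, the question becomes which element to work on.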

Motivation is something we have in spades. With our full, busy lives, anything that brings ease and saves time is certainly a motivator. The success of ecommerce platforms and frictionless technologies speaks directly to that.

In terms of ability? “Users don’t yet feel 100% comfortable with utilisation of voice technology, but this is exactly where the opportunity exists”, says Lucio Ribeiro, MD of Online Circle Digital and founder of Habla.com.au, a voice agency based in Melbourne.

“There are two aspects when it comes to the ability of utilisation and mass adoption of voice. The first depends on the development and enhancement of natural language and context delivery, which is now rapidly evolving.

“The second represents a shift from the ‘quantified self’ to the ‘better self’, where voice-enabled technology aids and enhances all aspects of learning, arguably the most important skill in the 21st century. Alexa’s ‘7-Minute Work-Out’ skill is a very popular one but my favourite is ‘Bravo Tango Brain Training’ from National Geographic.

“The latter is a skill built in partnership with former air force psychologist and war veteran Dr Michael Valdovinos, supporting war veterans with mental-health advice, acting as a bridge to those seeking help in their homes and reducing the effort required by visual interfaces.”

When it comes to triggers, current voice technologies are a rapid work in progress. According to Oscar Trimboli, a global authority on active listening skills, author of Deep Listening and host of the podcast of the same name, voice interfaces, “to trigger a behaviour change, will need to evolve from ‘listening to content’ to ‘listening for context’, as demonstrated below:

“Interacting with the technology today, I might request, ‘Alexa, remind me to buy sheets of pasta, minced meat, garlic, onions and tomatoes on Saturday at 11am’.

“On Saturday at 11am, the system will remind me to buy them or buy them for me. This is listening for content. To trigger a change, these systems will need to listen to the context as well as the content.

“‘Alexa – I want to make a lasagne for Mary’s birthday next Saturday’. Understanding the context, Alexa should remind me about next Saturday night and explain a lasagne recipe to me. Unfortunately, whether I tell Alexa about Mary’s birthday lasagne or ask it to remind me to buy the ingredients, it’s not nuanced enough (yet) to join the dots.”

None of this should be all that surprising, as we’re just entering this new era. But the exponential pace of machine learning at Amazon, Google, Apple and Microsoft means adoption triggers will arrive in the next decade rather than the next century.

Marketers have been very successful at selling devices in large numbers. The next step is to create new sales cycles through the services (not just the product) enabled by voice technology. This will happen through a sound (pun intended) understanding of behavioural triggers and our cultural context, used to manufacture new ‘natural’ actions that we feel (and look!) good about.
