
A/B testing: we are only scratching the surface



A/B testing changed the way the industry operates – gone are the days of shrugged-shoulder decisions and blind button pushing. According to Mark Sanderson, marketers are yet to see the technique’s full potential.

The market testing that helped give us the Google search we know today is being emulated by industries from hospitality to manufacturing to help better focus their products and services and meet customer needs. So what did Google do?

If you travel back through internet time via the Internet Archive, you can see what Google looked like soon after it first launched, more than 20 years ago.


While the logo is familiar, the look and feel of the website used to be quite different to what it is now. How did Google evolve into the faster-to-load, nicer-to-see, easier-to-read pages and apps that we use today?

A senior Google employee told me that the search engine kept ahead of the competition via a process of rigorous prototype testing. At the time we spoke, prototypes were tested ‘offline’ by measuring the reactions of hired test subjects to particular features and designs. But soon testing moved ‘online’ and we all became the subjects of A/B tests.

What is A/B testing?

An A/B test is when a company gives a user access to one of two versions of a website or app:

A) the current version, and

B) the prototype.

The way users interact with the product is measured during testing. Subtle differences in these interactions can illustrate which version is more effective, according to particular criteria. If the prototype is proven superior, it replaces the existing version as the default product.
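One common way to judge whether a prototype is "proven superior" is a two-proportion z-test on a conversion-style measure, such as the share of users who click. The sketch below is illustrative only: the function name and the counts are hypothetical, and real testing platforms may use different statistics.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in rates, B minus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 10,000 users saw each version.
z, p = two_proportion_z_test(conv_a=1000, n_a=10000, conv_b=1100, n_b=10000)

# Assumed decision rule: the prototype must be better and p must be below 0.05.
prototype_wins = z > 0 and p < 0.05
```

With these made-up numbers the prototype converts 11% of users against 10% for the current version, and the difference clears the assumed 0.05 threshold.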

Google engineers ran their first A/B test in 2000 to determine the optimum number of search results that should be shown per page.


Statistics decide, not managers

Websites and apps have become a constellation of comparisons that collectively evolve systems to an improved state. Every change to an interface or alteration to an algorithm is A/B tested.

Web companies run an astonishing number of tests. In a talk, Microsoft stated that the Bing search engine runs over 1,000 a month. There are so many, in fact, that every time we access an internet site or app, we are likely to be unwitting subjects of an A/B test. We are rarely aware of the tests because the variations are often subtle.
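When many tests run at once, each user has to be placed in a version for each experiment without the user noticing. A widely used approach is to hash the user's identifier together with the experiment's name. The sketch below assumes that approach; the function name, the 50/50 split and the experiment names are hypothetical, not a description of any particular company's system.

```python
import hashlib

def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically place a user in 'A' or 'B' for one experiment."""
    # Including the experiment name in the hash means a user's bucket in one
    # experiment does not determine their bucket in another.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 100  # a number from 0 to 99
    # Assumed 50/50 split: buckets 0-49 see the prototype.
    return "B" if bucket < 50 else "A"

# The same user always gets the same answer for the same experiment.
first = assign_variant("user-42", "results-per-page")
second = assign_variant("user-42", "results-per-page")
```

Because the assignment is computed rather than stored, a returning user keeps seeing the same version for as long as the test runs.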

Companies are able to run so many tests that they have moved to a process known as hill climbing: taking small steps, getting gradually better. This approach has been so successful that it drives the way many companies innovate today.
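Hill climbing can be sketched in a few lines: try a small change in each direction, keep it only if the measure improves, and stop when no neighbouring change is better. Everything here is a toy assumption, including the `toy_measure` function and its arbitrary peak at 10; in practice each comparison would be a full A/B test on real users.

```python
def hill_climb(measure, start, step=1, max_rounds=50):
    """Greedy hill climbing: move to a neighbouring setting only if it scores better."""
    current = start
    current_score = measure(current)
    for _ in range(max_rounds):
        # Each candidate stands in for a prototype tested against the current version.
        candidates = [current - step, current + step]
        best = max(candidates, key=measure)
        best_score = measure(best)
        if best_score <= current_score:
            break  # no neighbour is better: a peak, though possibly only a local one
        current, current_score = best, best_score
    return current, current_score

# Hypothetical user measure with an arbitrary peak at a setting of 10.
def toy_measure(setting):
    return -(setting - 10) ** 2

best_setting, best_score = hill_climb(toy_measure, start=3)
```

The early exit also shows the method's known weakness: small steps can settle on a local peak and miss a better design that lies further away.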

Teams are charged with the goal of increasing the user measures. If a small tweak tanks, it’s dropped. If it triumphs, it’s launched. The decisions are made by statistics, not managers.

Indeed, advocates of A/B testing stress the importance of ignoring the views of managers, which they call HiPPOs – the ‘Highest Paid Person’s Opinions’. This acronym was coined from tales such as that of Greg Linden, an early Amazon employee. Linden suggested that, just as supermarkets put magazines and snacks by the checkout queue, Amazon should adopt the same approach with its online shopping carts.

He recalls that a “senior vice president was dead set against” the idea, fearing it would discourage people from checking out.

Linden ignored the HiPPO and ran an A/B test. The results showed that Amazon would make more money and not lose customers, so Linden’s idea was launched. A/B tests have proved to be more accurate, faster and less biased than any HiPPO.


A/B testing can’t solve everything

The complicated part of A/B testing is figuring out how to measure users in a way that will yield the insights you need. Tests need to be carefully designed, and continually reviewed.

Do it wrong and you could end up with success in the short term, but failure in the long run. A news site that promotes celebrity tidbits might get the immediate gratification of clicks, but lose loyal readers over time.

There are also limits to what A/B testing can observe. The testing relies on measuring user input: mouse clicks, typing, spoken commands, or taps on a mobile screen. Spotify recently asked: if someone has a playlist on in the background and isn’t interacting with their phone, how can Spotify measure whether the user is satisfied? No one currently has an answer.



Taking A/B testing offline

Despite these risks and limitations, the success of A/B testing pervades all companies with an internet presence. And now this testing is being trialled in the physical world.

A couple of years ago, I met with a company that prints and sends utility bills to customers. They A/B tested different formats of the bill, learning which formats improved the rates of customers paying on time.

Restaurants and bars are reportedly using data from sensors to learn which restaurant layout encourages the most sales. For example, if an intimate seating arrangement in the back of a bar attracts people to stay longer, customers in that space are likely to spend more on drinks.

A/B testing could even extend to manufacturing. Slightly different versions of a product could be made on flexible production lines. Production could then be altered if one version of the product was found to sell better than another.

It’s not always a smooth ride, but the power of A/B testing is here to stay.


Mark Sanderson, professor of information retrieval, RMIT University

This article is republished from The Conversation under a Creative Commons license. Read the original article.






Image credit: Cam Morin

