If 100% is perfection, then our intuition would tell us that 90%, 99% and surely 99.9% must be very, very good. Right?
The world has been abuzz with percentages over the last couple of weeks. Three candidate COVID-19 vaccines reached the end of their phase 3 trials, and their makers announced efficacies between 70% and 95%. This figure expresses the reduction in disease incidence in the vaccinated group compared with the unvaccinated group. 100% efficacy means nobody who has been given the vaccine will contract the disease against which it provides protection; 90% efficacy means that vaccination cuts your chance of catching the disease to 1/10th of what it would otherwise be. Even 70% would cut it to less than 1/3. Provided enough people are vaccinated, vaccines like these can bring COVID-19’s effective reproduction number, estimated at around 2.5, down to well under 1, and hence smother the pandemic.
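The arithmetic behind these efficacy figures can be sketched in a few lines of Python. This is a minimal illustration, not an epidemiological model: the trial counts are hypothetical, and `effective_r` assumes homogeneous mixing, no prior immunity, and a vaccine that cuts transmission in proportion to its efficacy. Under those assumptions it also shows how much of the population would need to be vaccinated for the reproduction number of 2.5 quoted above to fall below 1.

```python
# Minimal sketch of the vaccine arithmetic; not an epidemiological model.

def efficacy(risk_vaccinated: float, risk_unvaccinated: float) -> float:
    """Relative reduction in disease incidence: 1 - (risk if vaccinated / risk if not)."""
    return 1 - risk_vaccinated / risk_unvaccinated


def remaining_risk(eff: float) -> float:
    """Fraction of the unvaccinated risk that a vaccinated person still carries."""
    return 1 - eff


def effective_r(r0: float, eff: float, coverage: float) -> float:
    """Simplified R_eff. Assumes homogeneous mixing, no prior immunity, and a
    vaccine that cuts transmission in proportion to its efficacy."""
    return r0 * (1 - eff * coverage)


def min_coverage(r0: float, eff: float) -> float:
    """Share of the population that must be vaccinated for the simplified
    R_eff to drop below 1 (a value above 1.0 would mean it is unreachable)."""
    return (1 - 1 / r0) / eff


R0 = 2.5  # the reproduction number quoted in the text

# Hypothetical trial counts, for illustration only:
# 8 cases per 20,000 vaccinated, 160 cases per 20,000 unvaccinated.
print(f"Efficacy from hypothetical counts: {efficacy(8 / 20_000, 160 / 20_000):.0%}")

for eff in (0.70, 0.90, 0.95):
    print(
        f"efficacy {eff:.0%}: remaining risk {remaining_risk(eff):.2f}, "
        f"R_eff at full coverage {effective_r(R0, eff, 1.0):.2f}, "
        f"coverage needed for R_eff < 1: {min_coverage(R0, eff):.0%}"
    )
```

Under this simplified model, a 95% efficacious vaccine needs a little under two thirds of the population vaccinated to push R below 1, while a 70% one needs well over four fifths.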
Elsewhere too, we intuitively tend to compare percentages with the perfection of 100%. And numbers that are sufficiently close to it easily evoke ‘near perfection’. But is 70% or even 95% efficacy always so good?
As a young graduate, I was involved in speech recognition research. Our prototypes at the time could recognize around 70% of words and phrases by a given speaker (on whose voice the system was trained). Pretty staggering back then (it was a while ago). But would that have impressed ordinary users? Imagine you are a customer of a company, and you ring them up with a query. You get to speak to an automated system that understands 70% of what you say. It would probably take less than a minute before you’re ready to throw the phone out of the window in frustration. To a developer comparing their system with a machine that doesn’t understand any speech at all, a 70% recognition rate is spectacular. To a user who compares it with a human operator, it is lamentable.
Technology has since moved on, and there are many speech-to-text applications, such as the closed captioning of YouTube videos and of online meetings. In 2017 Google claimed to be able to recognize human speech with 95% accuracy, comparable to human performance. But when, in 2019, the 5% it gets wrong still includes rendering “we’re gonna go check out their booths” as, er, something else, maybe we’re not quite there yet.
I am also a Spotify user – an occasional one, not a frequent one, and here is why. The service boasts access to 60 million tracks. That is a lot. If I were to listen to music non-stop for 12 hours per day, it would take me about 700 years to listen to them all without hearing the same track twice. But does that matter to me? I am much more interested in how much of the music I actually want to listen to is in fact available on Spotify. Say I listen to it for 1.5 hours a day – that’s about 30 tracks. And every day I find there is one track I want to hear that they do not have – perhaps Tanzmusik from Kraftwerk’s Ralf und Florian album, or Tranquillizer by Jan Akkerman from the album Eli (with Kaz Lux). Since it still provides 29 of every 30 tracks I ask for, Spotify could claim it satisfies my demands with more than 96% efficacy. But from my perspective, on 100% of days Spotify fails to provide me with all the music I want to hear, and I need to go elsewhere to get that demand met. That is frustratingly distant from my expectation, and it puts me off using the service and recommending it to others. I remember the tracks it doesn’t have much more vividly than the ones it serves without a problem.
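The Spotify arithmetic can be checked with a short sketch. The average track length of 3 minutes is an assumption, chosen because it matches the text’s figure of about 30 tracks in 1.5 hours; the constant and function names are illustrative only.

```python
# Minimal sketch of the Spotify arithmetic.
# Assumption (not stated in the text): an average track lasts 3 minutes,
# which matches "1.5 hours a day - that's about 30 tracks".

CATALOGUE_TRACKS = 60_000_000
AVG_TRACK_MINUTES = 3            # assumed average track length
HOURS_PER_DAY_NONSTOP = 12
DAILY_LISTENING_HOURS = 1.5
MISSING_TRACKS_PER_DAY = 1


def years_to_hear_everything(tracks, minutes_per_track, hours_per_day):
    """Years needed to play the whole catalogue once, at a fixed number of hours per day."""
    total_hours = tracks * minutes_per_track / 60
    return total_hours / hours_per_day / 365


def daily_track_coverage(listening_hours, minutes_per_track, missing_per_day):
    """Share of the tracks wanted on a given day that are actually available."""
    wanted = listening_hours * 60 / minutes_per_track
    return (wanted - missing_per_day) / wanted


years = years_to_hear_everything(CATALOGUE_TRACKS, AVG_TRACK_MINUTES, HOURS_PER_DAY_NONSTOP)
coverage = daily_track_coverage(DAILY_LISTENING_HOURS, AVG_TRACK_MINUTES, MISSING_TRACKS_PER_DAY)

# From the listener's point of view, a day only counts if nothing was missing.
days_fully_satisfied = 1.0 if MISSING_TRACKS_PER_DAY == 0 else 0.0

print(f"Years to hear the whole catalogue: {years:.0f}")
print(f"Share of wanted tracks available:  {coverage:.1%}")
print(f"Share of days fully satisfied:     {days_fully_satisfied:.0%}")
```

The last two lines make the point of the paragraph explicit: a per-track success rate and a per-day success rate measure different things, and one missing track a day keeps the first above 96% while driving the second to zero.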
For suppliers of goods or services, percentages can be a comforting measure, if not a performance indicator, especially if they appear not to be too far from perfection. 95%? Pat on the back! 98%? Crack open the bubbly! But that is false comfort, because it is not how the customers of those goods or services see things. For many of the things we use regularly, we expect far better than 95%, 99% or even 99.9% efficacy.
It doesn’t matter that 97% of all lines offered by a fashion retailer are on the racks and on the shelves, if the item that we are looking for is out of stock. It doesn’t matter that 98.3% of the trains are running, if the train that we need to take to go to that job interview is cancelled. It doesn’t matter that 95% of calls to our bank are answered within 30 seconds, if we have been listening to Greensleeves for eight minutes and counting.
It is the consumer who, ultimately, is the arbiter of what is and isn’t good enough. And that is something we cannot gauge through flattering efficacy percentages.