‘Job figures definitely up or down’ shock!
HOW NOT to be bamboozled by statistics, and how to avoid bamboozling readers, was the topic of September's London Freelance Branch meeting. Our speaker was mathematician David Schley of the Pirbright Institute (formerly the Institute for Animal Health), a British Science Association Media Fellow. He said that "scientists talking to journalists have a lot in common – a common interest in finding out the truth." And a lot of what David does is ask, "Is this really a story? Maybe people who write this stuff up know it's a non-story."

David Schley deconstructs a graph of the rise in Portuguese debt, with its dodgy intervals between dates along the bottom axis
Image © Mike Holderness

Laughter followed when David displayed his first example of a non-story – a BBC piece referring to the "49 per cent who get less than the national average broadband speed." Well, yes, it's an average! So you'd expect about half of the sample to fall below it, wouldn't you?
To be more accurate, that average is a mean. David explained the difference between a mean and a median (to someone in his job, that's like having to explain breathing).
The mean salary of the people present would be the total amount they all earn divided by the number of people in the room. But if "Bill Gates walks in, our [mean] salary would rocket."
To work out the other common average, the median, "you line everybody up" in order from highest to lowest earnings, and the median is "who's in the middle... Bill Gates walks in, it doesn't affect it." David notes that "mean salaries have gone up in the last few years; median salaries probably have not." (As freelances, of course, it's earnings not salaries, but we got the point.)
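The Bill Gates thought experiment takes a few lines of Python to check (the salaries below are invented purely for illustration):

```python
from statistics import mean, median

# Invented salaries for the people in the room.
room = [22_000, 28_000, 31_000, 35_000, 40_000]
print(mean(room), median(room))  # mean 31,200 and median 31,000: close together

# Bill Gates walks in (his "salary" is equally invented).
room.append(1_000_000_000)
print(f"mean £{mean(room):,.0f}")      # the mean rockets past £166 million
print(f"median £{median(room):,.0f}")  # the median barely moves: £33,000
```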
Another perennial non-story is three children in a class at school being born on the same day, an "astonishing 48 million to one" chance, according to one newspaper. That would be 365 × 365 × 365 = 48,627,125. But, as David pointed out, that's the odds of three particular people all being born on one particular date. The odds of three being born on the same day as each other, any day in the year, are one in 365 × 365 = 133,225. There are 167,000 births in the country per year. So it's hardly surprising at all.
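Both figures can be verified in a couple of lines of Python (ignoring leap years and assuming birthdays fall uniformly across the year):

```python
# Odds of three particular children being born on one particular date:
particular_date = 365 ** 3
print(f"1 in {particular_date:,}")  # 1 in 48,627,125

# Odds of three children sharing a birthday on *any* day of the year:
# the first child's birthday can be anything; only the other two must match it.
any_date = 365 ** 2
print(f"1 in {any_date:,}")         # 1 in 133,225
```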
Next year, says David, there might well be a story about three children in the same school class or family with the same birthday, which "beats odds of 133,000". If there's a story like this two years in a row, it may well not be a story at all. But the statistical probabilities around births – and especially deaths – are serious business, and come up in court in cot death inquests.

ResponseCard (aka RFLCD) voting gizmo used by LFB members to indicate their guess at the average person's weekly tax bill for the NHS budget
Image © Mike Holderness

The "interactive bit" followed, with David handing out voting gizmos that display "votes" on a screen (they're called ResponseCards or RFLCDs). He invited LFB members to use the gizmos to pick the total budget of the NHS from a range of options, all the way from £1 million to £100 trillion. The majority went for £10 billion. But when David put the same question as "How much per person per year goes towards the NHS budget?", most people thought it was £1600. When we were asked to guess the "NHS spend per person per week", with choices ranging from 3p a week via £30 a week to £300 a week, most went for £30 and got it right. The total NHS budget was £108.9 billion in 2012-13.
Most people can ask themselves, "How much do we earn in a week, how much do taxpayers pay in a week? Can we really afford to spend £300 a week, with 35 million working people in the UK?" There's realistically no way "everyone paying tax can really be paying £300 a week for the NHS, is there?" And there's no way we could be paying just 3p a week each in taxes for the NHS, could we? If we were, what top-secret projects would all those other tax receipts get diverted to instead?
David's point was that "as we broke it down to per person per week, more and more of us came to the right answer, once it's broken down to a human scale – if you can get a human scale on the numbers, it makes more sense for your readers."
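That breakdown is one division away. The £108.9bn budget figure is from the meeting; the population figure of roughly 63 million for 2012 is an assumption added here for illustration:

```python
budget = 108.9e9   # total NHS budget, 2012-13 (from the article)
population = 63e6  # rough UK population in 2012 (assumed for illustration)

per_year = budget / population
per_week = per_year / 52
print(f"£{per_year:,.0f} per person per year")  # roughly £1,700
print(f"£{per_week:,.0f} per person per week")  # roughly £33
```

So the room's answers of £1600 a year and £30 a week were both the right order of magnitude.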
Then there's "Daily fry-up boosts cancer risk by 20 per cent" in the Express – but it "doesn't tell you what the absolute risk is... 20 per cent more likely than what? Relative to what?"
As David explains, "Five strict veggies per 1000 die in a lifetime from pancreatic cancer. How many regular bacon-eaters die per 1000 from pancreatic cancer? Six. I don't know about you, but it's not enough to make me give up bacon." But the 20 per cent sounds "freaky and dangerous". There's a difference between relative risk and absolute risk.
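The distinction is easy to show with David's own numbers:

```python
veggie = 5 / 1000  # lifetime pancreatic-cancer deaths among strict vegetarians
bacon = 6 / 1000   # the same, among regular bacon-eaters

relative = (bacon - veggie) / veggie  # what the headline reports
absolute = bacon - veggie             # the extra risk a reader actually takes on

print(f"relative increase: {relative:.0%}")         # 20%
print(f"absolute increase: {absolute:.1%} points")  # 0.1% points
```

A 20 per cent relative rise is one extra death per thousand people per lifetime in absolute terms.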
And then there's "the biggest debt currently owed to the Student Loans Company is £66,150, the BBC has learned". A nice opening sentence, but David invited us to "look at the top ten student debts: when you get down to the eighth, the debt is down to £56,000, already ten grand lower". That is "not put in context in the article either. This story doesn't really tell you about student debt." The article is, says David, "factually correct – but misleading". A more accurate, but more boring, headline would be "Most students who finish university owe £27k".
And David deconstructed another BBC news website headline: "'Worrying' jobless rise needs urgent action – Labour", from August 2011. This said that "The number of people out of work rose by 38,000 to 2.49 million in the three months to June, official figures show." Well, no. Office for National Statistics (ONS) numbers are compiled by sampling – they couldn't possibly interview everyone in the country – so their figures go out with a "confidence interval": a spread of figures within which the statisticians are 95 per cent confident the real number lies (95 per cent being a standard confidence level).
The ONS release on which the BBC piece was based rather boringly gave a rise of 38,000 out of work, with a confidence interval of plus or minus 87,000. So the ONS is 95 per cent confident that the number of unemployed either rose by anything up to 125,000 people or fell by anything up to 49,000. The actual number, says David, lies "somewhere between these two values: we don't know whether it's gone up or down. The data doesn't match what Labour say... 'Job figures have definitely gone up or down' is not much of a story." David admits: "most of what I do is destroying stories."
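The arithmetic behind the "up or down" punchline, in a short sketch:

```python
change = 38_000  # ONS estimate of the rise in unemployment
margin = 87_000  # 95 per cent confidence interval, plus or minus

low, high = change - margin, change + margin
print(f"95% CI: {low:+,} to {high:+,}")  # -49,000 to +125,000

# The interval straddles zero, so the data can't say which way the
# figure moved: "definitely up or down" is not much of a story.
print("direction unknown" if low < 0 < high else "direction known")
```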
Confidence intervals are like a health warning: "with confidence you can say the number is somewhere between here and here." "Always ask the statisticians for the confidence interval," advises David: "never accept a single figure for estimates." Always ask, if you can: "Where has your data come from? How reliable is it? How much have they had to extrapolate?" Ask those releasing the data to "be honest about how much uncertainty" there is.
Another case in point is the Mail Online's piece on the "cost of Britain's sick note culture", which showed an increase in sickies. But they had "cherry-picked" the data: the increase at the end of the period covered by the statistics came after a very big drop. And the figures on which the story was based originated in an ONS news release titled "sickness absence dropped in recession but then rose again".

Cherry-picked data from a graph on the fall and then rise in sickies over time
Image © Mike Holderness

Other sins in statistics reporting include "ignoring inherent variability". What's that? A good surgeon who loses four patients in one year may lose one the next year and seven the year after that... there's going to be that range.
And then there's "neglecting regression to the mean". That means that "extreme events are unlikely to happen two years running". In an area covered by a speed camera, you may have "four deaths one year, then 20 in one year", and then four deaths again the year after that. It would be wrong to claim that "putting in a speed camera saved all those lives". Any "40 per cent drop in deaths due to speed cameras" is really down to "regression to the mean... an extreme event is not repeated."
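A quick simulation makes regression to the mean concrete. The sketch below draws yearly death counts from a fixed Poisson rate (all numbers invented for illustration) and checks what happens the year after an unusually bad one, with nothing changed and no camera installed:

```python
import math
import random

random.seed(42)

def poisson(lam):
    # Knuth's algorithm: draw a Poisson-distributed count with mean lam.
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

# 5000 years of death counts at a site whose true underlying rate is 8 a year.
years = [poisson(8) for _ in range(5000)]

# Years following an unusually bad year (13+ deaths): the counts fall
# back towards the mean on their own, because the extreme isn't repeated.
after_bad = [years[i + 1] for i in range(len(years) - 1) if years[i] >= 13]

print(sum(years) / len(years))          # close to the true rate of 8
print(sum(after_bad) / len(after_bad))  # also close to 8, not to 13+
```

Install a speed camera just after the bad year and that natural fall-back looks like a dramatic safety improvement.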
© Matt Salusbury
