The Scientist’s Toolkit: Know your trend

10 02 2010

“Let me introduce you to a radical and highly complex, story-wrecking mathematical insight. Ready? Numbers go up and down.”

Another very educational piece about why stats can go wonky, from the BBC’s Go Figure series. Michael Blastland looks at the fluctuations of teen pregnancy on the Scottish island of Orkney, which, like the Hawthorne effect, shows some of the dangers of making a story out of what we know.

Looking at the annual figures for teenage pregnancy in Orkney, we can see one of the problems with our tendency to make stories from our data: the long-term annual view shows the data fluctuating constantly, but there’s not much of an overall trend one way or the other. But if you only look at the figures from, say, 2002 onwards, then you see a peak followed by a clear decline. Obviously, this must be down to the heroic efforts of health workers on Orkney, who halved the teen pregnancy rate between 1994 and 2006.

All this is great, of course, until you review the figures again at the end of 2007, and discover that they’ve cycled right back to their 1994 peak. (Incidentally, the first graph Blastland shows is one of the most beautifully misleading pieces I’ve ever seen. An excellent example of how you can torture your data until it confesses to anything.)

Here’s the thing: data is always “noisy”. There are hundreds, if not thousands, of factors you simply can’t control or account for at any given time, and they will make the data randomly fluctuate up and down. Teenage pregnancy, for instance, shows a seasonal variation: teenagers are most likely to get pregnant at the end of the school year, probably because they’re having sex more on account of the warm weather and lack of schoolwork. If you only look at a short period of time, it’s easy to be convinced that the data show an overall upward or downward trend… but you’ve really got to take the long view to make sure that this isn’t simply random variation, or “noise”. The more data you have, the less vulnerable it is to random fluctuations – take a look at the line representing Scotland, for instance, which shows some minor variations but is much flatter overall. (We call this the law of large numbers.)
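You can see the small-population effect for yourself in a quick simulation. The numbers below are entirely made up (they are not Orkney’s or Scotland’s actual figures): both “places” have exactly the same underlying rate, and only the population size differs.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

def annual_rates(population, true_rate, years=20):
    """Simulate yearly observed rates for a fixed underlying risk.

    Each person independently has probability `true_rate` of the event,
    so the observed yearly rate fluctuates around the true one.
    """
    rates = []
    for _ in range(years):
        events = sum(random.random() < true_rate for _ in range(population))
        rates.append(events / population)
    return rates

TRUE_RATE = 0.05  # hypothetical underlying rate, identical for both

orkney_like = annual_rates(population=400, true_rate=TRUE_RATE)      # small place
scotland_like = annual_rates(population=40_000, true_rate=TRUE_RATE)  # big place

# The small population swings far more from year to year, even though
# the underlying rate never changed at all.
print("small-population spread:", statistics.stdev(orkney_like))
print("large-population spread:", statistics.stdev(scotland_like))
```

Run it and the small population’s year-to-year spread comes out roughly ten times larger – plenty of room for a “peak” and a “decline” that mean nothing.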

If you really think your data (teenage pregnancies, sales, salaries) are showing an overall trend… make sure you’re taking a long view. Are there seasonal fluctuations you haven’t taken into account? Anomalous weather? What was happening in the economy at the time – are you comparing it to the right things? These things matter.

The Scientist’s Toolkit: Understanding the numbers

21 11 2009

Let’s say you’re reading a newspaper over the weekend. Let’s say you spot a front-page headline in this newspaper, all direly big, that says something along the lines of, “EATING YOGHURT DOUBLES YOUR RISK OF BRAIN CANCER!”

Assuming you pay attention, and go on to read the article, should you immediately stop eating yoghurt? After all, “doubling” is an awful lot. But when you read the small print in this kind of article, you’re likely to find out that 1) the baseline risk (i.e. the number of people, out of 1,000, who will get this illness in their lifetime) is extremely low; and 2) the correlation between eating yoghurt and brain cancer adds up to a slightly higher, but still extremely low risk. Let’s say that the number of people who will typically get brain cancer is something like 0.25 per thousand, or one person per four thousand. In the yoghurt-eating contingent, it is found that 0.5 people per thousand will go on to develop brain cancer, or one in two thousand – basically, one extra person per four thousand yoghurt eaters. The newspapers are perfectly entitled to report this as “RISK DOUBLES!”, and usually do.
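The arithmetic behind that headline is worth making explicit. Using the same made-up numbers as above, a couple of lines show how a doubled *relative* risk can coexist with a tiny *absolute* change:

```python
# Hypothetical figures from the yoghurt example above (not real data).
baseline_risk = 0.25 / 1000  # 1 in 4,000 get the illness anyway
exposed_risk = 0.5 / 1000    # 1 in 2,000 among yoghurt eaters

relative_risk = exposed_risk / baseline_risk
absolute_increase = exposed_risk - baseline_risk

print(relative_risk)                 # 2.0 -> "RISK DOUBLES!"
print(absolute_increase * 4000)      # about one extra case per 4,000 eaters
```

Both numbers describe the same study; only one of them makes a good front page.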

Now, chances are that you didn’t make a vow to stay away from yoghurt when you read this article, because you’ve read too many like it, and possibly even muttered something about damned lies and statistics before you turned the page. That’s a shame, because we need statistics. Yes, they can be represented in all kinds of ways, and some of those ways are more informative and useful than others, but it is statistics that we turn to when we need to know if a study or a programme worked, or whether crime rates really have changed, or if we should start excluding yoghurt from our diets. You need the toolkit to go up close and understand what the numbers are telling you.

There are some excellent resources online; for starters, try the Open University’s Statistics and the Media. And I recommend it so frequently I feel like a broken record, but pick up Ben Goldacre’s Bad Science too – hilarious, fun to read, and the best simple primer on how to read, understand and criticise a science study that I’ve ever come across.