The Scientist’s Toolkit: Know your trend

10 02 2010

“Let me introduce you to a radical and highly complex, story-wrecking mathematical insight. Ready? Numbers go up and down.”

Another very educational piece about why stats can go wonky, from the BBC’s Go Figure series. Michael Blastland looks at the fluctuations of teen pregnancy on the Scottish island of Orkney, which, like the Hawthorne effect, shows some of the dangers of making a story out of what we know.

Looking at the annual figures for teenage pregnancy in Orkney, we can see one of the problems with our tendency to make stories from our data: The long-term annual view shows the data fluctuating constantly, but there’s not much of an overall trend one way or the other. But if you only look at the figures from, say, 2002 onwards, then you see a peak followed by a clear decline. Obviously, this is down to the heroic actions of health workers on Orkney, who halved the teen pregnancy rate overall between 1994 and 2006.

All this is great, of course, until you review the figures again at the end of 2007, and discover that they’ve cycled right back to their 1994 peak. (Incidentally, the first graph Blastland shows is one of the most beautifully misleading pieces I’ve ever seen: an excellent example of how you can torture your data until it confesses to anything.)

Here’s the thing: data are always “noisy”. There are hundreds, if not thousands, of factors you simply can’t control or account for at any given time, and they will make the figures fluctuate up and down at random. Teenage pregnancy, for instance, shows a seasonal variation: teenagers are most likely to get pregnant at the end of the school year, probably because they’re having sex more on account of the warm weather and lack of schoolwork. If you only look at a short period of time, it’s easy to be convinced that the data show an overall upward or downward trend… but you’ve really got to take the long view to make sure that this isn’t simply random variation, or “noise”. The more data you have, the less vulnerable they are to random fluctuations – take a look at the line representing Scotland, for instance, which shows some minor variations but is much flatter overall. (We call this the law of large numbers.)
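If you want to see the law of large numbers at work, here’s a minimal Python sketch. The 4% underlying rate and the population sizes are invented for illustration (they’re not Orkney’s real figures); the point is only that a small population’s annual rate jumps around even when nothing is actually changing, while a large population’s barely moves.

```python
# A toy simulation, not real data: a small island and a large country share the
# same fixed 4% underlying rate, so any ups and downs you see are pure chance.
import random

random.seed(1)

TRUE_RATE = 0.04      # assumed constant underlying rate; there is no real trend
SMALL_POP = 600       # hypothetical number of teenagers on a small island
LARGE_POP = 150_000   # hypothetical number of teenagers in the whole country

def observed_rate(population, rate):
    """Count how many events happen by chance in one year, then return the rate."""
    events = sum(1 for _ in range(population) if random.random() < rate)
    return events / population

for year in range(1994, 2008):
    island = observed_rate(SMALL_POP, TRUE_RATE)
    country = observed_rate(LARGE_POP, TRUE_RATE)
    print(f"{year}: island {island:.2%}   country {country:.2%}")
```

Run it and the island’s rate swings a percentage point or more either side of 4% from year to year, so almost any short window looks like a rise or a fall, while the country’s line barely twitches. More data, less noise.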

If you really think your data (teenage pregnancies, sales, salaries) are showing an overall trend… make sure you’re taking a long view. Are there seasonal fluctuations you haven’t taken into account? Anomalous weather? What was happening in the economy at the time – are you comparing it to the right things? These things matter.
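On the seasonal point, one cheap sanity check is to compare each month with the same month a year earlier, rather than with the month before, so the summer bump cancels out. A minimal Python sketch, with monthly counts invented purely for illustration:

```python
# Monthly counts invented purely for illustration; the pattern (a summer peak,
# a winter trough) is the thing being controlled for.
monthly_counts = {
    ("2005", "Jun"): 14, ("2005", "Jul"): 16, ("2005", "Dec"): 6,
    ("2006", "Jun"): 15, ("2006", "Jul"): 15, ("2006", "Dec"): 7,
}

def year_on_year_change(counts, month, year, prev_year):
    """Change versus the same month last year, so the seasonal bump cancels out."""
    return counts[(year, month)] - counts[(prev_year, month)]

for month in ("Jun", "Jul", "Dec"):
    change = year_on_year_change(monthly_counts, month, "2006", "2005")
    print(f"{month}: {change:+d} compared with the same month in 2005")
```

Compare December with the previous June and you’d report a dramatic fall; compare like with like and each month has barely moved.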





The Scientist’s Toolkit: Check your prejudices

2 02 2010

Some things make me sad. Some things make me angry. This particular article makes me both, but in all fairness, Aaron Sell’s anger is both more justified and more righteous.

For those of you who have missed the blog kerfuffle, Aaron Sell, a psychologist at the Centre for Evolutionary Psychology, recently published a study of aggression suggesting that individuals who perceive themselves to be stronger, or more attractive, are more likely to behave aggressively. This research was picked up by the Sunday Times and published as an article titled “Blonde women born to be warrior princesses”.

It’s hard to know where to start with all the things that are wrong with this. Sell’s research did not refer to blondes at all. Sell details, in his subsequent angry letter to the Times, how the journalist, John Harlow, told him he was writing a piece about blondes, and asked him whether blondes exhibited more anger. Sell pointed out that his work didn’t look at hair colour at all, but agreed to re-analyse the data on this basis. He found no link between hair colour, entitlement and aggressive behaviour, and told Harlow so. Harlow’s article subsequently appeared, not only claiming that “blondes are more aggressive and more determined to get their own way”, but also attributing some completely outrageous and utterly fabricated quotes directly to Sell. “This is southern California – the natural habitat of the privileged blonde”?

I’d really like to believe that this was a one-off, but it’s hard to. It’s clear that Harlow had the story already written in his mind, and chose not to let the lack of actual facts get in his way. There’s been some online coverage of this egregious example of reporting (try here and here) and some discussion of the role of a responsible press in not totally fabricating stories and quotes from whole cloth in defiance of evidence (can you tell this bothers me?).  But I actually think the real lesson is slightly different.

Newspapers, on the whole, find it far more convenient to tell us what we already believe – changing people’s minds is time-consuming and difficult, and readers don’t like it much. We’re all disposed to seek out and overvalue information that confirms the beliefs we already have (confirmation bias) – some nifty studies have been done on the phenomenon. Harlow’s article panders shamelessly to our prejudices and our stereotypes. It’s a bit controversial, but not so much so that we can’t secretly, lazily, accept it as true because it ties in with some of our other social shortcuts. This is why we do science: because we can’t fully trust our brains to evaluate evidence effectively when we already have beliefs on a topic. We will always be inclined to seek out and accept the information that confirms what we already believe – it’s so much easier than re-evaluating those beliefs.

I don’t know about all of you, but when I’m reading the paper from now on, I’m going to evaluate any story reporting a study very carefully for how it plays to my prejudices. Because if it does, I need to be extra, extra careful before I accept any part of it. And since the Times has refused to print Aaron Sell’s letter, or to alter or remove the original article, please help make it up to him by reading his excellent original research.





The Scientist’s Toolkit: Understanding the numbers

21 11 2009

Let’s say you’re reading a newspaper over the weekend. Let’s say you spot a front-page headline in this newspaper, all direly big, that says something along the lines of, “EATING YOGHURT DOUBLES YOUR RISK OF BRAIN CANCER!”

Assuming you pay attention, and go on to read the article, should you immediately stop eating yoghurt? After all, “doubling” is an awful lot. But when you read the small print in this kind of article, you’re likely to find out that 1) the baseline risk (i.e. the number of people, out of 1,000, who will get this illness in their lifetime) is extremely low; and 2) the risk for yoghurt eaters, while slightly higher, is still extremely low. Let’s say that the number of people who will typically get brain cancer is something like 0.25 per thousand, or one person per four thousand. In the yoghurt-eating contingent, it is found that 0.5 people per thousand will go on to develop brain cancer, or one in two thousand – basically, one extra person per four thousand yoghurt eaters. The newspapers are perfectly entitled to report this as “RISK DOUBLES!”, and usually do.
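Here’s a minimal Python sketch of that arithmetic, using the made-up yoghurt numbers above: the same figures expressed as the relative risk the headline reports, and as the absolute increase that actually matters to you.

```python
# The made-up numbers from the example above: 0.25 cases per 1,000 people at
# baseline, 0.5 per 1,000 among yoghurt eaters.
baseline_risk = 0.25 / 1000   # risk without yoghurt
exposed_risk = 0.5 / 1000     # risk for yoghurt eaters

relative_risk = exposed_risk / baseline_risk        # what the headline reports
absolute_increase = exposed_risk - baseline_risk    # what actually changes for you
extra_cases_per_4000 = absolute_increase * 4000

print(f'Relative risk: {relative_risk:.0f}x ("RISK DOUBLES!")')
print(f"Absolute increase: {absolute_increase:.4%}")
print(f"Extra cases per 4,000 yoghurt eaters: {extra_cases_per_4000:.0f}")
```

Both framings describe the same numbers; only the absolute one tells you whether it’s worth changing your breakfast.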

Now, chances are that you didn’t make a vow to stay away from yoghurt when you read this article, because you’ve read too many like it, and possibly even muttered something about damned lies and statistics before you turned the page. That’s a shame, because we need statistics. Yes, they can be presented in all kinds of ways, and some of those ways are more informative and useful than others, but it is statistics that we turn to when we need to know if a study or a programme worked, or whether crime rates really have changed, or if we should start excluding yoghurt from our diets. You need the toolkit to get up close and understand what the numbers are telling you.

There are some excellent resources online; for starters, try the Open University’s Statistics and the Media. And, though I recommend it so frequently I feel like a broken record, pick up Ben Goldacre’s Bad Science too – hilarious, fun to read, and the best simple primer on how to read, understand and criticise a science study that I’ve ever come across.