The Scientist’s Toolkit: Know your trend

10 02 2010

“Let me introduce you to a radical and highly complex, story-wrecking mathematical insight. Ready? Numbers go up and down.”

Another very educational piece about why stats can go wonky, from the BBC’s Go Figure series. Michael Blastland looks at the fluctuations of teen pregnancy on the Scottish island of Orkney, which, like the Hawthorne effect, shows some of the dangers of making a story out of what we know.

Looking at the annual figures for teenage pregnancy in Orkney, we can see one of the problems with our tendency to make stories from our data: the long-term annual view shows the data fluctuating constantly, but there’s not much of an overall trend one way or the other. But if you only look at the figures from, say, 2002 onwards, then you see a peak followed by a clear decline. Obviously, this is due to the heroic actions of health workers on Orkney, taking action to halve the teen pregnancy rate overall between 1994 and 2006.

All this is great, of course, until you review the figures again at the end of 2007, and discover that they’ve cycled right back to their 1994 peak. (Incidentally, the first graph Blastland shows is one of the most beautifully misleading pieces I’ve ever seen. It's an excellent example of how you can torture your data until it confesses to anything.)
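To see how easily a short window can manufacture a "clear decline", here is a minimal Python sketch. Everything in it is an assumption for illustration: the counts are made-up random numbers, not the Orkney figures, and the helper names `ols_slope` and `steepest_decline` are hypothetical. It generates 14 years of pure noise with no trend, then compares the slope over the whole series with the slope of the most flattering 5-year window.

```python
import random

def ols_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def steepest_decline(values, width):
    """Return (start_index, slope) of the window with the most negative slope."""
    windows = [(start, ols_slope(values[start:start + width]))
               for start in range(len(values) - width + 1)]
    return min(windows, key=lambda w: w[1])

# Hypothetical data: 14 "annual counts" that are pure noise around 10,
# with no underlying trend at all. These are NOT the real Orkney figures.
rng = random.Random(2010)
counts = [10 + rng.randint(-5, 5) for _ in range(14)]

full_slope = ols_slope(counts)
start, window_slope = steepest_decline(counts, width=5)

print("counts:", counts)
print(f"slope over all 14 years: {full_slope:+.2f} per year")
print(f"steepest 5-year window starts at index {start}: {window_slope:+.2f} per year")
```

If you go hunting for the window that tells the best story, you will usually find one, even when the numbers are nothing but noise.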

Here’s the thing: data are always “noisy”. There are hundreds, if not thousands, of factors you simply can’t control or account for at any given time, and they will make the data randomly fluctuate up and down. Teenage pregnancy, for instance, shows a seasonal variation: teenagers are most likely to get pregnant at the end of the school year, probably because they’re having sex more on account of the warm weather and lack of schoolwork. If you only look at a short period of time, it’s easy to be convinced that the data show an overall upward or downward trend… but you’ve really got to take the long view to make sure that this isn’t simply random variation, or “noise”. The more data you have, the less vulnerable your figures are to random fluctuations: take a look at the line representing Scotland, for instance, which shows some minor variations but is much flatter overall. (This is the law of large numbers at work.)
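The point about Scotland's flatter line can be sketched with a small simulation. This is only an illustration under stated assumptions: the population sizes, the 4% rate, and the function names are all hypothetical, and each person is assumed to have the event independently with the same probability every year (so there is no real trend and no seasonality). The sketch compares how much the yearly count wobbles, relative to its average, for a small population and a large one.

```python
import random
import statistics

def simulate_annual_counts(population, rate, years, rng):
    """Yearly counts where each person independently has the event with probability `rate`."""
    return [sum(rng.random() < rate for _ in range(population))
            for _ in range(years)]

def relative_spread(counts):
    """Standard deviation as a fraction of the mean (coefficient of variation)."""
    return statistics.stdev(counts) / statistics.mean(counts)

rng = random.Random(42)
RATE = 0.04    # hypothetical annual rate, not a real statistic
YEARS = 14

# Hypothetical population sizes: a small island versus a whole country.
small = simulate_annual_counts(500, RATE, YEARS, rng)
large = simulate_annual_counts(50_000, RATE, YEARS, rng)

print("small population counts:", small)
print(f"small population relative spread: {relative_spread(small):.1%}")
print(f"large population relative spread: {relative_spread(large):.1%}")
```

The underlying rate is identical in both cases; only the amount of data differs. The small population's line should look far more dramatic from year to year, which is exactly the kind of wobble that invites a story.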

If you really think your data (teenage pregnancies, sales, salaries) are showing an overall trend… make sure you’re taking a long view. Are there seasonal fluctuations you haven’t taken into account? Anomalous weather? What was happening in the economy at the time – are you comparing it to the right things? These things matter.
