This morning, Éduc’alcool, an organization devoted to moderation in alcohol consumption, released the results of a poll it commissioned from polling firm CROP related to how people in Quebec drink.
The results were offered to the media under embargo yesterday, and stories appear today in the Journal de Montréal, La Presse, Métro, Rouge FM, Radio-Canada, CJAD and elsewhere that focus on an interesting result outlined in the organization’s press release: that there’s a significant difference in how francophones and non-francophones in Montreal drink. Francophones drink more, and more often, than the rest of the city.
You won’t find a story about this in the Montreal Gazette, despite how relevant this kind of information is to its readership. It’s not because there wasn’t a journalist to cover it — a story was written about it and was set to be given good play in today’s paper. But I had it killed last night.
Well, actually the city editor is the one who made the call, but I’m the reason why, and it sounds cooler to say “I had it killed” than “I noticed something and brought it up to an authority figure”.
The reason is simple: All that data about how non-francophones in Montreal drink is based on sample sizes of 30-40 people, which is laughably small for any survey. None of the conclusions on the difference between language groups could be taken seriously, and without that data there’s really no story here.
It started with a chart
It started innocently enough. While I was working the copy desk in the evening, I wanted to break some of the data presented in the story into a chart for easier reading, so I asked our assignment editor for a copy of the data the story was based on.
It’s a huge pet peeve of mine to have stories about poll results that convert numbers into prose, to have paragraph after paragraph that reads “X per cent of people said Y, and A per cent said B, while for C it was D per cent.” Not only can much of this information be presented more succinctly in a table, but it’s a lot easier to understand and looks better when it’s done that way.
Trying to convert prose back into data is how you notice when there’s data missing from the story. One of the results didn’t specify if it was referring to francophones in Montreal or in Quebec as a whole, so I wanted to check that against the source material.
The editor forwarded me the survey, and that’s when I noticed something odd. The total sample size of the survey was 2,400 people (it varies by the question), but the sample size for Montreal was only 150.
This didn’t make any sense to me. If you’re doing a survey of Quebec, surely half the people sampled should be in the Montreal area, not 6%. And even if you’re talking about just the city of Montreal proper, where 1.6 million of Quebec’s 8 million people live, it should be closer to 20%. So what gives?
Eventually, I figured out that this isn’t a survey of 2,400 people in Quebec. It’s 16 surveys of 150 people each for 16 of Quebec’s 17 regions (they didn’t do a survey of northern Quebec). Éduc’alcool used those surveys to rank the regions by their level of “moderate drinking” and then used the ranking to sell this story to regional media across Quebec. And a quick Google search suggests that it worked.
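For anyone who wants to check the arithmetic in the last two paragraphs, here is a minimal sketch in Python. It uses only the figures quoted in this post; the variable and function names are made up for illustration.

```python
# Figures quoted in the post; the names below are illustrative only.
TOTAL_SAMPLE = 2400        # total respondents across the whole survey
MONTREAL_SAMPLE = 150      # respondents in the Montreal region
REGIONS_SURVEYED = 16      # 16 of Quebec's 17 regions
CITY_POPULATION = 1.6e6    # city of Montreal proper, as cited above
QUEBEC_POPULATION = 8.0e6  # Quebec as a whole, as cited above


def share(part, whole):
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# What Montreal actually got versus what a population-proportional sample would give it.
actual_share = share(MONTREAL_SAMPLE, TOTAL_SAMPLE)
expected_share = share(CITY_POPULATION, QUEBEC_POPULATION)
expected_respondents = TOTAL_SAMPLE * CITY_POPULATION / QUEBEC_POPULATION

print(f"Montreal's share of the sample: {actual_share:.2f}%")
print(f"Share if proportional to population: {expected_share:.0f}%")
print(f"Respondents that share would imply: {expected_respondents:.0f}")
print(f"{REGIONS_SURVEYED} regions x {MONTREAL_SAMPLE} each = {REGIONS_SURVEYED * MONTREAL_SAMPLE}")
```

A single province-wide poll sampled in proportion to population would have put several hundred respondents in the city alone. Sixteen equal-sized regional surveys explain why it got 150.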
A sample size of 150 is pretty low to draw conclusions from, though. And it gets even worse when you try to break it down further, which is what CROP and Éduc’alcool did when they tried to sell this story to the Gazette and other Montreal media.
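To put rough numbers on how quickly things degrade, here is a sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst-case proportion of 50%. Those are my assumptions, not details taken from CROP's methodology, and real polls with weighting usually do somewhat worse than this.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p from a simple
    random sample of size n, at the confidence level implied by z.
    Returns a fraction (0.08 means plus or minus 8 percentage points)."""
    return z * math.sqrt(p * (1 - p) / n)


# Sample sizes mentioned in this post: the full survey, one region,
# and the 30-40 non-francophones within the Montreal sample.
for n in (2400, 150, 40, 30):
    print(f"n = {n:>4}: +/- {100 * margin_of_error(n):.1f} points")
```

Under those assumptions the margin is about 2 points for 2,400 people, about 8 points for 150, and somewhere between 15 and 18 points for a group of 30 to 40.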
The low sample size seemed so incredible to me, particularly since this was a professional polling firm, that I asked the assignment editor to call the firm and double-check that number. Once the firm confirmed it, I called up the city editor, and he quickly agreed that the poll was garbage and that the only reasonable thing to do was to kill the story before it was published.
(That’s not a simple decision, by the way. We were past 7pm, the National Post was waiting for this story, and killing it meant I’d have a giant hole to fill in the paper and only a couple of hours to do it.)
Of the stories I link to above, only the Journal de Montréal notes the small sample on the language-related questions. (And even then it incorrectly attributes to “anglophones” data that applies to all non-francophones.)
N is important
Part of the problem is that Éduc’alcool didn’t highlight its sample sizes. They’re not mentioned in the press release, and even in the survey results themselves, where results are broken down by language, the samples aren’t included. The chart above is one of only two exceptions, well down in the survey. And an inexperienced journalist might not have noticed what “n=” means in this context.
The other part of the problem is that because the survey has a total sample of 2,400, it seems to be much more reliable than it is to someone who isn’t paying close attention.
That’s the danger of breaking down polls. It’s why many national election polls warn about taking results in places like Saskatchewan and Manitoba too seriously, and why they don’t break down Atlantic Canada by province.
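Comparing two subgroups compounds the problem, because the uncertainty in each group's number carries into the gap between them. Here is a rough sketch using the standard formula for the difference between two independent proportions. The subgroup sizes are hypothetical: 35 is the middle of the 30-40 range for non-francophones, and 115 is simply what's left of a 150-person sample. As before, it assumes simple random sampling, 95% confidence and worst-case proportions of 50%.

```python
import math


def margin_of_error_difference(n1, n2, p1=0.5, p2=0.5, z=1.96):
    """Approximate margin of error for the difference between two
    independent sample proportions (simple random sampling assumed).
    Returns a fraction (0.19 means plus or minus 19 percentage points)."""
    return z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical split of one 150-person regional sample.
FRANCOPHONES = 115      # assumption: 150 minus the non-francophones
NON_FRANCOPHONES = 35   # assumption: midpoint of the 30-40 range

gap_margin = margin_of_error_difference(FRANCOPHONES, NON_FRANCOPHONES)
print(f"Margin on the gap between the groups: +/- {100 * gap_margin:.1f} points")
```

With those assumed sizes, the gap between the two groups would need to be roughly 19 points or more before it could be told apart from sampling noise.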
I’m disappointed that CROP would provide this kind of data without a big red warning about tiny sample size, and I’m disappointed that more journalists didn’t pick up on the problem.
And I’m disappointed that I can’t trust what seem to be interesting data about the relationships between Montreal’s language groups and alcohol. Maybe someone can step in and do a real survey to give us a more reliable picture.
Good catch!
See, that’s what’s wrong with you old farts in the traditional media. Fact-checking, analysis, and journalistic integrity cut into your CPMs and your click-through rate. You should be giving people what they want: easily digestible horse shit that they don’t have to think about.
Signed,
Bob’s yer Startup
Let’s not overstate the process in place here. It was almost by accident that I stumbled across this problem. There isn’t really any fact-checking in the editing process at most daily newspapers, unless an editor spots something that they think might be wrong.
>>killing it meant I’d have a giant hole to fill in the paper and only a couple of hours to do it<<
You should have put together a quick story on how to deceive with statistics.
Unfortunately I didn’t have a lot of free time on my hands. And I don’t have the power to assign myself stories.