Overcoming Bias in Crowdsourcing

Aaron Naparstek reports
that "WNYC’s Brian Lehrer wants to know how many
SUV’s there are on your block." Apparently this is an experiment
in "crowdsourcing", and it involves Lehrer’s listeners (and Naparstek’s
readers) wandering outside at some point over the next week, counting the number
of cars vs SUVs on their block, and then leaving the results ~~in Naparstek’s
comments section~~ on the WNYC
website. It’s an interesting idea, but it has little empirical validity.

Why?

The main reason is that public-radio listeners and Streetsblog readers are
not an impartial group: they generally hate SUVs. When they see a lot of SUVs
on their block, they’re likely to get annoyed, and remember the Streetsblog
post, and start counting. It’s conceivable that they might even exaggerate,
either consciously or unconsciously, depending on shades of grey about what
exactly constitutes an SUV.

Readers might even find themselves walking down a block that is not their
own, see that it’s full of SUVs, and report that block rather than
their own. And if and when they look out their window or walk down their block
and see that there are precious few SUVs on it, they’re less likely to report
that fact.
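
To see how much this kind of selective reporting could skew the result, here is a rough simulation. Everything in it is made up for illustration: the spread of SUV shares across blocks, and the rule that a listener's chance of reporting rises with the share of SUVs they see. It is a sketch of the mechanism, not an estimate of anything in New York.

```python
import random


def simulate_selective_reporting(num_blocks=10_000, seed=0):
    """Compare the true average SUV share with the average among reported blocks.

    All numbers here are hypothetical assumptions, not measurements.
    """
    rng = random.Random(seed)

    # Assumption: each block's true SUV share is uniform between 0% and 40%.
    true_shares = [rng.uniform(0.0, 0.4) for _ in range(num_blocks)]

    # Assumption: the chance that someone bothers to report a block grows
    # with the SUV share they see (10% at no SUVs, 90% at a 40% share).
    reported = [s for s in true_shares if rng.random() < 0.1 + 2.0 * s]

    true_mean = sum(true_shares) / len(true_shares)
    reported_mean = sum(reported) / len(reported)
    return true_mean, reported_mean


if __name__ == "__main__":
    true_mean, reported_mean = simulate_selective_reporting()
    print(f"true average SUV share:     {true_mean:.3f}")
    print(f"reported average SUV share: {reported_mean:.3f}")
```

Under these assumptions the reported average comes out noticeably higher than the true one, even though nobody in the simulation miscounts or exaggerates. The skew comes entirely from who chooses to report.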

I think Naparstek’s experiment would be much more effective if participants
had to post twice. First, they would post which specific block they were going
to count, and the specific time and day they were going to count it,
which would have to be at least 6 hours in the future. The second post would
then report the actual count, conducted at the predetermined place and time.
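
The two-post rule is simple enough to check mechanically. Below is a minimal sketch of how a site might validate it. The class names, field names, and the 30-minute tolerance around the promised time are all my own assumptions; the only number taken from the proposal is the 6-hour minimum lead time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the proposal: the count must be scheduled at least 6 hours ahead.
MIN_LEAD = timedelta(hours=6)


@dataclass(frozen=True)
class Commitment:
    """First post: which block will be counted, and when (hypothetical schema)."""
    participant: str
    block: str
    posted_at: datetime
    count_at: datetime


@dataclass(frozen=True)
class CountReport:
    """Second post: the actual count (hypothetical schema)."""
    participant: str
    block: str
    counted_at: datetime
    cars: int
    suvs: int


def commitment_is_valid(c: Commitment) -> bool:
    # The promised count time must be at least MIN_LEAD after the first post.
    return c.count_at - c.posted_at >= MIN_LEAD


def report_is_valid(c: Commitment, r: CountReport,
                    tolerance: timedelta = timedelta(minutes=30)) -> bool:
    # Assumption: a count within `tolerance` of the promised time is acceptable.
    return (
        commitment_is_valid(c)
        and r.participant == c.participant
        and r.block == c.block
        and abs(r.counted_at - c.count_at) <= tolerance
    )
```

A report for a different block, or one whose commitment was posted only an hour before the count, would be rejected. That is the point: the participant picks the block before knowing how many SUVs will be on it.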