The FINANCIAL — Recently I was asked by a dear friend to explain how a social scientist can pretend to know the opinions of an entire country by only polling a thousand people.
“They’ve never asked me,” she said. “How could they possibly know what the whole country thinks if no one ever asked me or my friends?” This was not some country bumpkin, but an educated lawyer.
We social scientists sometimes take for granted that the general public, well-educated or not, understands the basic concepts behind survey research. So while the readers of The FINANCIAL are certainly more educated than the norm, I will take this week to explain how GORBI can ask questions of a relatively tiny group and know the mindset of millions.
Heads and Tails — There is a simple mathematical truth about the world. As far as we know, some things are simply random. If you were to flip a fair coin once, there would be only a 50% chance that it lands heads up. It is impossible to predict this outcome reliably. However, if you were to flip the coin one hundred times, you could make a pretty good guess that there would be about 50 heads and 50 tails. Maybe it would be 43 and 57, or 49 and 51, but your guess would be close most of the time.
The really interesting thing about random events, though, is that they are consistently random. The more you flip this coin, the closer the final count will be to 50% heads and 50% tails. The same concept applies to polling. If we make certain that every person in the country has the same chance of being selected, and then ask enough people our questions, we will be close to the “true” count. This means that if we “flip the coin” one hundred thousand times, we will almost always get nearly the same result as from flipping it only one thousand times.
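For readers who like to tinker, this convergence is easy to see in a few lines of Python. The sketch below is purely illustrative, not GORBI’s software: it flips a simulated fair coin in ever larger batches and prints the share of heads, which drifts closer and closer to 50%.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Flip a simulated fair coin in ever larger batches and report
# the share of heads, which settles toward 50% as n grows.
for n in (10, 100, 1_000, 100_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>7} flips: {heads / n:.1%} heads")
```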
To clarify this important point: each successive coin flip has less of an effect on certainty than the last. If you took data from two thousand flips, the count would be more accurate than from one thousand flips, but the improvement would not be as great as the one from five hundred flips to a thousand. This diminishing return means that, while we could invest the time and money to survey thousands upon thousands of people, the quality of the data would not be much greater than if we survey a thousand.
In fact, general social science surveys seem to have a “magic number” of around one thousand respondents. This sample size yields a margin of error of just over 3% at 95% confidence. Unfortunately I don’t have enough space this week to explain these numbers in depth, but the important thing to take away is that an appropriately sized sample strikes a balance between cost efficiency and certainty.
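For the curious, here is roughly where those numbers come from. The sketch below uses the textbook worst-case formula for the margin of error of a proportion, z times the square root of p(1−p)/n, with p = 0.5 and z = 1.96 for 95% confidence. This is a standard classroom approximation, not GORBI’s exact calculation, but it shows both the 3% figure and the diminishing returns described above.

```python
import math

def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Worst-case 95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (500, 1_000, 2_000, 10_000):
    print(f"n = {n:>6,}: ±{margin_of_error(n):.1%}")
# n =    500: ±4.4%
# n =  1,000: ±3.1%   <- the "magic number"
# n =  2,000: ±2.2%   <- double the interviews, only ~1 point gained
# n = 10,000: ±1.0%
```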
Milk and Sugar — This is all well and good, but people are not coins. People living in Tbilisi have different opinions and lifestyles from those in Mestia. How can we just walk around asking people questions, given this diversity of location and mind?
To understand why, think of a cup of coffee. It has water, tiny particles of coffee bean, some sugar and some milk. If you wanted to know how much of the cup was sugar, and how much was milk, you could stir the cup very well, take one spoonful from the top and analyze it. The percentage of sugar in the spoonful would be nearly identical to that of the whole cup.
However, if the coffee has not been well stirred, this trick will not work. Perhaps more milk is floating on the top of the mug, or the whiskey you just added has sunk to the bottom. Random sampling is like stirring the coffee cup, which is why the randomization technique is so important: it must ensure that each person in the population has the same chance of being selected.
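To see the analogy in code, here is a toy version of the cup; the proportions and names are invented for illustration. Skimming a spoonful off the top of an unstirred cup, where all the milk has floated upward, gives a wildly wrong answer. Shuffling the cup first, which is what randomization does for a population, fixes it.

```python
import random

random.seed(7)

# A toy "cup": 100,000 drops, 10% of them milk, unstirred so the
# milk sits on top. Proportions are invented for illustration.
cup = ["milk"] * 10_000 + ["coffee"] * 90_000

top_spoonful = cup[:1_000]  # skimming the top without stirring
print(top_spoonful.count("milk") / 1_000)      # 1.0 -- badly biased

random.shuffle(cup)                            # "stirring" the cup
stirred_spoonful = cup[:1_000]                 # now any drop is equally likely
print(stirred_spoonful.count("milk") / 1_000)  # close to 0.10
```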
Randomization Technique — In countries with very good public records, randomization is relatively easy. You can simply put all of the people’s addresses into a “hat” and pull out a thousand respondents. These people will be very close to representative of the whole population most of the time.
In countries like Georgia, where some people don’t even know they have an address, we must improvise. Depending on the study we conduct, one of the techniques we use is called “random walking.” First, we select geographic starting points at random, in proportion to regional populations. Once the interviewer arrives at the location, they begin “counting doors.”
In an apartment complex, for example, an interviewer will enter the building and knock on a previously selected door. Perhaps it’s apartment number 4. If they can, they will complete an interview here. To select their next respondent, they will count three doors and knock on apartment 8. Then they’ll count another three and knock on 12, and so on. Once they reach the top of the building they start over at the bottom, while still counting three doors. The same technique is used with houses.
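As a rough sketch of that counting rule, the walk through one building might look like the Python below. The interval and starting door come from the example above; everything else, including the function name, is invented for illustration. Note that “counting three doors” from apartment 4 lands the interviewer on apartment 8, so the step between knocks is four doors.

```python
def doors_to_knock(num_apartments: int, start: int, step: int = 4):
    """Knock on every `step`-th door from `start`, wrapping around to
    the bottom of the building, until the walk returns to a visited door."""
    door, visited = start, set()
    while door not in visited:
        visited.add(door)
        yield door
        door = (door - 1 + step) % num_apartments + 1  # wrap past the top

# A 10-apartment building, starting at apartment 4:
print(list(doors_to_knock(10, start=4)))  # [4, 8, 2, 6, 10]
```

In a real study the walk would presumably stop once the interviewer’s quota of completed interviews was met; those rules vary by project.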
This technique ensures that, wherever you live, you have the same chance of being selected for a study.
That chance is very low, which explains why you have probably never been polled, but it is the same for you as for your grandmother’s friend living on the other side of the country.
All the Little Problems — There are, of course, more problems that we must deal with in collecting data from people. Some people are unavoidably underrepresented in surveys. Homeless people, frequent movers, misanthropes: all of these are harder to find, or more likely to refuse surveys. However, as with choosing the “magic number,” we must make concessions in the name of efficiency. Striking this balance, getting quality data efficiently, is by far the most difficult and most important part of survey research.
Next week I’ll further explain margin of error and confidence intervals and give some tips and tricks you can use to assess the quality of a study. It should be obvious from recent weeks that published statistics can be wildly different depending on their source. With this in mind I’ll show you some common mistakes made by polling firms and explain why, even without mischief or misdeed, a study can be biased.