Tuesday, 26 June 2018

Survey Notes: Sensitive Information, Heaping, and Psychometrics


Below are some additional notes on four survey question topics that warranted more specific information than my 20 survey question tips could offer: 
1. How even the most innocent of differences can produce a statistically significant effect.
2. One way to ask for sensitive information without respondents admitting anything.
3. The heaping phenomenon.
4. Where to find previously made and tested psychometric scales.

The effect of wording


Consider the following two questions asked in surveys by Quinnipiac University and Gallup, respectively.

1) "Do you think the United States is doing enough to address climate change, doing too much, or do you think more needs to be done to address climate change?" (Quinnipiac University Poll)
2) "Do you think the U.S. government is doing too much, too little, or about the right amount in terms of protecting the environment?" (Gallup Poll)

On the surface, these look like the same question, but there are a few subtle differences.

Question (1) asks if the entire United States is doing enough. Question (2) specifically asks if “the U.S. government” is doing enough.

Question (1) also asks specifically about climate change, whereas (2) asks about the environment as a whole.

The order of the prompted responses in (1) is “enough”, “too much”, and “needs more”. The order of responses in (2) is “too much”, “too little”, and “about right”.

These differences may seem minor, but they can affect both the response rate of such surveys and the responses themselves.

The website pollingreport.com reports that questions (1) and (2) were answered by 1,291 and 1,041 respondents, respectively. Their 95% margins of error were 4 and 3.3 percentage points, respectively, which makes the margin of error for the contrast between them about 5.2 points (√(4² + 3.3²) ≈ 5.19).
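The contrast figure can be reproduced by combining the two reported margins in quadrature. This is a minimal sketch that assumes the two polls are independent samples, so their variances add; the function name `contrast_margin` is made up for illustration.

```python
import math


def contrast_margin(moe_1: float, moe_2: float) -> float:
    """Margin of error for the difference between two independent estimates.

    Assumes independent samples: variances add, and because each margin is
    the same multiple (about 1.96) of a standard error, the margins combine
    as the square root of the sum of their squares.
    """
    return math.sqrt(moe_1 ** 2 + moe_2 ** 2)


# Reported 95% margins of error, in percentage points.
quinnipiac_moe = 4.0
gallup_moe = 3.3

print(f"Contrast margin of error: {contrast_margin(quinnipiac_moe, gallup_moe):.2f} points")
# prints: Contrast margin of error: 5.19 points
```

If the two polls were not independent (for example, overlapping panels), the variances would not simply add and this shortcut would not apply.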

These questions were both asked of the American public in March 2018, so the population is essentially the same. However, in (1), the Quinnipiac University question, 22% answered ‘doing enough’, while in (2), the Gallup question, 28% answered ‘about right’. That 6-point gap is larger than the contrast margin of error, and it arises just from minor differences in polling choices such as wording and sampling. Furthermore, 5% of respondents to (1) said they were unsure, whereas only 1% of respondents to (2) did.
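A rough cross-check of the 22% versus 28% gap is a two-proportion z-test. The sketch below assumes simple random samples of the reported sizes, which ignores the weighting and design effects that presumably make the published margins wider than a textbook calculation; `two_proportion_z` is a hypothetical helper name.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Return (z, standard error) for the difference p2 - p1.

    Uses the unpooled standard error and assumes two independent simple
    random samples; real polls are weighted, so this understates their error.
    """
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) / se, se


# Quinnipiac 'doing enough' vs Gallup 'about right', with the reported sample sizes.
z, se = two_proportion_z(0.22, 1291, 0.28, 1041)
print(f"difference = 6 points, z = {z:.2f}, SRS 95% margin = {100 * 1.96 * se:.1f} points")
```

Under these assumptions z comes out around 3.3, well beyond sampling noise; the wider published margins lead to the same conclusion.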

Neither survey question is bad or notably better than the other, and still we see differences like this. Such differences can occur even for surveys developed and executed at a professional level by impartial, unbiased polling groups.


Sensitive information

One way to estimate the proportion of people with an embarrassing or illegal trait is to let respondents mask their answers behind a more benign condition.

As an example consider the following question:

Flip a coin, then answer: which of the following best describes your situation?

A) My coin landed heads.
B) My coin landed tails, or I have used illegal drugs this year, or both.

This sort of setup is useful when the variable of interest is the prevalence, or proportion, of the sensitive condition. If the true proportion of the condition (for example, drug use) is P, then the expected proportion of respondents who choose B (“tails or drugs”) is E = (1 + P)/2: half of all respondents flip tails and choose B, and of the half who flip heads, a fraction P also choose B, giving 1/2 + P/2.

We can estimate P by inverting this formula to P = 2E - 1, with E replaced by the observed proportion of respondents who chose B.
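A quick simulation shows the inversion recovering a known rate. This is a sketch under made-up assumptions: a true rate of 12%, 10,000 respondents, a fair coin, and respondents who flip heads but have the condition choosing B (which is what E = (1 + P)/2 assumes). The helper names are hypothetical.

```python
import random


def simulate_choices(n, true_p, rng):
    """Count how many of n simulated respondents choose option B.

    Each respondent flips a fair coin and independently has the sensitive
    condition with probability true_p. Anyone who flipped tails or has the
    condition is assumed to choose B.
    """
    count_b = 0
    for _ in range(n):
        tails = rng.random() < 0.5
        has_condition = rng.random() < true_p
        if tails or has_condition:
            count_b += 1
    return count_b


def estimate_p(count_b, n):
    """Invert E = (1 + P) / 2 using the observed share of B answers."""
    e_hat = count_b / n
    return 2 * e_hat - 1


rng = random.Random(2018)  # fixed seed so the run is repeatable
n = 10_000
count_b = simulate_choices(n, true_p=0.12, rng=rng)
print(f"observed E = {count_b / n:.3f}, estimated P = {estimate_p(count_b, n):.3f}")
```

The estimate lands near 0.12 but not exactly on it; the spread around the true value is the subject of the margin-of-error drawback discussed below.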

The main appeal of this method, a form of the randomized response technique, is that respondents can collectively reveal the proportion of something without any individual respondent admitting to a behavior.

However, since you cannot know which respondents have the sensitive condition, further analysis, such as a chi-squared test against other survey answers, is difficult or impossible.

From a survey wording standpoint, this method also creates more potential for confusion or distrust than a simple question does.

Another drawback compared to asking directly is that, because P = 2E - 1, any sampling error in E is doubled in the estimate of P, so the margin of error on P is twice that on E. If the true proportion P is small, the estimate of P can even come out below 0.
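To see how much precision the coin flip costs, the sketch below compares 95% margins of error under a simple-random-sampling assumption. The sample size of 500 and true rate of 3% are made up for illustration.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% interval


def direct_moe(p, n):
    """Margin of error if the sensitive question were asked directly."""
    return Z_95 * math.sqrt(p * (1 - p) / n)


def coin_flip_moe(p, n):
    """Margin of error for the estimate 2E - 1, where E = (1 + p) / 2.

    The factor of 2 in the estimator doubles the standard error of E.
    """
    e = (1 + p) / 2
    return 2 * Z_95 * math.sqrt(e * (1 - e) / n)


n, p = 500, 0.03
print(f"direct question:  +/- {direct_moe(p, n):.3f}")
print(f"coin-flip design: +/- {coin_flip_moe(p, n):.3f}")
print(f"coin-flip interval lower end: {p - coin_flip_moe(p, n):.3f}")
```

With these numbers the coin-flip margin is several times the direct one, because E sits near 0.5 where variance is largest, and the interval extends below zero.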


The heaping phenomenon


Consider the following pair of graphs from the 2011 census on literacy in India.
In the histogram of age counts, there are spikes in reported age frequencies every 5 years. Why might this be? The size and regularity of the spikes are far beyond what random variation, historical events, or demographic trends would suggest.



A key detail is that these are reported ages rather than actual ages. Poor conditions mean many older Indians are unsure of their exact age in years, so they report an approximate age. Some of these reported ages drift toward the nearest multiple of 5, a phenomenon called heaping.
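Demographers often quantify this kind of age heaping with Whipple's index, a standard measure that is not part of the census figures above. The sketch below computes it for made-up counts, using the conventional window of ages 23 to 62: 100 means no preference for ages ending in 0 or 5, and 500 means every reported age in the window ends in 0 or 5.

```python
def whipples_index(counts_by_age):
    """Whipple's index for heaping on ages ending in 0 or 5.

    counts_by_age maps a reported age to the number of people reporting it.
    Ages 25, 30, ..., 60 are compared with one fifth of all people aged 23-62.
    """
    window = range(23, 63)  # ages 23 through 62 inclusive
    total = sum(counts_by_age.get(age, 0) for age in window)
    on_0_or_5 = sum(counts_by_age.get(age, 0) for age in window if age % 5 == 0)
    return 100 * on_0_or_5 / (total / 5)


# Made-up data: a flat age distribution, and the same with extra people at 0/5 ages.
flat = {age: 1000 for age in range(100)}
heaped = {age: 1500 if age % 5 == 0 else 1000 for age in range(100)}

print(f"flat:   {whipples_index(flat):.1f}")    # prints: flat:   100.0
print(f"heaped: {whipples_index(heaped):.1f}")  # prints: heaped: 136.4
```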

Also consider the graph of literacy rate (Y axis) against age (X axis). The downward spikes are a result of the same heaping phenomenon interacting with another variable: respondents who heap their age to the nearest five years are also less likely to be literate, so the heaped ages pull the literacy rate down. Although age heaping is an extreme example, any open-ended question is subject to heaping.
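A toy model shows how heaping can produce those downward spikes. Everything here is an assumption for illustration: equal numbers of people at each true age from 20 to 40, 20% of people at every age rounding to the nearest multiple of 5, and literacy rates of 80% for exact reporters versus 40% for heapers.

```python
from collections import defaultdict


def literacy_by_reported_age(true_counts, heap_share, lit_exact, lit_heap):
    """Literacy rate at each *reported* age under a simple heaping model.

    true_counts maps true age -> number of people. A share heap_share of
    each age reports the nearest multiple of 5 and has literacy rate
    lit_heap; everyone else reports their true age with rate lit_exact.
    """
    people = defaultdict(float)
    literate = defaultdict(float)
    for age, n in true_counts.items():
        exact, heapers = n * (1 - heap_share), n * heap_share
        people[age] += exact
        literate[age] += exact * lit_exact
        nearest_5 = 5 * round(age / 5)  # integer ages never fall exactly halfway
        people[nearest_5] += heapers
        literate[nearest_5] += heapers * lit_heap
    return {age: literate[age] / people[age] for age in sorted(people)}


rates = literacy_by_reported_age({age: 1000 for age in range(20, 41)}, 0.2, 0.8, 0.4)
for age in range(28, 33):
    print(age, f"{rates[age]:.2f}")
```

In this model the reported literacy rate is 0.80 at ages 28, 29, 31, and 32 but drops to about 0.58 at age 30, because the less-literate heapers from five neighboring ages all land there.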

Heaping can happen to any number that is not known exactly. For example, answers to an open-ended survey question on annual income are likely to be heaped to the nearest $5,000 or $10,000. Similarly, the amount someone is willing to pay for an item is likely to be heaped to the nearest dollar or to a common price point. This is why such questions are often asked as ordinal ranges instead of open-ended questions.
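Converting answers to ordinal ranges is simple to sketch, and a quick diagnostic for heaping in open answers is the share that land exactly on a round unit. The bracket edges, labels, helper names, and sample answers below are all hypothetical.

```python
import bisect

# Hypothetical income brackets: each edge is where a new bracket starts.
BRACKET_EDGES = [25_000, 50_000, 75_000, 100_000]
BRACKET_LABELS = [
    "Under $25,000",
    "$25,000 to $49,999",
    "$50,000 to $74,999",
    "$75,000 to $99,999",
    "$100,000 or more",
]


def income_bracket(income):
    """Map a dollar amount to its ordinal bracket label."""
    return BRACKET_LABELS[bisect.bisect_right(BRACKET_EDGES, income)]


def heaped_share(values, unit):
    """Fraction of answers that are exact multiples of unit (e.g. 5,000)."""
    return sum(1 for v in values if v % unit == 0) / len(values)


answers = [40_000, 52_000, 45_000, 61_250, 75_000, 38_900]  # made-up open answers
print(f"share on a multiple of $5,000: {heaped_share(answers, 5_000):.2f}")  # prints 0.50
for a in answers:
    print(a, "->", income_bracket(a))
```

`bisect_right` places an answer equal to an edge (such as exactly $25,000) in the higher bracket, matching the "to $49,999" style labels.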



Pre-made psychometric scales


There has been a great deal of work already done on psychometric scales, and many of these scales have been published and validated for you to use in your own survey. These scales include the ECQ2 (Emotional Control Questionnaire), the PNS (Personal Need for Structure survey), the NPI (Narcissistic Personality Inventory), and the VLQ (Valued Living Questionnaire). These are all found in the Acceptance and Commitment Therapy measures package by Dr. Joseph Ciarrochi and Linda Bilich, along with notes on each measure’s validity.

There are several types of reliability and validity, and several ways to measure them. Test-retest reliability describes the consistency between scores when the same person takes the same test at different times. There is also internal consistency, which describes the degree to which every question in a survey measures the same underlying psychometric property; one common way to measure internal consistency is Cronbach's alpha.
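Both measures are straightforward to compute. The sketch below uses the standard Cronbach's alpha formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), and summarizes test-retest reliability with one simple choice, the Pearson correlation between two sittings. The response data is made up, and a real analysis would usually rely on a statistics package.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def pearson(x, y):
    """Pearson correlation, used here as a simple test-retest summary."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Made-up 1-5 ratings: five respondents answering a four-item scale.
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 2, 1],
]
first_sitting = [sum(row) for row in responses]
second_sitting = [16, 11, 13, 18, 8]  # made-up totals from a later retest

print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
print(f"test-retest correlation: {pearson(first_sitting, second_sitting):.2f}")
```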

Also, it is very hard to tell whether a survey is actually measuring the psychometric property it is intended to measure (its construct validity). Scores on tests for depression, for instance, can be vastly different across different tests taken by the same person.