Wednesday 19 July 2017

Writing Seminar on Short Scientific Pieces

The following is the first of five seminars on statistical writing that I am giving this week and next.

About half of the material here has appeared in previous blog posts on making survey questions.


Composing short pieces. A workshop about scientific writing, using the skill of survey writing as a catalyst.

Why start with surveys / questionnaires?

1. Survey writing tends to be left out of a classical statistics program, and is instead left to the social sciences. It is, however, a skill that is asked of statisticians because of the way it integrates into design of experiments.

2. Surveys have the potential to be very short, so creating one should not be an overwhelming task.

3. Writing a proper survey absolutely requires that the writer imagine an intended reader and how someone else might understand what is written.

A survey, like most other writing that you as statisticians, graduate students and/or faculty will be doing, will be read by others who know less about the subject at hand than you do. You are writing from the perspective of an expert.

This perspective is a major shift from much of the writing done in undergraduate work. Aside from peer evaluations, most undergrad writing is done to demonstrate understanding of a topic to someone who knows that topic better than you. That often reduces to using the specific key terms and phrases that the grader (a professor or teaching assistant) is looking for, and making complete sentences. If that work is missing any key parts that a casual reader would need in order to understand the topic, then the grader can fill in the missing parts with their own understanding.

When you write a research report, a scientific article, or a thesis, most people who read the material will do so with the intent of learning something from it. That means they won't be able to fill in any missing critical information with their own knowledge, because you have the knowledge and they don't. Even worse, they may fill in missing parts with incorrect knowledge.

In the case of a survey, even though respondents are the ones answering the questions, the burden of being understood rests with the agent asking the questions. As the survey writer, YOU are the one that knows the variables that you want to measure.

So even though this workshop is titled 'composing short pieces', a large amount of time will be spent on survey questions. Much of this applies to all scientific writing.

Writing Better Surveys

Tip 1. Make sure your questions are answerable. Anticipate cases where questions may not be answerable. For example, a question about family medical history should include an 'unknown' response for adoptees and others who wouldn't know. If someone has difficulty answering a survey question, that frustration lingers and they may guess a response, guess future responses, or quit entirely. Adding a not-applicable response, an open-ended 'other' response, or a means to skip a question are all ways to mitigate this problem.

Tip 2. Avoid logical negatives like 'not', 'against', or 'isn't' when possible. Some readers will fail to see the word 'not', and some will get confused by the logic and will answer a question contrary to their intended answer. If logical negatives are unavoidable, highlight them in BOLD, LARGE AND CAPITAL.

Tip 3. Minimize knowledge assumptions. Not everyone knows what initialisms like FBI or NHL stand for. Not everyone knows what the word 'initialism' means. Lower the language barrier by using as simple language as possible without losing meaning. Use full names like National Hockey League, or define them regularly if the terms are used very often.

Tip 4. If a section of your survey, such as demographic questions, is not obviously related to the context of the rest of the survey, preface that section with a reason why you are asking those questions. Respondents may otherwise resent being asked questions they perceive as irrelevant.

Tip 5. Each question comes at a cost out of a respondent's attention budget. Don't include questions haphazardly or allow other researchers to piggyback questions onto your survey. Every increase in survey length increases the risk of missed or invalid answers. Engagement will drop off over time. See Tip 17.

Tip 6. Be specific about your questions; don't leave them open to interpretation. Minimize words with context-specific definitions like 'framework', and avoid slang and non-standard language. Provide definitions for anything that could be ambiguous. This includes time frames and frequencies. For example, instead of 'very long' or 'often', use '3 months' or 'five or more times per day'.

Tip 7. Base questions on specific time frames like 'In the past week how many hours have you...', as opposed to imagined time frames like 'In a typical week how many hours have you...'. The random noise involved in people doing that activity more or less than typical should balance out in your sample. Time frames should be long enough to include relevant events and short enough to recall.

Exercise: Writing with exactness (also called exactitude)

Part 1 of 2: Consider
Why is it “in the last week” and not “in a typical week”?

If a question asks something like “in a typical week, how many alcoholic drinks have you consumed?”

- Respondents will tend to over-average and discount rare events.
- Respondents are invited to idealize their week, which may increase the potential for social desirability bias.
- Every respondent will draw their week from a different time frame (imagined or real) as their typical week. However, “in the last week” refers to the same time frame for every respondent.

Part 2 of 2: Create

Put yourself in the shoes of...

...wait, let me restart, that was an idiom. (See Tip 21)

Consider the perspective of a stakeholder in a survey. A stakeholder could be anyone involved in the survey or directly benefiting from what it reveals, such as a respondent, the surveying firm or company, or the client that paid for the survey. Discuss amongst your group the different consequences of choosing to ask about a respondent's place of residence in one of two ways:

Version 1:
Where is your main place of residence?

Version 2:
What was your place of residence on July 14, 2017?

Exercise: Sizes of Time Frames

Even if a human respondent is trying their best to be honest, memory is limited. Rare or noteworthy events may be able to be recalled for years, but more mundane things won't be.

Discuss the benefits and drawbacks (the good and bad aspects) of the following three survey questions.

Version 1:
In the last week, how many movies did you see in a theater?

Version 2:
In the last year, how many movies did you see in a theater?

Version 3:
In the last ten years, how many movies did you see in a theater?

Tip 8. For sensitive questions (drug use, trauma, illegal activity), list the negative or less socially desirable answers first and move towards the milder ones. That gives respondents a comparative frame of reference that makes their own response seem less undesirable.

Tip 9. Pilot your questions on potential respondents. If the survey is for an undergrad course, have some undergrads answer and critique the survey before a full release. Re-evaluate any questions that get skipped in the pilot. Remember, if you could predict the responses you will get from a survey, you wouldn't need to do the survey at all.

Tip 10. Hypothesize first, then determine the analysis and data format you'll need, and THEN write or find your questions.

Tip 11. Some numerical responses, like age and income, are likely to be rounded. Some surveys ask such questions as categories instead of open-response numbers, but information is lost this way. There are statistical methods to mitigate both problems, but only if you acknowledge the problems first.

Tip 12. Match your numerical categories to the respondent population. For example, if you are asking the age of respondents in a university class, use categories like 18 or younger, 19-20, 21-22, 23-25, 26 or older. These categories would not be appropriate for a general population survey.

Tip 13. For pick-one category (i.e. multiple choice, polytomous) responses, including numerical categories, make sure no categories overlap (i.e. mutually exclusive), and that all possible values are covered (i.e. exhaustive.)
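For numerical categories, both conditions can be checked mechanically before the survey goes out. The following is a minimal sketch, assuming whole-number answers (such as age in years) and categories written as inclusive (low, high) pairs with None marking an open end. The function name check_categories and the plausible range of 0 to 120 are made up for this illustration.

```python
def check_categories(categories, lowest, highest):
    """Return (overlaps, gaps) for inclusive whole-number categories.

    categories: list of (low, high) pairs; None means open-ended.
    lowest, highest: the range of values a respondent could plausibly give.
    """
    # Replace open ends with the bounds of the plausible range, then sort.
    bounds = sorted(
        (lowest if lo is None else lo, highest if hi is None else hi)
        for lo, hi in categories
    )
    overlaps, gaps = [], []
    expected = lowest  # the next value that still needs a category
    for lo, hi in bounds:
        if lo < expected:
            # This category starts inside values that are already covered.
            overlaps.append((lo, min(hi, expected - 1)))
        elif lo > expected:
            # Some values between categories have no category at all.
            gaps.append((expected, lo - 1))
        expected = max(expected, hi + 1)
    if expected <= highest:
        gaps.append((expected, highest))
    return overlaps, gaps


# Age categories from Tip 12: 18 or younger, 19-20, 21-22, 23-25, 26 or older.
good = [(None, 18), (19, 20), (21, 22), (23, 25), (26, None)]
# A flawed version: 20 appears in two categories, and 23 is in none.
bad = [(None, 18), (19, 20), (20, 22), (24, 25), (26, None)]

print(check_categories(good, 0, 120))
print(check_categories(bad, 0, 120))
```

An empty list of overlaps means the categories are mutually exclusive, and an empty list of gaps means they are exhaustive over the stated range.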

Tip 14. When measuring a complex psychometric variable (e.g. depression), try to find a set of questions that have already been tested for reliability on a comparable population (e.g. CES-D). Otherwise, consult a psychometrics specialist. Reliability refers to the degree to which responses to a set of questions 'move together', or are measuring the same thing. Reliability can be computed after the survey is done.
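The tip does not name a particular reliability statistic. One commonly used choice is Cronbach's alpha, and the sketch below assumes that is the measure wanted. The response data are made up, and the function name cronbach_alpha is only illustrative.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one list per respondent, one numeric score per question."""
    k = len(responses[0])  # number of questions in the set
    # Sample variance of each question across respondents.
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up answers from five respondents to three questions scored 1 to 5.
made_up = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 3],
]
print(round(cronbach_alpha(made_up), 3))
```

Values near 1 suggest the questions move together. What counts as acceptable depends on the field, which is one more reason to consult a specialist.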

Exercise - Synonyms
Pick an informative word from a short passage (e.g. Tip 14)
1. Find a synonym of that word.
2. Write a definition of that new word.
3. Consider how using the new word changes the sentence.

Example: Replace the verb "compete" with "contest".
Contest (verb): To fight to control or hold.
"Share of the smartphone market was hotly contested."
Using "contest" implies more of a struggle or a fight than the original word "compete".

Tip 15. Ordinal answers in which a neutral answer is possible should include one. This prevents neutral people from guessing. However, not every ordinal answer will have a meaningful neutral response.

Tip 16. Answers that are degrees between opposites should be balanced. For each possible response, its opposite should also be included. For example, strongly agree / somewhat agree / no opinion / somewhat disagree / strongly disagree is a balanced scale.

Tip 17. Limit mental overhead - the amount of information that people need to keep in mind at the same time in order to answer your question. Try to limit the list of possible responses to 5-7 items. When this isn't possible, don't ask people to interact with every item. People aren't going to be able to rank 10 different objects 1st through 10th meaningfully, but they will be able to list the top or bottom 2-3. An ordered-response question rarely needs more than 5 levels from agree to disagree. See Tip 5.

Exercise – Information Density
Part 1 of 2: Consider

Consider the following two sentences. They convey the same information, but one version packs all that information into a single sentence with one independent clause. The other version splits this into two sentences and three independent clauses.

Version 1
“Reefs of Silurian age are of great interest. These are found in the Michigan basin, and they are pinnacle reefs.”

Version 2
“Pinnacle reefs of Silurian age in the Michigan basin are of great interest.”
(Inspiring Source: The Chicago Guide to Communicating Science, 2nd ed, page 46)

Each version is appropriate, but for different situations. When words are at a premium, such as when writing an abstract, giving a talk with very limited time, or giving priming information for a survey question, the shorter version is typically appropriate. However, readers and listeners, especially those who speak English as an additional language, will have a harder time parsing the shorter version, even if it takes less time to read or say.
The operative difference between the versions is information density. The longer version requires less effort to read because there are fewer possibilities for each word to modify or interact with the other words in its clause. This is done by adding syntax words that convey no additional information on their own.

Part 2 of 2: Create

On your own, take the following sentence and make a less information-dense version of it by breaking it into smaller sentences.

“Data transformations are commonly-used tools that can serve many functions in quantitative analysis of data, including improving normality of a distribution and equalizing variance to meet assumptions and improve effect sizes, thus constituting important aspects of data cleaning and preparing for your statistical analyses.”

Now take the following passage and condense it into a single sentence with greater information density.

“Many of us in the social sciences deal with data that do not conform to assumptions of normality and/or homoscedasticity/homogeneity of variance. Some research has shown that parametric tests (e.g., multiple regression, ANOVA) can be robust to modest violations of these assumptions.”

Source: Jason W. Osborne, Improving your data transformations: Applying the Box-Cox transformation, Practical Assessment, Research & Evaluation, Vol 15, Number 12, Oct 2010

Tip 18. Layout matters. Make every response field unambiguously next to its most relevant text. For an ordinal response question, make sure that ordering structure is apparent by lining up all the answers along one line or column of the page.

Tip 19. Randomize response order where appropriate. All else being equal, earlier responses in a list are chosen more often, especially when there are many items. To smooth out this bias, scramble the order of responses differently for each survey. This is only appropriate when responses are not ordinal. Example of an appropriate question: 'Which of the following topics in this course did you find the hardest?'
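For a survey delivered on screen, the scrambling can be automated. The sketch below is one possible approach, not a feature of any particular survey platform. It assumes each respondent has an ID that can seed the shuffle, so the order each person saw can be rebuilt later. The function name shuffled_options, the topic list and the idea of pinning 'Other' to the end are all assumptions made for this illustration.

```python
import random


def shuffled_options(options, respondent_id, pinned_last=()):
    """Return the options in a random order that is reproducible per respondent."""
    rng = random.Random(respondent_id)  # seeded, so the order can be rebuilt
    movable = [o for o in options if o not in pinned_last]
    rng.shuffle(movable)
    # Catch-all options such as 'Other' stay at the end of the list.
    return movable + [o for o in options if o in pinned_last]


# Hypothetical course topics for the example question in Tip 19.
topics = ["Sampling", "Regression", "Hypothesis testing", "Probability", "Other"]
for respondent_id in (101, 102, 103):
    print(respondent_id, shuffled_options(topics, respondent_id, pinned_last=("Other",)))
```

Recording the seed or the displayed order alongside each response makes it possible to check afterwards whether position in the list still influenced the answers.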

Tip 20. A missing value for a variable does not invalidate a survey. Even if the variable is used in an analysis, the value can be substituted with a set of plausible values by a method called imputation. A missing value is not as bad as a guessed value, because the uncertainty from a missing value can be identified.
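To make 'a set of plausible values' concrete, here is a deliberately simple sketch. It fills each missing answer with a random draw from the answers that were observed, and repeats this several times to produce several completed data sets. This is only one crude form of imputation, chosen here for brevity. The data, the function name impute_many and the choice of five completed sets are assumptions for this illustration.

```python
import random
from statistics import mean


def impute_many(values, n_sets=5, seed=0):
    """Return n_sets completed copies of values.

    Each None is replaced by a random draw from the observed values.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    return [
        [v if v is not None else rng.choice(observed) for v in values]
        for _ in range(n_sets)
    ]


# Made-up answers to an 'hours in the past week' question; None = missing.
hours = [10, 12, None, 8, 15, None, 11]
completed = impute_many(hours)
estimates = [mean(c) for c in completed]

print(estimates)
# The spread across the completed sets reflects the uncertainty
# that comes from the missing answers.
print(min(estimates), max(estimates))
```

A guessed answer would look like any other observed value, so this kind of spread could not be computed for it.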

Tip 21. Restrict your language to 'international English' (assuming the questions are in English). This means that idioms and local names for things should be avoided when possible. When there are two or more competing names for a thing, rather than one internationally recognized one, use all major names for an object that are in use for your target demographic.

[As time permits, Exercise prompt: Try to figure out what 'Your potato is baking.' means without knowledge of Brazilian Portuguese]

Main Inspiration Source for tips: Fink, Arlene (1995). "How to Ask Survey Questions" - The Survey Kit Vol. 2, Sage Publications.

Digital Writing Tools
Hemingway, find/replace, text diff, texrendr
Caveat / Pitfall 1: Digital tools are not a substitute for judgement. In one book, every instance of the word 'mage' was to be replaced with the synonym 'wizard', according to the style guide of the publisher. (Both 'mage' and 'wizard' are words that refer to people with magic-using abilities. However, the publisher may have preferred to use one term over another for internal consistency.) Rather than make a case-by-case replacement, the person responsible for making the change simply used a digital 'replace all' function, changing every instance of 'mage' to 'wizard'. Unfortunately, this particular text also included the word 'damage', which was changed automatically to the nonsense word 'dawizard'.
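The same mistake is easy to reproduce in a few lines. The sketch below uses a made-up sentence and compares a plain replace-all with a whole-word replacement that uses the regular expression word boundary \b.

```python
import re

# A made-up sentence containing 'mage' on its own and inside 'damage'.
text = "The mage took heavy damage. Mages rarely manage that."

# Naive replace-all: also matches 'mage' inside other words.
naive = text.replace("mage", "wizard")

# Whole-word replace: \b only matches at the edge of a word.
careful = re.sub(r"\bmage\b", "wizard", text)

print(naive)
print(careful)
```

The whole-word version leaves 'damage' alone, but it also leaves the capitalized plural 'Mages' untouched. Whether that should become 'Wizards' is still a case-by-case decision, so the safer tool narrows the problem without replacing judgement.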

Caveat / Pitfall 2: Another issue with digital tools is that they can't all be depended upon to be available in their current forms forever. Microsoft is moving towards a SaaS (software as a service) model, where access to tools like Word with its grammar check is based on a subscription rather than a one-time fee. This means in the future you may lose access to that tool for reasons beyond your control. Web-based tools like Hemingway carry an even greater risk, because the server for Hemingway could be shut down without any warning and leave you without access.

Also, you may need to send your writing or other material (e.g. figures, tables) to a remote server to be processed in order to use those tools. If your writing contains sensitive or confidential material, you may be breaking legal agreements with your data providers by using these tools.

Further Homework and Reading

This is based on Chapter 8, "Designing Questions", of the book Successful Surveys - Research Methods and Practice by George Gray and Neil Guppy.
Q1. Give an example of a numerical (e.g. quantitative) open-ended question and a numerical closed-ended question.
Q2. Give an example of a non-numerical (e.g. nominal, text-based) open-ended question and a non-numerical closed-ended question.
Q3. In your OWN WORDS, give two advantages and disadvantages of open-ended questions.
Q4. In your OWN WORDS, give two advantages and disadvantages of closed-ended questions.
Q5. How do field-coded questions combine the features of both open- and closed-ended questions?
Q6. For what kind of surveys are open-ended questions more useful? When are they less useful?
Q7. What are five features that make for well-worded questions?
Q8. What is the name used for a survey question that asks about and focuses on two distinct things?
Q9. What is a Likert scale?
Q10. What are four things that all have important effects on how people respond to survey questions?
