Wednesday, 19 July 2017

Writing Seminar on Short Scientific Pieces

The following is the first of five seminars on statistical writing that I am giving this week and next.

About half of the material here has appeared in previous blog posts on making survey questions.

------------

Composing short pieces. A workshop about scientific writing, using the skill of survey writing as a catalyst.

Why start with surveys / questionnaires?

1. Survey writing tends to be left out of a classical statistics program and is instead left to the social sciences. It is, however, a skill that is asked of statisticians because of how closely it integrates with the design of experiments.

2. Surveys can be very short, so creating one should not be an overwhelming task.

3. Writing a proper survey absolutely requires that the writer imagine an intended reader and how someone else might understand what is written.

A survey, like most other writing that you will do as statisticians, graduate students, and/or faculty, will be read by others who know less about the subject at hand than you do. You are writing from the perspective of an expert.

This perspective is a major shift from much of the writing done in undergraduate courses. Aside from peer evaluations, most undergrad writing is done to demonstrate your understanding of a topic to someone who knows that topic better than you. That often reduces to making complete sentences and using the specific key terms and phrases that the grader, a professor or teaching assistant, is looking for. If the work is missing key parts that a casual reader would need in order to understand the topic, the grader can fill in those missing parts with their own understanding.

When you write a research report, a scientific article, or a thesis, most people who read it will do so with the intent of learning something from it. That means they won't be able to fill in missing critical information with their own knowledge, because you have the knowledge and they don't. Even worse, they may fill in missing parts with incorrect knowledge.

In the case of a survey, even though respondents are the ones answering the questions, the burden of being understood rests with the agent asking the questions. As the survey writer, YOU are the one that knows the variables that you want to measure.

So even though this workshop is titled 'composing short pieces', a large amount of time will be spent on survey questions. Much of this applies to all scientific writing.

Writing Better Surveys

Tip 1. Make sure your questions are answerable. Anticipate cases where questions may not be answerable. For example, a question about family medical history should include an 'unknown' response for adoptees and others who wouldn't know. If someone has difficulty answering a survey question, that frustration lingers, and they may guess a response, guess future responses, or quit entirely. Adding a 'not applicable' response, an open-ended 'other' response, or a means to skip a question are all ways to mitigate this problem.

Tip 2. Avoid logical negatives like 'not', 'against', or 'isn't' when possible. Some readers will fail to see the word 'not', and some will get confused by the logic and will answer a question contrary to their intended answer. If logical negatives are unavoidable, highlight them in BOLD, LARGE AND CAPITAL.

Tip 3. Minimize knowledge assumptions. Not everyone knows what initialisms like FBI or NHL stand for. Not everyone knows what the word 'initialism' means. Lower the language barrier by using language that is as simple as possible without losing meaning. Use full names like National Hockey League, or define initialisms regularly if the terms are used very often.

Tip 4. If a section of your survey, such as demographic questions, is not obviously related to the context of the rest of the survey, preface that section with a reason why you are asking those questions. Respondents may otherwise resent being asked questions they perceive as irrelevant.

Tip 5. Each question comes at a cost out of a respondent's attention budget. Don't include questions haphazardly or allow other researchers to piggyback questions onto your survey. Every increase in survey length increases the risk of missed or invalid answers. Engagement will drop off over time. See Tip 17.

Tip 6. Be specific in your questions; don't leave them open to interpretation. Minimize words with context-specific definitions like 'framework', and avoid slang and non-standard language. Provide definitions for anything that could be ambiguous. This includes time frames and frequencies. For example, instead of 'very long' or 'often', use '3 months' or 'five or more times per day'.

Tip 7. Base questions on specific time frames like 'In the past week how many hours have you...', as opposed to imagined time frames like 'In a typical week how many hours have you...'. The random noise involved in people doing that activity more or less than typical should balance out in your sample. Time frames should be long enough to include relevant events and short enough to recall.

Exercise: Writing with exactness (also called exactitude)

Part 1 of 2: Consider

Why is it “in the last week” and not “in a typical week”?

If a question asks something like “in a typical week, how many alcoholic drinks have you consumed?”

- Respondents will tend to over-average and discount rare events.

- Respondents are invited to idealize their week, which may increase the potential for social desirability bias.

- Every respondent will choose a different time frame (imagined or real) as their typical week. However, “in the last week” refers to one specific, real time frame.

Part 2 of 2: Create

Put yourself in the shoes of...

...wait, let me restart, that was an idiom. (See Tip 21)

Consider the perspective of a stakeholder in a survey. A stakeholder could be anyone involved in the survey or directly benefiting from what it reveals, such as a respondent, the surveying firm or company, or the client that paid for the survey. Discuss amongst your group the different consequences of choosing to ask about a respondent's place of residence in one of two ways:

Version 1:

Where is your main place of residence?

Version 2:

What was your place of residence on July 14, 2017?

Exercise: Sizes of Time Frames

Even if a human respondent is trying their best to be honest, memory is limited. Rare or noteworthy events may be recalled for years, but more mundane ones won't be.

Discuss the benefits and drawbacks (the good and bad aspects) of the following three survey questions.

Version 1:

In the last week, how many movies did you see in a theater?

Version 2:

In the last year, how many movies did you see in a theater?

Version 3:

In the last ten years, how many movies did you see in a theater?

Tip 8. For sensitive questions (drug use, trauma, illegal activity), start with the negative or less socially desirable answers first and move towards the milder ones. That gives respondents a comparative frame of reference that makes their own response seem less undesirable.

Tip 9. Pilot your questions on potential respondents. If the survey is for an undergrad course, have some undergrads answer and critique the survey before a full release. Re-evaluate any questions that get skipped in the pilot. Remember, if you could predict the responses you will get from a survey, you wouldn't need to do the survey at all.

Tip 10. Hypothesize first, then determine the analysis and data format you'll need, and THEN write or find your questions.

Tip 11. Some numerical responses, like age and income, are likely to be rounded. Some surveys ask such questions as categories instead of open-response numbers, but information is lost this way. There are statistical methods to mitigate both problems, but only if you acknowledge the problems first.
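As a rough illustration of acknowledging the rounding problem first, the Python sketch below measures 'heaping': the share of reported ages that end in 0 or 5. It is only a diagnostic I am assuming for illustration, not one of the mitigation methods alluded to above. Without rounding, roughly one in five ages would end in those digits, so a share far above 0.2 is a warning sign. The function name `heaping_share` and the ages are hypothetical.

```python
def heaping_share(ages):
    """Share of reported ages that end in 0 or 5 (a rough sign of rounding)."""
    heaped = sum(1 for a in ages if a % 5 == 0)
    return heaped / len(ages)

# Hypothetical reported ages; 8 of the 10 end in 0 or 5
ages = [30, 35, 40, 41, 45, 50, 50, 52, 55, 60]
print(heaping_share(ages))  # 0.8, far above the roughly 0.2 expected without rounding
```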

Tip 12. Match your numerical categories to the respondent population. For example, if you are asking the age of respondents in a university class, use categories like 18 or younger, 19-20, 21-22, 23-25, 26 or older. These categories would not be appropriate for a general population survey.

Tip 13. For pick-one category (i.e. multiple choice, polytomous) responses, including numerical categories, make sure that no categories overlap (i.e. they are mutually exclusive) and that all possible values are covered (i.e. the categories are exhaustive).
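Both conditions in Tip 13 can be checked mechanically. Here is a minimal Python sketch, assuming integer ages between 0 and 120 and categories written as (label, lowest, highest), with `None` for an open end; `check_categories` is a hypothetical helper. It lists every age that falls into no category (a gap) or into more than one (an overlap), first for the university age categories from Tip 12 and then for a deliberately flawed set.

```python
def check_categories(categories, ages=range(0, 121)):
    """Return (gaps, overlaps): ages matching no category, and ages matching more than one."""
    gaps, overlaps = [], []
    for age in ages:
        hits = [label for label, lo, hi in categories
                if (lo is None or age >= lo) and (hi is None or age <= hi)]
        if not hits:
            gaps.append(age)        # not exhaustive at this age
        elif len(hits) > 1:
            overlaps.append(age)    # not mutually exclusive at this age
    return gaps, overlaps

tip12 = [
    ("18 or younger", None, 18),
    ("19-20", 19, 20),
    ("21-22", 21, 22),
    ("23-25", 23, 25),
    ("26 or older", 26, None),
]
print(check_categories(tip12))   # ([], [])

flawed = [("under 18", None, 17), ("18-20", 18, 20), ("20-25", 20, 25), ("27 or older", 27, None)]
print(check_categories(flawed))  # ([26], [20])
```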

Tip 14. When measuring a complex psychometric variable (e.g. depression), try to find a set of questions that have already been tested for reliability on a comparable population (e.g. CES-D). Otherwise, consult a psychometrics specialist. Reliability refers to the degree to which responses to a set of questions 'move together', or are measuring the same thing. Reliability can be computed after the survey is done.
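The tip does not name a particular reliability statistic; one widely used choice is Cronbach's alpha, alpha = k/(k-1) * (1 - (sum of the item variances) / (variance of the total score)), where k is the number of items. A minimal Python sketch, assuming every respondent answered every item with a numeric score; the scores are made up for illustration.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k numeric item scores."""
    k = len(responses[0])                               # number of items
    items = list(zip(*responses))                       # regroup the scores by item
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])   # variance of each respondent's total
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical 1-5 scores: 5 respondents x 3 items that mostly move together
scores = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 3],
]
print(round(cronbach_alpha(scores), 3))
```

Values near 1 mean the items move together; items that contradict each other push alpha down, and it can even go negative.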

Exercise - Synonyms

Pick an informative word from a short passage (e.g. Tip 14)

1. Find a synonym of that word.

2. Write a definition of that new word.

3. Consider how using the new word changes the sentence.

Example:

"Share of the smartphone market was hotly contested."

Replace the verb "compete" with "contest".

Contest (verb): To fight to control or hold.

Using "contest" implies more of a struggle or a fight than the original word, "compete".

Tip 15. Ordinal questions for which a neutral answer is possible should include one. This prevents neutral respondents from guessing. However, not every ordinal question will have a meaningful neutral response.

Tip 16. Answers that are degrees between opposites should be balanced. For each possible response, its opposite should also be included. For example, strongly agree / somewhat agree / no opinion / somewhat disagree / strongly disagree is a balanced scale.

Tip 17. Limit mental overhead - the amount of information that people need to keep in mind at the same time in order to answer your question. Try to limit the list of possible responses to 5-7 items. When this isn't possible, don't ask people to interact with every item. People aren't going to be able to rank 10 different objects 1st through 10th meaningfully, but they will be able to list the top or bottom 2-3. An ordered-response question rarely needs more than 5 levels from agree to disagree. See Tip 5.

Exercise – Information Density

Part 1 of 2: Consider

Consider the following two sentences. They convey the same information, but one version packs all that information into a single sentence with one independent clause. The other version splits this into two sentences and three independent clauses.

Version 1

“Reefs of Silurian age are of great interest. These are found in the Michigan basin, and they are pinnacle reefs.”

Version 2

“Pinnacle reefs of Silurian age in the Michigan basin are of great interest.”

(Inspiring Source: The Chicago Guide to Communicating Science, 2nd ed., page 46)

Each version is appropriate, but for different situations. When words are at a premium, such as in an abstract, in a talk with very limited time, or in the priming information for a survey question, the shorter version is typically appropriate. However, readers and listeners, especially those who speak English as an additional language, will have a harder time parsing the shorter version, even if it takes less time to read or say.

The operative difference between the versions is information density. The longer version requires less effort to read because there are fewer possibilities for each word to modify or interact with the other words in its clause. It achieves this by adding syntax words that convey no additional information on their own.

Part 2 of 2: Create

On your own, take the following sentence and make a less information-dense version of it by breaking it into smaller sentences.

“Data transformations are commonly-used tools that can serve many functions in quantitative analysis of data, including improving normality of a distribution and equalizing variance to meet assumptions and improve effect sizes, thus constituting important aspects of data cleaning and preparing for your statistical analyses.”

Now take the following passage and condense it into a single sentence with greater information density.

“Many of us in the social sciences deal with data that do not conform to assumptions of normality and/or homoscedasticity/homogeneity of variance. Some research has shown that parametric tests (e.g., multiple regression, ANOVA) can be robust to modest violations of these assumptions.”

Source: Jason W. Osborne, "Improving your data transformations: Applying the Box-Cox transformation", Practical Assessment, Research & Evaluation, Vol. 15, No. 12, Oct 2010.

http://pareonline.net/pdf/v15n12.pdf

Tip 18. Layout matters. Place every response field unambiguously next to its most relevant text. For an ordinal response question, make sure that the ordering structure is apparent by lining up all the answers along one line or column of the page.

Tip 19. Randomize response order where appropriate. All else being equal, earlier responses in a list are chosen more often, especially when there are many items. To smooth out this bias, scramble the order of responses differently for each copy of the survey. This is only appropriate when responses are not ordinal. Example of an appropriate question: 'Which of the following topics in this course did you find the hardest?'
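A minimal Python sketch of per-respondent randomization, assuming the survey software lets you set the option order separately for each respondent. Seeding a separate random generator with a respondent ID (a hypothetical identifier) makes each respondent's order reproducible, which helps when checking the data later. The topic names are placeholders.

```python
import random

def shuffled_options(options, seed):
    """Return a per-respondent random ordering of nominal (non-ordinal) response options."""
    rng = random.Random(seed)   # one generator per respondent, so the order can be recreated
    order = list(options)       # copy, so the master list keeps its original order
    rng.shuffle(order)
    return order

topics = ["Probability", "Sampling distributions", "Hypothesis testing",
          "Regression", "Experimental design"]
for respondent_id in range(3):
    print(respondent_id, shuffled_options(topics, seed=respondent_id))
```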

Tip 20. A missing value for a variable does not invalidate a survey. Even if the variable is used in an analysis, the value can be substituted with a set of plausible values by a method called imputation. A missing value is not as bad as a guessed value, because with a missing value the uncertainty can be identified.
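To make 'a set of plausible values' concrete, here is a minimal Python sketch of random hot-deck multiple imputation: each missing value (`None`) is filled with a randomly drawn observed value, and this is repeated to produce several completed copies of the data. It is a simplified stand-in chosen for illustration, not a recommended method; practical imputation usually draws plausible values conditional on the respondent's other answers. The data and the function name are hypothetical.

```python
import random

def hot_deck_impute(values, m=5, seed=0):
    """Return m completed copies of `values`, filling each None with a randomly drawn observed value."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    completed = []
    for _ in range(m):
        completed.append([v if v is not None else rng.choice(observed) for v in values])
    return completed

hours = [3, None, 5, 2, None, 4]   # hypothetical 'hours in the past week', two missing
for copy in hot_deck_impute(hours, m=3):
    print(copy)
```

Running the same analysis on each completed copy and comparing the results shows how much a conclusion depends on the missing values, which is exactly the uncertainty that a single guessed value would hide.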

Tip 21. Restrict your language to 'international English' (assuming the questions are in English). This means that idioms and local names for things should be avoided when possible. When there are two or more competing names for a thing, rather than one internationally recognized one, use all the major names in use by your target demographic.

[As time permits, Exercise prompt: Try to figure out what 'Your potato is baking.' means without knowledge of Brazilian Portuguese]

Main Inspiration Source for tips: Fink, Arlene (1995). "How to Ask Survey Questions" - The Survey Kit Vol. 2, Sage Publications.

Digital Writing Tools

Showcase:

Hemingway, find/replace, text diff, texrendr

Caveat / Pitfall 1: Digital tools are not a substitute for judgement. In one book, every instance of the word 'mage' was to be replaced with the synonym 'wizard', according to the style guide of the publisher. (Both 'mage' and 'wizard' are words that refer to people with magic-using abilities. However, the publisher may have preferred to use one term over another for internal consistency.) Rather than make a case-by-case replacement, the person responsible for making the change simply used a digital 'replace all' function, changing every instance of 'mage' to 'wizard'. Unfortunately, this particular text also included the word 'damage', which was changed automatically to the nonsense word 'dawizard'.
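The 'dawizard' accident is easy to reproduce, and so is a partial fix. In this Python sketch (the sentence is made up), a plain `str.replace` changes every occurrence of the letters 'mage', while the regular expression `\bmage\b` only matches 'mage' as a whole word. Even the regular expression still needs judgement: it would miss 'Mage' at the start of a sentence and the plural 'mages'.

```python
import re

text = "The mage took heavy damage. Another mage healed the damage."

# Blind 'replace all': also rewrites the 'mage' inside 'damage'
naive = text.replace("mage", "wizard")
print(naive)    # The wizard took heavy dawizard. Another wizard healed the dawizard.

# Word boundaries (\b) restrict the match to the whole word 'mage'
careful = re.sub(r"\bmage\b", "wizard", text)
print(careful)  # The wizard took heavy damage. Another wizard healed the damage.
```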

Caveat / Pitfall 2: Another issue with digital tools is that not all of them can be depended upon to remain available in their current forms forever. Microsoft is moving towards a SaaS (software as a service) model, where access to tools like Word and its grammar check is based on a subscription rather than a one-time fee. This means that in the future you may lose access to that tool for reasons beyond your control. Web-based tools like Hemingway carry an even greater risk, because the server for Hemingway could be shut down without any warning, leaving you without access.

Also, you may need to send your writing or other material (e.g. figures, tables) to a remote server to be processed in order to use those tools. If your writing contains sensitive or confidential material, you may be breaking legal agreements with your data providers by using these tools.

Further Homework and Reading

This is based on Chapter 8 of the book Successful Surveys - Research Methods and Practice by George Gray and Neil Guppy. The chapter is titled "Designing Questions."

Q1. Give an example of a numerical (e.g. quantitative) open-ended question and a numerical closed-ended question.

Q2. Give an example of a non-numerical (e.g. nominal, text-based) open-ended question and a non-numerical closed-ended question.

Q3. In your OWN WORDS, give two advantages and two disadvantages of open-ended questions.

Q4. In your OWN WORDS, give two advantages and two disadvantages of closed-ended questions.

Q5. How do field-coded questions combine the features of both open- and closed-ended questions?

Q6. For what kind of surveys are open-ended questions more useful? When are they less useful?

Q7. What are five features that make for well-worded questions?

Q8. What is the name used for a survey question that asks about and focuses on two distinct things?

Q9. What is a Likert scale?

Q10. What are four things that all have important effects on how people respond to survey questions?

Monday, 17 July 2017

Chess Variants - 960 and Really Bad Chess

Fischer chess, or chess 960, is like Queen’s chess, the standard orthodox game we know, except that in 960 the starting positions of the back-rank pieces are random. It can be played live with a standard chess set, or online through Jocly or Lichess.

There are restrictions on the starting arrangement: for example, the king must be between the rooks, and the two bishops must be on squares of opposite colours. These restrictions leave 960 legal arrangements out of the 8!/(2!2!2!1!1!) = 5040 unique ways to arrange two rooks, two knights, two bishops, one king, and one queen. One arrangement is chosen at random, and both players use it on the same files (columns) of the board, rather than mirrored for each player's perspective.
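The counts of 5040 and 960 can be checked by brute force. This Python sketch enumerates every distinct ordering of the eight back-rank pieces and keeps those that satisfy the two restrictions named above. Square colour is taken from the parity of the file index, which is an implementation choice for the sketch rather than part of any official rules text.

```python
from itertools import permutations

PIECES = "RNBQKBNR"   # two rooks, two knights, two bishops, one queen, one king

def is_960_legal(rank):
    """Bishops on opposite-coloured squares and king strictly between the rooks."""
    bishops = [i for i, p in enumerate(rank) if p == "B"]
    rooks = [i for i, p in enumerate(rank) if p == "R"]
    king = rank.index("K")
    opposite_colours = bishops[0] % 2 != bishops[1] % 2
    king_between_rooks = rooks[0] < king < rooks[1]
    return opposite_colours and king_between_rooks

arrangements = set(permutations(PIECES))   # distinct orderings; identical pieces are merged
legal = [a for a in arrangements if is_960_legal(a)]
print(len(arrangements), len(legal))       # 5040 960
```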

As far as chess variants go, ‘960’ is pretty close to Queen’s chess: it is played with the same number and types of pieces on the same 8 by 8 board, and aside from the difference in starting arrangements, no other rule changes are imposed. The restrictions even ensure that castling in either direction is possible.

If chess 960 is chess done with poetic license, then Really Bad Chess (RBC) is chess done with artistic transgression. Really Bad Chess, by Zach Gage (http://reallybadchess.com/presskit/), is available on Android and Apple devices and is played on an 8 by 8 board with one king and a collection of pawns, knights, bishops, rooks, and queens arranged into ranks for each team. That's about all I can say for certain about the starting state of a Really Bad Chess game.

The pieces each player receives are random, really random. The average piece tends to be more powerful than it would be in a game of Queen’s chess. At the lowest ranks/difficulties, players typically start with three or four queens as well as a front rank composed mostly of bishops. At those same low difficulties, the opponent AI has a much less impressive collection of pieces.

That's one big difference between 960 and RBC: the players are each given their own random set. When playing against humans, both players are treated to a random high-powered arsenal. In ranked mode, games are against the AI, and the difference in power is determined by the difficulty setting. At rank/difficulty 0, the player starts with a massive material advantage over the computer. At 100, this advantage is reversed. At 50, both players are given equally strong pieces, although the pieces are still different for each player and still more powerful on average than in Queen’s chess. For example, each player could have two queens and roughly six knights.

Every win in ranked play against the AI increases the difficulty of the next ranked game. Every loss does the opposite. The AI itself gets no smarter, which may be to simplify the concept of difficulty. It also keeps the game reasonably fast, because the AI does not think to greater depth at high levels.

This is the first variant I've played where I'm not absolute trash and I have the proof:

The AI tends to strongly value putting you in check and will sometimes needlessly sacrifice pieces to do so. This strategy works well, and it partially sidesteps the issue of uneven material value. For example, at difficulties below 50, even trades of pieces are desirable; at difficulties above 50, they are not. But these material advantages matter most in the endgame. I have played several matches above 50 where I overcame the material disadvantage, and then some, only to be checkmated in the midgame. For example, this game, in which I played as white:

There is an undo button that allows you to take back one move against the AI. The button only works to a depth of one move, and it can only be used so many times. Undo uses can be recovered or stockpiled either by watching video ads (5 per view) or by direct purchase ($1.40 per 100). 100 undo uses are bundled with the premium version of the game. The premium version is well worth the $4 if you're going to play 10 or more matches. The default color scheme is hideous, but you can change it with premium.

One minor complaint is that the term ‘rank’ is used instead of ‘difficulty’. Rank goes up as you win when logically such a number should decrease. Rank 1 typically means ‘the best’, but not here.

A bigger problem is that pawns are always promoted to queens. This is usually what I want, but there are cases where another piece is better, and the extra decision step wouldn't be that cumbersome.

It's a lot of fun to go on a power trip and play matches with a lot more non-pawn pieces than I would otherwise have. It speeds the game up to the point of absurdity where 50% to 70% of pieces are moved to capture another piece.

The undo button and the wide set of possible scenarios in Really Bad Chess have made it a fun tool for a casual player like me to practice tactics, however impossible those scenarios would be in a real game. I'm a little worried that games like this are teaching me to play chess incorrectly, which will make it harder to develop skill in the standard game. However, I've never played chess seriously and I'm in my thirties, so the opportunity cost doesn’t seem too steep.

Really Bad Chess is really good at making puzzles as well. It has daily and weekly puzzles which are just matches with preset pieces.

Wednesday, 12 July 2017

Annual Report to Stakeholders 2016-17

Executive Summary (informally, the ‘TL;DR’):

Accomplishments this year felt like a natural extension of the previous year. The amount of writing dedicated to teaching five courses does not reflect the proportion of this year's effort that went towards that.

For reference, last year's report is found here

Personal:

Gabriela and I have the start of a family going; she has moved in and we have a chihuahua-poodle puppy. People that are impressed that Einstein did all that he did with a family of five children are probably the same people that have never seen a dog like this.

Education (Learning):

I have read extensively on the craft of scientific writing. Among the most useful books have been 'The Chicago Guide to Communicating Science' and 'The Copy Editor's Handbook' for my own learning and various IELTS and TOEFL test preparation guides for teaching preparation. There are many books on academic writing, few books on scientific writing, and apparently none on statistical writing.

Between a ten-day trip to Brazil and many conversations with Gabriela and her Brazilian friends, I can call myself an intermediate learner of Portuguese. I also finished all the Portuguese lessons on Duolingo, and got this nifty trophy!

Although I haven't obtained any more SAS certifications, I did gain more depth in SAS out of the necessity of teaching Stat 342.

I read several books on chess and chess variants, in order to answer the personal question 'why is our current set of rules, also called Queen's Chess, the canonical set of rules and not some other iteration?'. So far, the best answer I have is the same as the answer to why words are spelled the way they are, or why the US uses the imperial measurement system: that was the most popular set of rules at the time it was frozen by widespread, reproducible use.

I took the test to be on Jeopardy! To study for this, I played at home, read trivia books, and played the mobile version of the game until I was in the top 200 of 40,000+ players. Only 4% of test takers are contacted for an interview, and I was among the remaining 96%.

Education (Teaching):

Including this summer, I have taught five courses: Stat 305, Stat 342, Stat 201 (twice), and Stat 203. All courses except 342 were service courses, and all courses except 203 were new to me. My ratings are now collectively only slightly above average, but my applause record is still perfect at 5/5 (with one missing due to a snowstorm).

In the last 12 months, about 700 students have had me as a lecturer.

Earlier this summer, I gave a series of five two-hour seminars on R programming for the graduate students and faculty in the department. The topics of the seminars were, respectively, vectorization, optimization, scraping and cleaning text data, imputation, and using ggplot2.

I also gave an invited lecture (over video conferencing) to Kevin Kniffin's sports science class at Cornell University. I gave a presentation of data mining tools to the SFU Sports Analytics Club.

In the next year, I will be teaching 203, 305, and 342 again. The notes for 305, and especially 203, are robust, but 342 needs work. Stat 342 is the SAS programming course, and I want to give it more depth this time around.

The other course for next year, Stat 300, is statistical writing, for which I've made extensive preparations.

Research:

The hockey pace paper that Rajitha, Tim, and I wrote was accepted, and we did a round of suggested revisions and some other improvements. Kevin Kniffin and Christian Hilbricht (of Cornell University) and I co-authored a paper on the timeout in hockey. I wrote my first solo paper, on goalie fatigue in hockey. I also submitted the network analysis work from the thesis to arXiv.

At the Cascadia Sports Conference in Vancouver in September, I presented an update of the previously mentioned hockey research on the overtime loss rule, with recent years of data and additional depth on shot count. However, the results were less significant than expected, meaning that teams are playing closer; in short, these were negative results.

I networked in person with some potential collaborators at the University of Campinas in Brazil.

Publishing and Service:

I have started writing a coursepack / textbook on statistical writing. So far, along with material from the blog, I've made:

- A test based on IELTS to see if the book and course are appropriate for you.

- An assignment on writing scientific questions

- An assignment, complete with example, to write a shortened paper for general interest

- A collection of example 'microconsults', based on my answers to statistics questions on online forums.

- Two reading comprehension assignments based on reproducibility and on undergrad research (in addition to the four such assignments previously posted on the blog)

2 papers were refereed for the Open Journal of Statistics.

7 papers were copy-edited for the Canadian Journal of Statistics.

20 blog posts were made and kept, not including this one. This is down from last year's 26, but the posts are trending longer, and the popularity of the blog itself is trending higher. The post on the Jeopardy analysis has more than 1300 views as of this writing.

3 students' theses were helped toward completion through copy-editing, coding, and/or consulting.

Game Design:

Not my accomplishment, but I funded (solo, not crowd) the creation of a mobile game by a friend. It's still in pre-alpha, but he's making a lot of progress.

About the Author

Jack Davis is a teaching professor in Statistics at the University of Waterloo, Canada.

Their research spans statistics in sport, data mining, adult education and Bayesian computation.

They have a course called "Statistics, Gambling, and Games of Chance" on Udemy.

They also pretend to be an expert in writing and game design, which is why they wrote a textbook called "Writing for Statisticians" and why they programmed a video game called "Doc Logic" for Xbox 360.

They enjoy chess and chess variants, but they are so, so, very bad at them. They want to try living on a houseboat sometime.

Statistics et al.
