
Wednesday 6 December 2017

Reflection on teaching a 400-level course on Big Data for statisticians.

This was the first course I have taught that I would consider a 'main track' course, in that the students were there to learn more about something they were already competent in at the start. Most of the courses I have previously taught were 'service courses', designed and delivered by the Statistics Department in service to other departments that wanted their own students to have a stronger quantitative and experimental background (e.g. Stat 201, 203, 302, and 305). The exception, Stat 342, was designed for statistics majors, but it is built as an introduction to SAS programming. Since most other courses in the program are taught using R or Python, teaching SAS feels like teaching a service course as well: I am teaching something away from the students' main competency, and enrollment is driven mainly by requirement rather than interest.

In my usual courses, I am frequently grilled by anxious students about what exactly is going to be on exams. A frequent complaint in student responses is that I spend too much time on 'for interest' material that is not explicitly tested on the exams. I've also found that I needed to adhere to a rigid structure in course planning and in grading policy. Moving an assignment's due date, teaching something out of the stated syllabus order, changing the scope or schedule of a midterm, or even dropping an assignment and moving its grade weight have all caused a cascade of problems in previous classes.

Stat 440, Learning from Big Data, was a major shift.

I don't know which I prefer, and I don't know which is easier in the long term, but it is absolutely a different skill set. The bulk of the effort shifted from managing people to managing content. I did not struggle to keep the classroom full, but I did struggle to meaningfully fill the classroom's time. I had planned to cover the principles of modern methods (cross validation, model selection, LASSO, dimension reduction, regression trees, neural nets), some data cleaning (missing data, imputation, image processing), some technology (SQL, parallelization, Hadoop), and some text analysis (regular expressions, edit distance, XML processing), but I still had a couple of weeks to fill at the end because of the lightning speed at which I was able to burn through these topics without protest.


In 'big data', the limits of my own knowledge became a major factor. Most of what I covered in class wouldn't have been considered undergrad material ten years ago when I was a senior (imputation, LASSO, neural nets); some of it didn't exist (Hadoop). There are plenty of textbook and online resources for learning regression or ANOVA, but the information for many of the topics of this course was cobbled together from blog posts, technical reports, and research papers. A lot of resources were either extremely narrow in scope or vague to the point of uselessness. I needed materials that were high-level enough for someone who isn't already a specialist to understand, yet technical enough that someone well-versed in data science would get something of value from them, and I didn't find enough.

The flip side of this was that motivation was easy. Two of the three case studies assigned had an active competition component. The first was a US-based challenge to use police data from three US cities; presentation was a major basis on which the police departments would judge the results. As such, I had requests for help with plotting and geographic methods that were completely new to me. A similar thing happened with the 'iceberg' case study, based on this Kaggle competition. I taught the basics of neural nets, and at least three groups asked me about generalizations and modifications to neural nets that I didn't know about. (The other case study was a 'warm-up' in which I adapted material from a case study competition held by the Statistical Society of Canada; the students were not in active competition.) At least 20% of the class has more statistical talent than I do.

To adapt to this challenge and advantage, I changed about mid-semester from my usual delivery method of a PDF slideshow to one of commentary while running through computer code. This worked well for material that was directly useful for the three case study projects, such as all the image processing work I showed for the case study on telling icebergs from ships. It wasn't as good for material that would be studied for the final exam: I went through some sample programs on web scraping, the feedback wasn't as positive, and the answers I got on the final exam's web scraping question were too specific to the examples I had given.

A side challenge was the ethical dilemma of limiting my advice to students looking to improve their projects. I had to avoid using insights that other students had shared with me because of the competitive nature of the class. Normally if someone had difficulty with a homework problem, I could use their learning and share it with others, but this time, that wasn't automatically the case.

There was also a substantial size difference: I had 20-30 students, by far the smallest class I've ever lectured to. Previously, Stat 342 was my 'small' class, with enrollment between 50 and 80, compared to service classes of 100-300 students. This allowed me to actually communicate with students on a much more one-on-one level. Furthermore, since most of the work was done in small team settings, I got to know what each group of students was working on for their projects.

I worry that what I delivered wasn't exactly big data, and was really more of a mixed bag of data science. However, there was a lot of feedback from the students that they found it valuable, and added value was the goal all along.

Monday 13 November 2017

Snakes and Ladders and Transition Matrices

Recently, /u/mikeeg555 created this post on the statistics subreddit with their results from a simulation of 10,000,000 games of this instance of Snakes and Ladders. This is the sort of result that's good to show in an undergrad or senior secondary classroom, as a highlight of the insights you can get from a simulation that anyone can program.
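As a rough illustration of how little code such a simulation takes, here is a minimal Python sketch. The board layout below is made up for illustration (the actual board in the linked post differs), and the rule that an overshoot of the final square wastes the turn is an assumption.

```python
import random

# Hypothetical snakes and ladders, for illustration only: keys are the square
# you land on, values the square you slide or climb to.
JUMPS = {4: 14, 9: 31, 17: 7, 28: 84, 54: 34, 62: 19, 80: 99, 87: 24}
FINAL_SQUARE = 100

def play_one_game():
    """Return the number of die rolls needed to reach the final square."""
    position, rolls = 0, 0
    while position < FINAL_SQUARE:
        rolls += 1
        roll = random.randint(1, 6)
        if position + roll <= FINAL_SQUARE:  # assume overshooting wastes the turn
            position += roll
            position = JUMPS.get(position, position)
    return rolls

lengths = [play_one_game() for _ in range(100_000)]
print("mean game length:", sum(lengths) / len(lengths))
```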

Wednesday 8 November 2017

Writing a Resume as a Data Scientist or Statistician.

What do viral posts and resumes have in common? They rise to the top based on a very superficial evaluation.

Here are some notes from a seminar on resume writing for students, especially undergraduates, in statistics and data science.


Thursday 2 November 2017

Evaluating Exam Questions using Crowdmark, IRT, and the Generalized Partial Credit Model

Making good exam questions is universally hard. The ideal question should have a clear solution for those with the requisite understanding, but also be difficult enough that someone without the needed knowledge can only guess at an answer.

An item response theory (IRT) based analysis can estimate the difficulty of a question, as well as the general skill of each of the test takers. The generalized partial credit model extends classical IRT from questions with binary scores to ones with an ordinal set of possible scores.
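As a quick illustration of the model itself (not the post's R example), here is a minimal Python sketch of the GPCM category probabilities for a single item; the ability, discrimination, and step-difficulty values are made up.

```python
import numpy as np

def gpcm_probabilities(theta, discrimination, step_difficulties):
    """Category probabilities P(score = 0..m) for one item under the
    generalized partial credit model, for an examinee of ability theta."""
    # Cumulative sums of a*(theta - delta_v); the score-0 term is the empty sum (0)
    steps = discrimination * (theta - np.asarray(step_difficulties, dtype=float))
    numerators = np.exp(np.concatenate(([0.0], np.cumsum(steps))))
    return numerators / numerators.sum()

# An item scored 0-3, with three illustrative step difficulties
print(gpcm_probabilities(theta=0.5, discrimination=1.2,
                         step_difficulties=[-1.0, 0.2, 1.5]))
```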

R code and example inside.

Tuesday 3 October 2017

Book review of Improving How Universities Teach Science Part 2: Criticisms and comparison to the ISTLD



As far as pedagogical literature goes, Carl Wieman’s Improving How Universities Teach Science - Lessons from the Science Education Initiative was among my favourites. It keeps both jargon and length to a minimum, running barely more than 200 pages without counting the hiring guide at the end. The work and its presentation are strongly evidence-based and informative, and the transformation guide in the appendices provides possible actions the reader can take to improve their own classes. Most of it applies broadly across science education.

 I have some criticisms, but please don’t take this as a condemnation of the book or of the SEI in general.

The term ‘transformation’ is vague, despite being used extensively throughout the first two thirds of the book. This is partially forgivable because it has to be vague in order to cover what could be considered a transformation across many different science fields. However, there could have been more examples, elaboration, or a better definition of the term early on. The first concrete examples that clarify what transformation means appear in the appendix, 170 pages in.

Dismissal of the UBC mathematics department, and of mathematics education in general.

The metric Wieman primarily used was the proportion of faculty that bought in to the program, that is, the proportion of faculty who transformed their courses (typically, faculty transformed all of their courses or none of them). Many departments were considered a success in that 70% or more of their faculty transformed their classes. Those under 50% were mostly special cases that had entered the Science Education Initiative later and hadn't had the opportunity to transform. Among the all-stars was the UBC statistics department: 88% of its 17 faculty with teaching appointments transformed their classes. Among the faculty of the UBC mathematics department, however, only 10% of the 150-plus-strong department bought in and transformed their classes. For contrast, 1.2 million dollars was spent on the mathematics department while $300,000 was spent on the statistics department, so the mathematics people got more in total but the statistics people got more per faculty member. It's not the failure to transform the mathematics department that bothers me, but the explanation for it.
Wieman boils the failure to transform the mathematics department down to two factors. The first was the culture within that particular department, which did not emphasize undergraduate education and seemed to assume that mathematics was an innate ability that students either had or did not, regardless of the amount of effort put in. Before Wieman started attempting to transform this department, it had a policy of automatically failing the bottom few percentiles of every introductory calculus class regardless of performance. The second factor Wieman uses to explain the failure is that mathematics is inherently not empirical, which means that a lot of the active learning meant to make concepts more concrete would not have applied.

Having taught and been taught in both mathematics and statistics departments at multiple institutions myself, I don't buy these arguments. In my experience, the most engaging and active classrooms were spread equally across mathematics and statistics. Within mathematics, the most memorable was abstract algebra, which is by definition non-empirical. Furthermore, at Simon Fraser University it's the mathematics department that has been leading the way on course transformation.
As for the argument about innate ability, this is an idea that spreads far beyond university departments. I have no qualification to claim how true or false it is. However, it's not a useful assumption, because it makes many things related to teaching quality in mathematics automatically non-actionable.
Finally, it seems like a strange argument for a professor of physics to make about mathematics. I would have liked to see more investigation; perhaps it's covered in some of his other literature, but then I would have liked to see more reference to that literature if it exists.

Compared to the Institute for the Study of Teaching and Learning in the Disciplines (ISTLD) at SFU, Wieman’s SEI is several times larger in scale and tackles the problem of university teaching entire departments at a time. The SEI works with department chairs directly and with faculty actively through its science education specialists. The ISTLD’s projects are self-contained course improvements, where staff and graduate student research assistants provided literature searches, initial guidance, and loose oversight over the course improvement projects. Both initiatives fostered large volumes of published and publicly presented research.
The funding for course improvement projects through the ISTLD was non-competitive; the only requirements to receive a grant were to submit a draft proposal, attend some workshops on pedagogy, and submit a new proposal guided by these workshops. Grants from the SEI at both UBC and CU were awarded through a competitive process, which Wieman used because, in his words, it was the only system familiar to science faculty.

In case you missed it, here is the first part of this book review, which discusses the content more directly.

Book review of Improving How Universities Teach Science, Part 1: Content.


Unlike much of the other literature on the subject, Carl Wieman’s Improving How Universities Teach Science - Lessons from the Science Education Initiative spends most of its pages on the administrative issues involved in improving university teaching. If you're familiar with recent pedagogical literature, this book doesn't come with many surprises. What set it apart for me is the scale of the work that Wieman undertook, and his emphasis on educational improvement being an integrative process across an entire department rather than a set of independent advances.

 
The Science Education Initiative (SEI) model is about changing entire departments through large multi-year, multi-million-dollar projects. The initiative focuses on transforming classes by getting faculty to buy into the idea of transforming them, rather than transforming the classes themselves directly.

The content is based on Wieman’s experience developing a science education initiative at both the University of British Columbia (UBC) and the University of Colorado (CU). It starts with a vision of what an ideal education system would look like at any university, mostly as an inspiring goal rather than a practical milestone. It continues with a description of how change was enacted at both of these universities. The primary workforce behind these changes was a new staff position called the science education specialist (SES). SES positions typically went to recent science PhD graduates that had a particular interest in education. These specialists were hired, trained in modern pedagogy and techniques to foster active learning, and then assigned as consultants or partners to faculty who had requested help with course transformation.

 
The faculty themselves were induced to help through formal incentives like money for research or teaching buy-outs that allowed them more time to work on research, and through informal incentives like consideration in teaching assignments and opportunities for co-authorship on scholarly research. Overcoming the already established incentive systems (e.g. publish or perish) that prioritized research over teaching was a common motif throughout this book.

 
The middle third of the book is reflective, and it’s also the meatiest part; if you’re short on time, read only Chapters 5, 6, and the coda.  Here, Wieman talks about which parts of the initiative worked immediately, which worked after changes, and which never worked and why. He talks about his change from a focus on changing courses to a focus on changing the attitudes of faculty. He talks about the differences in support he had at the different universities and how that affected the success of his program. For example, UBC got twice the financial support and direct leadership support from the dean. He also compares the success rate of different departments within the science faculty. Of particular interest to me are the UBC statistics and the UBC mathematics departments, which obtained radically different results. The statistics department almost unanimously transformed their courses, while the mathematics department almost unanimously didn’t.

 
Wieman also talks at length about ‘ownership’ of courses, and how faculty feeling that they own certain courses is a roadblock. It is a roadblock partly because faculty tend to keep their lecture notes to themselves on the assumption that they are the only one who will ever teach a particular course. Furthermore, the culture of ownership was perceived to contribute to faculty resistance to changes to their courses.

 
Under Wieman's model, course material is to be shared with the whole department so that anyone teaching a particular course has access to all the relevant material the department has made for it. Although UBC managed to create a repository for course material, the onus of populating that repository fell on the faculty, and few people actually contributed. However, where this matters most, in the introductory courses, even partial sharing was enough because many people teach those courses.

 
The final third of the book is a set of appendices which include examples of learning activities and strategies in transformed courses, guiding principles for instruction, and several short essays on educational habits and references to much of the other work that Wieman has done. It also includes a hiring guide with sample interview questions for possible Science Education specialists.

 
The book also includes a coda, an 8-page executive summary of the first two parts of the book. The coda serves as a good review and also as a nicely packaged chapter that could be shared with decision makers such as deans and faculty chairs. Decision makers are exactly who I would recommend this book to; it has an excellent amount of information for the time and effort it takes to digest.


I had a few other thoughts about this book that were set aside for the sake of flow. You can find them in the second part of this book review.

Tuesday 19 September 2017

Advice on Selecting the Right Journal

When I try to get something published in a journal, it's for the prestige and implied proof of quality.
Otherwise, if I wanted to write something and get attention for the idea quickly, I could write a blog post like this.

As such, I aim for a balance between the perceived importance of the journal and the chance of acceptance.

(Updated March 2019 with a section on predatory journals)


Tuesday 12 September 2017

Left-brain creativity

One commonly cited way to improve or maintain mental health is to do something creative, such as painting, drawing, writing, dancing, making music, knitting or cooking. So what's the strategy to gain these benefits if you're not creatively inclined in these or any similar ways? What if you're, say, a statistician or a software engineer?

This post is about acknowledging other, more mathematical means of being creative that aren't generally thought of as traditionally creative. I'm calling these 'left-brain' creative means, which is reductionist, but easy to convey. Whether any of these are artistic in any way is irrelevant.

Martin Gardner was a master of left-brain creativity. He wrote books of mathematical puzzles and novelties, including a version of mini chess mentioned here. Making these challenges was absolutely creative. I would argue that the process of solving these puzzles would also be creative because it requires imagination and decisions that are novel to the solver.

Reiner Knizia has a PhD in mathematics and makes board games for a living. His visual artwork is rudimentary, which is fine because it's meant merely as dressing for the real creative work of abstract sets of rules meant to inspire clever player behaviour.

I mention these two first because I have been disparaged before for being un-creative when I relied on similar abstractions as outlets. For instance, when told by a (now ex-) girlfriend to go try to do something creative, I started working on a farming game I had envisioned, and decided to start with a list of livestock and a draft of their prices in the game. This didn't impress her.

With building toys, I usually made abstract patterns rather than anything that would traditionally have been considered creative. With Lego / Mega Blocks, my most memorable builds were a giant hollow box for holding hockey pucks, and an extremely delicate staircase. With K'nex, my work was always abstract shapes made in an ad-hoc manner.

I enjoy the concept of building toys a lot more than actually building anything with them. It's a dream of mine that Capsella toys will make a return through 3D printing. Capsella was a toy made of clear plastic capsules with gears inside. It would be difficult, but doable.

There's also this game called Gravity Maze, in which the goal is to drop a marble in one tower of cubes and have it land in a target cube. The game comes with a set of puzzle cards, each of which gives a starting configuration of towers and a set of towers that you need to add to finish the maze. The game only comes with 60 such puzzle cards, and additional ones aren't available for sale. On one vacation, I took it upon myself to draft a program that could randomly generate configurations and check whether they had solutions. It's still in a notebook somewhere. Is this creative? It feels better if I think of it that way; doing this gave me the same joy I imagine someone gets from more traditional creative exploits.

On another vacation, I wrote a proof of concept for Gaussian elimination of a 4x4 matrix where the matrix was populated with fractions. The point was to write each entry of the resulting matrix as a single fraction. That way, an ASIC (Application Specific Integrated Circuit) could later be made to solve such a matrix in fractions, avoiding the computationally slow operation of division, which is typically done through iterated subtraction. Was that creative? It felt a lot like doodling or sketching to decide upon this and solve it.
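Something in that spirit is easy to sketch in Python with exact rational arithmetic; this is my own illustration, not the version in the notebook, and it uses a randomly generated matrix rather than any particular example.

```python
from fractions import Fraction
import random

def eliminate(matrix):
    """Reduce a square matrix of Fractions to upper-triangular form.
    Every entry remains a single exact fraction throughout."""
    a = [row[:] for row in matrix]
    n = len(a)
    for col in range(n):
        # Pick a row with a nonzero pivot in this column
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return a

# A 4x4 matrix populated with random fractions
m = [[Fraction(random.randint(1, 9), random.randint(1, 9)) for _ in range(4)]
     for _ in range(4)]
for row in eliminate(m):
    print([str(x) for x in row])
```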

I was a big fan of Dungeons and Dragons, and later Rifts and GURPS, when I was younger. I almost never played roleplaying games, but I spent a lot of time reading rulebooks and compendiums, and writing my own material such as new monsters. To someone expecting creative work to look more like art, this probably resembled accounting.

This clearly isn't a new discovery to a lot of people. Just looking at websites for chess variants and chess puzzles, along with the large custom-card-making subset of the Magic: The Gathering community, tells me that much. There are many people who seem to enjoy making up challenges and rulesets and get creative joy out of it.

If there's a thesis to this post, it's that if you're not inclined to make what would be typically considered art, you can still reap the mental health benefits of being creative through more 'left-brain' means. Other activities worth mentioning, but not from personal experience, include making crossword puzzles, nurikabe puzzles, maps, and fractals. Do something that involves building or making and a lot of small decisions, and don't worry about whether it's expressive, artistic, or traditionally considered creative.

Sunday 13 August 2017

Sports questions - Speculation, RISP, and PEDs


What will popular sports look like in the year 2030? (All sports)

Technology is rapidly making new sports possible. 

Will improved cameras and laser gates make races with running starts viable? I would love to know how much faster the 100 metre dash can be without starting from a standstill.

Will drone racing or drone hunting take flight?  Will e-sports continue their growth and penetration into the mainstream?

Will self driving cars start competing in Nascar? Formula One? Rally car racing? Will all of these racing formats survive or maintain their scale in the next 15-20 years?

Demographics are opening new possibilities too. 

Shrinking populations and urbanization are leaving many otherwise livable and usable buildings abandoned. Will terrain-based sports like airsoft and paintball take off with the abundance of good locations? Will GoPro and similar robust, portable cameras make such pursuits into spectator sports?

Will we see a shift of focus towards women in sport, following the trend of tennis? Will we see mixed-sex competition in sports where size and muscle mass mean less?

Will extreme sports see a revival, led by Red Bull sponsored events like Crashed Ice and Flugtag?

What sports will decline? 

Will UFC mixed martial-arts continue to eat into the viewing market share of WWE wrestling? Why didn't Texas Hold 'em keep its hold on the public? Could the NHL (and the KHL) mismanage ice hockey into a fringe sport? Can American Football maintain its popularity in the face of growing concern over brain injury? Will American football adapt? Can golf maintain its popularity given its cost?

What about stadiums?

Instead of building stadiums for specific sports, or a limited set of sports, will new sports emerge to fit into already made stadiums? Will existing sports start to use stadiums that were built for other purposes, such as softball in a baseball stadium, or soccer football in an American football stadium?




On RISP, Runners In Scoring Position. (Baseball)

Batters do better (or pitchers do worse) when there are runners in scoring position. Why? Is it just a result of skill auto-correlation, such as a pitcher's tendency to do poorly in streaks of batters, or is it something else? Is it the distraction to the pitcher of having a runner who could steal a base or read signs? Is it the effect of the fielders having to do more than one job at once?

A more measurable or actionable question: is the RISP advantage greater for certain batters? For example, does a player with a reputation for stealing bases give a larger 'RISP bonus' than another with ordinary base-running? Does the effect add with multiple runners? Does it change with the base occupied? Does it change with pickoff attempts? How much of this is balks being drawn?

Similarly, how should pickoff attempts be counted with regards to pitching load? My guess is that they have the effect of about half a pitch in terms of performance in that game and in that plate appearance.


What performance enhancers are 'fair'? (All sports)

A lot of drugs are banned from a lot of sports, but why? My assumption is that it makes the feats of one era comparable to another. We can take Usain Bolt's running records and compare them to Donovan Bailey's records from the 90s, and say with little or no argument that Bolt at his peak was faster than Bailey at his. The difference in their 100 metre dash times can be isolated to the runners and not the chemical technology of their respective eras.

My assumption comes from the qualifying statements in hockey and baseball about different eras of each sport defined by seemingly minor changes in the equipment or rules of the game. Hitting feats from the 1990s seasons of MLB baseball are qualified with comments about steroid use by superstar hitters. Steroid use was allowed at the time, I presume on the basis that every player had access to the steroids.

Why is chemical technology seen as unfair while other technology, like improved running shoes, is fair? Probably because of the hidden nature of drugs, and the related difficulty in directly regulating the 'equipment' used. It's much simpler to enforce rules about the volume of a golf club face, or the curvature of a hockey stick, than an acceptable dosage of steroids.

Things have gotten confusing lately.

Oscar Pistorius, who had both his legs amputated below the knee as an infant, was until recently a competitive Paralympic sprinter. He used springy blades, described here, to run. He also wanted to compete in general sprinting competition, but was barred because it was found that his prosthetic feet were more efficient for running than baseline human feet. So, even though Paralympic competition was designed to provide viable competition to those with physical disabilities, the technology used to mitigate Pistorius's disability was deemed too effective.

In January 2017, the IOC (International Olympic Committee) released the results of testing they had done on various drugs for performance enhancement in, of all things, chess. They found that caffeine, Ritalin, and Adderall all improved performance in double blind tests. So, if chess ever becomes an Olympic sport, should these drugs be banned and tested for? What happens if someone has a prescription for Ritalin, do they have to go without to compete?


Things are about to get a lot more confusing.

CRISPR is a technology that may have the potential to arbitrarily rewrite genetic code. If it is applied to a human embryo to specialize the resulting human for a particular sport, what are the rules to be surrounding that? Genetic editing resembles drugs and blood doping in that it's a hidden technology that would be very complicated to regulate other than to ban completely. It would be at least intended to be performance enhancing, and not every competitor would have access to the technology, at least not at first.

But changing the genetics of a person is not adding something foreign to the person, it is changing who that person is. That's who that person will be through their entire life growing up. Should we ban someone from competition for being 'naturally' too good at something as a result of a decision made before that person's birth?

Or, do we separate competitors into 'baseline' and 'enhanced' humans? This is starting to sound way more like a dystopian, racist dog show (with terms like 'best in breed') than the 'faster, higher, stronger' tone I was aiming for. It's something we collectively need to think about though, not just for sport but for all human interaction going forward.

Let's close with this thought on the subject by speedrunner Narcissa Wright: "All the categories are arbitrary".

Wednesday 19 July 2017

Writing Seminar on Short Scientific Pieces

The following is the first of five seminars I am giving this and next week on statistical writing. 

About half of the material here has appeared in previous blog posts on making survey questions.


------------

Composing short pieces. A workshop about scientific writing, using the skill of survey writing as a catalyst.



Why start with surveys / questionnaires?

1. Survey writing tends to be left out of a classical statistics program, and is instead left to the social sciences. It is, however, a skill that is asked of statisticians because of the way it integrates into design of experiments.

2. Surveys have the potential to be very short, so creating one should not be an overwhelming task.

3. Writing a proper survey absolutely requires that the writer imagine an intended reader and how someone else might understand what is written.

A survey, like most of the other writing that you as statisticians, graduate students, and/or faculty will be doing, will be read by others who know less about the subject at hand than you do. You are writing from the perspective of an expert.

This perspective is a major shift from much of the writing done as an undergraduate. Aside from peer evaluations, most undergrad writing is done to demonstrate understanding of a topic to someone who knows that topic better than you. That often reduces to using the specific key terms and phrases that the grader, a professor or teaching assistant, is looking for, and making complete sentences. If that work is missing any key part that a casual reader would need in order to understand the topic, the grader can fill in that missing part with their own understanding.

When you write a research report, a scientific article, or a thesis, most people who read the material will do so with the intent of learning something from it. That means they won't be able to fill in any missing critical information, because you have the knowledge and they don't. Even worse, they may fill in missing parts with incorrect knowledge.

In the case of a survey, even though respondents are the ones answering the questions, the burden of being understood rests with the agent asking the questions. As the survey writer, YOU are the one that knows the variables that you want to measure.

So even though this workshop is titled 'composing short pieces', a large amount of time will be spent on survey questions. Much of this applies to all scientific writing.

Writing Better Surveys

Tip 1. Make sure your questions are answerable. Anticipate cases where questions may not be answerable. For example, a question about family medical history should include an 'unknown' response for adoptees and others who wouldn't know. If someone has difficulty answering a survey question, that frustration lingers, and they may guess a response, guess future responses, or quit entirely. Adding a not-applicable option, an open-ended 'other' option, or a means to skip a question are all ways to mitigate this problem.



Tip 2. Avoid logical negatives like 'not', 'against', or 'isn't' when possible. Some readers will fail to see the word 'not', and some will get confused by the logic and will answer a question contrary to their intended answer. If logical negatives are unavoidable, highlight them in BOLD, LARGE AND CAPITAL.


Tip 3. Minimize knowledge assumptions. Not everyone knows what initialisms like FBI or NHL stand for. Not everyone knows what the word 'initialism' means. Lower the language barrier by using language as simple as possible without losing meaning. Use full names like National Hockey League, or define the short forms early if the terms are used very often.


Tip 4. If a section of your survey, such as demographic questions, is not obviously related to the context of the rest of the survey, preface that section with a reason why you are asking those questions. Respondents may otherwise resent being asked questions they perceive as irrelevant.


Tip 5. Each question comes at a cost to a respondent's attention budget. Don't include questions haphazardly or allow other researchers to piggyback questions onto your survey. Every increase in survey length increases the risk of missed or invalid answers. Engagement will drop off over time. See Tip 17.


Tip 6. Be specific in your questions; don't leave them open to interpretation. Minimize words with context-specific definitions like 'framework', and avoid slang and non-standard language. Provide definitions for anything that could be ambiguous. This includes time frames and frequencies. For example, instead of 'very long' or 'often', use '3 months' or 'five or more times per day'.


Tip 7. Base questions on specific time frames like 'In the past week how many hours have you...', as opposed to imagined time frames like 'In a typical week how many hours have you...'. The random noise involved in people doing that activity more or less than typical should balance out in your sample. Time frames should be long enough to include relevant events and short enough to recall.




Exercise: Writing with exactness (also called exactitude)

Part 1 of 2: Consider
Why is it “in the last week” and not “in a typical week”?

If a question asks something like “in a typical week, how many alcoholic drinks have you consumed?”

- Respondents will tend to over-average and discount rare events.
- Respondents are invited to idealize their week, which may increase the potential for social desirability bias.
- Every respondent will draw on a different time frame (imagined or real) as their typical week. However, “in the last week” anchors every respondent to the same, real time frame.

Part 2 of 2: Create

Put yourself in the shoes of...

...wait, let me restart, that was an idiom. (See Tip 21)

Consider the perspective of a stakeholder in a survey. A stakeholder could be anyone involved in the survey or directly benefiting from what it reveals, such as a respondent, the surveying firm or company, or the client that paid for the survey. Discuss amongst your group the different consequences of choosing to ask about a respondent's place of residence in one of two ways:

Version 1:
Where is your main place of residence?

Version 2:
What was your place of residence on July 14, 2017?




Exercise: Sizes of Time Frames

Even if a human respondent is trying their best to be honest, memory is limited. Rare or noteworthy events may be recalled for years, but more mundane things won't be.

Discuss the benefits and drawbacks (the good and bad aspects) of the following three survey questions.

Version 1:
In the last week, how many movies did you see in a theater?

Version 2:
In the last year, how many movies did you see in a theater?

Version 3:
In the last ten years, how many movies did you see in a theater?



Tip 8. For sensitive questions (drug use, trauma, illegal activity), start with the more negative or less socially desirable answers and move towards the milder ones. That gives respondents a comparative frame of reference that makes their own response seem less undesirable.

Tip 9. Pilot your questions on potential respondents. If the survey is for an undergrad course, have some undergrads answer and critique the survey before a full release. Re-evaluate any questions that get skipped in the pilot. Remember, if you could predict the responses you will get from a survey, you wouldn't need to do the survey at all.

Tip 10. Hypothesize first, then determine the analysis and data format you'll need, and THEN write or find your questions.

Tip 11. Some numerical responses, like age and income, are likely to be rounded. Some surveys ask such questions as categories instead of open-response numbers, but information is lost this way. There are statistical methods to mitigate both problems, but only if you acknowledge the problems first.

Tip 12. Match your numerical categories to the respondent population. For example, if you are asking the age of respondents in a university class, use categories like 18 or younger, 19-20, 21-22, 23-25, 26 or older. These categories would not be appropriate for a general population survey.

Tip 13. For pick-one category (i.e. multiple choice, polytomous) responses, including numerical categories, make sure no categories overlap (i.e. mutually exclusive), and that all possible values are covered (i.e. exhaustive.)

Tip 14. When measuring a complex psychometric variable (e.g. depression), try to find a set of questions that has already been tested for reliability on a comparable population (e.g. CES-D). Otherwise, consult a psychometrics specialist. Reliability refers to the degree to which responses to a set of questions 'move together', or are measuring the same thing. Reliability can be computed after the survey is done.
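For instance, one common reliability measure is Cronbach's alpha, which is straightforward to compute once the responses are in. A minimal Python sketch, with made-up responses (the tip above doesn't prescribe a particular statistic, so this is only an illustration):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) array of item scores."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)       # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)   # variance of summed scale scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Five respondents answering a four-item scale (made-up data)
responses = [[4, 5, 4, 5],
             [2, 3, 2, 2],
             [3, 3, 4, 3],
             [5, 4, 5, 5],
             [1, 2, 1, 2]]
print(cronbach_alpha(responses))
```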

Exercise - Synonyms
Pick an informative word from a short passage (e.g. Tip 14)
1. Find a synonym of that word.
2. Write a definition of that new word.
3. Consider how using the new word changes the sentence.
Example:
Replace the verb "compete" with "contest".
Contest (verb): To fight to control or hold.
"Share of the smartphone market was hotly contested."
Using "contest" implies more of a struggle or a fight than the original word "compete".


Tip 15. Ordinal answers in which a neutral answer is possible should include one. This prevents neutral people from guessing. However, not every ordinal answer will have a meaningful neutral response.


Tip 16. Answers that are degrees between opposites should be balanced. For each possible response, its opposite should also be included. For example, strongly agree / somewhat agree / no opinion / somewhat disagree / strongly disagree is a balanced scale.


Tip 17. Limit mental overhead - the amount of information that people need to keep in mind at the same time in order to answer your question. Try to limit the list of possible responses to 5-7 items. When this isn't possible, don't ask people to interact with every item. People aren't going to be able to rank 10 different objects 1st through 10th meaningfully, but they will be able to list the top or bottom 2-3. An ordered-response question rarely needs more than 5 levels from agree to disagree. See Tip 5.



Exercise – Information Density
Part 1 of 2: Consider

Consider the following two sentences. They convey the same information, but one version packs all that information into a single sentence with one independent clause. The other version splits this into two sentences and three independent clauses.


Version 1
“Reefs of Silurian age are of great interest. These are found in the Michigan basin, and they are pinnacle reefs.”


Version 2
“Pinnacle reefs of Silurian age in the Michigan basin are of great interest.”
(Inspiring Source: The Chicago Guide to Communicating Science, 2nd ed, page 46)


Each version is appropriate, but for different situations. When words are at a premium, such as when writing an abstract, giving a talk with very limited time, or giving priming information for a survey question, the shorter version is typically appropriate. However, readers and listeners, especially those who speak English as an additional language, will have a harder time parsing the shorter version, even if it takes less time to read or say.
The operative difference between the versions is information density. The longer version requires less effort to read because there are fewer possibilities for each word to modify or interact with the other words in its clause. This is done by adding syntax words that convey no additional information on their own.


Part 2 of 2: Create

On your own, take the following sentence and make a less information-dense version of it by breaking it into smaller sentences.

“Data transformations are commonly-used tools that can serve many functions in quantitative analysis of data, including improving normality of a distribution and equalizing variance to meet assumptions and improve effect sizes, thus constituting important aspects of data cleaning and preparing for your statistical analyses.”

Now take this following passage and condense it into a single sentence with greater information density.

“Many of us in the social sciences deal with data that do not conform to assumptions of normality and/or homoscedasticity/homogeneity of variance. Some research has shown that parametric tests (e.g., multiple regression, ANOVA) can be robust to modest violations of these assumptions.”

Source: Jason W. Osborne, "Improving your data transformations: Applying the Box-Cox transformation", Practical Assessment, Research & Evaluation, Vol. 15, No. 12, Oct 2010.


Tip 18. Layout matters. Place every response field unambiguously next to its most relevant text. For an ordinal response question, make sure the ordering structure is apparent by lining up all the answers along one line or column of the page.


Tip 19. Randomize response order where appropriate. All else being equal, earlier responses in a list are chosen more often, especially when there are many items. To smooth out this bias, scramble the order of responses differently for each survey. This is only appropriate when responses are not ordinal. Example of an appropriate question: 'Which of the following topics in this course did you find the hardest?'
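For a computer-administered survey, this can be as simple as shuffling the option list once per respondent. A minimal Python sketch (the option texts are made up for illustration):

```python
import random

options = ["Regular expressions", "Neural networks", "Imputation", "Parallelization"]

def randomized_options():
    """Return a freshly shuffled copy of the response options for one respondent."""
    shuffled = options[:]
    random.shuffle(shuffled)
    return shuffled

print(randomized_options())
print(randomized_options())  # a different order for the next respondent
```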


Tip 20. A missing value for a variable does not invalidate a survey. Even if the variable is used in an analysis, the value can be substituted with a set of plausible values by a method called imputation. A missing value is not as bad as a guessed value, because the uncertainty can then be identified.
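As a toy illustration of the idea, here is a simple hot-deck-style sketch in Python that fills each missing value with several draws from the observed responses; this is only meant to show what "a set of plausible values" looks like, not to recommend a particular imputation method.

```python
import numpy as np

rng = np.random.default_rng(1)

def impute_copies(values, n_copies=5):
    """Fill missing entries (np.nan) by sampling from the observed responses,
    returning several completed copies rather than a single guess."""
    x = np.asarray(values, dtype=float)
    observed = x[~np.isnan(x)]
    copies = []
    for _ in range(n_copies):
        filled = x.copy()
        mask = np.isnan(filled)
        filled[mask] = rng.choice(observed, size=mask.sum())
        copies.append(filled)
    return copies

ages = [23, 19, np.nan, 31, 27, np.nan, 22]
for completed in impute_copies(ages):
    print(completed)
```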


Tip 21. Restrict your language to 'international English' (assuming the questions are in English). This means that idioms or local names for things should be avoided when possible. When there are two or more competing names for a thing, rather than one internationally recognized one, use all the major names that are in use among your target demographic.


[As time permits, Exercise prompt: Try to figure out what 'Your potato is baking.' means without knowledge of Brazilian Portuguese]


Main Inspiration Source for tips: Fink, Arlene (1995). "How to Ask Survey Questions" - The Survey Kit Vol. 2, Sage Publications.



Digital Writing Tools
Showcase:
Hemingway, find/replace, text diff, texrendr
Caveat / Pitfall 1: Digital tools are not a substitute for judgement. In one book, every instance of the word 'mage' was to be replaced with the synonym 'wizard', according to the style guide of the publisher. (Both 'mage' and 'wizard' refer to people with magic-using abilities; the publisher may have preferred one term over the other for internal consistency.) Rather than make a case-by-case replacement, the person responsible for the change simply used a digital 'replace all' function, changing every instance of 'mage' to 'wizard'. Unfortunately, this particular text also included the word 'damage', which was changed automatically to the nonsense word 'dawizard'.
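The fix is a bit of judgement plus a slightly smarter tool, such as a word-boundary pattern instead of a blind replace-all. A minimal sketch in Python:

```python
import re

text = "The mage took damage from the rival mage's spell."

# A blind replace-all also rewrites 'damage' into the nonsense word 'dawizard'
print(text.replace("mage", "wizard"))

# A word-boundary pattern only replaces 'mage' where it stands alone
print(re.sub(r"\bmage\b", "wizard", text))
```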


Caveat / Pitfall 2: Another issue with digital tools is that they can't all be depended upon to be available in their current forms forever. Microsoft is moving towards a SaaS (software as a service) model, where access to tools like Word with its grammar check is based on a subscription rather than a one-time fee. This means you may lose access to a tool in the future for reasons beyond your control. Web-based tools like Hemingway carry an even greater risk, because the server for Hemingway could be shut down without any warning and leave you without access.

Also, you may need to send your writing or other material (e.g. figures, tables) to a remote server to be processed in order to use those tools. If your writing contains sensitive or confidential material, you may be breaking legal agreements with your data providers by using these tools.




Further Homework and Reading

This is based on Chapter 8, "Designing Questions", of the book Successful Surveys - Research Methods and Practice by George Gray and Neil Guppy.
Q1. Give an example of a numerical (e.g. quantitative) open-ended question and a numerical closed-ended question.
Q2. Give an example of a non-numerical (e.g. nominal, text-based) open-ended question and a non-numerical closed-ended question.
Q3. In your OWN WORDS, give two advantages and disadvantages of open-ended questions.
Q4. In your OWN WORDS, give two advantages and disadvantages of closed-ended questions.
Q5. How do field-coded questions combine the features of both open- and closed-ended questions?
Q6. For what kind of surveys are open-ended questions more useful? When are they less useful?
Q7. What are five features that make for well-worded questions?
Q8. What is the name used for a survey question that asks about and focuses on two distinct things?
Q9. What is a Likert scale?
Q10. What are four things that all have important effects on how people respond to survey questions?