Thursday, 19 July 2018

Stat Writing Exercise - Improving Graphs

This is an in-class exercise that I gave to the 3rd year undergrads in a Statistical Communication class. It was designed to take 20 minutes to explain and 40-50 minutes to execute, including instant feedback. It went well enough that I felt it was worth sharing.

Consider the following graphs. For each one,
- Describe a problem with the graph. Some graphs have more than one problem. You don’t need to mention every problem. The examples below each list several problems, to show a range of possible answers.

- Name an improvement or an alternative way in which you could present the information in the graph. You don’t need to make a graph (or table, or plain text) of the improved version, but you do need to describe your solution clearly.

- Do not use the same problem and improvement twice.

You only need to do this for four of the six exercises.

Example 1

Example 1 Problems:
- This bar graph does not have a labelled y-axis.
- The axis starts around 100 M instead of zero. This magnifies the apparent difference between ‘people on welfare’ and ‘people with a full time job’.
- There are only two pieces of data here. Text would have been enough.
- The graph does not show people with part time jobs, and apparently includes children and retired people as ‘people on welfare’. This graph describes 210 million people out of 300 million in the US.

Example 1 Possible solutions:
- A pie chart showing the breakdown of people without jobs, with part time jobs, and with full time jobs as a proportion of the US population.
- A simple sentence stating “108.6 million people are on welfare, while 101.7 million have full time jobs.”
- A bar chart with a labelled y-axis that starts from zero, or at least contains some space and a break at the bottom.
- Remove the confusing green gears background.
- Remove the title entirely.
- Add guide lines to the y-axis.
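
The effect of the truncated axis in Example 1 can be checked with a little arithmetic. The sketch below uses the two figures quoted above and assumes the axis starts at exactly 100 million (the graph only shows "around 100 M"); the function name is just for illustration.

```python
# Figures quoted in Example 1, in millions of people.
welfare = 108.6
full_time = 101.7

# Assumption: the truncated axis starts at exactly 100 million.
axis_start = 100.0


def drawn_height_ratio(a, b, baseline):
    """Ratio of the drawn bar heights when the y-axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


true_ratio = drawn_height_ratio(welfare, full_time, 0.0)
truncated_ratio = drawn_height_ratio(welfare, full_time, axis_start)

print(f"Axis from zero: first bar is {true_ratio:.2f} times as tall")
print(f"Axis from {axis_start:.0f}: first bar is {truncated_ratio:.2f} times as tall")
```

Under that assumption, a difference of about 7% in the data is drawn as a bar roughly five times as tall, which is the exaggeration the second problem describes.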

Example 2:

Example 2 Problems:
- It is difficult to connect a certain part of each trend line to a particular year.
- The scale for property crime is in ‘thousands per 100,000 people’ while the others are in ‘per 100,000 people’. This implies, at a glance, that there are more murders than property crimes.

Example 2 Possible solutions:
- Add vertical guidelines every 5 or 10 years.
- Mark the crime rates in 1980 and in 2015, or the percentage increase or decrease.
- Change the scale of property crimes to number per 100,000 people.
- Change the sub-title from ‘rate per 100,000 people’ to ‘number of annual cases per 100,000 people’.
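
The rescaling suggested for Example 2 is a single multiplication. The sketch below uses made-up values purely for illustration (they are not read from the graph) to show how a line plotted in thousands can appear to sit below a line plotted in single cases.

```python
def thousands_to_units(rate_in_thousands):
    """Convert 'thousands per 100,000 people' to 'cases per 100,000 people'."""
    return rate_in_thousands * 1000


# Hypothetical values for illustration only; NOT taken from the graph.
property_plotted = 2.5  # thousands of cases per 100,000 people
murder_plotted = 5.0    # cases per 100,000 people

# As plotted, the property crime value looks smaller than the murder value.
print(property_plotted < murder_plotted)

# On a common scale the ordering reverses.
property_rate = thousands_to_units(property_plotted)
print(property_rate > murder_plotted)
```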

Exercise Plots:

Marking guide: award four points for each problem/solution set. Typically, this will be 2 points for the problem and 2 for the solution, but be prepared to assign 1 and 3 respectively if a solution fills in some missing detail from the problem.
Deduct 0.5 points for each non-trivial grammar or spelling mistake, or for any grammar or spelling mistake at all if the answer is particularly short.

For each problem explanation:
0/2 – Vague answers that could be applied to almost any graph. “The chart is confusing”, “There is too much information”, “This is the wrong type of graph”.
1/2 – Explanations that address a particular part of the graph, but do a poor job (or no job) of explaining why something is a problem. “The axis labels are missing”, “The sizes of the baseballs and basketballs are misleading”, “There are too many country labels”.
2/2 – Explanations that address some aspect AND explain why that aspect is problematic. “Without axis labels, it is unclear that these companies are arranged by most to least revenue.”, “Using objects of different sizes exaggerates the differences in ticket prices.”, “Without an explanation in the caption, it’s unclear if the ticket prices are in nominal or constant dollars.”, “The overlapping country labels are impossible to read, let alone interpret”.

For each solution suggestion:
0/2 – Offering no additional information. “Fix the problem”
0.5/2 – Simply naming another graph that would work. “Use a bar chart”
1/2 – A short description of a solution. “Two different charts, one for musician deaths, and one for the general population.”
2/2 – A full description of a graph/table/text that could be used in place of the original plot. “A table of the recipient countries from most to least with the actual values of aid as well as a percentage of the total.”
2/2 – A clear description of an alternative for a problematic aspect with enough detail to show improvement. “Make the company rank the y-axis, include the company name. Make the % devoted to the cloud the x-axis. Consider only including companies with >5% cloud revenue.”
2/2 – Two short descriptions for two problems.
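
The marking arithmetic can be sketched as below. This is only an illustration of the rubric as written: the function names are made up, and flooring a score at zero after deductions is an assumption that the rubric does not state.

```python
def score_answer(problem_points, solution_points, grammar_mistakes=0):
    """Score one problem/solution set out of 4 points.

    The usual split is 2 + 2, but the rubric allows 1 + 3, so only the
    combined total is capped at 4.
    """
    if problem_points < 0 or solution_points < 0 or grammar_mistakes < 0:
        raise ValueError("inputs must be non-negative")
    subtotal = min(problem_points + solution_points, 4.0)
    # Each counted grammar or spelling mistake costs 0.5 points.
    # Assumption: a score never drops below zero.
    return max(subtotal - 0.5 * grammar_mistakes, 0.0)


def score_exercise(answers):
    """Total for one student; each answer is (problem, solution, mistakes)."""
    return sum(score_answer(*answer) for answer in answers)


# Four answered plots: a full-marks answer, a weak problem statement,
# a named-graph-only solution with one mistake, and a 1/3 split.
student = [(2, 2, 0), (1, 2, 0), (2, 0.5, 1), (1, 3, 0)]
print(score_exercise(student))
```

With four sets answered, the maximum total is 16 points.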

Post-mortem reflection:
I think it went particularly well. I set aside 50 minutes for the exercise, and the median completion time was 35-40 minutes. Having the students choose four of the six plots to critique worked better than expected, as no plot in particular was neglected.
The two examples took 10 minutes each to explain. I chose one obviously bad example, and one good example that still had room for improvement or at least adaptation to a different audience. This exercise came after 2 hours of lecture on making quality plots and another 2 hours on making quality tables, so these were not the first such examples the students had seen.