Friday, 25 May 2018

Analyzing Jeopardy in R - Part 2

My previous Jeopardy analyzer was built using a base of about 30 daily Coryat scores. This one has more than 1600 scores that were either recorded directly, e-mailed to me, or scraped from the forum at jboard.tv. Here we look at the consistency of tournament effects for different at-home players, and at some long-term trends.


After recording 90 days over 2017, it's become apparent that I'm not getting any better at Jeopardy just by playing the game at home, as shown in Figure 1.*

Figure 1


The average Coryat is fitted using a smoother from the loess function found in base R. It is a more flexible model than the previous one, and simpler to code and manipulate. The smooth line in Figure 1 shows that the Coryat score at the beginning of the year is roughly the same as it is at the end of the year. The two tournaments I recorded, the College Championship and the Tournament of Champions, are not used in the smoother; instead, the smooth line is linearly interpolated across them. To measure the effect of these tournaments, I compare these linearly interpolated values to the mean score in each tournament. These results are shown in Table 1; the estimate for the tournament effect from my previous analysis gives roughly the same value. According to these results, I get an average of 3128 more points in a College Championship game than in a normal game, and 1726 fewer points in a Tournament of Champions game than normal.
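As a rough sketch of that procedure (not the exact code behind Figure 1), the following fits loess to the regular games only, interpolates the fitted curve across the tournament days with approx, and compares the tournament scores to that baseline. The data frame, its column names, the span, and the simulated scores are all made up for illustration.

```r
# Hypothetical data: one row per game day; `tournament` is NA for regular games.
set.seed(1)
scores <- data.frame(
  day = 1:90,
  coryat = round(rnorm(90, mean = 15500, sd = 4000), -2),
  tournament = NA_character_,
  stringsAsFactors = FALSE
)
scores$tournament[21:30] <- "College"
scores$tournament[71:80] <- "Champions"
scores$coryat[21:30] <- scores$coryat[21:30] + 3000  # simulated tournament bump

regular <- is.na(scores$tournament)

# Fit the smoother to regular games only (span is an assumed setting).
fit <- loess(coryat ~ day, data = scores[regular, ], span = 0.75)
smooth_regular <- predict(fit)

# Linearly interpolate the smooth curve across the tournament days.
baseline <- approx(x = scores$day[regular], y = smooth_regular,
                   xout = scores$day)$y

# Tournament effect: mean of (observed score - interpolated baseline).
effect <- tapply((scores$coryat - baseline)[!regular],
                 scores$tournament[!regular], mean)
print(round(effect))
```

Keeping the tournament games out of the fit matters here: if they were included, the smoother would partly absorb the tournament effect and the comparison would be biased toward zero.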


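Player | n   | Regular Coryat | College       | Champions | Teacher's
-------|-----|----------------|---------------|-----------|----------
Jack   | 90  | 15570          | +3128         | -1726     | NA
A      | 480 | 30637          | +5443         | +1866     | +2354
B      | 707 | 34192          | +4405         | -86       | +2090
C      | 55  | 35207          | -15466 (n=1)  | -370      | -3829
D      | 27  | 16419          | NA            | NA        | NA
E      | 45  | 23682          | NA            | NA        | NA
F      | 119 | 18795          | +3329         | -2044     | NA

Table 1 - Coryats and Tournament Effects of Seven Players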

Another trend I was interested in was more short-term. Looking at this chart of day-to-day scores, it almost seems like the scores are oscillating, with a high score followed by a low score and vice-versa. If this is a real effect, we should be able to see it as a negative autocorrelation between one day's score and the next. Figure 8 shows a scatter plot with one day's scores on the x-axis and the previous day's scores on the y-axis. The estimated Pearson correlation coefficient is almost exactly zero. Furthermore, this lack of correlation is not an artifact of some nonlinear effect, because the same lack of pattern shows up when we compare the ranks of the scores from one day to the next and take the Spearman correlation coefficient of these ranked values instead. In short, there are no day-to-day balancing effects and no hot or cold streaks. It's all just regression to the mean.
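For anyone who wants to run the same check on their own scores, here is a minimal sketch of the lag-1 comparison using cor from base R. The function name lag1_cor is my own for this example, and it assumes the scores are already sorted by date with no attention paid to gaps between game days.

```r
# Correlate each day's score with the previous day's score.
# `x` is a numeric vector of Coryat scores in date order.
lag1_cor <- function(x) {
  today <- x[-1]               # scores from day 2 onward
  yesterday <- x[-length(x)]   # scores up to the second-last day
  c(pearson  = cor(today, yesterday, method = "pearson"),
    spearman = cor(today, yesterday, method = "spearman"))
}

# A perfectly oscillating series gives a lag-1 correlation of -1,
# which is what a strong day-to-day balancing effect would look like.
lag1_cor(c(1, -1, 1, -1, 1, -1))
```

Replacing cor with cor.test on the same two vectors would also give a p-value for each coefficient.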

Even if the correlation had shown up as statistically significant, it would not seem particularly meaningful. My hypothesis was that topics were chosen from day to day such that viewers at home would be more likely to have a category or two that they could excel in every couple of days, and to avoid long stretches where home viewers may feel frustrated. Another effect of such a negative correlation would be that champions who stay on a long time truly would be outstanding players in multiple fields rather than specialists. However, I was unable to find any evidence of such a topic shuffling strategy in my own scores, and I would consider myself a fairly typical at-home player in my degree of specialization.**

Now let's try these analyses with some other players and see if the same sort of trends appear, as shown in Figures 2-7 and Table 1. According to the figures, a whole range of trends appear. The only common one seems to be quick improvement at the beginning of tracking Coryat scores. From the table, we see that the Tournament of Champions tends to vex people. The other tournaments not so much. One note about player C's College Championship effect is that it represents only a single measurement, which you can see by the triangle on their chart on day 53.

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

What about that streak or balance hypothesis? Does the near-zero correlation hold for other players as well? Only for player F did a statistically significant correlation appear (p = 0.005, before multiple testing adjustments), and even then it could be an artifact of gradual improvement (a similar check on the model residuals could adjust for improvement, but we're p-hacking at this point). Figures 8 to 10 show the scatterplots for me and two of the other players to show how weak such a relationship is, if there is one at all.
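To make those two caveats concrete, here is a small sketch. The first line applies a Bonferroni adjustment under the assumption that there was one test per player (seven in all); the actual number of tests is a judgment call. The rest uses simulated scores, not player F's, to show how steady improvement alone can produce a positive day-to-day correlation that shrinks once the trend is removed with loess.

```r
# Bonferroni adjustment, assuming seven tests (one per player).
p_adjusted <- p.adjust(0.005, method = "bonferroni", n = 7)
print(p_adjusted)

# Simulated player: steady improvement plus independent day-to-day noise.
set.seed(42)
day <- 1:120
score <- 15000 + 100 * day + rnorm(120, sd = 3000)

lag1 <- function(x) cor(x[-1], x[-length(x)])

raw_r <- lag1(score)                           # inflated by the trend
resid_r <- lag1(residuals(loess(score ~ day))) # trend removed first

print(c(raw = raw_r, residual = resid_r))
```

The noise in this simulation has no day-to-day dependence by construction, so any correlation in the raw scores comes from the trend alone.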

Player | Pearson r | Spearman r
-------|-----------|-----------
Jack   | 0.004     | -0.006
A      | 0.045     | 0.036
B      | 0.112     | 0.100
C      | 0.013     | 0.019
D      | -0.062    | 0.091
E      | 0.242     | 0.206
F      | 0.252     | 0.309

Table 2

Another hypothesis I wanted to test was about the nature of Coryat scores. The smoother model in this analysis relies on the scores having some linear relationship to underlying skill, rather than something non-linear like an exponential one. That is, someone who averages 15,000 would be just as much better than someone whose average is 12,000 as that second person would be better than someone else whose average is 9,000. Or, in other words, every point of increase in average Coryat represents the same amount of latent skill improvement.

This sort of relationship may seem like a given, but it isn't in a lot of games and sports. Consider bowling, either 5-pin or 10-pin. Scores in bowling tend to compound because consecutive strikes are worth more than individual, unconnected ones. The distribution of a typical bowler's scores is right (positively) skewed, meaning that there are more unusually high scores than unusually low ones. (For extremely good bowlers, the opposite is true because they will typically play close to perfection.)

So do Coryat scores follow a similar set of patterns? Consider the histograms in the right half of Figures 8-10. Player A is very strong, and their scores exhibit the negative skew that we would expect of someone consistently at their peak. Player F's scores are approximately symmetric about 15000. My scores are positively skewed, implying that I'm consistently bad, but occasionally get lucky.
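A quick numeric companion to the histograms is the sample skewness. Base R has no built-in skewness function, so this sketch defines one from the usual moment formula (third central moment divided by the 1.5 power of the second). The example vectors are invented to show the sign convention and are not real scores.

```r
# Moment-based sample skewness: positive means a longer right tail,
# negative means a longer left tail, zero means symmetric.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Mostly low scores with an occasional lucky high one: positive skew.
usually_low <- c(12000, 13000, 12500, 13500, 12800, 24000)
# Mostly near a ceiling with an occasional bad day: negative skew.
usually_high <- c(36000, 35000, 35500, 34500, 35200, 22000)

print(c(low = skewness(usually_low), high = skewness(usually_high)))
```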

Figure 8


Figure 9

Figure 10


In this link, I have included the updated R code necessary to do these analyses with your own at-home scores, as well as a sample dataset of the scores of a couple of people who gave me explicit permission to share their scores.

I would be thrilled to have more data from more players, in order to further analyze the at-home experience of Jeopardy. I could use this data to better address the questions posed in this post, as well as answer queries from other players.

In the future, I would like to compare the difficulties of Jeopardy! vs Double Jeopardy!, and to see how dependent a typical score is on a few categories, both of which could be answered with data of enough volume and resolution. Please feel free to add your own analysis questions in the comment section, or send them to my email (jackd@sfu.ca) or Twitter (@jack_davis_sfu).

Thanks for reading!

* I did improve from 16/50 to 30/50 on the annual online test, but that could mostly be attributed to bad luck on the first test and good luck on the second.

**Specifically, I nail the science questions and do reasonably well on the academic questions, but I'm left silent when it comes to Americana and Opera.