Monday 29 May 2017

Detecting the Effects of Goalie Fatigue

This is a much-abridged version of a research paper of mine, currently in submission.

Do goalies fatigue? More specifically, does their performance worsen as their workload accumulates?

By examining the play-by-play data of the NHL's real-time event tracker, I can find the outcome and details of any individual shot attempt, including the number of shots or shot attempts that a given goalie has withstood up to that point. I aggregated the results from these shots to estimate the save percentage of starting goalies against their 1st, 2nd, ... , up to 50th shot of the game.
For example, of the 7822 game-sides that had a 10th shot on net against either team's starting goalie, 694 of those shots scored, so the estimated save percentage is 1 - (694/7822), or 0.9113. These simple estimates are shown in the solid black line of the first figure. I also applied a smoothed fit, weighted by sample size, which is shown in the red dashed line.
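The simple estimator above can be sketched in a few lines of Python. This is only an illustration, not the code behind the paper: the input layout (pairs of shot number and goal flag) and the function names are assumptions. The paper's smoother is not specified here, so a sample-size-weighted moving average stands in for it.

```python
from collections import defaultdict


def save_pct_by_shot_number(shots):
    """shots: iterable of (shot_number, is_goal) pairs, one per shot on net
    against a starting goalie (hypothetical layout).
    Returns {shot_number: (shots_faced, save_pct)}."""
    faced = defaultdict(int)
    goals = defaultdict(int)
    for shot_number, is_goal in shots:
        faced[shot_number] += 1
        goals[shot_number] += int(is_goal)
    return {k: (faced[k], 1 - goals[k] / faced[k]) for k in sorted(faced)}


def weighted_smooth(estimates, half_window=2):
    """Stand-in smoother (an assumption, not the post's fit): a moving average
    over neighbouring shot numbers, weighted by the number of shots faced."""
    smoothed = {}
    for k in estimates:
        nearby = [estimates[j]
                  for j in range(k - half_window, k + half_window + 1)
                  if j in estimates]
        total = sum(n for n, _ in nearby)
        smoothed[k] = sum(n * p for n, p in nearby) / total
    return smoothed


# The 10th-shot figures quoted in the post: 694 goals on 7822 shots.
tenth_shots = [(10, True)] * 694 + [(10, False)] * (7822 - 694)
tenth_estimate = save_pct_by_shot_number(tenth_shots)
print(round(tenth_estimate[10][1], 4))  # 0.9113
```

Weighting by the number of shots faced keeps the sparse high shot numbers from pulling the curve around as much as the well-sampled early ones.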

It appears that save percentage starts high and reaches a minimum between the 20th and 25th shots faced before improving slightly. Using only the number of previous shots as a guide, each of these later shots is 20-30% more likely to score a goal than the first shot (9% chance instead of a 6.5-7% chance). A goalie's endurance limitations are but one of many possible explanations for this phenomenon.

The same pattern appears when considering previous shot attempts faced, rather than just shots. The next figure shows the same analysis, but including missed and blocked shots in the 'previous shots' count.

In the first 60 minutes, a starting goalie faces an average of about 28 or 29 shots, so anything beyond the 30th shot may indicate poor defensive skaters, or lots of low-quality shots, or simply lots of penalties against that goalie's team.
The next thing I looked at was the shot quality of the 1st, 2nd, ... , 50th shot each team makes. Although there are many factors that go into the quality of a shot, king among these is distance to the net. This next figure shows the relationship between the save percentage and distance from the net to the shot. The solid black line represents the simple estimator of save percentage of shots from a given distance, binned into distance groups of five feet. The dashed red line represents a smoothed, weighted fit. The chance of a shot becoming a goal diminishes linearly until about 60 feet from the net, after which distance matters much less*.

From this model, I found the predicted save percentage for each shot, based on its distance. Then I took the mean of the predicted save percentages for each shot number to get an estimate of the save percentage for each shot number based solely on shot distance. If shot distance, a surrogate for shot quality, can explain the differences in save percentage by shot number, I would expect the estimate based on distance, the red line in this next figure, to follow the same pattern as the estimate based on shot number, the black line. Although the expected save chance is biased upwards, likely due to the pattern of missing shot locations*, the disconnect between distance-based shot quality and the number of previous shots is clear. The flat red line implies that shot quality stays the same across shot number, at least when averaged across many games.
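The two steps above (predict each shot's save chance from its distance, then average those predictions by shot number) can be sketched as follows. This is a rough illustration under stated assumptions: the dictionary keys and function names are hypothetical, and the raw five-foot-bin estimate stands in for the smoothed distance model. Shots with a missing location are simply skipped.

```python
from collections import defaultdict

BIN_WIDTH_FT = 5  # five-foot distance groups, as described in the post


def distance_bin(distance_ft):
    """Lower edge of the five-foot bin containing this distance."""
    return int(distance_ft // BIN_WIDTH_FT) * BIN_WIDTH_FT


def fit_binned_model(shots):
    """Return {bin: save_pct} using only shots with a recorded distance."""
    faced = defaultdict(int)
    goals = defaultdict(int)
    for shot in shots:
        if shot["distance"] is None:  # location missing from the data
            continue
        b = distance_bin(shot["distance"])
        faced[b] += 1
        goals[b] += int(shot["is_goal"])
    return {b: 1 - goals[b] / faced[b] for b in faced}


def expected_save_pct_by_shot_number(shots, model):
    """Mean distance-based predicted save percentage for each shot number."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for shot in shots:
        if shot["distance"] is None:
            continue
        totals[shot["shot_number"]] += model[distance_bin(shot["distance"])]
        counts[shot["shot_number"]] += 1
    return {k: totals[k] / counts[k] for k in sorted(counts)}


# Made-up toy shots, purely to show the shape of the input.
toy_shots = [
    {"shot_number": 1, "distance": 12, "is_goal": False},
    {"shot_number": 1, "distance": 13, "is_goal": True},
    {"shot_number": 2, "distance": 42, "is_goal": False},
    {"shot_number": 2, "distance": 11, "is_goal": False},
    {"shot_number": 2, "distance": None, "is_goal": True},
]
toy_model = fit_binned_model(toy_shots)
toy_expected = expected_save_pct_by_shot_number(toy_shots, toy_model)
```

If the distance-based curve from a sketch like this came out flat while the shot-number curve dipped, that would match the disconnect described above.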

Is this phenomenon universal to all starting NHL goalies, or are some able to hang onto their peak skill longer?
I also stratified by overall goalie skill and fit the curve to each tier. For the games observed, I found the number of saves and goals allowed for each starting goalie. I removed those goalies that faced fewer than 400 shots during the observed time, leaving 123 goalies. These 123 goalies were split into 3 tiers of equal size, according to their save percentages during the observed games. As before, I used a simple estimator and weighted smoothed fit of save percentage against the 1st, 2nd, ... , 50th shot. This time, the analysis was done separately for each tier in order to estimate the save percentage of the best third of goalies, the middle third, and the worst third.
The results are shown in this last figure. The simple estimators of any given shot are a mess of overlaps, so they're faded to grey. Instead, consider the red curves for each tier. Aside from the differences after the 40th shot, which could easily be the result of instability due to data sparsity, all three regression-based predictions follow the same fundamental pattern. Better goalies appear to reach their worst performance earlier, and the difference between their initial and worst performances appears smaller. This suggests that what separates elite goalies from the rest is not just their high level of performance, but the consistency of that high level through a game.
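The tiering step described above can be sketched like this. It is a minimal illustration, assuming each goalie's totals are available as a (saves, goals allowed) pair; the function name, the input layout, and the toy goalies are hypothetical.

```python
def tier_goalies(totals, min_shots=400, n_tiers=3):
    """totals: {goalie: (saves, goals_allowed)} over the observed games.
    Drops goalies who faced fewer than min_shots shots, ranks the rest by
    save percentage, and returns {goalie: tier}, where tier 0 is the best."""
    save_pct = {
        goalie: saves / (saves + goals)
        for goalie, (saves, goals) in totals.items()
        if saves + goals >= min_shots
    }
    ranked = sorted(save_pct, key=save_pct.get, reverse=True)
    # Integer arithmetic splits the ranking into near-equal consecutive groups.
    return {goalie: i * n_tiers // len(ranked) for i, goalie in enumerate(ranked)}


# Made-up totals for illustration only.
toy_totals = {
    "A": (930, 70), "B": (925, 75), "C": (920, 80),
    "D": (915, 85), "E": (910, 90), "F": (905, 95),
    "G": (95, 5),  # only 100 shots faced, so excluded
}
toy_tiers = tier_goalies(toy_totals)
```

With 123 eligible goalies, this split gives three tiers of 41 each, and the per-tier curves can then be estimated separately.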

*About 5% of the shot locations are missing from the data, and a disproportionate number of the missing locations are from shots that were likely near the goal, such as wraparounds (the type of shot is still recorded, even when the location is missing).

This post was strongly inspired by Rob Vollman's Hockey Abstract, 2014 edition, specifically question 1 in 'Goaltending Q&A': 'how are goalies affected by workload'.