Wednesday, 10 June 2020

Transient Goal Differentials: Are teams better when behind?

In the 2014-15 season, the Vancouver Canucks were a big orca in a small tank, and they could coast into the playoffs.

Some writer whom I couldn't track down implied that the Canucks were playing a meta-game of "how badly can we start a game and still win?" That is, how long can they leave a one-goal, or even a two-goal, deficit on the board and still win?

The data can't tell us WHY something like this would happen, but it could tell us IF it was happening.
For this, we examine the transient goal differential (TGD): the number of goals by which a team is ahead or behind at any given time in regulation play. (Overtime always has a TGD of zero for both teams, and the shootout is a completely different game.) For a single game, it would look something like this:

Figure 1: Home team ahead-by over time for two select games

In the first game, the Washington Capitals take an early 2-goal lead and only build upon it from there, eventually finishing 7 goals ahead of the Bruins.

In the second game, the St. Louis Blues gain, and then quickly lose, a 2-goal lead against the Chicago Blackhawks. After some back-and-forth, the game ends up tied after 60 minutes and goes to overtime.

These graphs are taken at a resolution of 1 minute for convenience, but this could be easily changed if it would be worthwhile. One drawback of this low level of time-resolution is that it misses events like offsetting goals that happen in the same minute.
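To make the 1-minute resolution concrete, here is a minimal sketch of how a single game's TGD curve could be built from goal times. The function name and the input format (seconds elapsed, plus a home/away flag) are assumptions made for illustration, not the format of the data used for the figures.

```python
import math

def tgd_by_minute(goals, n_minutes=60):
    """Home-team transient goal differential at each whole-minute mark.

    goals: list of (seconds_elapsed, is_home) tuples for regulation goals
           (a hypothetical input format).
    Returns a list of length n_minutes + 1; entry m is home goals minus
    away goals after m full minutes of play.
    """
    diff = [0] * (n_minutes + 1)
    for seconds, is_home in goals:
        # A goal first shows up at the next whole-minute mark.
        first_mark = math.ceil(seconds / 60)
        step = 1 if is_home else -1
        for m in range(first_mark, n_minutes + 1):
            diff[m] += step
    return diff

# Two offsetting goals in the same minute, then a later home goal.
example = tgd_by_minute([(90, True), (100, False), (400, True)])
```

In this example the goals at 90 and 100 seconds cancel before the 2-minute mark is recorded, so the curve never shows the brief home lead. That is the drawback of the coarse time resolution described above.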

Aggregating the goal differentials for all home teams for the 2016-17, 17-18, and 18-19 seasons, we see this:

Figure 2: Home team advantage accumulating over time

This is the average advantage, measured in goals, that the home team has. Over the course of 60 minutes, it adds up to about 0.3 goals, which we see as a 55-56% win probability for the home team. The rate at which this advantage accumulates slows to almost nothing in the second period. Given the striking difference between the second period and the other two, we may be on to something.

A word of caution here about reading into period-by-period differences. In the 2016-17 season, it looked like the home-team advantage didn't take hold until after the first period. In the 2017-18 season, it looked like the advantage didn't apply during the second period, when the players' benches are closer to their opponents' net than to their own. In the 2018-19 season, the home-team advantage appeared consistent over the course of the game. The takeaway message is that there is a lot of variation at play here, and that even striking patterns should be verified.

Figure 3: Average amount each team is ahead or behind

This is the average transient goal differential of each team in the 2018-19 regular season. Here, each team has 41 home games and 41 away games, so home-team effects should disappear. Nobody really stands out except the Tampa Bay Lightning at the top, which averaged more than +1 goal per game, including the ones in which they lost.

Without normalization, this just shows that some teams are better than others. We are more interested in whether some teams are better than their usual selves at different times. To find this, we subtract a constant rate from each team. A team whose curve slopes upward is playing better than usual during that stretch, and a team whose curve slopes downward is playing worse than usual. Curves that are far above or below zero indicate times when a team's accumulated over- or under-performance has placed it far above or below its whole-game average. In Figure 4, we see this for three teams.
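As a rough sketch of this normalization, the snippet below subtracts a straight line from a team's average TGD curve. It assumes the "constant rate" is the team's final average differential divided by the number of minutes, so the normalized curve starts and ends at zero. That choice of rate and the function name are assumptions for illustration.

```python
def subtract_constant_rate(avg_tgd):
    """Remove a team's whole-game trend from its average TGD curve.

    avg_tgd: list where entry m is the team's average goal differential
             after m minutes (entry 0 is the start of the game).
    Assumes the constant rate is final value / number of minutes.
    """
    n_minutes = len(avg_tgd) - 1
    rate = avg_tgd[-1] / n_minutes
    return [value - rate * minute for minute, value in enumerate(avg_tgd)]

# A toy 3-minute curve: the team gains most of its edge in minute 2.
normalized = subtract_constant_rate([0.0, 0.1, 0.3, 0.3])
```

With this toy input, the normalized curve peaks after the second minute and returns to zero at the end, which is the kind of rise-then-decline shape described for Washington below.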

Figure 4: Times when teams are doing better or worse than their normal

"Thing about Arsenal is they always try to walk it in." - The I.T. Crowd (https://www.youtube.com/watch?v=6yN2H3--1aw)

Except in this case, Washington is trying to walk it in. After 35 minutes, the Caps' goal difference is 0.4 goals better than it should be at the 35-minute mark. It's a constant slope down from there.

By contrast, the Flyers take a few extra minutes to get ready (or let whatever Gritty is sneaking them take hold), and the Flames really go hard in the last 15 minutes, making up almost half a goal per game of under-performance.

It's still possible that none of this is outside the realm of random variation, but it's large enough to pay attention to in the future.

Finally, let's look at the absolute value of the goal differences over time. The grey lines in Figure 5 are individual teams and the red line in the middle is the mean of all teams. Unlike all the other graphs, we don't expect these to follow a straight line. Theory and simulation both show that, under a constant scoring rate by equally good teams, the mean absolute difference should follow a square-root curve, which is almost exactly what we see in the mean.

There are differences between teams which contribute a linear component to this curve, but the inherent randomness of the game dominates, so we see something very close to the equal teams' simulation.
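The equal-teams simulation can be sketched in a few lines. This is not the simulation behind Figure 5; it is a minimal stand-in that assumes each team scores independently in any given minute with the same small probability. The default of 0.05 (about three goals per team per game) is an illustrative round number, and at most one goal per team per minute is allowed.

```python
import random

def simulate_mean_abs_diff(n_games=20000, n_minutes=60, p_goal=0.05, seed=1):
    """Mean absolute goal difference at each minute for two equal teams.

    p_goal is an assumed per-minute scoring probability for each team.
    Returns a list of length n_minutes + 1 (entry 0 is the opening faceoff).
    """
    rng = random.Random(seed)
    totals = [0] * (n_minutes + 1)
    for _ in range(n_games):
        diff = 0
        for minute in range(1, n_minutes + 1):
            # Each team scores this minute with probability p_goal.
            diff += (rng.random() < p_goal) - (rng.random() < p_goal)
            totals[minute] += abs(diff)
    return [total / n_games for total in totals]

curve = simulate_mean_abs_diff()
ratio = curve[60] / curve[15]  # a pure square-root curve would give 2
```

Quadrupling the elapsed time roughly doubles the mean absolute difference, as a square-root curve predicts. At scoring rates this low the match is approximate rather than exact, especially in the first few minutes when most games are still scoreless.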

The last 2-3 minutes for each team have a sharp uptick in absolute goal differential, which is just the manifestation of the empty net. A team that is behind by 1 or 2 goals will frequently pull their goalie to put an extra skater on the ice.

While this high-variance strategy increases the losing team's chance of making the game tied, it also dramatically increases their chance of losing by even more - this results in an average increase in difference between scores.

Figure 5: Mean absolute difference in scores over time

Teams that diverge below the red curve might be ones that tend to equalize. That is, they overperform when they are behind and underperform when they are ahead. Likewise, teams diverging above the red curve might tend to 'snowball'; that is, a small difference in goals has an above-average tendency to accumulate into a much larger difference, like a snowball rolling down a hill. An alternate explanation is that very strong and very weak teams are just more prone to blowouts than usual.

As with Figure 3, we can normalize this by subtracting out some moving average. In this case, we opt to subtract out the average curve, which produces Figure 6. Note that we remove the last 3 minutes of play from this graph. Figure 6 shows three teams with late-game equalizing tendencies: The NY Rangers, the Chicago Blackhawks, and the Detroit Red Wings. Original sixers love a close game.
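A minimal sketch of this second normalization, assuming each team's mean absolute differential is already available as a per-minute list: it subtracts the all-team average curve from every team and drops the final minutes, where empty-net play distorts the picture. The dictionary input format and the function name are hypothetical.

```python
def snowball_factors(team_curves, drop_last=3):
    """Each team's mean |goal difference| minus the all-team average curve.

    team_curves: dict mapping team name -> list of mean absolute goal
                 differentials, one entry per minute mark (assumed format).
    drop_last:   number of final minute marks to exclude.
    Positive values suggest snowballing; negative values suggest equalizing.
    """
    teams = list(team_curves)
    keep = len(team_curves[teams[0]]) - drop_last
    mean_curve = [
        sum(team_curves[t][m] for t in teams) / len(teams) for m in range(keep)
    ]
    return {
        t: [team_curves[t][m] - mean_curve[m] for m in range(keep)] for t in teams
    }

# Toy curves for two made-up teams, dropping the last two marks.
factors = snowball_factors({"A": [0, 1, 2, 3, 4], "B": [0, 3, 2, 1, 0]}, drop_last=2)
```

Because the average curve is built from the same teams, the factors at each minute sum to zero across the league, so one team's rubber band is always balanced by someone else's snowball.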

Figure 6: Teams' individual snowball (positive) or rubber band (negative) factors

Here's some unqualified speculation as to why goal differentials might be good early and bad late in the game:

- The coach wants to keep his stars rested for the playoffs, or to minimize injuries.

- The coach is focused on developing his second and third lines, at least until the game is on the line. (Making the playoffs is a lot more important than playoff seeding)

- The team has more collective depth or stamina than their opponent.

- The team's goalie fatigues more slowly than the opposing goalie (more on goalie fatigue here).


This work was part of my postdoctoral fellowship funded by Simon Fraser University, MITACS, and Sportlogiq.

Finally, a shout-out to the Black Girl Hockey Club for their work towards diversity in hockey. Diversity leads to more players and fans, which leads to a better game and product for everyone.
