Thursday, 9 August 2018

I read this: Visualizing Baseball

When Tim Swartz teaches statistics in sports, he does so through case studies. He will take a paper, such as one about the optimal times to substitute players in soccer, and use it as a platform to demonstrate the analysis of soccer data. Visualizing Baseball by Jim Albert is a collection of nine case studies, most of which would be excellent for a course like Tim Swartz's.

This is a book about baseball, written by a statistician for the stats enthusiast. I say stats enthusiast and not statistician because little to no statistics experience is required to understand the book. Likewise, little expertise about baseball is assumed, and key concepts are explained clearly but dryly.

As the name suggests, the big appeal of Visualizing Baseball is the graphs. My copy already has a dozen dog-eared pages marking my favourites. These include a graphical version of the run expectancy matrix, which shows the expected additional runs to be scored in the remainder of a half inning based on the runners on base and the number of outs. The graphs are all done with the ggplot2 R package and have a consistent visual style, almost to a fault.

For example, when multiple lines are tracked, they are differentiated only by their dash-and-dot pattern, which can be confusing.

Another graph that suffers for the sake of a consistent style is a pitcher's breakdown of pitch types, in which each pitch type is represented as a dot along an x-axis. A bar graph would have been a better choice. Normally this wouldn't justify nitpicking, but the book IS about visualizations.
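The run expectancy matrix mentioned earlier is straightforward to estimate from play-by-play data: for each of the 24 base-out states (8 base configurations times 0, 1 or 2 outs), average the runs scored from that point to the end of the half inning. Below is a minimal Python sketch of that idea; the book itself works in R. The input format, the `run_expectancy` function, and the two half innings in the usage lines are all hypothetical, made up only to show the mechanics, and are not the book's code or real data.

```python
from collections import defaultdict

# Hypothetical input format: a half inning is a list of plate appearances, each recorded
# as (bases, outs, runs_scored_on_play). `bases` and `outs` describe the state BEFORE the
# plate appearance; `bases` is a string like "1_3" (runners on first and third).
Event = tuple[str, int, int]
State = tuple[str, int]


def run_expectancy(half_innings: list[list[Event]]) -> dict[State, float]:
    """Average runs scored from each base-out state to the end of its half inning."""
    totals: dict[State, float] = defaultdict(float)
    counts: dict[State, int] = defaultdict(int)
    for inning in half_innings:
        # Runs scored from the current plate appearance through the end of the inning.
        remaining = sum(runs for _, _, runs in inning)
        for bases, outs, runs in inning:
            totals[(bases, outs)] += remaining
            counts[(bases, outs)] += 1
            remaining -= runs
    return {state: totals[state] / counts[state] for state in totals}


# Two made-up half innings, purely to show the mechanics (not real data).
innings = [
    # single, out, two-run homer, out, out
    [("___", 0, 0), ("1__", 0, 0), ("1__", 1, 2), ("___", 1, 0), ("___", 2, 0)],
    # three straight outs
    [("___", 0, 0), ("___", 1, 0), ("___", 2, 0)],
]
matrix = run_expectancy(innings)
print(matrix[("___", 0)])  # 1.0
```

With a full season of play-by-play data, the same loop would fill in all 24 base-out states.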

The choice and order of the topics were an asset. The book starts with the very general topic of the different eras of baseball, such as the dead-ball era before 1920 and the steroid era of the 1990s. It then moves into career trajectories using the very general wins above replacement (WAR) statistic.

From there, the analyses get increasingly specific, moving gradually from fan appeal to deep analytics. The chapter after career trajectories looks at the run value of a single or walk, a double, a triple, and an out. The one after that covers the run value of a ball and of a strike, followed by chapters on pitch quality, plate discipline, within-game prediction, and finally streaks and clutch ability.
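One standard way to turn a run expectancy matrix into run values like these is: run value = RE(state after the play) minus RE(state before the play) plus the runs scored on the play, averaged over every occurrence of an event. I'm assuming the book does something along these lines; the sketch below is only an illustration, the function names are my own, and the matrix entries are made-up round numbers, not real estimates.

```python
from statistics import mean

State = tuple[str, int]  # (bases, outs), e.g. ("1__", 0) = runner on first, nobody out
END = "END"  # marker for the end of the half inning (third out recorded)


def run_value(re_matrix: dict[State, float], start: State, end, runs_on_play: int) -> float:
    """Change in run expectancy caused by one play, plus the runs it scored."""
    re_end = 0.0 if end == END else re_matrix[end]  # no more runs after the third out
    return re_end - re_matrix[start] + runs_on_play


def average_run_value(re_matrix: dict[State, float], plays) -> float:
    """Mean run value over (start, end, runs) tuples for one event type."""
    return mean(run_value(re_matrix, s, e, r) for s, e, r in plays)


# Made-up placeholder matrix entries (NOT real estimates), just enough for the example.
re_matrix = {("___", 0): 0.5, ("1__", 0): 0.9, ("___", 1): 0.25, ("___", 2): 0.1}

singles = [(("___", 0), ("1__", 0), 0)]  # a leadoff single
outs = [(("___", 0), ("___", 1), 0), (("___", 2), END, 0)]  # a leadoff out, an inning-ending out
print(average_run_value(re_matrix, singles))  # 0.4
print(average_run_value(re_matrix, outs))
```

The run value of a ball or a strike can be framed the same way, with ball-strike counts in place of base-out states.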

I enjoyed this book, and I want to emphasize that before I discuss its two weaknesses.

First, the prose is academic and clinical. As academic writing goes, it was delightful to read, but compared to general-interest books, the writing was pretty flat. For comparison, the analytics book Hockey Abstract by Rob Vollman was a lot more entertaining.

Second, the author ignores home team advantage completely. It would be one thing to write it off as minor, irrelevant, or factored out of certain analyses, but it isn't even mentioned.

But there is a home field advantage in baseball. Home teams went 1312-1118 in the 2017 regular season, a 54.0% winning percentage (one-tailed p = 0.000038 against a null hypothesis of 50%).
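For readers who want to check the significance of the 2017 record, here is a minimal Python sketch of a one-tailed test against a 50% null, using only the standard library. It computes an exact binomial tail and, for comparison, the normal approximation; the function names are my own. The exact figure depends on which test is used, so the code prints the values rather than claiming a specific one, but either way the p-value comes out far below 0.001.

```python
from math import comb, erfc, sqrt


def exact_one_tailed_p(home_wins: int, games: int) -> float:
    """P(X >= home_wins) for X ~ Binomial(games, 0.5), i.e. no home advantage."""
    tail = sum(comb(games, k) for k in range(home_wins, games + 1))
    # Python divides big integers with correct rounding, so this is exact until the final float.
    return tail / 2**games


def normal_approx_p(home_wins: int, games: int) -> float:
    """Same upper-tail probability via the normal approximation (no continuity correction)."""
    z = (home_wins - games / 2) / sqrt(games / 4)
    return 0.5 * erfc(z / sqrt(2))


home_wins, away_wins = 1312, 1118  # 2017 MLB regular season home/away record
games = home_wins + away_wins  # 2430
exact_p = exact_one_tailed_p(home_wins, games)
approx_p = normal_approx_p(home_wins, games)

print(f"home win share: {home_wins / games:.1%}")  # home win share: 54.0%
print(f"exact binomial p: {exact_p:.2g}, normal approximation p: {approx_p:.2g}")
```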

In the chapter of Visualizing Baseball that looks at predicting games as they happen, the author claims that when two teams are evenly matched, the home team will win 50% of the time. This is only true if the home and visiting teams are evenly matched after factoring in home team advantage.

This claim comes right after the chapter on fielding, which demonstrates, among other things, that there are large differences between certain baseball fields. The differences are not just related to the locations of the fields, although location also matters, because differences in temperature and air pressure affect air resistance and therefore ball trajectories. There are also differences in the shape and dimensions of the fields themselves, which is unique among top-tier sports.

For example, the right-field wall is closer than usual at Yankee Stadium, where the New York Yankees play. This has a demonstrated and substantial beneficial effect on left-handed batters. Likewise, their division rivals, the Boston Red Sox, play at Fenway Park, where the left-field wall is much closer than usual, which benefits right-handed batters. Given these park differences, it stands to reason that a team would try to leverage them to its advantage in team selection. As such, one might expect a team to be best adapted to its own field, thus creating a home field advantage.

Even if there were no park differences, a home team advantage would still exist, because the home team bats second, which confers an informational advantage: in its final turn at bat, it knows exactly how many runs it needs.
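To make the evenly-matched point concrete, here is a small Python sketch. It is not the book's win-probability model: it swaps in Bill James's log5 formula, written in odds form, and multiplies the home team's odds by a hypothetical `home_odds_factor`. Two .500 teams split their games 50/50 only when that factor is 1; plugging in a naive factor taken from the 2017 record (1312 home wins / 1118 away wins) gives the home team 54.0%.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1 - p)


def home_win_probability(p_home: float, p_away: float, home_odds_factor: float = 1.0) -> float:
    """Log5 estimate that the home team wins, with its odds scaled by a home-field factor.

    p_home and p_away are each team's overall winning percentage; home_odds_factor is a
    hypothetical parameter (1.0 means no home-field advantage).
    """
    o = odds(p_home) / odds(p_away) * home_odds_factor
    return o / (1 + o)


# Two evenly matched .500 teams.
no_advantage = home_win_probability(0.5, 0.5)
naive_2017_factor = 1312 / 1118  # home wins / away wins, 2017 regular season
with_advantage = home_win_probability(0.5, 0.5, naive_2017_factor)
print(f"{no_advantage:.1%} vs {with_advantage:.1%}")  # 50.0% vs 54.0%
```

Scaling the odds rather than adding a fixed number of percentage points keeps the result between 0 and 1 for any pair of teams.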

Ultimately, though, I learned a lot from Visualizing Baseball, which is impressive given that the book is only 142 pages long. I'm glad I spent the time to read it.