Statistics et al.: I read this: Smart Baseball

Smart Baseball, by Keith Law, exposes many common baseball statistics like batting average for the garbage they are. He does this in an approachable, narrative style that will appeal to long-time baseball fans. It's very light on statistics, but Law obviously knows his stuff; he just isn't interested in showing off the details.

The book is organized into three parts: Smrt Baseball (that's not a typo, it's an homage to Homer Simpson), Smart Baseball, and Smarter Baseball. In the first part, he tears down some traditional, misleading statistics. In the second part, he explains the statistics that do work, both new and traditional, and shows their relative strengths and weaknesses. In the third part, he goes into detail about the scouting process and Hall of Fame elections, and makes some predictions about the future.

Here are some points he made that stood out.

On WAR (Wins Above Replacement): WAR is a measure that describes a player's total contribution to a team's success over and above that of a typical replacement player from the AAA league. It's meant to give a single, all-encompassing value for how good a player is, one that can be compared between years and positions.

WAR is central to modern baseball analytics (sabermetrics), but it's easy to misunderstand, so Law has this warning: "WAR is a construct, not a statistic". The word "construct" is good here, but "parameter" is better.

Wins above replacement is a measure of a player's true value over a year, but that true value is unknowable. That makes it a parameter. It's estimated by many different formulas; that makes these estimates statistics in both the scientific and common definitions. The two most popular estimates of WAR are fWAR and bWAR, used by FanGraphs and Baseball Reference, respectively. They aren't the only estimates of WAR you could use, but they're pretty good.

On Pitch Framing: Pitch Framing is a strategy that takes advantage of the necessarily imprecise determination of called balls and strikes by umpires. Because of the position of the home plate umpire, he cannot perfectly determine where a ball was when it crossed the plate. He can, however, use the location in which the catcher caught the ball as a reference.

Catchers can 'frame' a borderline pitch to make it look like a strike when it should have been a ball. Some catchers are measurably better at this than others, and the difference has been easier to measure since the advent of PITCHf/x, which records pitch location.

Law doesn't think highly of 'pitch framing', and refers to it as 'stealing strikes'. It's not ideal, but it's a part of the game. It's a strategy to deceive the judges of the game in order to get an advantage. To me, this is akin to diving and faking injuries in football/soccer. Except, in baseball, pitch framing is considered a necessary catcher skill and not a blight on the game like diving.

The author presents Jose Molina as a case study on pitch framing. When he played for the Toronto Blue Jays, he was an "awful hitter".* Toronto traded him to Tampa Bay, where his pitch framing was so good that he was saving his team an average of about 0.30 runs EVERY GAME. I had no idea.

On the "Clutch": If a player is described as a "clutch" player, then supposedly they perform well in high-leverage situations like "tied in the ninth inning", where the probability of each team winning can change a lot based on the outcome of that situation. Law argues that any ability to perform better in high-leverage situations is too small or too inconsistent to matter. Players who were the best or worst in the league in clutch situations one year quickly regress to the mean the next year, implying that such differences say little about the individual players.
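The regression-to-the-mean argument can be illustrated with a small simulation. This is a minimal sketch, not anything from the book: every simulated player is given the same true success rate in high-leverage chances, so any "clutch" ranking in year one is pure luck. The function names, the 300 players, the 100 chances per season, and the 0.25 success rate are all assumptions chosen for illustration.

```python
import random


def simulate_season(rng, n_players, n_chances, p_success):
    """Return each player's observed success rate in high-leverage chances.

    Every player shares the same true rate p_success (an assumption of
    this sketch), so differences between players are luck alone.
    """
    return [
        sum(rng.random() < p_success for _ in range(n_chances)) / n_chances
        for _ in range(n_players)
    ]


def top_group_means(year1, year2, k):
    """Mean year-1 and year-2 rates for the k best players of year 1."""
    best = sorted(range(len(year1)), key=lambda i: year1[i], reverse=True)[:k]
    return (
        sum(year1[i] for i in best) / k,
        sum(year2[i] for i in best) / k,
    )


def run_demo(seed=1, n_players=300, n_chances=100, p_success=0.25, k=10):
    """Simulate two independent seasons and follow year 1's top k players."""
    rng = random.Random(seed)
    year1 = simulate_season(rng, n_players, n_chances, p_success)
    year2 = simulate_season(rng, n_players, n_chances, p_success)
    return top_group_means(year1, year2, k)


if __name__ == "__main__":
    top_y1, top_y2 = run_demo()
    print(f"top 10 in year 1: {top_y1:.3f}; same players in year 2: {top_y2:.3f}")
```

In this setup the year-one leaders look well above average, yet the same players land back near the shared true rate in year two. That is the pattern Law describes, and it is what you would expect to see if clutch ability were mostly noise.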

One caveat that's given is that pitchers (and presumably fielders) do worse when there is a runner on base. This has to do with the added burdens of avoiding a balk and of defending against stolen bases. Unlike general clutch performance, this effect is consistent from year to year and team to team.

It's not mentioned in the book, but this is also a good justification for the context-free nature of fWAR and bWAR. Both statistics actually estimate the number of runs above replacement that a player produced, and then divide by the total number of runs scored in a typical baseball game (about 8-10, but it changes year-to-year).
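The runs-to-wins step described above is a single division, sketched here under stated assumptions. This is not the actual fWAR or bWAR formula; the function name `runs_to_wins` and all input numbers are hypothetical, and the divisor is simply a value from the 8-10 range quoted above.

```python
def runs_to_wins(runs_above_replacement, runs_per_game):
    """Convert runs above replacement into wins above replacement.

    runs_per_game is the divisor described in the text: the total runs
    scored in a typical game (roughly 8-10, varying by season).
    """
    if runs_per_game <= 0:
        raise ValueError("runs_per_game must be positive")
    return runs_above_replacement / runs_per_game


if __name__ == "__main__":
    # Hypothetical player: 45 runs above replacement in a 9-run environment.
    print(runs_to_wins(45, runs_per_game=9.0))

    # Hypothetical framing case: 0.30 runs saved per game over an assumed
    # 80 games caught, in an assumed 10-run environment.
    framing_runs = 0.30 * 80
    print(round(runs_to_wins(framing_runs, runs_per_game=10.0), 2))
```

Because the divisor depends only on the league's scoring level and not on the inning or score, the same run is worth the same fraction of a win whenever it happens, which is what makes the estimate context-free.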

* From watching Jose Molina on TV, it wasn't the hitting so much as the running that was awful - when he hit a home run, the TV would run commercials until he had at least rounded 3rd. It turns out he never rushed because he was a wizard, and a wizard is never late.

Saturday, 26 January 2019
