Edit: Please see my later post as well, which corrects an omission here.
Miguel Cabrera has a shot at the Triple Crown this year. No one has done it since Carl Yastrzemski in 1967. Is it really possible that Cabrera could win the Triple Crown and not win the MVP? Well, yes. Every advanced stats guy out there is trumpeting Mike Trout for MVP, citing his “wins above replacement” (WAR) above 10 (next best in the majors is 6.8) and his 13 “total zone total fielding runs above average” (basically, the number of runs he has saved with his fielding, compared to an average fielder).
The discussion is eerily similar to the AL Cy Young conversation in 2010. Felix Hernandez won because he led the AL in innings pitched, ERA, and, most importantly, WAR, even though his win-loss record was a mediocre 13-12.
The 2010 Cy Young was a victory for sabermetricians. Pitchers can’t control how many runs their offense scores. All they can do is put up a low ERA and stick around for as many innings as possible. Strikeouts help too, since they reduce the risk of errors, and walks hurt, since fielders can’t do anything about a walk. There might be some cases where pitchers rise to the occasion in a close game to get a win, but for the most part, getting a “win” has little to do with pitcher skill after accounting for pitchers’ direct performance statistics.
2012 MVP: the Saber-Men After Party?
This time around, sabermetric thinking is stacked heavily against Cabrera (and the media is paying attention):
- RBIs are meaningless. After accounting for total bases and on-base percentage in some way, RBIs have little to do with individual skill
- Cabrera LEADS THE AL IN GROUNDING INTO DOUBLE PLAYS with 28, which is not captured by any traditional stat (granted, he has Austin Jackson’s high OBP in front of him, so he has lots of chances)
- Trout steals lots of bases and never gets caught (46 for 50 this year), which also isn’t captured by traditional metrics
- Cabrera is a poor fielder (10 runs worse than average at third base), while Trout is a good fielder (mentioned above)
All these factors lead to Trout’s 10.4 to 6.7 WAR advantage over Cabrera. If voters take these numbers seriously, it seems that we’ll be looking at another win for the number crunchers.
But What is WAR Anyway?
Nearly four extra wins is a lot, and WAR is widely accepted as meaningful. But before I leap on the Trout-wagon: is WAR actually a good statistic? Here’s a snippet from Baseball Reference’s WAR explanation:
There is no one way to determine WAR. There are hundreds of steps to make this calculation, and dozens of places where reasonable people can disagree on the best way to implement a particular part of the framework.
Uh oh . . . hundreds of steps is never a good sign, especially when people can easily disagree on many of them. Increased complexity is not always better! Whether accurate or not, this is guaranteed to get you a stat that is (a) very hard to understand, (b) impossible to reverse engineer, and therefore (c) difficult to evaluate. Loosely, WAR tells you how many wins a player generates above what a “replacement player” would generate (i.e., a pretty good Triple A player that is readily available). It would add 500-1,000 words to this post to explain how it’s calculated, and I probably wouldn’t get all the details right because it’s nearly impossible to cover every assumption.
A New Saber: True Runs
I’ve been suspicious of WAR and other complex wins measures for a while, and since I’m a Tigers fan, I figured this is as good a time as any to try an alternative. For now, I’ll stick to the offensive side due to data constraints. I just want a measure of how many runs a player accounts for over the course of the season with his bat.* There will be none of this “replacement player” business, which introduces another parameter to be estimated (read: more obfuscation and error) with no clear improvement in determining the MVP.
Here’s the procedure:
- Using the last 20 years, regress total runs scored by each team each season on total singles, doubles, triples, homers, stolen bases, caught stealing, walks, strikeouts, double plays, and hit by pitches in that season
- Take the coefficients from this regression, multiply them by each individual’s stats, and add up the result
Two steps, that’s it. The team stats are readily available at Baseball Reference with 20 copy-pastes. I’ve made a few assumptions where “reasonable people can disagree,” so I’ll mention those:
- I included all outcomes of an at bat that are directly under a player’s control, with perhaps the exception of sacrifices (not very common, and many of these are probably accidents)
- I excluded all outcomes of an at bat that are outside a player’s control, with the exception of double plays, which are partially due to player skill (fleetness of foot, ground balls vs. balls in the air), but partially due to teammates and luck (who’s on base)**
- Due to data constraints, I don’t account for base running apart from stolen bases and caught stealing***
In honor of my True Wins football measure of team quality (which awards a half win for close wins and losses and a full win for blowout wins), let’s call the resulting measure “True Runs.” Like True Wins, True Runs is designed to be simple and understandable. True Runs has a regression in it, but the intuition is clear: the regression tells you how many runs are generated by each action on average, which I then apply to each individual’s stat line.
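For anyone who wants to try the two steps at home, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the function names, the choice to fit without an intercept (the procedure above doesn't say either way), and especially the data, which is synthetic and built from made-up weights rather than the actual Baseball Reference team totals or my coefficients. The only point is to show the mechanics: fit team runs on team events, then apply the coefficients to one player's stat line.

```python
import random

# Batting events used as regressors, in the order listed in the procedure.
EVENTS = ["1B", "2B", "3B", "HR", "SB", "CS", "BB", "SO", "GIDP", "HBP"]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_run_values(team_rows, team_runs):
    """Step 1: least-squares fit of team runs on team event totals.

    Assumption: no intercept term. Uses the normal equations (X'X) b = X'y.
    """
    k = len(EVENTS)
    xtx = [[sum(r[i] * r[j] for r in team_rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(team_rows, team_runs))
           for i in range(k)]
    return dict(zip(EVENTS, solve(xtx, xty)))

def true_runs(run_values, stat_line):
    """Step 2: multiply each coefficient by the player's count and add up."""
    return sum(run_values[e] * stat_line.get(e, 0) for e in EVENTS)

# Synthetic, noise-free "team seasons" generated from MADE-UP weights, just
# to show that the fit recovers them. These are NOT the post's coefficients.
MADE_UP_WEIGHTS = [0.4, 0.7, 1.0, 1.4, 0.2, -0.2, 0.3, -0.1, -0.4, 0.3]
rng = random.Random(0)
team_rows = [[rng.randint(0, 100) for _ in EVENTS] for _ in range(600)]
team_runs = [sum(w * x for w, x in zip(MADE_UP_WEIGHTS, row))
             for row in team_rows]

run_values = fit_run_values(team_rows, team_runs)

# A hypothetical stat line (not any real player's numbers).
hypothetical_player = {"1B": 100, "HR": 30, "BB": 60, "GIDP": 20}
print(round(true_runs(run_values, hypothetical_player), 1))
```

With real data you would swap the synthetic rows for the copy-pasted team totals, and you would probably reach for a stats package (for example `numpy.linalg.lstsq`) instead of the hand-rolled solver, which is here only to keep the sketch dependency-free.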
Who’s the Real MVP?
It’s the moment you’ve been waiting for . . . who’s the MVP according to True Runs? Here are the numbers for Cabrera and Trout, along with the regression coefficients (which I also find very interesting):
So, Cabrera has generated about 15 more runs on offense than Trout. It’s worth noting that Cabrera has 67 more plate appearances (Trout started this year in the minors, if you can believe it), giving them nearly identical 0.20 and 0.19 True Runs per plate appearance. For MVP consideration, I think total True Runs is more important.
Cabrera has an advantage offensively, but I don’t want to ignore defense completely, because Trout is clearly better. For that, I’ll have to resort blindly to the Total Zone Total Fielding Runs Above Average mentioned at the outset. Cabrera is 10 runs worse than the average third baseman and Trout is 13 runs better than the average center fielder, so Trout seems to be 23 runs better on defense, more than offsetting his 14.6 True Runs deficit. However, these defensive measures are position-specific, and it’s unclear whether the average runs attributed to center fielders and third basemen are the same or different. It seems likely that third basemen face more situations where fielding skill matters, though I have no hard evidence to back that up.
The long and short of it: it’s a close call. With defense (sloppily) included, Trout is about 8 runs better, which is less than one win using the usual approximation of about 10 runs per win. Maybe it wouldn’t be so bad if traditionalists got their way (I think they will if the Tigers make the playoffs and the Angels miss), and maybe the saber-men should reflect more on these incomprehensible stats before they use them to argue about the MVP.
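The back-of-the-envelope arithmetic behind that comparison, as a short sketch. The inputs are the figures quoted in this post; the function name is just for illustration, and subtracting fielding runs across two different positions is exactly the sloppy step flagged above.

```python
RUNS_PER_WIN = 10  # rule-of-thumb conversion, about 10 runs per win

def net_gap_for_trout(offense_gap, trout_fielding, cabrera_fielding):
    """Trout's overall run advantage: fielding gap minus Cabrera's bat lead.

    Caveat: the fielding numbers are relative to different positions
    (CF vs. 3B), so treating them as comparable is an assumption.
    """
    defense_gap = trout_fielding - cabrera_fielding
    return defense_gap - offense_gap

# 14.6 = Cabrera's True Runs lead; +13 and -10 = Total Zone fielding runs.
gap = net_gap_for_trout(offense_gap=14.6, trout_fielding=13, cabrera_fielding=-10)
print(round(gap, 1), "runs, or about", round(gap / RUNS_PER_WIN, 2), "wins")
```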
A couple of interesting notes from the table above:
- A single (0.38 True Runs) is worth more than a walk (0.35 True Runs) and a walk is worth more than a steal (0.16 True Runs) because singles and walks can advance existing baserunners.
- The benefit of a steal (0.16 True Runs) and the cost of a caught stealing (-0.23 True Runs) suggest that running will increase True Runs whenever the guy has at least a 59% success rate. This is for the average situation. It does not make sense to send a 60% guy with Cabrera or Trout at the plate.
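The break-even rate in the second note comes from setting the expected value of an attempt to zero: p × 0.16 − (1 − p) × 0.23 = 0, so p = 0.23 / (0.16 + 0.23). Here is a small sketch of that calculation, plus what Trout's 46-for-50 line works out to using the same two coefficients. The function names are mine, and the numbers only describe the average situation.

```python
STEAL_VALUE = 0.16   # True Runs gained per stolen base (from the regression)
CAUGHT_COST = 0.23   # True Runs lost per caught stealing

def break_even_rate(steal_value, caught_cost):
    """Success rate p where p * steal_value - (1 - p) * caught_cost == 0."""
    return caught_cost / (steal_value + caught_cost)

def running_value(steals, caught, steal_value=STEAL_VALUE, caught_cost=CAUGHT_COST):
    """Net True Runs from a player's stolen-base attempts."""
    return steals * steal_value - caught * caught_cost

print(round(break_even_rate(STEAL_VALUE, CAUGHT_COST), 3))  # 0.59
print(round(running_value(46, 4), 2))  # Trout: 46 steals, caught 4 times
```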
Below, I calculated True Runs for 19 top AL players. I’m pretty confident I got the top 10, though after that there might be some others that I missed (I listed offensive WAR for comparison):
There are some curious differences here. A quick glance suggests that True Runs rewards homers more than oWAR (Adam Dunn!), but I’ll have to investigate further. Also, looking at Cabrera’s and Trout’s oWAR, it’s hard to figure how Trout ends up so far ahead in total WAR. Yes, Trout is somewhat better than average on defense, but is he 2.5 wins better? Cabrera is worse than average by a similar amount and his WAR is only 0.7 lower than his oWAR. I wish I could tell you what’s going on here, but I’ve given up trying to understand the WAR spider web.
*I could convert the runs stat to wins in some way. Most wins stats divide by 10 or do something more complex, but that’s more steps and more parameters. I can’t think of a good reason why this parameter should be different for different players (I don’t believe in “clutch hitting”), so it adds nothing but unnecessary complexity.
**I included double plays in part because Cabrera has so many. I don’t want to discount that since I’m a Tigers guy, even though there’s never anyone on for Trout’s first at bat.
***Trout is surely a better baserunner than Cabrera, so this will hurt Trout, but it’s very hard to quantify without additional data. Using more data might help accuracy, but it also makes things more complicated, which I’m trying to avoid.