
Sabermetrics: Cabrera vs. Trout, Round 2

Last week, I entered the fray on the Mike Trout versus Miguel Cabrera AL MVP debate. It’s similar to the 2010 AL Cy Young discussion: Felix Hernandez led the AL in ERA and finished one strikeout shy of the league lead, but managed just a 13-12 record because Seattle couldn’t score. The new era of baseball stats won out. Voters ignored wins, which have little to do with pitching quality, and Hernandez won the award.

Likewise, Trout lags Cabrera in highly publicized but somewhat meaningless stats (RBI, the Triple Crown categories). Some saber-men would have you believe that Trout laps Cabrera in the only stat that matters (a WAR over 10, compared to 7 for Cabrera), but that requires a level of trust that I don’t have. WAR (Wins Above Replacement) is complicated to the point of complete confusion. Cabrera contributed more in some categories (doubles, homers, total bases, batting average) but less in others (triples, baserunning, defense). Is WAR capturing these contributions accurately?

True Runs Revised (A WAR Replacement)

Rather than critique WAR (which would take days), I developed a new, simpler stat: True Runs. True Runs (named in honor of my True Wins football statistic) estimates a player’s contribution to his team using only basic counting statistics. I got some good comments on the methodology, and what better time to revise it than now, while MVP chants ring out at Comerica Park in Detroit.

Per DRDR’s comment, I included outs/reached on error in the revised methodology:

  1. Using data since 1990, regress the total runs scored by each team in each season on that season’s total singles, doubles, triples, homers, walks, hit by pitches, ordinary outs/reached on error, strikeouts, double plays, stolen bases, and caught stealing
  2. Multiply each coefficient from this regression by the player’s total in the corresponding stat, and add up the results
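The two steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the author’s actual code: it uses NumPy’s ordinary least squares (`np.linalg.lstsq`), it assumes no intercept term (so a team’s predicted runs are purely a weighted sum of its events), and the column labels in `EVENTS` and the function names are hypothetical. The demo runs on randomly generated numbers, not real team data, only to check that the fit recovers weights that are known in advance.

```python
import numpy as np

# Hypothetical column labels for the stats listed in step 1, in regression order.
EVENTS = ["1B", "2B", "3B", "HR", "BB", "HBP", "OUT_ROE",
          "SO", "GIDP", "SB", "CS"]


def fit_run_weights(team_stats: np.ndarray, team_runs: np.ndarray) -> np.ndarray:
    """Step 1: least-squares regression of team runs on team event totals.

    team_stats has one row per team-season and one column per event in EVENTS.
    Assumption: no intercept is fitted, so predicted runs are a pure weighted sum.
    """
    weights, *_ = np.linalg.lstsq(team_stats, team_runs, rcond=None)
    return weights


def true_runs(player_stats: np.ndarray, weights: np.ndarray) -> float:
    """Step 2: multiply each of a player's event counts by its weight and sum."""
    return float(player_stats @ weights)


if __name__ == "__main__":
    # Synthetic check only: randomly generated numbers, not real MLB data.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 1200, size=(300, len(EVENTS))).astype(float)
    hidden = rng.normal(0.3, 0.3, size=len(EVENTS))  # made-up "true" run values
    y = X @ hidden                                    # noiseless toy team runs
    w = fit_run_weights(X, y)
    print(np.allclose(w, hidden))                     # expected: True
    toy_player = rng.integers(0, 150, size=len(EVENTS)).astype(float)
    print(true_runs(toy_player, w))                   # True Runs for a made-up stat line
```

Because True Runs in this sketch is a plain weighted sum, it is additive: if a team’s event totals are the sums of its hitters’ totals, the hitters’ True Runs add up to the team’s predicted runs. That property is one reason this sketch leaves out the intercept; the original post does not say how the intercept was handled.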

Intuitively, the regression finds the weighting of all these stats that most closely approximates total runs scored across all teams in all years. The result: True Runs now captures the four basic things a hitter can do at the plate (walk, get a hit, make an out/reach on an error, strike out) as well as steals. The regression coefficients approximate how many runs each of these actions is worth, on average.*
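In symbols, here is a sketch of what “most closely approximate” means, using notation introduced only for illustration (it is not from the original post): let $R_t$ be the runs scored in team-season $t$ and $x_{tk}$ the team’s count of event $k$ (singles, doubles, and so on). Ordinary least squares picks the weights that minimize the squared gap between actual and fitted runs, and a player $i$’s True Runs is his own event counts $x_{ik}$ weighted by those coefficients (assuming, as above, no intercept term):

```latex
\hat{\beta} = \arg\min_{\beta} \sum_{t} \Big( R_t - \sum_{k} \beta_k \, x_{tk} \Big)^2,
\qquad
\text{True Runs}_i = \sum_{k} \hat{\beta}_k \, x_{ik}
```

For example, with a hypothetical fitted weight of 1.4 runs per homer, a 30-homer season would contribute $1.4 \times 30 = 42$ True Runs from homers alone, before adding the other events.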

Here are the top 10 for 2012 across both leagues:
