It seems like everywhere you turn in sports statistics, someone is talking about Pythagorean expectations. The Pythagorean expectation is a formula created by Bill James to estimate team quality in baseball, with runs scored and runs allowed as inputs. Proponents argue that luck plays a big part in close games, making point totals a better measure of team quality than wins. The formula is
Pythagorean = Runs scored^c / (Runs scored^c + Runs allowed^c),
where c is some exponent (usually greater than one) that can be calibrated. This kinda looks like the old Pythagorean theorem from grade school (hence the name), though not really. The Pythagorean rises as you score more and drops as your opponents score more, in both cases at a decreasing rate unless the season totals are very lopsided. In other words, the Pythagorean rewards teams for blowouts and punishes them for getting spanked, since these scoring outcomes may reflect team quality. If you score the same amount as your opponents, your Pythagorean is 0.5. The max is one and the min is zero, like a winning percentage. “Pythagorean wins” are given by the Pythagorean multiplied by the number of games.
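For concreteness, here is a minimal Python sketch of the formula above. The function names are hypothetical, the default c = 2.37 is the NFL exponent mentioned later in this post, and returning 0.5 when neither side has scored is an assumption made only to avoid dividing by zero.

```python
def pythagorean(scored: float, allowed: float, c: float = 2.37) -> float:
    """Pythagorean expectation: scored^c / (scored^c + allowed^c), between 0 and 1."""
    if scored == 0 and allowed == 0:
        # Assumption: with no points either way, call the team exactly average.
        return 0.5
    return scored ** c / (scored ** c + allowed ** c)


def pythagorean_wins(scored: float, allowed: float, games: int, c: float = 2.37) -> float:
    """'Pythagorean wins': the Pythagorean multiplied by the number of games."""
    return pythagorean(scored, allowed, c) * games


# Scoring exactly as much as your opponents gives 0.5, i.e. 8 wins over 16 games.
print(pythagorean_wins(350, 350, 16))  # 8.0
```

Dividing the top and bottom by allowed^c shows that the result depends only on the ratio of points scored to points allowed, not on their absolute size.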
Over time, this formula has been exported to basketball and football, and probably other sports. My buddy Adrian the Canadian sent me this week’s DVOA update at Football Outsiders, in which Aaron Schatz informs us that Football Outsiders has upgraded from the Pythagorean expectation to the “Pythagenport,” and Baseball Prospectus (the original Pythagenporters) has moved on to the Pythagenpat! These have the same form as the Pythagorean but let the exponent c depend on the number of runs scored and/or the teams involved.
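To make "the exponent depends on scoring" concrete, here is a rough sketch in the spirit of the Pythagenpat, where c is the scoring environment (total runs per game for both teams) raised to a small power. The constant z = 0.287 is an assumed, baseball-style value; this is not Football Outsiders' football version, which would need its own calibration. The function names are hypothetical.

```python
def pythagenpat_exponent(scored: float, allowed: float, games: int, z: float = 0.287) -> float:
    """Exponent that grows with the scoring environment: ((scored + allowed) / games) ** z.

    z = 0.287 is an assumed baseball-style constant, not a football calibration.
    """
    per_game = (scored + allowed) / games
    return per_game ** z


def pythagenpat(scored: float, allowed: float, games: int, z: float = 0.287) -> float:
    """Same functional form as the Pythagorean, but with a scoring-dependent exponent."""
    c = pythagenpat_exponent(scored, allowed, games, z)
    return scored ** c / (scored ** c + allowed ** c)


# A hypothetical baseball team: 800 runs scored, 700 allowed, 162 games.
print(pythagenpat_exponent(800, 700, 162))
print(pythagenpat(800, 700, 162) * 162)
```

In a higher-scoring environment the exponent rises, so the same ratio of scored to allowed translates into a more extreme expectation.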
Maybe this sounds reasonable to you, but these formulas make my head hurt. The Pythagenport even gets a logarithm involved. Where do these crazy functional forms come from? The intuition for these stats is simple: luck wins close games, not skill. How about a simple formula to go with this simple intuition:
True Wins = Blowout wins + Close wins/2 + Close losses/2 + Ties/2
“True Wins” says just what we want. You get full credit for blowouts, but only half credit for eking out close wins, since luck could have swung those either way. For the same reason, I toss in half credit for your close losses. In a probability sense, I’m assuming your expected value for wins in close games is half the total number of close games. The only work left is to calibrate the cutoff for close games versus blowouts. Since it’s football season, let’s consider the NFL. The obvious options are to define close games as those decided by 3 points or fewer (a field goal) or by 7 points or fewer (a touchdown).
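Here is a minimal Python sketch of True Wins, assuming each game is summarized by its point margin (points for minus points against). The function name is hypothetical. A tie has margin 0, so it always falls inside the close-game window and gets the half credit the formula gives it.

```python
from typing import Iterable


def true_wins(margins: Iterable[int], cutoff: int = 3) -> float:
    """True Wins = blowout wins + (close wins + close losses + ties) / 2.

    A game is "close" if it is decided by `cutoff` points or fewer.
    """
    total = 0.0
    for margin in margins:
        if abs(margin) <= cutoff:
            total += 0.5  # close win, close loss, or tie: luck could go either way
        elif margin > 0:
            total += 1.0  # blowout win: full credit
        # blowout loss: no credit
    return total


# Hypothetical margins for seven games (five actual wins).
margins = [10, 3, -2, -14, 21, 6, 5]
print(true_wins(margins, cutoff=3))  # 5.0
print(true_wins(margins, cutoff=7))  # 4.0
```

With the wider 7-point window, the 6- and 5-point wins drop to half credit, which is exactly the calibration choice between the two cutoffs.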
Here is Football Outsiders’ table for this season (sorted by Pythagorean wins) with a column added for True Wins with 7 as the cutoff:
Here it is again with 3 as the cutoff:
First off, the two Pythags (Pythagorean and Pythagenport) are nearly identical. True Wins differs primarily because it does not reward or punish teams for massive blowouts (see: Saints 62, Colts 7), and I don’t think that it should. Once a game gets out of hand, teams probably change their effort level. The second table also confirms my belief that the Cowboys are garbage, but one observation and my personal opinions aren’t enough to measure a stat’s worth. In comparing the Pythagenport to the Pythagorean, Schatz states:
The improvement is slight. The correlation between Pythagorean wins and actual wins for 1990-2010 is .9120. The correlation between Pythagenport and actual wins for 1990-2010 is .9134.
Hmm. By this measure, I can think of a stat that would trounce these Pythags: it’s called “actual wins,” and its correlation with actual wins is one. The whole point of constructing these measures is to filter out the luck that drives actual wins, which necessarily lowers the correlation with actual wins. Whenever I read about exponent choice for the Pythagorean, or new permutations of the Pythagorean, there’s never any objective analysis that drives the decision-making. In fact, it’s not obvious how to evaluate these stats. They are supposed to measure “team quality,” but we don’t have an unbiased measure of team quality with which to compare them.
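A toy simulation illustrates why a good luck filter has to lose to "actual wins" on this test. In a simulation, unlike real life, we can invent each team's true quality. Everything here is an assumption for illustration: each team-season gets a hypothetical win probability drawn uniformly from 0.2 to 0.8, games are independent coin flips at that probability, and opponents are ignored (so the league's wins don't have to balance).

```python
import random
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_team_seasons: int = 320, n_games: int = 16,
             seed: int = 0) -> Tuple[List[float], List[int]]:
    """Return (true expected wins, simulated actual wins) for hypothetical team-seasons."""
    rng = random.Random(seed)
    quality = [rng.uniform(0.2, 0.8) for _ in range(n_team_seasons)]
    # Each game is won with probability p; summing the booleans counts wins.
    actual = [sum(rng.random() < p for _ in range(n_games)) for p in quality]
    expected = [n_games * p for p in quality]  # a perfect, luck-free measure of quality
    return expected, actual


expected, actual = simulate()
print(pearson(expected, actual))  # below 1: luck is still baked into actual wins
print(pearson(actual, actual))    # 1, up to floating-point rounding
```

The exact numbers depend on these made-up assumptions; the point is only the direction: a measure that strips out luck correlates with actual wins at less than one.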
One idea is to test whether these measures are stable over time. If team quality is a consistent characteristic, then a good measure of team quality should also be consistent. Since 2002, the correlation between True Wins (3-point cutoff) over the first eight games of a season and True Wins over the second eight games is 0.50. The same correlation for the Pythagorean (c = 2.37) is 0.53. Between consecutive seasons, the correlation for True Wins is 0.33, and the correlation for the Pythagorean is 0.38. The numbers are pretty similar using 7 points as the cutoff. The Pythagorean does a little better, but I bet I could overcome the difference with some adjustments for garbage-time scoring.
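The split-half stability check can be sketched as follows, assuming each team-season is a chronological list of (points scored, points allowed) pairs. The helper names are hypothetical, and the data at the bottom is a made-up sanity check (every team replays its first eight games exactly), not real NFL results. Passing the stat in as a function lets the same check run on any measure.

```python
from typing import Callable, List, Sequence, Tuple

Game = Tuple[int, int]  # (points scored, points allowed)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def true_wins(games: Sequence[Game], cutoff: int = 3) -> float:
    """Full credit for blowout wins, half credit for any game within `cutoff` points."""
    total = 0.0
    for scored, allowed in games:
        margin = scored - allowed
        if abs(margin) <= cutoff:
            total += 0.5
        elif margin > 0:
            total += 1.0
    return total


def pythagorean_wins(games: Sequence[Game], c: float = 2.37) -> float:
    """Pythagorean wins over the given games (assumes some points were scored)."""
    scored = sum(s for s, _ in games)
    allowed = sum(a for _, a in games)
    return len(games) * scored ** c / (scored ** c + allowed ** c)


def split_half_correlation(seasons: List[List[Game]],
                           stat: Callable[[Sequence[Game]], float],
                           half: int = 8) -> float:
    """Correlate a stat over the first `half` games with the same stat over the rest."""
    first = [stat(games[:half]) for games in seasons]
    second = [stat(games[half:]) for games in seasons]
    return pearson(first, second)


# Made-up sanity check: a perfectly repeated schedule should give a correlation of 1.
halves = [
    [(30, 10)] * 8,
    [(10, 30)] * 8,
    [(20, 18)] * 8,
    [(24, 10)] * 4 + [(10, 24)] * 4,
]
seasons = [h + h for h in halves]
print(split_half_correlation(seasons, true_wins))         # about 1.0
print(split_half_correlation(seasons, pythagorean_wins))  # about 1.0
```

On real game logs, the two printed numbers would correspond to the first-half versus second-half comparison above.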
Next, let’s consider whether these stats predict actual future wins (GAMBLING ALERT!!!!!!!!!!!!). These correlations will never be one (since there is random, unpredictable luck in every game), but a stronger correlation means a more useful stat for team management and fans. The correlation between first-half True Wins (3-point cutoff) and second-half actual wins is 0.47. The same correlation for the Pythagorean is 0.50. However, the correlation between actual wins from first half to second half is 0.45. Between seasons, the correlations are 0.29 (True Wins to actual wins), 0.31 (Pythagorean to actual wins), and 0.30 (actual wins to actual wins). When thinking about future performance, teams can learn as much from their win total as from the formulas.
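The predictive comparison differs from the stability check in one way: the first-half stat is correlated with second-half actual wins, and plugging actual wins in as the first-half stat gives the baseline the formulas have to beat. This is a sketch assuming each team-season is a chronological list of (points scored, points allowed) pairs; the helper names are hypothetical, counting a tie as half a win is an assumption, and the seasons below are made up only to show the calls.

```python
from typing import Callable, List, Sequence, Tuple

Game = Tuple[int, int]  # (points scored, points allowed)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def actual_wins(games: Sequence[Game]) -> float:
    """Actual wins, counting a tie as half a win (an assumption)."""
    return sum(1.0 if s > a else 0.5 if s == a else 0.0 for s, a in games)


def pythagorean_wins(games: Sequence[Game], c: float = 2.37) -> float:
    """Pythagorean wins over the given games (assumes some points were scored)."""
    scored = sum(s for s, _ in games)
    allowed = sum(a for _, a in games)
    return len(games) * scored ** c / (scored ** c + allowed ** c)


def predictive_correlation(seasons: List[List[Game]],
                           stat: Callable[[Sequence[Game]], float],
                           half: int = 8) -> float:
    """Correlate a first-half stat with second-half actual wins across team-seasons."""
    predictors = [stat(games[:half]) for games in seasons]
    outcomes = [actual_wins(games[half:]) for games in seasons]
    return pearson(predictors, outcomes)


# Made-up 16-game seasons, only to show the calls (not real NFL data).
seasons = [
    [(30, 10)] * 8 + [(27, 20)] * 8,
    [(10, 30)] * 8 + [(17, 21)] * 8,
    [(20, 18)] * 8 + [(14, 17)] * 8,
    [(24, 10)] * 4 + [(10, 24)] * 12,
]
baseline = predictive_correlation(seasons, actual_wins)  # wins predicting wins
pythag = predictive_correlation(seasons, pythagorean_wins)
print(baseline, pythag)
```

On real data, the interesting question is whether a formula clearly beats `baseline`; in the numbers quoted above, it only does so slightly.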
The Pythagorean has a great name and pedigree (Daryl Morey discovered 2.37 as the “best” NFL value), and it can be calculated from league standings tables on most websites. True Wins is more intuitive and nearly as powerful. But this whole process is a bit misguided. The Pythagorean functional forms are ad hoc and complicated, and these stats ignore lots of useful information (for example, strength of schedule). I’m sure the “computer rankings” in the BCS or Football Outsiders’ DVOA stat could do better by almost any measure. Simple stats are transparent, catchy, and easy to calculate, but if you want powerful stats, I think you need the big guns.