Category Archives: Sports Stats

Michigan: Destined for an early exit?

Posted on February 16, 2013 | 3 comments

Michigan is my favorite college basketball team, and for the first time in a while, they are threatening to make a deep tournament run. However, they just lost three of four during a tough stretch against Indiana (L, away), Ohio State (W, home), Wisconsin (L, away), and Michigan State (L, away). I’m not writing them off, since they only lost the away games, but some bad signs appeared in these games. Here’s the Game Stack for all four combined:

Michigan looks good on turnovers, but that comes at a cost: they get crushed on free throws and two-point percentage. Having watched the games, I can connect the dots for you. The Wolverines don’t drive to the hoop much against good teams. They have some great shooters (Trey Burke, Tim Hardaway, Jr.) who can get reasonably open and are happy to “settle” for jumpers.

This keeps the ball out of danger in the lane (low turnovers), but it means that Michigan never gets to the line and shoots a lower percentage on twos as well. Michigan also rebounds a lower percentage of their own misses than the opponent, which could be related — a lot of “second chances” are just put-backs after a shot close to the hoop.

So, is Michigan sunk? We’ll see. I have some faith that Mitch McGary can improve and find some high percentage twos down low, but right now, Michigan is probably not efficient enough offensively and not good enough on the boards to compete with the best teams in the country. I would worry less about four games if the problem were just poor shooting in a small sample, but the problem seems to be about playing style against good defenses. I don’t think that’s going to change.

If you’re interested, here are the Game Stacks for all four games. The trends I discussed are pretty consistent across the games:

Posted in Basketball, College Sports, Commentary, Sports Stats

Tagged basketball, basketball alternative box score, basketball analysis, Big Ten Conference, box score, college basketball, game stacks, Indiana, Michigan, Michigan basketball, Michigan basketball analysis, Michigan free throws, Michigan Indiana, Michigan Michigan St, Michigan Ohio State, Michigan Wisconsin, Michigan Wolverines, Ohio State University, Sports, sports column chart, sports data graphs, sports data visualization, sports stacked bar chart, Tim Hardaway, Tim Hardaway Jr., Trey Burke, U of M, visual box score, visualization, Wisconsin, Wolverine

Basketball Stacks part 2: Rebounding

Posted on February 8, 2013 | 4 comments

Yesterday, I posted a new idea for visualizing box scores: Game Stacks. While the first version did a good job of showing shooting percentages and turnover rates, it didn’t do a good job on rebounds. As my pops pointed out, Indiana had a big rebounding advantage over Michigan by the basic numbers (36-22), so it seemed wrong to rely only on the height of the stacks to determine who rebounded better. The reality: Michigan got more chances not because they rebounded better, but because they had more misses — and you have to miss to get a second chance. The height of the stacks just showed that Michigan got more offensive rebounds, even though their rebounding rate was terrible.

So, round two. Here’s the Michigan-Indiana Game Stack redesigned to capture rebounding:

Without play by play data, I had to keep the rebounding simple — I figured out the offensive rebound rate for each team:

Off reb rate = your off rebs/(their def rebs + your off rebs).

Then, I multiplied this rate by the relevant number of shots to generate the “Missed (O Reb)” category for each type of shot (the dashed regions). Each dashed/empty combo now visualizes the offensive rebound rate for the relevant team.
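As a rough illustration of the calculation above, here is a minimal Python sketch. The function names are hypothetical, the numbers in the usage lines are made up (they are not the Michigan-Indiana totals), and I am assuming that “the relevant number of shots” means the misses of each shot type.

```python
def off_reb_rate(own_off_rebs, opp_def_rebs):
    """Share of a team's own misses that it rebounds."""
    chances = own_off_rebs + opp_def_rebs
    if chances == 0:
        return 0.0  # no rebounds available, so no rate to report
    return own_off_rebs / chances


def missed_with_oreb(missed_shots, rate):
    """Estimated misses of one shot type followed by an offensive rebound.

    Assumption: the overall rate applies equally to every shot type,
    since play-by-play data is not available.
    """
    return missed_shots * rate


# Hypothetical box-score numbers, for illustration only.
rate = off_reb_rate(own_off_rebs=10, opp_def_rebs=30)
dashed_twos = missed_with_oreb(missed_shots=20, rate=rate)
print(rate, dashed_twos)
```

The rest of the misses of that type (here, `20 - dashed_twos`) would form the empty region of the stack.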

Now the picture is clearer:

Michigan got a couple extra chances, but Continue reading →

Posted in Basketball, College Sports, Sports Stats

Tagged basketball, basketball graphic, Boston, Boston Celtics, box score, Celtics, Celtics offensive rebounding, Clippers, college hoops, defensive breakdowns, Dick Vitale, Free throw, Game Stack, game stacks, Golden State Warriors, graphical statistics, graphics sports, Hoosier, Houston Rockets, Indiana, Indiana basketball, indiana game, Lakers, lakers pistons, Los Angeles Lakers, Michigan, Michigan basketball, NBA, nba game, offensive rebound, Pistons, point attempts, Rebound (basketball), rebounding advantage, Rockets, Rockets 23 three pointers, Rockets three pointers, shooting percentages, shot attempts, Sports, sports statistics, Three-point field goal, turnover rates, visual shooting percentages, visual statistics, visualization, visualizing basketball games, Warriors, Wolverines

Visualization: Basketball Game Stacks

Posted on February 5, 2013 | 4 comments

Note: On my dad’s advice, I posted another version of the Game Stacks that depicts rebounding rates, rather than just total offensive rebounds. The discussion in this post is a little naive on that point — the new version yields a better analysis of rebounding.

I have a general hang up when looking at the box score for basketball (or listening to announcers list off statistics). I see some rebounding numbers, but I can’t tell who rebounded better without offensive and defensive breakdowns, plus the number of shots missed by each team. And I see shooting percentages and shot attempts, but it’s hard to put it all together into how a team got its points.

I realized that what I really want to see is not complicated. Here’s the list:

- What each team did with their scoring chances:
  - Two point attempts
  - Three point attempts
  - Free throw trips (2 attempts)
  - Turnovers
- Efficiency on each type of shot
- Rebounding advantage in terms of extra scoring chances
- And, of course, total score
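The first item on the list can be sketched as a simple tally. This is only an illustration: the function name and the numbers are hypothetical, and it assumes every free throw trip is exactly two attempts, which ignores and-ones and three-shot fouls.

```python
def total_plays(two_pt_att, three_pt_att, ft_att, turnovers):
    """Count a team's chances to score, broken down by how each one ended."""
    ft_trips = ft_att / 2  # assumption: every trip to the line is two attempts
    return two_pt_att + three_pt_att + ft_trips + turnovers


# Hypothetical team totals, for illustration only.
plays = total_plays(two_pt_att=40, three_pt_att=20, ft_att=20, turnovers=10)
print(plays)
```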

All these stats exist, but there should be an easy way to see all of it at once and get a sense for how the game was won. Here’s my first try, the Game Stack:

The picture shows total “plays,” or chances to score, for each team, and total points, broken down by type. In a quick glance, you can see that Indiana was out-rebounded (Michigan got three more chances to score) and turned the ball over a ton. However, on just over 60 non-turnover plays, the Hoosiers Continue reading →

Posted in Basketball, College Sports, Innovative Ideas, Sports Stats

Tagged basketball, basketball graphic, box score, Celtics, Celtics offensive rebounding, Clippers, college hoops, defensive breakdowns, Dick Vitale, Free throw, Game Stack, graphical statistics, graphics sports, Hoosier, Indiana, Indiana basketball, Lakers, Michigan, Michigan basketball, NBA, nba game, Pistons, point attempts, Rebound (basketball), rebounding advantage, shooting percentages, shot attempts, Sports, sports statistics, Three-point field goal, visual shooting percentages, visual statistics, visualization, visualizing basketball games, Wolverines

Playoff Appetizer: True Wins Plus (Fumble Adjusted)

Posted on January 5, 2013 | 6 comments

We might be halfway through the first quarter of the first NFL playoff game of 2013, but I’m still finishing up with baseball and just getting warmed up on football. Football month on the blog officially kicks off today: there’s lots of interesting stuff to come, from innovative rule ideas and play calling to new prediction methods and game analysis. Today, I’m trying an addition to the measure of NFL team quality that I debuted last year: True Wins. True Wins are calculated as follows:

True Wins = Blowout Wins + Close Wins/2 + Close Losses/2 + Ties/2

You may recognize the intuition from Pythagorean expectations: you get full credit for blowout wins (I define this as more than 7 points), but no extra credit for winning by huge margins, and you get half credit for all close games, since those probably come down to luck more than skill. Last year, I showed that True Wins predicts a little better than Pythagorean expectations, and it’s a whole lot more direct. Both measures are much better than using wins alone, which unfairly penalizes (rewards) teams that lose (win) a lot of close games.
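For concreteness, here is a minimal Python sketch of the True Wins formula. The function name and the sample margins are hypothetical, and it assumes the same 7-point threshold separates close losses from blowout losses.

```python
def true_wins(margins, blowout_margin=7):
    """Compute True Wins from per-game point differentials.

    margins: own score minus opponent's score, one entry per game.
    """
    total = 0.0
    for margin in margins:
        if margin > blowout_margin:
            total += 1.0   # blowout win: full credit
        elif margin >= -blowout_margin:
            total += 0.5   # close win, close loss, or tie: half credit
        # blowout loss: no credit
    return total


# Hypothetical six-game stretch, for illustration only:
# blowout win, close win, close loss, tie, blowout loss, blowout win.
print(true_wins([14, 3, -3, 0, -10, 8]))
```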

What Else is Luck-Driven? Fumble Recoveries?

With the playoffs coming right up, I decided to try an improvement that adjusts for possible luck in fumble recoveries as well. Here’s the logic (from Football Outsiders):

Stripping the ball is a skill. Holding onto the ball is a skill. Pouncing on the ball as it is bouncing all over the place is not a skill. There is no correlation whatsoever between the percentage of fumbles recovered by a team in one year and the percentage they recover in the next year. The odds of recovery are based solely on the type of play involved, not the teams or any of their players . . . Fumble recovery is a major reason why the general public overestimates or underestimates certain teams. Fumbles are huge, turning-point plays that dramatically impact wins and losses in the past, while fumble recovery percentage says absolutely nothing about a team’s chances of winning games in the future. With this in mind, Football Outsiders stats treat all fumbles as equal, penalizing them based on the likelihood of each type of fumble (run, pass, sack, etc.) being recovered by the defense.

The keys are:

1. Fumbles are huge turning points in games
2. Teams don’t maintain high or low recovery rates over time

To quantify #1, I determined the point value of a recovery. A simple regression of point differential in each game on total fumbles and fumbles Continue reading →

Posted in Football, Prediction, Sports Stats

Tagged Andrew Luck, Cincinnati Bengals, Football Outsiders, Football Outsiders fumbles, fumble, fumble recoveries, fumble recovery, Green Bay Packers, Houston Texans, Indianapolis Colts, luck, luck football, luckiest NFL teams 2012, lucky, lucky teams, Minnesota Vikings, National Football League, NFL, nfl playoff game, NFL playoffs, NFL prediction, playoff predictions, playoffs, randomness, randomness sports, Seattle Seahawks, Sports, Super Bowl, True Wins, True Wins Plus, Washington Redskins

Sabermetrics: Cabrera vs. Trout, Round 2

Posted on October 6, 2012 | 6 comments

Last week, I entered the fray on the Mike Trout versus Miguel Cabrera AL MVP debate. It’s similar to the 2010 AL Cy Young discussion: Felix Hernandez led the AL in innings pitched and ERA but managed just a 13-12 record because Seattle couldn’t score. The new era of baseball stats won out. Voters ignored wins, which have little to do with pitching quality, and Hernandez won the award.

Likewise, Trout lags Cabrera in highly publicized but somewhat meaningless stats (RBI, Triple Crown). Some saber-men would have you believe that Trout laps Cabrera in the only stats that matter (WAR over 10 compared to 7 for Cabrera), but that requires a level of trust that I don’t have. WAR — Wins Above Replacement — is complicated to the point of complete confusion. Cabrera contributed more in some categories (doubles, homers, total bases, batting average) but less in others (triples, baserunning, defense). Is WAR capturing these contributions accurately?

True Runs Revised (A WAR Replacement)

Rather than critique WAR (which would take days), I developed a new, simpler stat: True Runs. True Runs (named in honor of my True Wins football statistic) estimates a player’s contribution to his team based only on simple statistics. I got some good comments on the methodology, and what better time to revise it than now, while listening to MVP chants ring out at Comerica Park in Detroit?

Per DRDR’s comment, I included outs/reached on error in the revised methodology:

Using data since 1990, regress total runs scored by each team each season on total singles, doubles, triples, homers, walks, hit by pitches, usual outs/reached on error, strikeouts, double plays, stolen bases, and caught stealing in that season
Take the coefficients from this regression, multiply them by each individual’s stats, and add up the result

Intuitively, the regression finds the best way to add up all these stats to most closely approximate total runs scored across all teams in all years. The result: True Runs now captures the four basic things a hitter can do at the plate (walk, get a hit, make an out/reach on an error, strike out) as well as steals. The regression coefficients approximate how many runs each of these actions is worth, on average.*
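The second step (multiply the coefficients by a player’s stats and add them up) can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the coefficient values are placeholders rather than the regression output, only a subset of the stats is shown, the season line is invented, and no intercept is applied.

```python
# Placeholder run values per event. These are NOT the coefficients from the
# regression described above; they only show the shape of the calculation.
PLACEHOLDER_COEFS = {
    "singles": 0.5,
    "doubles": 0.75,
    "triples": 1.0,
    "homers": 1.5,
    "walks": 0.25,
    "outs": -0.25,
}


def true_runs(player_stats, coefs):
    """Sum of (coefficient x player's count) over every stat in the model."""
    return sum(coefs[stat] * player_stats.get(stat, 0) for stat in coefs)


# Hypothetical season line, for illustration only.
player = {"singles": 100, "doubles": 40, "triples": 4,
          "homers": 20, "walks": 60, "outs": 400}
print(true_runs(player, PLACEHOLDER_COEFS))
```

With real coefficients from the team-level regression, the same function would produce each player’s True Runs.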

Here’s the top 10 for 2012 across both leagues Continue reading →

Posted in Baseball, Commentary, Common Sense, Sports Stats

Tagged advanced baseball statistics, American League MVP, Angels playoffs, baseball, baseball statistics, baseball stats regression, Cabrera, Cabrera defense, Cabrera overrated, Cabrera Trout comparison, Cabrera Trout defensive statistics, Cabrera underrated, Cabrera vs Trout, Cabrera WAR, Carl Yastrzemski, comerica, Comerica Park, Cy Young, Cy Young Award, Cy Young Award 2010, Cy Young Felix Hernandez, Detroit Tigers, double plays, felix hernandez, Felix Hernandez WAR, Los Angeles Angels, Miguel Cabrera, Miguel Cabrera Triple Crown, Mike Trout, Most Valuable Player, MVP advanced statistics, offense scores, oWAR, performance statistics, R squared, regression analysis, regression baseball, regression coefficient, regression sports, replacement player, Run batted in, Sabermetrics, sabermetrics Mike Trout, simple sabermetrics, simplified stats, Sports, strikeout, Tigers playoffs, Total Zone Fielding Runs, Total Zone Runs, traditional metrics, Triple Crown, Trout, Trout defense, Trout overrated, Trout underrated, Trout WAR, using regression in baseball stats, WAR, WAR complicated, WAR confusing, WAR definition, WAR explanation, WAR formula, WAR MVP, who should win the AL MVP, Who's better Cabrera or Trout, Wins above replacement

Is Joe Flacco Elite? Barnwell strikes again!

Posted on October 2, 2012 | 1 comment

Bill Barnwell is up to his usual tricks at Grantland. This time, he’s tired of hearing that Flacco is an elite quarterback and wants a new measure of quarterback value. Flacco gets credit for piling up wins, which Barnwell thinks is unfair:

For whatever good or bad Flacco provides, he has spent his entire career as the starting quarterback of the Baltimore Ravens, who perennially possess one of the league’s best defenses. He also has Ray Rice and a solid running game to go alongside him on offense. It’s safe to say that a win by, say, Cam Newton usually requires more work from the quarterback than one by Flacco.

I agree with this wholeheartedly. In response, Barnwell tries to capture quarterback value by creating an “expected wins” measure based on points allowed by the defense and comparing this to actual wins. He argues that a quarterback with more actual wins than expected wins is doing well because he is scoring more points than average.

An example helps explain the concept. First, Barnwell notes that teams have won 86.5% of games recently when allowing between 8 and 12 points. Imagine a team that allows between 8 and 12 points in all 16 games. They are expected to win 86.5% of those games, or 13.8 games. If the team won 14-16 games, Barnwell would argue that the quarterback is doing better than average, while if the team won fewer, Barnwell would argue that the quarterback is doing worse.
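The arithmetic in this example can be sketched in Python as follows. This is my own illustration of the idea, not Barnwell’s code: the function names are hypothetical, only the 8-12 point bucket (86.5%) comes from the text, and the actual win total is made up.

```python
def expected_wins(points_allowed_by_game, win_rate_for):
    """Add up the historical win rate attached to each game's points allowed."""
    return sum(win_rate_for(pa) for pa in points_allowed_by_game)


def example_win_rate(points_allowed):
    # Only the 8-12 point bucket is given in the text. Other buckets would
    # need the full historical table, so they are left unimplemented here.
    if 8 <= points_allowed <= 12:
        return 0.865
    raise ValueError("win rate for this bucket is not given in the post")


# The 16-game example: every game allows between 8 and 12 points.
season = [10] * 16
exp = expected_wins(season, example_win_rate)

actual_wins = 14  # hypothetical
wins_above_expected = actual_wins - exp
print(round(exp, 1), round(wins_above_expected, 2))
```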

As hoped, Flacco is unimpressive by this measure (while the usual suspects — Tom Brady and Peyton Manning — are top dogs). He has 44 wins in 64 regular season games, but because the Ravens D is so good, an average QB would have managed about 42.

Before going further, I’ll warn you: these numbers are pretty meaningless. I’ll start by explaining Continue reading →

Posted in Baseball, Commentary, Common Sense, Football, Sports Stats

Tagged Baltimore Ravens, Barnwell evaluating quarterbacks, Barnwell football stats, Barnwell NFL, baseball, basketball, Bill Barnwell, Bill Barnwell expected wins, Bill Barnwell football, Bill Barnwell stat, Cam Newton, Chicago Bears, Dallas Cowboys, Flacco, Flacco overrated, Flacco underrated, football, Geno Smith, Grantland, Joe Flacco, Matt Ryan, Monday Night Football, National Football League, NBA, NFL, Peyton Manning, Ray Rice, Romo five interceptions, Romo lost the game, tanking, Tony Romo, WAR

Cabrera Might Get the Triple Crown, but Does He Deserve the MVP?

Posted on September 25, 2012 | 22 comments

Edit: Please see my later post as well, which corrects an omission here.

Miguel Cabrera has a shot at the Triple Crown this year. No one has done it since Carl Yastrzemski. Is it really possible that he could win the Triple Crown and not win the MVP? Well, yes. Every advanced stats guy out there is trumpeting Mike Trout for MVP, with his “wins above replacement” (WAR) above 10 (next best in the majors is 6.8) and his 13 “total zone total fielding runs above average” (basically, this is the number of runs he has saved with his fielding, compared to an average fielder).

The discussion is eerily similar to the AL Cy Young conversation in 2010. Felix Hernandez won because he led the AL in innings pitched, ERA, and, most importantly, WAR, even though his win-loss record was a mediocre 13-12.

The 2010 Cy Young was a victory for sabermetricians. Pitchers can’t control how many runs their offense scores. All they can do is put up a low ERA and stick around for as many innings as possible. Strikeouts help too, since they reduce the risk of errors, and walks hurt, since fielders can’t do anything about a walk. There might be some cases where pitchers rise to the occasion in a close game to get a win, but for the most part, getting a “win” has little to do with pitcher skill after accounting for pitchers’ direct performance statistics.

2012 MVP: the Saber-Men After Party?

This time around, sabermetric thinking is stacked heavily against Cabrera (and the media is paying attention):

- RBIs are meaningless. After accounting for total bases and on base percentage in some way, RBIs have little to do with individual skill
- Cabrera LEADS THE AL IN DOUBLE PLAYS with 28, which is not captured by any traditional stat (granted, he has Austin Jackson’s high OBP in front of him, so he has lots of chances)
- Trout steals lots of bases and never gets caught (46 for 50 this year), which also isn’t captured by traditional metrics
- Cabrera is a poor fielder (10 runs worse than average at third base), while Trout is a good fielder (mentioned above)

All these factors lead to Trout’s 10.4 to 6.7 WAR advantage over Cabrera. If voters take these numbers seriously, it seems that we’ll be looking at another win for the number crunchers.

But What is WAR Anyway?

Four extra wins is a lot and WAR is widely accepted as meaningful, but before I leap on the Trout-wagon, is WAR actually a good statistic? Here’s a snippet from Baseball Reference’s WAR explanation:

There is no one way to determine WAR. There are hundreds of steps to make this calculation, and dozens of places where reasonable people can disagree on the best way to implement a particular part of the framework.

Uh oh . . . hundreds of steps is never a good sign, Continue reading →

Posted in Baseball, Commentary, Common Sense, Sports Stats

Tagged advanced baseball statistics, American League MVP, Angels playoffs, baseball, baseball statistics, Cabrera, Cabrera defense, Cabrera overrated, Cabrera Trout comparison, Cabrera Trout defensive statistics, Cabrera underrated, Cabrera vs Trout, Cabrera WAR, Carl Yastrzemski, Cy Young, Cy Young Award, Cy Young Award 2010, Cy Young Felix Hernandez, Detroit Tigers, felix hernandez, Felix Hernandez WAR, Los Angeles Angels, Miguel Cabrera, Miguel Cabrera Triple Crown, Mike Trout, Most Valuable Player, MVP advanced statistics, offense scores, oWAR, performance statistics, replacement player, Run batted in, Sabermetrics, sabermetrics Mike Trout, simple sabermetrics, simplified stats, Sports, Tigers playoffs, Total Zone Fielding Runs, Total Zone Runs, traditional metrics, Triple Crown, Trout, Trout defense, Trout overrated, Trout underrated, Trout WAR, WAR, WAR complicated, WAR confusing, WAR definition, WAR explanation, WAR formula, WAR MVP, who should win the AL MVP, Who's better Cabrera or Trout, Wins above replacement

More NBA spatial data

Posted on May 24, 2012 | 2 comments

Adrian the Canadian — my designated Deadspin trawler — sent me an interesting graphic by Kirk Goldsberry and Matt Adams showing the highest percentage shooters from various regions of the court. You might recall that Goldsberry presented similar work at the Sloan Sports Analytics Conference in March (runner up for the research award). My take on this work is that, while interesting and impressive in terms of data, much of the spatial variation in shooting could be explained by factors other than location-specific shooting ability (this will sound familiar if you read my post yesterday on player tracking data).

First, random chance is an issue, especially when trying to identify the best shooters at each location. I think Goldsberry requires a certain number of shots for inclusion at each spot, but he doesn’t do the statistical analysis to determine whether the differences he presents are statistically significant (i.e., large enough such that they are probably not due to chance variation). His big surprise — Rondo leading the league in one mid-range zone — is likely based on a fairly small sample of shots.
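To make the point about chance variation concrete, here is a minimal sketch of one standard check, a pooled two-proportion z-test, written in Python. This is my own illustration and not part of Goldsberry’s analysis; the function name and the shot counts are hypothetical.

```python
import math


def two_proportion_z(made_a, shots_a, made_b, shots_b):
    """z statistic for the gap between two shooting percentages (pooled)."""
    p_a = made_a / shots_a
    p_b = made_b / shots_b
    pooled = (made_a + made_b) / (shots_a + shots_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shots_a + 1 / shots_b))
    return (p_a - p_b) / se


# Hypothetical: a 30-of-50 shooter (60%) versus a 250-of-500 shooter (50%)
# from the same zone.
z = two_proportion_z(30, 50, 250, 500)
significant_at_5pct = abs(z) > 1.96  # two-sided test at the 5% level
print(round(z, 2), significant_at_5pct)
```

In this made-up case, a ten-point gap in shooting percentage is not statistically significant because one of the samples is only 50 shots.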

Second, defensive position is missing from the analysis. A big red flag for this one is that Durant, at only 40% shooting, leads in the three point zone just to the shooter’s right at the top of the key. Every other three point zone has a guy over 50%. Unless there’s something challenging for right handers Continue reading →

Posted in Basketball, Causal Analysis, Common Sense, Sports Stats

Tagged Adrian the Canadian, basketball, best shooters in the NBA, causal analysis in sports, Deadspin, field goal percentage, field goal percentage bad statistic, field goal percentage by zone, field goal percentage misleading, Goldsberry, highest shooting percentages NBA, Kirk Goldsberry, Kobe, Matt Adams, National Basketball Association, NBA, pick and roll, playing tracking data, Ray Allen, shooting percentage graphic, shooting percentage NBA, shot clock, Sloan Sports Analytics Conference, spatial analysis NBA, spatial variation, Sports, sports statistics, statistics, three point shooting percentage, video tracking NBA

Sloan Sports research rundown

Posted on March 7, 2012 | 1 comment

Following on my general analysis of the Sloan Sports Analytics Conference, here’s a look at the research presentations (you’ll note: nothing on the sports side of football or soccer! I submitted one of each but they were rejected . . . ):

An Expected Goals Model for Evaluating NHL Teams and Players (Brian MacDonald)

This paper tries to predict future performance better by incorporating more measurable statistics than past models (goals, shots, blocked shots, missed shots, hits, faceoff %, etc.). His prediction tests show that he makes improvements, and at the team level, I think these results have some value. However, moving to the individual level in a sport like hockey (or basketball, football, soccer, or rugby) is hard because of complementarities between players. For example, trying to measure one player’s contribution to team wins or goal differential based on the number of shots they take is hopelessly confused with the actions of other players on the ice that affect the quality and number of these shots.

Another issue in the paper is that MacDonald controls for team level statistics (such as faceoff win percentage) in the individual level regressions, when in fact much of player value may be driven by these statistics. For example, one of Red Wing Pavel Datsyuk’s strengths is faceoff win percentage, while one of his weaknesses is hitting. The value that individuals bring through these variables is caught up in MacDonald’s team level control variables. Still, the team-level analysis is a reasonable way to improve what’s out there.

Big 2’s and Big 3’s: Analyzing How a Team’s Best Players Complement Each Other (Robert Ayer)

This paper categorizes the top three players on each team Continue reading →

Posted in Baseball, Basketball, Causal Analysis, College Sports, Hockey, Probability Analysis, Research Papers, Sports Stats

Tagged Allan Maymin Philip Maymin, An expected goals model, basketball, Big 2s and 3s, Brian MacDonald, Celtics, CourtVision, cumulative win probabilities NCAA basketball, deconstructing the rebound, Effort vs. Concentration, experience and winning NBA, free throw shooting under pressure, Gartheeban Ganeshapillai, Goldsberry, hockey, James Tarlow, John Guttag, Justin Rao, Kirk Goldsberry, machine learning, Mark Bashuk, Matt Goldman, MIT News, MIT Sloan Sports Analytics Conference, motion tracking analysis NBA, National Basketball Association, National Hockey League, NBA, NBA chemistry, NBA synergies, NHL, optical analysis NBA, optical tracking data, Peter Dizikes, Predicting the Next Pitch, Rajiv Maheswaran, Ray Allen, Rebound (basketball), rebound study wrong sloan sports, research papers Sloan Sports Conference, Robert Ayer, Sloan Sports Conference, Sloan Sports Conference research overview, Sloan Sports Conference review, Sloan Sports Conference summary, spacial analysis NBA, USC rebound study

Hockey Night in America! Part 1: NHL shootouts and playoffs

Posted on March 1, 2012 | 3 comments

With the NFL all wrapped up, it’s time for Hockey Night in America! A few weeks ago, I watched the extremely exciting Edmonton Oilers play my Detroit Red Wings. The Red Wings nearly got the win in regulation, but the Oilers scored with 39 seconds remaining (highlights here). Four on four overtime favored the fast skating Oilers, and Detroit needed two open net saves from defensemen to stay alive. The Wings are an excellent shootout team, but they lost this one.

The Wings are 7-2 in shootouts this year, which has earned them some extra points in the standings (shootouts fueled their record home winning streak as well). Back in December, I questioned whether these extra points are deserved. Shootouts reward individual skill that may not be related to game performance. In the interest of crowning the best team champion, maybe we’d be better off giving the Oilers and Red Wings one point each and going home at the end of overtime (the dreaded tie . . . ). But do teams that get into the playoffs with many shootout victories actually perform worse once they get there?

I started by calculating shootout-free points totals Continue reading →

Posted in Hockey, Prediction, Rules Analysis, Sports Stats

Tagged Detroit, Detroit Red Wings, Edmonton Oilers, get rid of shootouts, hockey, Hockey Night in America, Hockey Night in Canada, hockey no more shootouts, National Hockey League, NHL, NHL overtime alternatives, NHL overtime rules, overtime proposals, Red Wings, Red Wings home winning streak, shootout is dumb, shootouts, shootouts lucky, shootouts not fair, Vancouver Canucks