Following on my general analysis of the Sloan Sports Analytics Conference, here’s a look at the research presentations (you’ll note: nothing on the sports side of football or soccer! I submitted one of each but they were rejected . . . ):
An Expected Goals Model for Evaluating NHL Teams and Players (Brian MacDonald)
This paper tries to predict future performance better by incorporating more of the measurable statistics (goals, shots, blocked shots, missed shots, hits, faceoff percentage, etc.) than past models have. MacDonald’s prediction tests show that he makes improvements, and at the team level, I think these results have some value. However, moving to the individual level in a sport like hockey (or basketball, football, soccer, or rugby) is hard because of complementarities between players. For example, trying to measure one player’s contribution to team wins or goal differential based on the number of shots they take is hopelessly confounded with the actions of the other players on the ice, who affect the quality and number of those shots.
Another issue in the paper is that MacDonald controls for team level statistics (such as faceoff win percentage) in the individual level regressions, when in fact much of player value may be driven by these statistics. For example, one of Red Wing Pavel Datsyuk’s strengths is faceoff win percentage, while one of his weaknesses is hitting. The value that individuals bring through these variables is caught up in MacDonald’s team level control variables. Still, the team-level analysis is a reasonable way to improve what’s out there.
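As a sketch of what this kind of team-level exercise looks like, here is a toy regression of future goal differential on a few of the statistics MacDonald mentions. All of the data below are synthetic and the "true" coefficients are invented for illustration; this is not his model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120  # synthetic team-seasons

# Synthetic team-level stats: shots, blocked shots, hits, faceoff win %
X = np.column_stack([
    rng.normal(30, 3, n),   # shots per game
    rng.normal(14, 2, n),   # blocked shots per game
    rng.normal(22, 4, n),   # hits per game
    rng.normal(50, 3, n),   # faceoff win %
])

# Synthetic future goal differential, loosely tied to shots and faceoffs
y = 0.3 * (X[:, 0] - 30) + 0.1 * (X[:, 3] - 50) + rng.normal(0, 1, n)

# OLS: an intercept plus one weight per statistic
Xc = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
print(np.round(beta, 2))  # fitted intercept and weights
```

In this synthetic world the regression recovers positive weights on shots and faceoff percentage and roughly zero on the others, which is the kind of output a team-level model delivers.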
This paper categorizes the top three players on each team into various player types (e.g., versatile power forward, three point shooting wing) and then measures whether specific combinations of two or three types of players lead to team success or failure. While the approach is interesting, the paper has at least three flaws that left me skeptical of Ayer’s conclusions.
First, the player categories and choice of the best three on each team are his ad hoc decisions, and most of the player categories proxy for quality (e.g., “defense-oriented big man” sounds like code for a guy that can’t shoot, and “versatile power forward” sounds like a very skilled offensive player). By construction, combinations of the high quality player types will do better than combinations of the low quality types. It would be more interesting to analyze just the high quality types or define “types” in a more quality-agnostic way (e.g., by height, weight, shooting percentages, position).
Second, the makeup of the rest of the team will surely depend on the makeup of the stars. For example, maybe three point shooting wings are rare, but good players at other positions are more common. This means that a team with a good three point shooting wing will be surrounded by better supporting players on average than a team with a good point guard and big man.
Third, and most problematic, Ayer tests hundreds of player combinations. By random chance, some combinations will show up performing better than the average. Simply looking at the statistical significance for each combination is uninformative in this case. After all, “statistically significant” generally means that a result at least as extreme would occur only 5% of the time by random chance alone. In other words, even if the combinations have no actual impact on winning, about 5% of Ayer’s combinations should show up as statistically significant. A better statistical approach is a test of joint statistical significance of all the combinations (i.e., an F-test for whether the dummy variable coefficients are jointly significantly different from zero). I wouldn’t be surprised if such a test attributed Ayer’s results to random variation.
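The multiple-comparisons problem and the joint F-test are easy to demonstrate with a small simulation. Everything below is synthetic (the sample sizes and the two-sample t-test are my assumptions, not Ayer’s setup): the combination dummies have no effect by construction, yet roughly 5% of them come up individually “significant,” while the joint test does not:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_teams = 300   # synthetic team-seasons
n_combos = 100  # player-type combinations tested

# Null world: combination dummies have NO effect on winning percentage.
X = rng.integers(0, 2, size=(n_teams, n_combos))  # dummy: team has combo j
y = rng.normal(0.5, 0.15, size=n_teams)           # win% independent of X

# Test each combination separately (two-sample t-test: combo vs. no combo)
pvals = np.array([
    stats.ttest_ind(y[X[:, j] == 1], y[X[:, j] == 0]).pvalue
    for j in range(n_combos)
])
# Roughly 5 of 100 should clear the 5% bar by chance alone.
print(f"individually 'significant' combos: {(pvals < 0.05).sum()} of {n_combos}")

# Joint F-test: regress y on all dummies, test all coefficients = 0.
Xc = np.column_stack([np.ones(n_teams), X])
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
rss_full = np.sum((y - Xc @ beta) ** 2)
rss_restr = np.sum((y - y.mean()) ** 2)
df1, df2 = n_combos, n_teams - n_combos - 1
F = ((rss_restr - rss_full) / df1) / (rss_full / df2)
p_joint = 1 - stats.f.cdf(F, df1, df2)
print(f"joint F-test p-value: {p_joint:.3f}")
```

The per-combination tests throw off false positives at the rate the significance threshold dictates; the joint test asks whether the whole set of dummies explains more than noise would.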
CourtVision: New Visual and Spatial Analytics for the NBA (Kirk Goldsberry)
This paper was the runner up for the research award. I give Goldsberry a ton of credit for being one of the only guys to assemble non-proprietary spatial NBA data (that I know of). He collected data from shot charts available on most sports reporting websites and presents pictures of each player’s shooting percentage from different places on the floor. He used this information to create a statistic for overall shooting ability: the number of cells on his grid at which a player has an expected value of one point or more per shot. To clear that bar, a player needs to shoot at least 33.3% on three pointers and 50% on two pointers.
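Goldsberry’s statistic reduces to counting the grid cells where shot value times field goal percentage reaches one point per shot. A minimal sketch, with invented cell data standing in for a real shot chart:

```python
# Hypothetical per-cell shooting data: (shot value, FG% in that cell)
cells = [
    (3, 0.38),  # corner three at 38%
    (3, 0.30),  # above-the-break three at 30%
    (2, 0.55),  # restricted-area two at 55%
    (2, 0.42),  # long baseline two at 42%
]

# A cell counts if expected points per shot = value * FG% >= 1.0,
# i.e. at least 33.3% on threes and 50% on twos.
scoring_cells = sum(1 for value, pct in cells if value * pct >= 1.0)
print(scoring_cells)  # → 2 (the 38% three and the 55% two)
```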
The weakness of this approach is that shooting percentage at a given point on the floor may be driven by more than player ability. For example, Goldsberry shows that Kobe shoots quite poorly on long baseline two pointers (in fact, most players do), but these shots may be taken later in the shot clock or against tighter defense, on average. Similarly, Ray Allen has one cold area on the three point line (left of center when looking at the hoop), but the defense might contest those threes better, due to nuances of the Celtics offense. I would be really interested to see these charts for shots with more than 10 seconds left on the shot clock, or somehow restricted by defensive positioning. Without that, it’s hard to draw strategy conclusions for the offense or defense.
Another minor point: this statistic is a bit hard to interpret. If a player shoots only when wide open, he might have a very high percentage from many points on the floor but be fairly useless in most situations. Again, the issue of defensive positioning is relevant.
Deconstructing the Rebound with Optical Tracking Data (Rajiv Maheswaran, Yu-Han Chang, Aaron Henehan, Samantha Danesis)
This paper won the research award. It is certainly one of the most complex. The authors use coordinate data on the position of the ball and the players at every moment to analyze rebounding. Their main results show that nearly all rebounds drop to eight feet in height within an eleven foot radius of the hoop, and that the team with more players close to the ball at this point tends to get the rebound. I wasn’t surprised by either of these findings, but I was glad to verify my hunches. They also look at variation in rebounding based on shot type and find that offensive rebounds are more likely on close shots.
While I can’t quibble with these statistics, I don’t agree with the strategy arguments that the authors make. First, they argue that rebounding will improve if players move closer to the rim (a combination of their findings that most rebounds are available close to the hoop and that the team with the closest players to the ball generally gets the rebound). This goes against general rebounding theory, which says that you should put your body on your man and shield him as far away from the hoop as possible, so that the ball falls in front of you (where he can’t get it). In fact, I would argue that this technique ensures that defenders are usually closest to the ball and that the closest players to the ball are most likely to get the rebound. If players follow the authors’ suggestion and just race towards the hoop, I imagine there would be many more offensive rebounds (what restricts the offensive player from racing to the hoop as well?).
Second, they argue that teams should shoot more close shots, since those are more likely to generate offensive rebounds. This seems fair based on their statistics, but teams often take a long two point shot when they can’t get a good inside shot. To give the extreme example, if teams could get a good shot attempt from zero feet (a dunk), they would do it every time — dunks have the highest percentage conversion rate among all shots taken. The problem is that teams can’t get a good look at a dunk on most possessions. Similarly, it may be that offensive rebounds are only likely on “good” inside shots, where a player has strong position against a single defender. If teams followed the authors’ advice and went inside all the time, they would shoot a lower percentage on inside shots and possibly get fewer offensive rebounds as well. Offensive rebounding effort may differ on long and short shots, too. I would encourage players to try harder for offensive rebounds on long shots, rather than change shot selection. It’s hard to see any downside to that.
Effort vs. Concentration: The Asymmetric Impact of Pressure on NBA Performance (Matt Goldman, Justin Rao)
This is one of two papers whose conclusions I (mostly) believe. The authors look at free throw shooting at the end of NBA games and find that home players (especially poor shooters) shoot worse under pressure. There’s no effect for away players, suggesting that either (1) home players feel more pressure and overthink a subconscious skill, or (2) all players suffer from this pressure effect but away players benefit from the distraction of crowd noise (preventing overthinking). They also find that home teams get more offensive rebounds at the end of games (a pure effort task), but away teams do not. This reinforces hypothesis (1), since effort changes the most for the home team.
Their results are somewhat different from an earlier paper on free throw shooting, which finds big choking effects for home and away players (again, mostly by poor shooters). I would like to see a reconciliation of these findings, and although teams might be able to use these results to improve their endgame fouling strategies, neither paper has strong strategy implications.
While this paper addresses an interesting question (do coach and player experience impact regular season and playoff winning percentage?), the statistical approach is inadequate. Tarlow uses regressions to relate winning percentage to various experience variables. These regressions show some interesting correlations, but causality could go in either direction. For example, an experienced coach will rarely take on a very bad team, which generates a correlation between coaching experience and team performance, even if coaching experience has no causal effect on winning percentage.
Tarlow also shows that teams that keep the same core players together have higher winning percentages. Again, the causation could go the other way: these players are likely kept together because the team is playing well. Although he controls for past performance in his regressions, I can make similar arguments that still suggest reverse causality (e.g., players are kept together because teams know that they are improving year to year). Another issue is what “experience” actually measures. Since bad players and coaches tend to drop out of the league, “experience” measured by years of service probably reflects quality as much as seasoning.
NBA Chemistry: Positive and Negative Synergies in Basketball (Allan Maymin, Philip Maymin, Eugene Shen)
I wasn’t able to see this paper presented, unfortunately. From reading, it looks like a statistical mess. The authors claim to be looking for synergies between players, but there are no explicit “synergies” in their regressions. Each player has a separate estimated effect on the outcome of each play — there are no effects estimated directly for different groups of players (this seems to be the most obvious way to measure synergies). Instead, the authors use an elaborate procedure to determine whether players with big impacts on different play outcomes have a larger impact in combination than the sum of their individual impacts would suggest. Again, it’s not clear how the combined effects arise, since there are no “group effects” in their procedure that are specific to the player combinations.
My buddy Chris and I discussed this paper, and our hunch is that their results reflect the nuances of the statistical models they used (nested probit regressions, then a variety of ad hoc methods). To truly estimate synergies between players, you need to observe them playing together (preferably in lineups decided by coin flips or some other randomizer) or make assumptions that there are common player types (as Ayer did in the paper above). I think these guys achieved victory by confusion instead, and I don’t know what to make of their results.
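For contrast, here is the direct way to put an explicit synergy term in a regression: a dummy for each player being on the floor, plus a dummy for the pair appearing together. The possession data and effect sizes below are entirely synthetic, and this is a sketch of the general idea, not the authors’ nested-probit setup:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000  # synthetic possessions

a_on = rng.integers(0, 2, n)  # player A on the floor this possession
b_on = rng.integers(0, 2, n)  # player B on the floor this possession
pair = a_on * b_on            # both on together: the explicit synergy term

# Synthetic points per possession: each player adds 0.05 alone,
# and the pair adds an extra 0.04 when together.
y = 1.0 + 0.05 * a_on + 0.05 * b_on + 0.04 * pair + rng.normal(0, 0.3, n)

# OLS with the interaction: the last coefficient estimates the synergy.
X = np.column_stack([np.ones(n), a_on, b_on, pair])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))
```

This only works because the synthetic lineups vary independently; with real data, as I said, you need lineup variation that isn’t driven by the coach reacting to how the pair is playing.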
Predicting the Next Pitch (Gartheeban Ganeshapillai, John Guttag)
Brothers in “G,” Ganeshapillai and Guttag address an interesting question: what pitch is a pitcher most likely to throw next? They use some fancy techniques (“machine learning” was the buzzword for a couple projects this year), but their approach boils down to predicting pitch selection based on all the known factors before the pitch. They consider the inning, the score, how many runners are on base, the count, the number of outs, the specific pitcher-hitter matchup, etc. I think simple regressions would give them similar results, but their fancy methods show that many of these variables matter, and they claim to “improve” pitch prediction. They could add in more variables (e.g., the steal threat of runners on base), but I don’t have any statistical issues with their approach.
But how do they judge “improvement”? Unfortunately, the authors only compare their accuracy to the frequency of the pitcher’s most common pitch. Surely players do not think a pitcher will throw their most common pitch every time. Choosing “fastball” as the prediction on all 3-0 and 3-1 counts and a known “out pitch” on all 1-2 and 2-2 counts might do just as well as the authors’ model, but we can’t tell from their paper. Still, the overall technique is sound, and I can imagine teams running these models in important game situations to signal to a hitter or figure out when to hit and run, for example.
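The count-based benchmark I have in mind is easy to sketch. The pitch log below is invented (pitch labels are placeholders), and this is my strawman baseline, not the authors’ model:

```python
from collections import Counter, defaultdict

# Hypothetical pitch log: (balls, strikes, pitch_type)
pitches = [
    (3, 0, "FB"), (3, 0, "FB"), (3, 1, "FB"),
    (1, 2, "SL"), (2, 2, "SL"), (1, 2, "FB"),
    (0, 0, "FB"), (0, 0, "CB"), (0, 1, "FB"),
]

# Naive baseline: always guess the pitcher's most common pitch overall.
overall = Counter(p for *_, p in pitches).most_common(1)[0][0]

# Smarter baseline: the most common pitch in each ball-strike count.
by_count = defaultdict(Counter)
for balls, strikes, p in pitches:
    by_count[(balls, strikes)][p] += 1

def predict(balls, strikes):
    c = by_count.get((balls, strikes))
    return c.most_common(1)[0][0] if c else overall

# In-sample accuracy of each baseline
acc_overall = sum(p == overall for *_, p in pitches) / len(pitches)
acc_count = sum(predict(b, s) == p for b, s, p in pitches) / len(pitches)
print(acc_overall, acc_count)
```

Any “improvement” claim should beat the second number, not the first.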
While this is probably quite important for the business side of sports, I couldn’t bring myself to attend the presentation or read the paper. If you read it and find it really interesting, let me know.
Bashuk uses past win probabilities to predict future success. Specifically, he breaks games into eight 5-minute segments and calculates the average win probability in each one (win probability is the chance that you will win the game, given the current score and the time remaining on the clock, based on how games progress and finish on average). He then uses those win probabilities (testing different weightings), along with the win probabilities of your opponents, to predict success in future games. The method seems to work pretty well; it does best when he weights endgame performance much more heavily than early game performance.
I like this general strategy, but I spent a lot of time debating this paper at the conference with Chris. Although his predictions are good, it’s hard to tell if they do as well as possible. Win probabilities are a defined function of two variables: current score and time remaining. By allowing different weightings for different parts of the game, Bashuk is implicitly using time remaining as a separate variable as well. I like the intuition of using scoring and time remaining (since blowout scoring doesn’t matter in the second half), but I’d prefer to predict future games by allowing past score differentials and time remaining to interact more flexibly in one step in a regression, instead of relying on the set relationship between these variables in the win probability formula. By using win probability, Bashuk limits the flexibility of his prediction model somewhat and thus potentially its accuracy. Regression would tell us the best way to incorporate these variables. Still, it’s a pretty good paper.
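Bashuk’s weighting idea can be sketched in a few lines. The per-segment win probabilities and the endgame-heavy weights below are made up for illustration; they are not his estimates:

```python
import numpy as np

# Hypothetical average win probability in each of the eight
# 5-minute segments, for two past games.
game_wp = np.array([
    [0.50, 0.55, 0.60, 0.58, 0.62, 0.70, 0.80, 0.95],  # game 1
    [0.50, 0.45, 0.40, 0.48, 0.55, 0.60, 0.65, 0.75],  # game 2
])

# Weights rising toward the end of the game (assumed shape, normalized)
weights = np.array([1, 1, 1, 2, 2, 3, 4, 6], dtype=float)
weights /= weights.sum()

# Team strength: weighted segment average, then averaged across games
team_strength = (game_wp * weights).sum(axis=1).mean()
print(round(team_strength, 3))  # → 0.684
```

The regression alternative I prefer would estimate how score differential and time remaining map to future wins directly, instead of fixing that mapping through the win probability formula and only choosing the weights.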
If you’d like to read some more about the conference, Peter Dizikes at MIT News gives a more balanced summary.