I just finished reading “56,” Kostya Kennedy’s retelling of Joe DiMaggio’s hit streak (thanks to my buddy Jake for the book!). Kennedy unfolds the 1941 streak like a story, complete with what the players were thinking and saying and lots of contextual details about DiMaggio’s family life, World War II, Italian American immigrants, etc. The book has a bit too much typical baseball nostalgia, perhaps (witty newspaper reports, grand ballparks and announcers, exaggerated personalities), but the story is undeniably fascinating and the writing is pretty good. Kennedy also sprinkles in some discussion of other hitting streaks and finishes with a good summary of quantitative work that’s been done on streaks.

The big debate about both good and bad streaks is whether they arise due to chance alone or whether they reflect actual shifts in underlying percentages. Most quantitative work suggests that streaks are an illusion; they are the product of chance. Basketball provides an easy example. Imagine a team employed the hack-a-Shaq (Shaq is about a 50% free throw shooter), but Shaq hit 6 free throws in a row. The team would stop fouling him, worrying that he is on a hot streak (i.e., probability greater than 50% that he will make his next free throw). However, if Shaq were truly a 50% free throw shooter on every shot, he would generally get around 5 or 6 streaks of at least 6 makes in a row over the course of a season (around 700 attempts in his heyday). A streak of 6 in a row seems crazy for Shaq, but it’s not crazy when you think about how many free throws he took each year.
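The 5-or-6 figure is easy to check by simulation. Here is a minimal Python sketch, assuming independent shots at a constant 50% and 700 attempts per season; the function names and the choice of 2,000 simulated seasons are illustrative, not taken from any original analysis.

```python
import random

def count_streaks(outcomes, min_len):
    """Count maximal runs of makes (True) that are at least min_len long."""
    count = 0
    run = 0
    for made in outcomes:
        if made:
            run += 1
        else:
            if run >= min_len:
                count += 1
            run = 0
    if run >= min_len:  # a streak still alive at the end of the season
        count += 1
    return count

def average_streaks(p=0.5, attempts=700, min_len=6, seasons=2000, seed=0):
    """Average number of qualifying streaks per simulated season."""
    rng = random.Random(seed)
    total = 0
    for _ in range(seasons):
        shots = [rng.random() < p for _ in range(attempts)]
        total += count_streaks(shots, min_len)
    return total / seasons

if __name__ == "__main__":
    print(average_streaks())
```

A back-of-the-envelope check agrees: a streak can start at roughly 694 places in the season, and each start requires a miss followed by six makes (probability 1/128), which gives about 5.4 streaks per season.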

Most studies against “streaks” rely on the argument that the observed data are not unusual if the true data-generating process reflects independent “Bernoulli trials” (i.e., repeated coin flips with constant probability). However, shifting probabilities due to changes in confidence or finding one’s range could also explain the observed data. Generally, studies don’t test such alternative models or control for behavioral shifts that affect success probabilities (for example, while writing this I watched Michigan sharpshooter Stu Douglass make threes on consecutive possessions then brick a 30 foot bomb off of a curl play on the next trip down the floor; unsurprisingly, Indiana’s defense was all over him on the last play).

Now what about DiMaggio? He hit for average and hardly walked, which is the best combination for generating hit streaks (from the hit streak perspective, a walk is as bad as an out). He also hit a lot of home runs, where there is no variance in the outcome. The only sure way to avoid “the old at ’em ball” is to hit it out of the park (copyright, Jim Price, Tigers radio color broadcaster).

Given these advantages, is the streak unusual enough to prove that DiMaggio really “caught fire”? The second longest streak in modern Major League history is Pete Rose’s 44 games (Willie Keeler also got to 44 straight before foul balls counted as strikes), so 56 games seems pretty incredible. As Kennedy notes in his book, Rose’s streak is only 79% of DiMaggio’s streak, and Rose bunted more than usual to keep it alive (DiMaggio never bunted). That is a small percentage compared to the runner-up for most records. For example, Barry Bonds has the most home runs in a season at 73, but Mark McGwire has 70 (96%). If you prefer pre-steroids numbers, we have Maris’s 61 to Ruth’s 60, which aren’t far from 73 themselves. Hank Aaron has the career record for RBIs at 2,297; Ruth is right on his heels at 2,213. However, this is not Kennedy’s best argument.

Let’s try the usual Bernoulli approach to test Kennedy’s claim. I simulated 5,000 seasons of 150 games each for a player with stats similar to DiMaggio’s. First, I simulated the number of plate appearances in each game by assuming that the player got 4 plate appearances in a game with 50% probability and 5 plate appearances with 50% probability. With the sequence of plate appearances per game set, I did Bernoulli trials with a constant 31% probability of a hit in each plate appearance (DiMaggio’s hit rate was around 31% during his best years; this includes walks in the denominator, so his batting average was even higher).
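A simulation along these lines can be sketched in a few lines of Python. This is a reconstruction from the description above, not the original code: the function names and the seed are made up, and the exact counts it prints will vary with the seed.

```python
import random

def simulate_season_streaks(rng, games=150, p_hit=0.31):
    """Return the lengths of all hitting streaks in one simulated season."""
    streaks = []
    run = 0
    for _ in range(games):
        # 4 or 5 plate appearances, each with 50% probability
        plate_appearances = rng.choice((4, 5))
        # the game counts toward a streak if any plate appearance is a hit
        got_hit = any(rng.random() < p_hit for _ in range(plate_appearances))
        if got_hit:
            run += 1
        else:
            if run > 0:
                streaks.append(run)
            run = 0
    if run > 0:  # streak still alive when the season ends
        streaks.append(run)
    return streaks

def longest_streaks(seasons=5000, seed=1941):
    """Longest hitting streak in each of many simulated seasons."""
    rng = random.Random(seed)
    return [max(simulate_season_streaks(rng), default=0) for _ in range(seasons)]

if __name__ == "__main__":
    longest = longest_streaks()
    print("seasons with a streak of 56+ games:", sum(s >= 56 for s in longest))
    print("ten longest streaks:", sorted(longest, reverse=True)[:10])
```

Under these assumptions the chance of at least one hit in a game works out to about 81%, so a 56-game streak needs 56 consecutive successes at roughly that rate.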

Here’s the histogram of all hitting streaks over 10 games across all 5,000 seasons (keeping track of shorter hit streaks is computationally cumbersome without adding much information):

Next, I did the same thing for a top home run hitter. I set the probability of a home run to 6% in every plate appearance, which generates around 40 home runs per season. Here’s the histogram of home runs each season:

The graphs look quite different because I truncated hit streaks at 10 games, but if you look only at the right tail of the home run distribution, you’ll notice that it’s much more bunched than the right tail of the hit streak distribution. The top ten hit streaks are 66, 50, 48, 46, 45 (3), 44, and 43 (2). The top ten home run tallies are 63, 62 (4), 61 (2), and 60 (3). This difference in bunching is very similar to what we see in the real data.
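The home run side of the comparison can be sketched the same way. Again, this is a hedged reconstruction in Python rather than the original code; the function names and seed are illustrative, and the top ten it prints will not necessarily match the tallies listed above.

```python
import random

def season_home_runs(rng, games=150, p_hr=0.06):
    """Total home runs in one simulated season."""
    total = 0
    for _ in range(games):
        plate_appearances = rng.choice((4, 5))  # 50/50 split, as for hits
        total += sum(rng.random() < p_hr for _ in range(plate_appearances))
    return total

def top_totals(seasons=5000, top=10, seed=73):
    """Return the largest season totals and the average season total."""
    rng = random.Random(seed)
    totals = [season_home_runs(rng) for _ in range(seasons)]
    return sorted(totals, reverse=True)[:top], sum(totals) / seasons

if __name__ == "__main__":
    best, mean = top_totals()
    print("top ten season totals:", best)
    print("average per season:", round(mean, 1))
```

With 150 games, 4.5 plate appearances per game on average, and a 6% rate, the average should land near 40 home runs, consistent with the figure quoted earlier.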

Comparing hit streaks to aggregate stats like home runs in a season or RBIs in a career is unfair. The key is timing: the order of events is essential for hit streaks and completely unimportant for aggregate stats, which generates much more variability in hit streaks (i.e., it creates extreme outliers).
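One way to see the role of ordering is to take a single simulated season and shuffle its games. The sketch below (Python, illustrative names, and an assumed 81% chance of a hit in any game, which is roughly what a 31% hit rate over 4 or 5 plate appearances implies) shows that the season total never changes under shuffling while the longest streak swings widely.

```python
import random

def longest_run(flags):
    """Length of the longest run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def shuffle_experiment(p_game=0.81, games=150, shuffles=1000, seed=0):
    """Shuffle one season's games repeatedly and track the longest streak."""
    rng = random.Random(seed)
    season = [rng.random() < p_game for _ in range(games)]
    total = sum(season)  # games with a hit: identical for every ordering
    longest = []
    for _ in range(shuffles):
        rng.shuffle(season)
        longest.append(longest_run(season))
    return total, min(longest), max(longest)

if __name__ == "__main__":
    total, shortest, longest = shuffle_experiment()
    print("games with a hit (same for every ordering):", total)
    print("longest streak ranged from", shortest, "to", longest)
```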

However, this is not proof that DiMaggio’s streak was random chance. Far from it, in fact. I found one hit streak of 56 or more games (66 games) in 5,000 seasons, which equates to a 0.02% chance of DiMaggio (or an equally talented player) getting such a streak in a given season. Imagine a more reasonable number: 500 seasons logged by players with a 31% hit rate per plate appearance. This still might be too high, and the chance of a hit streak of 56 or more games in 500 seasons is only 1-0.9998^500 = 9.5% by my estimation.
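The 9.5% figure is the usual "at least one success in n independent tries" calculation. A tiny Python sketch of that arithmetic, with a made-up function name:

```python
def prob_at_least_one(p_per_season, seasons):
    """Chance of at least one success across independent seasons."""
    return 1 - (1 - p_per_season) ** seasons

if __name__ == "__main__":
    p_streak = 1 / 5000  # one streak of 56+ games in 5,000 simulated seasons
    print(prob_at_least_one(p_streak, 500))  # about 0.095
```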

Let’s get beyond the math. There’s one really good reason to think that there was some extra consistency about DiMaggio: 56 games is not his longest hit streak. He hit safely in 61 straight games as an 18 year old against good competition in the Pacific Coast League. However, there’s also evidence to suggest DiMaggio’s 56 benefited from altered pitching (guys who ran 3 ball counts against him were loudly booed, even in their home parks) and help from his teammates (in one game, a good hitter bunted with one out and DiMaggio on deck to avoid a late game double play). I’m not sure how much of this goes on during streaks today, but Pete Rose’s incessant bunting suggests that batters’ behavior still changes as streaks climb.

The book is a good read. It doesn’t resolve the debate over whether streaks are just luck or whether they reflect true shifts in probabilities — Kennedy argues both sides, as he should. There were days DiMaggio got extremely lucky (a bad bounce ruled a base hit rather than an error), but he streaked for another 16 games immediately following the end of the 56 gamer and had a tremendous hit per plate appearance rate over multiple seasons. In the game that separated the two hit streaks, he hit two at ’em balls — rockets down the third base line — that (unluckily) did not get through for hits.

What is undeniable is that fans, players, and commentators are all enamored with streaks. The 56 game hit streak made DiMaggio a star; he was named the Greatest Living Player in 1969, and the streak was the reason why. Ted Williams, Willie Mays, Mickey Mantle, and others might have laid claim to the title. The streak put DiMaggio in another category entirely.

Edit:

My friend Bill suggested that Ted Williams’s .406 average, which also occurred in 1941, is a greater achievement. I did 5,000 simulated seasons of 540 at bats for Ted Williams and got 29 seasons of at least .400 and 14 seasons of at least .406 (running Bernoulli trials with his .346 batting average during the 2 years before and 5 years after he hit .406). That works out to a 0.58% chance of hitting .400 in a given year, and a 0.28% chance of hitting .406. This is more than ten times as probable as DiMaggio’s streak (0.28% versus 0.02%), though very few players have batted .346 over a long stretch in the live ball era (since 1920), which reduces the chances that .406 would happen.
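For anyone who wants to try this, here is a minimal Python sketch of a batting-average simulation under the same assumptions (540 at bats, a constant .346 chance of a hit). It is not the original code, the names and seed are illustrative, and with events this rare the counts will bounce around from run to run rather than reproduce 29 and 14 exactly.

```python
import random

def count_seasons_at_or_above(thresholds=(400, 406), p=0.346,
                              at_bats=540, seasons=5000, seed=406):
    """Count simulated seasons whose average meets each threshold.

    Thresholds are in points (400 means .400), so the comparison can be
    done with integers and avoids floating-point rounding at the boundary.
    """
    rng = random.Random(seed)
    counts = {t: 0 for t in thresholds}
    for _ in range(seasons):
        hits = sum(rng.random() < p for _ in range(at_bats))
        for t in thresholds:
            if hits * 1000 >= t * at_bats:
                counts[t] += 1
    return counts

if __name__ == "__main__":
    counts = count_seasons_at_or_above()
    for points, n in counts.items():
        print(f"seasons at .{points} or better: {n} of 5000")
```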

Bill wanted to know the likelihood of these two events happening in the same year. Well, if you believe the Bernoulli process to be the correct model (I don’t, but it will give us some idea at least), and assume that there are two guys sitting at those probability values in each year since 1920 (91 years), then the probability that both occur together in at least one year is around 0.0007%, which is basically zero (based on 30,000 simulations of 91 seasons). In some years there might be more than two guys or there might be guys with higher underlying probabilities, and Kennedy mentions that Williams and DiMaggio used each other’s numbers for motivation during 1941 (and the rest of their careers). Still, that’s a really small probability, which speaks to the greatness of their achievements.
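Under the stated assumptions (one hitter with a 0.02% chance of a 56-game streak and one with a 0.28% chance of hitting .406, independent of each other and from year to year), the joint probability can also be worked out directly instead of simulated. A small Python sketch of that arithmetic, with a made-up function name:

```python
import math

def prob_both_in_some_year(p_streak=0.0002, p_406=0.0028, years=91):
    """Chance that both feats land in the same year at least once."""
    p_same_year = p_streak * p_406  # independence assumed
    # 1 - (1 - p)^years, written with log1p/expm1 for numerical accuracy
    return -math.expm1(years * math.log1p(-p_same_year))

if __name__ == "__main__":
    print(prob_both_in_some_year())  # roughly 5.1e-05, i.e. about 0.005%
```

The direct calculation comes out somewhat higher than the simulated figure quoted above, which is not surprising given how few joint events 30,000 simulations can be expected to produce, but the conclusion is the same: basically zero.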

I’m not a huge sports fan, but I remembered reading this commentary by Stephen Jay Gould (http://www.nybooks.com/articles/archives/1988/aug/18/the-streak-of-streaks/?pagination=false) reacting to “Streak: Joe DiMaggio and the Summer of ’41” by Michael Seidel. I was thinking of sending it to you when I first saw this blog open, but now that you’ve posted this, it seems like a good time to send it!

You will definitely enjoy it.

I have to say though, Gould’s commentary on the 1988 book about Joe DiMaggio’s streak “one-ups” your review of the 2011 version because he includes beautiful quotations from the old Persian tentmaker, Omar Khayyám.

Thanks Dana – Kennedy quotes a small section of that. I enjoyed reading the whole thing!

You know, as well as I know, of course, that when you’re on a “streak” everything feels different – every shot seems to come out of your hand just right, the ball seems to move in slow motion, you seem to gain extra meters every time you touch the ball. The problem is that the data doesn’t reflect our own sensory experience, which makes it hard for us to believe that our objective, underlying probability has not changed.

I like to think about the silly alternative model with shifting probabilities, where the probability of success on every observed “make” was 100% and the probability of success on every observed “miss” was 0%. This is clearly not correct, but I don’t think there’s any quantitative way to distinguish between the Bernoulli models and some lesser version of this silly model. One big problem for testing is that, even if some makes increase your confidence and actually lead to a streak, not every make will lead you to think you’re on a good streak (for various reasons, including simple inattention). The shifting psychology makes it almost impossible to test between all possible models.

Informative and interesting, even for those among us for whom the mention of Bernoulli evokes painful memories.

DiMaggio’s streak is a singular accomplishment – athletically, psychologically and statistically – but I still believe that .406 was the premier baseball achievement of 1941. Rhetorical question (perhaps not here): what is the likelihood of those two feats happening in the same season?

You’re revealing yourself as a Red Sox fan, Bill! However, I agree that Ted Williams’s .406 is at least as impressive, since it reflects performance over an entire season (I would think that would have mattered more for the Greatest Living Player decision). Although your question was rhetorical, I couldn’t resist answering. See the edit in the post above.