sun, 26-feb-2006, 09:57

A couple of days ago, in an article about prospect analysis in baseball (subscription required), Nate Silver produced a cool table showing the year-to-year correlations of the six major batting events. This morning, while waiting for my dough to rise, I decided to replicate this analysis with my newfound baseball hack ability.

You can download the R program code for the analysis by clicking on the link.

Here's the result, showing the 2004 to 2005 correlations for rate-adjusted batting statistics for all players with more than 250 at-bats in both seasons:

Hits / PA           0.422
Singles / PA        0.663
Doubles / PA        0.369
Triples / PA        0.501
Home Runs / PA      0.702
Walks / PA          0.718
Strikeouts / PA     0.813
Plate appearances   0.405
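
For anyone who wants to try the same kind of calculation without R, here is a minimal Python sketch of the idea: keep only players with more than 250 at-bats in both seasons, divide each counting stat by plate appearances, and take the Pearson correlation of the two seasons' rates. This is not the R program linked above. The function names, the dictionary layout, and the player numbers are all assumptions made up for illustration.

```python
from math import sqrt

MIN_AB = 250  # at-bat threshold from the post, applied to both seasons


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_to_year(season_a, season_b, stat):
    """Correlate stat / PA across two seasons for players who qualify in both.

    Each season is a dict (hypothetical layout): player id -> counting stats,
    which must include "AB" and "PA".
    """
    rates_a, rates_b = [], []
    for player, a in season_a.items():
        b = season_b.get(player)
        # Skip players missing from the second season or under the threshold.
        if b is None or a["AB"] <= MIN_AB or b["AB"] <= MIN_AB:
            continue
        rates_a.append(a[stat] / a["PA"])
        rates_b.append(b[stat] / b["PA"])
    return pearson(rates_a, rates_b)


# Made-up numbers for illustration only; not real player data.
season_2004 = {
    "p1": {"AB": 500, "PA": 600, "HR": 40, "SO": 180},
    "p2": {"AB": 450, "PA": 500, "HR": 10, "SO": 60},
    "p3": {"AB": 300, "PA": 330, "HR": 5, "SO": 40},
    "p4": {"AB": 100, "PA": 110, "HR": 8, "SO": 30},  # too few at-bats, excluded
}
season_2005 = {
    "p1": {"AB": 520, "PA": 620, "HR": 38, "SO": 170},
    "p2": {"AB": 480, "PA": 530, "HR": 12, "SO": 70},
    "p3": {"AB": 310, "PA": 340, "HR": 4, "SO": 45},
    "p4": {"AB": 400, "PA": 440, "HR": 2, "SO": 50},
}

hr_corr = year_to_year(season_2004, season_2005, "HR")
so_corr = year_to_year(season_2004, season_2005, "SO")
print(f"Home Runs / PA   {hr_corr:.3f}")
print(f"Strikeouts / PA  {so_corr:.3f}")
```

With real data you would load each season's batting lines into the two dictionaries and call year_to_year once per stat.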

What Silver was trying to show by presenting his table (which included all year-to-year correlations since World War II) is that "there's really no such thing as a doubles hitter."

You can see from the table that there's only a weak relationship between the rate at which a hitter hit doubles in 2004 and his rate in 2005. But a home run hitter in one season is likely to hit them at a similar rate in the next season. Also note that strikeout and walk rates are each very consistent from one year to the next. So, 2004 and 2005 strikeout leader Adam Dunn is likely to strike out more than 150 times in 2006. Thankfully for Reds fans, he'll probably also hit more than 40 home runs.
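
One way to compare those numbers, which is my own addition and not something Silver's article does as far as I know, is to square each correlation. For a simple straight-line fit, r squared is the share of the variation in the 2005 rate that is accounted for by the 2004 rate. The sketch below just copies four values from the table above; the variable names are made up for illustration.

```python
# Year-to-year correlations copied from the table above.
correlations = {
    "Doubles / PA": 0.369,
    "Home Runs / PA": 0.702,
    "Walks / PA": 0.718,
    "Strikeouts / PA": 0.813,
}

# r squared: share of variance in the 2005 rate accounted for by
# a straight-line fit on the 2004 rate.
shared_variance = {stat: r ** 2 for stat, r in correlations.items()}

for stat, r2 in shared_variance.items():
    print(f"{stat:<18}{r2:.2f}")
```

On that scale, last year's doubles rate accounts for roughly 14 percent of the variation in this year's, against about half for home runs and walks and about two thirds for strikeouts.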

The last number is also interesting. Plate appearances aren't strongly correlated from one season to the next. This is probably a combination of older players breaking down between 2004 and 2005, and younger players stepping in to take their place at the plate.

tags: baseball  books 