Correlations and Tacos

Publication Date: October 18, 2005

Wanna Score More?

Last time, I wrote about using correlations to decide which factors should matter most when building your team. I want to continue that theme here, with a bit less of the tutorial flavor, by looking at the component pieces of offense and how each of them contributes to winning.

Now, ideally, you'd be able to just take your annual stat report, look at the columns, compare them to the winning percentages, and spot the most important thing. The problem with doing that is the concept of statistical independence, which, despite my statement of non-tutorialness above, I'm going to explain now. Some things depend on other things. There, that wasn't so hard, was it? Less tersely, it isn't really useful to note, for example, that the best correlation between a team offensive stat and winning percentage comes from runs (0.67 for the 2005 Division I season, by the way). After all, "score more runs" is about as useful a piece of advice as "breathe in, then out; repeat", because scoring a run depends on the smaller events that lead up to it, unless you're talking about the batter who hits a home run. I'm glossing over part of the definition here, since component factors can also affect one another, but that's enough to get us going.
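The naive approach described above, lining up each column of the stat report against winning percentage and picking the top one, can be sketched in a few lines of Python. The article doesn't say which correlation measure it uses; this sketch assumes Pearson's r computed across teams for one season. The function names `pearson_r` and `rank_stats`, and the input layout (a dict of per-team stat lists), are hypothetical choices for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances left unnormalized; the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def rank_stats(team_stats, win_pct):
    """Correlate each stat column with winning percentage, highest r first.

    team_stats: hypothetical layout, a dict mapping a stat name such as "OBP"
        to a list with one value per team.
    win_pct: each team's winning percentage, in the same team order.
    """
    scores = {name: pearson_r(values, win_pct) for name, values in team_stats.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Three hypothetical teams, for illustration only.
ranked = rank_stats({"R": [250, 310, 380], "SO": [420, 400, 430]}, [0.40, 0.55, 0.70])
print([name for name, _ in ranked])  # ['R', 'SO']
```

With the figures in this article, runs (0.67) would land at the top of such a list, which is exactly the unhelpful answer described above: the ranking says nothing about which columns are built out of the others.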

For the stats that can reasonably be identified as component events or useful composites, here are last season's correlations between each stat and winning percentage:

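Stat                                  Correlation
OBP  (on-base percentage)             0.64
TB   (total bases)                    0.62
AVG  (batting average)                0.61
H    (hits)                           0.61
SLG  (slugging percentage)            0.59
2B   (doubles)                        0.57
SF   (sacrifice flies)                0.55
BB   (walks)                          0.51
HR   (home runs)                      0.49
HBP  (hit by pitch)                   0.38
SH   (sacrifice hits)                 0.37
SB   (stolen bases)                   0.35
ATT  (stolen base attempts)           0.35
3B   (triples)                        0.32
GDP  (grounded into double plays)     0.26
SO   (strikeouts)                     0.06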
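Several rows in this table are composites of other rows, which is the independence problem in concrete form. Below is a minimal sketch of the standard definitions of TB, AVG, SLG and OBP, assuming at-bats (AB), which isn't listed in the table, are available from the same stat report. Hits feed into every one of these composites, so their correlations with winning can't be read as separate pieces of evidence. The function names are hypothetical.

```python
def total_bases(hits, doubles, triples, home_runs):
    """TB: singles count 1, doubles 2, triples 3, home runs 4.

    Since singles = H - 2B - 3B - HR, this simplifies to H + 2B + 2*3B + 3*HR.
    """
    return hits + doubles + 2 * triples + 3 * home_runs


def batting_average(hits, at_bats):
    """AVG = H / AB."""
    return hits / at_bats


def slugging(total_bases_value, at_bats):
    """SLG = TB / AB, so it inherits every input to TB."""
    return total_bases_value / at_bats


def on_base_pct(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


# Hypothetical team season line, for illustration only.
tb = total_bases(hits=100, doubles=20, triples=5, home_runs=10)
print(tb)                                          # 160
print(batting_average(100, 400))                   # 0.25
print(slugging(tb, 400))                           # 0.4
print(round(on_base_pct(100, 40, 5, 400, 5), 3))   # 0.322
```

Change the hit total in this example and TB, AVG, SLG and OBP all move together, which is one plausible reason those four rows sit so close to each other near the top of the list.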

