Boyd's World-> Breadcrumbs Back to Omaha-> Correlations and Tacos | About the author, Boyd Nation |
Correlations and Tacos
Publication Date: October 18, 2005
Wanna Score More?
Last time, I wrote a piece on using correlations to decide which factors should matter most when building your team. This time I want to continue that theme, albeit with a bit less of the tutorial flavor, by looking at the component pieces of offense and how each of them contributes to winning.
Now, ideally, you'd be able to just take your annual stat report, look at the columns, compare them to the winning percentages, and spot the most important thing. The problem with doing that is statistical independence, or rather the lack of it, which, despite my statement of non-tutorialness above, I'm going to explain now. Some things depend on other things. There, that wasn't so hard, was it? Less tersely, it isn't really useful to note, for example, that the best correlation between a team offensive stat and winning percentage comes from runs (0.67 for the 2005 Division I season, by the way). After all, "score more runs" is about as useful a piece of advice as "breathe in, then out; repeat", because scoring a run depends on the smaller events that go into it, unless you're talking about the batter on a home run. I'm glossing over a bit in this definition of independence, since component events can also affect one another, but that's enough to get us going.
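For anyone who wants to compute numbers like these from their own stat reports, here is a minimal sketch of the ordinary Pearson correlation coefficient. The article doesn't say exactly which correlation measure it uses, so Pearson's r is an assumption here, and the team totals below are made-up placeholders, not real 2005 data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(var_x * var_y)

# Hypothetical team totals (NOT real data): runs scored and winning percentage.
runs = [310, 402, 365, 288, 451, 330]
win_pct = [0.41, 0.58, 0.52, 0.38, 0.66, 0.47]
r = pearson(runs, win_pct)
print(f"r(runs, win%) = {r:.2f}")
```

The same function works for any column of the stat report: swap `runs` for walks, doubles, or on-base percentage, one value per team, in the same team order as `win_pct`.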
For the stats that are reasonably identifiable as component events or useful composites, here are the correlations between each stat and winning percentage for the 2005 season:
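Stat  Description                   Correlation with winning percentage
OBP   On-base percentage            0.64
TB    Total bases                   0.62
AVG   Batting average               0.61
H     Hits                          0.61
SLG   Slugging percentage           0.59
2B    Doubles                       0.57
SF    Sacrifice flies               0.55
BB    Walks                         0.51
HR    Home runs                     0.49
HBP   Hit by pitch                  0.38
SH    Sacrifice hits                0.37
SB    Stolen bases                  0.35
ATT   Stolen base attempts          0.35
3B    Triples                       0.32
GDP   Grounded into double plays    0.26
SO    Strikeouts                    0.06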
Note that the correlation for OBP is almost as high as the one for runs (0.64 versus 0.67). OBP is life; pass it on.
The next few entries (total bases, batting average, hits, and slugging) are the familiar stats you'd expect to see near the top. One interesting effect that we won't really go into at this point is that there's a whole different set of correlations worth studying: the ones between successive years for each player. It turns out that OBP and slugging are both more consistent from one year to the next than batting average, which is another point in their favor.
The high placement of SF is a great example of non-independence. The point is not that you should have guys trying to hit sacrifice flies; it's that you should have lots of runners on third (well, not at the same time, but you know what I mean). The same thing applies to sacrifice hits and, oddly, GDP: you have to have runners on base to hit into double plays.
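One standard way to probe this kind of non-independence (not something the article itself does) is a first-order partial correlation: the correlation between SF and winning percentage after holding a measure of baserunner traffic fixed. The sketch below uses OBP as that stand-in for "runners on base", which is an assumption. The SF and OBP correlations with winning (0.55 and 0.64) come from the list above, but the SF-to-OBP correlation is not given anywhere in the article, so 0.70 is a made-up placeholder.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z fixed."""
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y; result undefined")
    return (r_xy - r_xz * r_yz) / denom

r_sf_win = 0.55           # SF vs. winning percentage, from the article
r_obp_win = 0.64          # OBP vs. winning percentage, from the article
r_sf_obp_assumed = 0.70   # HYPOTHETICAL placeholder; not reported in the article

partial = partial_correlation(r_sf_win, r_sf_obp_assumed, r_obp_win)
print(f"r(SF, win% | OBP) = {partial:.2f}")
```

If most of the SF correlation really comes from having runners on base, the partial value should drop well below the raw 0.55 once the real SF-to-OBP correlation is plugged in. Note that this only removes linear association with the one controlled stat.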
It's interesting that, in total, walks and homers both come in behind doubles. I'm going to have to think on that one (and check some other data sets to see if this is an anomaly).
Speed is OK, but don't overvalue it: stolen bases (0.35) and triples (0.32) both come in relatively low.
Strikeouts are more or less neutral events relative to other kinds of outs; their correlation with winning percentage is only 0.06.
If you're interested in reprinting this or any other Boyd's World material for your publication or Web site, please read the reprint policy and contact me.