Stats for Jocks

Boyd's World-> Breadcrumbs Back to Omaha-> Stats for Jocks About the author, Boyd Nation

Publication Date: October 4, 2005

Why We Do This Stuff

Some colleges offer courses in more technical subjects specifically designed for the non-technically-inclined -- the canonical example, a geology class, is frequently referred to as Rocks for Jocks. Hence our title this week. One of the most common questions we get around here at Boyd's World World Headquarters of the World (after, "You gonna eat those?") is, "I'm glad to see somebody talking about college baseball, but why do you have to do all that stats stuff?" The answer is that, even though it can get lost in the mist as the numerical conversation goes on over the years on a particular topic, we use the numbers to try to find the answers to questions that we care about. This week, I want to use simple correlations to try to answer a question, both as an example and because I'm honestly interested in the answer to this particular series of questions.

Now, although correlations can be moderately complex to compute, they're not that hard to understand. Imagine that you have two sets of numbers, which represent two sets of things that have happened, and that you plot them with a standard line plot and then scale them so that they fit into the same size space. The correlation between the two sets of numbers is a measure of how well the two lines match up. Correlations range from -1 (the numbers are the complete opposite of each other) to 1 (a complete scaled match), with numbers around 0 implying no relationship at all. If the correlation is below 0, we call that a negative correlation, which means that the second thing is more likely to happen the less the first thing happens. If you want a more detailed explanation, this one looks pretty good.
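For anybody who wants to see the arithmetic, here's a minimal sketch of the usual (Pearson) correlation in Python. The function name pearson and the little lists of numbers are made up for illustration; they aren't real team stats, and this isn't necessarily the exact code behind the numbers in this column.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two lists move together around their own averages.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each list moves on its own; dividing by these does the "scaling".
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a list never changes")
    return co_movement / (spread_x * spread_y)

# Made-up numbers, purely to show the three landmark values.
first = [1, 2, 3, 4, 5]
scaled_match = [10 * x + 3 for x in first]   # same shape, different scale
opposite = [-x for x in first]               # mirror image
unrelated = [2, 1, 0, 1, 2]                  # no straight-line relationship

print(pearson(first, scaled_match))  # a complete scaled match
print(pearson(first, opposite))      # the complete opposite
print(pearson(first, unrelated))     # no straight-line relationship
```

The three printed values land at 1, -1, and 0 (give or take floating-point fuzz), which are the endpoints and midpoint of the range described above.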

Now, there's an old saying that correlation does not imply causation. In other words, just because event A correlates with event B, that doesn't mean that B is happening because of A. A could be causing B, or B could be causing A, or C could be causing A and B, or A and B could have nothing to do with each other and just be a coincidence. The reason that we look at them, though, is that sometimes what we want to happen (like winning games) can be harder to predict than something that might contribute to that event (like taking walks), so we look for clues that might contribute to our knowledge.
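The "C could be causing A and B" case is easy to simulate. The sketch below is a toy setup with simulated numbers, not baseball data: C is random noise, A and B each copy C with their own independent noise added, and the helper name corr, the seed, and the noise sizes are all arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev

def corr(xs, ys):
    """Pearson correlation written as covariance over the two spreads."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = mean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])
    return covariance / (pstdev(xs) * pstdev(ys))

rng = random.Random(2005)  # fixed seed so the run is repeatable
n = 2000

# C is the hidden common cause.
c = [rng.gauss(0, 1) for _ in range(n)]
# A and B each follow C plus their own noise; neither one ever looks at the other.
a = [ci + rng.gauss(0, 0.5) for ci in c]
b = [ci + rng.gauss(0, 0.5) for ci in c]
# A fourth series that has nothing to do with any of them.
bystander = [rng.gauss(0, 1) for _ in range(n)]

r_ab = corr(a, b)
r_a_bystander = corr(a, bystander)
print(f"A vs. B:         {r_ab:+.2f}")
print(f"A vs. bystander: {r_a_bystander:+.2f}")
```

A and B come out strongly correlated even though neither one causes the other, while the bystander series sits near zero. The correlation alone can't tell you which of those stories you're looking at.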

So, let's formulate our question: If we have limited resources, in the form of recruiting time and effort and scholarships to offer, what sort of high school player should we focus our efforts on? We need some goal to set, so we're going to focus on maximizing winning percentage for whatever schedule we play, since that's all you can really do with the players on the field; scheduling issues are a separate discussion.

The first obvious decision to be made is whether to focus more on pitching or position players. There are no stats that directly address that decision, so we'll start by dividing the game in half: It turns out that the correlation between runs scored and winning percentage is .69, while the correlation between runs allowed and winning percentage is -.82. That means that preventing runs is a more reliable means of winning games than scoring them (note also that neither of those is a perfect correlation, which means that you can win with an offensive team if that's what you've got; it's just not as reliable).
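One way to put those two numbers on the same footing is to square them. For a straight-line fit of winning percentage on a single stat, the squared correlation is the share of the variation in winning percentage that the stat accounts for. That reading is standard statistics rather than something computed elsewhere in this column, and the names below are just illustrative.

```python
# The two correlations with winning percentage reported above.
reported = {
    "runs scored": 0.69,
    "runs allowed": -0.82,
}

def r_squared(r):
    """Share of variance accounted for by a one-variable straight-line fit."""
    return r * r

shares = {name: r_squared(r) for name, r in reported.items()}

for name, share in shares.items():
    r = reported[name]
    print(f"{name}: r = {r:+.2f}, r squared = {share:.2f}")
```

That works out to roughly 48% for runs scored against 67% for runs allowed. The sign drops out when you square, which is fine here, since the sign only says which direction the relationship runs.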

So, on the face of it, you might think that you pick up position players where you can and focus your attention on the pitchers. However, there's a clue in that name "position players" that we might need to follow up on -- there are actually two parts to preventing runs, pitching and defense. So let's look at a couple more correlations: What's the connection between a good pitching measure like ERA and winning, and what's the connection between a good defensive measure like defensive efficiency and winning?

Now, this isn't a pure split, because any pitcher will tell you that, the abstract notion of an unearned run aside, defense can affect your ERA, but it's a start. The correlation between ERA and winning percentage (all of these correlations are for the 2005 Division I season, by the way, but the ones I've checked hold up quite well for other seasons or divisions) turns out to be -.73, while the correlation for defensive efficiency and winning percentage is .69. That's close enough that you probably don't want to just declare that pitchers are more important than position players, especially since defense contributes to ERA, but the defensive contribution is probably not enough to overcome the gap between the runs scored and runs allowed correlations.

You can run this type of analysis out to about whatever level of detail you want to (and you probably should; go see if the math department has a grad student who's looking for a project), but I'll just do one more: For the pitchers that you have, assuming you don't have any one-pitch or low-stamina guys who have to be in the bullpen, should you put all of your best pitchers in the rotation? This one turns out to be interesting -- conventional wisdom is that the starters are far more important, but the actual correlations are -.76 for starter ERA and -.75 for reliever ERA. You're probably no worse off putting the best guys in the rotation, but those other innings really matter, so don't neglect that bullpen.

See, now, that didn't hurt a bit.

