Predictability of Offensive Stats
Publication Date: January 11, 2005
What Can You Count On?
So, you've got a player coming back and you're trying to figure out what to do with him. Or you've got a group of players coming back and you want to know how many runs to expect from them, so you'll know how much to focus on recruiting hitters for this year (do you grab juco guys in order to shore up the short term or grab high school guys that'll be ready in a couple of years?). Or you just want to know what to expect from the returning players on your favorite team. You've got the stat lines from last year -- how much does that tell you about what to expect from them this year?
In an attempt to answer those questions, I designed a study. I'll do the hitters this week and a corresponding study for the pitchers next week. Using the contents of The Hitting Stats Database over in The Filing Cabinet, I identified all the players who had played my definition of a full season (at least 160 plate appearances) in back-to-back seasons between 2002 and 2004 -- a total of 1439 pairs of seasons -- and computed the correlation between season N and season N+1 for each of the stats in the standard offensive stat line. Here are the results:
Stat     R      X
ATT    0.74   1.08
SB     0.73   1.11
SO     0.61   1.01
HR     0.58   1.16
BB     0.56   1.12
HBP    0.54   1.09
RBI    0.54   1.08
R      0.51   1.07
GP     0.50   1.01
SH     0.50   0.98
TB     0.49   1.07
AB     0.48   1.04
SLG    0.47   1.03
H      0.44   1.05
OBP    0.40   1.02
GS     0.38   1.03
3B     0.35   1.04
GIDP   0.32   1.03
AVG    0.28   1.01
2B     0.27   1.09
SF     0.18   1.12

R -- Correlation from year N to year N+1
X -- Multiplier from year N to year N+1
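For anyone who wants to run the same kind of numbers on their own data, here is a minimal sketch of how R and X could be computed for one stat. It assumes the correlation is the ordinary Pearson correlation (the study doesn't say which kind was used) and that X is the second-year average divided by the first-year average. The function name year_to_year and the sample home run totals are made up for illustration; they are not from the database.

```python
from math import sqrt

def year_to_year(pairs):
    """Return (r, x) for one stat.

    pairs: list of (value in season N, value in season N+1), one per player.
    r: Pearson correlation between the two seasons (an assumption about
       which correlation is meant).
    x: multiplier, i.e. the season N+1 average over the season N average.
    """
    n = len(pairs)
    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    mean_first = sum(first) / n
    mean_second = sum(second) / n

    # Sums of cross-products and squared deviations around each mean.
    cov = sum((a - mean_first) * (b - mean_second) for a, b in pairs)
    var_first = sum((a - mean_first) ** 2 for a in first)
    var_second = sum((b - mean_second) ** 2 for b in second)

    r = cov / sqrt(var_first * var_second)
    x = mean_second / mean_first
    return r, x

# Hypothetical home run totals for five returning hitters.
hr_pairs = [(4, 6), (10, 12), (2, 1), (7, 9), (12, 11)]
r, x = year_to_year(hr_pairs)
print(f"R = {r:.2f}, X = {x:.2f}")
```

Running this once per stat over all the season pairs would produce a table shaped like the one above, though the actual values depend entirely on the data fed in.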
You could almost, but not quite, use this as a ranking of the value of the generally available stats. Things which are predictable from year to year tend, for the most part, to be things which measure skill rather than fortune. The catch is that the two at the top, stolen base attempts and stolen bases, are heavily tinged with "reputation"; guys who get lots of stolen bases tend to be guys who get lots of stolen base attempts, and they get those in part because they're fast and in part because their coaches believe they're fast. After that, though, you hit the stuff that's under the player's control -- strikeouts, home runs, and walks, sometimes called the Three True Outcomes. HBP finishing this high shows how much getting hit by a pitch is under the control of the batter. The bottom of the list shows that sacrifice flies are almost entirely random (that a small correlation exists at all is a minor reflection of the same sort of lineup effects that produce RBIs and runs) and that doubles are too dependent on the sort of luck that makes batting average hard to predict.
The multipliers are interesting in their own way. Just to make sure they're understood, the average value from the second year is the multiplier times the average value from the first year. The first interesting part is that almost all of them are so small. The years from 18 to 21 should be a time of large development, and 8-10% growth a year is actually a pretty good gain if it can be sustained throughout the early 20's, but we tend to think in terms of guys growing explosively each year, and that's not reflected here at all. The 16% increase in home runs is the largest (and tends to be missed because the numbers are fairly small to start with; even a slugger will go from 10 to 12 or whatever), while batting average remains almost unchanged. It's also interesting that strikeouts stay more or less level.
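To put the "sustained growth" point in numbers, here is a quick sketch of how a yearly multiplier compounds. It assumes, purely for illustration, that the same multiplier applies every year for three straight years, which the study does not claim; the function name compound is made up.

```python
def compound(start, multiplier, years):
    """Value after applying the same yearly multiplier for `years` seasons."""
    return start * multiplier ** years

# 8% and 10% yearly growth held for three seasons (an illustrative assumption).
for rate in (1.08, 1.10):
    total = compound(1.0, rate, 3)
    print(f"{rate:.2f} per year for 3 years -> {total:.2f}x overall")

# The home run multiplier applied once to a 10-homer season.
print(f"10 HR at 1.16 -> {compound(10, 1.16, 1):.1f} HR")
```

Held for three years, 8-10% a year works out to roughly 26-33% overall, and a 10-homer hitter at the 1.16 multiplier projects to about 11.6, which is the "10 to 12" jump that's easy to overlook.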
Getting back to our initial questions, then, how much you should expect from your returning players depends on the shape of their contribution from last year. It's possible to succeed in all sorts of ways, but guys whose value came from lots of singles and a high batting average are less likely to be able to repeat their successes than those whose contributions came in the form of walks and homers. If your offense was successful but depended a lot on the former type, you may want to look for some short-term help.
