Estimated Pitch Counts

Boyd's World-> Breadcrumbs Back to Omaha-> Estimated Pitch Counts About the author, Boyd Nation

Publication Date: January 22, 2002

Look Out, He's Got Another Tool!

Back in November, I published a study on pitcher abuse in the college ranks. It was, if my logs are accurate, the most popular column I've done, but I haven't gone back to the subject since then. I had other things I wanted to write about, and the holidays came in there with the corresponding weeks off, but mostly it was because writing well about pitcher usage requires a good bit of data collection, and I've been doing that in the background.

One of the problems I've referred to before is that actual pitch counts are really hard to find. Not all teams even track them, and many of the ones that do either don't get the data from the coaching staff to the SID or choose not to publish them. SIDs as a lot are an extremely helpful bunch, but even they can't work magic, so it would be nice to have a good way to measure a pitcher's workload without a published pitch count.

Along those lines, I've developed a measure that I call Estimated Pitch Count, or EPC. The idea behind EPC is that, while actual pitch counts can be hard to find, box scores are fairly common, and you can come up with a reasonably close estimate of the number of pitches thrown from the information contained in a normal pitcher's line. In order to test this theory, I've gathered data from every start I could find from last year where I could get a pitch count, a little over 700 starts, and I've played around with linear regression until I came up with a formula that I think works reasonably well.

Although the process was fairly time-consuming, the end results are relatively simple. It's not quite a linear formula, because the number of pitches thrown per batter tends to go up rather sharply once you get past a certain number of batters, but you can get a reasonable approximation from a two-step formula. First:

EPC = 4.80 + .21 * IP - .07 * H - .31 * R - .07 * ER + 4.15 * BB +
      1.49 * SO + 1.27 * AB + 1.72 * BF

If that estimate gives a total higher than 130, use the following formula instead:

EPC = 87.64 - 10.73 * IP - 2.21 * H - 2.35 * R + 1.56 * ER + .49 * BB +
      .90 * SO + 2.64 * AB + 2.02 * BF
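For anyone who would rather script it than use a calculator, here is a minimal Python sketch of the two-step formula above. The function names are made up for illustration, and two details are assumptions rather than anything stated in the column: innings in box-score notation (7.1 meaning seven and one-third) are converted to thirds before use, and the result is rounded to the nearest whole pitch.

```python
def innings_from_box(ip):
    """Convert box-score innings (7.1 = 7 1/3, 7.2 = 7 2/3) to a decimal.

    Assumption: the formula wants true innings, not the box-score shorthand.
    """
    whole, _, outs = str(ip).partition(".")
    return int(whole) + int(outs or 0) / 3


def estimated_pitch_count(ip, h, r, er, bb, so, ab, bf):
    """Two-step Estimated Pitch Count from a pitcher's box-score line."""
    innings = innings_from_box(ip)

    # Step 1: the base formula.
    epc = (4.80 + 0.21 * innings - 0.07 * h - 0.31 * r - 0.07 * er
           + 4.15 * bb + 1.49 * so + 1.27 * ab + 1.72 * bf)

    # Step 2: if the first estimate is above 130, use the second formula.
    if epc > 130:
        epc = (87.64 - 10.73 * innings - 2.21 * h - 2.35 * r + 1.56 * er
               + 0.49 * bb + 0.90 * so + 2.64 * ab + 2.02 * bf)

    # Assumption: report the estimate as a whole number of pitches.
    return round(epc)


# A 5.0 IP, 4 H, 1 R, 1 ER, 2 BB, 8 SO, 19 AB, 21 BF line:
print(estimated_pitch_count(5.0, 4, 1, 1, 2, 8, 19, 21))
```

Called as `estimated_pitch_count(9.0, 4, 1, 0, 2, 12, 33, 35)`, the first step comes out above 130, so the second formula supplies the final number.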

Before you all pull out your programmable calculators, I've put together a form you can use to try this out for your favorite pitcher. Feel free to submit your most egregious example of overuse; if I get enough I'll throw together a contest.

Does It Work?

I think so.

Within the 700 data points I used to create the formula, it's close enough to be useful. In judging this, you have to remember that an exact pitch count number isn't as useful as we might like, because the effects of each additional pitch are not always the same. Therefore, it's acceptable for the formula to be accurate within around ten pitches on the low end and within fifteen pitches on the high end; that means, for example, that it will generally place a start in the right category among those used by Baseball Prospectus. It achieves that goal for the data used to create the formula -- it never misses by more than 15, and, when it's wrong on the high end, it more often underestimates than overestimates.

How it will perform with other data is an open question, especially if that data originates in a different context -- 1990 instead of 2001, for example, or minor league ball instead of college. I hope to be able to study this more as time goes on.
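As a very small illustration of that kind of check, the Komine case study later in this column lists four starts where a published pitch count sits next to the estimate. The sketch below just lines those four pairs up; the variable names are invented for the example, and four starts (which may or may not have been part of the 700 used to build the formula) prove nothing either way.

```python
# (date, estimated pitch count, published pitch count), copied from the
# Komine tables in the case study.
known_starts = [
    ("6/3/0", 107, 108),
    ("4/20/1", 110, 124),
    ("5/25/1", 122, 121),
    ("6/1/1", 149, 162),
]

# Negative error means the estimate came in under the real count.
errors = {date: epc - actual for date, epc, actual in known_starts}
worst_miss = max(abs(err) for err in errors.values())

for date, err in errors.items():
    print(f"{date}: estimate off by {err:+d}")
print("largest miss:", worst_miss)
```

All four land inside the fifteen-pitch window, and the two bigger misses are both underestimates.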

A Case Study

To show how well it works in practice, I've gathered data for the last two years for one of the pitchers I wanted to include in my original study but was unable to get data for, Shane Komine of Nebraska. First, here are all but one of Komine's starts in 2000:

IP   H  R ER BB SO AB BF EPC Date, actual pitch count if known

5.0  4  1  1  2  8 19 21  86 2/12/0
6.0  5  2  2  4  5 23 27 105 2/20/0
9.0  5  2  2  1 12 31 33 124 2/27/0
6.2  5  4  3  4  9 22 29 112 3/4/0
7.1  8  4  3  1 10 28 30 111 3/10/0
9.0  4  1  0  2 12 33 35 149 3/17/0
8.0  2  0  0  1 13 26 28 111 3/24/0
4.2  5  3  3  1  1 17 20  66 3/31/0
8.0  5  3  0  1 17 30 31 126 4/8/0
9.0  6  2  2  1 16 32 33 142 4/14/0
7.0  5  2  2  1 11 25 28 106 4/21/0
9.0  7  0  0  1 12 32 34 127 4/28/0
9.0  6  0  0  1  5 30 33 113 5/6/0
8.0  7  2  2  2  4 29 32 111 5/12/0
4.2  8  7  7  3  7 21 25  95 5/17/0
7.0  6  5  3  2  5 28 30 107 6/3/0, 108 pitches

That's a PAP score of a little over 250,000 for the 2000 season, which puts him well up into severely overused territory if these estimates are at all accurate. Now, the 2001 season:

IP   H  R ER BB SO AB BF EPC Date, actual pitch count if known

4.0  8  5  5  2  6 21 23  87 2/11/1
5.2  8  6  6  4  7 25 29 112 2/16/1
7.0  7  3  1  3  6 26 29 109 2/24/1
9.0 10  2  1  4 10 34 39 145 3/2/1
7.0 11  6  5  1  9 31 33 117 3/9/1
8.0  3  1  0  0 12 27 29 108 3/15/1
7.0  7  4  4  3  5 26 31 111 3/24/1
9.0  4  1  0  2 11 32 34 130 3/30/1
7.0 10  6  6  0 11 30 32 113 4/7/1
9.0  9  2  2  1 11 38 39 159 4/13/1
8.0  7  0  0  1 10 28 29 110 4/20/1, 124 pitches
9.0 10  4  4  2  9 36 39 149 4/27/1
7.0  8  6  5  4  8 28 34 126 5/5/1
2.0  5  3  3  2  0  8 12  43 5/11/1
8.0  6  1  1  2 12 30 32 125 5/17/1
8.0  6  6  2  1  9 33 34 122 5/25/1, 121 pitches
9.0  3  0  0  3 12 31 35 149 6/1/1, 162 pitches
8.0  7  5  4  1  9 31 33 118 6/8/1

That's good for a PAP score of 620,000+, second only to Kenny Baugh among pitchers I've studied. If you replace the 149 estimate against Rice on June 1 (that was a really bad weekend for pitchers) with the actual 162 (which would be cheating), he ends up almost even with Baugh.
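For readers who want to check the season totals, here is a rough Python sketch. The column doesn't spell out the PAP formula, so this assumes the Baseball Prospectus version in which each start contributes the cube of its pitches over 100 and starts at or under 100 contribute nothing; the function name and the `threshold` parameter are illustrative. Under that assumption the EPC columns from the two tables reproduce the totals quoted above.

```python
def pap_score(pitch_counts, threshold=100):
    """Season PAP, assuming each start adds (pitches - threshold) cubed
    and starts at or under the threshold add nothing."""
    return sum(max(0, pitches - threshold) ** 3 for pitches in pitch_counts)


# EPC columns copied from the 2000 and 2001 tables, in start order.
epc_2000 = [86, 105, 124, 112, 111, 149, 111, 66,
            126, 142, 106, 127, 113, 111, 95, 107]
epc_2001 = [87, 112, 109, 145, 117, 108, 111, 130, 113,
            159, 110, 149, 126, 43, 125, 122, 149, 118]

print(pap_score(epc_2000))  # 251422, "a little over 250,000"
print(pap_score(epc_2001))  # 620893, "620,000+"

# Swapping the June 1 estimate (149) for the actual count (162):
with_actual = epc_2001.copy()
with_actual[16] = 162
print(pap_score(with_actual))  # 741572
```

Because the excess is cubed, a handful of very long outings dominates the total, which is why a 13-pitch change in one start moves the season score by about 120,000.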

Finally, two more numbers:

Komine's 2000 ERA: 2.24
Komine's 2001 ERA: 3.35

Now, one player over two seasons is way too small a sample to draw any conclusions from, but it's an interesting trend. When you add to that the fact that he had shoulder surgery in September and that Nebraska coach Dave Van Horn's stock quote on Komine is, "He's a battler who wants the ball," you have to be a bit worried.

One interesting open question is whether it was the 2000 overuse that hurt him in 2001 or the in-season problems in 2001 (or some combination of the two, of course). Komine having been labelled the staff ace meant that he was going to be left in even when he didn't have his best stuff. The March 2nd start is a good example; he wasn't sharp, giving up ten hits and four walks, but he only gave up two runs, so he was left in for a complete game, facing 39 batters and throwing an estimated 145 pitches. I have no proof, but I can theorize a cycle in cases like this where a bad outing or two that the pitcher gets through with a lot of pitches causes him to be less effective the next time out, which leads to more pitches, which leads to less effectiveness, and so on. I'll have to think of a way to study that one, too.

Finally, a closing addition to the coaching report from a couple of weeks ago: Last year was George Horton's fifth year at Cal State Fullerton, so he goes onto the active winning percentage list in seventh place.
