Boyd's World-> The College Baseball Ratings Page-> Frequently Asked Questions | About the author, Boyd Nation |
The College Baseball Ratings Page
Frequently Asked Questions
The Questions:
What are the ISR's?
How are the ISR's computed?
Why are the ISR's needed?
Why don't you include my favorite factor -- such as home field advantage, margin of victory, or past performance?
How can you rank Vine Covered U. over Enormous State U. when ESU beat VCU twice?
Why do you have Podunk State ranked #2 on February 29 when they've never even won their home tournament before?
What are the implied probabilities based on the ISR's?
What are the RPI's?
What are the pseudo-RPI's?
How closely does the selection committee follow the RPI's?
What's wrong with the RPI's?
Who is Boyd Nation, and why should anyone pay attention to this stuff?
The Answers:
What are the ISR's?
The ISR's are the results of an algorithm designed to measure the quality of a team's season to date by combining their winning percentage with the difficulty of their schedule. The algorithm rates all teams simultaneously and attempts to take advantage of inter-regional games more accurately than other rating systems.
How are the ISR's computed?
The basic idea is an iterative one. Begin with all teams set to an even rating -- 100 in this case. Then, for each game played, give each team the value of their opponent's rating plus or minus a factor for winning or losing the game -- 25 in this case. Total all of a team's results, divide by the number of games played, and that's the end of a cycle. Then use those numbers as the start of the next cycle, and repeat until you get the same results for each team for two consecutive cycles.
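The cycle described above can be sketched in a few lines of Python. This is my sketch, not Boyd's actual code: the team names and results below are invented, the convergence test is a tolerance check rather than an exact match, and a real schedule needs to be interconnected enough for the cycle to settle.

```python
def compute_isr(games, bonus=25.0, start=100.0, tol=1e-9):
    """Iterative ISR sketch. games is a list of (winner, loser) pairs."""
    teams = {t for game in games for t in game}
    ratings = {t: start for t in teams}  # everyone starts at 100
    while True:
        totals = {t: 0.0 for t in teams}
        counts = {t: 0 for t in teams}
        for winner, loser in games:
            # Winner gets the loser's rating plus the 25-point factor;
            # loser gets the winner's rating minus it.
            totals[winner] += ratings[loser] + bonus
            counts[winner] += 1
            totals[loser] += ratings[winner] - bonus
            counts[loser] += 1
        # End of a cycle: average each team's per-game values.
        new_ratings = {t: totals[t] / counts[t] for t in teams}
        # Stop when two consecutive cycles agree (to within a tolerance).
        if all(abs(new_ratings[t] - ratings[t]) < tol for t in teams):
            return new_ratings
        ratings = new_ratings

# Invented example: A beats B and C, B beats C, C beats A.
example = compute_isr([("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")])
```

On that made-up schedule the cycle settles with A above B above C, which matches the intuition that A's 2-1 record against the same opposition should rate highest.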
Why are the ISR's needed?
While it's still a great game, college baseball suffers from the lack of an accurate rating system for measuring team quality. The traditional polls suffer from voters running on auto-pilot, and the RPI's used by the selection committee have some serious problems with the method used to determine strength of schedule. Because of the small amount of inter-regional play in the sport, some regions tend to be under-represented in the NCAA tournament, and mid-rank large conference teams tend to be unfairly excluded. Although trying to get the selection committee to acknowledge this may be a hopeless case, the ISR's are an attempt to find a better rating system.
Why don't you include my favorite factor -- such as home field advantage, margin of victory, or past performance?
Because I can't measure whether any such factor increases accuracy, and I intentionally don't trust "common sense" -- so much of it is wrong when it comes to baseball.
Any rating system for sports is inherently going to have a bit of imprecision built into it, because sports are inherently random; this is why we bother to watch the games rather than watching a pre-determined art form like film or ballet. This is especially true for college baseball, in part because of the relatively short season and in part because baseball is the most random of the major sports. In professional sports, the best football teams generally win 90% of their games, the best basketball teams routinely win 80% of their games, and the best baseball teams struggle to win 66%.
Because of this, it's impossible to determine just how accurate any given rating system is. It's possible to see how accurate the results "look", and the ISR's do very well in that regard by mid-season. It's possible to see how accurately the regular-season rankings predict the post-season results, but only an extremist who's never actually thought about it would claim that the best team always wins a championship, especially with a format, like the College World Series, designed more for television than for fairness.
With that in mind, I've chosen to keep the ISR's as simple as possible. I have experimented with many factors, including the ones above, and have failed to find any indication that they provide better ratings than simply considering the current-season results in a straightforward manner.
How can you rank Vine Covered U. over Enormous State U. when ESU beat VCU twice?
Because ESU really stunk up the joint against Podunk State and VCU swept Our Lady of Perpetual Victories.
One of the basic tenets of the ISR's is that each game is worth the same amount. Big weekend conference series may impress the pollsters more, but mid-week losses to small schools may indicate fatal weaknesses in the bottom half of the pitching rotation. Or they may not; there's no way to know. Given that, a team's entire season must be looked at, and it must be considered in the context of every other team's season. That's too much data for a human brain to get a good feel for, especially a brain primarily focused on one team; that's why we have computers.
Why do you have Podunk State ranked #2 on February 29 when they've never even won their home tournament before?
Early season results generally do not provide enough information for the algorithm to give a clear picture of what's going on. Generally, I only provide early season ratings so that readers can get a feel for how the process develops; otherwise, they can be ignored until about mid-March, when things get more accurate. A good rule of thumb is to ignore the ISR for any team that has played fewer than eight games.
What are the implied probabilities based on the ISR's?
Over the 1998 and 1999 seasons, the results played out like this:
  Gap    Win %
 0- 2    0.507
 2- 4    0.558
 4- 6    0.635
 6- 8    0.674
 8-10    0.706
10-12    0.760
12-14    0.776
14-16    0.845
16-18    0.873
18-20    0.898
20-22    0.896
22-24    0.944
24-26    0.938
26-28    0.950
28-30    0.936
30-32    0.968
32-34    0.985
34-36    1.000
36-38    1.000
38-40    1.000
40-42    1.000
42-44    1.000
44-46    1.000
46-48    1.000
In other words, when a team's ISR was between 2 and 4 points higher than their opponent's, for example, they won 55.8% of the time. These figures aren't nearly as precise as they appear, of course, but they're fairly consistent between the two years, so they're probably a reasonably good approximation. They also become more accurate as the year goes on and the ISR's are given more data to work with.
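Read as a lookup table, the numbers above translate directly into a small helper. The function name is mine, the table values are the 1998-1999 figures quoted above, and gaps of 34 points or more are lumped together at 1.000:

```python
# Historical win percentage by ISR gap, from the 1998-1999 table above.
# Each entry is (lower bound of the gap bucket, win percentage).
WIN_PCT_BY_GAP = [
    (0, 0.507), (2, 0.558), (4, 0.635), (6, 0.674), (8, 0.706),
    (10, 0.760), (12, 0.776), (14, 0.845), (16, 0.873), (18, 0.898),
    (20, 0.896), (22, 0.944), (24, 0.938), (26, 0.950), (28, 0.936),
    (30, 0.968), (32, 0.985), (34, 1.000),
]

def implied_win_probability(isr_a, isr_b):
    """Historical win rate for the higher-rated of two teams."""
    gap = abs(isr_a - isr_b)
    prob = WIN_PCT_BY_GAP[0][1]
    for low, pct in WIN_PCT_BY_GAP:
        if gap >= low:
            prob = pct  # keep the last bucket whose floor we've passed
    return prob
```

So a team rated 110 against a team rated 107 (a 3-point gap) gets the 2-4 bucket's 0.558, with all the caveats about precision noted above.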
What are the RPI's?
The Rating Percentage Index is the official NCAA formula designed to aid the selection committee for each sport in choosing the tournament field. It is based on a combination of a team's winning percentage, their opponents' winning percentage, and their opponents' opponents' winning percentage, with bonuses and penalties involved for road wins against top teams or home losses to lower-ranked teams. The official RPI document for baseball is here in Microsoft Word format.
What are the pseudo-RPI's?
The pseudo-RPI's are my best effort at a simulation of the RPI's. The full formula is not released, but my best guess is that the sizes of the bonuses are .001 for wins over teams ranked between 51 and 75, .0035 for wins over teams between 26 and 50, and .006 for wins over teams between 1 and 25. The opponents' winning percentage is not a pooled win-loss total but rather the average of each opponent's individual winning percentage. I'm still uncertain about the handling of neutral site games.
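As a rough illustration (my sketch, not the official formula or Boyd's actual code), here is the recipe in Python. The 25/50/25 weighting is the commonly published RPI split and is my assumption, since the text doesn't state it; the bonus sizes are the guesses quoted above; and the real formula excludes a team's own games from its opponents' records, a detail this sketch skips. The team data in the test is made up:

```python
def winning_pct(record):
    """record is a (wins, losses) pair."""
    wins, losses = record
    total = wins + losses
    return wins / total if total else 0.0

def bonus_for_win(opponent_rank):
    """Guessed bonus sizes from the text, keyed on the opponent's rank."""
    if opponent_rank <= 25:
        return 0.006
    if opponent_rank <= 50:
        return 0.0035
    if opponent_rank <= 75:
        return 0.001
    return 0.0

def pseudo_rpi(team, records, schedule):
    """records: {team: (wins, losses)}; schedule: {team: [opponents]}.

    Assumed 25/50/25 weighting of WP, OWP, and OOWP; bonuses omitted here.
    """
    wp = winning_pct(records[team])
    opps = schedule[team]
    # Average of each opponent's winning percentage, per the text --
    # not a pooled total of their combined records.
    owp = sum(winning_pct(records[o]) for o in opps) / len(opps)
    oowp = sum(
        sum(winning_pct(records[oo]) for oo in schedule[o]) / len(schedule[o])
        for o in opps
    ) / len(opps)
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp
```

Note that in a round-robin like the test data, an undefeated team's own losses drag down its opponents' winning percentages, which is exactly the distortion the real formula's exclusion rule is meant to avoid.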
How closely does the selection committee follow the RPI's?
It varies from year to year -- generally they seem to use it for justification more than guidance. Jim Carr has done a good bit of analysis on this.
What's wrong with the RPI's?
Although things are improving, there's still a very limited amount of inter-regional play in college baseball. This means that in sections of the country with fewer Division I baseball schools, such as the West, the pool of available opponents tends to be smaller, which tends to pull winning percentages towards .500. Because the RPI considers only two levels of interconnectedness, teams from these regions tend to be underranked by it.
Who is Boyd Nation, and why should anyone pay attention to this stuff?
Boyd is a lifelong college baseball fan who has a master's degree in computer science with a focus on algorithm development. The ISR's are intended to improve enjoyment of college baseball by producing better-informed fans; some of us enjoy the games more when we have a feel for how likely certain results are. If that's you, enjoy.