Boyd's World-> Filing Cabinet-> Stuff They'd Be Better Off Not Knowing | About the author, Boyd Nation |
Stuff They'd Be Better Off Not Knowing
Executive Summary
Each year, as the selection committee sits down to choose the thirty-four at-large teams that receive bids to the tournament (and to make its other decisions about seeding and hosting), it is given a report (officially, if playfully, called The Nitty Gritty Report) that contains a number of statistics for each team. Other than the RPI and the team's overall win-loss record, each of these stats covers some subset of the team's season: things like road record, record in the last ten games, or record against several RPI bands. These sub-stats and their use are well known; you hear them all the time in bracketology discussions and in the committee's justifications of its decisions. It turns out, however, that these sub-stats are detrimental to the process and shouldn't be used as part of the selection criteria.
Philosophy
It's easy to understand why the NCAA support staff provides the data and why the committee wants it: in general, more data is a good thing during a decision-making process. However, what is actually provided is not more data; it's the same data cut into smaller slices. The problem is that the college season, taken as a whole, is just barely long enough to give us a chance to rank teams, and any subset of the games throws away too much information to be useful.
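One rough way to see why a slice of the season carries less information is the textbook binomial standard error of an observed winning percentage, sqrt(p(1 - p) / n), which grows as the number of games n shrinks. The Python sketch below is an illustration under simplifying assumptions, not the author's method: it treats every game as an independent trial won with the same probability p, and the 56-game season, the 0.600 true winning probability, and the function name `win_pct_standard_error` are all hypothetical choices for the example.

```python
import math


def win_pct_standard_error(p: float, n_games: int) -> float:
    """Binomial standard error of an observed winning percentage.

    Simplifying assumption: each of n_games is an independent trial won with
    the same probability p (real opponents and venues vary game to game).
    """
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    return math.sqrt(p * (1.0 - p) / n_games)


if __name__ == "__main__":
    p = 0.600  # assumed true winning probability for a hypothetical team
    for label, n in [("full season (assumed 56 games)", 56), ("last 10 games", 10)]:
        print(f"{label}: +/- {win_pct_standard_error(p, n):.3f}")
```

Under these assumptions the ten-game slice has a standard error of about 0.155, against roughly 0.065 for the full season, so it is more than twice as noisy; the error shrinks only with the square root of the number of games.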
Before getting into the methodology and proof, it's worth a moment to ask a question that I suspect relatively few of the committee members have asked themselves: What question are they trying to answer? There are three possibilities that I can think of, so we'll run through those to see which one they act like they're working on.
Methodology and Proof
One of the side effects of my favorite ranking system, the ISR, is that there is a formula for predicting how often a team will beat a given opponent based on the gap between their ISRs. There's a predictable adjustment for home-field advantage, and, in the postseason, there's a documented advantage for postseason experience. By applying this formula to a large set of games, such as all postseason games from 1999 to 2009, we can take a given characteristic and check whether teams with that characteristic win about as often as the formula predicts; we'll call this a comparison of actual vs. expected wins.
If something is a useful predictor of postseason success beyond the information about the team's whole season already contained in a full-season ranking like the ISR, then you would expect the actual wins to consistently exceed the expected wins. That is what happened with home-field advantage and postseason experience.
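The actual-vs.-expected comparison can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the ISR win-probability formula (with its home-field and postseason-experience adjustments) isn't given here, so each game simply carries a predicted win probability `p_win` as an input, and the `Game` type and `actual_vs_expected` function are hypothetical names. Expected wins are the sum of those probabilities; actual wins are a plain count.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Game:
    """One game, seen from the side of the team that has the characteristic
    being tested (e.g. the better road record).

    p_win: model-predicted probability that this team wins (an input here;
           the ISR formula itself is not reproduced).
    won:   whether this team actually won.
    """
    p_win: float
    won: bool


def actual_vs_expected(games: Iterable[Game]) -> tuple[int, float, int, float]:
    """Return (games, expected_wins, actual_wins, actual_wp - expected_wp)."""
    n, expected, actual = 0, 0.0, 0
    for g in games:
        n += 1
        expected += g.p_win  # expected wins = sum of predicted win probabilities
        actual += int(g.won)
    if n == 0:
        raise ValueError("need at least one game")
    return n, expected, actual, (actual - expected) / n


if __name__ == "__main__":
    # Four made-up games, for illustration only.
    sample = [Game(0.70, True), Game(0.60, False), Game(0.55, True), Game(0.80, True)]
    n, exp_wins, act_wins, margin = actual_vs_expected(sample)
    print(f"{n} games: expected {exp_wins:.2f}, actual {act_wins}, margin {margin:+.4f}")
```

A consistently positive margin across a large set of games is the signature of a useful predictor; margins scattered around zero look like noise.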
As a control illustration, look at each letter of the alphabet and check how often the team with more of that letter wins (how often does the team with more a's win, how often does the team with more b's win, and so on). The results vary around the expected wins but generally stay pretty close (almost all of the actual win totals are within 3% of the expected wins), and they are split more or less evenly between above and below expectation. In other words, it looks like random variation around a predictable pattern.
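The letter-count control can be sketched the same way. In this hypothetical Python version, letters are counted in each team's name (case-insensitive, an assumption about how the control was run), games where both names contain the same number of a letter are skipped for that letter, and each game again carries a model win probability as an input because the ISR formula itself isn't reproduced. The function names and the sample team names used in testing are made up.

```python
from string import ascii_lowercase


def letter_edge(team_a: str, team_b: str, letter: str) -> int:
    """+1 if team_a's name has more of `letter`, -1 if team_b's does, 0 on a tie."""
    a = team_a.lower().count(letter)
    b = team_b.lower().count(letter)
    return (a > b) - (a < b)


def letter_control(games):
    """Tally expected and actual wins, per letter, for the team whose name
    contains more of that letter.

    games: iterable of (team_a, team_b, p_a_wins, a_won) tuples, where p_a_wins
    is the model's predicted probability that team_a wins (supplied as input).
    Returns {letter: (games_counted, expected_wins, actual_wins)}.
    """
    games = list(games)  # iterated once per letter, so materialize it
    results = {}
    for letter in ascii_lowercase:
        n, expected, actual = 0, 0.0, 0
        for team_a, team_b, p_a, a_won in games:
            edge = letter_edge(team_a, team_b, letter)
            if edge == 0:
                continue  # tie on this letter: the game says nothing about it
            # Look at the game from the side with more of this letter.
            p = p_a if edge > 0 else 1.0 - p_a
            won = a_won if edge > 0 else not a_won
            n += 1
            expected += p
            actual += int(won)
        if n:
            results[letter] = (n, expected, actual)
    return results
```

If the letters behave as described above, each letter's actual total lands near its expected total, with misses on both sides, which is what pure noise looks like.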
Now, let's look at what happens with the different factors that are presented in the Nitty Gritty Report:
Stat | Games | Expected Wins | Expected Win Pct | Actual Wins | Actual Win Pct
Record | 1477 | 952.3 | 0.645 | 941 | 0.637
Non-conf record | 1489 | 965.8 | 0.649 | 945 | 0.635
Conf record | 1488 | 824.2 | 0.554 | 819 | 0.550
Road record | 1490 | 898.3 | 0.603 | 876 | 0.588
Last 10 games | 1264 | 700.7 | 0.554 | 697 | 0.551
Base RPI | 1477 | 952.3 | 0.645 | 941 | 0.637
Non-conf RPI | 1485 | 1008.8 | 0.679 | 967 | 0.651
Conf RPI | 1418 | 747.5 | 0.527 | 735 | 0.518
OWP | 1487 | 976.3 | 0.657 | 958 | 0.644
OOWP | 1477 | 994.3 | 0.673 | 947 | 0.641
Non-conf OWP | 1487 | 857.0 | 0.576 | 834 | 0.561
Non-conf OOWP | 1468 | 947.9 | 0.646 | 926 | 0.631
Record vs RPI 1-25 | 1461 | 965.6 | 0.661 | 980 | 0.671
Record vs RPI 26-50 | 1474 | 985.6 | 0.669 | 944 | 0.640
Record vs RPI 51-100 | 1478 | 989.3 | 0.669 | 953 | 0.645
Record vs RPI 101-150 | 1474 | 971.3 | 0.659 | 961 | 0.652
Record vs RPI 1-100 | 1490 | 1018.5 | 0.684 | 996 | 0.668
Record vs RPI 1-150 | 1493 | 1014.9 | 0.680 | 988 | 0.662
Record vs RPI 151-bottom | 1472 | 948.2 | 0.644 | 935 | 0.635
As you can see, virtually all of the factors presented in the Nitty Gritty Report are contraindicated as predictors of postseason success. To take one example, based on their ISRs relative to their postseason opponents over the last 11 years, you would have expected the team with the better road record to win 898 of their 1490 games, but they've won only 876, a deficit of about 1.5 percentage points in winning percentage. The only factor to outperform its expected win total is record vs. RPI top 25, which comes in about one percentage point above expectation. Most of the others are relatively small negatives, but the consistency of the data is telling: these are not factors that predict postseason success, and, in fact, they tend to predict postseason underachievement instead.
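The arithmetic behind the table is simple enough to check mechanically. The Python snippet below copies the table's rows and computes each factor's margin, actual minus expected winning percentage in percentage points; the helper name `margin_points` is just an illustrative choice. Running it shows that only record vs. RPI top 25 comes out positive and that the road-record deficit is about 1.5 points.

```python
# (stat, games, expected wins, actual wins), copied from the table above.
ROWS = [
    ("Record", 1477, 952.3, 941),
    ("Non-conf record", 1489, 965.8, 945),
    ("Conf record", 1488, 824.2, 819),
    ("Road record", 1490, 898.3, 876),
    ("Last 10 games", 1264, 700.7, 697),
    ("Base RPI", 1477, 952.3, 941),
    ("Non-conf RPI", 1485, 1008.8, 967),
    ("Conf RPI", 1418, 747.5, 735),
    ("OWP", 1487, 976.3, 958),
    ("OOWP", 1477, 994.3, 947),
    ("Non-conf OWP", 1487, 857.0, 834),
    ("Non-conf OOWP", 1468, 947.9, 926),
    ("Record vs RPI 1-25", 1461, 965.6, 980),
    ("Record vs RPI 26-50", 1474, 985.6, 944),
    ("Record vs RPI 51-100", 1478, 989.3, 953),
    ("Record vs RPI 101-150", 1474, 971.3, 961),
    ("Record vs RPI 1-100", 1490, 1018.5, 996),
    ("Record vs RPI 1-150", 1493, 1014.9, 988),
    ("Record vs RPI 151-bottom", 1472, 948.2, 935),
]


def margin_points(games: int, expected: float, actual: int) -> float:
    """Actual minus expected winning percentage, in percentage points."""
    return 100.0 * (actual - expected) / games


if __name__ == "__main__":
    for stat, games, expected, actual in ROWS:
        print(f"{stat:<26} {margin_points(games, expected, actual):+.2f}")
    positives = [s for s, g, e, a in ROWS if margin_points(g, e, a) > 0]
    print("Beat expectation:", positives)
```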
The reason for that is not particularly important, since the facts above are reason enough to stop using these stats, but it's worth attempting an explanation to make things clearer. In almost all of these cases, what's being looked at is a minority of the season. As a simplifying case, if two teams are equal over the course of the season, the fact that one team is better in a small subset (like the last ten games) means that that team has actually been worse over the larger remainder of the season.
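The simplifying case can be made concrete with made-up numbers. Suppose, hypothetically, that two otherwise equal teams both finish 40-20, but Team A went 9-1 in its last ten while Team B went 5-5. The short Python sketch below (the function names are illustrative) subtracts the slice from the full record to get each team's record over the rest of the season.

```python
from fractions import Fraction


def rest_of_season(total_w: int, total_l: int, slice_w: int, slice_l: int) -> tuple[int, int]:
    """Record outside a slice of the season, such as the last ten games."""
    w, l = total_w - slice_w, total_l - slice_l
    if w < 0 or l < 0:
        raise ValueError("the slice record cannot exceed the full-season record")
    return w, l


def win_pct(w: int, l: int) -> Fraction:
    """Exact winning percentage as a fraction."""
    return Fraction(w, w + l)


if __name__ == "__main__":
    # Hypothetical teams: identical 40-20 seasons, different last-ten records.
    a_rest = rest_of_season(40, 20, 9, 1)  # Team A: 9-1 in last ten
    b_rest = rest_of_season(40, 20, 5, 5)  # Team B: 5-5 in last ten
    print("Team A rest of season:", a_rest, float(win_pct(*a_rest)))
    print("Team B rest of season:", b_rest, float(win_pct(*b_rest)))
```

Team A's better finish means it went 31-19 (.620) over the other fifty games, against Team B's 35-15 (.700): the better slice is exactly offset by a worse remainder.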
Throwing away data is a bad thing. Remember that the next time you want to argue a team's case based on some small portion of its season, like record vs. RPI top 100.