A Look at the Distance Matrix

Boyd's World-> Breadcrumbs Back to Omaha-> A Look at the Distance Matrix About the author, Boyd Nation

Publication Date: August 15, 2000

Why Does That Work?

There's a brief but complete description of how the ISR's work in the College Baseball Ratings FAQ, but there's no explanation of why they work, so I'd like to spend a second looking at that as an introduction to this week's topic.

The ISR's attempt to solve a problem that is simple to state but difficult in practice -- order all teams by team strength as accurately as possible given what we know about them from their on-field results. If baseball were deterministic -- if the better team won every game -- the problem would be easy. Of course, then we wouldn't bother to watch; for that matter, the teams probably wouldn't bother to play after the first half-dozen games of the season. So we have to try to order teams while knowing that some of our results are "wrong". Failure to recognize this can cause headaches in computers and most sentient humans when handling round-robin loops -- Enormous State beats Vine Covered U., Vine Covered U. beats Our Lady of the Hanging Slider, OLHS beats ESU. Most non-sentient humans simply claim that ESU had a bad day, and, besides, the umps cheated.

Further complicating things is the fact that team quality is not a single-axis function, so it is actually possible for there to be real loops -- Enormous State consistently pounds Vine Covered U.'s predominantly right-handed pitching, VCU's plate discipline pushes the game deep into Our Lady of the Hanging Slider's weak bullpen, and OLHS's ground-ball-based offense exposes ESU's weak infield defense. If we designed virtual players (and I want the guy replacing me to have better knees if you decide to do this) and played ten-thousand-game seasons, we might be able to identify these loops and place teams more clearly. In the real world, the best we can do is to produce a single-axis ordering and try to make it as accurate as possible. At its core, this can be reduced to taking each pair of teams and trying to compare them.

As an aside, you'd think that the longer major league season with a much smaller set of teams would produce a fairly accurate ordering, but the Lords of the Game have managed to make even that shaky at times in their greed. Large-scale expansion, while a good thing overall, hurts the chances of the best team winning. More pettily, interleague play adds in a lot of randomness. The most extreme case so far came in 1999 when the Reds swapped a series of games with the Cubs, exchanging an easier opponent for a more difficult one, and finished one game out of the playoffs.

At this point, we need to consider the concept of the distance between two teams, and then we need to consider how much mental weight to give the games at each distance. That weighting is there for illustrative purposes only -- there's no direct counterpart built into the ISR's; the same effect is a byproduct of the way that the system functions.

The distance between two teams is the minimum number of games that must be considered in order to compare them. For teams that play each other, for example, the distance between them is one. For teams that don't play but have at least one common opponent, the distance is two, and so on. We'll see later how many pairs actually fall at each distance.
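As a minimal sketch of this definition (not the ISR code itself), the distance can be found with a breadth-first search over a graph in which each team is a node and two teams are linked if they played each other. The function names `build_graph` and `team_distance`, the format of the `games` list, and the team "Tech" are all assumptions made up for this illustration.

```python
from collections import deque

def build_graph(games):
    """Map each team to the set of teams it has played."""
    graph = {}
    for a, b in games:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def team_distance(graph, start, goal):
    """Return the minimum number of games linking start to goal.

    Returns None if no chain of games connects the two teams.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        team, dist = queue.popleft()
        for opp in graph.get(team, ()):
            if opp == goal:
                return dist + 1
            if opp not in seen:
                seen.add(opp)
                queue.append((opp, dist + 1))
    return None

# Hypothetical schedule: each tuple is one matchup (who won is irrelevant here).
games = [("ESU", "VCU"), ("VCU", "OLHS"), ("OLHS", "Tech")]
graph = build_graph(games)

print(team_distance(graph, "ESU", "VCU"))   # 1: they played each other
print(team_distance(graph, "ESU", "OLHS"))  # 2: common opponent (VCU)
print(team_distance(graph, "ESU", "Tech"))  # 3: one more step out
```

Breadth-first search is a natural fit because it visits teams in order of increasing distance, so the first time it reaches the target team it has found the shortest chain.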

Mentally, most of us have some feel for how to appropriately weight games at each distance. While we tend to focus on head-to-head matchups, especially when stating a case, we know that if VCU beat ESU two of three but went 2-10 against common opponents while ESU went 11-4, ESU is probably better. From that point, it's just a matter of understanding how to weight the more distant relationships. As it turns out, those longer paths don't matter all that much if there are sufficient direct matchups and common opponents. However, a large number of pairs of teams are at least a distance of three apart, so sometimes these tenuous comparisons must be made due to the scarcity of inter-regional play. Fortunately, these aren't made in a vacuum, and the sheer number of these comparisons, which tend to correctly reinforce each other, means that the end results tend to be pretty good.
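To put numbers on the VCU and ESU example above, here is a small arithmetic sketch. It is only an illustration of the mental comparison, not a piece of the ISR computation; the helper `win_pct` and the variable names are assumptions.

```python
def win_pct(wins, losses):
    """Fraction of games won."""
    return wins / (wins + losses)

# Distance one: VCU took two of three from ESU.
vcu_head_to_head = win_pct(2, 1)

# Distance two: records against common opponents.
vcu_common = win_pct(2, 10)
esu_common = win_pct(11, 4)

head_to_head_games = 2 + 1
common_games = (2 + 10) + (11 + 4)

print(f"VCU head-to-head:        {vcu_head_to_head:.3f} over {head_to_head_games} games")
print(f"VCU vs common opponents: {vcu_common:.3f}")
print(f"ESU vs common opponents: {esu_common:.3f}")
print(f"Common-opponent games:   {common_games}")
```

The head-to-head edge rests on three games, while the common-opponent records cover 27, which is why the second comparison carries more weight in most people's heads.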

The Distances

Obviously, the better connected the whole network of college baseball teams is, the more accurate the ordering produced by the ISR's (and most other ratings systems more complicated than the RPI's) will be. So, let's take a look at how tightly connected things are.

The numbers in the following table represent the percentage of pairs for each season that fall at a given distance:

        Distance   1998  1999  2000

           1         7     8     8
           2        47    47    47
           3        43    43    43
           4         3     3     3
           5         *     +     0

         * 1 pair
         + 5 pairs

As you can see, over half of the pairs either play each other or have common opponents, and most of the rest are only one more step apart -- one team played someone who played someone who played the other team. This bodes well for the accuracy of the ISR's.
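A table like the one above could be produced along the following lines: run a breadth-first search from every team and tally how many pairs land at each distance. This is a rough sketch under assumptions, not the code behind the published numbers; the function names, the `games` format, and the tiny sample schedule (including the team "Tech") are hypothetical.

```python
from collections import Counter, deque

def distances_from(graph, start):
    """Breadth-first search: distance from start to every reachable team."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        team = queue.popleft()
        for opp in graph[team]:
            if opp not in dist:
                dist[opp] = dist[team] + 1
                queue.append(opp)
    return dist

def distance_percentages(games):
    """Return {distance: percentage of all team pairs at that distance}."""
    graph = {}
    for a, b in games:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    teams = sorted(graph)
    counts = Counter()
    for i, team in enumerate(teams):
        dist = distances_from(graph, team)
        # Count each unordered pair once; unconnected pairs get no bucket.
        for other in teams[i + 1:]:
            if other in dist:
                counts[dist[other]] += 1

    total_pairs = len(teams) * (len(teams) - 1) // 2
    return {d: 100.0 * n / total_pairs for d, n in sorted(counts.items())}

# Hypothetical four-team schedule, one tuple per matchup.
games = [("ESU", "VCU"), ("VCU", "OLHS"), ("OLHS", "Tech")]
result = distance_percentages(games)
print({d: round(p, 1) for d, p in result.items()})  # {1: 50.0, 2: 33.3, 3: 16.7}
```

With a few hundred teams this amounts to a few hundred searches over a small graph, so the brute-force approach is perfectly adequate.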

Looking at these numbers, things are slowly getting tighter at the margins, which is a good thing both for those of us working on rating systems and, probably, for the game as a whole, since it implies that there's more inter-regional play going on. It's dangerous to declare trends based on only three seasons, but I'm encouraged. Part of this trend is also due to the fact that lower-level teams are playing both more games and more Division I games on average; both of those numbers have gone up each year.

The extreme points are vaguely interesting: In 1998, the only pair at a distance of five was made up of Alcorn State and Santa Clara. In 1999, there were five such pairs, as the SWAC teams became more and more insular. This year, there were no pairs that far apart, even though Alcorn State only played one game against a non-conference team; Mississippi State's trip to the West Coast helped out there.

That does, however, point out that distance is a somewhat shaky measure to use as proof of the validity of a ratings system, since it only requires one connection between the teams to declare them to be comparable with a given degree of certainty. If this turns into a real concern, it might be useful to look at a more complicated metric that takes into account the number of paths at each distance -- in other words, how many common opponents there are, or how many three-game chains running from one team through an opponent and that opponent's opponent to the other team.
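One possible reading of that richer metric is to count, for each pair, how many distinct shortest chains connect the two teams. The sketch below extends a breadth-first search to carry that count along. It is an assumption about how such a metric might be built, not something the ISR's do; the function name and the placeholder teams "A", "B", and "C" are hypothetical.

```python
from collections import deque

def shortest_path_counts(graph, start):
    """Return {team: (distance, number of distinct shortest chains)} from start."""
    dist = {start: 0}
    paths = {start: 1}
    queue = deque([start])
    while queue:
        team = queue.popleft()
        for opp in graph[team]:
            if opp not in dist:
                # First time we reach opp: inherit the chain count from team.
                dist[opp] = dist[team] + 1
                paths[opp] = paths[team]
                queue.append(opp)
            elif dist[opp] == dist[team] + 1:
                # Another equally short route into opp: add its chains.
                paths[opp] += paths[team]
    return {t: (dist[t], paths[t]) for t in dist}

# Hypothetical schedule: ESU and VCU never meet but share opponents A and B.
games = [("ESU", "A"), ("ESU", "B"), ("A", "VCU"), ("B", "VCU"), ("VCU", "C")]
graph = {}
for a, b in games:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

info = shortest_path_counts(graph, "ESU")
print(info["VCU"])  # (2, 2): distance two, via two common opponents
print(info["C"])    # (3, 2): distance three, via two separate chains
```

Because the graph stores each opponent once, repeated games between the same two teams count as a single link here; weighting by the number of games played would be a further refinement.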
