March Madness: A Statistical Tour
Mark Brown, City College, CUNY and Columbia.


I will discuss the NCAA Division I Men’s Basketball Tournament, one of the most popular sporting events in the U.S. By the time of this talk the original field of 64 teams will have been reduced to 4 (known as “the final 4”). Several methods of comparing teams are available to fans and bettors. These include the AP poll of sportswriters, the ESPN/USA Today coaches poll, the RPI (rating percentage index), the Sagarin ratings, the tournament seedings, the Las Vegas odds, and one with which I am currently involved, the LRMC (logistic regression Markov chain) model. The LRMC model, introduced by Professors Paul Kvam and Joel Sokol of Georgia Tech (NRLQ, 2006), has been very successful in statistical tests against the above methodologies and has received a good deal of media attention. You can find its predictions, updated weekly during the college basketball season, on Joel Sokol’s website (google “LRMC Information Page” to find it). Professor George Nemhauser of Georgia Tech is also involved in the LRMC project. In reading the details of the method, it struck me that it might be advantageous to replace the LR component of the method with an empirical Bayes approach. I worked out the details jointly with Joel Sokol, and our modified model has worked well on past tournament data. The new approach, described in a paper by Sokol and me, now has its rankings posted alongside the original method’s rankings on the above website. This is the first year of its use.
Empirical Bayes, the brainchild of Columbia’s late eminent statistician Herbert Robbins, is now a widely used statistical tool. Robbins’s original approach was non-parametric, but most of the ensuing applications, including this one, have employed what is now known as parametric empirical Bayes methodology. However, another eminent statistician, Professor Bradley Efron of Stanford, has recently found the original non-parametric empirical Bayes to be useful in the analysis of microarray data. As he writes, “the kind of massively parallel data sets that really benefit from empirical Bayes analysis seem to be more a 21st century phenomenon”. Efron earlier applied parametric empirical Bayes to baseball data; his paper with Carl Morris (JASA, 68, 1973) is a classic. Some other famous scientists have dabbled in quantitative sports analysis, the paleontologist Stephen Jay Gould being one prominent example. I hope that we can relax and enjoy ourselves. After all, my objective in this research is simply to predict basketball outcomes. It’s a fun activity!
