Ever since I was a kid, I have been fascinated by March Madness. As a teenager, I created hand-drawn brackets, many of which still reside in a laminated green folder on my bookshelf. As a young adult, I started recording the results on a simple spreadsheet. Over the years, that spreadsheet became more complex and powerful.
Today, I use modern analytics and computational tools to better understand the tournament itself and perhaps even to get some hints as to how it might play out.
While there is no foolproof way to dominate your office pool, I have discovered a few tricks along the way that I find helpful. While we wait for the games to begin in earnest on Thursday, I will share some of what I have learned and how it applies to the 2023 bracket.
Methodology Overview
The foundation of my methodology is an observation that I made several years ago that boils down to this:
When it comes to NCAA Tournament upsets, the behavior is exactly the same as in regular season games. The odds are largely predictable from Vegas point spreads and from tools that can predict point spreads, such as KenPom efficiency margin data.
All of my analysis of college basketball odds is based on this premise. KenPom efficiency data can be used to assign a win probability to any arbitrary basketball match-up. Knowing this, the full season or any tournament can be modeled mathematically and its odds calculated.
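To make the premise concrete, here is a minimal sketch of how ratings could be turned into bracket odds. It is an illustration under stated assumptions, not the exact model behind this article: the efficiency margins are made-up numbers, the conversion from a margin difference to a point spread assumes a 68-possession game, and the 10.2-point standard deviation of the final margin around the spread is an assumed value. The names `win_prob` and `advance_odds` are hypothetical.

```python
from statistics import NormalDist

SIGMA = 10.2  # ASSUMPTION: std dev (points) of final margin around the spread
TEMPO = 68.0  # ASSUMPTION: possessions per game


def win_prob(em_a, em_b, tempo=TEMPO, sigma=SIGMA):
    """Probability that team A beats team B on a neutral court.

    em_a and em_b are efficiency margins in points per 100 possessions.
    """
    spread = (em_a - em_b) * tempo / 100.0  # predicted margin in points
    return NormalDist(0.0, sigma).cdf(spread)


def advance_odds(ratings):
    """Exact round-by-round odds for a single-elimination bracket.

    ratings is listed in bracket order, and its length must be a power of two.
    Returns one list per round; entry i is the chance team i wins that round.
    """
    n = len(ratings)
    reach = [1.0] * n  # chance each team is still alive entering the round
    rounds = []
    block = 1  # number of teams in each half of a pairing this round
    while block < n:
        nxt = []
        for i in range(n):
            start = (i // (2 * block)) * (2 * block)
            mid = start + block
            # Possible opponents come from the other half of team i's group.
            opponents = range(mid, mid + block) if i < mid else range(start, mid)
            p_win = sum(reach[j] * win_prob(ratings[i], ratings[j])
                        for j in opponents)
            nxt.append(reach[i] * p_win)
        reach = nxt
        rounds.append(reach)
        block *= 2
    return rounds


# Made-up efficiency margins for a four-team bracket (placeholders, not real data).
ratings = [25.0, 5.0, 15.0, 12.0]
for rnd, odds in enumerate(advance_odds(ratings), start=1):
    print(f"Win round {rnd}:", [round(p, 3) for p in odds])
```

The same recursion scales to a 64-team field: each team's chance of winning a round is its chance of getting there multiplied by its win probability against every possible opponent, weighted by how likely each of those opponents is to get there.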
My favorite plot to highlight this fact is Figure 1, shown below.
This figure compares the winning percentage for the higher seeds in the NCAA Tournament to the odds expected based on the average point spread of games with that seed combination. The figure shows data for all seed combinations that have occurred at least 40 times since 1979.
Figure 1 tells us why No. 15 seeds have beaten No. 2 seeds 10 times over the past 37 years (7% of the time). It is because, on average, No. 15 seeds are 15-point underdogs, and 15-point underdogs win 7% of the time whether the game is played in March or in November.
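As a quick arithmetic check on these figures, the sketch below computes the observed upset rate, assuming four No. 2 vs. No. 15 games in each of the 37 tournaments. It then back-solves how widely final margins would have to vary around the spread for a 15-point underdog to win 7% of the time. The normal distribution of margins is an assumption made here for illustration, not something established above.

```python
from statistics import NormalDist

upsets = 10
games = 37 * 4  # ASSUMPTION: four 2-vs-15 games per tournament
upset_rate = upsets / games

spread = 15.0        # average spread for a No. 15 seed, per the text
underdog_win = 0.07  # win rate for 15-point underdogs, per the text

# Under a normal model, P(underdog wins) = Phi(-spread / sigma).
# Invert that relationship to find the sigma implied by the two numbers above.
z = NormalDist().inv_cdf(underdog_win)  # negative, since underdog_win < 0.5
implied_sigma = -spread / z

print(f"Observed upset rate: {upset_rate:.1%}")
print(f"Implied standard deviation: {implied_sigma:.1f} points")
```

The observed rate comes out just under 7%, and the implied standard deviation is roughly 10 points, so under this model a single value for the spread of outcomes is enough to tie the two percentages together.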
There are a few notable deviations from this correlation. For example, No. 10 seeds have surprisingly good luck against No. 2 seeds, and No. 5 seeds do not upset No. 1 seeds in the Sweet 16 as often as expected. But in general, the correlation is very strong.
As for the correlation between the Vegas point spreads and the point differentials predicted by KenPom efficiency margins, Figure 2 below shows how strong this correlation is for the first-round games in the 2023 NCAA Tournament.
Figure 2 gives me confidence that KenPom efficiencies can be used to model the results of the NCAA Tournament.
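One way to put a number on the agreement between the two sets of spreads is a Pearson correlation coefficient. The sketch below shows the calculation only. The spreads in it are made-up placeholders, not the actual 2023 lines or KenPom projections, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER values for illustration only (negative = higher seed favored).
vegas_spreads = [-22.5, -11.0, -5.5, -2.0, 1.5]
kenpom_spreads = [-21.0, -12.3, -4.8, -2.6, 0.9]

r = pearson_r(vegas_spreads, kenpom_spreads)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 would indicate that the KenPom-based projections track the Vegas lines closely, which is the pattern the figure is meant to show.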