What's the Best Way to Build the NCAA Tournament Bracket?
Humans or computers? Best teams or best résumés? We debate the criteria used to determine the field of 64.
Thanks for reading the Her Hoop Stats Newsletter. If you like our work, be sure to check out our stats site, our podcast, and our social media accounts on Twitter, Facebook, and Instagram. You can also buy Her Hoop Stats gear, such as laptop stickers, mugs, and shirts!
Haven’t subscribed to the Her Hoop Stats Newsletter yet?
Calvin Wetzel (WAB Zealot): Hey Aaron, when we first discussed my piece to introduce Wins Above Bubble (WAB) to our readers, your first reaction was that we should separate it into two articles: one on how it works and another on why I’m such a proponent of it. As I have since learned, you are not a proponent yourself, so you suggested writing a piece debating WAB, NCAA ranking systems, the tournament, and whatever else we want. Ready to dive in?
Aaron Barzilai (Founder and CEO of Her Hoop Stats): Absolutely. So much of what we do is watching the game, writing about the game, writing the code for our stats site, and more. It’s fun to step back and talk about some of the philosophy behind how we see the game. Especially when we don’t completely agree. If only we were doing this over Zoom so people could see our eye rolls and mischievous grins as we try to turn the knife while making our point.
So, to start, why don’t you just remind readers what WAB is and why you like it?
Calvin: Sure. The best way I can describe it is that WAB tells you how many more wins a team has than the number of wins a bubble team would be expected to have against the same schedule.
There are a few main aspects of WAB that I like:
It’s objective
It levels the playing field between power conference schools and mid-majors
It doesn’t incentivize running up the score
There are no arbitrary cutoffs (i.e. “quadrants”)
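As a rough illustration of the definition above, the sketch below adds up, game by game, the difference between a team's actual result and the chance that a bubble team would have won the same game. The logistic win-probability model, the `scale` parameter, the rating values, and the function names are all assumptions made for illustration; the discussion here does not specify how the bubble team's expected wins are computed.

```python
import math


def bubble_win_prob(opp_rating, bubble_rating=0.0, scale=10.0):
    """Chance that a bubble team beats this opponent.

    Assumption: a logistic curve on the rating gap. The real WAB
    calculation may use a different model and home-court adjustments.
    """
    return 1.0 / (1.0 + math.exp(-(bubble_rating - opp_rating) / scale))


def wins_above_bubble(games, bubble_rating=0.0, scale=10.0):
    """Sum of (actual result - bubble team's expected result) per game.

    games: list of (opponent_rating, won) pairs, with won a bool.
    """
    total = 0.0
    for opp_rating, won in games:
        expected = bubble_win_prob(opp_rating, bubble_rating, scale)
        total += (1.0 if won else 0.0) - expected
    return total


# Hypothetical schedule: ratings are points above or below a bubble team.
schedule = [(12.0, True), (-8.0, True), (3.0, False)]
print(round(wins_above_bubble(schedule), 2))
```

Under this assumed model, a win over a strong opponent earns more credit than a win over a weak one, and the team's own margin of victory never enters the calculation.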
Aaron: Well, when you say it like that, what could we even debate? I’ll just catch you next week….
Of course, I do have a few comments about your statements, or as I might say it, claims, about WAB. First, though, that’s all about WAB. You told me one other thing last week that you haven’t mentioned yet. Do you believe that the selection and seeding of NCAA Tournament teams should be completely based on a formula such as WAB, or do you think that it should be done by a committee as it has been until now?
Calvin: I love WAB, but I’m certainly open to other objective systems, whether that’s another metric (such as strength of record) or a combination of metrics. But I think the biggest problem with having humans determine which teams make the field is that humans are, well, human. Everyone has subconscious biases and limitations. Professional leagues don’t use humans to determine who makes the playoffs or seeding, and it’s mildly absurd to even think about that possibility. Last season in the WNBA, the Mystics edged out the Wings for the final playoff spot by finishing one game ahead of the Wings in the standings. But what if it had been left up to humans? A committee could have awarded Dallas the No. 8 seed if it wanted to.
There is one notable difference between pro sports and college sports, of course, and that is the disparate nature of scheduling. Pro teams don’t need to schedule their own games — their leagues do it for them in a generally equitable manner. In contrast, college coaches are tasked with finding their own opponents, which leads to a gap between the haves and the have nots. Two pro teams at 18-4 are probably fairly evenly matched; two college teams at 18-4 may be at two completely different levels.
While that does mean college sports need a different postseason determination system than pro leagues (which typically just use straight win-loss record), it doesn’t mean humans need to be involved. Using WAB to determine at-large bids and/or seeding would remove human subjectivity from the process.
Aaron: Well, that’s the first place that I differ with you. As I think is often the case with people who work with data to build predictive models (such as our Her Hoop Stats rating), I feel like I’m too familiar with the weaknesses of all the different stats, ratings, and formulas to ever think we can just decide the best teams based on math. I mean, you wouldn’t name the national champion based on WAB or any other metric, would you? Nor would you use it to just decide who plays in the championship game. So, why would you use it to decide who makes the tournament?
I think we both agree that one of the best things about the NCAA tournament is that it’s settled on the court. The team that wins six games in a row is the champion, and with a field of 64 we’re confident that the teams that have a reasonable chance to win the tournament are in it. Also, I’m a firm believer that the champion is the champion, even if I think the “best” team lost during the tournament. There’s a human element to it that is essential to the game; I always say you can’t just “manage a game by spreadsheet.”
I think that applies to the tournament bracket as well. I don’t think it can be done just by spreadsheet, because a spreadsheet (or rating system) can’t account for every piece of information that might be relevant. Just as humans have the limitations you mentioned above, none of the formulas that could be used for the bracket is perfect either. Plus, one thing I think would help the game grow would be more debate about the bracket the committee produces. That’s something I’ve heard coaches say is a feature of the coverage of the men’s tournament but not as much on the women’s side. Don’t you want us to be able to debate the bracket for the three days between the selection show and the first game tips?
Calvin: Actually, I don’t. I do feel that we should be having the same conversations on the women’s side as we are on the men’s side, and I know that those debates are very prevalent on the men’s side. However, every year during that week I find myself watching TV and wishing that they’d stop debating the selection and start previewing the upcoming matchups. And I feel that way on the men’s side too. I can have fun with the seeding and selection debates leading up to the Selection Show and maybe even for a few hours afterwards. By the time I wake up the next day, however, I no longer care whether or not Team A deserved a bid; I’d much prefer the talking heads tell me why or how Team A can upset Team B.
Aaron: Normally I like a good day of analyzing the bracket (some might call it complaining) before really digging into the matchups, so that’s not much different than a few hours. This year, though, the Selection Show is Monday night and the first games aren’t until Sunday, so I think there’s time to do both, but we can agree to disagree.
Let’s dig into one of your other points: “It doesn’t incentivize running up the score.” It sounds good when you say it that way, but from my point of view you’re throwing out useful information about the teams. All wins are not created equal. If two teams play the same schedule and have the same won-loss record but one team wins by an average of five points a game and another wins by 15 points a game, don’t you think the latter team is better? I sure do. I get that we don’t want to incentivize teams to try to win by 50 or more, but that can easily be solved by treating all wins by more than, say, 30 points as equivalent.
After all, people loved to complain about RPI and one of the issues there was that RPI only considered wins and losses, not scores. By ignoring margin, aren’t you worried WAB is a little closer to RPI than it needs to be?
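The 30-point cap suggested above can be sketched in a few lines. This is a minimal illustration, not any official formula: the function names and sample scores are hypothetical, and capping losses at the same threshold is an assumption, since the suggestion only mentions wins.

```python
def capped_margin(points_for, points_against, cap=30):
    """Scoring margin clipped to the range [-cap, +cap].

    Assumption: losses are capped symmetrically; the suggestion in the
    text only mentions capping wins.
    """
    margin = points_for - points_against
    return max(-cap, min(cap, margin))


def average_capped_margin(scores, cap=30):
    """Mean capped margin over a list of (points_for, points_against)."""
    if not scores:
        return 0.0
    return sum(capped_margin(pf, pa, cap) for pf, pa in scores) / len(scores)


# Hypothetical results: a 50-point win counts the same as a 30-point win.
print(capped_margin(100, 50))  # 30
print(capped_margin(80, 50))   # 30
```

With a cap like this, nothing a team does after the lead reaches 30 changes its number, although every point below the cap still counts.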
Calvin: I definitely hear you about margin being useful information. And I would agree that in your example, the team with the average margin of 15 would be the “better” team. I guess I’m not sure that I would want to use “better” as a selection criterion when “better” is used predictively. To use an extreme example, let’s say a team faces only top ten competition all year and loses every game by one point. That team is probably actually really good (and the unluckiest team in sports history). But I still don’t think an 0-30 record would deserve a bid.
If we do base selection on a predictive kind of “better,” I’m not sure if there’s a way to incorporate the useful information that margin provides without incentivizing running up the score. Even if we cap it at 30, there’s still an incentive for a team that’s up 20 with under two minutes left to leave the starters in. I’m not sure many coaches would take it to that extreme, but I do know of at least one example of a coach altering her decision-making when there was a reward for margin. In the Reef Tournament of the 2019 Paradise Jam, South Carolina had a 13-point lead on Baylor with the ball and under 10 seconds left. Baylor had begun walking off the court, but Ty Harris took the free layup to extend the lead to 15 with a second remaining. South Carolina needed to win that game by at least 15 in order to win the tournament championship due to the tournament’s margin tiebreaker. Dawn Staley explained her intentions after the game, and I absolutely believe she made the right decision given the rules, but I also don’t like the system that necessitated that move.
Now, it’s true that incorporating margin into tournament selection wouldn’t be quite as immediate and tangible a payout for any given garbage time bucket as what we saw at the end of the Paradise Jam, but for me it just comes down to a general principle of what we are rewarding. I think maybe that’s where we differ; you’d like to see teams rewarded more heavily for blowing someone out than for winning by one whereas I’d like the win to be the only goal.
In terms of the RPI comparison though, you are correct that WAB and RPI both ignore margin. I think the biggest difference that makes WAB more trustworthy in my eyes is the fact that it still takes into account how hard a specific game is to win. There is a predictive measure of team strength in the formula, just not a team’s own rating. The WAB awarded for each game depends on the opponent’s strength and a bubble team’s strength, both taken from a predictive rating, which is an upgrade over the RPI. In a sense, margin of victory is a component of the WAB formula, but only your opponent’s margin and the bubble team’s margin are particularly significant, as opposed to your own.
Aaron: Sorry, you haven’t persuaded me. In our crazy hypothetical world, join me in two ridiculous assumptions:
We have perfect information on how good each team is
For some reason, the ten best teams are all in the same conference and only play conference games
If the worst team in that conference loses every single game, I still think they’re the tenth-best team in basketball.
Now, you bring up a good point that I do want the best teams in the tournament. Turning devil’s advocate against myself, I suppose that the logical extension of that argument would be to do away with automatic bids altogether and just pick the best 64 teams, probably leaving out some conferences completely. I’m not quite ready to go there, but I do think when selecting the at-large teams we want them to be the best teams that aren’t conference champions. That’s who I want to reward. Philosophically, I’d say we want to make sure that every team that could win the tournament is in the tournament.
I’m obviously not as concerned as you about running up the score. As long as everyone knows the rules upfront, I think it’s fine. That clip you showed isn’t the greatest, but as they say, “If you don’t want them to run up the score on you, then stop them.” Part of the issue in that situation is that the score only affected one team at that moment. If everyone agreed on the importance of score, you’d see teams playing hard till the end, not unlike bench players in the WNBA trying to prove they belong in the league even if the game is out of reach. And, if you think about it, goal differential hasn’t exactly held back the popularity of the World Cup.
I’m sure I haven’t persuaded you either, so let’s talk about one other thing: You mentioned the word “predictive” above. To me, a predictive metric would be the metric of choice, as I think you want to have the teams in the tournament that will be the best going forward as I alluded to earlier. I think one of the reasons you like WAB is because you disagree. Is that right?
Calvin: Correct, although before I get to that I want to quickly clarify my thoughts on running up the score. I actually don’t have a problem with running up the score when it comes to meaningless buckets like the one in the video above. (My intent in sharing the video was more to say that rewarding margin can influence, and has influenced, coaching decisions.) I agree with the “if you don’t want them to run up the score then stop them” premise. But I do think that starters should be out of the game at a certain point. If a bench player wants to take a free layup with one second left in a blowout, go for it. But let’s at least make sure the bench players are seeing the court in 30-point games.
As for the debate on whether selection should be predictive or whatever the opposite of predictive is (retrodictive?), I do believe that selection should be retrodictive. I’m certainly okay with agreeing to disagree on that one. You could probably twist my arm into using a predictive rating for seeding though, after the 64 teams are chosen.
To continue with the crazy hypothetical world, let’s say that in addition to this winless team that is still a top ten team, we also have a team from a low major league that goes 33-0 before falling in its conference’s tournament championship game. In my view, that team deserves a bid almost regardless of who those games came against, and I’d automatically take it over the winless team. I understand the idea that we want the best teams in the tournament and that the winless team in this scenario is the better team. But I also want winning to be rewarded, at least at a greater level than it is in predictive metrics. For the sake of our readers, it should be noted that most predictive metrics treat a one-point loss and a one-point win as nearly identical, which makes sense when forecasting future results. But doing so for tournament selection would seem to me to dilute the value of winning and take excitement away from regular season buzzer beaters and overtimes.
Aaron: Well, in a not-so-hypothetical world, would you invite an undefeated DII team into the DI NCAA tournament? I’d say that’s kind of the situation you’ve outlined.
I do think that the committee would say they are rewarding performance over the course of the season in selection. I attended a mock selection exercise for media and coaches in the summer of 2018, and according to my very rough notes, they talked about “selections consider what a team accomplished during the season to be under consideration while seeding is about who that team is NOW!” I remember at the time having the opinion it should all be about who that team is now, and I haven’t changed my thinking yet even if I’m in the minority.
In particular, I think it’s really challenging to decide how to handle teams that have players who miss time. I remember contemplating those notes about the bracketing process when Katie Lou Samuelson missed four games at the end of the 2018-19 season (the final regular-season game and the AAC tournament). If a team loses a game in that situation, do you treat it like a normal loss? Somehow weight it less? And if you do weight it less, aren’t you really predicting how they would have done during those regular-season games? That gets to another point I’ll make in a minute, but that’s why it’s cleaner to me to have one approach to both selection and seeding.
By the way, this also comes up if a player ends up being out for the season. If that would drop a team from maybe the 30th-best team in the country (let’s also say they have the 30th-best résumé looking backward) to 80th, should that team be in the tournament or not? As I mentioned above, I want the teams with the best chance of winning in the tournament, so as harsh as it is I’d drop them out. Of course, I’d do it the other way as well, where a team returns a key player and gets into the tournament because they’re the 30th-best team now (e.g. the last five games) even if they have the 80th-best résumé.
Calvin: It sounds like that’s basically what our disagreement boils down to — you prefer that selection criteria look forward while I prefer that they look back. I can see both sides. It makes sense from a viewership perspective to want the current best teams on display in any league’s postseason. I do agree with you that it’s harsh to drop the 30th-best team with the 30th-best résumé out of the field because they lost a star to injury in the conference tournament. Harsh enough that I wouldn’t be in favor of it. I wouldn’t mind at all dropping them a few seed lines, but I think if they’ve earned it then they’ve earned it. And a team with a top-30 résumé has earned the right to dance.
I know I brought up the comparison to pro leagues earlier when we talked about objective vs. subjective postseason selection, but I think it’s a relevant point here too. Pro leagues don’t currently adjust selection or seeding based on available personnel. Would you be in favor of that changing? For example, if A’ja Wilson had gotten injured right before the playoffs last season, would you have been in favor of moving the Aces down on the seed list, or even dropping them out of the playoffs?
Before I toss it back to you to answer that question though, I will answer yours about an undefeated DII team. Since DII has its own postseason, I wouldn’t allow that team in the DI tournament. I don’t think we’d need to since they would have a championship to compete for either way. But if DII and DI for some reason combined postseasons and only one tournament was held between them, then I would let that team in. I am a strong proponent of undefeated teams in any sport automatically qualifying for their league’s postseason (which is what I hate about NCAA football, but that’s a can of worms for another day).
Aaron: Maybe my DII example isn’t the best since at some level they’d be like another conference. But if there were two undefeated teams…
Back to higher ground for me. You bring up the case of pro leagues. At some level, I’d argue that’s exactly what they’re doing with play-in games/wildcards. They may call it the playoffs, but aren’t the first two “rounds” of the WNBA playoffs that are really single elimination a mechanism to get the teams that are playing the best now into the semifinals when the teams play a series? That’s sort of what happened this year when Nneka Ogwumike missed the game for the third-seeded Los Angeles Sparks due to a migraine and the seventh-seeded Connecticut Sun won and moved on to face the Las Vegas Aces in the semis. The new NBA play-in games basically achieve the same thing; it’s a mechanism to let a hotter tenth seed playing better than the tenth-best team in the conference replace a seventh seed that is now weaker.
Final question for you: One thing you mentioned is that you don’t like a predictive model as much and think the committee should look back at the résumé over the course of the season (I can’t bring myself to call it retrodictive). I’d argue that because WAB is using a prediction for how a bubble team would have performed against a team’s schedule, it’s still a predictive metric, just in a slightly different way. I get that for a pro league a team’s record is really that team’s résumé looking back, and it’s a good way to establish the playoff bracket. I’d never recommend using a predictive metric in that situation, so looking back at the course of the season makes sense in the pros. I just think that even WAB isn’t as “retrodictive” as you’d like it to be in the case of the NCAA, and if you’re going to use predictions, let’s use predictions now. What do you think about that issue?
Calvin: You raise a good point about the “play-in” games in the WNBA or the NBA, although that doesn’t address the issue of moving a No. 1 seed in those leagues down several lines due to injury.
To your question about WAB’s predictive value, you are correct that it uses predictions for how a bubble team would do. However, I don’t think that using predictions in the formula is the same as being predictive. Predictive metrics are designed to predict games (duh!), and I don’t think WAB would do a particularly good job at that. WAB currently ranks Bowling Green ahead of Oregon. I love a good mid-major victory over a Power 5 school as much as anybody, but I’d pick Oregon in that game at any location.
Like I mentioned earlier though, I’d be open to using other metrics. WAB itself isn’t the hill I want to die on — my broader hopes would be for the criteria to be objective (replacing the opinions of biased athletic directors with computers) and for there to be no incentive to run up the score. The former can be accomplished with a truly predictive metric. We could achieve the latter by capping the margin at, say, 10 points rather than 30. (The original men’s NET claimed to cap the margin at 10 points, but its inclusion of uncapped net efficiency ratings sort of negated that, so they did away with it altogether.) A better idea might be to only incorporate the point differential for the first three quarters into the formula (which would be admittedly messier on the men’s side since the men don’t use quarters). It probably wouldn’t be my first choice, but I could compromise on using a predictive rating if it ignored the fourth quarter entirely.
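Both ideas in the paragraph above, the tighter cap and the three-quarter differential, are simple to express in code. The sketch below is only an illustration: the function name, the per-quarter input format, and the sample box score are hypothetical, and it says nothing about how a real rating system would weight the result.

```python
def three_quarter_margin(team_quarters, opp_quarters, cap=None):
    """Point differential through three quarters, optionally capped.

    team_quarters / opp_quarters: points scored in each period, in order.
    Anything after the third period (fourth quarter, overtime) is ignored.
    """
    margin = sum(team_quarters[:3]) - sum(opp_quarters[:3])
    if cap is not None:
        margin = max(-cap, min(cap, margin))
    return margin


# Hypothetical box score: a 14-point lead after three quarters.
team = [22, 18, 20, 25]
opp = [15, 16, 15, 10]
print(three_quarter_margin(team, opp))          # 14
print(three_quarter_margin(team, opp, cap=10))  # 10
```

Because the fourth quarter never enters the calculation, a coach gains nothing in this number by leaving starters in late, which is the incentive the proposal is trying to remove.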
Aaron: Well, since it sounds like we’re agreeing a little, I think this is a good place to wrap up the conversation. I’m definitely all for objective and, most importantly to me, consistent decision making. I think that consistency is definitely one argument for a purely algorithmic approach, even if we know the algorithms aren’t perfect (though I’d hope they are improving over time).
Most importantly, what’s great about this discussion is that we don’t have to agree on the specifics to agree that we’re approaching “the most wonderful time of the year.” Plus, while we can differ on what we think the “perfect” bracket is, we both know that teams will have a chance to prove on the court who is the 2021 NCAA champion. Here’s to March!
Calvin: Exactly! Life would be much less fun if we all agreed on everything all the time. But we can definitely agree on that last point — it’s the most wonderful time of the year!