Those Husky bastards have done it again. After winning the 2023 national championship in one of the most dominant tournament performances ever (which you can read about here), they decided to just go ahead and do the same thing again in 2024. Except this time, they did it in even more dominant fashion. Disgusting.
I am, as you can probably tell, mad about it. But since I already wrote the code, let’s go ahead and quantify their performance to see exactly how it stacks up against last year’s run and the other great championship runs in March Madness history.
By the way, almost all of this code is just copied and pasted from last year’s post. All I had to do was re-run it – that’s the magic of R!
library(dplyr)
library(lubridate)
library(janitor)
library(rvest)
library(hoopR)
library(kableExtra)
# Start by grabbing the html from our website
page <- rvest::read_html("http://www.hoopstournament.net/DynamicDrilldown.php")
# html_table() will pull the info out of all the tables on the page. Easy!
content <- page |>
  html_table()
# The first and second tables are just for structural purposes on the site,
# so we're interested in the third one, where the actual data is.
ds <- content[[3]] |>
  clean_names() |>
  # Let's clean it up and add some useful variables
  mutate(
    date = mdy(date),
    year = year(date),
    team_year = paste0(team, " (", year, ")"),
    margin = score - opp_score
  ) |>
  # Lastly, filter out the years we don't care about
  filter(year >= 1985)
head(ds)
## # A tibble: 6 × 15
##   date        seed team              seed_2 opponent w_l   score opp_score round
##   <date>     <int> <chr>              <int> <chr>    <chr> <int>     <int> <chr>
## 1 2024-04-08     1 Connecticut            1 Purdue   W        75        60 Nati…
## 2 2024-04-08     1 Purdue                 1 Connect… L        60        75 Nati…
## 3 2024-04-06     4 Alabama                1 Connect… L        72        86 Nati…
## 4 2024-04-06     1 Connecticut            4 Alabama  W        86        72 Nati…
## 5 2024-04-06    11 North Carolina S…      1 Purdue   L        50        63 Nati…
## 6 2024-04-06     1 Purdue                11 North C… W        63        50 Nati…
## # ℹ 6 more variables: location_city <chr>, location_state <chr>,
## #   box_score <chr>, year <dbl>, team_year <chr>, margin <int>
# First we need to figure out which team_year values ended up winning it all.
champs <- ds |>
  filter(round == "National Championship" & w_l == "W")
tourney_runs <- ds |>
  # champs only!
  filter(team_year %in% champs$team_year) |>
  group_by(team_year) |>
  summarize(
    team = unique(team),
    seed = unique(seed),
    cum_margin = sum(margin),
    avg_margin = mean(margin),
    n_games = n(),
    list_margins = paste(margin, collapse = ", ")
  ) |>
  arrange(desc(cum_margin))
# Let's look at the top 10:
tourney_runs |>
  select(-team) |>
  slice_max(order_by = cum_margin, n = 10) |>
  kable()
team_year | seed | cum_margin | avg_margin | n_games | list_margins |
---|---|---|---|---|---|
Connecticut (2024) | 1 | 140 | 23.33333 | 6 | 15, 14, 25, 30, 17, 39 |
Kentucky (1996) | 1 | 129 | 21.50000 | 6 | 9, 7, 20, 31, 24, 38 |
Villanova (2016) | 2 | 124 | 20.66667 | 6 | 3, 44, 5, 23, 19, 30 |
North Carolina (2009) | 1 | 121 | 20.16667 | 6 | 17, 14, 12, 21, 14, 43 |
Connecticut (2023) | 4 | 120 | 20.00000 | 6 | 17, 13, 28, 23, 15, 24 |
Nevada-Las Vegas (1990) | 1 | 112 | 18.66667 | 6 | 30, 9, 30, 2, 11, 30 |
Villanova (2018) | 1 | 106 | 17.66667 | 6 | 17, 16, 12, 12, 23, 26 |
Duke (2001) | 1 | 100 | 16.66667 | 6 | 10, 11, 10, 13, 13, 43 |
Louisville (2013) | 1 | 97 | 16.16667 | 6 | 6, 4, 22, 8, 26, 31 |
Florida (2006) | 3 | 96 | 16.00000 | 6 | 16, 15, 13, 4, 22, 26 |
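As a quick sanity check on the table, each team's `cum_margin` should equal the sum of its game-by-game margins. Here is a minimal sketch that re-parses the `list_margins` strings and compares the two. The `top3` data frame is just the first three rows of the table typed in by hand, and `parse_margins()` is a helper name made up for illustration.

```r
library(dplyr)

# First three rows of the table above, typed in by hand for illustration
top3 <- data.frame(
  team_year    = c("Connecticut (2024)", "Kentucky (1996)", "Villanova (2016)"),
  cum_margin   = c(140, 129, 124),
  list_margins = c("15, 14, 25, 30, 17, 39",
                   "9, 7, 20, 31, 24, 38",
                   "3, 44, 5, 23, 19, 30")
)

# Hypothetical helper: turn "15, 14, 25" into c(15, 14, 25)
parse_margins <- function(x) {
  as.numeric(strsplit(x, ",\\s*")[[1]])
}

check <- top3 |>
  mutate(
    # Re-add the individual margins for every row
    recomputed = vapply(list_margins,
                        function(x) sum(parse_margins(x)),
                        numeric(1),
                        USE.NAMES = FALSE),
    matches = recomputed == cum_margin
  )

check
```

If every value in `matches` is `TRUE`, the per-game margins and the totals agree.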
ds_teams <- espn_mbb_teams() |>
  # The team names don't quite match up, so I'm just gonna fix em manually
  mutate(team = case_when(
    team == "UConn" ~ "Connecticut",
    team == "LSU" ~ "Louisiana State",
    team == "UMass" ~ "Massachusetts",
    team == "Nevada" ~ "Nevada-Las Vegas",
    team == "NC State" ~ "North Carolina State",
    team == "VCU" ~ "Virginia Commonwealth",
    team == "Miami" ~ "Miami, Florida",
    TRUE ~ team
  ))
tourney_runs <- left_join(tourney_runs, ds_teams, by = "team")
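Since the names are being patched by hand, it can be worth checking whether any champions still fail to match after the recoding. One possible way to do that (a sketch, not necessarily part of the original workflow) is `anti_join()`, which returns the rows on the left that have no match on the right. The two tiny data frames below are made-up stand-ins for `tourney_runs` and `ds_teams`; the team spellings on the ESPN side and the hex colors are placeholders, not real values.

```r
library(dplyr)

# Made-up stand-ins for tourney_runs and ds_teams (names and colors are placeholders)
runs_demo <- data.frame(
  team_year = c("Connecticut (2024)", "Kentucky (1996)", "Nevada-Las Vegas (1990)"),
  team      = c("Connecticut", "Kentucky", "Nevada-Las Vegas")
)
teams_demo <- data.frame(
  team  = c("UConn", "Kentucky", "UNLV"),
  color = c("000001", "000002", "000003")
)

# Recode only one of the mismatched names, on purpose,
# so that one team is left without a match
teams_fixed <- teams_demo |>
  mutate(team = case_when(
    team == "UConn" ~ "Connecticut",
    TRUE ~ team
  ))

# Rows in runs_demo with no matching team in teams_fixed
unmatched <- anti_join(runs_demo, teams_fixed, by = "team")
unmatched
```

Any team that shows up in `unmatched` would come through the `left_join()` with a missing `color`, so an empty result is what you want to see before plotting.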
Well would you look at that. By margin of victory, the 2024 UConn Huskies are the most dominant national championship team in the history of the NCAA tournament. Even more dominant than last year. Even better than the best ever Kentucky and North Carolina teams. Even bigger victories than that horrifying, 3-point raining 2016 Villanova team, who I had to watch crucify my Jayhawks with nobody for company but the worst roommate I’ve ever had.
Anyway, next let’s re-make the big chart from last time to see where 2024 UConn ranks all-time. Then I’m getting out of here so I don’t have to look at this disgusting data anymore.
library(ggplot2)
library(ggimage)
library(ggthemes)
tourney_runs |>
  ggplot(aes(x = reorder(team_year, -cum_margin),
             y = cum_margin,
             fill = team)) +
  # Standard deviation bands
  annotate(geom = "rect",
           xmin = -Inf, xmax = Inf,
           ymin = mean(tourney_runs$cum_margin) + 0.5,
           ymax = mean(tourney_runs$cum_margin) + 1 * sd(tourney_runs$cum_margin) - 0.5,
           alpha = 0.2) +
  annotate(geom = "rect",
           xmin = -Inf, xmax = Inf,
           ymin = mean(tourney_runs$cum_margin) + 1 * sd(tourney_runs$cum_margin) + 0.5,
           ymax = mean(tourney_runs$cum_margin) + 2 * sd(tourney_runs$cum_margin) - 0.5,
           alpha = 0.4) +
  annotate(geom = "rect",
           xmin = -Inf, xmax = Inf,
           ymin = mean(tourney_runs$cum_margin) - 0.5,
           ymax = mean(tourney_runs$cum_margin) - 1 * sd(tourney_runs$cum_margin) + 0.5,
           alpha = 0.2) +
  annotate(geom = "rect",
           xmin = -Inf, xmax = Inf,
           ymin = mean(tourney_runs$cum_margin) - 1 * sd(tourney_runs$cum_margin) - 0.5,
           ymax = mean(tourney_runs$cum_margin) - 2 * sd(tourney_runs$cum_margin) + 0.5,
           alpha = 0.4) +
  annotate(geom = "rect",
           xmin = -Inf, xmax = Inf,
           ymin = mean(tourney_runs$cum_margin) - 2 * sd(tourney_runs$cum_margin) - 0.5,
           ymax = 0,
           alpha = 0.6) +
  # Columns
  geom_col() +
  annotate(geom = "text",
           color = "#5A5A5A",
           label = "AVERAGE",
           size = 5,
           x = 35,
           y = mean(tourney_runs$cum_margin)) +
  # Standard deviation labels
  annotate(geom = "text",
           label = "+1 std. deviation",
           size = 5,
           color = "white",
           x = 35,
           y = mean(tourney_runs$cum_margin) + 1 * sd(tourney_runs$cum_margin) - 3) +
  annotate(geom = "text",
           label = "+2 std. deviations",
           size = 5,
           color = "white",
           x = 35,
           y = mean(tourney_runs$cum_margin) + 2 * sd(tourney_runs$cum_margin) - 3) +
  annotate(geom = "text",
           label = "-1 std. deviation",
           size = 5,
           color = "white",
           x = 35,
           y = mean(tourney_runs$cum_margin) - 1 * sd(tourney_runs$cum_margin) + 3) +
  annotate(geom = "text",
           label = "-2 std. deviations",
           size = 5,
           color = "white",
           x = 35,
           y = mean(tourney_runs$cum_margin) - 2 * sd(tourney_runs$cum_margin) + 3) +
  # Arrows
  annotate(geom = "segment",
           x = 12, xend = 1.5,
           y = 128, yend = 138,
           arrow = arrow(type = "closed"),
           linewidth = 0.5
  ) +
  annotate(geom = "label",
           x = 12, y = 130,
           label = "UConn (2024)") +
  # Other styling
  scale_fill_manual(values = paste0("#", tourney_runs$color),
                    breaks = tourney_runs$team) +
  labs(
    title = "Most Dominant NCAA Tournament Championship Runs",
    subtitle = "by cumulative margin of victory, all tournaments since 1985",
    x = "Team / Year",
    y = "Cumulative Margin of Victory",
    caption = "Data from http://www.hoopstournament.net/"
  ) +
  guides(fill = "none") +
  theme_fivethirtyeight() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
They even cleared well over 2 standard deviations above the mean! It’s pretty damn impressive, I have to admit, even if I am a Husky Hater. Hopefully I will be making this post about KU next year, or at least some team that’s not UConn. Give me literally any other Big East team, seriously. I’ll even take Shaka Smart’s Marquette at this point.
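For anyone who wants an actual number instead of eyeballing the shaded bands, the distance from the mean in standard deviations is just (value - mean) / sd. A minimal sketch follows. `add_z_score()` is a hypothetical helper, and the demo data is only the ten rows from the top-10 table typed in by hand, so the mean and standard deviation it produces are NOT the ones behind the chart, which is built from every champion since 1985.

```r
library(dplyr)

# Hypothetical helper: add a z-score column for cum_margin
add_z_score <- function(df) {
  df |>
    mutate(z = (cum_margin - mean(cum_margin)) / sd(cum_margin))
}

# Demo data: only the top 10 from the table, typed in by hand.
# The real calculation would use the full tourney_runs data frame.
top10 <- data.frame(
  team_year = c("Connecticut (2024)", "Kentucky (1996)", "Villanova (2016)",
                "North Carolina (2009)", "Connecticut (2023)",
                "Nevada-Las Vegas (1990)", "Villanova (2018)", "Duke (2001)",
                "Louisville (2013)", "Florida (2006)"),
  cum_margin = c(140, 129, 124, 121, 120, 112, 106, 100, 97, 96)
)

top10_z <- add_z_score(top10)
top10_z
```

On the real data, `add_z_score(tourney_runs)` would give the figure that lines up with the bands in the chart.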