The Plan
We’re going to start a long-running series where we take one model, interrogate it, and iteratively improve it. Although we typically prefer more obscure modeling problems, like how consistent players are, for this series we’re going mainstream: predicting NBA final score lines. Why did we decide to model something fairly monotonous?
It’s complicated enough to keep us busy through 2025.
Everyone is interested in this.
The Starting Model
Our starting model is extremely simple, with many flaws. It’s a hierarchical model in which each team’s offensive and defensive strengths are learned simultaneously, with the team-level strengths drawn from shared league-wide distributions. As always, the full model code is at the bottom of the post.
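For readers who prefer math to code, here is the same model written out from the Stan program at the bottom of the post. The symbols are shorthand introduced here, not names from the code: h is the home advantage, sigma is the scoring noise, and the tau terms are the league-wide spreads of team strength. The second argument of each normal is a standard deviation, following Stan's convention.

```latex
\begin{align*}
\text{home\_points}_g &\sim \mathcal{N}\!\left(h + \theta^{\text{off}}_{\text{home}(g)} + \theta^{\text{def}}_{\text{away}(g)},\ \sigma\right) \\
\text{away\_points}_g &\sim \mathcal{N}\!\left(\theta^{\text{off}}_{\text{away}(g)} + \theta^{\text{def}}_{\text{home}(g)},\ \sigma\right) \\
\theta^{\text{off}}_t &\sim \mathcal{N}(\mu_{\text{off}},\ \tau_{\text{off}}), \qquad
\theta^{\text{def}}_t \sim \mathcal{N}(0,\ \tau_{\text{def}}) \\
\mu_{\text{off}} &\sim \mathcal{N}(116,\ 10), \qquad h \sim \mathcal{N}(2,\ 2) \\
\tau_{\text{off}},\ \tau_{\text{def}},\ \sigma &\sim \text{Half-Cauchy}(0,\ 5)
\end{align*}
```

Note the sign convention this implies: a team's defensive term is added to its opponent's expected points, so a lower (more negative) defensive value means a better defense.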
In this post, we’ll show the model and what it’s outputting. In later posts, we’ll dive into the specific limitations and work on iteratively improving them.
One thing to note is that this model predicts each team’s points in regulation only. In the future, overtime might be incorporated. Surely this won’t cause confusion in later posts when we stop mentioning this detail.
Game Predictions
Since we have a fully Bayesian model, we get not only point estimates (pun intended) but also the uncertainty around every estimate. So we can get not just which team our model thinks will win, but its win probability. And not just how many points a team is likely to win by, but the whole distribution: What’s the percent chance they win by 10 points? By 11 points? And so on.
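As a rough illustration of how those probabilities fall out of posterior draws, here is a minimal Python sketch using only the standard library. It assumes each draw is a dict keyed by the Stan parameter names, with teams indexed from 0 rather than Stan's 1. The draws themselves are made-up numbers for two hypothetical teams, not output from the fitted model, and the helper names `simulate_margins` and `summarize` are ours for this example.

```python
import random
from collections import Counter

def simulate_margins(draws, home, away, rng):
    """Simulate one game per posterior draw; return home-minus-away margins."""
    margins = []
    for d in draws:
        # Same linear predictors as the likelihood in the Stan model.
        home_mean = (d["home_field_advantage"]
                     + d["theta_offense"][home] + d["theta_defense"][away])
        away_mean = d["theta_offense"][away] + d["theta_defense"][home]
        # The model is Gaussian, so round simulated scores to whole points.
        home_pts = round(rng.gauss(home_mean, d["sigma_points"]))
        away_pts = round(rng.gauss(away_mean, d["sigma_points"]))
        margins.append(home_pts - away_pts)
    return margins

def summarize(margins):
    """Turn simulated margins into win probabilities and a margin distribution."""
    n = len(margins)
    counts = Counter(margins)
    return {
        "home_win": sum(m > 0 for m in margins) / n,
        "away_win": sum(m < 0 for m in margins) / n,
        # Regulation-only model: a zero margin means the game goes to overtime.
        "tie_after_regulation": counts[0] / n,
        "margin_pmf": {m: c / n for m, c in sorted(counts.items())},
    }

# ASSUMPTION: fabricated draws for illustration only (team 0 hosts team 1).
rng = random.Random(0)
fake_draws = [
    {
        "theta_offense": [rng.gauss(118, 1.5), rng.gauss(112, 1.5)],
        "theta_defense": [rng.gauss(-2, 1.5), rng.gauss(3, 1.5)],
        "home_field_advantage": rng.gauss(2, 1),
        "sigma_points": abs(rng.gauss(11, 0.5)),
    }
    for _ in range(4000)
]

summary = summarize(simulate_margins(fake_draws, home=0, away=1, rng=rng))
print(f"P(home wins in regulation) = {summary['home_win']:.2f}")
print(f"P(tied after regulation)   = {summary['tie_after_regulation']:.3f}")
print(f"P(home wins by exactly 10) = {summary['margin_pmf'].get(10, 0.0):.3f}")
```

Simulating one game per draw carries both parameter uncertainty and game-to-game noise into the result, which is why the margin distribution is wider than the spread of the team strengths alone.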
Predictions for tonight’s match-ups:
Bayesian Power Rankings
Although we have a deeply limited and flawed model, we can still get offensive and defensive power rankings out of it. As a reminder, everyone can publish power rankings, but we are the only outlet that puts error bars on our power rankings.
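One way to put error bars on a ranking, sketched here in standard-library Python, is to rank the teams separately within every posterior draw and then summarize each team's rank across draws. The team names and draws below are invented for illustration, and `rank_summary` and `interval` are hypothetical helpers rather than the code behind our charts.

```python
import random
import statistics

def interval(values, lo=0.05, hi=0.95):
    """Return (lower, median, upper) using simple order-statistic quantiles."""
    s = sorted(values)
    n = len(s)
    return s[int(lo * (n - 1))], statistics.median(s), s[int(hi * (n - 1))]

def rank_summary(draws_by_team, higher_is_better=True):
    """Rank teams within each draw, then summarize each team's rank.

    draws_by_team maps a team name to a list of posterior draws of its
    strength; every list must have the same length.
    """
    teams = list(draws_by_team)
    n_draws = len(next(iter(draws_by_team.values())))
    ranks = {t: [] for t in teams}
    for i in range(n_draws):
        ordered = sorted(teams, key=lambda t: draws_by_team[t][i],
                         reverse=higher_is_better)
        for position, team in enumerate(ordered, start=1):
            ranks[team].append(position)
    return {t: interval(r) for t, r in ranks.items()}

# ASSUMPTION: fabricated offensive-strength draws for three made-up teams.
rng = random.Random(1)
fake_offense = {
    "Team A": [rng.gauss(5, 1) for _ in range(2000)],
    "Team B": [rng.gauss(0, 1) for _ in range(2000)],
    "Team C": [rng.gauss(-5, 1) for _ in range(2000)],
}

ranks = rank_summary(fake_offense, higher_is_better=True)
for team, (low, median, high) in ranks.items():
    print(f"{team}: median rank {median}, 90% interval [{low}, {high}]")
```

For defensive rankings under this model you would pass `higher_is_better=False`, since a team's defensive term is added to its opponent's score and lower values therefore mean fewer points allowed.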
Offensive Power Rankings
Defensive Power Rankings
Look Ahead
There’s plenty of low-hanging fruit to tackle with this model. If there’s something in particular you hate about it, let me know and I can prioritize it.
The Model
// Hierarchical IRT regression
//
// This models the points of home and away teams
// as a function of the latent offensive and defensive
// strength of the teams.
data {
// Number of games
int<lower=1> N_games;
// Number of teams in the league
int<lower=1> N_teams;
// Home and away points scored in each game
array[N_games] int<lower=0> home_points;
array[N_games] int<lower=0> away_points;
// Team index for each game
array[N_games] int<lower=1, upper=N_teams> home_team;
array[N_games] int<lower=1, upper=N_teams> away_team;
}
parameters {
// Latent offensive and defensive strength of each team
// Hierarchical prior
vector[N_teams] theta_offense;
vector[N_teams] theta_defense;
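// League-average offensive strength. Defensive strengths are centered
// at zero in the model block, so they need no separate mean parameter;
// an unused parameter with no prior would make the posterior improper.
real theta_offense_bar;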
real<lower=0> sigma_offense_bar;
real<lower=0> sigma_defense_bar;
// Noise in the points (same for home and away teams)
real<lower=0> sigma_points;
real home_field_advantage;
}
model {
// Priors
// Average strength of the teams
theta_offense_bar ~ normal(116, 10);
// Home field advantage, about 2 points
home_field_advantage ~ normal(2, 2);
// Variation in team strength across the league
sigma_offense_bar ~ cauchy(0, 5);
sigma_defense_bar ~ cauchy(0, 5);
// Individual team strength
theta_offense ~ normal(theta_offense_bar, sigma_offense_bar);
theta_defense ~ normal(0, sigma_defense_bar);
// Gaussian noise in the points
sigma_points ~ cauchy(0, 5);
// Likelihood
for(game in 1:N_games) {
// Team points modeled as gaussian
real home_points_regression = home_field_advantage + theta_offense[home_team[game]] + theta_defense[away_team[game]];
real away_points_regression = theta_offense[away_team[game]] + theta_defense[home_team[game]];
home_points[game] ~ normal(home_points_regression, sigma_points);
away_points[game] ~ normal(away_points_regression, sigma_points);
}
}
generated quantities {
// Remove the mean from the latent variables
vector[N_teams] theta_defense_centered = theta_defense - mean(theta_defense);
vector[N_teams] theta_offense_centered = theta_offense - mean(theta_offense);
}
Excited to see this series of posts play out! Great choice!
The Wizards do surprisingly well in the offensive ratings. Clear sign something's off ;)