Our favorite type of question is “What can you actually infer when you only see a small sample of data?” Maybe a player shot 33% from three on a ton of shots last year, but then he shoots 40% on his first ten threes this season. Is that enough data to conclude he’s a better 3PT shooter this year? We have all of last season’s data, but only an incomplete sample from this season. So, midway through the season, who can we say is likely better this year? And who is worse? In this post, we’ll focus on 3PT shooting. In later posts, we’ll expand on it.
The Model
As always, the full model is at the bottom of the post if you want to dive in. But here’s how it works at a high level:
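3PT shooting for each player (and season) is modeled as binomial: makes out of attempts, given an unknown shooting ability.
Player shooting ability is modeled hierarchically, so each player’s estimate is pulled toward the league-wide average. When a player has only taken 5 attempts so far this season (and made all 5 of them), the model shrugs and says it won’t keep happening.
The difference in each player’s shooting ability between this year and last year is computed for every posterior draw in the generated quantities block, so we get a full distribution for the improvement.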
If you’ve read enough of these posts, you probably could have guessed the model structure by now.
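To make the “model shrugs” behavior concrete, here is a minimal sketch of shrinkage. It swaps the hierarchical logit model for a simpler conjugate beta-binomial stand-in, and `league_rate` and `prior_strength` are made-up values for illustration, not numbers fitted in this post.

```python
# Conjugate beta-binomial stand-in for the hierarchical model (an assumption,
# not the Stan model from this post). All numbers below are hypothetical.

def shrunk_estimate(makes, attempts, league_rate=0.36, prior_strength=200):
    """Posterior mean of a player's 3PT ability under a Beta prior.

    The prior acts like `prior_strength` extra attempts shot at `league_rate`,
    so small samples get pulled hard toward the league rate.
    """
    prior_makes = league_rate * prior_strength
    return (makes + prior_makes) / (attempts + prior_strength)

hot_start = shrunk_estimate(makes=5, attempts=5)       # 5-for-5 to open the season
big_sample = shrunk_estimate(makes=200, attempts=500)  # 40% on 500 attempts

print(f"5-for-5:     raw 100.0%, shrunk {hot_start:.1%}")
print(f"200-for-500: raw 40.0%,  shrunk {big_sample:.1%}")
```

The 5-for-5 player ends up close to the assumed league rate, while the 500-attempt player barely moves from his raw percentage. The real model learns the equivalent of `league_rate` and `prior_strength` from the data instead of hard-coding them.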
Improving Players
We’ll walk through LeBron James as an example. How well is LeBron James shooting from three this year and last year? And how certain are we in those estimates?
There are a few things to notice:
It looks like he’s better this year than last. There is a bit of overlap between the two seasons’ estimates, but it’s probably safe to say LeBron James is improving (we’ll come back to this).
The uncertainty in this year’s estimate is greater than the uncertainty in last year’s estimate. As usual, this makes sense; we have less data to go on this year, so it’s harder to have precision in our estimates.
LeBron James’s actual 3PT% this season is 39.5%. Our hierarchical model gives him a slightly lower shooting ability. We don’t have enough data to think he’s going to keep shooting at 39.5% yet.
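The link between sample size and uncertainty can be sketched with a few lines of arithmetic. This assumes a flat Beta(1, 1) prior per player rather than the hierarchical prior in our model, and the attempt counts are hypothetical.

```python
import math

def posterior_sd(makes, attempts, prior_makes=1.0, prior_misses=1.0):
    """Standard deviation of a Beta posterior for a shooting percentage.

    Assumes a flat Beta(1, 1) prior by default (an illustrative choice).
    """
    a = makes + prior_makes
    b = (attempts - makes) + prior_misses
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Same 40% shooting, three hypothetical sample sizes.
widths = {}
for attempts in (20, 100, 500):
    makes = round(0.4 * attempts)
    widths[attempts] = posterior_sd(makes, attempts)
    print(f"{makes}/{attempts}: posterior sd = {widths[attempts]:.3f}")
```

The spread shrinks roughly with the square root of the number of attempts, which is why a partial season gives a visibly wider estimate than a full one.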
So, how much has he improved? And how certain are we?
He’s probably improved by about 5 percentage points, but the uncertainty is pretty large: maybe 2.5 points, maybe 7.5. Still, there is a pretty good chance he improved.
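Here is a rough sketch of how a “chance he improved” number can be read off posterior draws. It uses independent flat-prior Beta posteriors for each season as a stand-in for the Stan model, and the shooting lines are hypothetical, not any real player’s numbers.

```python
import random

def improvement_draws(last, this, n_draws=20000, seed=1):
    """Draws of (this-season ability - last-season ability).

    `last` and `this` are (makes, attempts) tuples. Each season gets an
    independent Beta(1 + makes, 1 + misses) posterior, which is a
    simplification with no pooling across players.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        p_last = rng.betavariate(1 + last[0], 1 + last[1] - last[0])
        p_this = rng.betavariate(1 + this[0], 1 + this[1] - this[0])
        draws.append(p_this - p_last)
    return draws

# Hypothetical player: 35% on 400 attempts last year, 40% on 150 so far this year.
draws = sorted(improvement_draws(last=(140, 400), this=(60, 150)))
prob_improved = sum(d > 0 for d in draws) / len(draws)
low, high = draws[int(0.05 * len(draws))], draws[int(0.95 * len(draws))]

print(f"P(improved) = {prob_improved:.2f}")
print(f"90% interval for the change: {low:+.3f} to {high:+.3f}")
```

Taking the difference draw by draw is the same idea as the generated quantities block in the model: the interval and the probability both come from one set of difference draws rather than from comparing two summary numbers.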
Here are the top players that look like they improved at 3PT shooting this season (and our certainty in that improvement).
What’s most interesting is how few players we can say with certainty are improving. Man, Jalen Smith though.
Declining Players
Now, let’s look at players that are worse this season.
Luke Kennard really stands out.
Looking Ahead
I want to add two improvements to the model:
For each player, put a prior on this year’s ability using last year’s ability. If we haven’t seen much from them, assume they’re the same as last year.
Use improvements in other stats to sharpen the 3PT improvement estimate. If a player has very clearly improved at free throw shooting, does that predict an improvement in 3PT shooting (even if we haven’t seen much 3PT shooting yet)?
I think I’ll tackle these in multiple steps/posts.
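As a rough preview of the first idea, here is one way a “last year as prior” estimate could behave. This is only a sketch with a conjugate Beta prior; the `carryover` weight is a hypothetical knob, the shooting lines are made up, and the eventual Stan version may look quite different (for example, a prior on the logit scale).

```python
def carryover_estimate(last, this, carryover=0.5):
    """Posterior mean for this season using last season as the prior.

    `last` and `this` are (makes, attempts) tuples. Last season's makes and
    attempts are counted at a discount of `carryover` (hypothetical weight),
    so they behave like pseudo-observations for this season.
    """
    prior_makes = carryover * last[0]
    prior_attempts = carryover * last[1]
    return (this[0] + prior_makes) / (this[1] + prior_attempts)

last_year = (140, 400)  # hypothetical: 35% on 400 attempts

early = carryover_estimate(last_year, this=(5, 5))      # barely any data yet
late = carryover_estimate(last_year, this=(200, 500))   # 40% on a big sample

print(f"early season estimate: {early:.1%}")
print(f"late season estimate:  {late:.1%}")
```

With only a handful of attempts the estimate stays near last year’s 35%, and as attempts pile up this season’s data takes over.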
I didn’t realize substack’s spell check was broken until after I sent out my last email. Nothing like coming back from a 2 year hiatus looking drunk.
Model Code
// Year-over-year improvement model
data {
  int<lower=1> N_players;
  // array[...] declarations require Stan 2.26 or later
  array[N_players] int<lower=0> player_attempts_last_year;
  array[N_players] int<lower=0> player_attempts_this_year;
  array[N_players] int<lower=0> player_successes_last_year;
  array[N_players] int<lower=0> player_successes_this_year;
}
parameters {
  vector[N_players] theta_last_year;
  vector[N_players] theta_this_year;
  real theta_bar;
  real<lower=0> sigma_bar;
}
model {
  // Priors, hierarchical
  theta_bar ~ normal(-1, 10);
  sigma_bar ~ cauchy(0, 5);
  theta_last_year ~ normal(theta_bar, sigma_bar);
  theta_this_year ~ normal(theta_bar, sigma_bar);
  // Likelihood
  for (player in 1:N_players) {
    player_successes_last_year[player] ~ binomial_logit(player_attempts_last_year[player], theta_last_year[player]);
    player_successes_this_year[player] ~ binomial_logit(player_attempts_this_year[player], theta_this_year[player]);
  }
}
generated quantities {
  vector[N_players] theta_last_year_probability;
  vector[N_players] theta_this_year_probability;
  vector[N_players] theta_probability_difference;
  // Transform theta back to probability
  theta_last_year_probability = inv_logit(theta_last_year);
  theta_this_year_probability = inv_logit(theta_this_year);
  // Calculate Difference
  theta_probability_difference = theta_this_year_probability - theta_last_year_probability;
}
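One note on reading the model: the `theta` parameters live on the log-odds scale, and `inv_logit` maps them back to shooting percentages. The small sketch below (plain Python, with helper names of our own choosing) shows that mapping. For example, the prior center of -1 for `theta_bar` corresponds to roughly a 27% shooter.

```python
import math

def inv_logit(x):
    """Map a log-odds value to a probability, like Stan's inv_logit."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Map a probability to log-odds (the inverse of inv_logit)."""
    return math.log(p / (1.0 - p))

prior_center = inv_logit(-1.0)  # center of the theta_bar prior, as a percentage
print(f"theta = -1 corresponds to {prior_center:.1%}")

# The same log-odds step is a different number of percentage points
# depending on where you start, which is why the model differences the
# probabilities rather than the thetas.
for start in (0.30, 0.40):
    bumped = inv_logit(logit(start) + 0.2)
    print(f"{start:.0%} plus 0.2 log-odds -> {bumped:.1%}")
```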