Player Improvement Modeling

Midway through the season: who is actually improving?

Feb 15, 2024

Our favorite type of question is “What can you actually infer when you only see a small sample of data?” Maybe a player shot 33% from three on a ton of shots last year. But then he goes 40% on his first ten threes this season. Is that enough data to conclude he’s a better 3PT shooter this year? We have all of last season’s data, but only an incomplete sample from this season. So, midway through the season, who can we say is likely better this year? And who is worse? In this post, we’ll focus on 3PT shooting. In later posts, we’ll expand beyond it.
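To put a number on the ten-shot example, here is a minimal sketch that asks how often a true 33% shooter would make at least 4 of his first 10 threes by luck alone. The helper name `prob_at_least` is ours for illustration, and it assumes every shot is an independent attempt at a fixed rate.

```python
from math import comb


def prob_at_least(made: int, attempts: int, true_rate: float) -> float:
    """P(X >= made) for X ~ Binomial(attempts, true_rate)."""
    return sum(
        comb(attempts, k) * true_rate**k * (1 - true_rate) ** (attempts - k)
        for k in range(made, attempts + 1)
    )


# A 33% shooter taking 10 threes: how often does he make 4 or more?
p_hot_start = prob_at_least(made=4, attempts=10, true_rate=0.33)
print(f"P(at least 4 of 10 | true rate 33%) = {p_hot_start:.2f}")
```

The probability comes out a little over 40%, so a 4-for-10 start is entirely ordinary for an unchanged shooter. Ten shots alone can’t tell us much.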

The Model

As always, the full model is at the bottom of the post if you want to dive in. But here’s how it works at a high level:

3PT shooting for each player (and season) is modeled as binomial.
Player shooting ability is modeled hierarchically. So when a player has only taken 5 attempts so far this season (and made all 5 of them), the model shrugs and says it won’t keep happening.
The difference in player shooting ability between this year and last year is computed for every posterior draw in the generated quantities block.
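Written out as math, the three points above look roughly like this. The priors are read off the Model Code section at the bottom of the post; the symbols y (makes), n (attempts), and Δ (difference) are our own shorthand and do not appear in the code.

```latex
\begin{aligned}
y_{i,s} &\sim \mathrm{Binomial}\!\left(n_{i,s},\ \mathrm{logit}^{-1}(\theta_{i,s})\right),
  \qquad s \in \{\text{last}, \text{this}\} \\
\theta_{i,s} &\sim \mathrm{Normal}(\bar{\theta},\ \bar{\sigma}) \\
\bar{\theta} &\sim \mathrm{Normal}(-1,\ 10), \qquad
\bar{\sigma} \sim \mathrm{Cauchy}^{+}(0,\ 5) \\
\Delta_i &= \mathrm{logit}^{-1}(\theta_{i,\text{this}}) - \mathrm{logit}^{-1}(\theta_{i,\text{last}})
\end{aligned}
```

Here i indexes players and s indexes seasons. Both seasons share the same league-wide mean and spread, which is what pulls small samples toward the average.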

If you’ve read enough of these posts, you probably could have guessed the model structure by now.

Improving Players

We’ll walk through LeBron James as an example. How well is LeBron James shooting from three this year and last year? And how certain are we in those estimates?

LeBron James’s 3PT shooting ability estimates for this year and last year

There are a few things to notice:

It looks like he’s better this year than last. There is a bit of overlap between the two seasons’ estimates, but it’s probably safe to say LeBron James is improving (we’ll come back to this).
The uncertainty in this year’s estimate is greater than the uncertainty in last year’s estimate. As usual, this makes sense; we have less data to go on this year, so it’s harder to have precision in our estimates.
LeBron James’s actual 3PT% this season is 39.5%. Our hierarchical model gives him a slightly lower shooting ability. We don’t have enough data to think he’s going to keep shooting at 39.5% yet.
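The pull toward the league average can be illustrated with a much simpler stand-in than the model in this post. The sketch below uses a conjugate Beta-Binomial update, not the logit-normal hierarchy in the Stan code, and `prior_mean` and `prior_strength` are made-up values chosen only to show the effect. The attempt counts are hypothetical too.

```python
def shrunk_rate(made, attempts, prior_mean=0.36, prior_strength=100):
    """Posterior mean make rate under a Beta prior (illustrative stand-in).

    prior_mean and prior_strength are hypothetical: the prior acts like
    `prior_strength` extra shots taken at the `prior_mean` rate.
    """
    prior_made = prior_mean * prior_strength
    return (prior_made + made) / (prior_strength + attempts)


# Hypothetical players: a perfect 5-for-5 start vs. 39.5% on 200 attempts.
small_sample = shrunk_rate(made=5, attempts=5)     # raw rate 100%
large_sample = shrunk_rate(made=79, attempts=200)  # raw rate 39.5%
print(f"5-for-5 shooter:    {small_sample:.3f}")
print(f"79-for-200 shooter: {large_sample:.3f}")
```

Under these assumptions the 5-for-5 shooter lands near 39%, nowhere near 100%, while the 200-attempt shooter is nudged only a little below his raw 39.5%. The more shots we see, the less the prior matters.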

So, how much has he improved? And how certain are we?

LeBron James’s 3PT% Improvement Estimates

He’s probably improved about 5 percentage points, but the uncertainty is pretty large. Maybe 2.5 points, maybe 7.5. But there is a pretty good chance he improved.
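“A pretty good chance he improved” can be read straight off the posterior draws of the difference: it is the share of draws above zero. Here is a minimal sketch of that summary. The function name `summarize_improvement` is ours, and the draws are simulated placeholders, not the actual posterior for any player.

```python
import random


def summarize_improvement(diff_draws, interval=0.90):
    """Summarize posterior draws of (this year's rate - last year's rate)."""
    draws = sorted(diff_draws)
    n = len(draws)
    tail = (1 - interval) / 2
    lower = draws[int(tail * (n - 1))]        # rough lower percentile
    upper = draws[int((1 - tail) * (n - 1))]  # rough upper percentile
    return {
        "mean": sum(draws) / n,
        "interval": (lower, upper),
        "prob_improved": sum(d > 0 for d in draws) / n,
    }


# Placeholder draws (made-up numbers) standing in for one player's
# theta_probability_difference samples from the fitted model.
rng = random.Random(0)
fake_draws = [rng.gauss(0.05, 0.025) for _ in range(4000)]
summary = summarize_improvement(fake_draws)
print(summary)
```

With real output, `diff_draws` would be one player’s column of `theta_probability_difference` from the generated quantities block.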

Here are the top players that look like they improved at 3PT shooting this season (and our certainty in that improvement).

What’s most interesting is how few players we can say with certainty are improving. Man, Jalen Smith though.

Declining Players

Now, let’s look at players that are worse this season.

Luke Kennard really stands out.

Luke Kennard’s 3PT% Estimates this year and last year

Looking Ahead

I want to add two improvements to the model:

For each player, put a prior on this year’s ability using their last year’s ability. If we haven’t seen much from them, assume they’re the same as last year.
Use other stat improvements to improve the 3PT shooting improvement estimate. If a player has very clearly improved in free throw shooting, does that predict that they improved in 3PT shooting (even if we haven’t seen very much 3PT shooting yet)?
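One possible way to write the first idea is to center this year’s ability on last year’s, rather than on the league-wide mean. This is only a sketch of how it might look, not a settled design. The scale symbol for year-over-year change is a hypothetical new parameter that would need its own prior.

```latex
\begin{aligned}
\theta_{i,\text{last}} &\sim \mathrm{Normal}(\bar{\theta},\ \bar{\sigma}) \\
\theta_{i,\text{this}} &\sim \mathrm{Normal}(\theta_{i,\text{last}},\ \sigma_{\text{change}})
\end{aligned}
```

With few attempts this season, the likelihood says little and the estimate stays close to last year’s. As attempts accumulate, the data can pull it away.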

I think I’ll tackle these in multiple steps/posts.

I didn’t realize substack’s spell check was broken until after I sent out my last email. Nothing like coming back from a 2 year hiatus looking drunk.

Model Code

// Year-over-year improvement model

data {
  int<lower=1> N_players;
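  // Array syntax updated: recent Stan releases no longer accept the old
  // `int x[N]` declaration form.
  array[N_players] int<lower=0> player_attempts_last_year;
  array[N_players] int<lower=0> player_attempts_this_year;
  array[N_players] int<lower=0> player_successes_last_year;
  array[N_players] int<lower=0> player_successes_this_year;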
}

parameters {
    vector[N_players] theta_last_year;
    vector[N_players] theta_this_year;
    real theta_bar;
    real<lower=0> sigma_bar;
}

model {
    // Priors, hierarchical
    theta_bar ~ normal(-1, 10);
    sigma_bar ~ cauchy(0, 5);
    theta_last_year ~ normal(theta_bar, sigma_bar);
    theta_this_year ~ normal(theta_bar, sigma_bar);

    // Likelihood
    for(player in 1:N_players) {
        player_successes_last_year[player] ~ binomial_logit(player_attempts_last_year[player], theta_last_year[player]);
        player_successes_this_year[player] ~ binomial_logit(player_attempts_this_year[player], theta_this_year[player]);
    }
}

generated quantities {
    // Transform theta back to probability
    vector[N_players] theta_last_year_probability;
    vector[N_players] theta_this_year_probability;
    theta_last_year_probability = inv_logit(theta_last_year);
    theta_this_year_probability = inv_logit(theta_this_year);

    // Calculate Difference
    vector[N_players] theta_probability_difference;
    theta_probability_difference = theta_this_year_probability - theta_last_year_probability;
}
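If you want to feed the model yourself, the data block expects one entry per player in four parallel arrays. Below is a minimal sketch of assembling that input. The dictionary keys match the data block above, but the record fields (`att_last`, `made_last`, and so on), the helper name `build_stan_data`, and the example players are all hypothetical.

```python
def build_stan_data(records):
    """Turn per-player shooting records into the model's data dictionary."""
    for r in records:
        # Guard against impossible rows before they reach the sampler.
        if r["made_last"] > r["att_last"] or r["made_this"] > r["att_this"]:
            raise ValueError(f"{r['name']}: more makes than attempts")
    return {
        "N_players": len(records),
        "player_attempts_last_year": [r["att_last"] for r in records],
        "player_attempts_this_year": [r["att_this"] for r in records],
        "player_successes_last_year": [r["made_last"] for r in records],
        "player_successes_this_year": [r["made_this"] for r in records],
    }


# Made-up records purely for illustration.
records = [
    {"name": "Player A", "att_last": 400, "made_last": 132,
     "att_this": 10, "made_this": 4},
    {"name": "Player B", "att_last": 250, "made_last": 95,
     "att_this": 120, "made_this": 41},
]
stan_data = build_stan_data(records)
print(stan_data)
```

A dictionary like this is the usual input shape for Stan interfaces, for example the `data` argument of CmdStanPy’s `CmdStanModel.sample`.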

Binomial Basketball
