Making Predictions on Player Shooting Ability with Little Data
For G-League players, there's not much data to predict a player's shooting ability. So how can we make use of whatever data we have?
For a hot minute, players were getting called up from the G-League at hilarious rates.
Let’s say you want someone from the G-League to come in and simply fill the role of a 3PT shooter. Who’s the best 3PT shooter in the G-League? You could simply sort G-League players by 3P%.
As we’ve written about before (and as is obvious), this suffers from small sample size theater, so people resort to hacks like “Top 3P% with at least 75 attempts”. This hack creates two problems. First, what if a player makes 74 out of 74 attempts? Surely they would be the best in the G-League, but our hacky filter would exclude them. Second, sample sizes in the G-League are tiny, so any reasonable threshold on shot attempts would disregard nearly everyone.
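To make the two problems concrete, here is a small sketch of the threshold filter. The player names and shot counts are made up for illustration; only the 75-attempt cutoff comes from the text above.

```python
# Hypothetical G-League shooting lines: (player, 3PT makes, 3PT attempts).
players = [
    ("Player A", 74, 74),  # perfect, but one attempt short of the cutoff
    ("Player B", 30, 80),
    ("Player C", 9, 20),
    ("Player D", 33, 75),
]

MIN_ATTEMPTS = 75  # the "at least 75 attempts" hack


def rank_by_3p_pct(rows, min_attempts):
    """Sort players by raw 3P%, keeping only those at or above the cutoff."""
    eligible = [
        (name, makes / attempts)
        for name, makes, attempts in rows
        if attempts >= min_attempts
    ]
    return sorted(eligible, key=lambda row: row[1], reverse=True)


ranking = rank_by_3p_pct(players, MIN_ATTEMPTS)
excluded = [name for name, _, attempts in players if attempts < MIN_ATTEMPTS]

for name, pct in ranking:
    print(f"{name}: {pct:.3f}")
print("excluded:", excluded)
```

The 74-for-74 shooter never appears in the ranking, and half of this toy roster is thrown away before any comparison happens.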
In our series on predicting player shooting ability, we started with a simple model and then expanded it to incorporate free throw shooting into our predictions. The general idea was that if we didn’t know much about a player’s 3PT shooting ability but knew they were a terrible free throw shooter, that might indicate they were also bad at 3PT shooting.
Here, we extend our model further to use player position and height to predict 3PT shooting ability. The concept is simple: if all we know is that a player is a Center, that already tells us something about the player’s 3PT shooting ability. We’ll start by looking at actual NBA players, then in a later post move on to G-League players.
First, let’s look at how much information position gives us. As usual, we’ll keep track of the uncertainty in our model, so we get intervals on all of our estimates.
There are a few key things to notice:
The model can’t distinguish between 3PT shooting for Point Guards, Shooting Guards, and Small Forwards. A simple “point estimate” model where you put a number on each position would say Shooting Guards are the best, but since our model keeps track of uncertainty, we can see it’s nearly as likely that Point Guards or Small Forwards are best.
Centers and Power Forwards are significantly worse. Even considering the uncertainty in the model, this is clear.
The uncertainty in Center 3PT shooting ability is massive. This makes sense though: Centers take far fewer 3PT shots, so the model has less data to learn from.
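One way to read "nearly as likely" off a model like this is to count, draw by draw, which position has the highest 3P%. The sketch below does that with simulated draws. The means and spreads are invented placeholders, not output from our model; in practice you would substitute the posterior draws of each position's estimate.

```python
import random

random.seed(0)
N_DRAWS = 4000

# Placeholder (mean, sd) per position on the probability scale.
# These numbers are illustrative assumptions, not fitted values.
assumed = {
    "PG": (0.355, 0.010),
    "SG": (0.360, 0.010),
    "SF": (0.357, 0.010),
    "PF": (0.335, 0.012),
    "C": (0.310, 0.030),
}

# Stand-in for posterior draws: one list of samples per position.
draws = {
    pos: [random.gauss(mu, sd) for _ in range(N_DRAWS)]
    for pos, (mu, sd) in assumed.items()
}


def prob_best(draws_by_group):
    """Share of draws in which each group has the highest value."""
    groups = list(draws_by_group)
    n_draws = len(next(iter(draws_by_group.values())))
    wins = {group: 0 for group in groups}
    for i in range(n_draws):
        best = max(groups, key=lambda group: draws_by_group[group][i])
        wins[best] += 1
    return {group: wins[group] / n_draws for group in groups}


p_best = prob_best(draws)
for pos, p in p_best.items():
    print(f"P({pos} is best) = {p:.2f}")
```

A point estimate would crown a single position. The per-draw tally shows how the top spot is shared among positions whose intervals overlap.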
Now, let’s incorporate player height into the model. What we’re asking here is: is a tall Point Guard a better 3PT shooter than an average height Point Guard? And similarly, is a tall Center a better 3PT shooter than an average height Center? For each position, we looked at how height affects 3PT shooting. Below, we compare an average height player at each position to a tall player (4 inches taller than the average height for the position).
Again, there are a few things to notice:
Being 4 inches taller doesn’t affect shooting at most positions.
Maybe being taller helps small forwards shoot even better (look at the uncertainty to get a sense of how much you trust this conclusion).
The model is less certain about tall players than average height players.
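The last point follows from how the tall-player estimate is built: it adds four times the height coefficient to the position average on the logit scale, so any uncertainty in that coefficient is multiplied by four. Here is a minimal sketch with simulated draws. The means and spreads are placeholders, and the two quantities are drawn independently, which a real posterior need not satisfy.

```python
import math
import random
import statistics

random.seed(1)


def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


N_DRAWS = 4000
# Assumed placeholder draws on the logit scale, not fitted values.
theta_bar = [random.gauss(-0.60, 0.05) for _ in range(N_DRAWS)]  # position average
beta = [random.gauss(0.00, 0.03) for _ in range(N_DRAWS)]  # effect per inch

average_height = [inv_logit(t) for t in theta_bar]
tall_4in = [inv_logit(t + 4 * b) for t, b in zip(theta_bar, beta)]

sd_avg = statistics.stdev(average_height)
sd_tall = statistics.stdev(tall_4in)
print(f"sd at average height: {sd_avg:.4f}")
print(f"sd at +4 inches:      {sd_tall:.4f}")
```

Even with a height effect centered on zero, the +4 inch estimate is more spread out than the average-height one, because it carries the uncertainty of both terms.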
We’re slowly building up our model. In future posts, we’re going to add more features and throw it all together.
Looking ahead
Winter has set in. On my end, that means more camping in the snow and a slower mindset. I’m less creative, less willing to debug Stan Sampling, and less willing to type up results.
Stan Model
You can stop reading. This section is only for people curious about the underlying probability model, either because they want to understand the details or because they want to expand on it themselves. The model is a hierarchical model on player position that incorporates player height into the regression.
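Written as math, the Stan program in this section amounts to the following. The symbols are our own shorthand and do not appear in the code: player i plays position k[i], takes n_i attempts, makes y_i of them, and has height h_i. The lower bound of zero on the scale parameter turns its Cauchy prior into a half-Cauchy.

```latex
\begin{aligned}
\beta_k &\sim \mathrm{Normal}(0, 1) \\
\sigma_k &\sim \mathrm{HalfCauchy}(0, 5) \\
\bar{\theta}_k &\sim \mathrm{Normal}(0, 5) \\
\theta_i &\sim \mathrm{Normal}\left(\bar{\theta}_{k[i]},\ \sigma_{k[i]}\right) \\
y_i &\sim \mathrm{Binomial}\left(n_i,\ \operatorname{logit}^{-1}\left(\theta_i + \beta_{k[i]}\, h_i\right)\right)
\end{aligned}
```

Here \theta_i is `player_value`, \bar{\theta}_k is `position_theta_bar`, \sigma_k is `position_sigma`, and \beta_k is `beta`.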
// A hierarchical binomial regression model.
// Models the total shot success rate over recent years (3PT shooting in this post)
// using the player's position and height.
// Array declarations use the array[...] syntax (Stan 2.26 and later).
data {
  int<lower=0> players;
  int<lower=0> n_positions;
  array[players] int<lower=0> n_attempts;
  array[players] int<lower=0> n_successes;
  // Presumably inches relative to the position average, since the
  // generated quantities treat 0 as an average height player.
  array[players] real height;
  // Position index for each player, 1..n_positions
  array[players] int<lower=1, upper=n_positions> position;
}
parameters {
  // Per-player shooting ability on the logit scale
  array[players] real player_value;
  // Per-position mean ability, height effect, and spread
  array[n_positions] real position_theta_bar;
  array[n_positions] real beta;
  array[n_positions] real<lower=0> position_sigma;
}
model {
  beta ~ normal(0, 1);
  position_sigma ~ cauchy(0, 5);
  position_theta_bar ~ normal(0, 5);
  // Players are drawn from their position's distribution
  for (player in 1:players) {
    player_value[player] ~ normal(position_theta_bar[position[player]],
                                  position_sigma[position[player]]);
  }
  // Makes are binomial, with height shifting the logit per position
  for (player in 1:players) {
    n_successes[player] ~ binomial_logit(n_attempts[player],
      player_value[player] + beta[position[player]] * height[player]);
  }
}
generated quantities {
  vector<lower=0, upper=1>[players] player_estimate;
  vector<lower=0, upper=1>[n_positions] position_estimate_mean;
  vector<lower=0, upper=1>[n_positions] position_estimate_tall1;
  vector<lower=0, upper=1>[n_positions] position_estimate_short1;
  vector<lower=0, upper=1>[n_positions] position_estimate_tall4;
  vector<lower=0, upper=1>[n_positions] position_estimate_short4;
  vector<lower=0, upper=1>[n_positions] position_sample;
  for (player in 1:players) {
    player_estimate[player] = inv_logit(player_value[player] + beta[position[player]] * height[player]);
  }
  for (p in 1:n_positions) {
    // Average height player, then 1 and 4 inches above and below average
    position_estimate_mean[p] = inv_logit(position_theta_bar[p]);
    position_estimate_tall1[p] = inv_logit(position_theta_bar[p] + beta[p] * 1);
    position_estimate_tall4[p] = inv_logit(position_theta_bar[p] + beta[p] * 4);
    position_estimate_short1[p] = inv_logit(position_theta_bar[p] - beta[p] * 1);
    position_estimate_short4[p] = inv_logit(position_theta_bar[p] - beta[p] * 4);
    // A new, unseen average height player at this position
    position_sample[p] = inv_logit(normal_rng(position_theta_bar[p], position_sigma[p]));
  }
}
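The generated quantities treat a height of 0 as the average player at a position and 4 as four inches above it, which suggests the `height` input is measured in inches relative to the position average. Under that assumption, a sketch of preparing the input data in Python might look like this. The roster rows and the `build_stan_data` helper are hypothetical; the dictionary keys match the names in the `data` block.

```python
from collections import defaultdict

# Hypothetical rows: (position, height in inches, 3PT makes, 3PT attempts).
roster = [
    ("PG", 73, 120, 330),
    ("PG", 75, 95, 250),
    ("C", 83, 10, 40),
    ("C", 85, 4, 12),
]


def build_stan_data(rows):
    """Build the data dict, centering height within each position."""
    positions = sorted({pos for pos, *_ in rows})
    index = {pos: i + 1 for i, pos in enumerate(positions)}  # Stan is 1-indexed

    heights = defaultdict(list)
    for pos, height, _, _ in rows:
        heights[pos].append(height)
    mean_height = {pos: sum(hs) / len(hs) for pos, hs in heights.items()}

    return {
        "players": len(rows),
        "n_positions": len(positions),
        "n_attempts": [attempts for *_, attempts in rows],
        "n_successes": [makes for _, _, makes, _ in rows],
        "height": [height - mean_height[pos] for pos, height, _, _ in rows],
        "position": [index[pos] for pos, *_ in rows],
    }


stan_data = build_stan_data(roster)
print(stan_data)
```

A dictionary of this shape is what Python interfaces to Stan, such as CmdStanPy, accept as model data.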