I recently developed a win probability model for the awesome py_ball package in Python. The package itself makes NBA/WNBA data accessible to a wide audience. If you haven’t seen it, you should definitely check it out. The link is https://github.com/basketballrelativity/py_ball.
In this blog post, I’ll describe the methods I used to develop the model.
Our model relies heavily on a series of logistic regressions that depend on (a) the amount of time remaining in the game, (b) the point differential, and (c) who has possession. Right now, the only pre-game bias we introduce is home-court advantage, which is why the home team always opens with slightly better odds than the away team: everything is fed into the model with respect to the home team, so the model learns that the home team has a slight edge. We hope to add betting odds eventually in order to produce true pre-game win probabilities.
In order to develop the model, we use a method that Brian Burke used in his win probability models, splitting up the game into multiple groups.
We split the game into 960 groups (one group every 3 seconds) and fit a separate logistic regression for each group. Each regression takes in the point differential and who has possession; we do not need to explicitly input the time, because each model is trained only on its own 3-second slice of the game.
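To make the setup concrete, here is a minimal sketch of the per-slice training loop using scikit-learn. This is not the package's actual code; the DataFrame and column names (seconds_remaining, home_margin, home_has_possession, home_win) are hypothetical placeholders.

```python
# Minimal sketch of per-time-bin logistic regressions for win probability.
# Assumes a DataFrame `events` with hypothetical columns: seconds_remaining,
# home_margin, home_has_possession (0/1), and home_win (0/1 label).
import pandas as pd
from sklearn.linear_model import LogisticRegression

BIN_SECONDS = 3  # one model per 3-second slice -> 960 bins over 48 minutes

def train_bin_models(events: pd.DataFrame) -> dict:
    """Fit one logistic regression per 3-second bin of game time."""
    models = {}
    bins = (events["seconds_remaining"] // BIN_SECONDS).astype(int)
    for b, group in events.groupby(bins):
        X = group[["home_margin", "home_has_possession"]]
        y = group["home_win"]
        if y.nunique() < 2:          # skip bins that lack both outcomes
            continue
        models[b] = LogisticRegression().fit(X, y)
    return models

def home_win_probability(models: dict, seconds_remaining: float,
                         home_margin: int, home_has_possession: int) -> float:
    """Look up the model for the current 3-second bin and return P(home win)."""
    b = int(seconds_remaining // BIN_SECONDS)
    return models[b].predict_proba([[home_margin, home_has_possession]])[0, 1]
```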
For games that go into overtime, we treat the overtime period as if it were the final 5 minutes of the fourth quarter. This ensures there are enough training samples for the model to actually learn something. Very few games go to 4OT, for instance, so a logistic regression trained only on 4OT data would not be able to pick up any reliable trends.
The model is trained on five seasons' worth of data, from 2013-14 through 2017-18.
We evaluated the model on 2018-19 data using the Brier score.
The Brier score is the mean squared error of the predicted probabilities. For instance, if the model predicts a 0.58 probability of winning at a given time and that team goes on to win, we add (1 - 0.58)^2 to the running total. We sum these values across the entire game and divide by the total number of events, with one event every 3 seconds.
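A quick sketch of that calculation, with illustrative array names:

```python
# Brier score as described above: the mean squared error between the predicted
# win probability and the eventual outcome (1 or 0), averaged over every event.
import numpy as np

def brier_score(predicted_probs: np.ndarray, outcomes: np.ndarray) -> float:
    """predicted_probs: win probability at each event; outcomes: 1 if that team won."""
    return float(np.mean((outcomes - predicted_probs) ** 2))

# e.g. a 0.58 prediction for a team that went on to win contributes (1 - 0.58)**2
print(brier_score(np.array([0.58]), np.array([1])))   # 0.1764
```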
Our model achieved a Brier score of 0.167 on the held-out season. Lower is better (a model that always predicts 50/50 scores 0.25), so this is a fairly decent value: on average, the model assigns high probability to the eventual winner.
The following examples are comparisons between our model (top) and inpredictable.com's model.
I recently tweeted some assist heat maps that were generated using 2015-16 SportVU data here.
Although the individual player heat maps are interesting, I wanted to look at more league-wide trends. I also wanted to explain my methods a little bit more.
I found this particular problem interesting because of its potential implications.
Players in the NBA, and in basketball generally, have inherent biases in where they prefer to shoot. For instance, if a player like Ben Simmons were standing at the 3-point line, you wouldn't guard him as tightly as you would Stephen Curry. A coach who better understands these tendencies can adjust strategy accordingly.
Analyzing the specific locations from which players prefer to pass and shoot could prove useful, as it would help defenders anticipate what is likely to happen next in a given play, and would eventually improve teams' defensive strategy.
As I stated above, I used the 2015-16 SportVU data to generate these graphs. This data captures every player on the court, plus the ball, roughly 25 times per second. Although it would have been ideal to have multiple seasons' worth of data, I was unable to find it.
I cross-reference this SportVU data with play-by-play data from stats.nba.com to determine when assists occur.
Then, using an approximate timing of the assist event from the play-by-play data, I record the location of both the passer and the shooter. This entire process is done with pandas and Python.
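The cross-referencing step can be sketched roughly as follows. This is not my exact pipeline; the column names (event_type, game_clock, the player ID fields) are hypothetical stand-ins for whatever the play-by-play and SportVU files actually contain.

```python
# Rough sketch: for each assisted make in the play-by-play, find nearby tracking
# frames and record the passer's and shooter's court coordinates.
import pandas as pd

def assist_locations(pbp: pd.DataFrame, tracking: pd.DataFrame) -> pd.DataFrame:
    """One row per assist with passer and shooter (x, y) near the event time."""
    assists = pbp[pbp["event_type"] == "ASSISTED_FGM"]
    rows = []
    for _, ev in assists.iterrows():
        # frames from the same quarter within ~1 second of the event's game clock
        near = tracking[(tracking["period"] == ev["period"]) &
                        (tracking["game_clock"].sub(ev["game_clock"]).abs() < 1.0)]
        passer = near[near["player_id"] == ev["assist_player_id"]]
        shooter = near[near["player_id"] == ev["shooter_player_id"]]
        if passer.empty or shooter.empty:
            continue
        rows.append({"passer_x": passer["x"].iloc[0], "passer_y": passer["y"].iloc[0],
                     "shooter_x": shooter["x"].iloc[0], "shooter_y": shooter["y"].iloc[0]})
    return pd.DataFrame(rows)
```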
After this, I use a KDE (kernel density estimator) from Seaborn to generate the heat map. The KDE smooths the sparse point data into a continuous surface that is much easier to visualize.
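A minimal sketch of that plotting step, assuming a `locations` DataFrame of passer coordinates (in feet) produced by the previous step:

```python
# KDE heat map of assist origins with Seaborn. Column names and court limits
# are illustrative; overlaying actual court lines is left out for brevity.
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(6, 5.6))
sns.kdeplot(x=locations["passer_x"], y=locations["passer_y"],
            fill=True, cmap="inferno", thresh=0.05, ax=ax)
ax.set_xlim(-25, 25)   # half-court width in feet
ax.set_ylim(-5, 47)    # baseline to half-court line
ax.set_title("Assist origin locations")
plt.show()
```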
As stated in the caption, the left image is a heat map of all the assist locations in my limited dataset. The right image is a visualization of just the shots off of those assists.
Right off the bat, there seems to be more variation in the locations from which players take shots than in the locations from which they pass. This makes sense, as point guards are typically the ones dishing out assists, and most of them operate around the top of the key.
Clearly, based on the dataset, “drive-and-kick” assists aren’t as common as the normal, top of the key assist.
Further, as expected, we see that players standing in the corner are more likely to shoot than make a pass.
How good is the generalization?
Left to Right: Eric Bledsoe (PHX), Stephen Curry (GSW), John Wall (WAS)
The above charts show that the generalization above does not capture the variability per player.
Even among players who play the same position (all three are point guards), Eric Bledsoe, Stephen Curry, and John Wall show stark differences in their individual assist charts.
It was interesting to me that Steph tends to assist from the right side of the court, while John Wall tends to do so from the left. However, given the limited size of my dataset, it's possible these charts don't capture the full picture.
What can we do with this information? Well, we clearly see that Eric Bledsoe is more likely to pass when he’s in the paint versus when he’s at the three-point line. If a coach is able to adjust his strategy of how to defend a player like Bledsoe, it would likely improve that team’s overall defensive numbers.
LeBron was especially interesting to me. Of all the stars with significant assist numbers, he seems to have the most unpredictable assist locations.
This is one of the aspects of LeBron’s game that makes him such a difficult player to defend. Not only can he shoot and pass, but he can do both of these actions pretty much anywhere on the court.
Left to Right: GSW, CLE, MEM
Teams show stark differences as well. Above are three teams: the Warriors, Cavaliers, and Grizzlies. They play vastly different styles, and as a result they pass from very different locations.
In the future, I want to apply similar visualizations to other seasons' tracking data. As stated here, it would be interesting to see how players and teams change over time.
I also think it might be interesting to run a sort of clustering algorithm on this data combined with shot chart data, to identify types of players.
If you have any suggestions on what else I could do with this information, please let me know through email (email@example.com) or Twitter (@avyvar)
With the NBA season postponed, there has been a lack of basketball in the world. As a result, I thought it would be interesting to look in depth at how the Elam Ending would work in the current NBA and whether it has a place there.
What is the Elam Ending?
If you didn't watch the 2020 All-Star Game, the Elam Ending replaces the clock at the start of a period with a target score. Rather than playing out a 5-minute overtime or a 12-minute fourth quarter, each team has to reach a target score set some number of points above the higher score in the game.
For instance, if Team X had 75 points and Team Y had 70 at the end of the third quarter, the target score would be some number of points above Team X's score. In the All-Star Game, that number was 24, in honor of Kobe Bryant. Using 24 in this hypothetical game, the target score would be 99, and the first team to reach 99 would win.
A more in-depth description of the Elam Ending can be found here.
Applying the Elam Ending to overtime has been widely suggested by NBA fans, and Daryl Morey, the Houston Rockets' GM, supports the idea as well. Overtime seems like a perfect, non-intrusive place to introduce the rule, so we will investigate how an Elam Ending overtime would work in today's NBA.
How many points till the target score?
Teams have scored, on average, 10.3 points per overtime period in the league from 2011-12 to 2019-20.
The above graph shows year-by-year overtime scoring per team. The number of points scored in overtime has been increasing, driven by the rise of three-point shooting and more efficient offense. To maintain roughly the same amount of game time, the target score should therefore be set 11 points above the regulation score.
How would win probabilities change?
For the purposes of this article, I will be using the following probability values to examine how the Elam Ending would change things. Thanks to Mike Beuoy (@inpredict on Twitter) for providing these values so I didn’t need to find them myself. In addition, for my comparisons to the timed overtime period, I use http://stats.inpredictable.com/nba/wpCalc.php.
This graph gives the frequencies of points scored on a given possession. During the 2019-20 season, teams scored zero points on a possession 50.5% of the time, 1 point on a possession 3.1% of the time, etc.
I wanted to examine this in depth, so I started from the beginning of a play with the jump ball. I tried to answer the question: how much does the jump ball affect the outcome of the game?
Using the probability distribution described above, I ran 1 million simulations of an overtime period, going up to 11 points.
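A sketch of that simulation is below: teams alternate possessions, points per possession are drawn from the empirical distribution, and the first team to the +11 target wins. Only the 0-point and 1-point frequencies were quoted above, so the remaining weights here are placeholder values to be replaced with the full 2019-20 distribution.

```python
# Monte Carlo sketch of an Elam Ending overtime (first to +11 wins).
import random

POINTS = [0, 1, 2, 3, 4]
PROBS = [0.505, 0.031, 0.31, 0.14, 0.014]   # placeholder weights summing to 1

def simulate_elam_overtime(target: int = 11) -> bool:
    """Return True if the team that won the jump (first possession) wins."""
    scores = [0, 0]
    team = 0                                  # team 0 controls the opening tip
    while max(scores) < target:
        scores[team] += random.choices(POINTS, weights=PROBS)[0]
        team = 1 - team                       # possession alternates
    return scores[0] >= target

N = 1_000_000
wins = sum(simulate_elam_overtime() for _ in range(N))
print(f"jump-ball winner wins {wins / N:.3f} of simulations")
```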
The team that won the jump won the game ~54.4% of the time, while the team that lost the jump won ~45.6% of the time. In other words, winning the opening jump ball is worth roughly a 4.4 percentage-point advantage.
Comparatively, the probability of winning after taking the opening possession of a standard 5-minute overtime is ~0.542, negligibly lower than the Elam Ending figure. The importance of the jump ball, then, does not really change under the Elam Ending.
The next step is to see how the probability of winning a game in the Elam Ending compares to the probability of winning a game in regular overtime.
The first thing to realize is that the win probability in normal overtime is a function of the score differential and the amount of time left in the game. In comparison, the win probability in the Elam Ending is a function of each team’s score and the number of points to the target score.
A major difference is that as the clock approaches zero in a normal overtime period, the probability of winning approaches 1 or 0 with few exceptions. With the Elam Ending, the win probability does not get squeezed toward 1 or 0 in the same way, because the game ends on a score rather than on a continuously expiring clock.
This is part of what makes the Elam Ending so exciting: the trailing team always feels like it has a chance, which encourages good play throughout the overtime period.
For instance, with twenty seconds left in a regular overtime period and a four-point lead, the game turns into a free-throw shooting contest, which the leading team very likely wins. It is also not an exciting game to watch.
Rather, if the score is 6-10 in an Elam Ending overtime, the trailing team has a meaningfully higher probability of coming back. This makes the game far more fun to watch, and it removes the incentive for intentional fouls.
Below are two graphs highlighting the win probability of a team leading by 4. The x-axis on the normal OT graph is the amount of time left in the game, while the x-axis on the Elam Ending OT graph is the game score.
It is evident from this graph that the leading team's win probability under the Elam Ending does not increase monotonically.
This can be explained with the following example. At 6-10 compared with 5-9, the trailing team is closer to the target score than before, while the leading team's chance of making its next field goal is unchanged. The leading team therefore gains no additional advantage from the exchange, while the trailing team does.
In addition, it is evident that the win probability never approaches 1 under the Elam Ending. The game stays harder to predict, which again makes it more fun to watch.
What types of shots should you take?
I also wanted to look at what types of shots winning teams were taking. Obviously the winning team scores more points than the losing team, but which areas of the game were winners exploiting?
Below is a comparative bar chart showing how many 1-point possessions each team had, how many 2-point possessions, and so on.
Not surprisingly, the winning team scored more 3's and 2's than the losing team. Based on the difference in bar heights in each category, winning teams outscore their opponents by roughly 3 points on 2's and roughly 3 points on 3's. Given the average shooting tendencies of an NBA team, then, 3's and 2's appear to be about equally important in the Elam Ending.
However, it has been shown time and time again that three-pointers usually generate more points per shot, so making a high volume of 3's is useful in any game. There is also a point, though, where taking even more threes becomes detrimental in the Elam Ending.
In the graph above, I assume a 3-point shooting percentage of 35% and a 2-point shooting percentage of 50%. Under these assumptions, win probability is maximized when roughly 90% of your shots come from 3-point range. Of course, that exact number should not be taken too literally, as in-game dynamics such as defense could drastically change it.
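A rough sketch of that experiment is below. Team A takes a three on a given possession with probability `three_share` (35% make rate) and a two otherwise (50% make rate), while the opponent uses a fixed 50/50 mix. This is a simplification (no free throws, no offensive rebounds), so the exact crossover point will differ from the graph in the post.

```python
# Sweep the share of attempts taken from three and estimate win probability
# in a race to the +11 target score.
import random

def play_to_target(three_share: float, target: int = 11, sims: int = 20_000) -> float:
    wins = 0
    for _ in range(sims):
        scores = [0, 0]
        team = 0
        while max(scores) < target:
            share = three_share if team == 0 else 0.5
            if random.random() < share:
                scores[team] += 3 * (random.random() < 0.35)   # three attempt
            else:
                scores[team] += 2 * (random.random() < 0.50)   # two attempt
            team = 1 - team
        wins += scores[0] >= target
    return wins / sims

for share in (0.0, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(share, play_to_target(share))
```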
Although the Elam Ending is nontraditional for professional basketball, implementing the rule in the NBA would make games more exciting. It would introduce more randomness to the game and have fans holding their breath until the final shot.
My previous blog post showed how cluster-able NBA shot charts were. I recently made a few improvements to the model and looked into things that I didn’t look into in the previous article.
A quick summary of that article is that I generated a 14 dimensional vector with shot frequencies for different locations on the court. Then I ran k-means clustering on this vector for each player over a season.
Most of the methodology is the same between the two, so please read the other article for more depth.
Number of Clusters
In my previous iteration I used 3 clusters, an arbitrary choice. This time I generated a plot to find the optimal number of clusters: using the 'elbow method' for k-means, the optimal number appears to be a bit higher, around 5.
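For reference, here is a minimal elbow-method sketch, assuming `X` is the (players x 14) matrix of shot-location frequencies described in the earlier post:

```python
# Elbow method: plot within-cluster sum of squares (inertia) against k and
# look for the point where the curve flattens.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ks = range(1, 11)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]

plt.plot(ks, inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("within-cluster sum of squares (inertia)")
plt.show()   # the 'elbow' sits around k = 5 here
```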
After running the clustering algorithm, these were 5 example shot charts for each cluster.
Since we added more clusters, I interpreted what each of these clusters meant.
Cluster 0 seems to represent players who mainly shoot in the paint but can step outside it; they don't shoot many threes. My guess is that these players used to be traditional big men and are transitioning into stretch forwards.
Cluster 1 seems to represent players who shoot threes and shots in the paint (Moreyball ideals). However, they seem to shoot more threes than paint shots.
Cluster 2 seems to represent players who prefer to shoot midrange shots.
Cluster 3 seems to represent players who play in the paint and leave the paint extremely rarely.
Cluster 4 seems to represent players who shoot threes and shots in the paint (Moreyball ideals). However, they seem to shoot more paint shots than threes.
These are some of the notable players from each of the clusters. Interestingly, LeBron James and Joel Embiid are in the same cluster. Obviously they are not the same type of player, but their shooting tendencies are quite similar. This is why adding something like assist data could be beneficial to the performance of this model.
I was curious so I looked at the Rockets’ distribution of clusters for 2018-19 and this is what I got.
In comparison, this is what the Knicks were.
This highlights that the Rockets really lean on Moreyball (fitting :D) and focus mainly on the three-point aspect of their strategy. The Knicks' distribution, meanwhile, shows that they aren't that progressive in their methods (we knew that).
I then cross-referenced the clusters with some statistics to see which clusters rely on the ball a bit more.
These two charts show how the midrange cluster tends to have more opportunity than other clusters. Personally, I believe this has to do with the close correlation between people who shoot from midrange and their reliance on isolation basketball. Players like Kevin Durant, Jimmy Butler, and Carmelo Anthony all fall into this cluster and they are known for playing isolation basketball.
I also cross referenced the clusters with some player statistics, like three point percentage and field goal percentage.
These two graphs show that cluster 1 shoots a lot of threes: these players have a higher three-point percentage than all the other clusters but a lower field goal percentage. Further, we can confirm that cluster 3 is the "traditional big man" cluster and is full of extremely poor three-point shooters.
Interestingly, clusters 2 and 4 have similar three-point and field goal percentages. However, cluster 2 shoots fewer threes and more mid-range jumpers, which is generally less efficient. This is highlighted with eFG% below.
Cluster 0 is also quite poor at shooting, but these players still venture out of the paint more often than cluster 3. Watching Giannis and Anthony Davis, we can see this easily: they are trying to expand their games to the three-point shot but are not yet efficient from deep.
These graphs also further confirm that midrange players are the least efficient shooters in terms of eFG% and that traditional big men (or simply players who rarely deviate from the paint) are the most efficient in this sense.
Cluster Distribution Over Time
In the previous blog post, I generated different clusters for each of these years. However, I thought it would be interesting to use the same clusters and see how the distribution of the clusters changed over time.
We see that cluster 2 used to be the most popular for many years. However, with the rise of Moreyball and efficiency, we see that cluster 1 and 4 have become more popular in recent years.
The distribution of the clusters, interestingly, did not change much from 1999-00 to 2008-09. The share of midrange players decreased only slightly over that stretch; the wholesale shift in the cluster distribution is a recent phenomenon.
I want to see if I can correlate these clusters to win percentage in some way. This way, we can see what clusters directly translate to winning. I also want to add other mapped data (such as where assists were made from, where rebounds were taken) and see if this helps better cluster players.
In the NBA, we often assign labels to players without really looking in depth at what constitutes those labels. One way to figure out the "definition" of these labels, and to see whether they actually exist, is to use an algorithm known as k-means clustering to group similar shot charts given a set of features.
My approach to clustering the shot charts was to bin groups of shots, much as we sometimes do for visualization. Binning the shots means each player is represented by a vector of shot frequencies for the individual locations on the court, like so.
I separated shots into 14 locations as given by the stats.nba.com API and created a 14×1 vector per player per season containing the shot frequency for each location on the court. The locations are highlighted in the shot chart above. The reason I do not include field goal percentage is that I wanted to highlight a player's tendencies, and in my opinion FG% is irrelevant to that.
I can't use the raw X-Y coordinates directly because players take different numbers of shots, which would make the vector a different length for every player and prevent the use of k-means clustering.
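Here is a sketch of the featurization and clustering. It assumes a `shots` DataFrame of stats.nba.com shot-chart rows in which the location labels have already been collapsed into a single hypothetical `zone` column with 14 values; the exact preprocessing in the post may differ.

```python
# Build one 14-dimensional shot-frequency vector per player, then run k-means.
import pandas as pd
from sklearn.cluster import KMeans

def player_vectors(shots: pd.DataFrame) -> pd.DataFrame:
    """One row per player: fraction of shots taken from each of the 14 zones."""
    counts = shots.groupby(["PLAYER_ID", "zone"]).size().unstack(fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0)   # frequencies, rows sum to 1

vectors = player_vectors(shots)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors.values)
labels = kmeans.labels_   # cluster assignment per player
```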
I ran the clustering algorithm, with the steps highlighted above, for two separate time frames, to see how the clusters have changed over time. The two time frames I selected were the “2016-17”, “2017-18”, “2018-19” (recent) seasons and the “1999-00”, “2000-01”, “2001-02” (old) seasons.
The number of clusters I decided on was 3, but that was an arbitrary choice. I can definitely try a larger number and see where that takes me.
I first ran UMAP dimensionality reduction and highlighted different clusters, just to verify that there was something to highlight.
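A minimal sketch of that check, reusing the `vectors` and `labels` from the snippet above (requires the umap-learn package):

```python
# Project the 14-d shot-frequency vectors to 2-d with UMAP and color by cluster.
import matplotlib.pyplot as plt
import umap

embedding = umap.UMAP(random_state=0).fit_transform(vectors.values)
plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="viridis", s=10)
plt.title("UMAP projection of shot-frequency vectors, colored by cluster")
plt.show()
```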
It’s obviously not easy to make any conclusions from this UMAP visualization alone, so I took some samples from all of the clusters highlighted by the algorithm.
Above, each row represents one cluster highlighted by the algorithm. The first row is obviously a cluster that highlights players that do not deviate from the paint much. It includes players like Dwight Howard and Ben Simmons.
However, the other two clusters the algorithm highlighted (rows 2 and 3) seem extremely similar. Personally, I don't see any stark differences between them, but the second cluster appears slightly more inclined toward "Moreyball", meaning its players take fewer mid-range shots than players in the third cluster. The difference is subtle, so I'm not really sure.
These are the relative amounts of each cluster in the overall dataset. It makes sense, as the number of players who only play in the paint is very low.
Here, the first row highlighted seems to be players who exemplify the “perimeter game”. This makes sense as the perimeter game was very prominent in the seasons we’re looking at.
The second cluster seems to highlight players who mainly rely on the mid-range game and don't really venture into three-point range. The third cluster also uses the mid-range game but ventures out to the three-point line as well. The distinction between these two isn't too eye-catching.
These are the relative frequencies of each cluster in the dataset. The mid-range game was quite prominent during this age, and the algorithm seems to agree.
Really, the only cluster that seems to exist in both eras of basketball is the cluster with mid-range and three-point shooters. This really speaks to the quickly changing nature of basketball. The perimeter two is not being used much at all, nor is the pure mid-range game. This is clearly the result of analytics in the sport, as these shots just don’t provide as many points per shot taken.
There are definitely things that I can do better in this project. If you have any suggestions, I can definitely try implementing them.
Zion Williamson has been phenomenal this preseason for the New Orleans Pelicans. This has led to various opinions in the basketball world on how Zion will perform in the regular season.
Some say that Zion is going to be an All-Star in his rookie season. In fact, Stephen A. Smith made the bold claim that Zion’s rookie season will mirror Shaquille O’ Neal’s, based off Zion’s unparalleled efficiency in the paint.
To examine this idea more objectively, I looked at the general case of rookies in the preseason versus the regular season and isolated some key statistics. One thing to note, as it affects the interpretation of these results:
Teams have extreme variability of schedules in the preseason. For instance, one team could be playing a strong Lakers team each game, while another team plays teams like the Shanghai Sharks. Thus, it is tough to generalize anything from preseason to regular season. However, for the purposes of this article, we will assume that this will not affect player/team statistics by a wide margin.
I looked at three different basic statistics (points per game, assists per game, rebounds per game) in the regular season vs. the preseason. I plotted these values on separate histograms for regular season and preseason statistics.
These histograms show that the distribution of rookies' regular-season points per game is more right-skewed than their preseason points per game.
To examine this further, I wanted to see whether players were less efficient during the regular season, or whether the main cause was simply less opportunity.
In this scatter plot, the size of each dot corresponds to the rate of increase in minutes played, multiplied by a constant; the constant is there only so one can distinguish between dot sizes.
If you look closely at this plot, there is a clump of small dots below the y=x line (the line representing no change). This means most of these players received far fewer minutes during the regular season, which in turn skews the points-per-game histogram to the right.
Thus, we can assume that rookies maintain fairly similar scoring rates across the preseason and the regular season. We can generalize this to assists with the same types of plots.
We can see from this plot that regular season assist averages are also more skewed right than preseason assist averages.
And again, we see the same trend as with points. However, the data is far more clumped near zero, which makes sense: most players do not have the ball in their hands enough to facilitate much.
When we look at rebounds, however, we can see that the distributions remain fairly similar across the regular season and the preseason.
When we look at the corresponding scatter plot, the number of points above the y=x line and the number below it look about the same.
There is clearly some correlation between preseason and regular-season stats. The main bottleneck that causes rookies' regular-season statistics to dip is opportunity: rookies simply do not get as many minutes in the regular season as they do in the preseason.
However, we should expect Zion to continue to be the main man for the New Orleans Pelicans, and with his body frame and ability to score at will, don’t expect too much of a drop off in regular season statistics.
Nearly every day in the NBA (playoffs included), there are close games that come down to the wire. We see teams with 3, 4, or 5-point deficits with only a shot-clock remaining quite often, and one of the questions commentators always ask during this situation is:
Do you go for the quick 2 and intentionally foul (and hope that the opponent will miss a free throw) or go for a three?
A lot of the time teams decide to go for the quick two (we saw this in the Houston vs. Golden State series), but other times they go for the three (as in the Golden State vs. Portland series). Granted, it was a 3-point game in the latter case and a 5-point deficit in the former, but there should be some simple analytical way of determining when a team should go for the 3 and when it should go for the 2.
Golden State is known for its shooting. Although we mainly consider Golden State to be a great 3-point shooter, GSW is full of great free-throw shooters as well. In the clutch, Steph shoots 97 percent from the line, which is one of the best in the league. In addition, Klay shot 100 percent from the line in the clutch this season, while Durant shot 91 percent. This means that the Warriors have at least 3 great options (out of 5) to give it to when they have a small lead with little time left.
Given all of these options, we’ll assume that the free throw percentage for the Warriors when they are intentionally fouled is 90% (which is obviously conservative). In addition, we will assume that the average 2 point percentage in the clutch is 55 percent (for the opposing team), while the average 3 point percentage in the clutch is 30 percent (for the opposing team).
Now assuming it is a 3 point game and the opposing team has the ball, we have a couple scenarios:
1. Shoot a 3 -> tie the game
2. Shoot a 2 -> intentionally foul -> opposing team misses a free throw -> shoot a 2
3. Shoot a 2 -> intentionally foul -> opposing team misses both free throws -> shoot a 2
4. Shoot a 2 -> intentionally foul -> opposing team makes both free throws -> shoot a 3
In all of these cases, the opposing team will catch up, but is it more likely to beat the Warriors by shooting a 3 on that play or shooting a 2 on that play?
Well, when you compute the probabilities using some simple multiplication, you get that the probability of winning when you shoot:
A THREE = 31.52%
A TWO = 19.11%
Clearly, you should go for the three, no matter what Mark Jackson and Stan Van Gundy said.
However, this is only the Warriors case. What if the team with the lead is a bad free-throw shooting team? We can represent this as a graph where the independent variable is the probability that the leading team makes a free throw and the dependent variable is the probability of tying the game (assuming 3PT% = 30% and 2PT% = 55% for all teams).
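A rough sketch of the down-3 calculation as a function of the leading team's free-throw percentage is below. It follows the scenario list above but is deliberately simplified (one shot per trip, no rebounds), so the exact values may differ a bit from the numbers in the post.

```python
# Probability of at least tying when down 3, for the two strategies.
P3, P2 = 0.30, 0.55   # clutch 3PT% and 2PT% assumed above

def tie_prob_go_for_three(ft_pct: float) -> float:
    # make the three and the game is tied (independent of the opponent's FT%)
    return P3

def tie_prob_go_for_two(ft_pct: float) -> float:
    make_both = ft_pct ** 2
    make_one = 2 * ft_pct * (1 - ft_pct)
    miss_both = (1 - ft_pct) ** 2
    # make the two (now down 1), foul, then respond to the free-throw outcome:
    #   both made   -> down 3 again -> need a three
    #   one made    -> down 2       -> need a two
    #   both missed -> down 1       -> a two ties or wins
    return P2 * (make_both * P3 + make_one * P2 + miss_both * P2)

for ft in (0.60, 0.75, 0.90):
    print(ft, tie_prob_go_for_three(ft), round(tie_prob_go_for_two(ft), 4))
```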
In the above graph, the green line represents taking a three while the white line represents taking a two. Clearly, teams should always shoot the 3 when they are down 3 and have the ball.
Now let's see what happens if it's a 4-point game. There are more possibilities in a 4-point game, but we'll drop the ones that are essentially negligible because they are so unlikely to occur.
1. Shoot a 3 -> intentionally foul -> opposing team misses a free throw -> shoot a 2
2. Shoot a 3 -> intentionally foul -> opposing team misses both free throws -> shoot a 2
3. Shoot a 3 -> intentionally foul -> opposing team makes both free throws -> shoot a 3
4. Shoot a 2 -> intentionally foul -> opposing team misses a free throw -> shoot a 3
5. Shoot a 2 -> intentionally foul -> opposing team misses both free throws -> shoot a 2
6. Shoot a 2 -> intentionally foul -> opposing team makes both free throws -> shoot a 3
……
When you are playing against the Warriors, the probabilities are like so:
A THREE = 10.43%
A TWO = 5.47%
Again, we see that the probability of tying is higher when you go for a three than when you go for a two. And again, we can represent this with a graph using the same X and Y variables.
Interestingly, it is better to go for the two when the opposing team shoots free throws poorly (< 79%) and you are down 4. If the opposing team has a high FT%, however, you should always go for the 3. At around 79% FT%, it does not matter which you choose.
Finally, we'll look at the 5-point game (basically the hopeless case). At this point, you can pretty much accept the loss if you are playing the Warriors: you need everything to line up in your favor (missed free throws, made threes, made twos). More concretely, you need one of the following:
1. Shoot a 3 -> intentionally foul -> opposing team misses a free throw -> shoot a 3
2. Shoot a 3 -> intentionally foul -> opposing team misses both free throws -> shoot a 2
3. Shoot a 3 -> intentionally foul -> opposing team makes both free throws -> take the L, basically
4. Shoot a 2 -> intentionally foul -> opposing team misses a free throw -> take the L, basically
5. Shoot a 2 -> intentionally foul -> opposing team misses both free throws -> shoot a 3
6. Shoot a 2 -> intentionally foul -> opposing team makes both free throws -> take the L, basically
……
Against the Warriors:
A THREE = 1.59%
A TWO = 0.647%
With that probability of tying the game, you should probably just take the loss 😞. Below is the same graph as above, but for a five-point game.
CONCLUSION: There are situations that favor the 3 and situations that favor the 2, but when it's a 3-point game, ALWAYS go for the 3. Also, when you play the Warriors, you should always go for the 3. We should expect at least two close games in this year's Finals, with the Raptors on the losing side; if the Raptors play by this strategy, they should be able to win at least one of them.
The 2019 NBA Playoffs have been excellent, with teams playing at their absolute best. We've seen teams like the Warriors and the Bucks absolutely dominate, but how have these teams, along with others, changed their playmaking strategies? If we look at the Bucks in the playoffs, for instance, they have clearly decided to have Giannis drive into the paint more and pass out less. This is because Giannis's shots in the paint generate more points per shot (field goal percentage times the value of the shot, two or three) than a typical three-point shooter does.
However, the Bucks have clearly employed a different strategy than other playoff teams. The Denver Nuggets' whole strategy, for instance, depended on Nikola Jokic, but it isn't his scoring ability that makes Jokic so great; it is his ability to make plays and get his teammates points that allows his team to win games, and the Nuggets embraced that. I expressed this idea of teams looking for playmaking with a simple statistic: points from assists, adjusted for usage rate inflation. This stat lets me see (i) the quality of a player's passes (if he doesn't pass well, the passes won't translate into points) and (ii) the volume of passes (if he doesn't pass often, he won't generate points). I then graphed these values to see how, in general, teams change their passing and playmaking strategies.
In the graph above, the red line represents a player whose points from assists do not change in the playoffs. The actual line of best fit, however, has a slope slightly below 1 (about 0.79, with an R^2 of 0.65). This suggests that teams become more focused on playing through a star player in the playoffs, as we have seen with the effect of superstars on the league. Although we saw in my previous post that teams give important players roughly the same usage rate in the playoffs as in the regular season, here we see that teams have their players shoot more and pass less: instead of getting points off assists as they did in the regular season, they find it more beneficial to score through their stars. Outliers remain, though. Steph Curry has actually created more points from assists in the playoffs than he did in the regular season, which fits the style of play the Warriors run.
When we look at games in the playoffs, we see completely different strategies employed by teams. Star players seem to be relied on more than they are in the regular season, while players with smaller roles seem to be used even less. This 'hunch' can be represented with a graph of playoff usage rate vs. regular-season usage rate. Here is that graph (with a line fit by basic linear regression).
This line has a slope of 0.966, which means that overall, players do not deviate much from their regular-season usage rate. However, the R^2 value (a metric for how well the line fits the data; a perfect fit would be 1.0) is 0.679, which isn't great. Further, when we examine the graph, players with considerably higher usage rates tend to sit above the line of best fit.
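For completeness, here is a quick sketch of that fit using scipy; the arrays below are placeholders standing in for each player's regular-season and playoff usage rates.

```python
# Fit playoff usage rate against regular-season usage rate and report slope/R^2.
import numpy as np
from scipy import stats

reg_usg = np.array([0.31, 0.24, 0.19, 0.15, 0.28])      # placeholder values
playoff_usg = np.array([0.33, 0.22, 0.17, 0.12, 0.29])  # placeholder values

fit = stats.linregress(reg_usg, playoff_usg)
print(f"slope = {fit.slope:.3f}, R^2 = {fit.rvalue ** 2:.3f}")
```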
To isolate different types of players, I decided to split players by minutes played, into the following groups:
35+ Minutes in the regular season
25-35 Minutes in the regular season
15-25 Minutes in the regular season
5-15 Minutes in the regular season
35+ Minutes (Slope = 1.01, R^2 = 0.808)
The graph above is quite interesting: it shows that heavily relied-upon players do not get used more in the playoffs; rather, they are used about as often as in the regular season. The R^2 value is also close to one, so the line captures the underlying relationship fairly well.
25-35 Minutes (Slope = 0.938, R^2 = 0.695)
With the slope of this graph, we see that players who play 25-35 minutes tend to get the ball slightly less often than in the regular season. Interestingly, the spread between the points and the line is also larger than for the 35+ minute players (shown by the lower R^2 value), meaning these players' usage rates vary more.
15-25 Minutes (Slope = 0.907, R^2 = 0.521)
Here, the slope is even lower and so is the R^2 value: variation is even higher than for the 25-35 minute players, and usage drops more on average. With an R^2 this low, the line of best fit barely works as a guideline, so we cannot really predict how a player's usage rate will change in the playoffs. The next group is an even better example of this.
5-15 Minutes (Slope = 0.877, R^2 = 0.292)
When we look at this graph, we see that there really isn't much of a trend in the data. When a player's minutes are this low, there is no real correlation between his regular-season and playoff usage rates. Predicting it requires more data (e.g., points per minute, assists per minute, and so on).
To see what actually determines usage at these lower minute totals, I trained a neural network that takes some basic per-minute stats, offensive rating, and defensive rating, along with the player's regular-season usage rate. This got me a much higher R^2 value for these lower-minute players, which shows what teams really look for in players who get fewer minutes. For the actual neural network code and the rest of the code used for this post, look here.
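The post links to the actual network; below is just a minimal stand-in using scikit-learn's MLPRegressor, assuming a DataFrame `df` with hypothetical per-minute stat columns plus regular-season usage (`reg_usg`) and the playoff usage target (`playoff_usg`).

```python
# Small neural-network regression: predict playoff usage rate from basic stats.
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

features = ["pts_per_min", "ast_per_min", "reb_per_min",
            "off_rating", "def_rating", "reg_usg"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["playoff_usg"], test_size=0.2, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
model.fit(X_train, y_train)
print("R^2:", r2_score(y_test, model.predict(X_test)))
```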
Conclusion
When players are relied on more (play more minutes), they are more likely to keep the same usage rate. As the number of minutes a player plays decreases, the variation increases tremendously, so player efficiency becomes integral to determining how much a player will be used at these lower minute values.
Using the stats.nba.com API and matplotlib, I recently developed a pretty cool visual showing how players' scoring (adjusted for usage rate and minutes) changes over time. The adjustment means that extra points coming purely from extra usage do not inflate the quantified improvement. The graph above shows the average player who scored >15 points per game at least once in his career. Let's put some actual players on this graph.
The x-axis is the player's year in the league (rookie season is 1, sophomore season 2, etc.), and the y-axis is the player's improvement relative to the base year: a negative value means the player has gotten worse since his rookie season, and a positive value means he has gotten better.
For the actual code used for this you can go here.
1. Pascal Siakam
For those doubting that Siakam has improved tremendously, the numbers don’t lie. This serious Most Improved Player candidate has shown that he is truly a force to be reckoned with. Now, does this graph mean he is a phenomenal player already? No.
This graph is relative to the player's first year, which means it measures improvement since he entered the league, not how good the player actually is.
2. LeBron James
At first glance, this may look a little suspicious. Does this graph say that LeBron is worse than Siakam? No!
Again, this graph is relative to the player's first season: his point production, adjusted for usage rate and minutes, is measured against his own production at the beginning of his career.
So what does this graph mean for LeBron? Surprisingly, LeBron tracked the average 15+ point scorer's curve almost exactly for most of his career: his improvement was basically what the league-average trajectory would predict.
Recently, however (after his 13th season), he broke off from this trend, surging over the last few years and making them some of the best of his career. Instead of declining at year 13 as the average curve suggests, James went in the exact opposite direction. Although he did not make the playoffs, this year was probably one of his most efficient shooting seasons.
Based on the average curve, we would expect LeBron to take a major hit in the next year or two, since this level of late-career production is hard to sustain. But he is LeBron James, so we can never know what will happen 😅.
3. Stephen Curry
Steph is a strange case. His career started off extraordinarily, but after his first season he was clearly on a downward trend by this metric. After getting "worse" for a few years, Curry turned things around in his 5th season (disclaimer: he didn't actually lose his skill; his numbers probably dipped because he shot less while sharing the backcourt with Monta Ellis). Compared with other players, Curry reached his prime late, and he doesn't appear to be on a downward trend yet.
Some More Players
I won't explain these in detail, but you should be able to see how players improved or deteriorated.