Playing With Win Probability Models

I recently developed a win probability model for the awesome py_ball package in Python. The package itself makes NBA/WNBA data accessible to a wide audience. If you haven’t seen it, you should definitely check it out. The link is https://github.com/basketballrelativity/py_ball.

In this blog post, I’ll describe the methods I used to develop the model.

Methods

Our model relies on a series of logistic regressions that depend on (a) the amount of time remaining in the game, (b) the point differential, and (c) which team has possession. Right now, the only bias we introduce at the beginning of the game is home-court advantage, which is why the home team always starts with slightly better odds than the away team. This happens because we feed every input into the model from the home team’s perspective, so the model learns that the home team has a slight edge. We hope to add betting odds in the future to obtain true pre-game win probabilities.
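As a rough illustration of what "with respect to the home team" means, here is a minimal sketch of how a game state could be turned into model inputs. The GameState class, the function names, and the 0/1 possession encoding are assumptions made for this example, not the actual py_ball code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GameState:
    """Snapshot of a game at one moment (hypothetical structure)."""
    home_score: int
    away_score: int
    home_has_ball: bool


def home_features(state: GameState) -> Tuple[int, int]:
    """Return (point differential, possession flag) from the home team's view."""
    point_diff = state.home_score - state.away_score
    # Assumed encoding: 1 when the home team has the ball, 0 otherwise.
    possession = 1 if state.home_has_ball else 0
    return point_diff, possession


def away_win_probability(home_win_prob: float) -> float:
    """The away team's win probability is the complement of the home team's."""
    return 1.0 - home_win_prob
```

Because every example is expressed this way, a tied game at tip-off still looks slightly favorable to the home side once the model has seen that home teams win more often than not.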

To develop the model, we follow the approach Brian Burke used in his win probability models: splitting the game into many small time groups.

We split the game into 960 groups (one for every 3 seconds of a 48-minute game) and fit a separate logistic regression for each group. Each logistic regression takes the point differential and who has possession as inputs. We do not need to input the time explicitly, because each model is trained only on its own time window.
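The idea of one logistic regression per time group can be sketched as follows. This is a toy, dependency-free version: the function names, the sample layout, and the use of plain batch gradient descent are all assumptions for illustration, and the real model may well be fit with a standard library routine instead.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights [bias, w_diff, w_poss] by batch gradient descent."""
    w = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (diff, poss), y in zip(rows, labels):
            x = (1.0, diff, poss)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(3):
                grad[j] += err * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def fit_per_bin(samples):
    """samples: iterable of (bin_index, point_diff, possession, home_won)."""
    by_bin = defaultdict(lambda: ([], []))
    for bin_index, diff, poss, home_won in samples:
        rows, labels = by_bin[bin_index]
        rows.append((diff, poss))
        labels.append(home_won)
    # One independent model per time group; time itself is never a feature.
    return {b: fit_logistic(rows, labels) for b, (rows, labels) in by_bin.items()}


def predict(models, bin_index, diff, poss):
    """Home win probability from the model belonging to this time group."""
    bias, w_diff, w_poss = models[bin_index]
    return sigmoid(bias + w_diff * diff + w_poss * poss)


# Made-up samples for a single time group (bin 0), only to exercise the code.
toy_samples = [
    (0, 5, 1, 1), (0, 3, 0, 1), (0, 1, 1, 1), (0, 1, 0, 0),
    (0, -1, 1, 1), (0, -1, 0, 0), (0, -3, 1, 0), (0, -5, 0, 0),
]
toy_models = fit_per_bin(toy_samples)
print(predict(toy_models, 0, 5, 1), predict(toy_models, 0, -5, 0))
```

With real play-by-play data, each of the 960 groups would get its own entry in the dictionary, and a prediction simply looks up the model for the current moment in the game.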

Graphical representation of logistic regression

For games that go into overtime, we treat the 5 minutes of each overtime period as if they were the last 5 minutes of the fourth quarter. This ensures there are enough training samples for the model to learn something. For instance, very few games reach 4OT, so a logistic regression trained only on that period would not have enough data to recognize any trends.
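A minimal sketch of how a period and game clock could be mapped to one of the 960 groups, with overtime folded onto the end of the fourth quarter. The function names, the period numbering (1-4 for quarters, 5 and up for overtime), and the ordering of the groups from the start of the game are assumptions for this example; the quarter length assumes NBA timing.

```python
REGULATION_SECONDS = 48 * 60   # four 12-minute quarters (NBA)
QUARTER_SECONDS = 12 * 60
BIN_SECONDS = 3
NUM_BINS = REGULATION_SECONDS // BIN_SECONDS  # 960 groups


def seconds_remaining(period, clock_seconds):
    """Seconds left in the game, treating overtime like late regulation.

    period: 1-4 for quarters, 5 or higher for overtime periods.
    clock_seconds: seconds left on the period clock.
    """
    if period >= 5:
        # Any overtime period reuses the final 5 minutes of the fourth quarter.
        return clock_seconds
    return (4 - period) * QUARTER_SECONDS + clock_seconds


def bin_index(period, clock_seconds):
    """Index (0 to 959) of the 3-second group; 0 is the opening tip (assumed)."""
    elapsed = REGULATION_SECONDS - seconds_remaining(period, clock_seconds)
    # The final buzzer (elapsed == 2880) is clamped into the last group.
    return min(int(elapsed // BIN_SECONDS), NUM_BINS - 1)
```

With this mapping, a moment with 2:30 left in a fourth overtime lands in the same group as 2:30 left in the fourth quarter, so the sparse overtime data adds to an existing group instead of needing its own model.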

The model is trained on five seasons’ worth of games, from 2013-14 through 2017-18.

Results

We evaluated our model on 2018-19 data using the Brier score.

Brier score definition from Wikipedia

The Brier score is the mean squared error between the predicted win probability and the actual outcome, taken over every time frame. For instance, if the model predicts a 0.58 probability of winning at a given time and that team goes on to win the game, we add (1-0.58)^2 to the running total. We sum these values over the entire game and divide by the total number of events. There is one event every 3 seconds.
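In formula form, the score is BS = (1/N) * sum of (p_t - o_t)^2, where p_t is the predicted probability at event t and o_t is 1 if that team won and 0 otherwise. A small sketch of the calculation (the function name is just for illustration):

```python
def brier_score(predictions, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    total = sum((p - o) ** 2 for p, o in zip(predictions, outcomes))
    return total / len(predictions)


# The single event from the text: a 0.58 prediction for a team that won.
single_event = brier_score([0.58], [1])

# A model that always says 0.5 scores 0.25 no matter who wins.
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1])
```

The single event contributes (1-0.58)^2 = 0.1764, and the always-0.5 baseline gives a useful reference point when reading a Brier score.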

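We received a Brier score of 0.167 for our model. Lower is better: a model that always predicts 0.5 would score 0.25, and a perfect model would score 0. A value of 0.167 is fairly decent, because it means our predicted probabilities are, on average, closer to the actual outcomes than a coin flip would be.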

Comparison

The following examples compare our model (top) with inpredictable.com’s model (bottom).

Kobe Bryant’s last game (LAL vs. UTA 2015-16)

ATL vs. NYK (2016-17)

DEN vs. DAL (2019-20)

Usage

The model will be available at https://github.com/basketballrelativity/py_ball. An example notebook using the win probability model is available at https://github.com/avyayv/winprobability/blob/master/pyballpackage.ipynb.
