Basketball: The Random Walk

May 06, 2026

Basketball produces more scoring events per game than any other major sport. An NBA game averages roughly 200 possessions and 220 total points. A college game, with its longer shot clock, averages fewer possessions but still produces 130-150 points. Every possession is a potential scoring event, and the gaps between scores are measured in seconds, not minutes.

This density of scoring is what makes basketball mathematically distinct. The score differential doesn’t jump between discrete states the way baseball does, and it doesn’t sit at low integers the way soccer and hockey do. It evolves continuously, like a stock price, which is why the natural mathematical framework for basketball win probability is the random walk.

Stern (1994): Brownian motion and score progressions

Hal Stern’s 1994 paper in the Journal of the American Statistical Association established the foundational framework. He modeled the score differential in a basketball game as a Brownian motion process with drift:

X(t) = μt + σB(t)

where:

X(t) is the score differential at time t (positive means the home team leads)
μ is the drift parameter, encoding the difference in team quality. If the home team is better, μ > 0 and the score differential tends to grow in their favor over time
σ is the diffusion parameter, encoding the randomness of scoring. Higher σ means more volatility in the score differential
B(t) is standard Brownian motion - a continuous-time stochastic process with independent, normally distributed increments

At any time t, the score differential is normally distributed:

X(t) ~ N(μt, σ²t)

The variance grows linearly with time. This means the range of plausible score differentials widens as the game progresses - consistent with the empirical observation that blowouts develop gradually, not instantly.
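As a rough illustration of that widening, the sketch below evaluates the mean and standard deviation of X(t) at the end of each of a few quarters. The function name and the per-minute values of μ and σ are assumptions chosen for illustration, not figures from Stern's paper.

```python
import math

def differential_distribution(mu, sigma, t):
    """Return (mean, standard deviation) of X(t) = mu*t + sigma*B(t)."""
    return mu * t, sigma * math.sqrt(t)

# Hypothetical per-minute parameters, chosen only for illustration.
MU = 0.1      # home team gains 0.1 points per minute on average
SIGMA = 1.7   # points per square-root minute

for minutes in (12, 24, 48):
    mean, sd = differential_distribution(MU, SIGMA, minutes)
    print(f"t={minutes:2d} min: mean={mean:+.1f}, sd={sd:.1f}")
```

Because the variance is linear in t, the standard deviation grows with the square root of time: quadrupling the elapsed time only doubles the spread.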

Win probability from Brownian motion

Given a current score differential d at time t in a game of total duration T, the probability that the home team wins is:

P(win | d, t) = Φ((d + μ(T - t)) / (σ√(T - t)))

where Φ is the standard normal cumulative distribution function.

This formula has an intuitive structure:

The numerator d + μ(T - t) is the expected final score differential: the current lead d plus the expected additional drift μ(T - t) over the remaining time
The denominator σ√(T - t) is the standard deviation of the remaining score differential - how much randomness is left
The ratio is a z-score: how many standard deviations the expected final lead is above zero

When the game is nearly over (T - t → 0), the denominator shrinks toward zero. If the home team holds any lead at all (d > 0), the z-score goes to infinity and win probability approaches 1; if it trails (d < 0), win probability approaches 0. This captures the intuition that a lead becomes more secure as time expires.

When the game is just starting (t → 0), the large denominator means the current score provides little information - win probability is dominated by the drift parameter μ (pre-game team quality).
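The formula translates into a few lines of code. This is a minimal sketch, assuming time is measured in minutes, T = 48, and σ is expressed in points per square-root minute; the function names and the parameter values in the example are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def win_probability(d, t, mu, sigma, T=48.0):
    """P(home team wins | lead d at time t) under Brownian motion with drift."""
    remaining = T - t
    if remaining <= 0:
        # No time left: the sign of the lead decides it (a tie is scored 0.5 here).
        return 1.0 if d > 0 else 0.0 if d < 0 else 0.5
    z = (d + mu * remaining) / (sigma * math.sqrt(remaining))
    return normal_cdf(z)

# Evenly matched teams (mu = 0), assumed sigma of 1.7.
for t in (0, 24, 42, 47):
    print(f"up 5 at t={t:2d}: {win_probability(5, t, 0.0, 1.7):.3f}")
```

The same 5-point lead maps to a higher probability at each later time, because the denominator shrinks while the numerator stays fixed.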

Empirical validation

Stern tested the Brownian motion model on 493 NBA games from the 1991-92 season. The key findings:

Score differential variance is approximately linear in time: The spread of halftime score differentials was consistent with σ²·(T/2), supporting the Brownian assumption
The model fit was good but not perfect: Deviations appeared in the final minutes, consistent with strategic behavior changes (fouling, clock management) that violate the stationarity assumption
Team quality heterogeneity matters: When μ was allowed to vary by matchup (using point spreads as a proxy for team quality), the model fit improved significantly

Gabel and Redner (2012): the random walk confirmed

Gabel and Redner extended Stern’s work with a much larger dataset - 6,087 NBA games from the 2006-2010 seasons - and confirmed the random walk characterization with additional detail.

Their key finding: basketball scoring is best described as a weakly-biased continuous-time random walk.

Scoring intervals are exponential

The time between successive scoring events in an NBA game follows an exponential distribution. This is the hallmark of a Poisson process - the scoring events themselves arrive at a constant rate, and the time between them has the memoryless property. Whether a team scored 5 seconds ago or 50 seconds ago does not affect the probability of scoring in the next 5 seconds.

Formally, if τ is the time between consecutive scoring events:

P(τ > t) = e^(-λt)

where λ is the scoring rate. For a typical NBA game, λ ≈ 1 scoring event every 25-30 seconds of game clock.
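The memoryless property can be checked numerically from the survival function. The sketch below assumes a rate of one scoring event per 28 seconds, a value picked from inside the 25-30 second range above; the function name is hypothetical.

```python
import math

def survival(t, rate):
    """P(tau > t) for exponentially distributed gaps between scores."""
    return math.exp(-rate * t)

RATE = 1 / 28.0  # assumed: one scoring event per 28 seconds of game clock

# Memorylessness: P(tau > s + t | tau > s) equals P(tau > t).
s, t = 20.0, 5.0
conditional = survival(s + t, RATE) / survival(s, RATE)
unconditional = survival(t, RATE)
print(f"conditional={conditional:.4f}, unconditional={unconditional:.4f}")
```

The two numbers agree: having already waited 20 seconds without a score does not change the chance of waiting 5 more.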

The Peclet number

Gabel and Redner introduced the Peclet number to basketball analytics as a dimensionless quantity characterizing the competition between team quality (bias) and randomness (diffusion):

Pe = v²t / (2D)

where v is the effective scoring bias (related to μ) and D is the diffusion coefficient (related to σ²).

When Pe << 1, randomness dominates - the better team’s advantage is masked by scoring fluctuations. This is the regime for most of the first half: the game is too young for team quality to have reliably asserted itself.

When Pe >> 1, team quality dominates - the game has gone on long enough that the better team’s systematic advantage has accumulated beyond what random fluctuations can reverse. This is the regime for blowouts in the fourth quarter.

The transition between regimes explains why leads in basketball feel unstable early but become progressively more secure as the game progresses. It’s not just that there’s less time left - it’s that the signal-to-noise ratio in the score differential increases with time.
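A small sketch makes the two regimes concrete. The values of v and D below are hypothetical, chosen only to show one matchup on each side of Pe = 1 at the end of regulation; the helper names are also assumptions.

```python
def peclet(v, D, t):
    """Pe = v^2 * t / (2 * D)."""
    return v * v * t / (2.0 * D)

def crossover_time(v, D):
    """Time at which Pe reaches 1, i.e. bias and diffusion are comparable."""
    return 2.0 * D / (v * v)

D = 0.036            # assumed diffusion coefficient, points^2 per second
GAME_SECONDS = 2880  # 48 minutes of regulation

for label, v in (("close matchup", 0.004), ("lopsided matchup", 0.012)):
    pe = peclet(v, D, GAME_SECONDS)
    print(f"{label}: Pe at final buzzer = {pe:.2f}, "
          f"Pe = 1 after {crossover_time(v, D) / 60:.0f} minutes")
```

Since Pe scales with v squared, tripling the bias brings the crossover nine times sooner, which is why lopsided matchups leave the noise-dominated regime well before the game ends.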

Lead changes and the arc-sine law

A mathematical consequence of the random walk model is the arc-sine law for last lead change. In a symmetric random walk (equally matched teams), the last lead change is much more likely to occur near the beginning or end of the game than in the middle. This counterintuitive result means that many games appear to be “decided” early, with one team leading for most of the contest, even between evenly matched opponents.
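For a symmetric walk, the classical arc-sine law gives the cumulative distribution of the last lead change as F(x) = (2/π)·arcsin(√x), where x is the fraction of the game elapsed. The sketch below evaluates it over three equal-width windows; the function names are hypothetical, and the continuous-time formula is used as an approximation to a real game.

```python
import math

def last_lead_change_cdf(x):
    """P(last lead change happens before fraction x of the game), symmetric walk."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

def prob_between(a, b):
    """Probability that the last lead change falls in the window [a, b]."""
    return last_lead_change_cdf(b) - last_lead_change_cdf(a)

first = prob_between(0.0, 0.1)     # opening 10% of the game
middle = prob_between(0.45, 0.55)  # middle 10%
last = prob_between(0.9, 1.0)      # closing 10%
print(f"first={first:.3f}, middle={middle:.3f}, last={last:.3f}")
```

The opening and closing windows each carry roughly three times the probability of the middle window, which is the U-shape the law is known for.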

Gabel and Redner showed that by including team strength heterogeneity (allowing μ to vary across matchups), the model explains essentially all statistical features of NBA game scoring, including the distribution of final score differentials, the fraction of time the home team leads, and the frequency of lead changes.

The logistic regression approach

While the Brownian motion framework provides theoretical elegance, most practical win probability implementations for basketball use logistic regression, which is more flexible in handling game-specific features.

Core variables

The minimal logistic regression model for basketball win probability computes a linear score z and maps it to a probability with P(win) = 1 / (1 + e^(-z)):

z = β₀ + β₁(score_diff) + β₂(time_remaining) + β₃(possession) + β₄(score_diff × time_remaining)

The interaction term is essential. Without it, the model treats a point of score differential as equally important at all times. With it, the model learns that score differential matters more as time decreases.
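To see what the interaction term does, here is a sketch with made-up coefficients. None of the β values are fitted to data; they are assumptions chosen so that the per-point effect, β₁ + β₄ × time_remaining, shrinks when more time remains. Possession is coded as 1 when the home team has the ball and 0 otherwise, which is also an assumption.

```python
import math

# Hypothetical coefficients, not fitted to any dataset.
B0, B_DIFF, B_TIME, B_POSS, B_INTERACT = 0.0, 0.50, 0.0, 0.10, -0.009

def win_probability(score_diff, minutes_remaining, home_has_ball):
    """Logistic win probability with a score_diff x time_remaining interaction."""
    z = (B0
         + B_DIFF * score_diff
         + B_TIME * minutes_remaining
         + B_POSS * (1 if home_has_ball else 0)
         + B_INTERACT * score_diff * minutes_remaining)
    return 1.0 / (1.0 + math.exp(-z))

for minutes in (40, 20, 2):
    p = win_probability(5, minutes, home_has_ball=False)
    print(f"up 5 with {minutes:2d} min left: {p:.3f}")
```

With these numbers, a 5-point lead is worth far more with 2 minutes left than with 40 minutes left. Setting B_INTERACT to zero would make the three probabilities identical.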

Beuoy’s Inpredictable model

Mike Beuoy’s Inpredictable model (2013) demonstrated that a relatively straightforward locally weighted logistic regression could outperform ESPN’s proprietary model. The key innovations:

Locally weighted regression: Rather than fitting one global logistic regression, fit a separate regression at each prediction point, weighting nearby game states more heavily. This allows the relationship between score differential and win probability to change smoothly over the course of the game.
Vegas spread as a feature: Incorporating the pregame point spread dramatically improves early-game predictions. A team trailing by 5 in the first quarter that was a 12-point favorite has a very different win probability than a team trailing by 5 that was a 12-point underdog. Without the spread, both are mapped to the same probability.
Possession as an explicit variable: Having the ball in basketball is worth approximately 1 point of expected value (roughly the expected points per possession). Including possession as a binary variable captures this.
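The locally weighted idea can be sketched in pure Python. This is not Beuoy's implementation: the Gaussian kernel, the bandwidth, the gradient-ascent fit, the single score-differential feature, and the toy data are all assumptions made to keep the example short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_local_logistic(rows, query_minutes, bandwidth=5.0, steps=2000, lr=0.05):
    """Fit P(home win) = sigmoid(b0 + b1 * score_diff) around one game time.

    rows: (minutes_remaining, score_diff, home_won) tuples.
    Rows near query_minutes get Gaussian kernel weights close to 1;
    distant rows get weights close to 0.
    """
    weights = [math.exp(-0.5 * ((m - query_minutes) / bandwidth) ** 2)
               for m, _, _ in rows]
    total = sum(weights)
    b0 = b1 = 0.0
    for _ in range(steps):  # plain gradient ascent on the weighted log-likelihood
        g0 = g1 = 0.0
        for w, (_, diff, won) in zip(weights, rows):
            err = won - sigmoid(b0 + b1 * diff)
            g0 += w * err
            g1 += w * err * diff
        b0 += lr * g0 / total
        b1 += lr * g1 / total
    return b0, b1

# Made-up game states: early leads are unreliable, late leads always hold.
ROWS = [(40, -4, 0), (40, -4, 0), (40, -2, 1), (40, -2, 0),
        (40, 2, 0), (40, 2, 1), (40, 4, 1), (40, 4, 1),
        (2, -4, 0), (2, -2, 0), (2, 2, 1), (2, 4, 1)]

_, early_slope = fit_local_logistic(ROWS, query_minutes=40)
_, late_slope = fit_local_logistic(ROWS, query_minutes=2)
print(f"slope per point, early: {early_slope:.2f}, late: {late_slope:.2f}")
```

Each query time gets its own coefficients, so the value of a point of lead is free to differ between the first quarter and the final minutes without an explicit interaction term. In the toy data the late rows are perfectly separable, so the late slope keeps growing with more steps; a practical fit would add regularization.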

Burke’s segmented approach

Brian Burke’s approach for college basketball segments the game into time intervals and fits a separate logistic regression within each:

One regression per 10-second interval from 40 minutes to 1 minute remaining
One per 2-second interval from 60 to 30 seconds remaining
One per 1-second interval for the final 30 seconds

This discretized approach approximates the continuously varying relationship that locally weighted regression captures, with the advantage of being simpler to implement and more interpretable (you can examine the coefficients at each time segment independently).
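The segmentation amounts to a lookup from seconds remaining to a time bucket, each of which would own its own fitted regression. The sketch below shows only the lookup; the function name and the half-open handling of the 60-second and 30-second boundaries are assumptions, since the description above does not say which side each boundary belongs to.

```python
import math

def segment_for(seconds_remaining):
    """Return (bucket_start, bucket_width) in seconds for a 40-minute game."""
    if not 0 <= seconds_remaining <= 2400:
        raise ValueError("seconds_remaining must be within a 40-minute game")
    if seconds_remaining >= 60:
        width = 10   # coarse buckets for most of the game
    elif seconds_remaining >= 30:
        width = 2    # finer buckets inside the last minute
    else:
        width = 1    # one bucket per second for the final 30 seconds
    start = (math.floor(seconds_remaining) // width) * width
    return start, width

for s in (1234, 65, 45, 12.4):
    print(s, "->", segment_for(s))
```

A model store keyed by the returned tuple would then hold one set of coefficients per bucket.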

The increasing granularity near the end of the game reflects the reality that win probability changes most rapidly in the final seconds - a single possession can swing the outcome.

KenPom and college basketball

Ken Pomeroy’s college basketball rating system (KenPom, launched 2002) provides the foundation for much of college basketball win probability modeling. The core metrics:

Adjusted Offensive Efficiency (AdjOE): Points scored per 100 possessions, adjusted for opponent strength
Adjusted Defensive Efficiency (AdjDE): Points allowed per 100 possessions, adjusted for opponent strength
Adjusted Tempo: Possessions per 40 minutes, adjusted for opponent

Pre-game win probability is derived from the Pythagorean expectation adapted for basketball:

Win% = (AdjOE)^n / ((AdjOE)^n + (AdjDE)^n)

where n is a tuning parameter (approximately 10-11 for college basketball, compared to the original Pythagorean exponent of 2 from baseball and James’ later adjustment to 1.83).
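As a quick illustration of how sensitive the result is to the exponent, the sketch below evaluates the formula for a hypothetical team. The efficiency figures and the default exponent of 10.5 (picked from inside the 10-11 range above) are assumptions.

```python
def pythagorean_win_pct(adj_oe, adj_de, exponent=10.5):
    """Pythagorean expectation from adjusted offensive and defensive efficiency."""
    offense = adj_oe ** exponent
    defense = adj_de ** exponent
    return offense / (offense + defense)

# Hypothetical team: scores 112 and allows 98 points per 100 possessions.
for n in (2.0, 10.5):
    print(f"exponent {n:>4}: {pythagorean_win_pct(112.0, 98.0, n):.3f}")
```

With the baseball exponent of 2 the same efficiency gap yields a much more modest expectation, which is why the exponent has to be re-tuned for a high-scoring sport.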

Pace as a variable

The shot clock difference between NBA (24 seconds) and college (30 seconds) affects win probability modeling in a specific way: fewer possessions give the better team fewer chances to assert its advantage, so outcomes are noisier relative to the gap in team quality.

Expected possessions in a game:

NBA: ~100 possessions per team → ~200 total scoring opportunities
College: ~65-70 possessions per team → ~130-140 total scoring opportunities

With fewer possessions, each individual possession represents a larger fraction of the total. This means:

Upsets are more likely in college basketball (less regression to the mean within a game)
Score differential carries slightly different information (a 10-point lead with 30 possessions remaining is more secure than a 10-point lead with 50 possessions remaining)
The diffusion parameter σ in the Brownian motion model is effectively larger relative to the drift μ

This is one reason college basketball tournament brackets are harder to predict than NBA playoff series. The single-elimination format, combined with fewer possessions per game, amplifies randomness.
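The effect of possession count on upset probability can be sketched with a normal approximation. The model below treats possessions as independent with a fixed per-possession edge and standard deviation; those two numbers, and the function names, are assumptions for illustration only.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def favorite_win_prob(possessions, edge_per_possession, sd_per_possession):
    """P(better team finishes ahead) when margins add up over independent possessions."""
    mean = edge_per_possession * possessions            # expected final margin
    sd = sd_per_possession * math.sqrt(possessions)     # spread of final margin
    return normal_cdf(mean / sd)

EDGE = 0.06  # assumed net points per possession in favor of the better team
SD = 1.5     # assumed standard deviation of the net margin per possession

nba = favorite_win_prob(100, EDGE, SD)
college = favorite_win_prob(68, EDGE, SD)
print(f"100 possessions: {nba:.3f}, 68 possessions: {college:.3f}")
```

The expected margin grows linearly with possessions while the noise grows only with the square root, so the same per-possession edge is more likely to survive a longer game.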

Expected Possession Value (EPV)

Cervone, D’Amour, Bornn, and Goldsberry (2014) introduced Expected Possession Value at the MIT Sloan Sports Analytics Conference, representing the frontier of basketball analytics. EPV uses player-tracking data recorded 25 times per second to assign a point value to every moment of a possession.

How EPV works

At any instant during a possession, the ball handler has options: shoot, pass, dribble. EPV computes the expected points that will result from the remaining portion of the possession, given the current spatial configuration of all 10 players on the court.

EPV(t) = E[points scored this possession | spatial configuration at time t]

The computation requires:

A model of the probability distribution over the ball handler’s next action (shoot with probability p_s, pass to player j with probability p_j, dribble with probability p_d)
For each action, a model of the resulting spatial configuration
Recursive computation of expected points given each subsequent configuration

The result is a continuous curve of expected value over the course of a possession. When a player makes a good decision (passes to an open teammate in a high-value position), EPV increases. When they make a poor decision (take a contested long two), EPV decreases.
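The recursive structure can be shown with a toy model. This is a heavily simplified sketch, not the Cervone et al. method: court configurations are collapsed into three named states, passes and dribbles are merged into "moves", and every probability and shot value is invented.

```python
def epv(state, model, depth=0, max_depth=6):
    """Expected points from `state`, recursing through possible next states."""
    node = model[state]
    if depth == max_depth:
        # Assumed shot-clock cutoff: the possession ends with a forced shot.
        return node["shot_value"]
    value = node["p_shoot"] * node["shot_value"]
    for next_state, prob in node["moves"].items():
        value += prob * epv(next_state, model, depth + 1, max_depth)
    return value

# Hypothetical states; p_shoot plus the move probabilities sum to 1 in each.
MODEL = {
    "top_of_key": {"p_shoot": 0.20, "shot_value": 0.9,
                   "moves": {"open_corner": 0.50, "top_of_key": 0.25, "turnover": 0.05}},
    "open_corner": {"p_shoot": 0.80, "shot_value": 1.2,
                    "moves": {"top_of_key": 0.20}},
    "turnover": {"p_shoot": 1.00, "shot_value": 0.0, "moves": {}},
}

for state in MODEL:
    print(f"{state}: EPV = {epv(state, MODEL):.3f}")
```

In this toy setup the ball reaching the open corner raises EPV relative to the top of the key, which mirrors the description above of a pass to an open teammate in a high-value position.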

EPV and win probability

EPV doesn’t directly compute win probability, but it feeds into it. If you can estimate each team’s expected points per possession with EPV-level granularity, you can project the expected score at any future time and derive win probability:

E[final score differential] = current differential + (EPV_home - EPV_away) × remaining possessions

The variance around this expectation, combined with the Brownian motion or logistic regression framework, gives the win probability distribution.
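A minimal sketch of that last step, assuming the normal approximation from the Brownian motion framework. Here `remaining_possessions` is counted per team, possessions are treated as independent, and the per-possession standard deviation of 1.1 points is an assumed placeholder rather than an estimate.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def win_prob_from_epv(current_diff, epv_home, epv_away,
                      remaining_possessions, sd_per_possession=1.1):
    """Project the final differential from per-possession values, then apply Phi."""
    if remaining_possessions == 0:
        return 1.0 if current_diff > 0 else 0.0 if current_diff < 0 else 0.5
    mean = current_diff + (epv_home - epv_away) * remaining_possessions
    # Both teams have remaining_possessions left, so the variances add.
    sd = sd_per_possession * math.sqrt(2 * remaining_possessions)
    return normal_cdf(mean / sd)

# Home team trails by 4 but is the more efficient side (hypothetical values).
print(f"{win_prob_from_epv(-4, 1.15, 1.05, 40):.3f}")
```

In this example the projected gain over 40 possessions exactly offsets the 4-point deficit, so the projected final differential is zero and the result is a coin flip.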

EPV represents the state of the art in basketball scoring modeling, but it requires player-tracking infrastructure that only the NBA (via Second Spectrum) provides. College basketball and international leagues lack the data for EPV computation.

The end-of-game problem

Every basketball win probability model faces the same challenge: the final 90-120 seconds of a close game operate under fundamentally different dynamics than the rest of the game.

What changes

When a team trails in the final minutes, the standard rules of basketball strategy are replaced by a specific protocol:

Intentional fouling: The trailing team fouls immediately to stop the clock and send the opponent to the free throw line. This converts a running-clock possession (where the leading team can drain time) into a stopped-clock event (two free throws, then the trailing team gets the ball back).
Free throw mathematics: The expected points from two free throws (~1.55 for an average NBA shooter at 77.5%) is higher than the expected points from a normal possession (~1.12). The trailing team is deliberately accepting this tradeoff to buy additional possessions before time expires.
Three-point shooting: When trailing by 3, the trailing team explicitly seeks three-point shots. The expected value of a three-point attempt is lower than a normal possession, but the variance is higher - and the trailing team needs variance, not expected value.
Clock management: The leading team may foul intentionally to prevent three-point attempts (foul the inbounder, foul before the shot) when up by 3. Both teams’ strategies become conditional on the score differential and time remaining in ways that standard models don’t capture.
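The free throw arithmetic in the list above is quick to verify. The sketch uses the 77.5% and 1.12 figures quoted there and assumes the two attempts are independent; the function name is hypothetical.

```python
def two_free_throw_distribution(ft_pct):
    """Probability of scoring 0, 1, or 2 points on two independent free throws."""
    miss = 1.0 - ft_pct
    return {0: miss * miss, 1: 2 * ft_pct * miss, 2: ft_pct * ft_pct}

FT_PCT = 0.775            # average NBA shooter, as quoted above
NORMAL_POSSESSION = 1.12  # expected points from a normal possession, as quoted above

dist = two_free_throw_distribution(FT_PCT)
expected_ft = sum(points * prob for points, prob in dist.items())
print(f"expected points from two free throws: {expected_ft:.2f}")
print(f"cost of fouling vs. a normal possession: {expected_ft - NORMAL_POSSESSION:+.2f}")
print(f"chance the shooter makes both: {dist[2]:.3f}")
```

At these figures the foul gives up about 0.43 expected points per trip, which is the price the trailing team pays for stopping the clock.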

Why standard models break

The Brownian motion model assumes stationary drift and diffusion parameters. The logistic regression model assumes the relationship between score differential and win probability is smooth. Both assumptions fail when:

Scoring events become discrete and strategic (free throws vs. field goals)
The probability of scoring depends on the specific score differential (trailing by 3 means only three-point attempts; trailing by 2 means normal play)
Time management becomes a first-order consideration (who has timeouts, how much time per possession)

FiveThirtyEight’s solution: the endgame tree

FiveThirtyEight addressed this by running two separate models:

Main model (>90 seconds remaining): Poisson-based projection using RAPTOR player ratings, estimating expected points per remaining possession for each team and computing win probability from the resulting score distributions.
Endgame model (<90 seconds remaining): A decision tree that enumerates all possible sequences of possessions. For each remaining possession, the model assigns probabilities based on historical data: probability of 0, 1, 2, or 3 points, time consumed per possession, and whether the sequence involves free throws (triggered by fouls) or field goals.

Between 90 and 60 seconds, FiveThirtyEight blends the two models. Below 60 seconds, the tree model dominates.
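One simple way to blend two models over a window is a linear ramp. The shape of FiveThirtyEight's actual blend is not specified here, so the linear weighting below is purely an assumption, as are the function names.

```python
def blend_weight(seconds_remaining):
    """Weight on the endgame tree: 0 at 90s or more, 1 at 60s or less (assumed linear)."""
    return min(1.0, max(0.0, (90.0 - seconds_remaining) / 30.0))

def blended_win_prob(seconds_remaining, main_prob, tree_prob):
    """Convex combination of the main-model and tree-model probabilities."""
    w = blend_weight(seconds_remaining)
    return (1.0 - w) * main_prob + w * tree_prob

for s in (120, 90, 75, 60, 30):
    print(f"{s:3d}s left: weight on tree = {blend_weight(s):.2f}")
```

A ramp like this avoids a visible jump in the win probability chart at the moment the models switch.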

The tree approach works because the number of remaining possessions is small enough to enumerate. With 60 seconds left and approximately 2-3 possessions per team remaining, the tree has manageable depth. The same approach would be computationally intractable for the full game, where 100+ possessions per team create an impossibly large tree.
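A stripped-down version of the enumeration looks like this. It is a sketch of the general idea, not FiveThirtyEight's model: every possession is assumed to take a fixed 12 seconds, the scoring probabilities are invented, possession simply alternates, fouling is not modeled, and a tie at the buzzer is scored as a coin flip.

```python
from functools import lru_cache

# Hypothetical per-possession outcome probabilities (points: probability).
POINT_PROBS = {0: 0.50, 1: 0.05, 2: 0.33, 3: 0.12}
POSSESSION_SECONDS = 12  # assumed fixed length of every possession

@lru_cache(maxsize=None)
def home_win_prob(diff, seconds, home_has_ball):
    """Enumerate all remaining possession sequences from the current state."""
    if seconds <= 0:
        if diff == 0:
            return 0.5  # overtime treated as a coin flip
        return 1.0 if diff > 0 else 0.0
    total = 0.0
    for points, prob in POINT_PROBS.items():
        new_diff = diff + points if home_has_ball else diff - points
        total += prob * home_win_prob(new_diff, seconds - POSSESSION_SECONDS,
                                      not home_has_ball)
    return total

for lead in (1, 2, 3, 4):
    print(f"home up {lead} with ball, 24s left: {home_win_prob(lead, 24, True):.3f}")
```

With 24 seconds and two possessions left the tree has only 16 leaves. Memoizing on (diff, seconds, possession) keeps the work small even as the horizon grows, but a realistic tree also branches on fouls, timeouts, and variable possession length, which is where the state space expands quickly.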

An unsolved problem

The endgame model is an engineering solution, not a mathematical one. It handles the most common late-game scenarios well but struggles with unusual situations:

Intentional fouls of poor free throw shooters (“Hack-a-Shaq”): The model needs player-specific free throw percentages, which vary dramatically (from 40% to 95%)
Technical fouls and flagrant fouls: These create free throw plus possession situations that break the standard possession model
Overtime: Models trained on regulation data don’t transfer well to overtime, where rotations differ, strategy changes, and the psychological context is different

No existing model handles all of these situations in a unified framework. The endgame remains the hardest part of basketball win probability to get right.

Known limitations

Garbage time

When one team leads by 25+ points in the fourth quarter, both teams clear their benches. The scoring dynamics change completely - backups play extended minutes, starters rest, and neither team’s intensity reflects their true quality. Standard models treat these possessions the same as any other, which creates calibration issues:

A model might assign a team a 99.5% win probability when up 28 with 5 minutes left. This is approximately correct - but the remaining 0.5% implied comeback probability is driven by the model’s estimate of scoring dynamics that no longer reflect reality (the leading team’s starters aren’t playing).

Garbage time constitutes a small fraction of total game minutes but affects model calibration at the extremes, exactly where accuracy matters most for applications like prediction market pricing.

Home court advantage calibration

FiveThirtyEight’s RAPTOR model has a documented calibration issue: it predicts home teams winning approximately 70% of the time, while the actual home-win rate in the NBA is approximately 61%. This systematic bias suggests either an overweighted home court advantage parameter or compounding errors in the RAPTOR player-level projections when aggregated to game-level predictions.

College vs. NBA model transfer

Models built on NBA data don’t transfer directly to college basketball due to:

Shot clock: 24 seconds (NBA) vs. 30 seconds (college) changes pace, possession count, and scoring dynamics
Talent distribution: NBA teams are relatively balanced; college teams range from dominant to dramatically overmatched
Three-point line distance: NBA three-point line is further, affecting shot selection and expected points per possession
Game length: 48 minutes (NBA) vs. 40 minutes (college)
Foul rules: Different bonus structures (college one-and-one vs. NBA bonus) affect late-game free throw mathematics

Separate models must be built and calibrated for each level. College basketball models (KenPom, Bart Torvik’s T-Rank, Yale’s model) achieve 71-78% accuracy, lower than the best NBA models (75-92%) due to greater team quality variance and smaller per-team sample sizes.

The mathematical character of basketball

Basketball’s win probability is mathematically the most intuitive of the major sports because the random walk framework maps to how fans experience the game. A lead grows, shrinks, fluctuates. Momentum shifts feel real because the scoring process is continuous enough to generate visible trends over short windows. The score differential wanders, and the win probability rides on top of that wandering.

The challenge is that this intuitive simplicity breaks at the extremes - the endgame, garbage time, overtime - where the stationarity assumptions that make the random walk work no longer hold. The next generation of basketball win probability models will need to handle these regime changes more gracefully than the current approach of bolting separate models together at the boundaries.

References

Beuoy, M. (2013). “Updated NBA Win Probability Calculator.” Inpredictable.com.

Cervone, D., D’Amour, A., Bornn, L., and Goldsberry, K. (2014). “Predicting Points and Valuing Decisions in Real Time with NBA Optical Tracking Data.” MIT Sloan Sports Analytics Conference.

FiveThirtyEight. “How Our NBA Predictions Work.” https://fivethirtyeight.com/methodology/how-our-nba-predictions-work/

Gabel, A. and Redner, S. (2012). “Random Walk Picture of Basketball Scoring.” Journal of Quantitative Analysis in Sports, 8(1).

Kvam, P. and Sokol, J.S. “A Logistic Regression/Markov Chain Model for NCAA Basketball.” Georgia Institute of Technology.

Pomeroy, K. “KenPom College Basketball Ratings.” https://kenpom.com/

Stern, H.S. (1994). “A Brownian Motion Model for the Progress of Sports Scores.” Journal of the American Statistical Association, 89(427), 1128-1134.

Neal Foster is Co-Founder & CTO of SportChartz and Founder & Partner of Vybe Capital.
