Win Probability at the World Cup

May 17, 2026

The 2026 FIFA World Cup kicks off June 11 in North America. 48 teams, a new format, three host countries. It’s the largest World Cup ever staged, and there is no historical data for this tournament structure.

The soccer win probability model described earlier in this series - Dixon-Coles, bivariate Poisson, xG-driven intensity - provides the match-level foundation. Extending it to a 48-team tournament prediction is a different problem. The group stage introduces a three-outcome model. National teams aren’t persistent entities the way club teams are. And the expanded format means nobody can calibrate against prior tournaments because none existed in this shape.

We haven’t built a World Cup prediction model. But the tournament is less than four weeks away, the tools exist, and the exercise of thinking through what it would take is useful - both as a reference and as something we might actually attempt.

The structural complexity

Group stage

The World Cup begins with a round-robin group stage. In the 32-team format (used through 2022), eight groups of four teams each play a complete round-robin (three matches per team). The top two teams from each group advance to a 16-team knockout bracket. The 2026 expansion to 48 teams changes the structure: twelve groups of four, with the top two teams and eight best third-place finishers advancing to a 32-team knockout.

The group stage is what makes this fundamentally different from Grand Slam prediction. In tennis, the bracket is fixed and every match is single-elimination. The World Cup group stage introduces three complications: the possibility of strategic play in the final group match, the dependence of knockout-round seeding on group-stage performance, and the three-outcome problem (win, draw, loss) that doesn’t exist in tennis.

Three outcomes per match

Soccer matches in group play can end in a draw. Points are awarded: 3 for a win, 1 for a draw, 0 for a loss. A match-level model would need to produce three probabilities (home win, draw, away win) rather than two. Dixon-Coles handles this naturally through the bivariate Poisson framework - the probability of a draw is the sum of P(0-0) + P(1-1) + P(2-2) + ... across the goal probability matrix.
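As a rough illustration of the draw calculation, here is a minimal sketch that builds a truncated scoreline grid and sums it into three outcome probabilities. It assumes independent Poisson goal counts and leaves out the Dixon-Coles low-score correction; the function names and the example goal rates (1.6 and 1.1) are made up for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def outcome_probs(lam_home, lam_away, max_goals=10):
    """Return (P(home win), P(draw), P(away win)) from a truncated score grid.

    Assumption: goals are independent Poisson; no Dixon-Coles adjustment.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home += p
            elif h == a:
                draw += p  # the diagonal: 0-0, 1-1, 2-2, ...
            else:
                away += p
    total = home + draw + away  # just under 1 because the grid is truncated
    return home / total, draw / total, away / total

# Hypothetical goal rates, for illustration only
p_home, p_draw, p_away = outcome_probs(1.6, 1.1)
print(round(p_home, 3), round(p_draw, 3), round(p_away, 3))
```

The draw probability is just the diagonal of the grid, which is why the three-outcome mode comes for free once the scoreline model exists.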

Knockout matches can’t end in a draw (extra time and penalty shootouts resolve ties), so you’d need two modes: a three-outcome mode for group play and a two-outcome mode for knockout rounds. The two-outcome mode would extend the regulation-time model with conditional probabilities for extra time and shootouts.

The team strength problem

Elo as starting point

The most common approach to estimating team strength across national teams is some variant of Elo. FIFA’s own ranking system was overhauled in 2018 to use an Elo-based formula after years of criticism of the previous points-based system.

The core Elo update after each match:

R_new = R_old + K × (S - E)

where S is the actual result (1, 0.5, or 0 for win, draw, loss), E is the expected result based on rating difference, and K is a weighting factor that varies by match importance.

For World Cup prediction specifically, the K-factor for competitive matches would be higher than for friendlies, reflecting the information content of each match type. World Cup matches themselves carry the highest K-factor, which creates an interesting dynamic: the model’s ratings would be most rapidly updated during the tournament it’s trying to predict.
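To make the update rule concrete, here is a small sketch. The expected result E uses the conventional logistic form with a 400-point scale, which is an assumption on our part (rating systems differ on the scale constant), and the two K values are illustrative rather than taken from any specific system.

```python
def expected_score(rating_a, rating_b):
    """Expected result for team A under the conventional logistic Elo curve.

    Assumption: 400-point scale; some systems use a different constant.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, result_a, k):
    """Apply R_new = R_old + K * (S - E) to both teams.

    result_a is 1, 0.5, or 0 from team A's point of view.
    """
    delta = k * (result_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Illustrative K-factors (assumptions): friendlies move ratings less
K_FRIENDLY, K_WORLD_CUP = 20, 60

# Same upset, different match importance (hypothetical ratings)
print(elo_update(1700, 1900, 1, K_FRIENDLY))
print(elo_update(1700, 1900, 1, K_WORLD_CUP))
```

The update is zero-sum: whatever one team gains, the other loses, and the same upset moves ratings three times as far at the World Cup K as at the friendly K.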

FiveThirtyEight’s Soccer Power Index extended Elo with adjustments for goal difference, home advantage (including continent-specific effects), and separate offensive/defensive ratings. The World Football Elo Ratings (eloratings.net) provide a long historical series. Groll, Schauberger, and Tutz (2015) took a different approach entirely, using team-specific regularized Poisson regression for the 2014 tournament.

The squad problem

This is the piece that makes national team prediction fundamentally different from club prediction. National teams are not persistent entities. The roster changes between every tournament window. The 2022 France World Cup squad was meaningfully different from the 2018 squad that won the tournament. Rating the “team” as a continuous entity misses these composition changes.

More sophisticated approaches would rate individual players and compose team strength from the expected starting eleven. This requires projecting which players will be called up, who will start, and how the formation and tactical setup translate individual ratings into team performance. The complexity grows quickly, and the data for international team tactical setups is sparse compared to club football.

In practice, the Elo approach - treating the national team as the unit - works reasonably well for major footballing nations with deep player pools. It struggles more for smaller nations where one or two players represent most of the team’s quality. Iceland at the 2018 World Cup, with a population of 360,000, was qualitatively different from the France squad, but simple Elo captures this difference adequately through match results. The question is whether you can do better, and how much the additional complexity buys you.

How a group stage simulation would work

Given pairwise match probabilities for every team pairing, the group stage simulation would compute the probability of each team finishing in each position within their group.

Exact computation

For a group of four teams playing three matches each, the total number of group outcome combinations across all six matches is 3^6 = 729 (each match is win, draw, or loss). Each outcome combination produces a points table. This is small enough to enumerate exactly.

For each of the 729 outcomes, you’d compute the probability of that specific outcome combination (product of the six match probabilities), determine the points table, apply tiebreakers (goal difference, goals scored, head-to-head), and assign that probability to the resulting standings.

The tiebreaker step is where exact computation gets tricky. Goal difference tiebreakers require modeling the joint distribution of goal differences across matches, not just win/draw/loss. A practical simplification: for each win/draw/loss outcome, simulate the scoreline from the underlying Poisson model to resolve tiebreakers probabilistically.
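A sketch of the exact enumeration, under a deliberate simplification: instead of simulating scorelines, ties on points are split evenly among the tied teams. That stands in for the real tiebreakers and is an assumption, not how FIFA resolves them. The team names and the uniform probabilities are placeholders.

```python
from itertools import product

def exact_top_two(teams, wdl):
    """
    teams: list of 4 names
    wdl: dict of (a, b) -> (P(a wins), P(draw), P(b wins)), a listed before b
    Returns: dict of team -> P(top two), with points ties split evenly
    """
    fixtures = [(a, b) for i, a in enumerate(teams) for b in teams[i + 1:]]
    top_two = {t: 0.0 for t in teams}
    for outcome in product(range(3), repeat=len(fixtures)):  # 3^6 = 729 cases
        prob = 1.0
        points = {t: 0 for t in teams}
        for (a, b), res in zip(fixtures, outcome):
            prob *= wdl[(a, b)][res]
            if res == 0:
                points[a] += 3
            elif res == 1:
                points[a] += 1
                points[b] += 1
            else:
                points[b] += 3
        for t in teams:
            better = sum(points[u] > points[t] for u in teams)
            tied = sum(points[u] == points[t] for u in teams)  # includes t
            # The tied teams share positions better+1 .. better+tied
            slots = max(0, min(2, better + tied) - better)
            top_two[t] += prob * slots / tied
    return top_two

teams = ["A", "B", "C", "D"]  # placeholder names
uniform = {(a, b): (1 / 3, 1 / 3, 1 / 3)
           for i, a in enumerate(teams) for b in teams[i + 1:]}
print(exact_top_two(teams, uniform))
```

With real match probabilities in place of the uniform ones, this gives exact advancement probabilities up to the tiebreaker approximation, which is a useful check on any Monte Carlo version.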

Monte Carlo at scale

For the 48-team format with 12 groups, or when incorporating goal-difference tiebreakers, Monte Carlo simulation would be more practical. Simulate each match using the Dixon-Coles model (drawing scorelines from the bivariate Poisson distribution), compute the group tables, apply tiebreakers, and advance teams to the knockout bracket. Run 10,000-100,000 simulations and tabulate frequencies. Suzuki et al. (2010) used a Bayesian approach along these lines for the 2006 tournament.
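How many simulations is enough depends on how small the probabilities of interest are. A quick sketch of the sampling error, assuming independent runs so the usual binomial standard error applies (the 5% figure is a hypothetical title probability):

```python
import math

def mc_standard_error(p, n_sims):
    """Standard error of a simulated frequency p from n_sims independent runs."""
    return math.sqrt(p * (1.0 - p) / n_sims)

# Hypothetical: a team whose true title probability is 5%
for n in (10_000, 100_000):
    print(n, round(mc_standard_error(0.05, n), 5))
```

For a 5% probability, two standard errors is roughly 0.4 percentage points at 10,000 runs and about 0.14 at 100,000, so the lower end of that range is fine for favorites and gets noisy for long shots.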

The knockout complications

Once teams are placed in the knockout bracket, the simulation would proceed like the Grand Slam model from the previous post: for each match, draw a result and advance the winner, continuing until a champion is determined.

The complications are extra time and penalty shootouts:

P(Team A advances) = P(A wins in regulation) + P(draw in regulation) × P(A wins in extra time or shootouts)

Extra time

Extra time is 30 minutes of additional play. Goal-scoring rates in extra time are generally lower than in regulation - teams are fatigued, more cautious, and the stakes are higher. Empirical extra-time scoring rates are roughly 60-70% of the regulation per-minute rate.

A model for extra time could use the same Poisson framework with a reduced intensity parameter:

lambda_extra = lambda_regulation × (30/90) × fatigue_factor

where fatigue_factor is typically estimated at 0.6-0.7 from historical data.

Penalty shootouts

Penalty shootouts are approximately coin flips, but not exactly. Historical World Cup shootout data shows a slight advantage for the team shooting first (roughly 55/45, though this varies across studies). Individual penalty conversion rates average around 75%, with variance based on player skill, goalkeeper ability, and the pressure state.

The simplest model treats shootouts as a 55/45 coin flip in favor of the team shooting first. More detailed models simulate individual penalties using player-specific conversion rates, though this level of detail is rarely justified given the noise in the data. Shootouts are inherently high-variance events, and modeling them precisely may be overfitting signal that isn’t there.
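Putting regulation, extra time, and the shootout together, a minimal sketch of the advancement formula might look like this. It assumes independent Poisson goals in both periods, a fatigue_factor of 0.65 (the middle of the range above), and a single shootout probability for Team A; all function names are ours.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def wdl(lam_a, lam_b, max_goals=10):
    """(P(A ahead), P(level), P(B ahead)) for independent Poisson goal counts."""
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, lam_a) * poisson_pmf(b, lam_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    total = win + draw + loss  # renormalize the truncated grid
    return win / total, draw / total, loss / total

def advance_probability(lam_a, lam_b, fatigue_factor=0.65, p_shootout_a=0.5):
    """P(A advances) = P(A wins in regulation)
    + P(draw) * [P(A wins extra time) + P(extra time level) * P(A wins shootout)].

    p_shootout_a: 0.55 if A shoots first, 0.45 if second, 0.5 if unknown.
    """
    win_reg, draw_reg, _ = wdl(lam_a, lam_b)
    scale = (30 / 90) * fatigue_factor  # lambda_extra / lambda_regulation
    win_et, draw_et, _ = wdl(lam_a * scale, lam_b * scale)
    return win_reg + draw_reg * (win_et + draw_et * p_shootout_a)

# Hypothetical regulation goal rates
print(round(advance_probability(1.5, 1.0), 3))
```

Because extra-time goal rates are so low, most drawn knockout matches in this setup end up in the shootout branch, which is why the shootout assumption matters more than its simplicity suggests.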

How dynamic updating would work

Pre-tournament

Before the tournament begins, a full simulation would produce a tournament win probability for each team. These probabilities would reflect the team’s strength (Elo rating), their group draw (easier groups produce higher advancement probabilities), and their knockout bracket position (which side of the bracket the potential semifinal and final opponents are on).

Draw difficulty matters more than most people realize. A team in a group with three other strong teams has a lower advancement probability regardless of their own strength. And if the bracket structure places the two strongest teams on the same side, at most one of them can reach the final.

During the group stage

After each group match, the simulation would update by replacing uncertain match outcomes with known results. A team that wins its first match sees its probability of advancing increase both because they’ve banked three points and because they’ve provided new information about their current form.

During knockout matches

Within a live knockout match, the tournament win probability would update continuously:

P(A wins tournament | match state) = P(A advances | match state) × P(A wins tournament | A advances)

The first term would come from the within-match soccer win probability model (xG-adjusted Poisson with current score, time remaining, and red cards). The second term from the bracket simulation conditioned on A advancing.

The host effect

World Cup host nations historically outperform their ratings. The advantage bundles several effects: crowd support, no travel, climate familiarity, and potentially favorable referee decisions (documented in Sutter and Kocher, 2004).

South Korea reached the semifinals in 2002, a result far beyond their Elo rating. Estimates of the host effect vary across studies, ranging from 50 to 100 Elo points of additional strength for the host nation. The 2026 World Cup, hosted across three countries (USA, Canada, Mexico), would complicate this further: each host nation would have “home” matches but also neutral-site matches within the tournament. How you model that split matters.

The 2026 format question

The expansion to 48 teams for 2026 creates a genuinely novel modeling challenge. Twelve groups of four (instead of eight groups of four) means eight third-place finishers now advance, where none did in the 32-team format, which changes the tiebreaker dynamics. Some groups will be dramatically weaker than others given the expanded pool.

No historical data exists for this format. A simulation framework would need to be adapted from first principles rather than calibrated on past tournaments. This is actually a case where the modeling approach has an advantage over intuition - you can simulate 48-team tournaments with current ratings thousands of times and observe the distributional properties, even without historical precedent.

The validation wall

This is the fundamental challenge with World Cup prediction. The tournament happens every four years. Even with data back to 1930, there are only 22 completed tournaments. Even at 32 teams each, that would be roughly 700 pre-tournament probabilities to evaluate, and the real count is lower because fields were smaller before 1998. That’s not nothing, but it’s thin for rigorous calibration analysis.

The practical workaround: evaluate the match-level model on all competitive international matches (World Cup qualifiers, continental championships, Nations League). The match-level model can be validated on thousands of matches. The tournament structure is then a deterministic function of those match probabilities, so if the match model is well-calibrated, the tournament model should inherit that property. “Should” is doing some work there, but it’s the best available approach.
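The post doesn’t commit to a scoring rule for that match-level evaluation. One common choice for three-outcome forecasts is the multi-class Brier score, sketched here as an assumption about how the check could be run; the forecasts and results in the example are invented.

```python
def brier_score(forecasts, outcomes):
    """Mean multi-class Brier score; lower is better, 0 is perfect.

    forecasts: list of (p_home, p_draw, p_away)
    outcomes: list of result indices (0 = home win, 1 = draw, 2 = away win)
    """
    total = 0.0
    for probs, outcome in zip(forecasts, outcomes):
        total += sum((p - (1.0 if i == outcome else 0.0)) ** 2
                     for i, p in enumerate(probs))
    return total / len(forecasts)

# Invented forecasts and results, for illustration only
forecasts = [(0.60, 0.25, 0.15), (0.30, 0.30, 0.40), (0.45, 0.30, 0.25)]
outcomes = [0, 2, 1]
print(round(brier_score(forecasts, outcomes), 4))
```

A useful baseline is the uninformed forecast of one third for each outcome, which scores 2/3 on every match; a match model that can’t beat that on qualifiers has no business driving a tournament simulation.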

Betting markets provide pre-tournament odds for every World Cup. Comparing a model’s pre-tournament win probabilities against the implied market probabilities (after removing the vig) would be the most practical benchmark. Leitner, Zeileis, and Hornik (2010) did exactly this for the 2008 European Championship and found that Elo-based models performed comparably to bookmaker odds.

How we could actually build this for 2026

The tournament starts June 11. The draw is done. The groups are set. Here’s what a practical build would look like.

Step 1: Team strength ratings

Start with the World Football Elo Ratings (eloratings.net), which are public and updated after every international match. These give you a single strength number per team going into the tournament. You could also pull FIFA’s official rankings, which use their own Elo variant since 2018.

For a more sophisticated version, use the penaltyblog Python package (covered in the implementation post) to fit a Dixon-Coles model on recent international results. Two to three years of competitive matches (World Cup qualifiers, continental championships, Nations League) gives you attack and defense strength parameters per team, which is richer than a single Elo number.

import pandas as pd
from penaltyblog import footballdata

# Pull international match results
# Filter to competitive matches (not friendlies) from 2023-2026
# Fit Dixon-Coles with team-specific attack/defense parameters

# The output: for any Team A vs Team B matchup,
# you get P(home_win), P(draw), P(away_win) and
# the full scoreline probability matrix

The host advantage adjustment is the open question. Historical estimates suggest 50-100 Elo points for the host nation. With three hosts in 2026, you’d want to apply a reduced home boost (maybe 30-50 points) for matches played in a host’s own country, and a smaller neutral-venue adjustment for matches in the other two host countries. This is a judgment call, not a precisely calibrated parameter.
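One way to encode that judgment call is a venue-dependent rating bump. In this sketch the 40-point home boost sits in the middle of the 30-50 range above, the 10-point co-host boost is a pure placeholder, the ratings are hypothetical, and the logistic curve with a 400-point scale is the conventional Elo form rather than anything specific to this series. Team names double as country names for simplicity.

```python
HOSTS = {"USA", "Canada", "Mexico"}

def expected_score(rating_a, rating_b):
    """Conventional logistic Elo expectation (400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def host_boost(team, venue_country, home_points=40, cohost_points=10):
    """Elo points added to a team for a match in venue_country.

    Both point values are assumptions, not calibrated parameters.
    """
    if team not in HOSTS:
        return 0
    return home_points if venue_country == team else cohost_points

def match_expectation(team_a, rating_a, team_b, rating_b, venue_country):
    """Expected score for team_a after applying any host boosts."""
    adj_a = rating_a + host_boost(team_a, venue_country)
    adj_b = rating_b + host_boost(team_b, venue_country)
    return expected_score(adj_a, adj_b)

# Hypothetical: two equally rated teams, one of them a host
for venue in ("USA", "Mexico"):
    print(venue, round(match_expectation("USA", 1800, "Opponent", 1800, venue), 3))
```

Keeping the boost in one function makes the sensitivity check cheap: rerun the tournament simulation with different home_points values and see how far the hosts’ numbers move.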

Step 2: Group stage simulation

With the actual 2026 groups and the pairwise match probabilities from Step 1, simulate every group.

import numpy as np
from scipy.stats import poisson

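def simulate_match(lambda_home, lambda_away):
    """Draw a scoreline from two independent Poisson distributions.

    This omits the Dixon-Coles low-score correction; sample from the
    fitted scoreline matrix instead to include it.
    """
    home_goals = poisson.rvs(lambda_home)
    away_goals = poisson.rvs(lambda_away)
    return home_goals, away_goals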

def simulate_group(teams, match_lambdas, n_sims=50000):
    """
    teams: list of 4 team names
    match_lambdas: dict of (team_a, team_b) -> (lambda_a, lambda_b)
    Returns: dict of team -> [P(1st), P(2nd), P(3rd), P(4th)]
    """
    finishes = {t: [0, 0, 0, 0] for t in teams}

    for _ in range(n_sims):
        points = {t: 0 for t in teams}
        gd = {t: 0 for t in teams}
        gf = {t: 0 for t in teams}

        # Play all 6 matches in the group
        for i in range(len(teams)):
            for j in range(i+1, len(teams)):
                lam_i, lam_j = match_lambdas[(teams[i], teams[j])]
                gi, gj = simulate_match(lam_i, lam_j)

                gf[teams[i]] += gi
                gf[teams[j]] += gj
                gd[teams[i]] += gi - gj
                gd[teams[j]] += gj - gi

                if gi > gj:
                    points[teams[i]] += 3
                elif gi == gj:
                    points[teams[i]] += 1
                    points[teams[j]] += 1
                else:
                    points[teams[j]] += 3

        # Sort by points, then goal difference, then goals for;
        # break any remaining ties at random so list order doesn't bias the result
        ranking = sorted(teams, key=lambda t: (points[t], gd[t], gf[t], np.random.random()), reverse=True)
        for pos, t in enumerate(ranking):
            finishes[t][pos] += 1

    # Convert to probabilities
    for t in teams:
        finishes[t] = [x / n_sims for x in finishes[t]]

    return finishes

Run this for all 12 groups. The output: each team’s probability of finishing 1st, 2nd, 3rd, or 4th in their group. First and second advance automatically. The eight best third-place finishers also advance, which requires comparing third-place records across groups.

Step 3: Full tournament simulation

Chain the group stage into the knockout bracket.

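def simulate_tournament(groups, match_lambdas, n_sims=50000):
    """
    Full tournament: group stage -> knockout bracket.
    Returns: dict of team -> P(winning the tournament)

    Outline only: simulate_group_once, select_best_third, build_bracket,
    and simulate_knockout are helpers still to be written. standings is
    assumed to be ordered 1st to 4th and to carry each team's points,
    goal difference, and goals for.
    """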
    wins = {team: 0 for group in groups.values() for team in group}

    for _ in range(n_sims):
        # 1. Simulate all groups
        group_results = {}
        third_place = []
        for group_name, teams in groups.items():
            # simulate group, get final standings
            standings = simulate_group_once(teams, match_lambdas)
            group_results[group_name] = standings
            third_place.append((standings[2], group_name))  # 3rd place team

        # 2. Determine best 8 third-place finishers
        # Sort by points, GD, GF from the group stage sim
        advancing_third = select_best_third(third_place)

        # 3. Build knockout bracket per FIFA 2026 rules
        bracket = build_bracket(group_results, advancing_third)

        # 4. Simulate knockout rounds
        champion = simulate_knockout(bracket, match_lambdas)
        wins[champion] += 1

    return {t: w / n_sims for t, w in wins.items()}
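Of the helpers that outline leans on, select_best_third is the easiest to pin down. Here is one hedged sketch. The record shape (a dict with team, group, points, gd, gf) is our assumption about what the group simulation would hand back, and any ties left after points, goal difference, and goals for are broken at random as a stand-in for FIFA’s further criteria.

```python
import random

def select_best_third(third_place, n_advance=8, rng=random):
    """
    third_place: list of dicts such as
        {"team": "X", "group": "A", "points": 4, "gd": 0, "gf": 3}
    Returns the n_advance best records, best first.
    """
    keyed = [((r["points"], r["gd"], r["gf"], rng.random()), r)
             for r in third_place]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in keyed[:n_advance]]

# Twelve made-up third-place records, one per group
records = [
    {"team": f"T{i}", "group": chr(65 + i),
     "points": 3 + (i % 3), "gd": i - 6, "gf": i}
    for i in range(12)
]
best = select_best_third(records)
print([r["team"] for r in best])
```

The returned records keep their group labels, which the bracket builder needs because the knockout slot for a third-place team depends on which groups the qualifiers came from.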

The knockout simulation needs the extra time and shootout logic described earlier. For a first pass, treat knockout matches that end in a draw after regulation as a coin flip with slight first-mover advantage (55/45). Because a coin toss decides who shoots first, that is 50/50 until the order is known. Refine later if the model warrants it.

Step 4: What you’d learn

50,000 simulations produce a tournament win probability for each of the 48 teams. But the interesting outputs aren’t just the top-line numbers. You’d see:

Group difficulty asymmetry. Some groups are dramatically easier than others in the 48-team format. The simulation quantifies this: a team with a 95% chance of advancing from a weak group vs. 60% from a strong group, even at the same Elo rating.

Bracket path effects. Which side of the bracket is weaker? Which path to the final avoids the most dangerous opponents? The draw creates structural advantages that aren’t obvious from looking at individual matchups.

Third-place volatility. The eight best third-place finishers rule adds variance that didn’t exist in the 32-team format. A team’s advancement probability depends on results in other groups they don’t play in.

Host nation boost sensitivity. Run the simulation with and without the host advantage adjustment. How much does a 50-point Elo boost change the USA’s, Mexico’s, or Canada’s tournament win probability? This tells you how much the host effect assumption matters relative to the base strength estimate.

Step 5: Live updating during the tournament

Once play begins, the simulation re-runs after every match with known results replacing simulated ones. The code structure supports this naturally: lock in completed group matches, resimulate the remaining ones, and propagate through the bracket.
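A minimal sketch of the lock-in step for a single group, written with the standard library only so it runs on its own. The completed dict, the placeholder team names, and the equal goal rates are all assumptions for illustration; ties left after points, goal difference, and goals for are broken at random.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_remaining(teams, match_lambdas, completed, n_sims=20000, seed=0):
    """
    completed: dict of (team_a, team_b) -> (goals_a, goals_b) for played matches
    match_lambdas: dict of (team_a, team_b) -> (lambda_a, lambda_b)
    Returns: dict of team -> P(finishing in the top two)
    """
    rng = random.Random(seed)
    fixtures = [(a, b) for i, a in enumerate(teams) for b in teams[i + 1:]]
    top_two = {t: 0 for t in teams}
    for _ in range(n_sims):
        table = {t: [0, 0, 0] for t in teams}  # points, goal difference, goals for
        for a, b in fixtures:
            if (a, b) in completed:
                ga, gb = completed[(a, b)]  # known result: nothing to simulate
            else:
                lam_a, lam_b = match_lambdas[(a, b)]
                ga, gb = sample_poisson(lam_a, rng), sample_poisson(lam_b, rng)
            table[a][0] += 3 if ga > gb else 1 if ga == gb else 0
            table[b][0] += 3 if gb > ga else 1 if ga == gb else 0
            table[a][1] += ga - gb
            table[b][1] += gb - ga
            table[a][2] += ga
            table[b][2] += gb
        # The random last key breaks ties left after points, GD, GF
        ranking = sorted(teams, key=lambda t: (*table[t], rng.random()), reverse=True)
        for t in ranking[:2]:
            top_two[t] += 1
    return {t: n / n_sims for t, n in top_two.items()}

teams = ["A", "B", "C", "D"]  # placeholder names
lambdas = {(a, b): (1.3, 1.3) for i, a in enumerate(teams) for b in teams[i + 1:]}
before = simulate_remaining(teams, lambdas, completed={})
after = simulate_remaining(teams, lambdas, completed={("A", "B"): (2, 0)})
print(before["A"], after["A"])
```

This version only banks the result. Feeding the result back into the team ratings, so that the remaining goal rates shift too, would be a separate step layered on top.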

This is where it connects to SportChartz. The match-level win probability model already runs in real time. Layering the tournament simulation on top means every goal in a group stage match ripples through to update every team’s tournament win probability, including teams that aren’t playing that day. A surprise result in Group A changes the third-place calculus for Group F. The simulation captures this. A human watching on TV doesn’t.

What this doesn’t solve

This approach uses team-level Elo or Dixon-Coles parameters. It doesn’t account for individual player availability (injuries, suspensions), tactical matchup effects, or within-tournament form. The squad problem described above remains. You’re modeling “France the entity” rather than “this specific France squad with Mbappe but without Kante.”

For the 2026 format specifically, there’s no way to validate the group-to-knockout transition rules against historical data because no prior tournament used them. The simulation is internally consistent - the math works - but it’s never been tested against reality in this configuration. That’s the honest limitation.

The tools exist. The data is accessible. The open question is whether the prediction adds value beyond what the betting markets already price in. Leitner et al. (2010) found Elo-based models performed comparably to bookmaker odds for the 2008 Euros. Comparable isn’t “better than.” But comparable with a transparent methodology that updates in real time during the tournament - that’s something the markets don’t provide in the same way.

References

Cea, S., Duran, G., Guajardo, M. & Saure, D. (2020). “An analytical approach to the FIFA ranking procedure and the World Cup.” Annals of Operations Research, 286(1), 357-387.

Dixon, M.J. & Coles, S.G. (1997). “Modelling association football scores and inefficiencies in the football betting market.” Journal of the Royal Statistical Society: Series C, 46(2), 265-280.

Groll, A., Schauberger, G. & Tutz, G. (2015). “Prediction of major international soccer tournaments based on team-specific regularized Poisson regression.” Journal of Quantitative Analysis in Sports, 11(2), 97-115.

Leitner, C., Zeileis, A. & Hornik, K. (2010). “Forecasting sports tournaments by ratings of (prob)abilities: A comparison for the EURO 2008.” International Journal of Forecasting, 26(3), 471-481.

Silver, N. (2018). “How our 2018 World Cup predictions work.” FiveThirtyEight methodology.

Sutter, M. & Kocher, M.G. (2004). “Favoritism of agents – the case of referees’ home bias.” Journal of Economic Psychology, 25(4), 461-469.

Suzuki, A.K., Salasar, L.E.B. & Leite, J.G. (2010). “A Bayesian approach for predicting match outcomes: The 2006 (Association) Football World Cup.” Journal of the Operational Research Society, 61, 1530-1539.

nfosignal
