Empirical Expected Threat by Game State

Expected Value Models | Level 3: Advanced

Prerequisites

Game Dynamics Classification (Counter / Fast Attack / Organized / Direct)

Unlocks

Set Defense Attack Parameter Optimization

What It Is

Empirical expected threat is the observed likelihood of scoring within the next N moves from each pitch location, computed separately for different game-state subsets (e.g., set defense, counter-attack, regular play). Unlike Karun Singh's recursive expected threat model, the empirical approach is a direct frequentist calculation: for each location bin, count the events in the subset that led to a goal within N moves (N = 5 here) and divide by the total number of subset events in that bin. The key advantage is that it can be computed for any arbitrary subset of events, regardless of whether the eventual outcome falls within that subset. The result shows that the threat landscape depends strongly on game state: locations near the halfway line are far more threatening in counters than against set defenses, but this difference shrinks near the byline.
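
Written out, the frequency count described above might look like the following. The notation is an assumption introduced here for illustration (S is the chosen game-state subset of events, bin(e) is the location bin of event e, b is one bin); it is not taken from the source talk.

```latex
\hat{T}_S(b) \;=\;
\frac{\bigl|\{\, e \in S : \operatorname{bin}(e) = b,\ \text{a goal occurs within the next } N \text{ moves after } e \,\}\bigr|}
     {\bigl|\{\, e \in S : \operatorname{bin}(e) = b \,\}\bigr|},
\qquad N = 5
```

The goal itself does not need to belong to S; only the starting event e does. That is what lets the same formula be applied to any subset.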

Correct Execution

(1) Partition events by game state (using set-defense-proxy-detection or game-dynamics-classification).
(2) For each subset, grid the attacking half of the pitch into location bins.
(3) For each bin, compute P(goal within next 5 moves | event in this bin, this game state).
(4) Compare threat surfaces across game states.

Key patterns:

Counter-attack threat surface: high near halfway line (space to attack into), gradually declining toward goal
Set defense threat surface: low near halfway line (no space), increasing sharply near the byline (cutbacks penetrate organized blocks)
The byline convergence: near the byline, the threat surfaces for different game states converge — cutbacks are dangerous regardless of defensive organization
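
The four steps can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the presenters' implementation: the `Event` fields, the 0-100 pitch coordinates, the grid size, and the rule that a goal only counts if it belongs to the same possession are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    possession_id: int  # hypothetical field: moves in one possession share an id
    x: float            # 0-100 along the pitch, attacking toward x = 100
    y: float            # 0-100 across the pitch
    game_state: str     # e.g. "set_defense", "counter", "regular"
    is_goal: bool       # True if this move is a goal


def empirical_threat(events, state, horizon=5, nx=6, ny=4):
    """Return {(bx, by): (goal_count, event_count)} for one game state.

    `events` must be in chronological order. The look-ahead scans the full
    event stream, so a goal counts even if it carries a different state label.
    """
    goals = defaultdict(int)
    totals = defaultdict(int)
    for i, ev in enumerate(events):
        # Step 1: keep only this game state. Step 2: attacking half only.
        if ev.game_state != state or ev.x < 50:
            continue
        # Map the location to a grid cell, clamping x = 100 / y = 100 inward.
        bx = min(int((ev.x - 50) * nx / 50), nx - 1)
        by = min(int(ev.y * ny / 100), ny - 1)
        # Assumption: "next N moves" = the next N events in the stream,
        # and a goal counts only if it is in the same possession.
        window = events[i + 1 : i + 1 + horizon]
        scored = any(
            nxt.is_goal and nxt.possession_id == ev.possession_id
            for nxt in window
        )
        totals[(bx, by)] += 1
        goals[(bx, by)] += int(scored)
    return {b: (goals[b], totals[b]) for b in totals}


def to_rates(counts):
    """Step 3: convert (goal_count, event_count) pairs to probabilities."""
    return {b: g / n for b, (g, n) in counts.items()}
```

For step 4, call `empirical_threat` once per game state with the same grid and compare the resulting dictionaries bin by bin. Keeping the raw counts alongside the rates makes it easy to see which bins rest on very few events.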

Coaching Cues

"Against a set defense, the byline is the exploit. Cutbacks work because the block is organized but narrow."
"The same location on the pitch has different threat depending on whether you're countering or probing a block."
"If set defense and counter look the same on your threat map, your state labels aren't working."

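The last cue can be turned into a rough sanity check. The sketch below assumes each threat surface is a dictionary mapping a bin identifier to a probability (a hypothetical representation) and reports the mean absolute difference over bins present in both surfaces. The source gives no threshold, so how small a gap counts as "looking the same" is a judgment call.

```python
def surface_gap(threat_a, threat_b):
    """Mean absolute difference in threat over bins present in both surfaces.

    Returns None when the two surfaces share no bins.
    """
    shared = threat_a.keys() & threat_b.keys()
    if not shared:
        return None
    return sum(abs(threat_a[b] - threat_b[b]) for b in shared) / len(shared)
```

A gap near zero between the set-defense and counter surfaces suggests the state labels are not separating the two situations, though it can also mean both surfaces are dominated by noise from small samples.
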
Common Errors

Confusing empirical expected threat with recursive expected threat: Karun Singh's model uses a recursive Markov process. Empirical expected threat is a direct frequency count. They measure similar things but the empirical version is more flexible for subset analysis.
Too few events per bin: Small bins + rare game states = noisy estimates. Merge bins or use smoothing when sample sizes are small.
Ignoring the "within 5 moves" horizon: The horizon matters — too short and you miss indirect threats; too long and every location looks similar.
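
For the small-sample problem, one possible smoothing (chosen here for illustration; the source does not name a method) is to shrink each bin's rate toward the overall rate of the subset, so that bins with few events move most. The function name and the `prior_strength` parameter are hypothetical.

```python
def smoothed_threat(goals, totals, prior_strength=20.0):
    """Shrink per-bin goal rates toward the subset-wide rate.

    goals[b]  = events in bin b followed by a goal within the horizon
    totals[b] = all events in bin b
    prior_strength acts like that many pseudo-events at the overall rate.
    """
    all_events = sum(totals.values())
    if all_events == 0:
        return {}
    base_rate = sum(goals.get(b, 0) for b in totals) / all_events
    return {
        b: (goals.get(b, 0) + prior_strength * base_rate) / (n + prior_strength)
        for b, n in totals.items()
    }
```

A bin with 2 events and 1 goal is pulled well below its raw rate of 0.5, while a bin with 200 events barely moves. Larger values of `prior_strength` smooth more aggressively and can hide real differences between bins, so the value is worth checking against the data.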

Edges

🔑 Hidden Causal Lever

Against Set Defenses, the Byline Is the Exploit — Not the Center

The threat landscape differs fundamentally by game state. Against set (organized) defenses, locations near the halfway line carry almost no threat because there is no space to attack into, but threat rises sharply near the byline because cutbacks penetrate organized blocks. On counters the pattern inverts: threat is high near the halfway line (space to run into) and declines toward the byline. Near the byline, the threat surfaces converge across game states: cutbacks are dangerous regardless of defensive organization.

What most people do

Use a single, unconditional expected threat map for all game states, treating each zone's value as fixed regardless of whether the team is countering or probing a set defense.

What the best do

Compute separate threat surfaces per game state. Against set defenses, prioritize byline penetration and cutbacks. Against disorganized defenses, prioritize direct central progression. The tactical prescription is game-state-dependent.

Why it's an edge: Man City's half-space-to-byline cutback strategy exploits exactly this set-defense threat landscape. Teams that understand this can replicate the principle: against organized blocks, the valuable zone is the byline, not the center of the box. Most teams waste possession probing the center against set defenses when the exploit is at the edges.

How to exploit: When facing a set defense (>20 seconds in possession, opponent organized), route attacks to the byline rather than trying to penetrate centrally. Measure byline entry rate against set defenses as a tactical KPI. For opponent analysis: check if they use cutbacks disproportionately against set defenses — if so, defend the byline, not the center.
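
The byline entry rate KPI could be computed along these lines. This is a sketch under assumptions: events are `(possession_id, x, y, game_state)` tuples on a 0-100 pitch attacking toward x = 100, the game-state label already encodes "facing a set defense", and the zone limits `byline_x` and `central_band` are illustrative values, not figures from the source.

```python
def byline_entry_rate(events, state="set_defense", byline_x=94.0,
                      central_band=(37.0, 63.0)):
    """Share of possessions facing `state` that reach the wide byline zone.

    A possession counts as an entry if at least one of its events in the
    given state has x >= byline_x and y outside the central band.
    Returns None if no possession faced that state.
    """
    faced, entered = set(), set()
    lo, hi = central_band
    for possession_id, x, y, game_state in events:
        if game_state != state:
            continue
        faced.add(possession_id)
        if x >= byline_x and not (lo <= y <= hi):
            entered.add(possession_id)
    return len(entered) / len(faced) if faced else None
```

Running the same function on an opponent's possessions, once with `state="set_defense"` and once with another state label, gives a simple way to check whether they reach the byline disproportionately against set defenses.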

Perdomo & Zarrella, 23 Sports, StatsBomb Innovation in Football Conference, 2019-10-28. Man City's pattern explicitly identified.

Sources

David Perdomo & Daniel Zarrella, 23 Sports, StatsBomb Innovation in Football Conference, YouTube, 2019-10-28 — presented empirical expected threat computed separately for set defense, counter, and regular play; showed that threat surfaces differ dramatically by game state; identified byline cutback effectiveness against set defenses; highlighted Man City's half-space-to-byline pattern