Context-Relative Action Scoring

Player Evaluation | Level 4: Expert

What It Is

Context-relative action scoring measures what a player achieved relative to the distribution of what was historically achievable in similar situations. Instead of crediting a player with their raw EPV delta (e.g., +0.5 for a through ball), find all historically similar possession sequences, compute the distribution of outcomes in those situations, and report where this player's action falls in that distribution (e.g., 85th percentile). This corrects the positional reward-access bias: a defender's break-even play in a context where 90% of players got negative outcomes is 90th-percentile performance, not a zero.
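The break-even example can be checked with a few lines of Python. This is a minimal sketch, not the pipeline from the talk: `percentile_rank` is a hypothetical helper, the matched context is made-up data, and counting ties as half is an assumed convention.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(delta, historical_deltas):
    """Fraction of historical outcomes this delta beats (ties count as half)."""
    ordered = sorted(historical_deltas)
    below = bisect_left(ordered, delta)            # outcomes strictly worse
    ties = bisect_right(ordered, delta) - below    # outcomes exactly equal
    return (below + 0.5 * ties) / len(ordered)

# Hypothetical matched context: 90 of 100 similar situations lost value.
context = [-0.02] * 90 + [0.01] * 10

# A break-even action (delta = 0.0) beats all 90 negative outcomes.
print(percentile_rank(0.0, context))  # 0.9
```

The raw delta of 0.0 would read as "average"; against this context it sits at the 90th percentile.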

Correct Execution

The pipeline has three stages:

  1. Sequence encoding: compress each possession or event sequence into a latent vector using an LSTM autoencoder. The encoder takes in raw event sequences (not just geometry, but event types, directions, and pressure) and squeezes them through a bottleneck; the decoder tries to reconstruct the original sequence from it. The bottleneck representation captures the "meaning" of the possession.
  2. Similarity matching: for each event, find the K nearest neighbor sequences in latent space (or cluster via K-means if compute is limited).
  3. Percentile scoring: compute the player's action value delta, compare it to the distribution of historical deltas in the matched sequences, and report a percentile rank rescaled to [-1, +1].
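Stages 2 and 3 can be sketched in plain Python, assuming the latent vectors from stage 1 already exist. Everything here is illustrative: the function names, the brute-force Euclidean search, the toy 2-D latents, and the linear rescaling `2 * pct - 1` are assumptions, not details confirmed by the source.

```python
import math

def k_nearest(query, latents, k):
    """Indices of the k latent vectors closest to `query` (Euclidean distance)."""
    order = sorted(range(len(latents)), key=lambda i: math.dist(query, latents[i]))
    return order[:k]

def context_relative_score(delta, query, latents, deltas, k=50):
    """Percentile of `delta` among the k matched historical deltas, rescaled to [-1, +1]."""
    matched = [deltas[i] for i in k_nearest(query, latents, k)]
    below = sum(d < delta for d in matched)
    ties = sum(d == delta for d in matched)
    pct = (below + 0.5 * ties) / len(matched)
    return 2.0 * pct - 1.0

# Toy data: 10 "deep build-up" sequences near (0, 0), 10 "final third" sequences near (5, 5).
latents = [(0.01 * i, 0.0) for i in range(10)] + [(5.0 + 0.01 * i, 5.0) for i in range(10)]
deltas = [-0.03] * 9 + [0.02] + [0.10] * 10

# The same raw delta of 0.0 scores very differently depending on the matched context.
deep = context_relative_score(0.0, (0.0, 0.0), latents, deltas, k=10)
high = context_relative_score(0.0, (5.0, 5.0), latents, deltas, k=10)
print(round(deep, 3), round(high, 3))  # 0.8 -1.0
```

A brute-force sort like this is only workable on small samples; at production scale an approximate nearest-neighbor index or the K-means fallback would replace `k_nearest`.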

Important: the autoencoder should capture more than geometry. Include event types, action types (through ball vs. cross vs. carry), pressure state, and sequence ordering. The LSTM structure preserves temporal information that flat geometric approaches miss. If exact K-nearest-neighbor search is too slow (millions of query events, each compared against millions of stored sequences), fall back to K-means clustering and accept the fidelity loss.
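One way to picture the autoencoder's input is a per-event feature vector that keeps action type, direction, and pressure alongside coordinates, with the possession stored as an ordered list of those vectors. The event schema below (`type`, `x`, `y`, `end_x`, `end_y`, `under_pressure`) and the action vocabulary are hypothetical, chosen only for illustration.

```python
# Hypothetical action vocabulary; a real event feed will have its own taxonomy.
EVENT_TYPES = ["pass", "through_ball", "cross", "carry", "shot"]

def encode_event(event):
    """One timestep: one-hot action type + start location + direction + pressure flag."""
    one_hot = [1.0 if event["type"] == t else 0.0 for t in EVENT_TYPES]
    dx = event["end_x"] - event["x"]
    dy = event["end_y"] - event["y"]
    pressure = 1.0 if event["under_pressure"] else 0.0
    return one_hot + [event["x"], event["y"], dx, dy, pressure]

def encode_sequence(events):
    """Ordered list of timestep vectors; list order carries the temporal information."""
    return [encode_event(e) for e in events]

# Two actions with identical geometry but different type and pressure state.
calm_carry = {"type": "carry", "x": 40.0, "y": 30.0, "end_x": 55.0, "end_y": 30.0, "under_pressure": False}
pressed_pass = {"type": "pass", "x": 40.0, "y": 30.0, "end_x": 55.0, "end_y": 30.0, "under_pressure": True}

a = encode_sequence([calm_carry])
b = encode_sequence([pressed_pass])
print(a[0][5:9] == b[0][5:9], a == b)  # True False
```

The geometry slice is identical for both actions, yet the full encodings differ, which is the distinction a coordinates-only representation would lose.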

Coaching Cues

  • "How did they do given what was possible? That's the fair question." — StatsBomb CTO, 2019
  • "Percentiles, not z-scores. Outliers are noise, not signal."
  • "Two possessions that look the same on a heat map can be completely different situations."

Common Errors

  1. Using only geometric similarity: Two possessions that look identical on a heat map can be completely different in tempo, event sequence, and pressure context. The autoencoder must encode more than coordinates.
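  2. Using z-scores instead of percentile ranks: Z-scores produce extreme values when the matched distribution is narrow or the context is rare, so a handful of outliers can dominate any ranking. Percentile ranks rescaled to [-1, +1] are bounded and more robust.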
  3. Clustering too coarsely: K-means with too few clusters loses the fidelity of individual situation matching. Use as many clusters as compute allows, or go full K-nearest-neighbors if feasible.
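The z-score problem in error 2 is easy to reproduce. The sketch below uses a made-up, tightly clustered set of matched deltas; the helper names and the `2 * pct - 1` rescaling are assumptions for illustration.

```python
import statistics

def z_score(delta, matched):
    """Standard score against the matched distribution (population std dev)."""
    return (delta - statistics.mean(matched)) / statistics.pstdev(matched)

def scaled_percentile(delta, matched):
    """Percentile rank rescaled to [-1, +1], ties counted as half."""
    below = sum(d < delta for d in matched)
    ties = sum(d == delta for d in matched)
    return 2.0 * (below + 0.5 * ties) / len(matched) - 1.0

# Hypothetical rare context: only 8 matches, all within +/- 0.01 of zero.
matched = [0.00, 0.01, -0.01, 0.00, 0.01, -0.01, 0.00, 0.01]

for delta in (0.02, 0.30):
    # The z-score grows without bound as delta leaves the narrow cluster;
    # the scaled percentile saturates at +1.0 for both.
    print(delta, z_score(delta, matched), scaled_percentile(delta, matched))
```

Both actions are simply "better than every match" under percentile scoring, while the z-score for the larger delta runs into the dozens of standard deviations and would swamp any aggregate built on it.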

Edges

Conventional Wisdom Is Wrong

Raw EPV Delta Systematically Undervalues Defenders and Overvalues Attackers

A defender whose possession-maintenance play produces zero EPV delta is not performing at an average level by default: in contexts where 90% of historical outcomes were negative (turnovers, backward passes), breaking even is 90th-percentile performance. Raw EPV delta doesn't account for opportunity, that is, what was achievable given the situation. Context-relative scoring (measuring what a player achieved versus the distribution of what was historically achievable in similar situations) corrects for this position bias. Percentile ranks rescaled to [-1, +1] are preferred over z-scores, which produce extreme outliers.

What most people do
Rank players by raw EPV delta, which creates a leaderboard dominated by attackers in high-reward zones and penalizes defenders who competently navigate low-reward zones.
What the best do
Use LSTM autoencoders to encode possession sequences, find similar historical sequences via nearest-neighbor matching, and compute percentile rank of the player's action within that distribution. A defender's break-even play in a terrible context maps to a high percentile.
Why it's an edge: Clubs using raw EPV for player evaluation will never identify their best defenders or deep midfielders because those players are structurally capped by the reward landscape of their pitch zones. Opportunity-normalized scoring puts defenders and attackers on a comparable scale for the first time.
How to exploit: Build the opportunity-normalization pipeline. Produce player rankings where the top 20 includes defenders and midfielders alongside attackers. When these rankings surface a DM or CB as elite, cross-reference with traditional metrics — if traditional metrics rank them average but opportunity-normalized metrics rank them elite, you've found an undervalued player.
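The cross-referencing step can be sketched as a simple filter. The field names (`context_pct`, `traditional_pct`), the 0.9 "elite" cutoff, the 0.4 to 0.6 "average" band, and the player records are all hypothetical; sensible thresholds would depend on the league and sample size.

```python
def undervalued(players, elite_pct=0.9, average_band=(0.4, 0.6)):
    """Players rated elite by opportunity-normalized score but average by traditional metrics.

    Each player is a dict with hypothetical keys:
    'name', 'position', 'context_pct', 'traditional_pct' (all percentiles in [0, 1]).
    """
    lo, hi = average_band
    hits = [
        p for p in players
        if p["context_pct"] >= elite_pct and lo <= p["traditional_pct"] <= hi
    ]
    return sorted(hits, key=lambda p: p["context_pct"], reverse=True)

# Made-up records for illustration only.
players = [
    {"name": "Player A", "position": "CB", "context_pct": 0.95, "traditional_pct": 0.52},
    {"name": "Player B", "position": "ST", "context_pct": 0.93, "traditional_pct": 0.97},
    {"name": "Player C", "position": "DM", "context_pct": 0.91, "traditional_pct": 0.45},
    {"name": "Player D", "position": "CB", "context_pct": 0.60, "traditional_pct": 0.50},
]

for p in undervalued(players):
    print(p["name"], p["position"])
```

With these records only Player A and Player C pass the filter: Player B is already rated elite by traditional metrics, and Player D is average on both scales.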
Source: StatsBomb CTO, StatsBomb Innovation in Football Conference, 2019-10-25. Full pipeline described: LSTM autoencoder, K-NN matching, percentile scoring.

Sources

  • StatsBomb CTO, StatsBomb Innovation in Football Conference, YouTube, 2019-10-25 — described the full pipeline: LSTM autoencoder for possession similarity → K-nearest-neighbor matching → percentile scoring; warned against z-scores; emphasized that geometric similarity alone is insufficient