Position-Driven EPV Skew

Player Evaluation | Level 3 (Advanced)

What It Is

In any action-value model (EPV, SARSA, expected threat), the reward landscape is spatially non-uniform: strikers play in zones where the value surface rises steeply toward goal, while defenders play in zones where it is nearly flat. A striker who completes an ordinary 5-yard forward pass therefore earns a large positive ΔEPV, while a defender who completes an identical 5-yard forward pass earns almost none. Naively summing EPV deltas per player ranks attackers highest almost by construction: not because they are better players, but because they have access to high-reward zones. That is a systematic measurement artifact, not a finding about player quality.
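To make the gradient argument concrete, here is a minimal sketch using a made-up value surface. The exponential shape, the 115-yard pitch length, and the names `toy_value` and `pass_delta` are all assumptions chosen for illustration; they are not taken from any real EPV model.

```python
import math

PITCH_LENGTH_YD = 115.0  # hypothetical pitch length, for illustration only


def toy_value(x_yd: float) -> float:
    """Toy value surface (an assumption): rises exponentially toward the opposition goal."""
    return 0.4 * math.exp(-(PITCH_LENGTH_YD - x_yd) / 15.0)


def pass_delta(start_yd: float, length_yd: float = 5.0) -> float:
    """Value change from a completed forward pass of `length_yd` yards."""
    return toy_value(start_yd + length_yd) - toy_value(start_yd)


# The same 5-yard forward pass, played from two different starting points.
defender_delta = pass_delta(25.0)  # deep in own half, flat part of the surface
striker_delta = pass_delta(95.0)   # near the opposition box, steep part

print(f"defender: {defender_delta:.5f}  striker: {striker_delta:.5f}")
```

On this toy surface the two deltas differ by roughly two orders of magnitude even though the action is identical, which is the artifact described above.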

Correct Execution

The "team of Messis" thought experiment exposes the problem: 11 identical players deployed in different positions would produce wildly different EPV delta totals. Striker-Messi gets high-reward, low-risk deltas. Defender-Messi gets low-reward, high-risk deltas. Midfield-Messi gets nothing and is stuck in the "trough of meh." Before trusting any EPV leaderboard, verify that the ranking is not simply a position-access artifact. The fix is opportunity normalization: compare each player's output to the distribution of outcomes available in that specific context (for example, the same zone and game state) rather than ranking raw deltas.
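One way to sketch opportunity normalization is to score each event as a z-score against the historical deltas seen in the same context. This is only one possible reading of "compare to the distribution"; the event schema (`player`, `context`, `delta`), the context labels, and every number below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def context_baselines(history):
    """Mean and population std of historical deltas, grouped by context."""
    by_ctx = defaultdict(list)
    for ev in history:
        by_ctx[ev["context"]].append(ev["delta"])
    return {ctx: (mean(d), pstdev(d)) for ctx, d in by_ctx.items()}


def normalized_scores(events, baselines):
    """Average per-event z-score for each player, relative to the event's context."""
    per_player = defaultdict(list)
    for ev in events:
        mu, sigma = baselines[ev["context"]]
        z = (ev["delta"] - mu) / sigma if sigma > 0 else 0.0
        per_player[ev["player"]].append(z)
    return {p: mean(z) for p, z in per_player.items()}


# Hypothetical history: what actions in each context have typically been worth.
history = (
    [{"context": "final_third", "delta": d} for d in (0.02, 0.04, 0.06)]
    + [{"context": "own_third", "delta": d} for d in (0.000, 0.001, 0.002)]
)
# One event each: the striker is exactly average for the zone,
# the defender is at the top of what the zone allows.
events = [
    {"player": "striker", "context": "final_third", "delta": 0.04},
    {"player": "defender", "context": "own_third", "delta": 0.002},
]

scores = normalized_scores(events, context_baselines(history))
print(scores)
```

With these invented numbers the striker's raw delta is 20 times the defender's, yet the normalized score puts the defender ahead, because the defender extracted more than is typical from the zone. A real context key would need to be much richer than a zone label.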

Coaching Cues

  • "Before you trust the EPV leaderboard, ask: are you measuring player quality or zone access?" — StatsBomb CTO, 2019
  • "If your club hires analysts with this model, your DMs will just hit long balls and get big contracts."
  • "Striker-Messi gets high-reward deltas. Defender-Messi gets low-reward deltas. Midfield-Messi gets nothing."

Common Errors

  1. Assuming EPV delta sums are position-agnostic: They are not. The value surface is spatially non-uniform, so position determines the available reward range.
  2. Trying to fix this by normalizing per position group: This is too coarse — a right-back and a left-back have different zone access depending on team shape. The fix needs per-event context, not per-position buckets.
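The second error can be illustrated with a small sketch: two full-backs who each perform exactly at the average for the zone they play in, but whose team shape puts them in different zones. The helper `residuals`, the field names, and the deltas are hypothetical and chosen only to show the effect.

```python
from collections import defaultdict
from statistics import mean


def residuals(events, key):
    """Per-player sum of (delta - group mean), grouping events by `key`."""
    groups = defaultdict(list)
    for ev in events:
        groups[ev[key]].append(ev["delta"])
    baseline = {g: mean(d) for g, d in groups.items()}

    totals = defaultdict(float)
    for ev in events:
        totals[ev["player"]] += ev["delta"] - baseline[ev[key]]
    return dict(totals)


# Hypothetical: the right-back plays high, the left-back stays deep.
events = [
    {"player": "RB", "position": "full_back", "zone": "middle_third", "delta": 0.010},
    {"player": "RB", "position": "full_back", "zone": "middle_third", "delta": 0.010},
    {"player": "LB", "position": "full_back", "zone": "own_third", "delta": 0.002},
    {"player": "LB", "position": "full_back", "zone": "own_third", "delta": 0.002},
]

by_position = residuals(events, "position")  # one bucket for both full-backs
by_zone = residuals(events, "zone")          # baseline follows the event's context

print("per-position:", by_position)
print("per-zone:    ", by_zone)
```

Under the position bucket, the right-back looks above average and the left-back below, purely because of where the team shape places them. With the baseline keyed on the event's zone, both come out level.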

Edges

🔑 Hidden Causal Lever

Every RL Player Leaderboard Is a Zone Heatmap in Disguise

Value delta sums rank strikers highest because they play in high-reward zones, not because they're most valuable. A hypothetical identical player scores differently at striker vs. DM. Without opportunity normalization, every RL-based valuation is just zone access ranking.

What most people do: Sum value deltas per player and rank.
What the best do: Apply opportunity-normalized action value, comparing each player's output to the historical distribution in similar contexts.
Why it's an edge: Without normalization, you'll overpay for strikers and undervalue the midfielders who create the conditions for play to reach those zones.
How to exploit: Compute a baseline expected value for each action's context (finer-grained than a position bucket). Player value = actual minus baseline. Validate that the top 10 includes multiple positions.
StatsBomb CTO, StatsBomb Conference, 2019-10-25
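
The "actual minus baseline" step and the position-diversity check could be sketched as follows. The baseline table, the player labels, and the helper names `value_above_baseline` and `positions_in_top_k` are assumptions for illustration; in practice the baseline would come from historical data rather than a hand-written dictionary.

```python
from collections import defaultdict


def value_above_baseline(events, baseline):
    """Per-player sum of (actual delta - baseline delta for the event's context)."""
    totals = defaultdict(float)
    for ev in events:
        totals[ev["player"]] += ev["delta"] - baseline[ev["context"]]
    return dict(totals)


def positions_in_top_k(totals, position_of, k=10):
    """Set of positions represented among the k highest-valued players."""
    ranked = sorted(totals, key=totals.get, reverse=True)[:k]
    return {position_of[p] for p in ranked}


# Hypothetical baseline delta per context and a tiny event log.
baseline = {"own_third": 0.001, "middle_third": 0.005, "final_third": 0.040}
position_of = {"ST": "striker", "CM": "midfielder", "CB": "defender"}
events = [
    {"player": "ST", "context": "final_third", "delta": 0.045},
    {"player": "CM", "context": "middle_third", "delta": 0.012},
    {"player": "CB", "context": "own_third", "delta": 0.003},
]

totals = value_above_baseline(events, baseline)
top_positions = positions_in_top_k(totals, position_of, k=2)
print(totals, top_positions)
```

If the top of the ranking still contains a single position, that is a hint the baseline is too coarse and zone access is leaking back into the score.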

Sources

  • StatsBomb CTO, StatsBomb Innovation in Football Conference, YouTube, 2019-10-25 — described the "team of Messis" thought experiment demonstrating that position access drives EPV delta rankings; showed that risk is equally position-skewed; warned that naive EPV deployment incentivizes tactically wrong play