SARSA/RL Event Valuation

Expected Value Models · Level 4: Expert

What It Is

A reinforcement learning approach that values every event in a match by iteratively propagating goal rewards backward through event sequences. Unlike EPV (which uses Markov state transitions within possessions), SARSA treats the entire match as one continuous sequence, eliminating possession boundaries and arbitrary temporal horizons. The model learns iteratively: shots are valuable because they lead to goals, certain passes are valuable because they lead to shots, and tackles are valuable because they lead to passes. None of this requires defining "possession" or cutting off credit after a fixed number of events (such as a 10-event cutoff).

Correct Execution

Start by assigning reward=1 to goals (or, optionally, xG values to shots to speed convergence). Train a predictive model on the dataset. After each training pass, apply the SARSA update: project a small fraction of value from high-value events backward onto the events that immediately precede them. Retrain on the updated values, and repeat until the values stop changing.

The neural network should include an LSTM layer that takes sequences of ~10 events as temporal context, so it can learn that a pass received after a through ball differs from a pass received after a lateral pass. The output layer should predict three classes: the probability that the home team scores next, that the away team scores next, and that nobody scores next. This three-outcome structure lets you model defensive intent (minimizing the opponent's "scores next" probability) separately from attacking intent.
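
The train, propagate, retrain loop can be sketched in plain Python. This is a minimal illustration under loud assumptions, not the presenter's implementation: the neural network with LSTM context is replaced by a per-event-type average (`fit_mean_model`, a hypothetical stand-in), the event stream `toy_match` is invented, and the step size `alpha` and iteration count `n_iters` are arbitrary choices. Each event carries a three-outcome vector (home scores next, away scores next, nobody scores next), and the match is processed as one continuous sequence with no possession splits.

```python
from collections import defaultdict

HOME, AWAY, NOBODY = 0, 1, 2  # indices of the three outcome classes


def one_hot(index):
    vec = [0.0, 0.0, 0.0]
    vec[index] = 1.0
    return vec


def fit_mean_model(events, targets):
    """Hypothetical stand-in for the neural network: the 'prediction' for an
    event is the average target over all events with the same (team, kind)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for event, target in zip(events, targets):
        counts[event] += 1
        for k in range(3):
            sums[event][k] += target[k]
    return {event: [s / counts[event] for s in total] for event, total in sums.items()}


def propagate(events, n_iters=500, alpha=0.1):
    """Iteratively push goal value backward along the observed event sequence."""
    # Goals keep a fixed one-hot reward for the scoring team.
    fixed = {
        t: one_hot(HOME if team == "home" else AWAY)
        for t, (team, kind) in enumerate(events)
        if kind == "goal"
    }
    # Every other event starts as "nobody scores next".
    targets = [fixed.get(t, one_hot(NOBODY)) for t in range(len(events))]
    end_of_match = one_hot(NOBODY)

    for _ in range(n_iters):
        model = fit_mean_model(events, targets)  # "train"
        updated = []
        for t in range(len(events)):
            if t in fixed:
                updated.append(fixed[t])
                continue
            # SARSA-style step: bootstrap from the event that actually came next.
            nxt = model[events[t + 1]] if t + 1 < len(events) else end_of_match
            updated.append(
                [(1 - alpha) * targets[t][k] + alpha * nxt[k] for k in range(3)]
            )
        targets = updated  # "retrain" on these values in the next pass
    return fit_mean_model(events, targets)


# Invented toy stream of (team, event kind); assumption for illustration only.
toy_match = [
    ("home", "pass"), ("home", "shot"), ("home", "goal"),
    ("away", "pass"), ("home", "tackle"), ("home", "pass"), ("away", "pass"),
    ("home", "tackle"), ("home", "pass"), ("home", "shot"), ("away", "pass"),
]
values = propagate(toy_match)
for event, probs in sorted(values.items()):
    print(event, [round(p, 3) for p in probs])
```

On this toy stream the home shot ends up with a higher "home scores next" probability than the home pass, which in turn sits above the away pass: value has flowed backward from the goal through the events that preceded it. With very few iterations almost all value stays on shots, which is the undertraining symptom described under Common Errors.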

Coaching Cues

  • "We finally killed xG. We've got something else to talk about." — StatsBomb CTO, 2019
  • "If your model always puts strikers at the top, your model is counting zone access, not player quality."
  • "The model gets noisy when the game is almost over. Don't trust the tail."
  • "If you put home advantage in the model, you can't use the model to study home advantage."

Common Errors

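  1. Using Q-learning instead of SARSA: Q-learning is off-policy. Its update bootstraps from the best available next action, so it estimates the value of an optimal strategy, as if the players could be steered toward it. Football players can't be controlled. SARSA is on-policy and evaluates the behavior that actually occurred, which is all you can do with historical match data.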
  2. Setting arbitrary possession boundaries: The whole point of SARSA is to avoid splitting on possessions. If you reintroduce possession splits, you lose the temporal continuity that makes this approach valuable.
  3. Not training long enough: The iterative reward propagation needs many cycles to push value back from goals through intermediate events. Insufficient training makes everything look like "only shots matter."
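
The difference behind error 1 shows up in the bootstrap target of each algorithm. The sketch below uses the standard textbook update targets; the state name, action names, and numbers in `q_table` are invented for illustration.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=1.0):
    # On-policy: bootstrap from the action that was actually taken next.
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma=1.0):
    # Off-policy: bootstrap from the best action available in the next state,
    # whether or not anyone took it.
    return reward + gamma * max(q[next_state].values())


# Hypothetical action values for one state (assumption, not real data).
q_table = {"final_third": {"shot": 0.30, "back_pass": 0.05}}

# The player in the data played a back pass after receiving in the final third.
observed = sarsa_target(q_table, 0.0, "final_third", "back_pass")
optimal = q_learning_target(q_table, 0.0, "final_third")
print(observed, optimal)
```

The SARSA target credits the preceding event with what the player actually did next, while the Q-learning target credits it with the best option that was available. Only the first describes the behavior recorded in historical match data.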

Edges

Conventional Wisdom Is Wrong

Q-Learning Is Conceptually Invalid for Football — Only SARSA Works

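Q-learning is off-policy: each update bootstraps from the best available next action, so it estimates the value of an optimal strategy, as if the agents could be steered toward it. Football players in historical matches cannot be controlled retrospectively. SARSA is on-policy: it evaluates the strategy that was actually played, which is the only valid target for historical match data.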

What most people do: Apply off-policy RL from game AI literature.
What the best do: Use SARSA with LSTM temporal context and three-outcome probability output, treating the match as one continuous sequence.
Why it's an edge: Q-learning's agent-control assumption is violated in observational sports data. SARSA also eliminates the possession-boundary problem.
How to exploit: Implement SARSA with 10-event LSTM sequences. Validate with behavioral assertion tests, for example that shots from inside the penalty area are valued above shots from outside the box.
StatsBomb CTO, StatsBomb Conference, 2019-10-25
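
A behavioral assertion test of the kind mentioned under "How to exploit" can be as simple as a list of orderings the model must respect. The helper `failed_assertions`, the category names, and the example values below are all hypothetical; in practice the values would be the trained model's average output per event category.

```python
def failed_assertions(values, expectations):
    """Return a message for every (higher, lower) pair the model gets wrong."""
    failures = []
    for higher, lower in expectations:
        if not values[higher] > values[lower]:
            failures.append(
                f"expected {higher} > {lower}, "
                f"got {values[higher]:.3f} vs {values[lower]:.3f}"
            )
    return failures


# Orderings any sensible event-valuation model should satisfy.
expectations = [
    ("shot_inside_box", "shot_outside_box"),
]

# Hypothetical mean model outputs per event category (assumption, not real data).
model_values = {"shot_inside_box": 0.12, "shot_outside_box": 0.03}

problems = failed_assertions(model_values, expectations)
print(problems or "all behavioral assertions passed")
```

Adding more pairs to `expectations` turns domain knowledge into a regression suite that can be rerun after every training cycle.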

Sources

  • StatsBomb CTO, StatsBomb Innovation in Football Conference, YouTube, 2019-10-25 — presented SARSA-based event valuation with LSTM temporal context and three-outcome probability; described iterative reward propagation, end-of-game instability, home bias tradeoff, and the fundamental Q-learning vs. SARSA distinction for football