Home/Soccer Analytics/Soft Feature Encoding for xG

Soft Feature Encoding for xG

Expected Value ModelsLevel 3 — Advanced

What It Is

Replacing discrete integer features in xG models (e.g., "3 defenders blocking") with continuous Gaussian-based representations that capture partial occlusion and proximity to blocking thresholds. This eliminates step-change artifacts where xG jumps discontinuously when a defender crosses an arbitrary blocking boundary, producing smoother and more physically realistic xG surfaces.

Correct Execution

Instead of counting "number of defenders in the shot cone" (integer), compute a continuous blocking score: sum of Gaussian kernels centered on each defender's position, with bandwidth proportional to their proximity to the shot line. Near the shot line = high blocking contribution; far from it = low contribution. The result is a smooth, continuous feature.

Diagnostic Tree

Sources

  • Dr. Dinesh Vatvani, StatsBomb Conference 2022, YouTube, 2022-10-04