
Buildup Path Cluster Analysis

Tactical Analysis · Level 3: Advanced

What It Is

Cluster analysis (K-means or similar) used to identify distinct archetypes of buildup paths: the spatial trajectories teams follow when building from their own quarter into the offensive zone (the final quarter). Rather than analyzing possessions one at a time, clustering reveals 4-6 recurring buildup patterns across a league, each with its own shot and goal probability. The key finding: patterns that start along a sideline and cross to the center before entering the zone are the most dangerous (up to ~50% shot rate), while buildups that stay trapped on the sideline are the least effective.

Correct Execution

For each buildup possession (one that starts in the team's own quarter and reaches the final quarter):

  1. Extract the sequence of ball positions as a spatial trajectory.
  2. Normalize trajectories to a standard pitch size and a common attacking direction, and resample each one to a fixed number of points so that possessions of different lengths become comparable feature vectors.
  3. Cluster with K-means or hierarchical clustering on the spatial features (positions along the trajectory, lateral spread, final entry point).

The four main clusters typically found:

  • Right sideline: stays right and enters the zone near the sideline (moderate danger)
  • Right to center: goes right, then crosses to the center before entering the zone (high danger)
  • Left sideline: stays left and enters the zone near the sideline (moderate danger)
  • Left to center: goes left, then crosses to the center before entering the zone (highest danger, ~50% shot rate)
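The three steps can be sketched in plain Python. This is a minimal sketch, not the method presented in the talk: it assumes StatsBomb-style 120 x 80 coordinates, resamples each trajectory by arc length to a hypothetical `N_POINTS = 8`, and implements K-means (k-means++ seeding, several restarts) by hand instead of calling a library. The synthetic prototype paths, and the convention that high y is the attacking team's right, are illustrative assumptions rather than real data.

```python
import math
import random

PITCH_LENGTH, PITCH_WIDTH = 120.0, 80.0  # assumption: StatsBomb-style coordinates
N_POINTS = 8  # hypothetical: points per resampled trajectory


def normalize(traj, attacking_right=True):
    """Scale to [0, 1] and rotate 180 degrees if needed so every team attacks toward x = 1."""
    pts = [(x / PITCH_LENGTH, y / PITCH_WIDTH) for x, y in traj]
    return pts if attacking_right else [(1 - x, 1 - y) for x, y in pts]


def resample(traj, n=N_POINTS):
    """Resample a polyline to n >= 2 points evenly spaced along its arc length."""
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    if len(traj) < 2 or cum[-1] == 0:
        return [traj[0]] * n
    out, seg = [], 0
    for i in range(n):
        target = cum[-1] * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(traj) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = (target - cum[seg]) / span if span else 0.0
        (x0, y0), (x1, y1) = traj[seg], traj[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def features(traj, attacking_right=True):
    """Flattened resampled positions plus lateral spread (last point = entry point)."""
    pts = resample(normalize(traj, attacking_right))
    ys = [y for _, y in pts]
    return [c for p in pts for c in p] + [max(ys) - min(ys)]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(vectors, k, n_init=10, iters=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; keeps the lowest-inertia restart."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers = [list(rng.choice(vectors))]
        while len(centers) < k:
            # k-means++: pick the next center with probability proportional to D^2.
            d2 = [min(sq_dist(v, c) for c in centers) for v in vectors]
            centers.append(list(rng.choices(vectors, weights=d2)[0]))
        labels = None
        for _step in range(iters):
            new = [min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors]
            if new == labels:
                break
            labels = new
            for j in range(k):
                members = [v for v, lab in zip(vectors, labels) if lab == j]
                if members:  # keep the old center if a cluster empties
                    centers[j] = [sum(col) / len(members) for col in zip(*members)]
        inertia = sum(sq_dist(v, centers[lab]) for v, lab in zip(vectors, labels))
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]


# Synthetic prototypes (hypothetical; high y assumed to be the attacking team's right).
PROTOTYPES = {
    "right sideline": [(10, 70), (40, 72), (70, 74), (95, 72)],
    "right to center": [(10, 70), (40, 72), (70, 55), (95, 40)],
    "left sideline": [(10, 10), (40, 8), (70, 6), (95, 8)],
    "left to center": [(10, 10), (40, 8), (70, 25), (95, 40)],
}
jitter = random.Random(1)
names, vectors = [], []
for name, proto in PROTOTYPES.items():
    for _ in range(10):
        path = [(x + jitter.uniform(-2, 2), y + jitter.uniform(-2, 2)) for x, y in proto]
        names.append(name)
        vectors.append(features(path))

labels, centers = kmeans(vectors, k=4)
for name in PROTOTYPES:
    print(name, "-> cluster(s)", sorted({lab for n, lab in zip(names, labels) if n == name}))
```

With real event data the hand-written `kmeans` could be swapped for scikit-learn's `KMeans(n_clusters=4, n_init=10, random_state=0)`; the feature construction is the part that carries the tactical meaning.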

The cross-to-center patterns are most dangerous because they enter the zone centrally, where shooting angles and passing options are widest. The sideline patterns are less effective because the team gets "trapped" wide, with the touchline limiting its options. Footedness effects appear: left-to-center may be more effective than right-to-center because right-footed players, the large majority, naturally cut inside from the left.
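Claims such as a "~50% shot rate" or a left/right footedness gap are easier to judge when each cluster's rate is reported with an interval rather than as a bare percentage. A minimal sketch, assuming each buildup has already been labeled with a cluster name and shot/goal flags; the middle-third cut-off in `entry_lane` and the toy counts are hypothetical, not results from the talk.

```python
import math
from collections import defaultdict


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)


def entry_lane(entry_y):
    """Hypothetical cut-off: 'central' if the normalized entry y is in the middle third."""
    return "central" if 1 / 3 <= entry_y <= 2 / 3 else "wide"


def rates_by_cluster(buildups):
    """buildups: iterable of (cluster_name, shot, goal) with boolean outcome flags."""
    counts = defaultdict(lambda: [0, 0, 0])  # [possessions, shots, goals]
    for cluster, shot, goal in buildups:
        row = counts[cluster]
        row[0] += 1
        row[1] += int(shot)
        row[2] += int(goal)
    return {
        name: {
            "n": n,
            "shot_rate": s / n,
            "goal_rate": g / n,
            "shot_ci": wilson_interval(s, n),
        }
        for name, (n, s, g) in counts.items()
    }


# Toy labeled buildups (hypothetical counts, not results from the talk).
toy = (
    [("left to center", True, False)] * 5 + [("left to center", False, False)] * 5
    + [("right to center", True, True)] * 4 + [("right to center", False, False)] * 6
    + [("right sideline", True, False)] * 2 + [("right sideline", False, False)] * 8
)
table = rates_by_cluster(toy)
for name, row in sorted(table.items(), key=lambda kv: -kv[1]["shot_rate"]):
    lo, hi = row["shot_ci"]
    print(f"{name:16s} n={row['n']:3d} shot={row['shot_rate']:.2f} "
          f"(95% CI {lo:.2f}-{hi:.2f}) goal={row['goal_rate']:.2f}")
```

If the intervals for left-to-center and right-to-center overlap heavily, as they do with samples this small, the footedness effect is suggested rather than established.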

Coaching Cues

  • "Cross to the center before you enter the zone, not after."
  • "Long balls skip the midfield but they skip your advantage too."
  • "Four ways in. Two of them are dangerous. Are we taking the dangerous ones when they're open?"

Common Errors

  1. Too few clusters: 2-3 clusters may merge the critical cross-to-center pattern into a sideline cluster. Start with 4-6 and evaluate each choice of k with an internal metric (silhouette score, or the elbow in within-cluster variance) and by whether the clusters are tactically interpretable.
  2. Ignoring footedness: left-to-center and right-to-center patterns may differ in effectiveness because players cut inside onto their dominant foot.
  3. Prescribing only the highest-value pattern: a team that always crosses to the center becomes predictable. The analysis tells you which pattern to take when it is available, not that it should always be attempted.
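One way to "start with 4-6 and evaluate" is to compute the silhouette score for the labels produced at each candidate k. A minimal stdlib sketch; the three features (start y, entry y, lateral spread) and the toy points are hypothetical, and in practice the labels would come from K-means runs with k = 2..6 rather than being written by hand.

```python
import math


def silhouette(vectors, labels):
    """Mean silhouette coefficient: near 1 = well separated, near 0 = overlapping."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (v, lab) in enumerate(zip(vectors, labels)):
        same = [math.dist(v, w) for j, (w, l2) in enumerate(zip(vectors, labels))
                if l2 == lab and j != i]
        if not same:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(d) / len(d)
            for d in ([math.dist(v, w) for w, l2 in zip(vectors, labels) if l2 == other]
                      for other in clusters if other != lab)
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical features: (start y, entry y, lateral spread), normalized to [0, 1].
prototypes = {
    "right sideline": (0.9, 0.9, 0.05),
    "right to center": (0.9, 0.5, 0.40),
    "left sideline": (0.1, 0.1, 0.05),
    "left to center": (0.1, 0.5, 0.40),
}
offsets = [(0.0, 0.0, 0.0), (0.02, -0.02, 0.01), (-0.02, 0.01, -0.01)]
points, names = [], []
for name, base in prototypes.items():
    for off in offsets:
        points.append(tuple(b + o for b, o in zip(base, off)))
        names.append(name)

four_clusters = names  # each archetype kept separate
two_clusters = [n.split()[0] for n in names]  # "right" / "left": center patterns merged
print("k=4 silhouette:", round(silhouette(points, four_clusters), 3))
print("k=2 silhouette:", round(silhouette(points, two_clusters), 3))
```

Merging each sideline cluster with its cross-to-center neighbour (the k = 2 labeling) lowers the score, which is the failure mode described in error 1.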

Edges

Conventional Wisdom Is Wrong

Completed Long Balls Show Zero Effectiveness Advantage Over Short Combinations

Even when long balls are completed, buildups that use them show no statistical advantage in shot or goal probability. The speed gained appears to be offset by the loss of team shape. This is not about interceptions: even the long balls that work do not produce better outcomes.
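The claim above is a comparison of downstream shot rates between buildups that include a completed long ball and buildups built on short combinations. A minimal sketch using an unadjusted two-proportion z-test; the counts are hypothetical, and a real comparison would also need to control for confounders such as team strength and game state.

```python
import math


def two_proportion_z(shots_a, n_a, shots_b, n_b):
    """Two-sided z-test for a difference in shot rates, using the pooled standard error."""
    p_a, p_b = shots_a / n_a, shots_b / n_b
    pooled = (shots_a + shots_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # all or no buildups produced a shot: no evidence of a difference
        return p_a - p_b, 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return p_a - p_b, z, p_value


# Hypothetical counts: buildups with a completed long ball vs short-combination buildups.
diff, z, p = two_proportion_z(shots_a=60, n_a=400, shots_b=150, n_b=1000)
print(f"shot-rate difference={diff:+.3f}, z={z:.2f}, p={p:.3f}")
```

A large p-value here means the sample shows no detectable advantage; it does not prove the two approaches are identical, so the sample size behind a "no advantage" finding matters.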

What most people do
Count a completed long ball as a success and treat "he skipped the midfield" as a positive.
What the best do
Evaluate long balls by what happens after completion. Show coaches that even completed long balls produce no better downstream outcomes.
Why it's an edge: Don't pay a premium for "excellent long passing range" in buildup contexts where it provides no outcome advantage.
How to exploit: When scouting opponents relying on long-ball buildup, don't fear the completed long ball. Invest in short-combination speed (tempo) rather than long-ball range.
Benjamin (physicist), StatsBomb Conference, 2019-10-25

Sources

  • Benjamin (physicist), StatsBomb Innovation in Football Conference, YouTube, 2019-10-25 — presented K-means cluster analysis of buildup paths across 5 major leagues; identified 4 main buildup archetypes; showed sideline-to-center-cross pattern produces ~50% shot rate; demonstrated long balls show no effectiveness advantage