Cluster analysis (K-means or similar) can identify distinct archetypes of buildup paths: the spatial trajectories teams take when building from their own quarter to the offensive zone. Rather than analyzing individual possessions, clustering reveals 4-6 recurring buildup patterns across a league, each with its own shot and goal probability. The key finding: sideline-to-center cross patterns are the most dangerous (~50% shot rate), while buildups trapped on the sideline are the least effective.
For each buildup possession (one that starts in the team's own quarter and reaches the final quarter): (1) extract the sequence of ball positions as a spatial trajectory, (2) normalize trajectories to a standard pitch, (3) cluster with K-means or hierarchical clustering on spatial features (positions along the trajectory, lateral spread, final entry point). The four main clusters typically found:
The cross-to-center patterns are the most dangerous because the entry point into the final quarter is central. The sideline patterns are less effective because the team gets "trapped" wide. Footedness effects also appear: left-to-center crosses may be more effective than right-to-center crosses because right-footed players cut inside.
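The extract, normalize, cluster procedure and the per-cluster shot rates can be sketched as follows. This is a minimal illustration, not the original analysis: the pitch dimensions, the number of resampled points, the helper names, and the four toy possessions are all assumptions made for the example, and the small K-means loop is written out in plain Python only to keep the sketch self-contained (a library implementation such as scikit-learn's `KMeans` would be the usual choice).

```python
import random
from collections import defaultdict

PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0  # assumed pitch size in metres
N_POINTS = 5                             # assumed points per resampled path


def resample(path, n=N_POINTS):
    """Linearly interpolate a list of (x, y) positions to n points."""
    if len(path) == 1:
        return [path[0]] * n
    out = []
    for i in range(n):
        t = i * (len(path) - 1) / (n - 1)
        lo = int(t)
        hi = min(lo + 1, len(path) - 1)
        frac = t - lo
        out.append((path[lo][0] + frac * (path[hi][0] - path[lo][0]),
                    path[lo][1] + frac * (path[hi][1] - path[lo][1])))
    return out


def to_features(path):
    """Scale a path to a unit pitch and flatten it into one vector."""
    feats = []
    for x, y in resample(path):
        feats.extend((x / PITCH_LENGTH, y / PITCH_WIDTH))
    return feats


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(vectors, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; stops when assignments no longer change."""
    centers = random.Random(seed).sample(vectors, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def shot_rate_by_cluster(labels, shots):
    """Fraction of possessions in each cluster that ended in a shot."""
    totals, hits = defaultdict(int), defaultdict(int)
    for label, shot in zip(labels, shots):
        totals[label] += 1
        hits[label] += int(shot)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy possessions: (ball positions in metres, ended in a shot?)
possessions = [
    ([(10, 5), (50, 6), (90, 5)], False),    # stays on the sideline
    ([(12, 4), (55, 5), (92, 8)], False),    # stays on the sideline
    ([(10, 6), (50, 8), (90, 34)], True),    # sideline, then into the center
    ([(15, 5), (55, 10), (88, 36)], True),   # sideline, then into the center
]
vectors = [to_features(path) for path, _ in possessions]
labels, centers = kmeans(vectors, k=2)
rates = shot_rate_by_cluster(labels, [shot for _, shot in possessions])
print(labels, rates)
```

On real data, k would be chosen in the 4-6 range the clustering suggests, and features such as lateral spread or the final entry point could be appended to each vector before clustering.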
Even when long balls are completed, buildups that use them show no statistical advantage in shot or goal probability. The speed advantage is entirely offset by the loss of team shape. This is not an interception effect: even the long balls that reach a teammate do not produce better outcomes.
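One way to check a "no statistical advantage" claim like this is to compare shot rates between buildups with a completed long ball and all other buildups. The text does not say which test was used, so the two-proportion z-test below is an assumption, and the counts are hypothetical placeholders rather than results from any dataset.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: (buildups ending in a shot, total buildups)
long_ball_shots, long_ball_total = 120, 400   # completed long ball used
other_shots, other_total = 310, 1000          # no long ball used

z, p_value = two_proportion_z_test(
    long_ball_shots, long_ball_total, other_shots, other_total
)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

A large p-value here would mean the data give no evidence of a difference in shot rate between the two groups; the same comparison can be repeated with goals in place of shots.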