Cluster players by their zone-to-zone passing xT (expected threat) signature: a vector recording how much xT a player generates from passes between each pair of pitch zones. Players with similar passing signatures play similar roles in buildup (e.g., fullbacks who progress down the wing vs. fullbacks who recycle across the backline). Within each cluster, ranking by total xT generated identifies the best performers at that specific passing style. For scouting replacements this is more useful than generic player-similarity scores, because it groups players by HOW they create threat, not just how much.
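A minimal sketch of building one player's signature, using NumPy. It assumes pitch coordinates scaled to 0 to 100 on both axes with play running left to right, a hypothetical 5 × 3 grid for the 15 macro-zones, and that each pass already carries an xT gain computed as xT(end) minus xT(start). The pass tuple format and the names `zone_of` and `passing_signature` are illustrative, not taken from any particular data provider.

```python
import numpy as np

# Hypothetical layout: 5 columns along the pitch length x 3 rows across its width.
N_COLS, N_ROWS = 5, 3
N_ZONES = N_COLS * N_ROWS  # 15 macro-zones


def zone_of(x, y, length=100.0, width=100.0):
    """Map a pitch coordinate to a zone index in [0, N_ZONES).

    Assumes coordinates scaled to 0..length and 0..width; points on the far
    touchline or goal line are clamped into the last row/column.
    """
    col = min(int(x / length * N_COLS), N_COLS - 1)
    row = min(int(y / width * N_ROWS), N_ROWS - 1)
    return row * N_COLS + col


def passing_signature(passes):
    """Sum xT gain into an N_ZONES x N_ZONES (origin, destination) matrix, then flatten.

    `passes` is an iterable of (x_start, y_start, x_end, y_end, xt_gain) tuples;
    xt_gain is assumed to be precomputed as xT(end) - xT(start).
    """
    matrix = np.zeros((N_ZONES, N_ZONES))
    for x0, y0, x1, y1, xt_gain in passes:
        matrix[zone_of(x0, y0), zone_of(x1, y1)] += xt_gain
    return matrix.ravel()  # length N_ZONES ** 2


# Toy passes (made-up values): one wide progression, one square recycle at the back.
sig = passing_signature([(10, 90, 60, 95, 0.02), (15, 50, 15, 20, -0.001)])
print(sig.shape)  # (225,)
```

Each entry of the flattened vector corresponds to one (origin zone, destination zone) pair, so two players with the same total xT can still have very different signatures.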
(1) Divide the pitch into N zones (15 macro-zones for computational feasibility, or 150 micro-zones for higher fidelity). (2) For each player, compute an N×N zone-to-zone passing matrix: how much xT they generate from passes originating in zone X and ending in zone Y. Flatten it into a feature vector of length N² (225 features for 15 zones, 22,500 for 150, which is why the coarse grid is cheaper). (3) Cluster players using K-means or a similar algorithm. (4) Interpret clusters by which positions are overrepresented and by the spatial patterns of their passing. (5) Within each cluster, rank players by total xT generated to find the best performers at that passing style.
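Steps 3 to 5 can be sketched as follows, again assuming NumPy. K-means is written out as plain Lloyd iterations to keep the example dependency-free (scikit-learn's `KMeans` would serve the same purpose). Each signature is scaled to unit length before clustering, an assumed design choice so that clusters reflect where a player's threat comes from rather than how much of it there is; the within-cluster ranking then uses the unscaled totals. Position "lift" is one simple, assumed way to measure overrepresentation. All function names are hypothetical.

```python
import numpy as np
from collections import Counter


def normalize_rows(X):
    """Scale each player's signature to unit L2 norm (assumed design choice),
    so clustering compares the shape of passing threat rather than its volume."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means on the rows of X. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each player to the nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def position_lift(labels, positions):
    """Step 4: per cluster, each position's share in the cluster divided by its
    share in the whole player pool (> 1 means overrepresented)."""
    labels = np.asarray(labels).tolist()
    overall = Counter(positions)
    n = len(positions)
    lifts = {}
    for c in sorted(set(labels)):
        members = Counter(p for p, l in zip(positions, labels) if l == c)
        size = sum(members.values())
        lifts[c] = {p: (cnt / size) / (overall[p] / n) for p, cnt in members.items()}
    return lifts


def rank_within_clusters(names, X_raw, labels):
    """Step 5: rank players inside each cluster by total xT (sum of the raw signature)."""
    labels = np.asarray(labels)
    totals = X_raw.sum(axis=1)
    ranking = {}
    for c in sorted(set(labels.tolist())):
        idx = np.flatnonzero(labels == c)
        order = idx[np.argsort(-totals[idx])]
        ranking[c] = [(names[i], float(totals[i])) for i in order]
    return ranking
```

Given a players × N² signature matrix `X`, a typical call sequence would be `labels, _ = kmeans(normalize_rows(X), k)` followed by `rank_within_clusters(names, X, labels)`. Lloyd's algorithm only reaches a local optimum, so trying several seeds and values of `k` is advisable.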
Key distinction: two fullback clusters might emerge, one dominated by direct wing-to-box progression (high xT, high variance) and another by conservative backline recycling (low xT, low variance). When scouting a replacement for a progressive fullback, search within the progressive cluster, not across all fullbacks: signing a conservative recycler to replace a progressive wing-back creates a system mismatch that aggregate metrics don't predict.
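A sketch of the replacement search, assuming the cluster labels and per-player total xT produced by the clustering and ranking steps are already available as plain lists. Restricting to the target's position and the `top_n` cutoff are illustrative choices, and `replacement_shortlist` is a hypothetical name; the toy values below are made up.

```python
def replacement_shortlist(target, names, positions, labels, total_xt, top_n=5):
    """Shortlist replacements for `target` from its own passing-style cluster.

    names, positions, labels and total_xt are parallel lists (one entry per player).
    Only players in the same cluster and at the same position are considered,
    ranked by total xT generated, highest first.
    """
    i = names.index(target)
    pool = [
        j for j in range(len(names))
        if j != i and labels[j] == labels[i] and positions[j] == positions[i]
    ]
    pool.sort(key=lambda j: total_xt[j], reverse=True)
    return [names[j] for j in pool[:top_n]]


# Toy example: two progressive fullbacks share the target's cluster (0);
# the conservative fullbacks in cluster 1 are never considered.
names = ["Target", "Prog A", "Prog B", "Recycler A", "Recycler B"]
positions = ["FB"] * 5
labels = [0, 0, 0, 1, 1]
total_xt = [3.1, 1.9, 2.4, 0.5, 0.7]
print(replacement_shortlist("Target", names, positions, labels, total_xt))
# ['Prog B', 'Prog A']
```

Because the cluster filter is applied before ranking, a recycler can never appear on the shortlist for a progressive fullback, however the aggregate numbers compare.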