Graph Neural Network Modeling on 360 Data

Data Infrastructure | Level 4: Expert

What It Is

Representing StatsBomb 360 freeze frames as graphs (nodes = players, edges = relationships) and processing them through graph neural networks (GNNs) to predict possession outcomes. Unlike tabular approaches (which require hand-crafted features like "distance to nearest defender" and break when player counts vary) or image approaches (which lose structural information), graphs naturally handle variable numbers of players and preserve relational information. The model predicts three possession outcomes: shot, short possession (<5 seconds), or long possession (>5 seconds), enabling analysis of both attacking threat and defensive structure.
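To illustrate why a graph copes with a varying number of visible players, here is a minimal sketch of one round of neighbor aggregation followed by a mean readout. It uses no learned weights and is not the architecture from the presentation; the function names `message_pass` and `readout` and the toy coordinates are assumptions made for illustration.

```python
def message_pass(nodes, edges):
    """One round of mean-neighbor aggregation over undirected edges (no learned weights)."""
    neighbours = {i: [] for i in range(len(nodes))}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated = []
    for i, feat in enumerate(nodes):
        nbrs = neighbours[i]
        if nbrs:
            agg = [sum(nodes[j][d] for j in nbrs) / len(nbrs) for d in range(len(feat))]
        else:
            agg = [0.0] * len(feat)
        # Concatenate the node's own features with the mean of its neighbors.
        updated.append(list(feat) + agg)
    return updated


def readout(nodes):
    """Average node vectors into one fixed-size vector, whatever the node count."""
    n = len(nodes)
    return [sum(row[d] for row in nodes) / n for d in range(len(nodes[0]))]


# Toy frames: node 0 is the ball carrier, edges connect it to every other player.
small_nodes = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
small_edges = [(0, 1), (0, 2)]
large_nodes = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 2.0]]
large_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]

small_vec = readout(message_pass(small_nodes, small_edges))
large_vec = readout(message_pass(large_nodes, large_edges))
print(len(small_vec), len(large_vec))
```

Both frames produce a vector of the same length, so the same downstream classifier can consume a frame with 3 visible players or 15 without padding or feature redesign.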

Correct Execution

Graph construction: For each 360 frame, create a graph where:

  • Nodes = each visible player. Node features: (x, y) location, is_ball_carrier flag, is_goalkeeper flag, distance to ball carrier, angle to ball carrier
  • Edges = relationships between ball carrier and other players. Edge features: is_teammate or is_opponent
  • Global features (outside graph): play context (open play, free kick), body part, field of view extent
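The construction above can be sketched in plain Python. This assumes each freeze-frame entry is a dict with `location`, `actor`, `teammate`, and `keeper` keys, as in the public StatsBomb 360 JSON; `build_graph` is a hypothetical helper name, and the two-column edge encoding is one possible reading of "is_teammate or is_opponent".

```python
import math


def build_graph(freeze_frame):
    """Turn one 360 freeze frame into node features and a star-shaped edge list.

    Assumed entry format (not confirmed by the source):
    {"location": [x, y], "actor": bool, "teammate": bool, "keeper": bool}
    """
    carrier_idx = next(i for i, p in enumerate(freeze_frame) if p["actor"])
    cx, cy = freeze_frame[carrier_idx]["location"]

    nodes = []
    for p in freeze_frame:
        x, y = p["location"]
        dx, dy = x - cx, y - cy
        nodes.append([
            x,
            y,
            1.0 if p["actor"] else 0.0,   # is_ball_carrier flag
            1.0 if p["keeper"] else 0.0,  # is_goalkeeper flag
            math.hypot(dx, dy),           # distance to ball carrier
            math.atan2(dy, dx),           # angle to ball carrier, in radians
        ])

    edges, edge_features = [], []
    for i, p in enumerate(freeze_frame):
        if i == carrier_idx:
            continue
        edges.append((carrier_idx, i))    # ball carrier -> other player
        is_teammate = 1.0 if p["teammate"] else 0.0
        edge_features.append([is_teammate, 1.0 - is_teammate])
    return nodes, edges, edge_features


# Toy frame with made-up coordinates on a 120 x 80 pitch.
frame = [
    {"location": [60.0, 40.0], "actor": True, "teammate": True, "keeper": False},
    {"location": [63.0, 44.0], "actor": False, "teammate": False, "keeper": False},
    {"location": [70.0, 40.0], "actor": False, "teammate": True, "keeper": False},
]
nodes, edges, edge_features = build_graph(frame)
print(edges)
```

Global features such as play context and body part stay outside the graph and would be concatenated to the pooled graph vector before classification.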

Key design decision: encode the field of view as a global feature — the model learns that seeing 15 players in a narrow view means something different from seeing 2 players in a wide view. The model infers off-screen context from what it can see.
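One way to turn the field of view into global features is to summarize the visible-area polygon. The sketch below assumes the polygon arrives as a flat `[x1, y1, x2, y2, ...]` list on a 120 x 80 pitch, as in the public StatsBomb 360 data; the choice of three summary features and the name `visible_area_features` are illustrative assumptions, not the encoding used in the presentation.

```python
def visible_area_features(visible_area, pitch_length=120.0, pitch_width=80.0):
    """Summarize a visible-area polygon given as a flat [x1, y1, x2, y2, ...] list."""
    xs = visible_area[0::2]
    ys = visible_area[1::2]
    n = len(xs)

    # Shoelace formula; a repeated closing vertex adds a zero-length edge and is harmless.
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    area = abs(twice_area) / 2.0

    return {
        "visible_fraction": area / (pitch_length * pitch_width),
        "x_extent": max(xs) - min(xs),
        "y_extent": max(ys) - min(ys),
    }


# Toy trapezoid covering the middle of the pitch (made-up coordinates).
polygon = [30.0, 0.0, 90.0, 0.0, 80.0, 80.0, 40.0, 80.0, 30.0, 0.0]
fov = visible_area_features(polygon)
print(fov)
```

Pairing these values with the player count lets a model distinguish a crowded narrow view from a sparse wide one.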

Training: Classify each possession state as shot, short possession, or long possession. The model learns that a high short-possession probability means the ball carrier is under heavy pressure with blocked passing lanes, while a high shot probability means there are open passing options into attacking territory.
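The three-way target can be sketched as a labeling rule plus a softmax over the model's output scores. The 5-second threshold comes from the definition above; treating a shot as taking precedence over duration, assigning exactly 5 seconds to the long class, and the names `possession_label` and `softmax` are assumptions made for this sketch.

```python
import math

CLASSES = ["shot", "short_possession", "long_possession"]


def possession_label(ends_in_shot, seconds_remaining, threshold=5.0):
    """Assign a training label to one possession state.

    Assumptions: a shot overrides duration, and a possession lasting exactly
    `threshold` seconds counts as long.
    """
    if ends_in_shot:
        return "shot"
    if seconds_remaining < threshold:
        return "short_possession"
    return "long_possession"


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = [
    possession_label(True, 2.0),
    possession_label(False, 2.0),
    possession_label(False, 12.0),
]
probs = softmax([2.0, 1.0, 0.1])  # made-up scores for one frame
predicted = CLASSES[probs.index(max(probs))]
print(labels, predicted)
```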

Coaching Cues

  • "The model can infer what's off-screen from what's on-screen."
  • "Graphs handle the 'how many defenders' problem naturally. No hand-crafting needed."
  • "15 players in a narrow frame means something completely different from 2 players in a wide frame."

Common Errors

  1. Forcing 360 data into tabular format: Hand-crafting features like "distance to 1st/2nd/3rd nearest defender" loses information and breaks when players are missing.
  2. Not encoding field of view: Without knowing how much of the pitch is visible, the model can't calibrate its predictions for narrow vs. wide camera angles.
  3. Reading one model's output the same way for attack and defense: A single possession outcome model covers both (shot probability = attacking threat; short possession probability = defensive pressure), but the interpretation depends on which team is in possession.
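The first error in the list above is easy to reproduce. The sketch below builds the "distance to 1st/2nd/3rd nearest defender" feature and shows what happens when only one defender is inside the camera view; the function name and the `None` padding are illustrative assumptions.

```python
import math


def k_nearest_defender_distances(carrier_xy, defenders_xy, k=3, pad=None):
    """Fixed-width tabular feature: sorted distances to the k nearest defenders."""
    dists = sorted(math.dist(carrier_xy, d) for d in defenders_xy)
    # Fewer than k visible defenders forces a padding value into the row.
    return (dists + [pad] * k)[:k]


carrier = (60.0, 40.0)
full_view = k_nearest_defender_distances(
    carrier, [(63.0, 44.0), (66.0, 48.0), (60.0, 55.0)]
)
narrow_view = k_nearest_defender_distances(carrier, [(63.0, 44.0)])
print(full_view)
print(narrow_view)
```

Whatever padding value is chosen (`None`, zero, a large number), the tabular model has to treat "not visible" as if it were a distance, which is the information loss described above.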

Edges

💎 Elite-Only Behavior

360 Data Makes Player Positioning Visible — But Only 500 Manual Labels Are Needed to Classify All Phases of Play

The bottleneck in tactical phase classification isn't model complexity; it's labeled training data. With graph convolutional network (GCN) embeddings from 360 data, only ~500 manually labeled actions (about 30 minutes of analyst time) are sufficient to train a simple classifier that accurately labels all remaining actions into phases of play. The embedding captures the spatial structure; the classifier just needs a few examples of each phase. Set against the hundreds of hours per season that manual video coding takes, that is a reduction in labeling time of 100x or more.
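A minimal sketch of the few-label step, assuming the embeddings already exist as fixed-length vectors. A nearest-centroid rule stands in here for the "simple classifier" (the source does not say which classifier was used), and the phase names and 2-D toy vectors are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(embeddings, labels):
    """Average the labeled embeddings of each phase into one centroid per phase."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Label an unlabeled embedding with the phase of its nearest centroid."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy 2-D embeddings standing in for GCN output; phase names are hypothetical.
labeled_vecs = [[0.1, 0.2], [0.0, 0.3], [0.9, 0.8], [1.0, 0.7]]
labeled_phases = ["build_up", "build_up", "high_press", "high_press"]
centroids = fit_centroids(labeled_vecs, labeled_phases)

unlabeled = [[0.2, 0.1], [0.8, 0.9]]
auto_labels = [predict(centroids, v) for v in unlabeled]
print(auto_labels)
```

In practice the ~500 labeled actions would replace the four toy vectors, and any lightweight classifier could take the place of the centroid rule.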

What most people do
Manually code phases of play from video, requiring hundreds of hours of work per season.
What the best do
Train GCN embeddings on 360 data (unsupervised), then label ~500 actions across all phase types, train a lightweight classifier, and auto-classify the entire season. The time investment is 30 minutes of labeling + a few hours of compute, vs. hundreds of hours of manual coding.
Why it's an edge: The speed difference makes tactical phase analysis feasible for every match rather than a select few. Clubs with this capability can run phase-specific analysis for every opponent, while competitors manually code only priority matches.
How to exploit: Build the GCN embedding pipeline. Have your analyst label 500 actions at the start of each season. Auto-classify all matches. Run phase-specific analysis for every opponent preparation, not just the top-6 matches.
Source: StatsBomb Conference presentations on GCN action embeddings with 360 data.

Sources

  • StatsBomb, StatsBomb Conference 2021, YouTube, 2021-11-04 — presented GNN architecture for 360 data possession outcome prediction; showed Leeds/Liverpool/Manchester City defensive tendency maps; demonstrated frame-level opposition analysis capability