Action Embeddings via Graph Convolutional Networks

Data Infrastructure | Level 4: Expert

What It Is

This technique uses Graph Convolutional Networks (GCNs) with a Word2Vec-inspired training objective to generate dense vector representations ("embeddings") of football actions from 360 data. Each action is represented as a graph (players as nodes, relationships as edges), and the network is trained to predict the surrounding actions in a sequence, the same principle behind Word2Vec in NLP. Similar actions (same phase, similar player configurations) cluster together in embedding space, which enables few-shot classification of phases of play, similarity search across matches, and even football "analogies" (action A is to action B as action C is to ?).
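To make the "predict surrounding actions" idea concrete, here is a minimal sketch of how skip-gram training pairs could be drawn from one possession. The function name `skipgram_pairs`, the action ids, and the window size are illustrative assumptions; the talk does not specify them.

```python
from typing import Iterator, Sequence, Tuple


def skipgram_pairs(
    possession: Sequence[str], window: int = 2
) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) action-id pairs at most `window` steps apart."""
    for i, center in enumerate(possession):
        lo = max(0, i - window)
        hi = min(len(possession), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # an action is never its own context
                yield center, possession[j]


# Hypothetical action ids for one possession, in chronological order.
possession = ["a1", "a2", "a3", "a4"]
pairs = list(skipgram_pairs(possession, window=1))
```

Each pair is one training example: the model sees the graph of the center action and is asked to score the context action higher than actions drawn from elsewhere.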

Correct Execution

  1. Represent each action as a graph using 360 data: nodes = players with position features, edges = relationships (teammate/opponent).
  2. Train a GCN to predict the surrounding actions in a possession sequence (Word2Vec's skip-gram approach applied to football).
  3. Use the trained model to produce a fixed-length vector embedding for any action.
  4. Phase classification: label ~500 actions manually (about 30 minutes of work), train a simple classifier on their embeddings, then classify all remaining actions.
  5. Similarity search: find the most similar historical actions to a given action via nearest-neighbor lookup in embedding space.
  6. Analogies: perform vector arithmetic (action1 - action2 + action3) to find actions that satisfy football analogies.
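Once embeddings exist, the downstream steps (phase classification, similarity search, analogies) reduce to simple vector operations. The sketch below uses hand-made 2-D vectors in place of real embeddings; the action ids, phase names, and the nearest-centroid classifier are assumptions chosen for brevity, since the source only says "a simple classifier".

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(
    query: Vector, index: Dict[str, Vector], k: int = 3, exclude: Iterable[str] = ()
) -> List[str]:
    """Ids of the k embeddings most similar to `query`, best first."""
    skip = set(exclude)
    candidates = [key for key in index if key not in skip]
    candidates.sort(key=lambda key: cosine(query, index[key]), reverse=True)
    return candidates[:k]


def centroid_classifier(
    labelled: Dict[str, str], index: Dict[str, Vector]
) -> Callable[[Vector], str]:
    """Fit a nearest-centroid classifier on a small labelled subset."""
    groups: Dict[str, List[Vector]] = {}
    for action_id, label in labelled.items():
        groups.setdefault(label, []).append(index[action_id])
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }
    return lambda vec: max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


def analogy(a: Vector, b: Vector, c: Vector) -> List[float]:
    """Vector arithmetic a - b + c, as in step 6."""
    return [x - y + z for x, y, z in zip(a, b, c)]


# Toy stand-ins for learned embeddings (hypothetical ids and values).
index = {
    "build_up_1": [1.0, 0.1],
    "build_up_2": [0.9, 0.2],
    "press_1": [0.1, 1.0],
    "press_2": [0.2, 0.9],
    "unlabelled": [0.8, 0.3],
}
labelled = {
    "build_up_1": "build-up",
    "build_up_2": "build-up",
    "press_1": "high press",
    "press_2": "high press",
}

classify = centroid_classifier(labelled, index)
predicted = classify(index["unlabelled"])
neighbours = nearest(index["unlabelled"], index, k=2, exclude={"unlabelled"})

inputs = ("build_up_1", "build_up_2", "press_2")
target = analogy(*(index[name] for name in inputs))
analogy_hits = nearest(target, index, k=2, exclude=inputs)
```

Excluding the query actions from the candidate set matters for analogies: otherwise one of the three inputs is often returned as its own answer.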

Key insight: embeddings capture player configuration context even when the number of visible players differs between frames. Two "pass to keeper" actions with 10 vs. 20 visible players can still be recognized as similar because the graph structure captures the relevant relationships.
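The following sketch illustrates why the embedding length does not depend on how many players are visible. It applies one graph-convolution step with separate weights for teammate and opponent edges, then mean-pools over nodes. The weights are random and untrained, and the relation-specific layer, the `(x, y, is_teammate)` player tuple, and all names are assumptions for illustration, not the architecture from the talk.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Player = Tuple[float, float, bool]  # (x, y, is_teammate): hypothetical layout


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relation_adjacency(players: List[Player], same_team: bool) -> Matrix:
    """Row-normalised adjacency linking each player to teammates or opponents."""
    rows = []
    for i, (_, _, team_i) in enumerate(players):
        row = [
            1.0 if i != j and (team_i == team_j) == same_team else 0.0
            for j, (_, _, team_j) in enumerate(players)
        ]
        degree = sum(row)
        rows.append([v / degree if degree else 0.0 for v in row])
    return rows


def gcn_embed(
    players: List[Player], w_self: Matrix, w_team: Matrix, w_opp: Matrix
) -> List[float]:
    """One relation-aware graph-convolution step followed by mean pooling."""
    h = [[x, y, 1.0 if mate else 0.0] for x, y, mate in players]
    terms = [
        matmul(h, w_self),
        matmul(matmul(relation_adjacency(players, True), h), w_team),
        matmul(matmul(relation_adjacency(players, False), h), w_opp),
    ]
    # Sum the self, teammate and opponent messages per node, then apply ReLU.
    hidden = [[max(0.0, sum(vals)) for vals in zip(*rows)] for rows in zip(*terms)]
    # Mean-pool over nodes: the output length is set by the weight shapes only.
    return [sum(col) / len(hidden) for col in zip(*hidden)]


EMBED_DIM = 4
rng = random.Random(0)


def random_weights(n_rows: int, n_cols: int) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]


w_self, w_team, w_opp = (random_weights(3, EMBED_DIM) for _ in range(3))

# Two frames with different numbers of visible players (made-up coordinates).
frame_small = [(0.1, 0.5, True), (0.2, 0.4, True), (0.3, 0.5, False)]
frame_large = frame_small + [(0.6, 0.2, True), (0.7, 0.8, False), (0.9, 0.5, False)]

emb_small = gcn_embed(frame_small, w_self, w_team, w_opp)
emb_large = gcn_embed(frame_large, w_self, w_team, w_opp)
```

Both `emb_small` and `emb_large` have `EMBED_DIM` entries, so frames with 3 and 6 visible players can be compared directly. Whether they end up close together depends on training, which this sketch omits.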

Coaching Cues

  • "Label 500 actions in 30 minutes. The model classifies the rest. That's the promise of embeddings."
  • "Similar actions cluster together — even when the number of players in the frame is different."

Common Errors

  1. Over-interpreting analogy results: Football analogies are less clean than word analogies. The second-nearest neighbor may be more meaningful than the first, so inspect several top-ranked results rather than only the top one.
  2. Assuming embeddings are ready for production: This is early-stage research. Rigorous quantitative evaluation (not just manual inspection) is needed before using embeddings for decisions.
  3. Not including player velocity/orientation: Current 360 data lacks these; adding them would likely improve embedding quality.
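As one example of evaluation that goes beyond manual inspection, the sketch below estimates held-out accuracy of a phase classifier with k-fold cross-validation. The 1-nearest-neighbour classifier, the fold count, and the toy data are assumptions for illustration; they are not taken from the talk.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (embedding, phase label)


def predict_1nn(train: List[Example], vec: Sequence[float]) -> str:
    """Label of the closest training embedding (Euclidean distance)."""
    return min(train, key=lambda ex: math.dist(ex[0], vec))[1]


def cross_val_accuracy(examples: List[Example], folds: int = 5, seed: int = 0) -> float:
    """Fraction of examples classified correctly when held out of training."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    correct = 0
    for f in range(folds):
        held_out = shuffled[f::folds]
        train = [ex for i, ex in enumerate(shuffled) if i % folds != f]
        correct += sum(predict_1nn(train, vec) == label for vec, label in held_out)
    return correct / len(shuffled)


# Toy data: two well-separated clusters standing in for labelled embeddings.
examples: List[Example] = [([0.01 * i, 0.0], "build-up") for i in range(10)]
examples += [([5.0 + 0.01 * i, 5.0], "high press") for i in range(10)]

accuracy = cross_val_accuracy(examples, folds=5)
```

On real embeddings the score will be lower than on this toy set, and comparing it against a baseline (for example, always predicting the most common phase) gives a clearer picture than inspecting clusters by eye.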

Sources

  • Juan Camilo Campos, Genius Sports, StatsBomb Conference 2021, YouTube, 2021-11-04 — presented GCN-based action embeddings using Word2Vec principles on 360 data; demonstrated phase classification with 500 labels, similarity search, and football analogies