Effect Identification

Model-Based Betting | Level 2 (Informed)

What It Is

Effect identification is the process of establishing a real causal relationship between an observable factor and betting outcomes before you build a model. An effect is a direct mechanism that the market may be mispricing. A model is a tool that helps you find and quantify that effect. The two are not the same thing.

The core distinction: "There is a distinction to be made and it's a very important distinction between an edge which comes from an effect and a model which is a tool to help you identify and quantify the effect which can lead to an edge. It is not necessarily true that a model provides an edge. A model only provides an edge when it is identifying an effect that allows you to beat the vig." — Andrew Mack

Correct Execution

You start every modeling project by naming the effect: a specific, causal, testable mechanism. You build a model only after you can articulate why the effect should exist, and you check whether the market has already priced it before investing in model development. You treat data mining (throwing every available stat into a regression and hoping something falls out) as working in the wrong direction: it produces models that look good in backtests but lose money live.
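
To make "beat the vig" concrete, it can help to see what win rate a price actually demands. The following is a minimal sketch, assuming decimal odds, a two-way market, and simple proportional vig removal (one of several de-vigging conventions). The function names are hypothetical and are not taken from the source.

```python
from typing import Tuple


def implied_prob(decimal_odds: float) -> float:
    """Raw implied probability of a decimal price (still includes the vig)."""
    return 1.0 / decimal_odds


def no_vig_probs(odds_a: float, odds_b: float) -> Tuple[float, float]:
    """Rescale both sides of a two-way market so the probabilities sum to 1.

    Assumption: the vig is spread proportionally across both sides.
    """
    raw_a, raw_b = implied_prob(odds_a), implied_prob(odds_b)
    overround = raw_a + raw_b
    return raw_a / overround, raw_b / overround


def expected_roi(true_prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked if true_prob is the real win rate."""
    return true_prob * decimal_odds - 1.0


# A -110 / -110 market is roughly 1.909 in decimal odds on each side.
fair_a, fair_b = no_vig_probs(1.909, 1.909)
break_even = implied_prob(1.909)

print(f"market's no-vig estimate: {fair_a:.3f}")
print(f"win rate needed to break even: {break_even:.3f}")
print(f"ROI if the effect lifts the true rate to 55%: {expected_roi(0.55, 1.909):.3f}")
```

Under these assumptions, an effect is only worth modeling if it plausibly moves the true win probability past the break-even rate, not merely past the market's no-vig estimate.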

Coaching Cues

  • "Start with effect, not with the model." — Andrew Mack, Circles Off Ep. #185
  • "There is no edge in a model that does not identify and quantify an effect. Modeling is not an inherent edge in itself. It is only a means to an end." — Andrew Mack, Circles Off Ep. #185
  • "If you start with the effect you're going to be much better off. If you start with the model and hope you get lucky somewhere along the way, that's the hardest way to do it." — Andrew Mack, Circles Off Ep. #185
  • "The magic is the effect. The ML algo just helps you identify it when you have a tremendous amount of data." — Andrew Mack, Circles Off Ep. #185

Common Errors

  1. Starting with a modeling technique: "I'll try XGBoost and see what it finds" → The ML algorithm finds correlations, not causes → Start with a hypothesis about a causal mechanism
  2. Data mining to confirmation: Tweak parameters until metrics look good without improving the effect quantification → "You've done nothing — you've just made it look good" → Test one hypothesis at a time
  3. Assuming public data contains unknown information: Basketball-Reference and Hockey-Reference data are fully priced in → Using only publicly available data gives you no information advantage → Look for new, less-incorporated data sources or novel analytical approaches
  4. Applying a model from one sport to another without verifying the effect exists there too: Andrew Mack lost money applying a hockey model to EPL during COVID → "Every sport deserves its own respect" → Identify effects specific to each sport
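
The first two errors can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of anyone's workflow: outcomes are coin flips, every "stat" is pure random noise, and the sample sizes and seed are arbitrary. It mines the training games for the best-looking feature and then scores that same feature on games it was not selected on.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n_train, n_holdout, n_features = 200, 200, 100
n_games = n_train + n_holdout

# Coin-flip outcomes and pure-noise "stats": no real effect exists here.
outcomes = [rng.randint(0, 1) for _ in range(n_games)]
features = [[rng.gauss(0, 1) for _ in range(n_games)] for _ in range(n_features)]

# Mine the training games for whichever feature correlates best.
train_corrs = [abs(pearson(f[:n_train], outcomes[:n_train])) for f in features]
best_index = max(range(n_features), key=train_corrs.__getitem__)
best_train = train_corrs[best_index]

# Score the same feature on games it was not selected on.
holdout_corr = abs(pearson(features[best_index][n_train:], outcomes[n_train:]))

print(f"best of {n_features} noise features, training: {best_train:.3f}")
print(f"same feature, holdout: {holdout_corr:.3f}")
```

The training figure is the largest of 100 chance correlations, so it overstates what the feature knows. Nothing causal was put into the data, which is why a hypothesis stated in advance is a stronger filter than a search over every available column.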

Edges

💎 Elite-Only Behavior

The Effect Is the Moat; the Model Is the Infrastructure

Two bettors with identical modeling skills will have very different results if one has found a real causal effect and the other hasn't. The effect is the durable competitive advantage — it represents genuine market mispricing. The model is just infrastructure to exploit it. Most bettors focus 90% of their energy on models and 10% on effects, when the ratio should be reversed.

What most people do
Spend months perfecting a regression or ML pipeline before testing whether there's a real effect to identify. Assume model sophistication = edge.
What the best do
Identify an effect first. Build only enough model to quantify it precisely. "There are situations where you don't even need a model to harness an effect — the effect is the thing, the model is optional infrastructure."
Why it's an edge: Effect scarcity is real. There are fewer genuinely exploitable effects than there are people capable of building sophisticated models. The bottleneck is finding the effect, not building the model.
How to exploit: Before any model work, write down in one sentence the causal mechanism you're trying to exploit. If you can't, stop and find one first.
"It is not necessarily true that a model provides an edge. A model only provides an edge when it is identifying an effect that allows you to beat the vig." — Andrew Mack, Circles Off Ep. #185

Conventional Wisdom Is Wrong

Fading Overcorrection When New Data Goes Mainstream

When a new data source or methodology goes mainstream, the market can overcorrect toward it, and simpler older approaches may temporarily outperform. When expected goals (xG) models became popular in hockey, Corsi (simpler shot-attempt counting) outperformed them for a while. The overcorrection creates the opposite edge: fade the new thing while everyone else is adopting it.

What most people do
Immediately incorporate new data or methodologies as soon as they become available, assuming more sophisticated = better edge.
What the best do
Look for holes in new data before adopting it. Ask: "Is the market now overweighting this new signal? Are there data integrity issues? Is the complexity hiding overfitting?"
Why it's an edge: The rush to adopt new methods creates temporary overfitting. The contrarian who knows the limitations of new data profits while others overcorrect.
How to exploit: When a new methodology goes mainstream, run it as a second model alongside your existing approach. If they disagree, investigate which is right. Don't assume new = better.
"When expected goals models became popular in hockey, I had a theory they were overfitting... and Corsi actually outperformed them for a while." — Andrew Mack, Circles Off Ep. #185

Conventional Wisdom Is Wrong

Complex Models Often Don't Beat a Moving Average

Andrew Mack built a 25-variable rolling ridge regression that outperformed a 200-day moving average by one-third of 1%. Before declaring any model has edge, benchmark it against the simplest possible baseline. The added overfitting risk from complexity may exceed the marginal improvement.
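
A baseline of this kind takes only a few lines to build. The sketch below assumes a plain list of numeric results (for example, game totals) and scores a rolling-mean forecast by mean absolute error. The data values, the window length, and the function names are hypothetical placeholders, and the error metric is an illustrative choice rather than the one used in the source.

```python
from statistics import mean


def rolling_mean_forecasts(values, window):
    """Forecast each value from the mean of the `window` values before it."""
    return [mean(values[i - window:i]) for i in range(window, len(values))]


def mean_abs_error(forecasts, actuals):
    """Average absolute miss between paired forecasts and actual results."""
    return mean(abs(f - a) for f, a in zip(forecasts, actuals))


def relative_improvement(complex_error, baseline_error):
    """Fraction by which the complex model lowers error versus the baseline."""
    return (baseline_error - complex_error) / baseline_error


# Placeholder results, not real data.
totals = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]
window = 3

baseline = rolling_mean_forecasts(totals, window)
actuals = totals[window:]  # the results the baseline tried to predict
baseline_mae = mean_abs_error(baseline, actuals)

print(f"baseline MAE: {baseline_mae:.3f}")
```

To use it, score the complex model's out-of-sample forecasts on the same `actuals` and pass both errors to `relative_improvement`. How large a gain justifies the extra complexity is a judgment call that the code does not settle.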

What most people do
Build increasingly complex models (XGBoost, neural nets, ensemble methods) without comparing to a simple baseline. Assume complexity = edge.
What the best do
Before deploying any model, run it against the simplest reasonable alternative (moving average, basic Elo, home advantage + league table). Only accept the complex model if the improvement justifies the added overfitting risk.
Why it's an edge: Most modelers never run this comparison, leading to over-engineered models that are fragile in live betting. The bettor using a simple, robust model often outperforms the one using a complex, overfitted one.
How to exploit: For every model you build, also build the "dumbest version that could work" — a simple average, Elo, or rolling mean. Compare out-of-sample performance. If the complex model doesn't beat the simple one by at least 1% ROI, use the simple one.
"I built a complex 25-variable rolling ridge regression. It performed marginally better than a 200-day moving average — to the tune of a third of 1% better." — Andrew Mack, The Outlier Podcast, 2025

💎 Elite-Only Behavior

Most Bettors Stay in Sponge Mode Forever and Never Execute

There are two distinct phases of development: sponge (absorb everything, find edges, explore) and operator (reduce noise, narrow focus, execute with conviction). The transition signal is clear — you have a validated effect. Most bettors never make the transition, staying in perpetual learning mode that prevents committed execution.
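
One common yardstick for "validated" is closing line value (CLV): whether the price you bet was better than the price the market closed at. The sketch below assumes decimal odds and uses the simple ratio of bet price to closing price, which is one convention among several (others compare against a no-vig close). The function names and sample bets are hypothetical.

```python
from statistics import mean


def clv(bet_odds: float, closing_odds: float) -> float:
    """Closing line value as a fraction: positive when you beat the close."""
    return bet_odds / closing_odds - 1.0


def average_clv(bets):
    """Mean CLV over an iterable of (bet_odds, closing_odds) pairs."""
    return mean(clv(bet, close) for bet, close in bets)


# Placeholder bet log, not real data: (price taken, closing price).
bets = [(2.00, 1.90), (1.95, 1.95), (2.10, 2.20)]
avg = average_clv(bets)

print(f"average CLV over {len(bets)} bets: {avg:.4f}")
```

A handful of bets says little either way. The point of tracking this over a sustained period is that a persistently positive average is harder to attribute to luck.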

What most people do
Continuously consume new content, test new approaches, follow new data sources — indefinitely. Always learning, never executing with sustained conviction on a validated edge.
What the best do
Recognize the transition point: once an effect is validated, deliberately narrow their information diet. The same openness that found the effect now distracts from exploiting it. Switch from exploration to exploitation.
Why it's an edge: The bettor in operator mode captures compound returns from a validated edge while the bettor in perpetual sponge mode keeps resetting, never compounding.
How to exploit: Ask yourself: "Do I have a validated, positive-CLV edge I've been running for 3+ months?" If yes, you're in operator territory — cut information sources by 50% and focus on execution. If no, stay in sponge mode but set a deadline for when you'll commit.
"You try to put the critical mind aside while you're brainstorming... Once you have a validated effect, now you want to reduce the noise." — Andrew Mack, The Outlier Podcast, 2025

🔑 Hidden Causal Lever

NIL Has Structurally Killed March Madness Upsets

NIL (name, image, and likeness) compensation has dramatically expanded the talent gap between power programs and mid-major schools. Historical first-round upset rates (approximately 1-in-3 for 12-seeds, 1-in-5 for 13-seeds) were built on pre-NIL parity. If the structural change is durable and the market keeps pricing historical rates, true upset rates are lower than the prices imply, which would make favorite-side early-round bets systematically underpriced.
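
Whether a run of few upsets is a regime change or ordinary variance can be checked against the historical rate with a binomial tail probability. The sketch below uses the approximate 1-in-3 rate for 12-seeds quoted above. The game and upset counts are hypothetical placeholders, not real tournament results, and the function name is an assumption.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """Probability of k or fewer upsets in n games if each has upset chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


historical_rate = 1 / 3  # approximate 12-seed upset rate quoted in the text

# Hypothetical counts: four 12-vs-5 games per tournament over four tournaments.
n_games = 16
upsets_seen = 2  # placeholder, not a real result

tail = prob_at_most(upsets_seen, n_games, historical_rate)
print(f"chance of {upsets_seen} or fewer upsets under the old rate: {tail:.3f}")
```

Each tournament contributes only four games per seed line, so several years of results still make a small sample. A low tail probability supports treating the post-NIL era as a separate regime, but it does not prove the cause.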

What most people do
Use historical March Madness upset rates (built on 30+ years of data) to inform bracket and betting decisions. Treat first-round upsets as regularly recurring events.
What the best do
Track upset frequency post-NIL as a separate regime. Adjust upset probability estimates downward for the NIL era. Bet favorites more aggressively in early rounds until/unless the structural shift reverses.
Why it's an edge: The market's upset pricing is anchored to decades of data that no longer reflect the current talent distribution. This is a regime change, not a temporary fluctuation.
How to exploit: Track first- and second-round upset rates post-NIL (2021+) and compare them to historical base rates. If the suppression proves durable (3+ years of data), bet first-round favorites at prices still calibrated to historical upset frequency.
"If you're looking for that mid-major to make a run... I don't think those teams exist anymore. We had no upsets last year." — Will Hill, Super Bowl LX MegaPod, 2026

Sources

  • Andrew Mack, Circles Off Ep. #185 (2024-12-19) — edge vs. model distinction, effect-first framework, data mining critique, new data overcorrection, market as competitive game
  • Andrew Mack, The Outlier Podcast (2025-11-26) — creative→analytical cycle, mechanics-first, sponge→operator phases, transient vs. structural effects, benchmark against simplest alternative, "notice weird things" principle
  • Rufus Peabody, Using Data to Find Angles (2023-08-17) — creativity/question formation as primary edge source, qualitative confidence tracking
  • Rufus Peabody, Super Bowl LX MegaPod (2026-02-05) — 3-unanswered scores total/spread interaction (sharp disagreement), NIL structural upset suppression