Effect Identification

Model-Based Betting

Level 2 — Informed

Prerequisites

What It Is

The process of identifying a real causal relationship between an observable factor and betting outcomes — BEFORE building a model. An effect is a direct mechanism that the market may be mispricing. A model is a tool to help you find and quantify that effect. The two are not the same thing.

The core distinction: "There is a distinction to be made and it's a very important distinction between an edge which comes from an effect and a model which is a tool to help you identify and quantify the effect which can lead to an edge. It is not necessarily true that a model provides an edge. A model only provides an edge when it is identifying an effect that allows you to beat the vig." — Andrew Mack
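To make "beat the vig" concrete, here is a minimal sketch of the break-even arithmetic. The -110 price is an assumption chosen for illustration (the source does not name a price), and the function names are hypothetical.

```python
def implied_probability(american_odds: int) -> float:
    """Win probability at which a bet at these American odds breaks even."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def expected_roi(win_prob: float, american_odds: int) -> float:
    """Expected profit per unit staked, given the true win probability."""
    # Net payout per unit staked when the bet wins.
    payout = 100 / -american_odds if american_odds < 0 else american_odds / 100
    return win_prob * payout - (1 - win_prob)


# Assumed price: -110 on each side (illustrative, not from the source).
print(f"break-even at -110: {implied_probability(-110):.4f}")         # 0.5238
print(f"ROI at 50% true win rate: {expected_roi(0.50, -110):+.4f}")  # -0.0455
print(f"ROI at 54% true win rate: {expected_roi(0.54, -110):+.4f}")  # +0.0309
```

Under that assumed price, a model that is right half the time still loses about 4.5% per unit staked. It only has an edge if the effect it captures lifts the true win rate above the break-even line.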

Correct Execution

You start every modeling project by naming the effect: a specific, causal, testable mechanism. You only build a model after you can articulate why this effect should exist. You check whether the market has already priced the effect before investing in model development. You understand that data mining (throwing all available stats into a regression and hoping something falls out) is the wrong direction — it produces models that look good on backtests but lose money live.

Progression Levels

Diagnostic Tree

Coaching Cues

"Start with effect, not with the model." — Andrew Mack, Circles Off Ep. #185
"There is no edge in a model that does not identify and quantify an effect. Modeling is not an inherent edge in itself. It is only a means to an end." — Andrew Mack, Circles Off Ep. #185
"If you start with the effect you're going to be much better off. If you start with the model and hope you get lucky somewhere along the way, that's the hardest way to do it." — Andrew Mack, Circles Off Ep. #185
"The magic is the effect. The ML algo just helps you identify it when you have a tremendous amount of data." — Andrew Mack, Circles Off Ep. #185

Common Errors

Starting with a modeling technique: "I'll try XGBoost and see what it finds" → The ML algorithm finds correlations, not causes → Start with a hypothesis about a causal mechanism
Data mining to confirmation: Tweak parameters until metrics look good without improving the effect quantification → "You've done nothing — you've just made it look good" → Test one hypothesis at a time
Assuming public data contains unknown information: Basketball-Reference and Hockey-Reference are fully priced in → Using only publicly available data gives you no information advantage → Look for new, less-incorporated data sources or novel analytical approaches
Applying a model from one sport to another without verifying the effect exists there too: Andrew lost money applying a hockey model to EPL during COVID → "Every sport deserves its own respect" → Identify effects specific to each sport

Edges

💎 Elite-Only Behavior

The Effect Is the Moat; the Model Is the Infrastructure

Two bettors with identical modeling skills will have very different results if one has found a real causal effect and the other hasn't. The effect is the durable competitive advantage — it represents genuine market mispricing. The model is just infrastructure to exploit it. Most bettors focus 90% of their energy on models and 10% on effects, when the ratio should be reversed.

What most people do

Spend months perfecting a regression or ML pipeline before testing whether there's a real effect to identify. Assume model sophistication = edge.

What the best do

Identify an effect first. Build only enough model to quantify it precisely. "There are situations where you don't even need a model to harness an effect — the effect is the thing, the model is optional infrastructure."

Why it's an edge: Effect scarcity is real. There are fewer genuinely exploitable effects than there are people capable of building sophisticated models. The bottleneck is finding the effect, not building the model.

How to exploit: Before any model work, write down in one sentence the causal mechanism you're trying to exploit. If you can't, stop and find one first.

"It is not necessarily true that a model provides an edge. A model only provides an edge when it is identifying an effect that allows you to beat the vig." — Andrew Mack, Circles Off Ep. #185

⚡ Conventional Wisdom Is Wrong

Fading Overcorrection When New Data Goes Mainstream

When a new data source or methodology becomes mainstream, the market overcorrects toward it, and simpler older approaches temporarily outperform. When expected goals (xG) models became popular in hockey, Corsi (simple shot-attempt counting) outperformed them for a while. The overcorrection creates the opposite edge: fade the new thing while everyone else is adopting it.

What most people do

Immediately incorporate new data or methodologies as soon as they become available, assuming more sophisticated = better edge.

What the best do

Look for holes in new data before adopting it. Ask: "Is the market now overweighting this new signal? Are there data integrity issues? Is the complexity hiding overfitting?"

Why it's an edge: The rush to adopt new methods creates temporary overfitting. The contrarian who knows the limitations of new data profits while others overcorrect.

How to exploit: When a new methodology goes mainstream, run it as a second model alongside your existing approach. If they disagree, investigate which is right. Don't assume new = better.
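Running a new model alongside your existing one could look like the sketch below. It assumes each model outputs a win probability per game keyed by a game ID. The function name, the 5-point threshold, and the sample values are all hypothetical.

```python
from typing import Dict, List, Tuple


def flag_disagreements(
    existing: Dict[str, float],
    candidate: Dict[str, float],
    threshold: float = 0.05,
) -> List[Tuple[str, float, float, float]]:
    """Return games where the two models' win probabilities differ by at
    least `threshold`, largest gap first."""
    flagged = []
    # Only compare games that both models have priced.
    for game_id in existing.keys() & candidate.keys():
        gap = candidate[game_id] - existing[game_id]
        if abs(gap) >= threshold:
            flagged.append((game_id, existing[game_id], candidate[game_id], gap))
    return sorted(flagged, key=lambda row: abs(row[3]), reverse=True)


# Hypothetical probabilities, for illustration only.
old_model = {"g1": 0.50, "g2": 0.50, "g3": 0.48}
new_model = {"g1": 0.62, "g2": 0.51, "g3": 0.40}

for game_id, p_old, p_new, gap in flag_disagreements(old_model, new_model):
    print(f"{game_id}: existing={p_old:.2f} new={p_new:.2f} gap={gap:+.2f}")
```

The flagged games are the ones worth investigating by hand. The sketch only surfaces the disagreement and says nothing about which model is right.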

"When expected goals models became popular in hockey, I had a theory they were overfitting... and Corsi actually outperformed them for a while." — Andrew Mack, Circles Off Ep. #185

⚡ Conventional Wisdom Is Wrong

Complex Models Often Don't Beat a Moving Average

Andrew Mack built a 25-variable rolling ridge regression that outperformed a 200-day moving average by one-third of 1%. Before declaring any model has edge, benchmark it against the simplest possible baseline. The added overfitting risk from complexity may exceed the marginal improvement.

What most people do

Build increasingly complex models (XGBoost, neural nets, ensemble methods) without comparing to a simple baseline. Assume complexity = edge.

What the best do

Before deploying any model, run it against the simplest reasonable alternative (moving average, basic Elo, home advantage + league table). Only accept the complex model if the improvement justifies the added overfitting risk.

Why it's an edge: Most modelers never run this comparison, leading to over-engineered models that are fragile in live betting. The bettor using a simple, robust model often outperforms the one using a complex, overfitted one.

How to exploit: For every model you build, also build the "dumbest version that could work" — a simple average, Elo, or rolling mean. Compare out-of-sample performance. If the complex model doesn't beat the simple one by at least 1% ROI, use the simple one.
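The comparison rule above can be sketched as follows. It assumes you already have out-of-sample bet records for both models as (stake, decimal odds, won) tuples. That record layout and the function names are hypothetical, and the sample bets are placeholders, not results.

```python
from typing import Iterable, Tuple

Bet = Tuple[float, float, bool]  # (stake, decimal_odds, won)


def roi(bets: Iterable[Bet]) -> float:
    """Total profit divided by total amount staked."""
    bets = list(bets)
    staked = sum(stake for stake, _, _ in bets)
    if staked == 0:
        return 0.0
    profit = sum(
        stake * (odds - 1) if won else -stake for stake, odds, won in bets
    )
    return profit / staked


def choose_model(
    simple_bets: Iterable[Bet],
    complex_bets: Iterable[Bet],
    min_roi_gain: float = 0.01,
) -> str:
    """Prefer the simple model unless the complex one beats it out of
    sample by at least `min_roi_gain` (1% ROI, per the rule above)."""
    gain = roi(complex_bets) - roi(simple_bets)
    return "complex" if gain >= min_roi_gain else "simple"


# Placeholder out-of-sample records, for illustration only.
simple_oos = [(1.0, 2.0, True), (1.0, 2.0, False)]
complex_oos = [(1.0, 2.0, True), (1.0, 2.0, False), (1.0, 2.0, False)]
print(choose_model(simple_oos, complex_oos))  # simple
```

A real comparison needs far more bets than this toy sample, since ROI differences of a percent or so are easily swamped by variance over a short record.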

"I built a complex 25-variable rolling ridge regression. It performed marginally better than a 200-day moving average — to the tune of a third of 1% better." — Andrew Mack, The Outlier Podcast, 2025

💎 Elite-Only Behavior

Most Bettors Stay in Sponge Mode Forever and Never Execute

There are two distinct phases of development: sponge (absorb everything, find edges, explore) and operator (reduce noise, narrow focus, execute with conviction). The transition signal is clear — you have a validated effect. Most bettors never make the transition, staying in perpetual learning mode that prevents committed execution.

What most people do

Continuously consume new content, test new approaches, follow new data sources — indefinitely. Always learning, never executing with sustained conviction on a validated edge.

What the best do

Recognize the transition point: once an effect is validated, deliberately narrow their information diet. The same openness that found the effect now distracts from exploiting it. Switch from exploration to exploitation.

Why it's an edge: The bettor in operator mode captures compound returns from a validated edge while the bettor in perpetual sponge mode keeps resetting, never compounding.

How to exploit: Ask yourself: "Do I have a validated, positive-CLV edge I've been running for 3+ months?" If yes, you're in operator territory — cut information sources by 50% and focus on execution. If no, stay in sponge mode but set a deadline for when you'll commit.
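The "positive-CLV" check can be made mechanical. The sketch below uses one common definition of closing line value (the price you took divided by the closing price, minus one, in decimal odds). That definition is an assumption, since the source does not specify one, and it does not strip the vig from the closing line. The function names and sample prices are hypothetical.

```python
from typing import Iterable, Tuple


def clv(bet_decimal_odds: float, closing_decimal_odds: float) -> float:
    """Closing line value: positive when you got a better price than the close."""
    return bet_decimal_odds / closing_decimal_odds - 1.0


def mean_clv(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average CLV over (bet_odds, closing_odds) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one bet to compute CLV")
    return sum(clv(bet, close) for bet, close in pairs) / len(pairs)


# Hypothetical (bet price, closing price) pairs, for illustration only.
history = [(2.10, 2.00), (1.95, 1.91), (2.40, 2.50)]
print(f"mean CLV: {mean_clv(history):+.4f}")
```

A consistently positive mean over several months of bets is the kind of evidence the question above asks for. A handful of bets is not enough to tell.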

"You try to put the critical mind aside while you're brainstorming... Once you have a validated effect, now you want to reduce the noise." — Andrew Mack, The Outlier Podcast, 2025

🔑 Hidden Causal Lever

NIL Has Structurally Killed March Madness Upsets

NIL compensation has dramatically expanded the talent gap between power programs and mid-major schools. Historical first-round upset rates (approximately 1-in-3 for 12-seeds, 1-in-5 for 13-seeds) were built on pre-NIL parity. If the structural change is durable and the market keeps pricing historical rates after true rates have shifted, early-round favorites are systematically underpriced.

What most people do

Use historical March Madness upset rates (built on 30+ years of data) to inform bracket and betting decisions. Treat first-round upsets as regularly recurring events.

What the best do

Track upset frequency post-NIL as a separate regime. Adjust upset probability estimates downward for the NIL era. Bet favorites more aggressively in early rounds until/unless the structural shift reverses.

Why it's an edge: The market's upset pricing is anchored to decades of data that no longer reflect the current talent distribution. This is a regime change, not a temporary fluctuation.

How to exploit: Track first- and second-round upset rates post-NIL (2021+). Compare to historical base rates. If the suppression is durable (3+ years of data), bet first-round favorites at prices still calibrated to historical upset frequency.
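One way to compare post-NIL upset counts against the historical base rate is a one-sided binomial tail: how likely is a count this low if the old rate still held? The sketch below assumes games are independent, which is a simplification. The function name is hypothetical, and the counts in the usage lines are placeholders to be replaced with tracked results.

```python
from math import comb


def prob_at_most(upsets: int, games: int, base_rate: float) -> float:
    """P(X <= upsets) for X ~ Binomial(games, base_rate)."""
    if not 0 <= upsets <= games:
        raise ValueError("upsets must be between 0 and games")
    return sum(
        comb(games, i) * base_rate**i * (1 - base_rate) ** (games - i)
        for i in range(upsets + 1)
    )


# Placeholder counts, NOT real results: 16 twelve-seed games (four
# tournaments of four games each) with 2 upsets, tested against the
# roughly 1-in-3 historical rate quoted above.
p_value = prob_at_most(upsets=2, games=16, base_rate=1 / 3)
print(f"P(at most 2 upsets in 16 games | historical rate) = {p_value:.4f}")
```

A small tail probability is evidence that the rate has dropped. With only a few seasons of data the test has little power, which is why the text asks for 3+ years before betting on the shift.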

"If you're looking for that mid-major to make a run... I don't think those teams exist anymore. We had no upsets last year." — Will Hill, Super Bowl LX MegaPod, 2026

Sources

Andrew Mack, Circles Off Ep. #185 (2024-12-19) — edge vs. model distinction, effect-first framework, data mining critique, new data overcorrection, market as competitive game
Andrew Mack, The Outlier Podcast (2025-11-26) — creative→analytical cycle, mechanics-first, sponge→operator phases, transient vs. structural effects, benchmark against simplest alternative, "notice weird things" principle
Rufus Peabody, Using Data to Find Angles (2023-08-17) — creativity/question formation as primary edge source, qualitative confidence tracking
Rufus Peabody, Super Bowl LX MegaPod (2026-02-05) — 3-unanswered scores total/spread interaction (sharp disagreement), NIL structural upset suppression

Progression Levels

Level 1 — Casual

Level 2 — Informed

Level 3 — Sharp

Level 4 — Professional

Diagnostic Tree

Symptom: Model looks great on backtests but loses money in real betting

Symptom: Good effect, but model isn't making money

Symptom: New data source is available — should I use it?

Symptom: An effect that worked for two seasons stopped working
