Bayesian Statistical Modeling

Model-Based Betting | Level 3: Sharp

What It Is

A statistical philosophy for sports modeling that treats observed game data as fixed (it actually happened) and treats the probability distributions over unknown quantities, such as team strength, as adaptive: beliefs update as new evidence arrives. The frequentist alternative treats the underlying parameters as fixed and the data as one random draw from repeated sampling, a framing that fits poorly with the small-sample environment of sports.

Correct Execution

You choose Bayesian approaches because sports data is inherently small-N: an EPL team plays 38 games, a Champions League team may play ~10. Every single observation must update the model — including "outlier" results like a 6-0 scoreline, which a frequentist would discard as an aberration but which actually happened and must inform team strength estimates. You use Monte Carlo simulation to build probability distributions as outputs (not point estimates). For player props, you use mixed distributions — simulate one distribution for efficiency (e.g., points per minute) combined with a separate distribution for playing time (e.g., minutes played) to produce a full probability distribution for the outcome.
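
The simulation step can be sketched as follows. This is a minimal illustration, not a full model: it assumes goals for each side follow independent Poisson distributions, and the scoring rates (1.6 and 1.1), the 2.5 total line, and the function names are all hypothetical. The point is that one set of simulated scorelines answers several betting questions at once.

```python
import numpy as np

def simulate_match(home_rate, away_rate, n_sims=10_000, seed=0):
    """Draw simulated scorelines from two independent Poisson scoring rates.

    Independence between the two sides is a simplifying assumption.
    """
    rng = np.random.default_rng(seed)
    home_goals = rng.poisson(home_rate, size=n_sims)
    away_goals = rng.poisson(away_rate, size=n_sims)
    return home_goals, away_goals

def summarize(home_goals, away_goals, total_line=2.5):
    """Read several market probabilities off the same simulated scorelines."""
    return {
        "home_win": float(np.mean(home_goals > away_goals)),
        "draw": float(np.mean(home_goals == away_goals)),
        "away_win": float(np.mean(home_goals < away_goals)),
        "over": float(np.mean(home_goals + away_goals > total_line)),
    }

# Hypothetical scoring rates, for illustration only.
home, away = simulate_match(home_rate=1.6, away_rate=1.1)
probs = summarize(home, away)
for market, p in probs.items():
    print(f"{market}: {p:.3f}")
```

In a Bayesian workflow the two rates would themselves be drawn from posterior distributions rather than fixed, so the output also reflects uncertainty about team strength.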

Coaching Cues

  • "The Bayesian approach says the data is fixed — it actually happened — and as you get new data, the distribution shifts and evolves and learns." — Andrew Mack, Ep. #08
  • "Sports data is a small data problem. If you have an outlier, you have a big problem in frequentist terms, because that data really happened." — Andrew Mack, Ep. #08
  • "Simulations are generally the preferred way to go about it — the output is a probability distribution that answers most of the questions bettors want answered." — Andrew Mack, Ep. #08

Common Errors

  1. Defaulting to normal distributions: Most sports outcomes are not normally distributed → Check the data shape first → Try Poisson for goals, negative binomial for overdispersion
  2. Point-estimate outputs: A single number answers one question → Monte Carlo outputs answer all of them → Rebuild as simulation
  3. Discarding outliers: In frequentist terms they're noise; in Bayesian terms they happened → Never throw away real game results
  4. Strong priors overriding data: If your prior is too strong, new data can't update the model → Test whether your model actually learns from extreme results
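
For the first error, one rough way to "check the data shape" is the variance-to-mean ratio of the counts. A Poisson distribution has a ratio near 1, and a ratio well above 1 points toward a negative binomial. The sketch below is an assumption-laden heuristic: the goal counts are made up, the 0.25 tolerance is arbitrary, and with only a dozen games the ratio itself is noisy.

```python
import numpy as np

def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

def suggest_count_model(counts, tolerance=0.25):
    """Heuristic only: the tolerance is an arbitrary illustrative cutoff."""
    d = dispersion_index(counts)
    if d > 1 + tolerance:
        return "negative binomial", d
    if d < 1 - tolerance:
        return "underdispersed: neither fits well", d
    return "poisson", d

# Hypothetical goals scored by one team over 12 games, including a 6-goal game.
goals = [1, 0, 2, 1, 3, 0, 1, 2, 6, 1, 0, 2]
model, d = suggest_count_model(goals)
print(f"dispersion index {d:.2f} -> try {model}")
```

Note that the 6-goal game stays in the data. It is what pushes the ratio up, which is exactly the information the check is meant to surface.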

Edges

Conventional Wisdom Is Wrong

Bayesian Models Survive Small Samples; Frequentist Models Fail

Standard frequentist methods lean on large-sample assumptions. Sports gives you 38 EPL games per team, ~10 Champions League (CL) games, and sometimes far fewer. In small-N environments, frequentist models produce unstable estimates, and the usual cleanup step throws away real results as "outliers." Bayesian models treat every observation as fixed truth that updates the distribution, which is the behavior you want when you have almost no data.

What most people do: Build frequentist models (regression, OLS) because that's what statistics courses teach. Discard extreme results to "clean" the data.
What the best do: Choose Bayesian frameworks and simulation approaches that can update from small samples without becoming unstable. Accept every game result as real information.
Why it's an edge: Most modeling bettors are using statistical tools designed for large-N research problems. The sports context is categorically different — small N, non-stationary teams, regime changes. Tools designed for this context produce better estimates.
How to exploit: At minimum, use Monte Carlo simulation for output. For small-sample markets (CL, cups, tournaments), use Bayesian frameworks with domestic-form priors.
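
One simple way to sketch a domestic-form prior is a Gamma-Poisson conjugate update on a team's scoring rate. This is an assumed setup, not a method described in the source: the prior mean (1.5 goals per game), the prior strength expressed as pseudo-games, and the three CL scores are all hypothetical.

```python
import numpy as np

def gamma_prior_from_form(mean_goals, pseudo_games):
    """Gamma(alpha, beta) prior with mean alpha / beta = mean_goals.

    pseudo_games sets how many games of evidence the prior is worth.
    """
    return mean_goals * pseudo_games, float(pseudo_games)

def update(alpha, beta, goals):
    """Conjugate update: add goals to alpha, add games played to beta."""
    return alpha + sum(goals), beta + len(goals)

cl_goals = [1, 6, 0]  # hypothetical CL results, including an extreme game

for pseudo_games in (8, 80):  # a moderate prior vs a very strong prior
    alpha0, beta0 = gamma_prior_from_form(mean_goals=1.5, pseudo_games=pseudo_games)
    alpha1, beta1 = update(alpha0, beta0, cl_goals)
    print(f"prior worth {pseudo_games} games: "
          f"rate {alpha0 / beta0:.2f} -> {alpha1 / beta1:.2f}")

# Posterior predictive for the next match, using the moderate prior.
alpha0, beta0 = gamma_prior_from_form(mean_goals=1.5, pseudo_games=8)
alpha1, beta1 = update(alpha0, beta0, cl_goals)
rng = np.random.default_rng(0)
rate_draws = rng.gamma(shape=alpha1, scale=1.0 / beta1, size=10_000)
next_goals = rng.poisson(rate_draws)  # one simulated goal count per rate draw
print("P(2+ goals next match):", float(np.mean(next_goals >= 2)))
```

The loop shows the trade-off in prior strength: with a prior worth 8 games the 6-goal result moves the estimate noticeably, while a prior worth 80 games barely reacts.
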
"With sports data you don't have a lot of data. From a statistical point of view it's a small data problem, which means if you have an outlier you have a big problem — because that data really happened." — Andrew Mack, Ep. #08

🔑 Hidden Causal Lever

Mixed Distribution Models Capture Prop Variance That Point Estimates Miss

A player's point total depends on two distinct sources of variance, usually modeled as independent: efficiency (points per minute) and playing time (minutes). Point-estimate models collapse both into one number and lose the variance that comes from fluctuating playing time. A player averaging 1.2 pts/min whose minutes range from 15 to 35 can land anywhere from 18 to 42 points at that rate alone, a spread that a single "projected 28 points" can't capture.

What most people do: Project player props as season-average efficiency × average minutes, then compare that single point estimate to the line.
What the best do: Model efficiency and minutes as separate distributions, simulate them together via Monte Carlo, and read the probability of crossing the line directly from the simulation output.
Why it's an edge: Prop markets are thin and priced with point estimates. A distribution-based model identifies value when the variance in one component creates tail probabilities that the market underprices.
How to exploit: Build two distributions for every prop: efficiency and playing time. Run 10,000 Monte Carlo iterations. The share of iterations that cross the line is your model's estimate of the win probability, and it is only as good as the two input distributions.
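
The two-distribution approach might look like the sketch below. The distribution families are assumptions chosen for illustration (a gamma for points per minute, a triangular for minutes), as are the parameters and the 28.5 line. It also assumes efficiency and minutes are independent, which will not always hold.

```python
import numpy as np

def simulate_points(ppm_mean, ppm_sd, min_low, min_mode, min_high,
                    n_sims=10_000, seed=0):
    """Simulate point totals as (points per minute) x (minutes played).

    Efficiency and minutes are drawn independently, a simplifying assumption.
    """
    rng = np.random.default_rng(seed)
    # Gamma parameters chosen so the draws have the requested mean and sd.
    shape = (ppm_mean / ppm_sd) ** 2
    scale = ppm_sd ** 2 / ppm_mean
    ppm = rng.gamma(shape, scale, size=n_sims)
    minutes = rng.triangular(min_low, min_mode, min_high, size=n_sims)
    return ppm * minutes

# Hypothetical inputs: 1.2 pts/min on average, 15 to 35 minutes, most often 25.
points = simulate_points(ppm_mean=1.2, ppm_sd=0.25,
                         min_low=15, min_mode=25, min_high=35)

line = 28.5  # hypothetical prop line
p_over = float(np.mean(points > line))
low, mid, high = np.percentile(points, [10, 50, 90])
print(f"P(over {line}) = {p_over:.3f}")
print(f"10th / 50th / 90th percentile: {low:.1f} / {mid:.1f} / {high:.1f}")
```

Comparing `p_over` with the probability implied by the posted odds is the value check. The percentile spread shows how much outcome variance the single point estimate hides.
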
"I walked the reader through player props using mixed distributions — taking a distribution for points per minute and a distribution for minutes played and doing a Monte Carlo simulation." — Andrew Mack, Ep. #08

Conventional Wisdom Is Wrong

Priors Still Dominate After a Full Season in Low-Game Sports

Most bettors assume a full season of data overrides priors. In college football (12 games), even after a full season "we're still regressing a large amount to our priors." The same problem shows up in the NFL: a model with no prior could end up picking a team like the Minnesota Vikings to win the Super Bowl on the strength of its record, despite a negative point differential.

What most people do: Build season-based models that treat 12 games as sufficient data to override pre-season estimates. Trust in-season performance as the dominant signal after week 8-10.
What the best do: Maintain heavy prior weighting (recruiting ratings, returning production, coaching stability) through and beyond a full season. The prior doesn't fade to zero; it fades to a still-significant fraction.
Why it's an edge: Most college football models under-weight priors by mid-season, creating systematic mispricing on teams whose in-season results diverge from their talent stock.
How to exploit: In your college football model, test prior weight at 0%, 25%, 50% after a full season. The optimal weight will be much higher than intuition suggests. Compare predictions with and without priors for Week 13-14 and bowl games.
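
The weight test can be prototyped on synthetic data before running it on real ratings. Everything in this sketch is assumed: the spread of true team strength, the noise in the pre-season prior, the per-game noise, and the simple linear blend. It demonstrates the procedure only and says nothing about the actual optimal weight for college football.

```python
import numpy as np

def blend_mse(weights, n_teams=130, n_seasons=200, n_games=12,
              talent_sd=10.0, prior_noise_sd=7.0, game_noise_sd=16.0, seed=0):
    """Mean squared error of w * prior + (1 - w) * season average.

    All data here is synthetic; every sd parameter is a hypothetical input.
    """
    rng = np.random.default_rng(seed)
    shape = (n_seasons, n_teams)
    true_strength = rng.normal(0.0, talent_sd, size=shape)
    # Pre-season rating: true strength seen through noisy talent indicators.
    prior = true_strength + rng.normal(0.0, prior_noise_sd, size=shape)
    # Season-average margin: per-game noise shrinks with sqrt(games played).
    season_avg = true_strength + rng.normal(
        0.0, game_noise_sd / np.sqrt(n_games), size=shape)
    results = {}
    for w in weights:
        estimate = w * prior + (1.0 - w) * season_avg
        results[w] = float(np.mean((estimate - true_strength) ** 2))
    return results

mse = blend_mse([0.0, 0.25, 0.5])
for w, err in mse.items():
    print(f"prior weight {w:.2f}: MSE {err:.2f}")
```

With real data, replace the synthetic arrays with your pre-season ratings and in-season ratings, and score each blend against out-of-sample results such as Week 13-14 and bowl games.
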
"Even after a full season we're still regressing a large amount to our priors." — Rufus Peabody, ETR Podcast Ep. 67, 2020

🔑 Hidden Causal Lever

Model Averaging Creates Adverse Selection at Extreme Edges

When your model shows a huge edge (e.g., 29% on a single game), naive model-market averaging is dangerous because the situations where your model disagrees most with the market are precisely the situations where YOUR model is most likely wrong. A horse your model prices at 3/1 going off at 25/1 is a debugging signal, not confirmation of a huge opportunity.

What most people do: Get excited when their model shows a large edge. Bet more on high-edge opportunities, assuming the model is right and the market is wrong.
What the best do: Treat extreme model-market disagreements as debugging opportunities. "When the universe tells you you're missing something, the first question is: what am I missing?" Investigate why the market disagrees before betting.
Why it's an edge: The situations where a model's apparent edge is largest are the situations most likely to contain a model error. The practitioner who investigates extreme edges rather than exploiting them blindly avoids the largest single-bet losses.
How to exploit: Set a threshold (e.g., model edge >15%). Any bet exceeding this threshold triggers a mandatory investigation: check for missing information (injuries, lineup changes, rule differences, data errors). Only proceed after confirming the market isn't incorporating something your model missed.
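
A threshold gate like this is easy to automate. The sketch below assumes decimal odds and defines edge as expected profit per unit staked if the model probability were correct. Both are assumptions, since the source does not say how its edge percentages are computed, and the function names are hypothetical.

```python
def expected_edge(model_prob, decimal_odds):
    """Expected profit per unit staked, taking the model probability as true."""
    return model_prob * decimal_odds - 1.0

def review_bet(model_prob, decimal_odds, threshold=0.15):
    """Route a bet: oversized edges go to investigation instead of the bet slip."""
    edge = expected_edge(model_prob, decimal_odds)
    if edge > threshold:
        return "investigate", edge
    if edge > 0:
        return "bet candidate", edge
    return "pass", edge

# The horse example: model price 3/1 implies 1 / (3 + 1) = 0.25,
# and market odds of 25/1 are 26.0 in decimal form.
examples = [
    ("horse at 25/1", 0.25, 26.0),
    ("modest edge", 0.55, 1.91),
    ("no edge", 0.50, 1.91),
]
for label, prob, odds in examples:
    action, edge = review_bet(prob, odds)
    print(f"{label}: edge {edge:+.1%} -> {action}")
```

The "investigate" result is where the checklist applies: injuries, lineup changes, rule differences, and data errors get checked before any stake is placed.
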
"If my model says 29% edge on a single game, it's usually not the market that is way off." — Harry Crane, Analytics.Bracket, 2022

Sources

  • Andrew Mack, Ep. #08 (2023-12-18) — Bayesian vs. frequentist philosophy, small-N problem, Monte Carlo simulation, mixed distributions for player props
  • Rufus Peabody, Using Data to Find Angles (2023-08-17) — priors essential even after full season, recruiting ratings as talent prior, Bayesian updating in college football
  • Harry Crane / Matt Bookhalter, Analytics.Bracket (2022-03-15) — margin-of-victory encoding, recency weighting, retrospective opponent ratings
  • Harry Crane, Hidden Risks (2020-08-31) — model averaging adverse selection caveat