
Quant Research Process

model-building | Level 2: Intermediate

What It Is

Designing and operating the full infrastructure of a systematic investment research program: data acquisition and engineering, analysis tooling, experiment design, and team prioritization. The research platform is not just software: it determines which hypotheses can be tested, how quickly, and with what rigor. Research process quality is a sustainable competitive moat because it compounds: good infrastructure enables better experiments, which generate better edges.

Correct Execution

Practitioner maintains a live research platform where data, analysis tools, and team skills are continuously upgraded. Prioritizes research projects by AUM-scaled information ratio (expected alpha per unit of research effort, scaled by how much capital the strategy can absorb). Separates data processing (commodity, can outsource) from analytics (proprietary, must own). Maintains a spectrum from strong-prior (academic empirical finance) to weak-prior (machine learning, allows data to reveal structure) research approaches to achieve framework diversity.
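
The prioritization rule above can be sketched in a few lines. This is a minimal illustration, not a method taken from the source: the `ResearchProject` fields, the units, and the exact formula (capacity times expected IR, divided by estimated effort) are assumptions about how "AUM-scaled information ratio per unit of research effort" might be made concrete.

```python
from dataclasses import dataclass


@dataclass
class ResearchProject:
    name: str
    expected_ir: float     # expected information ratio if the research succeeds (assumed input)
    capacity_aum: float    # capital the strategy could absorb, in $ millions (assumed input)
    effort_months: float   # estimated researcher-months required (assumed input)


def priority_score(p: ResearchProject) -> float:
    """AUM x IR per unit of research effort (one possible reading of the metric)."""
    if p.effort_months <= 0:
        raise ValueError("effort_months must be positive")
    return p.capacity_aum * p.expected_ir / p.effort_months


def rank_projects(projects):
    """Return projects sorted from highest to lowest priority score."""
    return sorted(projects, key=priority_score, reverse=True)


# Hypothetical projects, invented purely for illustration.
projects = [
    ResearchProject("niche small-cap signal", expected_ir=1.2, capacity_aum=50, effort_months=3),
    ResearchProject("large-cap value refresh", expected_ir=0.5, capacity_aum=2000, effort_months=6),
    ResearchProject("novel alt-data idea", expected_ir=0.8, capacity_aum=300, effort_months=10),
]
ranked = rank_projects(projects)
for p in ranked:
    print(f"{p.name}: {priority_score(p):.1f}")
```

In this made-up example the high-IR but low-capacity small-cap idea ranks last, which is the point of scaling by capacity: an interesting signal that cannot absorb capital contributes little in absolute terms.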

Coaching Cues

  • "Data, tools, people — all three have to compound together. Neglect any one and the platform stalls." — Chris Meredith
  • "AUM × IR is the research prioritization metric. Not how interesting the paper is." — Chris Meredith
  • "Buy the raw data. Build the analytics. Own the mechanism." — Chris Meredith
  • "Strong priors for rigor. Weak priors for discovery. You need both." — Adam Butler; when designing research framework

Common Errors

  1. Confusing data acquisition with edge: Having unique data is necessary but not sufficient — the mechanism by which data predicts returns must be articulated and tested → data without mechanism = expensive noise.
  2. Strong-prior-only research: An academic empirical finance approach misses non-obvious patterns that only emerge when the data is examined without a prior → need a weak-prior stream to discover unexpected relationships.
  3. Weak-prior-only research (zero-prior): Feeding all data into ML without any theoretical framework → massive false positive problem → most discovered patterns are likely noise, which becomes a loss once trading costs are paid.
  4. Static research prioritization: Research agenda set at year start and not updated based on new edges discovered or old edges decaying → need live strategy inventory with real-time expected value monitoring.
  5. Misaligned team incentives: Research talent is scarce; if economics aren't shared generously, talent exits → the research platform degrades faster than the firm can respond to competitive pressure.

Edges

🔑 Hidden Causal Lever

Strong-Prior Research Discovers, Weak-Prior Research Monitors Decay

Strong-prior (academic empirical finance) and weak-prior (machine learning) research are not just different discovery tools; they serve complementary lifecycle roles. Strong-prior is better at initial hypothesis development with theoretical grounding. Weak-prior is better at detecting, in real time, when a factor established through strong-prior research is decaying, because it has no attachment to the original thesis.

What most people do
Run one research framework (usually strong-prior) and use performance drawdowns to detect factor decay — which is lagging by definition.
What the best do
Run both frameworks in parallel, using the weak-prior stream as a live monitoring tool for the decay of strong-prior strategies. When the weak-prior ML detects that a theoretically grounded factor is no longer predictive, that is an early warning before performance obviously deteriorates.
Why it's an edge: Earlier detection of factor decay enables more graceful exits from crowding situations before the dramatic de-crowding event that generates losses.
How to exploit: For every production factor strategy, run a parallel ML monitoring model that treats the factor's predictive signal as a live time series. When the ML model's feature importance for that factor starts declining, flag it for review — regardless of recent P&L.
Adam Butler, "Questioning the Quant Orthodoxy," Flirting with Models S5E13, 2022-10-03
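
The flagging step described under "How to exploit" might look like the sketch below. It assumes the monitoring model already produces one importance score for the factor at each refit; how that score is computed is left open. The function name, window lengths, and the 50% drop threshold are all hypothetical choices for illustration, not values from the source.

```python
from statistics import mean


def flag_factor_decay(importance_history, baseline_window=12, recent_window=3, drop_ratio=0.5):
    """Return True when recent average importance falls below drop_ratio x the baseline average.

    importance_history: chronological list of the factor's importance scores,
    one per model refit (assumed to come from the monitoring model).
    """
    needed = baseline_window + recent_window
    if len(importance_history) < needed:
        return False  # not enough history to judge
    recent = importance_history[-recent_window:]
    baseline = importance_history[-needed:-recent_window]
    baseline_avg = mean(baseline)
    if baseline_avg <= 0:
        return False  # factor carried no weight in the baseline, so there is nothing to decay from
    return mean(recent) < drop_ratio * baseline_avg


# Hypothetical importance series, invented for illustration.
stable = [0.20, 0.21, 0.19, 0.22, 0.20, 0.21, 0.20, 0.19, 0.21, 0.20, 0.22, 0.20, 0.21, 0.19, 0.20]
decaying = stable[:12] + [0.12, 0.08, 0.05]

print(flag_factor_decay(stable))    # False
print(flag_factor_decay(decaying))  # True
```

The check deliberately takes no P&L input, so a factor can be flagged for review while its recent performance still looks fine.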

💎 Elite-Only Behavior

Buy Raw Data, Build Analytics, Own The Mechanism

The standard practice of purchasing vendor analytics (processed factor scores from Barra, FactSet, etc.) creates a structural blind spot: when the factor decays, you cannot diagnose why because you don't own the mechanism. The edge is not in having data — it is in the causal chain from raw data to return. That chain must be owned in-house.

What most people do
Purchase vendor analytics (pre-built factor scores) for use in models. Accept vendor methodology as a black box. Diagnose underperformance by looking at strategy-level P&L rather than mechanism-level signal quality.
What the best do
Purchase raw data and build all analytics internally. Every factor used in production has an in-house implementation whose methodology the team can inspect, modify, and diagnose. Vendor data is raw input; vendor analytics are never used in production.
Why it's an edge: When a factor underperforms, in-house analytics allows diagnosis at the mechanism level (is the signal still predicting? or is the signal fine but the market has re-priced the risk premium?). Vendor analytics makes this diagnosis impossible.
How to exploit: Audit every production factor for whether the analytical methodology is owned in-house. For any factor using vendor analytics, rebuild the signal from raw data within one research cycle. This is a prerequisite for understanding decay vs. noise.
Chris Meredith, "What Does a Full-Stack Quant Research Platform Look Like?" Flirting with Models, 2023-02-13
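
One common way to ask "is the signal still predicting?" at the mechanism level is a rank information coefficient: the Spearman correlation between an in-house signal and the returns that follow it. The source does not name this diagnostic, so treat it as a swapped-in illustration. The function names and sample numbers below are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_ic(signal, forward_returns):
    """Spearman rank correlation between signal scores and next-period returns."""
    return pearson(average_ranks(signal), average_ranks(forward_returns))


# One hypothetical cross-section: four assets, signal scores and subsequent returns.
signal = [0.1, 0.4, 0.2, 0.9]
forward_returns = [-0.02, 0.01, 0.00, 0.03]
ic = rank_ic(signal, forward_returns)
print(f"rank IC: {ic:.2f}")
```

Tracking this number per period for an in-house signal separates two cases that strategy-level P&L blends together: the rank IC drifting toward zero suggests the signal itself has stopped predicting, while a steady rank IC alongside poor returns points elsewhere.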

🔑 Hidden Causal Lever

The Mosaic Has More Value Than Any Single Data Source

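Research instinct focuses on finding one great data source: the satellite data, the alternative signal, the unique insight. But the empirical evidence from mature systematic shops is that the edge comes from building a richer information mosaic than competitors, not from superior processing of any single source. Combining independent partial signals reduces the variance around the prediction, so a mosaic of individually weak signals can be more reliable than any one of its components. How much more depends on the number of signals and on their independence: if "55% predictive" is read as a hit rate and the signals are combined by simple majority vote, five independent 55% signals are right only about 59% of the time, so it takes well over five of them to match a single 65% source.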
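
The arithmetic behind combining signals can be checked directly. The sketch below rests on two assumptions that the source does not state: "X% predictive" is read as a directional hit rate, and the signals are fully independent and combined by an equal-weight majority vote. Under that reading, the gain from a mosaic depends heavily on how many independent signals it contains.

```python
from math import comb


def majority_vote_accuracy(n_signals, hit_rate):
    """Probability that a strict majority of n independent signals is correct."""
    if n_signals % 2 == 0:
        raise ValueError("use an odd number of signals to avoid tied votes")
    needed = n_signals // 2 + 1
    return sum(
        comb(n_signals, k) * hit_rate**k * (1 - hit_rate) ** (n_signals - k)
        for k in range(needed, n_signals + 1)
    )


def signals_needed(hit_rate, target, max_signals=199):
    """Smallest odd number of independent signals whose majority vote reaches target."""
    for n in range(1, max_signals + 1, 2):
        if majority_vote_accuracy(n, hit_rate) >= target:
            return n
    return None


five = majority_vote_accuracy(5, 0.55)
print(f"five 55% signals: {five:.4f}")  # about 0.5931, below a single 65% source
print("signals needed to reach 65%:", signals_needed(0.55, 0.65))
```

Real signals are rarely fully independent, and any correlation between them shrinks the benefit further, which is why the research goal is framed around independence rather than signal count alone.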

What most people do
Pursue a single differentiated data source as the primary research goal. Spend 80% of research budget perfecting one signal.
What the best do
Build the mosaic: multiple independent, partially predictive signals that each capture a different facet of the investment thesis. The research goal is independence and coverage, not any single signal's predictive power.
Why it's an edge: Information competition in any single data source intensifies quickly once discovered. A mosaic of 10 independent signals is much harder for competitors to replicate than a single "magic" data source.
How to exploit: For every research agenda item, ask: "Is this adding a new independent dimension to our view, or is this improving a dimension we already have?" Prioritize new dimensions over improvements to existing ones until the mosaic has at least 5-7 genuinely independent signal sources.
Chris Meredith, "What Does a Full-Stack Quant Research Platform Look Like?" Flirting with Models, 2023-02-13
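
The "new dimension or existing dimension?" question can be given a rough first-pass screen by correlating a candidate signal against the signals already in the mosaic. This is a sketch under stated assumptions: linear correlation is only a proxy for independence, and the 0.3 threshold, the function names, and the sample series are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def max_abs_correlation(candidate, existing_signals):
    """Largest absolute correlation between the candidate and any existing signal."""
    return max((abs(pearson(candidate, s)) for s in existing_signals.values()), default=0.0)


def adds_new_dimension(candidate, existing_signals, threshold=0.3):
    """True if the candidate is weakly correlated with every signal already in the mosaic."""
    return max_abs_correlation(candidate, existing_signals) < threshold


# Hypothetical signal histories, invented for illustration.
existing = {
    "value": [1, 2, 3, 4, 5, 6],
    "momentum": [2, 1, 4, 3, 6, 5],
}
near_duplicate = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]  # closely tracks "value"
orthogonal = [1, -1, -1, 1, 1, -1]               # weakly related to both

print(adds_new_dimension(near_duplicate, existing))  # False
print(adds_new_dimension(orthogonal, existing))      # True
```

A candidate that fails this screen may still be worth pursuing as an improvement to an existing dimension; the screen only helps sort agenda items into the two buckets the question distinguishes.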

Sources

  • Chris Meredith, "What Does a Full-Stack Quant Research Platform Look Like?" Flirting with Models (2023-02-13) — three-pillar platform framework, AUM-scaled IR research prioritization, data processing vs. analytics distinction, structured vs. unstructured data integration
  • Adam Butler, "Questioning the Quant Orthodoxy," Flirting with Models S5E13 (2022-10-03) — strong-prior vs. weak-prior research frameworks, framework diversity as process diversification, experiment design evolution