
Literature Review: Skill Networks & Mastery Scaffolding

What the research says about whether Edgecraft's architecture is right — and what needs to change.


1. The Math Academy Way — Deep Analysis

Source: Justin Skycak, "The Math Academy Way" (508 pages, working draft, updated March 2026)

Their Architecture vs Ours

Math Academy's knowledge graph:

  • Thousands of topics (their unit of knowledge, equivalent to our "skills")
  • Each topic has multiple "knowledge points" (sub-concepts within a topic — we don't have this layer)
  • Strict DAG of prerequisite relationships (same as us)
  • ADDITIONALLY: "encompassing" relationships — when an advanced topic implicitly practices simpler topics. This is a SEPARATE graph overlaid on the prerequisite graph.
  • Course = a set of topics in the graph (not a separate structure — the graph is the source of truth)
  • Their granularity is MUCH finer than ours: a single Math Academy course has ~300 topics, while our entire pickleball domain has 51 skills.

Key architectural difference: Encompassings

This is the concept we're missing entirely. An "encompassing" relationship says: "when you practice topic X, you're implicitly also practicing topics A, B, C (which are simpler component skills)."

Example: Multiplying a two-digit number by a one-digit number encompasses both "multiplying one-digit numbers" and "adding a one-digit number to a two-digit number."

This means when a student reviews the advanced topic, they get credit toward spaced repetition on ALL the simpler topics it encompasses. This is mathematically elegant — it dramatically reduces review burden because advanced practice serves double duty.

Edgecraft implication: Our prerequisite edges capture "A is required before B" but NOT "practicing B also practices A." In physical skills, this matters: practicing a full rally encompasses grip, footwork, and shot selection simultaneously. We should add encompassing relationships to our schema.

Their Spaced Repetition: FIRe (Fractional Implicit Repetition)

This is a NOVEL algorithm, not SM-2 or FSRS. Key properties:

  1. Repetitions trickle down. Passing a review on an advanced topic sends credit BACKWARD through encompassing relationships to simpler topics. You don't need to separately review the simpler topics if you're practicing them implicitly through advanced work.

  2. Failures propagate up. Failing a review on a simple topic sends penalty FORWARD to all advanced topics that encompass it. If you can't multiply one-digit numbers, you certainly can't multiply two-digit numbers.

  3. Partial encompassings with fractional weights. Not all encompassings are full — "integration by parts" only partially encompasses "integrating polynomials" (only some integration-by-parts problems involve polynomials). Weights represent the probability that a random problem from the advanced topic encompasses a random problem from the simpler topic.

  4. The spaced repetition model tracks per student per topic:

    • repNum — how many successful spaced repetition rounds accumulated
    • interval — ideal days between repetitions (grows with each success)
    • memory — expected retention, decays over time, triggers review when sufficiently low
    • speed — learning speed for THIS student on THIS topic (ratio of student ability to topic difficulty)
    • decay — backwards movement speed, grows larger when reviews are severely overdue (models "summer slide")
  5. Encompassing weights are set by domain experts on direct/key prerequisite edges only (not all pairwise). The system infers the rest through repetition flow. This scales linearly with topic count.
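
A minimal Python sketch of how credit and penalties could flow through encompassing edges, based on properties 1-3 above. This is not Math Academy's FIRe implementation: the graph, the weights, the `floor` cutoff, and the function names `credit_from_pass` and `penalised_by_fail` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical encompassing graph: advanced topic -> {simpler topic: weight}.
# The weights are made-up illustrations, not Math Academy data.
ENCOMPASSES = {
    "two_digit_times_one_digit": {
        "one_digit_multiplication": 1.0,
        "add_one_digit_to_two_digit": 1.0,
    },
    "integration_by_parts": {"integrating_polynomials": 0.5},
}


def credit_from_pass(topic, graph, credit=1.0, floor=0.05):
    """Return implicit repetition credit earned by passing `topic`.

    Credit flows backward (toward simpler topics) and is multiplied by the
    edge weight at each hop. Paths carrying less than `floor` are dropped.
    Assumes the graph is acyclic.
    """
    earned = defaultdict(float)
    stack = [(topic, credit)]
    while stack:
        node, amount = stack.pop()
        # Keep the largest credit if a topic is reachable by several paths.
        earned[node] = max(earned[node], amount)
        for simpler, weight in graph.get(node, {}).items():
            if amount * weight >= floor:
                stack.append((simpler, amount * weight))
    return dict(earned)


def penalised_by_fail(topic, graph):
    """Return advanced topics that (transitively) encompass a failed topic."""
    hit, frontier = set(), {topic}
    while frontier:
        frontier = {
            adv for adv, parts in graph.items()
            if any(p in frontier for p in parts) and adv not in hit
        }
        hit |= frontier
    return hit
```

In this sketch, passing `integration_by_parts` yields half a repetition of credit for `integrating_polynomials`, while failing `one_digit_multiplication` flags `two_digit_times_one_digit` for a penalty. How large that penalty should be is left open.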

Edgecraft implication: We cannot copy FIRe directly because:

  • Our domains include motor skills where "review" means physical practice (15-30 minutes), not answering a quiz question (30 seconds)
  • We have no assessment mechanism (no quiz questions to measure accuracy)
  • Our skills are much coarser-grained (51 pickleball skills vs ~300 topics for one math course)

But the CONCEPT of encompassing relationships is transferable. In pickleball, "playing a full rally" encompasses grip, footwork, dinking, and shot selection. A spaced reinforcement system should give credit for component skills when the learner practices composite activities.

Their Mastery Gating

  • Student must "answer sufficiently many questions correctly in each successive knowledge point" to demonstrate mastery
  • Once mastered, more advanced topics become available on their "knowledge frontier"
  • If a student fails a lesson twice at the same knowledge point, the system automatically provides remedial reviews on the "key prerequisite topics"
  • The knowledge frontier = zone of proximal development (Vygotsky) — the range of tasks the student can do with support but not independently
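
The frontier idea can be sketched directly on a prerequisite DAG. The snippet below is an illustration only: the `PREREQS` graph and the function name `knowledge_frontier` are hypothetical, and it assumes every listed prerequisite is required.

```python
# Hypothetical prerequisite DAG: skill -> list of prerequisites (all required).
PREREQS = {
    "grip": [],
    "ready_position": [],
    "dink": ["grip", "ready_position"],
    "third_shot_drop": ["dink"],
    "volley": ["grip"],
}


def knowledge_frontier(prereqs, mastered):
    """Skills not yet mastered whose prerequisites are all mastered."""
    return sorted(
        skill
        for skill, needs in prereqs.items()
        if skill not in mastered and all(n in mastered for n in needs)
    )
```

A learner who has mastered only `grip` gets `ready_position` and `volley` as the next candidates; `dink` stays locked until `ready_position` is mastered.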

Mastery threshold: Not a single percentage. It's knowledge-point-by-knowledge-point within a lesson. Each knowledge point has a worked example + similar questions. You must pass each one to progress.

Edgecraft implication: We have no assessment mechanism. Our "mastery" is self-reported via progression levels. This is fundamentally weaker than Math Academy's approach because:

  • Self-assessment is unreliable (Dunning-Kruger effect)
  • No automatic remediation triggers
  • No knowledge frontier computation

For our domain types, the closest equivalent would be benchmark-based self-assessment with video evidence or coach verification. But this requires a social/feedback layer we don't have.

Their Diagnostic Exams (Cold Start)

  • Adaptive diagnostic: 20-40 questions for lower-grade courses, 40-60 for higher-grade
  • Uses TWO inference mechanisms:
    1. Causal (encompassings): If you pass an advanced topic, you probably know the simpler topics it encompasses
    2. Correlation-based: Positive credit propagates to prerequisites; negative credit propagates to post-requisites; credit propagates within same-module leaf topics
  • Compresses the knowledge graph into the minimum topics that "cover" the course at desired granularity (each topic has both a progeny and ancestor within 3 prerequisite edges)
  • Measures "knowledge confidence" — detects conflicts (e.g., student passes an advanced topic but fails a simpler prerequisite)
  • Uses "conditional completion" for low-confidence areas — assumes mastery but quickly falls back if student struggles

Edgecraft implication: We have no cold start mechanism. A new user sees all 51 pickleball skills with no sense of where they are. Math Academy's diagnostic takes 20-60 questions and places you precisely on the knowledge frontier. Our equivalent would be a self-assessment checklist (can you do X? Y? Z?) that uses the prerequisite graph to infer your frontier. Even a crude version of this would be 10x better than nothing.
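
A rough sketch of what such a checklist could infer, assuming a hypothetical `PREREQS` graph and yes/no answers. A "yes" marks the skill and all its prerequisites as mastered; a "no" marks the skill and everything that depends on it as not mastered; skills that land in both sets are reported as conflicts (a crude stand-in for Math Academy's "knowledge confidence"). The function names are illustrative, not an existing Edgecraft API.

```python
# Hypothetical prerequisite DAG: skill -> list of prerequisites (all required).
PREREQS = {
    "grip": [],
    "ready_position": [],
    "dink": ["grip", "ready_position"],
    "third_shot_drop": ["dink"],
    "volley": ["grip"],
}


def ancestors(prereqs, skill):
    """All direct and indirect prerequisites of `skill`."""
    seen, stack = set(), list(prereqs[skill])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(prereqs[node])
    return seen


def descendants(prereqs, skill):
    """All skills that directly or indirectly depend on `skill`."""
    return {s for s in prereqs if skill in ancestors(prereqs, s)}


def infer_state(prereqs, answers):
    """Turn {skill: can_do} answers into (mastered, not_mastered, conflicts)."""
    mastered, missing = set(), set()
    for skill, can_do in answers.items():
        if can_do:
            mastered |= {skill} | ancestors(prereqs, skill)
        else:
            missing |= {skill} | descendants(prereqs, skill)
    conflicts = mastered & missing  # contradictory evidence: re-ask or flag
    return mastered - conflicts, missing - conflicts, conflicts
```

Two answers ("I can dink", "I can't hit a third-shot drop") are enough to place a learner in this toy graph. Contradictory answers surface as conflicts rather than being silently resolved.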

What Breaks Outside Math

Math Academy's entire system depends on four properties that math has and our domains don't:

| Property | Math | Motor Skills (pickleball, shooting) | Analytical (betting, trading) | Creative (marketing) |
| --- | --- | --- | --- | --- |
| Assessable via quiz | Yes: solve this problem | No: must physically perform | Partially: can test knowledge but not judgment under uncertainty | No: no single correct answer |
| Deterministic correctness | Yes: answer is right or wrong | Partially: shot lands in or out, but quality is continuous | No: correct bet can lose, wrong bet can win | No: effectiveness is probabilistic and delayed |
| Fine granularity possible | Yes: "add fractions with unlike denominators" is atomic | Harder: "dink cross-court" involves grip, footwork, contact point, trajectory simultaneously | Yes for knowledge, no for execution | Mixed |
| Spaced repetition via quiz | Yes: re-solve problems | No: must physically practice (15-30 min vs 30 sec) | Partially: can review concepts but real skill is pattern recognition under pressure | No: can review frameworks but real skill is creative application |

Specific breakdowns:

  1. Motor skills: FIRe's "answer a question to demonstrate mastery" doesn't work. A pickleball player can't demonstrate grip mastery by answering a question — they must physically play. This means:

    • Assessment requires observation (video, coach, or self-assessment against specific benchmarks)
    • "Review" means physical practice, which is roughly 30-60x more time-intensive than answering a quiz (15-30 minutes vs 30 seconds)
    • Spaced repetition intervals must depend on learning stage: newly acquired motor skills decay fast and need short intervals, while well-encoded motor skills decay more slowly than declarative knowledge (see Section 6)
    • Encompassing relationships work WELL for motor skills (a rally encompasses everything), but measuring credit is harder
  2. Analytical skills (betting, trading): The core problem is probabilistic outcomes. A "correct" bet can lose and a "wrong" bet can win. Assessment must be on PROCESS, not outcome. This means:

    • Mastery of "xG interpretation" can't be measured by "did your bet win" — it must be measured by "can you correctly interpret this xG chart"
    • Some skills ARE quiz-testable (knowledge of market mechanics, model interpretation)
    • But the highest-level skills (judgment under uncertainty, bankroll management under drawdown) require extended real-world performance to assess
  3. Creative skills (marketing): There is no single correct answer. "Is this a good hook?" depends on audience, context, and timing. Assessment requires:

    • Expert judgment or outcome measurement (did the hook convert?)
    • Significantly longer feedback loops (days-weeks vs seconds)
    • Portfolio-based assessment (pattern of results over time) rather than point-in-time testing

What We Should Copy

  1. Encompassing relationships — Add to our graph schema. "Playing a full rally" encompasses grip, footwork, dinking, shot selection. "Running a full marketing campaign" encompasses offer design, copywriting, landing pages.

  2. Knowledge frontier concept — Even with self-assessment, computing "what you're ready to learn next" from the DAG is valuable.

  3. Diagnostic exam approach — A self-assessment checklist that infers your frontier using prerequisite relationships. Not as precise as math quizzes, but still 10x better than "browse all 51 skills."

  4. Conditional completion: When self-assessment is uncertain, assume mastery but flag the skill for re-assessment if the learner struggles with downstream skills.

What We Should NOT Copy

  1. FIRe algorithm directly — Designed for quiz-based, seconds-per-question review. Our review is physical practice, minutes-per-skill.

  2. Fine-grained topic decomposition — Math can be decomposed into atomic operations. "Third shot drop" in pickleball is already semi-atomic. Going finer (grip micro-adjustments) creates meaningless nodes.

  3. Automated assessment — We can't auto-assess motor skills or creative skills. Self-assessment with benchmarks is our path.

  4. XP/gamification system: Math Academy's XP system (Chapter 22) is tuned for daily quiz engagement. Physical practice calls for different mechanics: practice logs, streak tracking, benchmark milestones.


2. Knowledge Space Theory (Doignon & Falmagne)

Knowledge space theory (KST), introduced by Doignon & Falmagne (1985, "Spaces for the Assessment of Knowledge"), provides a mathematical framework more expressive than simple prerequisite DAGs.

Key concepts:

  • A knowledge state is the set of all items/skills a person has mastered at a given time
  • A knowledge space is the collection of all possible knowledge states (not all combinations are possible — you can't know calculus without knowing algebra)
  • The surmise relation defines which items can be inferred from knowing other items (similar to prerequisites but more flexible)
  • A learning path is a sequence of knowledge states that adds one item at a time

How KST differs from our DAG:

  • Our DAG says "A is prerequisite for B" — binary relationship
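  • KST works with sets of feasible knowledge states: the surmise relation says "if you have mastered D, you have also mastered A, B, and C", and KST-based assessments layer probabilities over the candidate states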
  • KST allows for MULTIPLE valid orderings that a DAG can't represent elegantly
  • KST can represent "either A or B is sufficient prerequisite for C" — our DAG requires both

ALEKS uses KST directly. Their adaptive assessment constructs a probabilistic model of the student's knowledge state by asking questions and updating beliefs about which knowledge state the student is in. This is the mathematical foundation for their placement and adaptive learning.

Edgecraft implication: Our DAG is a simplification of KST. For most of our domains, this simplification is fine — the prerequisite relationships are mostly strict (you really do need grip before you can dink well). But some skills have "OR" prerequisites (you can learn third-shot-drop via either a soft-game path or a drive-heavy path). Our DAG forces "AND" (all prerequisites required). We could represent this more flexibly by using knowledge space formalism, but the complexity cost may not be worth it for 50-150 skill domains.

Recommendation: Keep the DAG for now. Add a prerequisiteMode: "all" | "any" field to graph.yaml for skills where only SOME prerequisites are needed. This captures the most important KST insight without the full mathematical machinery.


3. Student State Modeling

Three main approaches in the literature:

Item Response Theory (IRT) — Rasch 1960, Lord 1980

  • Models probability of correct response as function of student ability (theta) and item difficulty (beta)
  • 1PL model: P(correct) = logistic(theta - beta)
  • 2PL adds discrimination parameter: P(correct) = logistic(a * (theta - beta))
  • 3PL adds guessing parameter: P(correct) = c + (1-c) * logistic(a * (theta - beta))
  • Used by: standardized tests (GRE, GMAT), some adaptive platforms
  • Not directly applicable to Edgecraft because we don't have quiz items. But the concept of modeling "student ability on this topic" vs "topic difficulty" is exactly what Math Academy does in their student-topic learning speed calculation.
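
For reference, the three IRT models above collapse into a single function. This is a small illustrative sketch: the function name `p_correct` and its default arguments are our own, chosen so that the defaults reduce to the 1PL model.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def p_correct(theta, beta, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: student ability      beta: item difficulty
    a:     discrimination (a=1 gives 1PL)
    c:     guessing floor (c=0 gives 2PL)
    """
    return c + (1.0 - c) * logistic(a * (theta - beta))
```

When ability equals difficulty the 1PL and 2PL models give 0.5. A guessing floor of 0.25 (a four-option multiple-choice item) lifts that to 0.625.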

Bayesian Knowledge Tracing (BKT) — Corbett & Anderson 1995

  • 4 parameters per skill: P(L0) initial probability of mastery, P(T) probability of learning per opportunity, P(G) probability of guessing correctly, P(S) probability of slipping (knowing but getting wrong)
  • Hidden Markov Model: at each practice opportunity, update belief about whether student has mastered the skill
  • Used by: Carnegie Learning, Cognitive Tutors
  • Partially applicable: The P(S) "slip" concept is valuable — a student might have mastered a skill but fail to execute it under pressure (relevant for motor skills and competition settings)

Deep Knowledge Tracing (DKT) — Piech et al 2015

  • Uses recurrent neural networks to predict student performance from sequence of interactions
  • No explicit parameters — the model learns latent representations
  • Criticized for being a black box and sometimes making nonsensical predictions (e.g., predicting a student gets worse after correct answers)
  • Not applicable to Edgecraft — requires massive interaction data we don't have and provides no interpretable structure

Edgecraft recommendation: Use a simplified BKT-inspired model for self-assessment: each skill has a "confidence" score based on self-reported mastery + time since last practice + downstream skill performance (if you struggle at drops, your grip confidence should decrease). This mirrors the role FIRe plays in Math Academy's system, adapted for self-report rather than quiz data.


4. Diagnostic Trees and Fault Analysis

Fault Tree Analysis (FTA) — Watson 1961, aerospace engineering

  • Represents failure modes as a tree with AND/OR gates
  • Top event (system failure) branches into intermediate events, eventually reaching basic events (root causes)
  • AND gate: all children must fail for parent to fail
  • OR gate: any child failing causes parent to fail
  • Minimal cut sets: the smallest combinations of basic events that cause the top event

How this maps to our diagnostic trees:

  • Our structure: symptom → root cause → fix
  • FTA adds: AND/OR logic for COMPOUND symptoms. "Ball goes long AND wide" has different root causes than "ball goes long but straight"
  • We're missing the AND/OR logic. Our diagnostics treat each symptom independently.

Medical differential diagnosis uses a hypothesis-generate-test cycle:

  1. Patient presents symptoms
  2. Doctor generates 2-5 candidate diagnoses
  3. Asks discriminating questions that differentiate between candidates
  4. Orders tests to confirm/reject
  5. Narrows to diagnosis

This is exactly our Diagnostic Engine (Proposal 3) design. The key insight from medicine: the power is in the discriminating questions, not in the symptom catalog. A good question eliminates 50% of possibilities.

Information gain (Shannon entropy) for question selection:

  • For each candidate question, compute how much it reduces uncertainty about the diagnosis
  • Select the question with highest information gain
  • This is the mathematical foundation for ID3/C4.5 decision tree algorithms
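
A short sketch of information-gain-based question selection. It assumes we can write down a prior over root causes and, for each question, the probability of each answer given each cause; those numbers and the function names are hypothetical, not taken from an existing Edgecraft diagnostic.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_information_gain(prior, likelihoods):
    """Expected entropy reduction from asking one question.

    prior:       {cause: P(cause)}
    likelihoods: {answer: {cause: P(answer | cause)}}
    """
    gain = entropy(prior)
    for by_cause in likelihoods.values():
        p_answer = sum(prior[c] * by_cause[c] for c in prior)
        if p_answer == 0:
            continue  # this answer can never occur
        posterior = {c: prior[c] * by_cause[c] / p_answer for c in prior}
        gain -= p_answer * entropy(posterior)
    return gain


def best_question(prior, questions):
    """Pick the question name with the highest expected information gain."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))
```

With four equally likely causes, a question that cleanly splits them two-and-two gains exactly one bit (the "eliminates 50% of possibilities" case), while a question whose answer does not depend on the cause gains nothing.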

Sports coaching examples: Limited formal literature, but:

  • The Titleist Performance Institute (TPI) has a movement screen: 12 physical tests that identify the root cause of golf swing faults. This IS a diagnostic tree applied to motor skills.
  • FMS (Functional Movement Screen) — 7 movement patterns scored 0-3, identifies injury risk and movement quality. Used in physical therapy and strength training.
  • These are the closest analogues to our diagnostic approach in physical domains.

Edgecraft implication: Our diagnostic trees are structurally sound but could benefit from:

  1. AND/OR logic for compound symptoms
  2. Information gain-based question ordering (not just hand-curated decision trees)
  3. Structured movement screens as diagnostic ENTRY POINTS for physical domains

5. Cross-Domain Transfer: When It Works and When It Doesn't

Structure Mapping Theory (Gentner 1983) — the foundational framework:

  • Analogy works when two domains share relational structure (how things relate to each other), not object attributes (what things look like)
  • "Systematicity principle": higher-order relations (relations between relations) are stronger analogies than first-order relations
  • Surface similarity (they LOOK alike) ≠ structural similarity (they WORK the same way)

When transfer works:

  • Shared causal mechanisms (same root cause produces same symptom in both domains)
  • Shared relational structure (same DAG shape, same failure-success patterns)
  • The learner is explicitly TOLD about the analogy (spontaneous far transfer is rare)

When transfer fails (Barnett & Ceci 2002, taxonomy of transfer):

  • Surface similarity without structural similarity → misleading analogies
  • The learner assumes transfer without checking → negative transfer (old habits interfere with new learning)
  • Domains with different feedback loops (immediate vs delayed, deterministic vs probabilistic)

Negative transfer is real and dangerous:

  • A tennis player learning pickleball transfers power-hitting habits that don't work at the kitchen
  • A stock trader learning sports betting transfers "cut losses fast" when a positive-edge strategy sized with the Kelly criterion calls for holding course through variance
  • These are cases where surface similarity (both involve hitting a ball over a net; both involve placing bets) masks structural differences

Edgecraft implication for Rosetta Engine:

  • Cross-domain parallels must be validated for STRUCTURAL similarity (shared causal mechanism), not surface similarity
  • Every parallel must include a "transfer warning" for where the analogy BREAKS
  • The system should explicitly flag negative transfer risks: "This pattern transfers from shooting to pickleball, BUT the grip pressure threshold is different because..."
  • Spontaneous far transfer is rare — the system must EXPLICITLY highlight the connection for it to be useful

6. Spaced Repetition for Non-Factual Knowledge

SM-2 algorithm (Wozniak 1987): Designed for flashcard-style declarative memory.

  • Initial ease factor: 2.5
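  • Adjusted after every review from the 0-5 recall grade q: EF' = EF + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02)), minimum 1.3 (a flat 0.2 decrease per failure is the Anki variant)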
  • Interval formula: I(n) = I(n-1) * EF, where EF = ease factor
  • First interval: 1 day, second: 6 days, then multiply by EF
  • This assumes each review takes 5-30 seconds (flip flashcard, recall answer)
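
A compact sketch of one SM-2 review step, to make the schedule concrete. It follows the published SM-2 description (0-5 recall grade, grade-based ease update, reset on grades below 3) rather than a flat per-failure decrease; implementations differ on details such as whether the ease is updated before or after computing the interval, so treat the ordering here as one reasonable choice. The function name `sm2_review` is our own.

```python
def sm2_review(quality, repetitions, interval_days, ease):
    """Apply one SM-2 review and return (repetitions, interval_days, ease).

    quality:     recall grade from 0 (blackout) to 5 (perfect)
    repetitions: consecutive successful reviews so far
    """
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    # Grade-based ease update, floored at 1.3.
    ease = max(1.3, ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)))

    if quality < 3:
        # Failed recall: restart the interval sequence, keep the ease factor.
        return 0, 1, ease

    if repetitions == 0:
        interval = 1
    elif repetitions == 1:
        interval = 6
    else:
        interval = round(interval_days * ease)
    return repetitions + 1, interval, ease
```

Three consecutive reviews graded 4, starting from an ease of 2.5, produce intervals of 1, 6, and 15 days. A grade of 4 leaves the ease unchanged, which is why the third interval is simply 6 * 2.5.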

FSRS (Free Spaced Repetition Scheduler): Improves on SM-2 with:

  • A per-card memory state (difficulty, stability, retrievability) plus a set of model weights fitted to each learner's review history
  • Uses a power-law decay model instead of exponential
  • Trained on 500M+ reviews from Anki users
  • Better calibrated than SM-2 for long-term retention

Motor skill retention is DIFFERENT from declarative memory:

Schmidt & Lee ("Motor Learning and Performance," 6th ed) established that:

  • Motor skills show a "retention advantage" — once learned, they decay more slowly than declarative knowledge
  • BUT: early-stage motor learning decays FAST. A skill practiced for only a few sessions decays much faster than one practiced for months
  • The "encoding specificity" principle: motor skills are context-dependent. Practicing in one context (calm, no pressure) doesn't fully transfer to another (competition, fatigue)
  • Contextual interference effect (Shea & Morgan 1979): Interleaved practice (random order) produces worse performance during practice but BETTER long-term retention than blocked practice (same skill repeated). This is the motor equivalent of interleaving in math (Rohrer & Taylor 2007).

Implications for Edgecraft spaced repetition:

| Skill Type | Decay Rate | Review Duration | Interval Strategy |
| --- | --- | --- | --- |
| Declarative (facts, frameworks) | Fast initial, stabilizes | 30 seconds (flashcard) | SM-2/FSRS standard |
| Motor (grip, footwork, stroke) | Slow once encoded, fast if new | 15-30 minutes (physical drill) | Longer intervals but MUST include contextual variation |
| Analytical (pattern recognition) | Medium | 5-10 minutes (scenario analysis) | Standard intervals with INTERLEAVING of scenario types |
| Strategic (decision-making under pressure) | Slow (conceptual) + fast (execution) | Variable: must include pressure simulation | Interval with deliberate stress inoculation |

Interleaving vs Blocking (Rohrer & Taylor 2007):

  • Students learned to compute volumes of 4 different solid shapes
  • Blocked group: practice all type A, then all type B, etc.
  • Interleaved group: random mix of A, B, C, D
  • On test one week later: interleaved group scored 63%, blocked group scored 20%
  • The interleaved group performed WORSE during practice but 3x better on the test
  • This is a "desirable difficulty" (Bjork 1994) — it feels harder but produces better learning

Edgecraft recommendation:

  1. Don't use SM-2 directly — it's for flashcards
  2. For motor skills: use practice-log-based tracking (did you practice this week? what drills?) rather than quiz-based review
  3. Build interleaving INTO practice recommendations: "This week, practice cross-court dinks, drives, AND third-shot drops in mixed drills, not separate blocks"
  4. Separate "review" (have you retained this?) from "practice" (are you improving this?) — they serve different functions and need different schedules

7. Quality Metrics for Educational Content

Bloom's Taxonomy (Revised, Anderson & Krathwohl 2001):
6 levels: Remember → Understand → Apply → Analyze → Evaluate → Create

  • Our diagnostics live at Apply/Analyze level (can you do it? what went wrong?)
  • Our edges live at Evaluate/Create level (what's the non-obvious insight?)
  • Our progression levels map roughly to Remember (L1) → Apply (L2) → Analyze (L3) → Evaluate (L4)

Cognitive Load Theory (Sweller 1988):

  • Intrinsic load: inherent complexity of the material (can't be reduced without simplifying content)
  • Extraneous load: complexity from poor presentation (CAN and SHOULD be reduced)
  • Germane load: effort devoted to learning/schema formation (should be MAXIMIZED)
  • The expertise reversal effect (Kalyuga et al 2003): Scaffolding that helps beginners HURTS experts. Worked examples improve novice learning by 50%+ but decrease expert learning. Math Academy handles this by removing scaffolding as mastery increases.

Edgecraft implication: Our skill files don't currently adapt to learner level. A Level 1 beginner and a Level 4 expert see the same file. Math Academy serves different content at different mastery levels. We should consider:

  • Highlighting the relevant progression level section based on self-assessed mastery
  • Collapsing lower-level content for advanced users (expertise reversal effect)
  • Expanding scaffolding for beginners (cognitive load management)

Summary: What Should Change in Edgecraft's Architecture

Copy from Math Academy

  1. Encompassing relationships — Add to graph.yaml schema. Critical for efficient spaced reinforcement.
  2. Knowledge frontier computation — "What you're ready to learn next" based on mastered prerequisites.
  3. Adaptive diagnostic — Self-assessment checklist that infers your frontier using graph structure.
  4. Conditional completion — When self-assessment is uncertain, assume mastery but flag for re-check.

Adapt (not copy directly)

  1. Spaced repetition — Use practice-log-based tracking, not quiz-based. Incorporate interleaving.
  2. Mastery assessment — Self-assessment against specific benchmarks, not quiz scores. Consider video evidence or coach verification as premium features.
  3. Student-topic learning speed — Model as self-reported difficulty + time spent + downstream performance, not quiz accuracy.

Add (Math Academy doesn't have this)

  1. Diagnostic trees — Math Academy doesn't have symptom→root cause→fix chains. This IS our differentiator.
  2. Cross-domain transfer — Math Academy is single-domain. Rosetta Engine has no precedent there.
  3. Coaching cues — Math Academy teaches via worked examples. We teach via coaching cues (short, memorable, moment-of-action phrases). This is better for motor/strategic skills.
  4. Edge insights — "Where conventional wisdom is wrong" has no equivalent in Math Academy. This is pure Edgecraft.

Don't do

  1. Fine-grained topic decomposition — Our 50-150 skill domains are the right granularity for non-math domains. Going finer creates meaningless nodes.
  2. Automated quiz assessment — Can't assess motor skills, creative skills, or probabilistic judgment via quiz.
  3. XP/gamification — Wrong motivational model for physical practice and professional skill development.

8. Critical Findings from Broader Literature Review

The 85% Rule — Target Error Rate for Optimal Learning

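Wilson et al. (2019, Nature Communications, "The Eighty Five Percent Rule for Optimal Learning") derived mathematically that, for gradient-descent-style learners on binary classification tasks, the error rate maximizing learning rate is about 15.87% (i.e., ~84% accuracy). The paper demonstrates the effect in simulations of artificial and biologically plausible neural networks, so for human practice it is best read as a guideline rather than a measured constant.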

  • Too easy (>95% accuracy) = insufficient error signal, no learning
  • Too hard (<70% accuracy) = too noisy an error signal, learned helplessness
  • Sweet spot = ~85% accuracy

Edgecraft implication: For any assessment or practice recommendation, calibrate to target ~85% success. If a learner consistently gets >95%, advance them. If <70%, wrong level. This applies to diagnostic questions, practice drills, and benchmark challenges.
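
The 15.87% figure is the standard normal CDF evaluated at -1, which is easy to check numerically. The sketch below also turns the thresholds from the paragraph above into a simple rule; the function names and the "advance" / "stay" / "step back" labels are assumptions for illustration.

```python
import math


def optimal_error_rate():
    """Standard normal CDF at -1, i.e. the ~15.87% optimal error rate."""
    return 0.5 * (1 + math.erf(-1 / math.sqrt(2)))


def practice_adjustment(successes, attempts, low=0.70, high=0.95):
    """Suggest a difficulty change from observed accuracy.

    Above `high` the task is too easy; below `low` it is too hard.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    accuracy = successes / attempts
    if accuracy > high:
        return "advance"
    if accuracy < low:
        return "step back"
    return "stay"
```

For example, 42 successful dinks out of 50 (84%) sits inside the target band, 48 out of 50 suggests moving on, and 30 out of 50 suggests the drill is pitched too high.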

Challenge Point Framework — Difficulty Must Match Skill Level

Guadagnoli & Lee (2004) showed that optimal practice difficulty depends on skill level:

  • Beginners: Blocked practice (low contextual interference). Dink 50 times in a row.
  • Intermediates: Moderate interleaving. 10 dinks, 10 drives, 10 drops, repeat.
  • Experts: Full randomized practice. Random mix of all skills.

The challenge point shifts rightward as skill increases — harder practice conditions produce better learning, but ONLY if the learner is ready for them. Beginners who interleave too early get overwhelmed; experts who block-practice plateau.

Edgecraft implication: Our progression levels (1-4) should drive practice structure recommendations, not just content. Level 1 = blocked practice. Level 2 = moderate interleaving. Level 3 = full interleaving. Level 4 = full random + pressure simulation.
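
One way this could look in code, as a sketch: the mapping from level to structure (blocked, chunked rotation, strict rotation, shuffled) is our interpretation of the framework, and `build_session`, `reps_per_skill`, and `chunk` are hypothetical names. Pressure simulation for Level 4 is not modelled.

```python
import random


def build_session(skills, level, reps_per_skill=10, chunk=5, seed=None):
    """Return an ordered list of skill repetitions for one practice session.

    level 1: blocked (all reps of one skill, then the next)
    level 2: moderate interleaving (rotate through skills in chunks)
    level 3: full interleaving (rotate one rep at a time)
    level 4: fully randomized order
    """
    if level == 1:
        return [s for s in skills for _ in range(reps_per_skill)]
    if level in (2, 3):
        size = chunk if level == 2 else 1
        plan, remaining = [], reps_per_skill
        while remaining > 0:
            n = min(size, remaining)
            for s in skills:
                plan.extend([s] * n)
            remaining -= n
        return plan
    if level == 4:
        plan = [s for s in skills for _ in range(reps_per_skill)]
        random.Random(seed).shuffle(plan)
        return plan
    raise ValueError("level must be 1, 2, 3, or 4")
```

Every level produces the same total number of repetitions per skill; only the ordering changes, which keeps practice volume comparable across levels.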

Expertise Reversal Effect — Scaffolding That Helps Beginners HURTS Experts

Kalyuga et al. (2003): Worked examples improve novice learning by ~50% but DECREASE expert learning by ~30%. The mechanism: experts already have schemas, so step-by-step walkthroughs are redundant noise that interferes with their more efficient holistic strategies.

Edgecraft implication: Our skill files show the same content to all levels. They shouldn't. Content presentation should CHANGE by level:

  • Level 1: Full worked examples, explicit coaching cues, step-by-step
  • Level 2: Faded examples, selective cues
  • Level 3: Problem-based practice, diagnostics only
  • Level 4: Full autonomy, peer coaching, novel situation generation

BKT Exact Formulas — How to Track Mastery Probabilistically

Corbett & Anderson (1995), four parameters per skill:

  • P(L0): Prior probability of mastery (typically 0.1)
  • P(T): Learning rate per opportunity (0.01-0.5)
  • P(G): Guessing probability (0.0-0.4)
  • P(S): Slip probability — knowing but failing (0.0-0.3)

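Update after a correct answer (posterior probability that the skill was already known):

P(Ln-1|correct) = P(Ln-1)(1-P(S)) / [P(Ln-1)(1-P(S)) + (1-P(Ln-1))P(G)]

Update after an incorrect answer:

P(Ln-1|incorrect) = P(Ln-1)P(S) / [P(Ln-1)P(S) + (1-P(Ln-1))(1-P(G))]

Then apply the learning step, which is where P(T) enters:

P(Ln) = P(Ln-1|obs) + (1 - P(Ln-1|obs))P(T)

Mastery threshold: P(Ln) >= 0.95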

The P(S) "slip" parameter is particularly valuable for Edgecraft: A learner who has mastered a skill but fails under pressure has high P(S), not low P(L). This is the difference between "doesn't know it" and "knows it but can't execute under stress" — a critical distinction for motor and competitive skills.
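
A minimal sketch of one BKT step in Python. It applies the two posterior formulas above and then the standard learning-transition step, which is where P(T) enters. The function name and the parameter values in the usage note are illustrative assumptions.

```python
def bkt_update(p_known, correct, p_transit, p_guess, p_slip):
    """Return the updated mastery probability after one practice opportunity.

    p_known:   P(L) before the observation
    correct:   True if the learner succeeded
    p_transit: P(T), chance of learning on this opportunity
    p_guess:   P(G), chance of succeeding without mastery
    p_slip:    P(S), chance of failing despite mastery
    Assumes the denominator is non-zero (true whenever 0 < P(G), P(S) < 1).
    """
    if correct:
        evidence_known = p_known * (1 - p_slip)
        evidence_total = evidence_known + (1 - p_known) * p_guess
    else:
        evidence_known = p_known * p_slip
        evidence_total = evidence_known + (1 - p_known) * (1 - p_guess)
    posterior = evidence_known / evidence_total
    # Learning step: an unmastered skill may become mastered.
    return posterior + (1 - posterior) * p_transit
```

Starting from P(L0) = 0.1 with P(T) = 0.2, P(G) = 0.2, and P(S) = 0.1, a single success moves the estimate to about 0.47, while a single failure moves it to about 0.21. The failure still raises the estimate above 0.1 because the learning step assumes every opportunity teaches something.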

Fault Tree AND/OR Gates — What Our Diagnostics Are Missing

Our diagnostic trees treat each symptom → root cause as independent. But real symptoms often have COMPOUND causes:

  • OR gate: "Ball goes into net" caused by paddle face too closed OR contact too low OR insufficient follow-through (any one suffices)
  • AND gate: "Can't execute under match pressure" requires poor shot selection AND technique breakdown AND confidence loss (all three together)

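Minimal cut sets are the smallest combinations of root causes that together produce the symptom. They identify the most efficient fix strategy: removing at least one cause from every cut set resolves the symptom, so causes that appear in many cut sets are the highest-leverage fixes. This is the mathematical foundation for our "leverage point" analysis.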
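
A small sketch of computing minimal cut sets for an AND/OR tree, where a cut set is a smallest combination of root causes that together produce the symptom. The tree encoding (nested `("AND" | "OR", [children])` tuples with strings as root causes) and the function names are assumptions for illustration.

```python
from itertools import product


def minimise(sets):
    """Drop any set that strictly contains another set in the collection."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]


def cut_sets(node):
    """Return the minimal cut sets of a fault tree as a list of frozensets.

    A node is either a string (root cause) or a (gate, children) tuple
    where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any child's cut set is enough on its own.
        combined = [s for sets in child_sets for s in sets]
    elif gate == "AND":
        # Need one cut set from every child at the same time.
        combined = [frozenset().union(*combo) for combo in product(*child_sets)]
    else:
        raise ValueError(f"unknown gate: {gate}")
    return minimise(combined)
```

For the OR example above, each of the three causes is its own cut set, so all three may need checking. For the AND example there is a single cut set containing all three causes, so fixing any one of them breaks it.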

Surmise Systems — Disjunctive Prerequisites

Knowledge Space Theory (Doignon & Falmagne, 1985) allows OR-prerequisites that our DAG can't express:

Our DAG: "third-shot-drop requires grip-continental AND paddle-angle AND contact-point" (all must be mastered)

Surmise system: "third-shot-drop requires (grip-continental OR grip-eastern) AND paddle-angle AND contact-point" (either grip pathway works)

Edgecraft implication: Add a prerequisiteMode field or allow nested prerequisite arrays:

prerequisites: [[grip-continental, grip-eastern], paddle-angle, contact-point]
# Inner arrays = OR (any suffices), outer = AND (all required)
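
A sketch of how the nested form could be evaluated once the YAML is loaded into Python lists. The function name `prerequisites_met` is hypothetical; the convention (inner list = any one suffices, outer list = all required) follows the comment above.

```python
def prerequisites_met(prerequisites, mastered):
    """Check a nested prerequisite list against a set of mastered skills.

    Each top-level entry must be satisfied (AND). An entry that is itself
    a list is satisfied when any one of its members is mastered (OR).
    """
    for entry in prerequisites:
        if isinstance(entry, list):
            if not any(skill in mastered for skill in entry):
                return False
        elif entry not in mastered:
            return False
    return True
```

With `[["grip-continental", "grip-eastern"], "paddle-angle", "contact-point"]`, a learner who has mastered `grip-eastern`, `paddle-angle`, and `contact-point` qualifies without ever learning the continental grip.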

Desirable Difficulties — Learners Will Resist Your Best Recommendations

Bjork (1994): Learning conditions that make PERFORMANCE worse during practice make LEARNING (retention + transfer) better. Spacing, interleaving, testing, and reduced feedback all feel harder but produce markedly better retention (about 3x in the Rohrer & Taylor interleaving result cited in Section 6: 63% vs 20%).

The critical problem: Learners consistently rate blocked practice as superior EVEN AFTER demonstrating worse retention (Bjork & Bjork, 2011). They will resist your scheduling recommendations and blame the system.

Edgecraft implication: Every practice recommendation must include WHY it feels harder: "Today's mixed practice will feel less smooth than yesterday's focused drills. That's intentional — research shows you'll retain 2x more. Trust the process."

FSRS vs SM-2 — Use Power-Law, Not Exponential Forgetting

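FSRS (Ye, 2022) is reported to need ~30% fewer reviews than SM-2 for the same retention rate. The key improvement: an explicit power-law forgetting curve, R = (1 + t/(9S))^(-1) in the FSRS v4 form, where t is days since the last review and S is the stability (the interval at which retention falls to 90%), versus the exponential decay usually assumed for SM-2-style schedulers. Matched at the 90% point, a power law predicts slightly more forgetting than an exponential before t = S and much less after it, and this heavier tail is reported to match empirical review data better.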

For declarative components of our skills (rules, frameworks, terminology), FSRS is the current state of the art.
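
To make the difference between the two curve shapes concrete, the sketch below compares the power-law formula quoted above with an exponential calibrated to give the same 90% retention at t = S. That calibration is our assumption for the sake of comparison, since SM-2 itself does not define a forgetting curve; the function names are illustrative.

```python
def power_law_retention(t_days, stability):
    """Power-law forgetting curve: R = (1 + t / (9 * S)) ** -1."""
    return (1 + t_days / (9 * stability)) ** -1


def exponential_retention(t_days, stability):
    """Exponential curve scaled so that retention is also 0.9 at t = S."""
    return 0.9 ** (t_days / stability)
```

With S = 10 days, both curves give 90% retention at day 10. At day 100 the power law predicts about 47% retention against about 35% for the exponential, while at day 1 the two differ by well under one percentage point.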


9. Revised Architecture Recommendations

Based on the complete literature review, here's the updated schema:

New schema fields to add

# In graph.yaml, per skill:
- id: third-shot-drop
  prerequisites: [[grip-continental, grip-eastern], paddle-angle]  # NEW: nested = OR/AND
  encompasses: [grip-pressure, contact-point]  # NEW: skills implicitly practiced
  encompassingWeights:  # NEW: fractional weights
    grip-pressure: 0.8
    contact-point: 1.0
  skillType: motor  # NEW: motor | conceptual | procedural | perceptual
  elementCount: 5  # NEW: simultaneous elements (cognitive load proxy)

New diagnostic structure

# In skill files, diagnostic entries:
### Symptom: Ball pops up on dinks
**Gate type:** OR  # NEW: this symptom has multiple independent causes
**Root causes:**
1. Contact point too high (probability: 0.4)
2. Grip tension (probability: 0.35)
3. Wrong weight transfer (probability: 0.25)
**Discriminating question:** "Does it happen more on forehand or backhand?"
  - Forehand → likely contact point (check: shadow swing, watch paddle face at contact)
  - Backhand → likely grip tension (check: bottom three fingers, are they squeezing?)
**Minimal cut set:** Each cause alone triggers the symptom (OR gate); checking #1 and #2 first covers about 75% of cases

Level-adaptive content presentation

Level 1: Full worked examples + explicit coaching cues + blocked practice recs
Level 2: Faded examples + selective cues + moderate interleaving
Level 3: Diagnostics + problem-based practice + full interleaving
Level 4: Peer coaching + novel situations + competitive simulation

Sources: Skycak, "The Math Academy Way" (2026, 508 pages); Doignon & Falmagne, "Spaces for the Assessment of Knowledge" (1985); Corbett & Anderson, "Knowledge Tracing" (1995); Gentner, "Structure-Mapping: A Theoretical Framework for Analogy" (1983); Barnett & Ceci, "When and Where Do We Apply What We Learn?" (2002); Wozniak, "SuperMemo SM-2 Algorithm" (1987); Schmidt & Lee, "Motor Learning and Performance" (multiple editions); Shea & Morgan, "Contextual Interference Effects on Motor Skill Acquisition" (1979); Rohrer & Taylor, "The Shuffling of Mathematics Problems Improves Learning" (2007); Bjork, "Memory and Metamemory Considerations in the Training of Human Beings" (1994); Sweller, "Cognitive Load During Problem Solving" (1988); Anderson & Krathwohl, "A Taxonomy for Learning, Teaching, and Assessing" (2001); Kalyuga et al, "The Expertise Reversal Effect" (2003).