Introduction

In 1984, educational psychologist Benjamin Bloom published a finding that haunted educators for decades. The average student who received one-on-one tutoring outperformed 98% of students taught in conventional classrooms [1]. The gap was enormous. Two full standard deviations. Bloom called it the "2 sigma problem" and posed a question nobody could answer: how do you give every student a personal tutor?

Forty years later, adaptive learning algorithms are the closest thing to an answer. These are computational systems that estimate what a learner knows, predict what they are about to forget, and decide what they should study next. They power the scheduling engines behind millions of digital flashcard decks. They drive computerized adaptive tests taken by hundreds of thousands of medical and graduate school applicants each year. And they sit at the heart of intelligent tutoring systems that have been shown, across dozens of controlled studies, to produce learning gains approaching those of skilled human tutors [2].

But the story of adaptive learning algorithms is not a story of sudden invention. It is a story that begins with a psychologist in 1885 memorizing nonsense syllables alone in his apartment, passes through Cold War teaching machines and Soviet dolphin experiments, and arrives at neural networks that can track the knowledge state of a student across thousands of skills simultaneously. It is a story of four distinct mathematical traditions converging on the same problem. And it is a story with an ending that has not yet been written.

[Image: Abstract neural network connecting flashcards to digital screens in warm and cool tones.]

The Curve That Started Everything

Every adaptive learning algorithm ever built rests on one empirical observation: memory decays predictably.

In 1885, Hermann Ebbinghaus, a German psychologist working alone in his Berlin apartment, published the results of an experiment he had conducted on himself. He memorized lists of nonsense syllables, combinations like "WID," "ZOF," and "DAX" designed to have no prior associations. Then he measured how much he retained at various intervals using the "method of savings," tracking how much faster relearning was compared to initial learning [3].

The numbers were striking. After twenty minutes, roughly 58% of the material remained. After one hour, 44%. After one day, just 33%. After six days, about 25%; after a month, roughly 21%. The decay followed a mathematical curve that Ebbinghaus expressed as R = e^(−t/S), where R is retention, t is elapsed time, and S is memory strength [4].
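
In code, the curve is a one-liner. A minimal sketch, with an illustrative strength value rather than anything Ebbinghaus reported:

```python
import math

def retention(t_days: float, strength_days: float) -> float:
    """Ebbinghaus-style exponential forgetting: R = e^(-t/S)."""
    return math.exp(-t_days / strength_days)

# Illustrative only: with S = 2 days, retention halves roughly every 1.4 days.
S = 2.0
for t in (0.1, 0.5, 1, 2, 7):
    print(f"t = {t:>4} days -> R = {retention(t, S):.2f}")
```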

More than a century later, in 2015, Jaap Murre and Joeri Dros replicated the experiment with one subject under controlled conditions and found broadly similar parameters [5]. The forgetting curve was real. And if forgetting is predictable, then the timing of review can be computed.

That single insight, that memory decay is not random but follows a quantifiable pattern, is the foundation stone beneath every spaced repetition scheduler, every knowledge tracing model, and every adaptive assessment engine in existence.

[Image: Large hourglass on wooden desk with flowing sand and fading index cards.]

The Bottleneck Inside Your Skull

But forgetting is only half the problem. The other half is capacity.

In 1956, George Miller published what may be the most famous paper in the history of psychology. "The Magical Number Seven, Plus or Minus Two" reported that human short-term memory can hold roughly seven chunks of information at once [6]. In 2001, Nelson Cowan re-examined the evidence with rehearsal controlled and revised the number downward. When people cannot silently repeat items to themselves, the true capacity of working memory is closer to four items [7].

Four. That is the bandwidth through which all new learning must pass.

This bottleneck has direct consequences for instructional design. If a lesson presents five unfamiliar concepts simultaneously, working memory overflows and learning collapses. Adaptive systems address this by breaking material into small units, scaffolding new information onto existing knowledge, and never presenting more than the learner can absorb. The connection between cognitive load theory and adaptive algorithm design is not accidental. It is architectural.

There is another piece of the puzzle. In 1999, Filip Dochy, Mien Segers, and Michelle Buehl reviewed 183 studies and found that prior knowledge accounts for 30 to 60 percent of variance in learning outcomes [8]. What a student already knows is the single strongest predictor of what they will learn next. Every adaptive system that builds a "student model" is, at its mathematical core, trying to estimate this variable.

[Image: Translucent glass spheres with geometric shapes above an open book.]

Two Effects That Rewrote the Rules of Studying

Two findings from cognitive science shaped adaptive algorithms more than any others: the spacing effect and the testing effect.

In 2006, Nicholas Cepeda, Hal Pashler, and colleagues published a massive meta-analysis in Psychological Bulletin. They analyzed 839 comparisons across 184 articles and found that distributing practice across time produced better retention than massing it together in 96% of studies [9]. The optimal gap between study sessions was roughly 10 to 20 percent of the desired retention interval. Want to remember something for a month? Space your reviews about three to six days apart.

The same year, Henry Roediger and Jeffrey Karpicke at Washington University demonstrated the testing effect with a clean experiment. Students read a prose passage. One group studied it four times. Another group studied it once and took three practice tests. On a test one week later, the group that practiced retrieval outperformed the group that restudied by approximately 50% [10]. Retrieving information from memory, it turns out, strengthens the memory trace far more than passively re-reading it. This is why the testing effect has become central to modern learning science.

These two effects, spacing and testing, are not just interesting findings. They are the operational principles that every spaced repetition algorithm translates into code. SM-2 spaces reviews. Bayesian Knowledge Tracing models the effect of practice. FSRS predicts optimal retrieval timing. All of them are, in different mathematical languages, saying the same thing: test yourself, space it out, and let forgetting do its work.

From Teaching Machines to Thinking Machines

The idea of a machine that adapts to the learner is older than most people realize.

1926: Pressey patents the first teaching machine
1958: Skinner publishes "Teaching Machines" in Science
1960: PLATO system launches at the University of Illinois
1972: Atkinson optimizes vocabulary learning with algorithms
1985: Doignon and Falmagne publish Knowledge Space Theory
1987: Wozniak releases the SM-2 algorithm in SuperMemo
1994: ALEKS launches using Knowledge Space Theory
1995: Corbett and Anderson publish Bayesian Knowledge Tracing
2015: Piech and colleagues introduce Deep Knowledge Tracing at NeurIPS
2022: FSRS algorithm published at ACM SIGKDD

In 1926, Sidney Pressey at Ohio State University built a shoebox-sized mechanical device with a rotating drum of multiple-choice questions. If a student selected the correct answer, the drum advanced. If not, it stayed put. A later version dropped questions once answered correctly twice, an early form of mastery learning [11].

B.F. Skinner took a different approach in 1958. His teaching machine, described in Science, used small instructional frames with near-100% success rates and immediate reinforcement [12]. Skinner deliberately avoided multiple choice because he believed wrong answer options would contaminate learning.

The first computer-based adaptive system was PLATO, launched by Donald Bitzer at the University of Illinois in 1960. By the 1970s, PLATO IV featured touch-screen plasma displays and networked terminals serving thousands of students worldwide [13].

But the real conceptual breakthrough came in 1972, when Richard Atkinson published a study comparing four strategies for teaching German-English vocabulary pairs. Two strategies were driven by a mathematical model of the learning process. The model-based strategies significantly outperformed random presentation and learner-controlled selection [14]. Atkinson had demonstrated, for the first time, that an algorithm could select better study items than a human learner could.

[Image: Vintage wooden desk showcasing three teaching machines from different eras.]

Four Mathematical Traditions, One Problem

Modern adaptive learning algorithms come from four distinct mathematical traditions. Each asks the same question differently: what does this learner know, and what should they study next?

The first tradition is Item Response Theory (IRT). Georg Rasch, a Danish mathematician, published the foundation in 1960. The simplest IRT model, the Rasch or 1PL model, expresses the probability of a correct response as P(X=1|θ) = 1/(1 + e^(−(θ−b))), where θ is the learner's ability and b is the item's difficulty [15]. More complex versions add a discrimination parameter (2PL) and a guessing parameter (3PL). IRT is the engine behind every major computerized adaptive test: the GRE, GMAT, NCLEX, and the adaptive SAT. The algorithm selects the item that provides maximum information at the learner's current estimated ability level.
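
In code, maximum-information selection under the Rasch model is only a few lines. A sketch with made-up item difficulties and a made-up ability estimate:

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability of a correct response: 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def pick_next_item(theta_hat: float, difficulties: dict[str, float]) -> str:
    """Maximum-information selection: the item most informative at the current ability estimate."""
    return max(difficulties, key=lambda item: item_information(theta_hat, difficulties[item]))

# Illustrative item bank; difficulties are invented for the example.
bank = {"easy": -1.5, "medium": 0.1, "hard": 1.8}
print(pick_next_item(theta_hat=0.0, difficulties=bank))  # -> "medium"
```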

The second tradition is Knowledge Space Theory (KST). In 1985, Jean-Paul Doignon and Jean-Claude Falmagne defined a "knowledge space" as a collection of subsets representing all feasible combinations of skills a learner might possess [16]. The "outer fringe" of a knowledge state contains the items a learner is ready to learn next. ALEKS, launched in 1994 with NSF funding, operationalized KST at scale. It represents college algebra as roughly 400 to 500 problem types and uses Bayesian updating to identify each student's knowledge state [17].
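
The outer fringe is easy to compute once the knowledge space is written down. A toy sketch (the feasible states here are invented for illustration, not ALEKS's real structure):

```python
def outer_fringe(state: frozenset, knowledge_space: set[frozenset]) -> set:
    """Items the learner is 'ready to learn': each item q outside the current state
    such that adding q alone yields another feasible state in the space."""
    all_items = set().union(*knowledge_space)
    return {q for q in all_items - state if frozenset(state | {q}) in knowledge_space}

# A toy knowledge space over items a-d.
space = {
    frozenset(),
    frozenset("a"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("abc"),
    frozenset("abcd"),
}
print(outer_fringe(frozenset("ab"), space))  # -> {'c'}
```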

The third tradition is Bayesian Knowledge Tracing (BKT). In 1995, Albert Corbett and John Anderson published a model that treats each skill as a hidden binary variable [18]. BKT has four parameters: P(L₀), the probability the student already knows the skill (typically around 0.36); P(T), the probability of transitioning from unknown to known after practice (around 0.10); P(S), the probability of making an error despite knowing (around 0.05); and P(G), the probability of guessing correctly without knowing (around 0.25). After each student response, the model updates its estimate using Bayes' rule. BKT remains the most widely used student modeling framework in intelligent tutoring systems.
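
The update itself is Bayes' rule plus the learning transition. A sketch using the illustrative parameter values quoted above:

```python
def bkt_update(p_know: float, correct: bool,
               p_transit: float = 0.10, p_slip: float = 0.05, p_guess: float = 0.25) -> float:
    """One Bayesian Knowledge Tracing step: revise the mastery estimate from the
    observed response with Bayes' rule, then apply the learning transition P(T)."""
    if correct:
        posterior = p_know * (1 - p_slip) / (p_know * (1 - p_slip) + (1 - p_know) * p_guess)
    else:
        posterior = p_know * p_slip / (p_know * p_slip + (1 - p_know) * (1 - p_guess))
    return posterior + (1 - posterior) * p_transit

# A student starts at P(L0) = 0.36 and answers correct, correct, wrong:
p = 0.36
for outcome in (True, True, False):
    p = bkt_update(p, outcome)
    print(f"after {'correct' if outcome else 'wrong':>7}: P(known) = {p:.2f}")
```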

The fourth tradition is spaced repetition scheduling. Piotr Woźniak began developing SuperMemo in 1985 and released Algorithm SM-2 in December 1987. SM-2 tracks each flashcard with a repetition count, an easiness factor (starting at 2.5), and an interval in days [19]. The first review comes after one day. The second after six days. Each subsequent interval is the previous interval multiplied by the easiness factor. If the learner rates a card poorly, the cycle resets. SM-2 became the default algorithm in open-source flashcard software and remained dominant for over three decades. For a deeper look at how these scheduling algorithms evolved, see spaced repetition algorithms.
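
Because its rules are fixed rather than fitted, the whole SM-2 scheduler fits in a few lines. A sketch of the published update rules (implementations differ in details such as rounding):

```python
def sm2_review(quality: int, reps: int, interval: int, easiness: float):
    """One SM-2 scheduling step.

    quality  -- self-rated recall, 0 (blackout) to 5 (perfect)
    reps     -- consecutive successful reviews so far
    interval -- current interval in days
    easiness -- easiness factor, starts at 2.5 and never drops below 1.3
    Returns the updated (reps, interval, easiness).
    """
    easiness = max(1.3, easiness + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    if quality < 3:                      # failed recall: restart the cycle
        return 0, 1, easiness
    reps += 1
    if reps == 1:
        interval = 1
    elif reps == 2:
        interval = 6
    else:
        interval = round(interval * easiness)
    return reps, interval, easiness

# A card rated "4" on each of its first four reviews:
reps, interval, ef = 0, 0, 2.5
for _ in range(4):
    reps, interval, ef = sm2_review(4, reps, interval, ef)
    print(f"next review in {interval:>3} days (EF = {ef:.2f})")  # 1, 6, 15, 38 days
```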

| Algorithm | Year | Core Approach | Key Parameters | Primary Use Case |
| --- | --- | --- | --- | --- |
| IRT (Rasch/2PL/3PL) | 1960 | Logistic probability model | θ (ability), b (difficulty), a (discrimination), c (guessing) | Computerized adaptive testing (GRE, GMAT, NCLEX) |
| Knowledge Space Theory | 1985 | Set-theoretic skill mapping | Knowledge states, outer/inner fringe | ALEKS assessment and tutoring |
| Bayesian Knowledge Tracing | 1995 | Hidden Markov model | P(L₀), P(T), P(S), P(G) | Intelligent tutoring systems |
| SM-2 | 1987 | Fixed-rule spaced repetition | Easiness factor, interval, repetition count | Flashcard scheduling |
| Deep Knowledge Tracing | 2015 | LSTM neural network | Hidden state vector | Large-scale student modeling |
| FSRS | 2022 | DSR memory model + optimization | Difficulty, Stability, Retrievability (17-21 parameters) | Next-generation spaced repetition |

[Image: Translucent geometric shapes glowing in blue, purple, green, and amber.]

The Architecture of an Intelligent Tutor

Regardless of which algorithm powers them, adaptive learning systems share a common architecture. Since the late 1980s, researchers have described four interlocking components [20].

The first is the domain model, also called the expert model. This is a structured representation of everything the system can teach. It might be a graph of prerequisite relationships between algebra concepts, a taxonomy of medical knowledge, or a Q-matrix that maps test items to skills. The domain model is static. It only changes when the curriculum changes.

The second is the student model. This is a dynamic estimate of what the learner currently knows. In BKT, it is a set of skill probabilities. In IRT, it is a theta vector. In deep learning systems, it is a hidden state inside a neural network. The student model updates after every interaction, tracking the learner's trajectory through the domain.

The third is the pedagogical model, sometimes called the tutor model. This component decides what happens next. Should the system present a new concept? Give a hint? Repeat a failed item? Switch to easier material? Policies range from simple threshold rules ("if mastery probability exceeds 0.95, move to the next skill") to reinforcement learning agents that optimize long-term learning outcomes.

The fourth is the interface model. This determines how content is presented and how learner interactions are captured. Response times, error patterns, hint requests, and even mouse movements feed back into the student model.

These four components form a closed loop. The interface captures data. The student model updates. The pedagogical model selects an action from the domain model. The interface delivers it. And the cycle repeats.

[Diagram: The adaptive loop. Interface captures response → student model updates → mastery reached? If yes, advance to a new skill; if no, select a review item → pedagogical model decides the next action → domain model provides content → back to the interface.]

What does this mean in practice? Consider a student working through algebra problems. She answers a question about factoring quadratics correctly. The student model increases her estimated mastery of that skill from 0.72 to 0.81. The pedagogical model checks: is 0.81 above the threshold? Not yet. It selects another factoring problem, but one that is slightly harder. She gets it wrong. The model adjusts downward. A hint is offered. She tries again. The loop continues, response after response, adapting to her changing knowledge state in real time.
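
Compressed into code, the loop looks something like the sketch below: a BKT-style update as the student model, a threshold rule as the pedagogical policy, and a simulated student standing in for the interface. Every value is illustrative.

```python
import random

MASTERY_THRESHOLD = 0.95   # illustrative pedagogical policy: "move on above 0.95"

def tutor_loop(domain: list[str], true_skill: dict[str, float], seed: int = 0) -> dict[str, float]:
    """Closed-loop sketch: interface -> student model -> pedagogical model -> domain model.

    The student is simulated (true_skill gives each skill's hidden probability of a
    correct answer); the student model is a BKT-style mastery estimate per skill."""
    p_transit, p_slip, p_guess = 0.10, 0.05, 0.25
    rng = random.Random(seed)
    mastery = {skill: 0.36 for skill in domain}            # P(L0) for every skill
    queue = list(domain)                                    # domain model: skills still to master
    for _ in range(500):                                    # safety cap for the sketch
        if not queue:
            break
        skill = queue[0]                                    # pedagogical model: practice current skill
        correct = rng.random() < true_skill[skill]          # interface: capture a (simulated) response
        p = mastery[skill]                                  # student model: Bayes' rule + transition
        if correct:
            p = p * (1 - p_slip) / (p * (1 - p_slip) + (1 - p) * p_guess)
        else:
            p = p * p_slip / (p * p_slip + (1 - p) * (1 - p_guess))
        mastery[skill] = p + (1 - p) * p_transit
        if mastery[skill] >= MASTERY_THRESHOLD:             # mastery reached: advance to a new skill
            queue.pop(0)
    return mastery

print(tutor_loop(["factoring", "graphing"], {"factoring": 0.8, "graphing": 0.6}))
```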

[Image: Circular diagram of glowing cubes connected by light beams on navy background.]

When Neural Networks Learned to Track Knowledge

In 2015, Chris Piech and colleagues at Stanford University published a paper that shook the field.

Traditional BKT required human experts to define every skill boundary and hand-tune four parameters per skill. Deep Knowledge Tracing (DKT) replaced the entire framework with a Long Short-Term Memory (LSTM) neural network [21]. The network takes a sequence of student interactions, correct and incorrect responses to specific problems, and produces a hidden state that implicitly represents what the student knows.
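
The core of the architecture is small enough to sketch directly. A minimal PyTorch version (the hidden size and the toy input are illustrative choices, not the published configuration):

```python
import torch
import torch.nn as nn

class DKT(nn.Module):
    """Minimal Deep Knowledge Tracing sketch: an LSTM over one-hot (skill, correctness)
    interactions predicts the probability of a correct answer on every skill next step."""

    def __init__(self, num_skills: int, hidden_size: int = 64):
        super().__init__()
        self.num_skills = num_skills
        # Each interaction is a one-hot vector of length 2 * num_skills:
        # index s means "skill s answered wrong", num_skills + s means "answered right".
        self.lstm = nn.LSTM(input_size=2 * num_skills, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, num_skills)

    def forward(self, interactions: torch.Tensor) -> torch.Tensor:
        # interactions: (batch, time, 2 * num_skills)
        hidden_states, _ = self.lstm(interactions)
        return torch.sigmoid(self.out(hidden_states))       # (batch, time, num_skills)

# One simulated student answering 5 items drawn from 10 skills:
num_skills, steps = 10, 5
skill_ids = torch.randint(0, num_skills, (steps,))
correct = torch.randint(0, 2, (steps,))
x = torch.zeros(1, steps, 2 * num_skills)
x[0, torch.arange(steps), skill_ids + correct * num_skills] = 1.0
print(DKT(num_skills)(x).shape)  # torch.Size([1, 5, 10])
```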

On the ASSISTments dataset, a large corpus of student math interactions, DKT achieved an AUC (area under the ROC curve) of 0.86. The best published BKT result on the same dataset was approximately 0.69 [22]. That is a 25% relative improvement. On Khan Academy data, DKT reached 0.85 versus BKT's 0.68.

The improvement was real. But it came with a cost. BKT's four parameters are interpretable. A teacher can look at them and understand what each means. DKT's hidden state is a black box. The network captures something about the student's knowledge, but nobody can say exactly what.

This sparked a wave of research. Zhang et al. (2017) introduced Dynamic Key-Value Memory Networks (DKVMN), which separated knowledge storage from knowledge retrieval. Pandey and Karypis (2019) built SAKT, the first self-attention knowledge tracing model. Ghosh, Heffernan, and Lan (2020) created AKT, which combined monotonic attention with Rasch-style embeddings to restore some interpretability [23]. Choi et al. (2020) built SAINT, a full Transformer encoder-decoder architecture.

But a surprising finding emerged along the way. Khajah, Lindsey, and Mozer (2016) showed that carefully tuned BKT extensions could narrow the gap with DKT substantially. Much of DKT's advantage came not from architectural superiority but from richer input features [24]. The simpler model, given the same information, performed comparably.

This debate, simple interpretable models versus powerful opaque ones, remains unresolved. And it matters far beyond computer science. When an algorithm decides what a student studies, and that decision affects test scores, grade promotion, and college admissions, the question of whether we can explain why it made that decision becomes urgent.

[Image: Abstract neural network with luminous nodes in warm amber and cool blue.]

The Algorithm That Dethroned SM-2

For thirty-five years, Algorithm SM-2 was the default engine of spaced repetition. Then Jarrett Ye built something better.

Working with data from the MaiMemo language learning platform, Ye developed the Free Spaced Repetition Scheduler (FSRS) and published "A Stochastic Shortest Path Algorithm for Optimizing Spaced Repetition Scheduling" at ACM SIGKDD in 2022 [25]. A companion paper in IEEE Transactions on Knowledge and Data Engineering described the underlying memory model in detail.

FSRS uses the DSR framework: Difficulty, Stability, and Retrievability. Retrievability follows a power-law forgetting curve rather than the exponential curve that Ebbinghaus proposed. Stability represents how resistant a memory is to forgetting, measured in days. Difficulty, clamped between 1 and 10, captures how hard a particular card is for a particular learner. The algorithm has 17 to 21 trainable parameters that are optimized from each user's own review history [26].
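
The retrievability half of the model is simple to write down. The sketch below assumes the power-law form used in FSRS v4 (later versions adjust the constants); the scheduling rule is then just the inverse of the curve:

```python
def retrievability(t_days: float, stability: float) -> float:
    """Power-law forgetting curve in the FSRS v4 form (assumed here): R = (1 + t / (9 * S)) ** -1.
    Under this form, stability S is the interval at which R falls to exactly 0.9."""
    return (1 + t_days / (9 * stability)) ** -1

def next_interval(stability: float, desired_retention: float = 0.9) -> float:
    """Schedule the next review at the time when predicted retrievability
    drops to the desired retention target."""
    return 9 * stability * (1 / desired_retention - 1)

S = 20.0  # illustrative stability, in days
print(retrievability(20, S))    # 0.9
print(next_interval(S, 0.85))   # ~31.8 days: a longer gap if you accept 85% recall
```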

FSRS-4 was integrated into major open-source flashcard platforms beginning in late 2023. FSRS-5 added same-day review handling. FSRS-6, released in 2025, introduced forgetting-curve shape parameters and post-lapse stability modeling. Benchmarks on the open SRS-Benchmark dataset, containing approximately 1.7 billion reviews from real users, show FSRS variants consistently outperforming SM-2 by 20 to 30 percent in review efficiency for equivalent retention targets [27].

What does "20 to 30 percent fewer reviews" mean in practice? For a medical student reviewing 200 cards per day, it means finishing 40 to 60 cards sooner. Over a year, that is hundreds of hours saved. Not by studying less, but by studying smarter, reviewing each card at exactly the moment the algorithm predicts it is about to be forgotten.

Does Any of This Actually Work?

The question that matters most is also the hardest to answer. Do adaptive learning algorithms produce better learning outcomes than traditional instruction?

The evidence says yes. But the effect is smaller than enthusiasts claim.

Bloom's original 1984 finding of a two-sigma advantage for one-on-one tutoring has become a rallying cry for educational technology. But subsequent meta-analyses have not replicated that number for machine-based systems. Kurt VanLehn's 2011 review in Educational Psychologist found that intelligent tutoring systems produced effect sizes of approximately 0.76 for step-based tutoring and 0.40 for substep-based tutoring, comparable to but not exceeding human tutoring at approximately 0.79 [28].

Kulik and Fletcher (2016) reviewed 50 controlled evaluations and reported a median effect size of g = 0.66 [2]. Wang et al. (2024) found g = 0.70 for AI-enabled adaptive systems compared to non-adaptive controls [29].

These are medium-to-large effects by the standards of educational research. A gain of 0.66 standard deviations means the average student using an adaptive system performs as well as a student at the 75th percentile of a conventional classroom. Real. Meaningful. But not the two-sigma revolution Bloom imagined.

One of the largest randomized controlled trials came from the RAND Corporation. Pane et al. (2014) studied Carnegie Learning's Cognitive Tutor Algebra I across 73 high schools and 74 middle schools, involving roughly 25,500 students. In the second year, the Cognitive Tutor group scored approximately 0.20 standard deviations higher on algebra assessments, equivalent to about eight percentile points [30].
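
The percentile translations used above follow directly from the normal curve:

```python
from statistics import NormalDist

# Convert an effect size (in standard deviations) to the percentile of the control group
# that the average treated student would reach.
for d in (0.66, 0.20):
    pct = NormalDist().cdf(d) * 100
    print(f"d = {d:.2f} -> average treated student at the {pct:.0f}th percentile of controls")
```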

Eight percentile points may not sound dramatic. But scale matters. Applied to millions of students, a consistent eight-percentile-point gain changes educational trajectories.

[Image: Balanced scale with textbooks and glowing tablet on marble surface.]

The Bias Problem Nobody Wants to Talk About

Adaptive learning algorithms learn from data. And data reflects the world it comes from, including its inequities.

Ryan Baker and Aaron Hawn published a systematic review in 2021 documenting biases across race, gender, language, nationality, socioeconomic status, and disability in educational AI systems [31]. BKT parameter estimates have shown demographic disparities. "Wheel-spinning" detectors, algorithms that identify students stuck without making progress, have exhibited different accuracy rates across racial groups.

The problem runs deeper than calibration errors. Many adaptive systems are trained primarily on data from American K-12 mathematics classrooms. Whether their models generalize to other cultures, languages, and disciplines is an open empirical question. Kizilcec and Lee (2022) reviewed statistical, similarity-based, and causal fairness criteria and found that no single metric satisfies all desiderata simultaneously [32]. This is not a fixable bug. It is a structural property of fairness itself, formalized in impossibility theorems.

Privacy compounds the problem. Adaptive systems require detailed behavioral data: every click, every pause, every error pattern. Under regulations like GDPR, FERPA, and CCPA, collecting this data from minors raises legal and ethical questions that the field has only begun to address. Differential privacy frameworks offer one path forward [33], but they trade accuracy for privacy. The tension is real and unlikely to disappear.

Then there is the cold-start problem. A new student enters the system with no history. A new item enters the curriculum with no interaction data. The algorithm has nothing to work with. Solutions include content-based bootstrapping, demographic priors, and active-learning probes, but all involve assumptions that can themselves introduce bias [34].

None of these challenges are reasons to abandon adaptive learning. They are reasons to build it more carefully.

The New Frontier: When Language Models Meet Learning Science

The most recent chapter in this story is being written now, and it involves large language models.

In 2025, Zhao published LECTOR (LLM-Enhanced Concept-based Test-Oriented Repetition), a system that combines spaced repetition scheduling with semantic understanding from language models. In simulations across 100 learners over 100 days, LECTOR achieved a 90.2% success rate against 88.4% for the best previous baseline [35]. The system does not just decide when to review a card. It understands the conceptual relationships between cards and schedules related concepts together.

Frameworks like SP-TeachLLM (Wang et al., 2025) and the Socratic Playground (Hu et al., 2025) represent a broader trend: combining knowledge tracing with conversational tutoring [36]. Instead of presenting a static flashcard, the system can generate new questions on the fly, offer natural-language explanations tailored to the learner's specific misconceptions, and engage in Socratic dialogue.

Deng et al. (2024) reviewed 55 studies on LLM-based personalized learning published between 2020 and 2024, reporting improvements in academic performance, motivational states, and higher-order thinking, particularly at the university level [37].

But caution is warranted. LECTOR's results come from simulated learners, not real classrooms. Language models hallucinate facts. They can generate plausible-sounding explanations that are scientifically wrong. And deploying them in educational settings with children raises safety, privacy, and consent questions that existing regulatory frameworks were not designed to address.

The promise is real. A tutor that can explain calculus in plain language, adapt its examples to each student's interests, and schedule reviews at mathematically optimal intervals would be a genuine breakthrough. But the gap between simulation results and field-validated deployment remains large.

[Image: Futuristic library with glowing spiral bookshelves and ethereal fog.]

Open Learner Models: Letting Students See Inside the Machine

One of the most promising recent developments is also one of the simplest. What if the student could see what the algorithm thinks they know?

Open Learner Models (OLMs) make the system's internal beliefs about the learner visible and interactive. Long and Aleven (2017) tested an OLM with 302 seventh and eighth graders learning equation-solving. Students who could see and interact with the model's estimate of their skills showed improved learning outcomes and better self-regulation [38].

The principle is metacognitive. When a student sees that the system rates their factoring skill at 62% but their graphing skill at 89%, they gain information that helps them allocate study time more effectively. The algorithm is not just deciding for the learner. It is collaborating with the learner.

This approach addresses the black-box problem from a different angle. Instead of making the algorithm interpretable to computer scientists, it makes the algorithm's output interpretable to the person who matters most: the student.

What Comes Next

The story of adaptive learning algorithms is far from over. Several open questions will shape the next decade.

Can adaptive systems close equity gaps, or will they widen them? The answer depends on whether developers prioritize inclusive training data, culturally responsive design, and equitable access to the hardware and connectivity these systems require.

Will the interpretability problem be solved? Neural-symbolic hybrids like AKT, which combine attention mechanisms with Rasch-style embeddings, suggest a middle path between raw prediction accuracy and human understanding [39]. But the fundamental tension between model complexity and transparency is unlikely to disappear.

Can LLM-based tutors be made reliable enough for unsupervised use? The hallucination problem is not a minor glitch. In education, a confidently wrong explanation can create misconceptions that persist for years. Retrieval-augmented generation (RAG) and grounding techniques help, but they do not eliminate the risk.

And perhaps the deepest question: will adaptive algorithms change what it means to learn? When a system knows exactly what you are about to forget and schedules your review with millisecond precision, the experience of struggle, the feeling of reaching for a memory and not quite grasping it, changes fundamentally. Some researchers argue that this productive struggle is itself essential to deep learning [9]. Remove too much difficulty, and you may remove the conditions that produce the strongest memories.

The biological mechanism usually invoked here, long-term potentiation, is thought to depend on exactly this kind of effortful engagement. Algorithms that make learning too effortless may paradoxically undermine the neural processes they are trying to support.

[Image: Crossroads in an abstract landscape symbolizing future choices in adaptive learning.]

Conclusion

Adaptive learning algorithms have come a long way from Pressey's mechanical quiz box. They now draw on a century of cognitive science, from the forgetting curve to working memory limits to the spacing and testing effects. They encode four distinct mathematical traditions into systems that can track a student's knowledge state across thousands of skills and schedule reviews with precision that no human tutor could match.

The evidence says they work. Not with the magical two-sigma effect Bloom imagined, but with consistent, replicable gains in the range of 0.4 to 0.7 standard deviations. Enough to matter. Enough to change outcomes for millions of students.

But they are not neutral instruments. They carry the biases of their training data. They raise privacy concerns that regulations have not yet caught up with. And as they grow more powerful, the question of who controls the learning process, the learner, the teacher, or the algorithm, becomes increasingly urgent.

The next generation of adaptive learning, powered by large language models and informed by decades of cognitive science, has the potential to bring personalized education closer to Bloom's vision than ever before. Whether it will do so equitably, transparently, and in service of genuine understanding rather than mere performance metrics, is a question that algorithms alone cannot answer.

Frequently Asked Questions

What are adaptive learning algorithms?

Adaptive learning algorithms are computational systems that adjust educational content, sequence, and timing based on individual learner performance. They use data from student interactions to estimate knowledge states and select optimal study activities, drawing on techniques from statistics, machine learning, and cognitive science to personalize instruction automatically.

How does Bayesian Knowledge Tracing work?

Bayesian Knowledge Tracing models each skill as a hidden binary variable with four parameters: prior knowledge probability, learning rate, slip probability, and guess probability. After each student response, the model applies Bayes' rule to update its estimate of whether the student has mastered the skill, enabling real-time tracking of learning progress.

What is the difference between IRT and BKT?

Item Response Theory estimates a learner's overall ability on a continuous scale and selects test items that provide maximum measurement information at that ability level. Bayesian Knowledge Tracing tracks mastery of individual skills over time as a binary variable. IRT is primarily used for assessment, while BKT is used for instructional sequencing.

What is FSRS and how does it improve on SM-2?

FSRS (Free Spaced Repetition Scheduler) is a modern spaced repetition algorithm that uses a Difficulty-Stability-Retrievability memory model with 17 to 21 trainable parameters optimized from individual review histories. Benchmarks show it reduces the number of reviews needed for equivalent retention by 20 to 30 percent compared to the older SM-2 algorithm.

Can adaptive learning algorithms replace human teachers?

Current evidence suggests adaptive learning algorithms complement rather than replace human teachers. Meta-analyses show intelligent tutoring systems produce learning gains comparable to human tutoring in structured domains like mathematics, but teachers provide social-emotional support, motivation, and contextual judgment that algorithms cannot replicate. The most effective implementations combine both.