Introduction

A math teacher in Tampa, Florida, handed her seventh graders a worksheet. Half the class got problems grouped by type. All the graph problems first. Then all the slope problems. Then all the fraction problems. The other half got the same problems, but shuffled. Graph, slope, fraction, graph, fraction, slope. No pattern. No warning about what was coming next.

The grouped-practice students finished faster. They felt more confident. They were sure they had learned more. One month later, a surprise test proved them wrong. The shuffled-practice students scored 61 percent. The grouped students scored 37 percent [1]. That is not a small gap. The shuffled group scored nearly two-thirds higher in relative terms. And the students who performed better were the ones who had reported the practice feeling harder and more confusing during the learning phase.

This is the story of interleaving vs blocking. Two ways to arrange practice that look nearly identical on paper but produce dramatically different results in the brain. Blocking means practicing one skill at a time before moving to the next: AAA-BBB-CCC. Interleaving means mixing related skills together: ABC-BCA-CAB. The total amount of practice stays the same. The total time stays the same. Only the sequence changes. And that change, as four decades of research now show, rewires how the brain encodes, stores, and retrieves knowledge [2].
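
The two orderings are easy to generate side by side. The following is a minimal sketch, not a tool from any of the cited studies: the function names `blocked` and `interleaved` are hypothetical, and the rotation rule is simply one way to reproduce the ABC-BCA-CAB pattern described above.

```python
from collections import Counter

def blocked(skills, reps):
    """AAA-BBB-CCC: finish every repetition of one skill before starting the next."""
    return [skill for skill in skills for _ in range(reps)]

def interleaved(skills, reps):
    """ABC-BCA-CAB: each round covers every skill once, rotating the starting skill."""
    order = []
    for round_index in range(reps):
        shift = round_index % len(skills)
        order.extend(skills[shift:] + skills[:shift])
    return order

skills = ["A", "B", "C"]
blocked_order = blocked(skills, 3)
interleaved_order = interleaved(skills, 3)

print("".join(blocked_order))      # AAABBBCCC
print("".join(interleaved_order))  # ABCBCACAB

# Same amount of practice per skill; only the sequence differs.
assert Counter(blocked_order) == Counter(interleaved_order)
```

The final assertion captures the point of the comparison: both schedules contain exactly the same practice items, so any difference in outcome has to come from the ordering.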

The Experiment That Started It All

The year was 1979. John Shea and Robyn Morgan, working at the University of Colorado, set up a simple motor task. Participants sat in front of a board with three small barriers arranged in different patterns. Their job was to knock down the barriers as fast as possible using a tennis ball, following a specific sequence for each pattern. Three patterns. Three different movement sequences [3].

One group practiced in blocks. Pattern A fifteen times, then pattern B fifteen times, then pattern C fifteen times. The other group practiced in random order. They never knew which pattern was coming next.

During practice, the blocked group was clearly faster. They settled into a rhythm. Each repetition felt smoother than the last. The random group stumbled. They made more errors. Their response times were slower. By every visible measure, blocking was winning.

Then came the retention test. Ten minutes later, the random group was faster. Ten days later, the gap had widened. And on a transfer test where participants had to perform a new, more complex pattern they had never seen before, the random group crushed the blocked group [3].

Shea and Morgan called it the contextual interference effect. The idea was simple but radical: interference during practice, the very thing that makes learning feel harder, actually makes the resulting knowledge stronger and more flexible. They had stumbled onto something that would take the next forty years to fully understand.

But the seeds had been planted even earlier. William Battig, a cognitive psychologist at the University of Colorado, had observed a similar pattern in verbal learning tasks during the late 1960s and 1970s [2]. He noticed that conditions producing the most errors during acquisition often produced the best long-term retention. Battig called this "intratask interference" and argued it could actually help learning, not hurt it. Almost nobody listened. The idea that difficulty could be beneficial ran against every intuition teachers and coaches held about good instruction.

1966: Battig proposes that intratask interference aids learning
1979: Shea and Morgan discover the contextual interference effect
1986: Goode and Magill replicate in badminton serving
1994: Hall et al. confirm with collegiate baseball players
2008: Kornell and Bjork show the effect in visual category learning
2015: Rohrer et al. demonstrate in math classrooms
2019: Brunmair and Richter publish definitive meta-analysis
2020: Rohrer et al. RCT with 787 students confirms large effect
2021: Samani and Pan extend to undergraduate physics
2023: Park et al. link executive function to interleaving benefit

Why Blocking Feels Right But Goes Wrong

Here is the paradox that makes interleaving so hard to accept. Blocking feels better. Every measure a learner can self-observe during practice says blocking is working. The problems get easier. Speed increases. Errors decrease. Confidence rises.

But this feeling is a lie.

Cognitive psychologists call it the illusion of fluency. When the same type of problem repeats over and over, the brain does not need to retrieve the solution strategy from long-term memory. It just keeps the strategy loaded in working memory and applies it mechanically. The third algebra equation in a row requires almost no thought if you just solved two identical ones. You are not learning. You are repeating [4].

Nate Kornell and Robert Bjork at UCLA demonstrated this beautifully in 2008. They showed participants paintings by twelve obscure artists, six paintings each. Half the artists were studied in blocks (all six paintings by one artist, then all six by the next). The other half were interleaved (one painting per artist, cycling through). On a transfer test with new paintings by the same artists, interleaved study produced about 65 percent accuracy versus 50 percent for blocked study [5].

The shocking part was not the test result. It was what happened when participants were asked which method they thought worked better. Seventy-eight percent said blocking. Even after seeing their own scores. Even after the evidence was staring them in the face. The feeling of fluency during blocked practice was so strong that it overrode direct evidence of failure [5].

Veronica Yan, Elizabeth Bjork, and Robert Bjork dug deeper into this metacognitive illusion in 2016. They identified two forces keeping learners trapped. First, the raw feeling of ease during blocked practice gets misread as a signal of learning. Second, people arrive with a prior belief that one-topic-at-a-time is the correct approach. Even when researchers tried to debias participants with explanations, the illusion barely budged. Only showing participants their own test scores, side by side, produced any change in belief [6].

A 2022 survey by Hartwig, Rohrer, and Dedrick confirmed this at scale. When 368 university students were asked to design their own math study schedules, they systematically chose blocking over interleaving. They did not merely fail to appreciate interleaving. They actively avoided it [7].

What does this mean practically? It means that offering interleaving as an option will not lead to its adoption. Students will not choose it for themselves. It must be built into the structure of assignments, worksheets, and review schedules by teachers and curriculum designers. The brain's own feedback system cannot be trusted on this question.

Three Theories for Why Mixing Works

For decades, two competing theories tried to explain the contextual interference effect. In recent years, a third theory has unified much of the debate.

The first theory is the elaboration and distinctiveness hypothesis, proposed by Shea, Zimny, and Hunt in the early 1980s. Their argument: when multiple tasks sit together in working memory during interleaved practice, learners can compare and contrast them. This produces richer, more distinctive memory traces. Blocking keeps only one task active at a time, so no comparison happens [2].

The second theory is the forgetting-reconstruction hypothesis, introduced by Timothy Lee and Richard Magill in 1983. Their argument: random practice forces the learner to forget the action plan for each task because intervening trials displace it from working memory. On the next trial of that task, the learner must reconstruct the plan from scratch. This repeated cycle of forgetting and rebuilding strengthens the long-term memory trace. Blocked practice, by contrast, lets the plan sit in working memory for multiple consecutive trials, requiring no reconstruction at all [8].

The third and most recent theory is the Sequential Attention Theory, developed by Paulo Carvalho and Robert Goldstone at Indiana University, beginning in 2014. This framework makes a prediction neither of the older theories could match. Carvalho and Goldstone proposed that when two consecutive items belong to the same category (blocking), attention shifts toward the similarities between them. When two consecutive items belong to different categories (interleaving), attention shifts toward the differences [9].

This sounds abstract, so consider a concrete example. Imagine learning to distinguish three types of rocks: igneous, sedimentary, and metamorphic. If you study three igneous rocks in a row (blocking), your brain naturally focuses on what they share: crystalline structure, volcanic origin, glassy texture. If you study an igneous rock, then a sedimentary rock, then a metamorphic rock (interleaving), your brain focuses on what makes each one different: this one has crystals, that one has layers, this one has folded bands.

Whether interleaving helps or blocking helps therefore depends on what the learning task requires. If the challenge is spotting differences between categories (like telling apart similar-looking rocks), interleaving wins. If the challenge is spotting similarities within a category (like learning what all igneous rocks share), blocking wins [9].

Carvalho and Goldstone tested this across multiple experiments and formalized it into a computational model called SAT-M in 2022. The model fit empirical data remarkably well [10]. It provided, for the first time, a principled way to predict when each method would be more effective.

Inside the Brain During Interleaved Practice

Until 2011, the evidence for interleaving was entirely behavioral. Researchers knew that mixing tasks produced better retention and transfer. They did not know what was happening inside the brain to cause this.

That changed when Chien-Ho Janice Lin and her colleagues at UCLA published a landmark study combining functional magnetic resonance imaging with motor sequence learning [11]. They scanned participants' brains while they learned finger movement sequences in either blocked or random order.

The results were striking. During acquisition, interleaved practice recruited the dorsolateral prefrontal cortex (DLPFC, a region behind the forehead involved in planning and executive control), the premotor cortex (which prepares movement plans), and parietal regions (which integrate sensory information) far more strongly than blocked practice. The brain was working harder. Much harder.

But here is where it gets interesting. On a retention test 72 hours later, those same regions showed less activation in the interleaved group. The brain had become more efficient. It needed less effort to perform the same task. This pattern, higher activation during learning but lower activation during recall, is what neuroscientists call a neural efficiency signature. It is a hallmark of genuine, deep learning [11].

A follow-up study by Lin and colleagues in 2013, published in Human Brain Mapping, went further. Using psychophysiological interaction analysis (a method that measures how strongly different brain regions communicate with each other), they found that interleaved practice produced sustained functional connectivity between the DLPFC and premotor areas for up to 72 hours after practice ended [12]. Blocked practice did not produce this sustained coupling. The brain regions that worked together during interleaved learning stayed connected long after the practice session ended, as if the neural circuit had been permanently strengthened.

In 2018, the same group published a third study, this time using resting-state fMRI to measure what happens in the brain during the hours after practice, during memory consolidation. They found that interleaved practice on Day 1 led to greater functional connectivity within fronto-parietal networks during rest and sleep, and this connectivity predicted how well participants performed on a retention test two days later [13].

The average response times tell the story clearly. During practice, blocked learners averaged 962 milliseconds per response while interleaved learners averaged 1,191 milliseconds, a statistically significant difference (t(25) = 5.86, p = 0.000004). The interleaved group was slower during practice. But they were faster at retention [13].

Causal evidence came from transcranial magnetic stimulation studies. Kantak and colleagues showed in 2010 that disrupting the DLPFC with repetitive TMS immediately after random practice erased the retention benefit. Disrupting primary motor cortex (M1) had no effect on random-practice retention but did impair blocked-practice retention [11]. This double dissociation revealed that interleaving and blocking engage fundamentally different neural circuits. Blocking relies on motor cortex for rote repetition. Interleaving recruits prefrontal executive control for strategic, flexible learning.

The Classroom Evidence

Laboratory experiments with college students learning to knock down barriers are one thing. Does interleaving work where it matters most? In real classrooms, with real students, over real semesters?

Doug Rohrer at the University of South Florida has spent two decades answering this question, and the answer is a resounding yes.

In 2015, Rohrer, Dedrick, and Stershic published the first study to test interleaving in actual middle school classrooms [14]. They worked with 126 seventh graders in Tampa, Florida. Over three months, students received ten worksheets containing problems on slope, graphs, and fractions. Half the class got blocked worksheets (all slope problems together, then all graph problems together). The other half got interleaved worksheets (problems of all types mixed together).

The test results at two time points told a clear story:

| Test Timing | Interleaved Group Score | Blocked Group Score | Cohen's d Effect Size |
| --- | --- | --- | --- |
| 1-day delay | 80% | 64% | 0.42 |
| 30-day delay | 74% | 42% | 0.79 |

The blocked group's scores collapsed from 64 percent to 42 percent over thirty days, a 34 percent relative decline. The interleaved group dropped only from 80 percent to 74 percent, an 8 percent relative decline [14]. Interleaving did not just produce higher scores. It produced scores that lasted.
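
The relative declines quoted here follow directly from the reported scores. As a quick check of the arithmetic (the helper name `relative_decline` is purely illustrative):

```python
def relative_decline(before, after):
    """Share of the earlier score that was lost by the later test, in percent."""
    return (before - after) / before * 100

blocked_drop = relative_decline(64, 42)      # 1-day vs 30-day score, blocked group
interleaved_drop = relative_decline(80, 74)  # 1-day vs 30-day score, interleaved group

print(f"blocked: {blocked_drop:.1f}%")          # blocked: 34.4%
print(f"interleaved: {interleaved_drop:.1f}%")  # interleaved: 7.5%
```

The interleaved figure of 7.5 percent is what the text rounds to 8 percent.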

In 2020, Rohrer returned with something bigger: a pre-registered, randomized controlled trial across 54 seventh-grade classes in five schools, involving 787 students [1]. This was not a convenience sample. It was the classroom equivalent of a clinical trial, applied to a teaching method. On the one-month delayed test, interleaved students scored 61 percent versus 37 percent for the blocked group. The effect size was Cohen's d = 0.83, with a 95 percent confidence interval of 0.68 to 0.97 [1]. In educational research, d = 0.40 is a commonly used benchmark for a meaningful effect. This was double that threshold.

What makes these mathematics results particularly significant is the reason why interleaving helps in math. As Rohrer has argued since 2012, most math assignments present a block of problems all requiring the same procedure. Students do not need to figure out which procedure to use because the assignment title tells them. A worksheet called "Quadratic Formula Practice" does not require students to recognize a quadratic when they see one [15]. Interleaving forces students to first identify the problem type, then select the appropriate strategy, and only then execute the procedure. This trains the very skill that matters on a cumulative exam, where every problem type appears unpredictably.

Beyond Math: Sports, Music, Medicine, and Physics

The interleaving effect is not limited to mathematics. It has been replicated across a remarkably wide range of domains.

In sports, the evidence began accumulating shortly after Shea and Morgan's original 1979 study. Goode and Magill tested badminton players in 1986, having thirty undergraduates learn three different serves over three weeks. The random-practice group outperformed the blocked group on both retention and transfer tests [16]. Hall, Domingues, and Cavazos took it to the collegiate level in 1994, giving thirty baseball players twelve extra batting practice sessions with fastballs, curves, and change-ups. The randomly ordered group hit better on every test measure [17].

A 2024 meta-analysis by Czyż, Wójcik, and Solarská examined 54 studies of contextual interference in motor learning. They found a medium effect (SMD = 0.54) in adults, and an even larger effect in older adults (SMD = 1.28) [18]. The finding that older learners benefit more from interleaving is intriguing and not yet fully explained.

In physics, Samani and Pan published a striking result in 2021 in npj Science of Learning. Two lecture sections of an introductory physics course completed thrice-weekly homework over eight weeks. One section got blocked homework. The other got interleaved homework. On surprise transfer tests administered after the course, the interleaved group showed median improvements of 50 percent on the first test and 125 percent on the second test, relative to the blocked group [19]. And true to the metacognitive illusion, students in the interleaved group rated their homework as harder and less effective than the blocked group did.

In medical education, interleaving has shown benefits for diagnostic skills. A 2003 study found that interleaved practice led to more accurate electrocardiogram interpretations than blocked practice [20]. The effect makes intuitive sense: doctors in emergency rooms do not encounter patients sorted by disease type. Every patient is a new diagnostic puzzle, and the ability to discriminate between similar-looking conditions is exactly what interleaving trains.

In foreign language learning, results have been more mixed. Pan, Tajran, Lovelett, Osuna, and Rickard found in 2019 that single-session interleaving of Spanish verb conjugations did not beat blocking, but when practice was distributed across two weekly sessions, interleaving produced substantially better one-week retention [21]. The lesson: interleaving in language learning may require some initial familiarity before it kicks in.

The Meta-Analysis That Settled the Debate

Individual studies can be cherry-picked. Meta-analyses are much harder to skew. In 2019, Matthias Brunmair and Tobias Richter at the University of Würzburg published the most rigorous meta-analysis of interleaved learning to date in Psychological Bulletin, the field's most prestigious review journal [2].

They analyzed 59 studies containing 238 effect sizes from 158 independent samples. Their overall finding: interleaving produces a moderate benefit over blocking, with an average Hedges' g of 0.42. But the headline number hides important variation. The effect depends heavily on what is being learned.

[Figure: Interleaving Effect Size by Domain (Brunmair & Richter, 2019). Bar chart of Hedges' g for five domains: paintings, math, artificial stimuli, texts, and words.]

Paintings produced the largest effect (g = 0.67). This makes sense. Telling apart artistic styles requires precisely the kind of between-category discrimination that interleaving trains. Mathematics showed a solid effect (g = 0.34). Artificial visual stimuli like butterfly wings and geometric shapes fell in a similar range (g = 0.36). Expository texts showed essentially no effect (g = 0.01). And word lists actually reversed: blocking was better (g = -0.39) [2].

The key moderator? Similarity. When categories were highly similar, interleaving helped the most. When categories were dissimilar, blocking helped more. This aligns perfectly with Carvalho and Goldstone's Sequential Attention Theory: interleaving helps learners spot fine differences between confusable categories, but when categories are already distinct, blocking is better for noticing internal structure [9].

One sobering detail: heterogeneity was substantial, with I-squared = 77 percent. This means that most of the variance across studies reflects genuine differences, not random sampling noise. Implementation matters. The way interleaving is structured, how similar the mixed categories are, how much prior knowledge learners have, all of these factors shift the outcome [2].

Yan, Sana, and Carvalho synthesized the practical implications in a 2024 policy paper. They noted that the average g = 0.42 translates roughly to moving a student from the 50th percentile to the 66th percentile. Not magic. But meaningful [22].
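
The percentile translation can be reproduced with the standard normal distribution. The sketch below rests on an assumption not stated in the source: scores in both groups are normally distributed with equal variance, so the average interleaved learner sits g standard deviations above the blocked-group mean. The function name is hypothetical.

```python
import math

def percentile_from_effect_size(g):
    """Percentile of the average treated learner within the comparison group,
    assuming normal scores with equal variance (standard normal CDF of g)."""
    return 0.5 * (1 + math.erf(g / math.sqrt(2))) * 100

print(round(percentile_from_effect_size(0.42)))  # 66
print(round(percentile_from_effect_size(0.0)))   # 50
```

Under the same assumption, an effect size of zero leaves the learner at the 50th percentile, which is the baseline the comparison starts from.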

When Blocking Wins

Interleaving is not universally superior. The research identifies several clear boundary conditions where blocking outperforms interleaving, and ignoring these leads to bad advice.

The first and most well-documented boundary is category similarity. Carvalho and Goldstone's 2014 experiments showed directly that low-similarity categories are better learned through blocking [9]. When categories do not share overlapping features, there is little to discriminate, and interleaving provides no benefit. Blocking, by contrast, helps learners build strong representations of what defines each category.

The second boundary is learner expertise. Complete beginners who lack the foundational knowledge to make sense of any category may be overwhelmed by the rapid switching that interleaving demands. The medical education literature suggests a "block first, then interleave" approach: let learners acquire basic procedures in short blocks, then switch to interleaved practice once fundamentals are in place [20].

The third boundary involves learning strategy. Little and Nepangue found in 2025 that when learners adopt a rule-extraction strategy (trying to discover explicit rules that define a category), blocking outperforms interleaving. When learners adopt a memory-based, similarity-matching strategy, interleaving wins [23]. The optimal schedule depends on how the learner approaches the task.

The fourth boundary is assessment timing. If the test happens immediately after practice, blocking often wins. The performance advantage of blocked practice during and immediately after learning is real. It only reverses after a retention interval of at least one day, and the interleaving advantage grows with longer delays [4]. This is the central point of Soderstrom and Bjork's influential 2015 paper on "learning versus performance." Performance during training and actual learning are different things. They often move in opposite directions.

The fifth boundary applies to word lists and vocabulary, where Brunmair and Richter found a clear blocking advantage (g = -0.39). Learning isolated word pairs lacks the between-category discrimination component that interleaving trains. There are no "categories" to compare. Each item stands alone [2].

Executive Function: The Engine Behind the Effect

A 2023 study by Park, Varma, and Varma at the University of Minnesota added an important piece to the puzzle. They taught eighth graders about three types of rocks (igneous, sedimentary, metamorphic) using either blocked or interleaved instruction, then measured the students' executive function abilities [24].

Executive function is a family of cognitive abilities managed by the prefrontal cortex. It includes three core components. Shifting is the ability to switch flexibly between tasks or mental sets. Inhibition is the ability to suppress automatic responses. Working memory updating is the ability to hold and manipulate information.

Park and colleagues found that interleaved instruction produced better recognition memory at a two-week delay, replicating the standard finding. But the new discovery was this: shifting and inhibition abilities specifically predicted how much students benefited from interleaving. Students with stronger executive function gained more from interleaved instruction. Students with weaker executive function gained less [24]. Neither ability predicted learning under blocked instruction.

This result connects directly to the neuroimaging evidence. Interleaving recruits the DLPFC and fronto-parietal networks, the same circuits that support executive function [11]. If a student's prefrontal executive systems are not yet mature (as in young children) or are temporarily depleted (as in exhausted students), interleaving may be less effective or even counterproductive. This helps explain why some studies with very young learners or with highly complex unfamiliar material find reduced or absent interleaving benefits.

Pan and colleagues extended this line of inquiry in 2025, showing that fluid intelligence moderates the perceptual interleaving effect [25]. Higher fluid intelligence predicted greater interleaving benefits. The picture emerging is that interleaving works partly because it offloads strategic processing to the learner, requiring them to deploy executive control to manage the complexity. Learners with stronger cognitive resources extract more benefit.

What does this mean for education? It suggests that interleaving should be introduced gradually. Start with short blocks to build foundational understanding, then progressively increase the degree of mixing as learners gain confidence and cognitive capacity. A sudden switch from fully blocked to fully interleaved may overwhelm weaker students.
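
One way to picture a gradual introduction is to shrink the block size from session to session. This is a sketch under stated assumptions, not a procedure taken from the cited studies: the function `practice_order`, the three-session ramp, and the block sizes 4, 2, 1 are all illustrative choices.

```python
def practice_order(topics, per_topic, block_size):
    """Cycle through topics, taking up to `block_size` items from each per pass.

    block_size == per_topic gives fully blocked practice;
    block_size == 1 gives fully interleaved (round-robin) practice.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    remaining = {topic: per_topic for topic in topics}
    order = []
    while any(remaining.values()):
        for topic in topics:
            take = min(block_size, remaining[topic])
            order.extend([topic] * take)
            remaining[topic] -= take
    return order

# Hypothetical ramp: each session mixes more finely than the one before.
for session, block_size in enumerate([4, 2, 1], start=1):
    print(session, "".join(practice_order(["A", "B", "C"], 4, block_size)))
# 1 AAAABBBBCCCC
# 2 AABBCCAABBCC
# 3 ABCABCABCABC
```

Every session contains the same twelve items. Only the run length of same-topic problems changes, which keeps the total workload constant while the switching demand rises.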

Desirable Difficulties: The Bigger Picture

Interleaving does not exist in isolation. It belongs to a family of learning strategies that Robert Bjork at UCLA calls desirable difficulties, a concept he introduced in 1994 [3].

The core idea is counterintuitive but well-supported: conditions that slow down or impair performance during training often enhance long-term retention and transfer. Spacing practice over time instead of massing it. Testing yourself instead of re-reading. Generating answers instead of looking them up. Varying the conditions of practice. And yes, interleaving. All of these share a common feature: they make practice feel harder and less productive while actually making learning deeper and more durable [4].

Soderstrom and Bjork formalized this in their 2015 paper, "Learning Versus Performance," published in Perspectives on Psychological Science. Performance is what we observe during training. Learning is the relatively permanent change in knowledge that we can only measure later, through retention and transfer tests. The two often dissociate. Conditions that maximize performance (like blocking and massing) often undermine learning. Conditions that maximize learning (like interleaving and spacing) often depress performance [4].

Interleaving has a particularly close relationship with two other desirable difficulties: spaced repetition and active recall. Spacing distributes practice over time. Interleaving distributes practice over topics. The two are structurally linked because mixing topics within a session necessarily spaces out repetitions of any single topic. But they are not identical. Birnbaum, Kornell, Elizabeth Bjork, and Robert Bjork demonstrated in 2013 that the interleaving benefit depends specifically on category juxtaposition (placing different categories next to each other), not on temporal spacing alone. When they added pure temporal spacing without juxtaposition, the benefit disappeared [26].

Taylor and Rohrer had already demonstrated this independence in 2010 by using filler tasks to equate spacing across blocked and interleaved conditions. Even with spacing held constant, interleaving roughly doubled accuracy (77 percent versus 38 percent) on a delayed test of fourth-grade geometry [27].

The practical implication is that an optimal learning schedule combines all three: spaced repetition to determine when to review, interleaving to determine what to mix together, and active recall to determine how to practice. Large-scale reviews by Dunlosky and colleagues have rated distributed practice and retrieval practice as the only two "high utility" learning techniques out of ten evaluated [28]. Interleaving amplifies both.

| Desirable Difficulty | What It Controls | Key Mechanism | Typical Effect Size |
| --- | --- | --- | --- |
| Spaced repetition | When to review | Forgetting and reconsolidation | d = 0.42 to 0.80 |
| Interleaving | What to mix during practice | Discriminative contrast and strategy selection | g = 0.42 (meta-analytic average) |
| Retrieval practice | How to study | Memory retrieval strengthens storage | d = 0.50 to 0.70 |
| Generation | Active production vs passive reading | Effort during encoding | d = 0.30 to 0.50 |

The connection to spaced repetition systems is direct. When a scheduling algorithm mixes cards from different topics together at each review session, it is applying interleaving on top of spacing. The combination produces some of the largest effect sizes in the educational psychology literature, often exceeding d = 0.70 even in conservative classroom designs.
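
A toy version of such a scheduler might look like the following. It is a sketch only: the `Card` fields, the `build_session` function, and the sample deck are hypothetical, the spacing rule that sets each card's due day is assumed to exist elsewhere, and no particular flashcard application is being described.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Card:
    topic: str
    prompt: str
    due_day: int  # day on which a spacing rule (not shown) says to review next

def build_session(cards, today, seed=None):
    """Return today's due cards in a topic-mixed order."""
    # Spacing decides WHEN: keep only cards whose review day has arrived.
    due = [card for card in cards if card.due_day <= today]
    # Interleaving decides ORDER: shuffle so topics are mixed rather than grouped.
    random.Random(seed).shuffle(due)
    return due

deck = [
    Card("slope", "Slope of the line through (1, 2) and (3, 8)?", due_day=3),
    Card("slope", "Slope of y = 4x - 1?", due_day=5),
    Card("fractions", "Simplify 18/24.", due_day=2),
    Card("fractions", "Compute 1/3 + 1/6.", due_day=3),
    Card("graphs", "Where does y = 2x + 6 cross the x-axis?", due_day=1),
]

session = build_session(deck, today=3, seed=42)
for card in session:
    print(card.topic, "|", card.prompt)
```

A plain shuffle does not guarantee that two cards from the same topic never appear back to back. A stricter scheduler could reorder the due cards round-robin by topic, at the cost of making the sequence more predictable.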

The Emerging Frontier: Digital Learning and AI

The most active area of interleaving research in 2024 through 2026 focuses on digital learning environments, where practice schedules can be personalized algorithmically.

Li, Liu, Xu, and Yi published a major 2024 study in MIS Quarterly examining interleaved design in e-learning platforms. Their two-month field experiment showed that "related interleaving" (mixing topics that share underlying structure) outperformed both no-interleaving and unrelated interleaving, and the effect was strongest for weaker learners [29]. This finding is significant because it suggests that not all interleaving is created equal. Mixing structurally related topics produces better results than random mixing, likely because it gives learners something meaningful to compare and contrast.

Sana and Yan demonstrated in 2022 that combining interleaving with retrieval practice in high school science produced benefits for both memory and transfer [30]. Their study with 155 ninth-to-twelfth grade students used weekly quizzes that were either blocked or interleaved. The interleaved retrieval group outperformed both the blocked retrieval group and the unquizzed control.

An emerging line of work examines how note-taking interacts with interleaving. Little, Fealy, Kobayashi, and Roth found in 2025 that note-taking can modulate the interleaving advantage, possibly by reducing the cognitive demand of switching between topics [31].

EEG studies are also advancing. A 2024 preprint from researchers using high-density electroencephalography found distinct neural signatures of blocked versus interleaved practice in alpha and theta band oscillations, providing a new window into the real-time neural dynamics of contextual interference [32].

These developments suggest that the future of interleaving research lies not in asking "does it work?" but in asking "how should it be implemented in specific contexts?" The answer increasingly involves adaptive algorithms that adjust the degree and type of interleaving based on individual learner characteristics, content similarity, and performance history.

What the Critics Say

No scientific finding should be accepted without scrutiny, and interleaving has received its share.

The first concern is ecological validity. Most interleaving studies are run under carefully controlled laboratory conditions. Real classrooms are noisier. Teachers have limited time. Students have varying abilities. Rohrer's 2015 and 2020 classroom studies addressed this, but they remain among the few to do so with methodological rigor [1].

The second concern is confounding with spacing. As noted earlier, interleaved schedules inherently space out repetitions of each topic. Some critics argue that the "interleaving effect" is partly or largely a spacing effect in disguise. Birnbaum et al. (2013) and Taylor and Rohrer (2010) provide evidence against this interpretation, but the concern has not been fully resolved [26]. Brunmair and Richter acknowledge this limitation in their meta-analysis.

The third concern is domain specificity. With g = -0.39 for word lists and near-zero effects for expository texts, interleaving clearly does not help with every kind of learning task [2]. Advocates who claim interleaving is "always better" are overstating the evidence.

The fourth concern is cognitive load. Chen, Paas, and Sweller argued in a 2021 Educational Psychology Review paper that interleaving may impose excessive extraneous cognitive load on novice learners, interfering with schema construction rather than aiding it [33]. From a cognitive load theory perspective, the benefit of interleaving depends on the learner having sufficient prior knowledge to manage the switching demands. Without that foundation, interleaving adds noise rather than signal.

The fifth concern is sample characteristics. The fMRI literature on interleaving uses modest sample sizes (typically n = 20 to 30) and motor tasks. How cleanly these neural findings translate to abstract conceptual learning remains an open question [11].

Honest science requires honest acknowledgment of these limitations. The interleaving effect is real, replicated, and practically significant. But it is not a universal law. It is a conditional principle: when categories are similar, when learners have basic familiarity, when the test requires discrimination, and when performance is measured after a delay, interleaving beats blocking. Change any of those conditions and the advantage may shrink, vanish, or reverse.

Practical Principles for Effective Interleaving

The research points to several principles for anyone designing study sessions, training programs, or curricula.

First, mix related topics, not random ones. Interleaving works best when the mixed categories share enough similarity to create productive confusion. Mixing algebra with geometry makes sense. Mixing algebra with medieval history does not.

Second, expect it to feel wrong. The illusion of fluency during blocked practice is strong. Students will prefer blocking and believe it works better. This is normal. It does not mean the interleaving is failing.

Third, start with short blocks, then interleave. Absolute beginners need some initial focused instruction to build a mental framework. Once basic procedures are understood, switch to mixed practice. The transition can be gradual.

Fourth, use interleaving specifically for discrimination training. If the learning goal is to tell apart confusable concepts, procedures, or categories, interleaving is the tool. If the goal is to understand the deep internal structure of a single concept, blocking may serve better initially.

Fifth, combine interleaving with spaced practice and retrieval. The three desirable difficulties are stronger together. A review session that mixes old and new material from multiple topics, presented as self-testing rather than re-reading, and scheduled at expanding intervals, hits all three.

Sixth, build interleaving into the structure, not into learner choice. Students will not choose it voluntarily. Worksheets, homework, and review sessions should be interleaved by default.

Seventh, test after a delay. If the only assessment happens immediately after practice, blocking will look better. The interleaving advantage appears on delayed tests. Cumulative final exams, board exams, and real-world performance all fall into this category.
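
The first, third, and sixth principles can be turned into a small scheduling routine. What follows is a minimal sketch in Python, not a method from the cited research. The function names, the round-by-round shuffle, and the warmup parameter are assumptions made for illustration. It builds a blocked schedule (AAA-BBB-CCC), an interleaved one (every topic once per round, in shuffled order), and a hybrid that starts with short blocks and then interleaves by default. Topics are assumed to be unique and already chosen to be related.

```python
import random

def blocked_schedule(topics, reps):
    """AAA-BBB-CCC: all repetitions of one topic before moving to the next."""
    return [t for t in topics for _ in range(reps)]

def interleaved_schedule(topics, reps, seed=0, previous=None):
    """Each round contains every topic once, in shuffled order.

    `previous` is the topic practiced just before this schedule starts,
    so the same topic is never repeated back to back.
    """
    if not topics:
        return []
    rng = random.Random(seed)
    schedule = []
    last = previous
    for _ in range(reps):
        round_ = list(topics)
        rng.shuffle(round_)
        # Avoid repeating a topic across a round boundary.
        if len(round_) > 1 and round_[0] == last:
            round_[0], round_[1] = round_[1], round_[0]
        schedule.extend(round_)
        last = round_[-1]
    return schedule

def hybrid_schedule(topics, reps, warmup=2, seed=0):
    """Short blocks first, then interleave the remaining repetitions."""
    warmup = min(warmup, reps)
    blocks = blocked_schedule(topics, warmup)
    previous = blocks[-1] if blocks else None
    return blocks + interleaved_schedule(topics, reps - warmup, seed, previous)

print(hybrid_schedule(["graphs", "slope", "fractions"], reps=4))
```

Every schedule contains the same number of problems per topic. Only the order changes, which mirrors the point made throughout the research: total practice stays constant while the sequence differs.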

Conclusion

The research record spanning nearly five decades tells a consistent story. When learners practice one skill until it feels mastered before moving to the next, they build an illusion of competence that crumbles under the pressure of delayed tests and real-world application. When learners mix related skills together during practice, they build flexible knowledge that transfers and lasts.

The meta-analytic evidence puts a number on this: a moderate overall advantage (g = 0.42), with effects ranging from large in visual discrimination tasks to null or negative in word-list learning. The neuroscience reveals a mechanism: interleaving recruits prefrontal executive networks that build stronger, more efficient neural representations [11]. The classroom trials confirm practical significance: scores of 61 percent versus 37 percent on a delayed math test [1]. And the boundary conditions are increasingly well-mapped: high category similarity, sufficient prior knowledge, discrimination-focused learning goals, and delayed assessment all predict when interleaving will shine.

The biggest obstacle to adoption is not the evidence. It is human psychology. Learners prefer blocking. Teachers design blocked curricula. Textbooks present one chapter at a time. The feeling of fluency during massed practice is too comfortable to give up voluntarily. But the science is clear. The brain learns better when it struggles, and interleaving is one of the most reliable ways to create the right kind of struggle.

Frequently Asked Questions

What is the difference between interleaving and blocking in studying?

Blocking means practicing one topic or problem type repeatedly before moving to the next (AAA-BBB-CCC). Interleaving means mixing different topics within a single practice session (ABC-BCA-CAB). Research shows interleaving produces better long-term retention and transfer, even though blocking feels easier during practice.

Does interleaving work for all subjects?

No. A 2019 meta-analysis found interleaving works best for visually similar categories (like art styles or similar-looking scientific specimens) and mathematics. It shows little or no benefit for expository texts, and blocking actually works better for memorizing word lists. The effect depends on whether the learning task requires distinguishing between similar categories.

Why does interleaving feel harder than blocking?

Interleaving forces the brain to switch strategies and retrieve different solution methods on each trial, which creates more errors and slower performance during practice. This extra effort strengthens long-term memory but creates an illusion that blocking is more effective. About 78 percent of learners incorrectly believe blocking works better.

How does interleaving relate to spaced repetition?

Interleaving and spaced repetition are complementary but distinct strategies. Spacing controls when you review material. Interleaving controls what you mix together during each session. Interleaving inherently creates some spacing between repetitions of each topic, but research shows the interleaving benefit comes from category juxtaposition, not just temporal spacing.

Should beginners use interleaving or blocking?

Research suggests beginners benefit from starting with short blocks to build foundational understanding, then transitioning to interleaved practice once basic procedures are familiar. Executive function capacity also matters: learners with stronger cognitive flexibility gain more from interleaving. A gradual shift from blocked to interleaved practice is often the most effective approach.