INTRODUCTION
Every flashcard app makes one critical decision thousands of times per day: when should this card appear again? Get it right and the student remembers with minimal effort. Get it wrong and the card either shows up too early, wasting time, or too late, after the memory has already faded. For over three decades, one algorithm dominated that decision. It was called SM-2, created by a Polish university student in 1987. It powered SuperMemo, then Anki, then dozens of other apps. And it worked well enough that nobody seriously challenged it until 2022, when a Chinese undergraduate named Jarrett Ye published a machine learning approach that outperforms SM-2 for 99.6 percent of users. That algorithm is called FSRS. This is the story of both, what makes them different, and what it means for anyone who studies with flashcards.

The Algorithm That Ruled for 35 Years
Piotr Woźniak was a student at the Poznań University of Technology in Poland when he started experimenting with memorization schedules on paper in 1985. He called his method SuperMemo. The first version — SM-0 — used fixed interval tables for entire pages of vocabulary. It was crude: he reviewed groups of about forty word pairs at a time and had no way to track how hard each individual item was.
Two years later, on December 13, 1987, Woźniak wrote the first computer version in Turbo Pascal on an IBM PC. This was SM-2. The breakthrough was simple but powerful: instead of treating all cards the same, SM-2 assigned each card its own "easiness factor" — a number starting at 2.5 that increased when the student answered easily and decreased when the student struggled. The review interval for each card was calculated by multiplying the previous interval by this factor. Easy cards spaced out fast. Hard cards came back sooner.
The student rated each answer on a scale from 0 to 5. A perfect recall scored 5 and nudged the factor up by 0.10. A complete blackout scored 0 and dropped it by 0.80. Anything below a 3 meant the card was reset — back to a one-day interval, start over. Woźniak described his results in a 1990 master's thesis: after one year of using SM-2 on 10,255 English vocabulary items, studying about 41 minutes per day, he achieved an overall retention rate of 92 percent.
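The rules above are compact enough to fit in a dozen lines. Here is a minimal Python sketch of the published SM-2 update: grades 0 to 5, an easiness factor floored at 1.3, the first two intervals fixed at 1 and 6 days, and a reset on any grade below 3. (Rounding and the exact order of the easiness update vary slightly between implementations.)

```python
def sm2_review(ef, interval, repetition, quality):
    """One SM-2 review step, following Wozniak's published rules.

    ef: easiness factor (starts at 2.5, floored at 1.3)
    interval: current interval in days
    repetition: count of successful reviews in a row
    quality: self-rated grade, 0 (blackout) .. 5 (perfect)
    """
    if quality < 3:
        # Failed recall: restart the card from a one-day interval.
        repetition, interval = 0, 1
    else:
        repetition += 1
        if repetition == 1:
            interval = 1
        elif repetition == 2:
            interval = 6
        else:
            # Multiply the previous interval by the easiness factor,
            # rounded to whole days.
            interval = round(interval * ef)
    # Adjust easiness: +0.10 for a perfect 5, down to -0.80 for a 0.
    ef += 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    return max(ef, 1.3), interval, repetition
```

Three consecutive perfect answers take a fresh card from 1 day to 6 days to about 16 days, while a single blackout knocks 0.8 off the easiness factor and starts the ladder over.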
That thesis became the foundation of modern spaced repetition. And SM-2 specifically became the most widely adopted scheduling algorithm in history — not because it was the best possible approach, but because it was free, well-documented, and good enough.
How Anki Adopted SM-2 (And Almost Didn't)
When Australian programmer Damien Elmes created Anki in 2006 to help himself learn Japanese, he initially implemented SM-5 — a more advanced SuperMemo algorithm that used optimization matrices instead of simple multiplication. But his implementation exhibited strange behavior: harder cards sometimes had intervals growing faster than easier ones. Elmes suspected a bug in his code, but since Woźniak had never published SM-5 in full algorithmic detail, he could not be sure.
So Elmes switched to SM-2. It was simpler, more predictable, and fully documented. He modified it for Anki's four-button interface — Again, Hard, Good, Easy — instead of the original six-grade scale, and added features like configurable learning steps and a minimum ease floor of 130 percent. These changes made SM-2 more practical for daily use. They also introduced the algorithm's most infamous problem.

The Ease Hell Problem
In Anki's implementation, pressing "Again" drops a card's ease factor by 20 percentage points. Pressing "Hard" drops it by 15. Pressing "Good" — the most common response — changes nothing. Only pressing "Easy" recovers lost ease, adding 15 points. But almost nobody presses Easy regularly.
The math is brutal. After roughly six "Again" responses — not even consecutive, just six total over the life of a card — the ease factor hits Anki's minimum of 130 percent. At that point the interval barely grows. A card that should be appearing once a month shows up every few days instead. And there is no way out. The student keeps seeing the card, keeps pressing Good, and the ease never recovers because Good changes nothing. The Anki community calls this "ease hell."
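The spiral is easy to reproduce in code. This small sketch models Anki's ease bookkeeping using the button deltas described above (the function and variable names are illustrative, not Anki's internals):

```python
# Anki's ease adjustments in percentage points: Again -20, Hard -15,
# Good +0, Easy +15, with a hard floor of 130.
DELTAS = {"again": -20, "hard": -15, "good": 0, "easy": +15}

def anki_ease(history, start=250, floor=130):
    """Final ease after a sequence of answers, starting from 250%."""
    ease = start
    for answer in history:
        ease = max(floor, ease + DELTAS[answer])
    return ease
```

Six "Again" answers, whether consecutive or spread across dozens of "Good" answers, pin the ease at 130, and no amount of subsequent "Good" pressing ever raises it again.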
Before FSRS existed, users built workarounds. The "Low-key Anki" approach fixed all ease factors permanently at 250 percent. Add-ons like Straight Reward automatically increased ease after several consecutive correct answers. Some users manually edited the SQLite database to bulk-reset ease factors. All of these were patches on a structural flaw in the algorithm itself.
Anki's own FAQ page acknowledges the problem directly: repeated failures cause cards to get stuck in low-interval cycles. This was one of the primary motivations for integrating a fundamentally different algorithm.
A Student Builds Something Better
Jarrett Ye was a high school student in Qingyuan, Guangdong, China, when he started using Anki. Over 1.5 years, his grades improved dramatically — enough to gain admission to Harbin Institute of Technology. He went on to study computer science and joined MaiMemo, a Chinese language-learning company with access to 220 million student memory behavior logs.
In August 2022, Ye published a paper at KDD — one of the top conferences in data mining — proposing a new way to schedule spaced repetition reviews using stochastic dynamic programming. The paper showed a 12.6 percent improvement over existing methods. He posted it on Reddit's r/Anki forum. A commenter dismissed it as something that sounds impressive but nobody would actually implement.
That comment stung. Within a month, Ye had built a working implementation. On September 18, 2022, FSRS v1 was released as an Anki add-on. The skeptical commenter — believed to be the blogger Expertium — later became one of the most active FSRS contributors. By November 2023, Damien Elmes had integrated FSRS natively into Anki 23.10 as an opt-in alternative to SM-2.
What FSRS Actually Does Differently
SM-2 tracks one number per card: the ease factor. FSRS tracks three.
The first is Difficulty. This is a number between 1 and 10 representing how inherently hard the card is. Unlike SM-2's ease factor, Difficulty uses mean reversion — if a student answers a card correctly multiple times, the Difficulty gradually returns toward a baseline instead of staying permanently damaged. This single design choice eliminates ease hell entirely.
The second is Stability. This measures how long it takes for the probability of recalling a card to drop from 100 percent to 90 percent. A card with a Stability of 30 means that after 30 days without review, recall probability is roughly 90 percent; after 60 days it might be around 80 percent, depending on the shape of the forgetting curve. Stability increases each time a card is successfully recalled, and the increase is larger when the recall happened at lower retrievability — that is, when the review was more delayed and the recall correspondingly harder — reflecting the well-documented spacing effect.
The third is Retrievability. This is the current probability that the student can recall the card right now. Unlike Difficulty and Stability, which only change at review time, Retrievability decays continuously. FSRS uses a power-law forgetting curve to model this decay. When Retrievability drops below the target retention rate — typically 90 percent — the algorithm schedules a review.
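The forgetting curve can be written down concretely. The constants below are the published FSRS-4.5 values, chosen so that retrievability is exactly 90 percent when the elapsed time equals the stability; note that FSRS-6 additionally makes the decay exponent a trainable per-user parameter:

```python
# FSRS-4.5 forgetting-curve constants (FSRS-6 trains DECAY per user).
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that retrievability(S, S) == 0.9 exactly

def retrievability(t, stability):
    """Probability of recall t days after the last review."""
    return (1 + FACTOR * t / stability) ** DECAY

def next_interval(stability, target_retention=0.9):
    """Days until retrievability decays to the target retention."""
    return stability / FACTOR * (target_retention ** (1 / DECAY) - 1)
```

With the default 90 percent target, the scheduled interval equals the stability; lowering the target to, say, 85 percent stretches every interval and trades a little retention for fewer reviews.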
The current version, FSRS-6, has 21 trainable parameters. Its default values were trained on roughly 700 million reviews from about 10,000 Anki users. Even without per-user optimization — just using those defaults — FSRS produces more accurate recall predictions than SM-2 for 99.5 percent of users tested in the open-spaced-repetition benchmark.
The Numbers: What the Benchmark Shows
The open-spaced-repetition benchmark is the largest public comparison of spaced repetition algorithms. It evaluates predictions across 9,999 Anki collections containing approximately 350 million filtered reviews. The primary metric is log loss — a standard machine learning measure of how well predicted probabilities match actual outcomes. Lower is better.
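Log loss is straightforward to compute from a list of predicted recall probabilities and the actual pass/fail outcomes; its key property is that a confident wrong prediction is punished far more heavily than a hedged one:

```python
import math

def log_loss(predicted, actual):
    """Mean negative log-likelihood of binary recall outcomes.

    predicted: recall probabilities from the scheduler, each in (0, 1)
    actual: 1 if the card was recalled, 0 if it was forgotten
    """
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for p, y in zip(predicted, actual)
    ) / len(predicted)
```

Predicting 90 percent on cards that are in fact recalled 90 percent of the time yields a low score, while predicting 99 percent on a card that is then forgotten contributes a large penalty; this is what the benchmark averages over hundreds of millions of reviews.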
FSRS-6 with per-user optimization achieves a mean log loss of 0.344. SM-2, adapted with extra formulas to produce probability predictions, scores significantly higher. In 99.6 percent of collections, FSRS-6 has a lower log loss. The practical implication, based on simulation data, is that students using FSRS need 20 to 30 percent fewer reviews to maintain the same retention rate.
One important caveat. The benchmark authors themselves note that SM-2 was never designed to predict probabilities. The comparison requires adding extra formulas to SM-2 that it was not built with. And the 20 to 30 percent efficiency claim comes from simulation, not a controlled empirical study with real students. Still, the direction is clear and the magnitude is large enough to matter. For a medical student reviewing 500 cards per day, cutting 100 to 150 reviews while maintaining 90 percent retention is a significant quality-of-life improvement.
A study by Price et al. (2025) in Academic Medicine tested spaced repetition with over 26,000 physicians and found retention rates of 58 percent versus 43 percent for the control group. Research by Dunlosky et al. (2013) rated distributed practice and retrieval practice as the two highest-utility learning techniques out of ten studied. The science behind spacing is not in question. The question is which algorithm spaces most efficiently — and the data increasingly points toward FSRS.

The Debate That Isn't Over
Piotr Woźniak, the creator of SM-2 and its successors up to SM-18, has responded extensively to the FSRS benchmarks. His position is nuanced. He acknowledges that Ye "truly understood Algorithm SM-17 and the 3-component model of memory" and that the design "deserves praise." But he disputes the comparison methodology, arguing that standard machine learning metrics like log loss are inappropriate for evaluating spaced repetition algorithms and that Anki user data is biased toward crammers and procrastinators.
Woźniak has also claimed that SM-19 and SM-20 — unreleased algorithms under development — outperform FSRS on his proposed "Universal Metric." FSRS contributors counter that these comparisons test unoptimized FSRS against optimized SuperMemo algorithms, making the comparison unfair. A direct head-to-head comparison within the same software is planned once SuperMemo releases its API, expected sometime in 2026.
This is a genuine scientific debate, not a settled question. But for practical purposes, the relevant comparison for most students is not FSRS versus SM-18 — it is FSRS versus SM-2, because SM-2 is what Anki used by default for seventeen years and what millions of active decks are still running on.
Beyond Both: Other Approaches to Scheduling
Not every flashcard system uses SM-2 or FSRS. The original Leitner system from 1972 uses fixed box-based intervals — typically something like 1, 3, 5, 8, 16, and 31 days. A correct answer moves the card to the next box with a longer interval. An incorrect answer sends it back. The system is simple, requires no computation, and works with physical index cards and a shoebox. But it treats all cards in the same box identically, regardless of individual difficulty, and the intervals never adapt to the user.
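The whole Leitner system fits in a few lines. This sketch uses the interval ladder above and the common variant in which a miss demotes the card all the way back to the first box (some variants demote by only one box):

```python
# One common Leitner interval ladder, in days; the 1972 system
# prescribes boxes, not these exact numbers.
INTERVALS = [1, 3, 5, 8, 16, 31]

def leitner_step(box, correct):
    """Move a card between boxes; returns (new_box, next_interval_days)."""
    if correct:
        # Promote, capping at the last box.
        box = min(box + 1, len(INTERVALS) - 1)
    else:
        # Demote to the first box on any miss.
        box = 0
    return box, INTERVALS[box]
```

Every card in a given box gets the same interval, which is precisely the limitation the per-card algorithms were built to remove.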
Some modern tools take a middle path. Instead of tracking difficulty per card like SM-2 or modeling memory states per card like FSRS, they adjust intervals at the deck or subject level based on aggregate performance. If a student consistently struggles with cards in a particular topic at a particular review stage, the system extends or shortens the intervals for that entire stage — but only for that subject. The result sits between pure Leitner and per-card algorithms: more personalized than fixed boxes, simpler than machine learning models, and potentially well-suited for learners who want adaptation without complexity.
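As a purely hypothetical illustration — not any specific product's algorithm — deck-level adaptation might keep one multiplier per review stage and nudge it toward longer or shorter intervals based on aggregate accuracy at that stage:

```python
# Hypothetical deck-level adaptation: every name and constant here is
# illustrative, not taken from a real product.
BASE_INTERVALS = [1, 3, 5, 8, 16, 31]  # Leitner-style ladder, in days

def adjust_stage(multiplier, accuracy, target=0.9, rate=0.1):
    """Stretch a stage's intervals when the deck's aggregate accuracy
    beats the target retention, shrink them when it falls short."""
    return multiplier * (1 + rate * (accuracy - target))

def stage_interval(stage, multiplier):
    """Interval for one review stage of this deck, in days."""
    return BASE_INTERVALS[stage] * multiplier
```

The state is one multiplier per stage per subject rather than three variables per card, which is what makes this approach cheap to run and easy to explain, at the cost of ignoring differences between individual cards.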

CONCLUSION
SM-2 defined spaced repetition for a generation of students. It worked. Millions of people learned languages, passed medical boards, and memorized entire textbooks using an algorithm from 1987 running on a four-button interface. But its structural flaws — the ease hell spiral, the lack of population-level learning, the inability to predict recall probability — became harder to ignore as better alternatives emerged. FSRS addresses all three. It models memory as a dynamic system with difficulty, stability, and retrievability, trained on hundreds of millions of real reviews. The benchmark data is compelling, and the transition is already underway. Nor is FSRS the only direction: some platforms, such as Mindomax, are exploring deck-level adaptation of Leitner-style intervals rather than per-card modeling, a middle ground between the simplicity of fixed boxes and the sophistication of FSRS. Whatever the specifics, the trajectory across apps is the same: the future of spaced repetition is adaptive, personalized, and data-driven.
Frequently Asked Questions
What is the main difference between FSRS and SM-2?
SM-2 uses a single ease factor per card to multiply review intervals. FSRS uses a three-variable memory model tracking difficulty, stability, and retrievability per card, trained on hundreds of millions of reviews. FSRS predicts when recall probability drops below a target and schedules reviews at that moment.
Does FSRS completely replace SM-2 in Anki?
No. FSRS is available as an opt-in alternative since Anki version 23.10, released in November 2023. SM-2 remains the default algorithm. Users can switch to FSRS in the deck settings without losing their review history.
What is ease hell and does FSRS fix it?
Ease hell occurs when SM-2's ease factor spirals downward irreversibly, causing cards to appear far too frequently. FSRS eliminates this through mean reversion of difficulty, which gradually restores a card's difficulty rating after consistent correct answers instead of leaving it permanently damaged.
How much more efficient is FSRS compared to SM-2?
Simulation data suggests FSRS requires 20 to 30 percent fewer reviews for the same retention rate. This estimate comes from algorithm simulation rather than a controlled empirical study, but the benchmark data across 350 million reviews strongly supports improved scheduling accuracy.
Is the Leitner system still worth using in 2026?
The Leitner system remains effective for its simplicity, especially for beginners or learners who prefer physical flashcards. It lacks per-card adaptation and relies on fixed intervals, which makes it less efficient for large volumes of material. Adaptive variants that adjust intervals at the deck level offer a middle ground between Leitner simplicity and FSRS sophistication.

