You can finish the entire Duolingo tree, keep a streak of a thousand days, and still freeze when a waiter asks you a simple question. This is not a personal failure. The core Duolingo lessons train you to recognize and recall language, not to produce it. Recognition and production are different skills, and the one that lets you speak is the one the free product practices least. Here is why, and what closes the gap.

Recognition is not production

Most of the time you spend in Duolingo, you are choosing, not creating. You pick words from a word bank. You select between multiple-choice answers. You tap the tiles in the right order. According to Duolingo’s own description of its approach to speaking skills, even the spoken exercises ask you to repeat words, translate a sentence out loud, or speak an answer instead of typing it. The microphone compares your pronunciation against a native model and highlights the words.

This is genuinely useful. It builds comprehension, vocabulary, and pronunciation. But notice what every one of these tasks has in common: the language is already on the screen. You are reproducing a sentence the app handed you. You are recognizing the right answer among options, or echoing a model.

Speaking is the opposite motion. Nobody hands you the sentence. You have a thought, and you have to assemble it into the target language, in real time, while another person waits and then says something back that you did not plan for. The tiles are gone. The options are gone. That is a different cognitive act, and it is the one the lesson tree rarely asks of you.

What the research says about why this happens

The clearest explanation comes from linguist Merrill Swain. Working with Canadian French immersion students in the 1980s, she noticed something strange: after years of immersion, the students understood French almost perfectly, but their speaking lagged badly behind. Comprehensible input alone had not produced fluent speakers.

Her conclusion became the output hypothesis: producing language, not just understanding it, plays its own role in driving acquisition. Output forces three things that input cannot. You notice the gap between what you want to say and what you can actually say. You test a hypothesis about how the language works and find out, from the response, whether it held. And you reflect on the structure consciously, which helps lock it in. This is the engine of speaking, and it only turns when you produce your own language.

Stephen Krashen’s input hypothesis is the famous counterweight, the idea that comprehensible input is the main driver. The two are not enemies. You need input to understand and output to speak. The problem is that recognition-based practice delivers a great deal of the first and very little of the second. You end up with a learner who understands a lot and can produce almost nothing on demand. That is exactly the Duolingo-veteran experience: you read the menu fine, then go silent when asked what you want.

The freeze is real, and it has a name

There is a reason the silence feels physical. The discomfort of being put on the spot in a language you are still learning is documented as foreign language anxiety. One of its core components is communication apprehension, the anxiety of speaking with or in front of others. It pushes learners toward avoidance: staying quiet to dodge the risk of an error and the judgment that might follow.

Here is the trap. The most reliable way to reduce this anxiety is the very thing it makes you avoid: actually speaking, often, in low-stakes situations, until it stops feeling like a performance. A recognition-based app never triggers the anxiety, because you are never truly on the spot, so it never desensitizes you to it either. The first time you face a real conversation, the freeze is fresh, no matter how many lessons sit behind you.

Credit where it is due: Duolingo does have real conversation

This is important, and it is where a lot of “Duolingo can’t teach speaking” takes get it wrong. Duolingo does have a genuine open-conversation feature, and it is good.

It is called Video Call with Lily. In Duolingo’s own words, you can call Lily to have a “spontaneous, free-flowing conversation in your target language.” Lily starts, you can talk about anything, and you can ask her to slow down or repeat. That is real production, the kind the output hypothesis says you need. It is not a scripted exercise.

The honest case against relying on it is not that it does not exist. It is about access and dose. Video Call with Lily is available only to subscribers of Duolingo Max, the company’s most expensive tier, and the conversations are short by design: about a minute early in a course, up to roughly three minutes as you advance. It covers six languages on all platforms (English, Spanish, French, German, Italian, and Portuguese), plus Japanese and Korean on iOS. So the one feature that trains real speaking is the one feature most people, on the free product, never touch. The default experience stays recognition. The production lives behind the paywall, in short bursts.

What actually closes the gap

If the missing skill is producing your own unscripted language, the fix is to produce your own unscripted language, frequently, with someone who responds. That is the whole prescription. It is also the part that is genuinely hard to arrange, which is why most learners skip it. You need a patient partner, a schedule, and enough nerve to sound clumsy. We wrote about engineering this into your daily life in “the science of language immersion at home,” and about how the most efficient learners prioritize speaking over polish in “what polyglots do differently.”

Mintza was built to remove the friction from that one missing piece. It is an AI voice teacher you talk to in a real, open, spoken conversation about anything you want. No script, no multiple choice, no tiles to tap. It talks back in real time like a person and follows wherever you take the conversation, which means you are doing the thing the lesson tree never asks of you: generating your own sentences on the spot and getting a response.

A few things make it practical for the exact learner who got stuck after Duolingo:

  • It catches you when you fall. When you get stuck, Mintza switches to the language you already speak, helps you, and brings you back. You are never left stranded mid-sentence, which is the moment that usually ends a real conversation and feeds the anxiety.
  • It meets you at your level. There are four levels (Starter, Beginner, Intermediate, and Advanced), with vocabulary and speed matched to you. You can pick a regional accent too, such as British or Australian English, Madrid or Buenos Aires Spanish, and others.
  • It remembers. It keeps your past conversations and chosen topics, so you build continuity instead of starting cold every time.
  • It gives you room. Conversations run up to 30 minutes each, with no daily cap. That is enough time to actually warm up and lose the self-consciousness, unlike the one-to-three-minute window of Duolingo’s premium tier.

It covers fifteen languages in either direction: English, Spanish, Portuguese, French, Italian, German, Greek, Chinese, Russian, Turkish, Swedish, Arabic, Japanese, Korean, and Hebrew. You start with 10 free minutes, no subscription and no card required, and those free minutes never expire. Paid plans are simple monthly minute pools: Basic at $22.99 for 180 minutes, Plus at $39.99 for 360 minutes, and Pro at $59.99 for 600 minutes, cancel anytime. It needs an internet connection because the conversation is processed by AI in real time, and your voice is streamed live, not recorded or stored.

The honest summary

Duolingo is a good tool for what its core lessons actually train: recognizing words, recalling vocabulary, and practicing pronunciation against a model. If you finished the tree and still cannot speak, the app did not fail at its job. You just never practiced the different skill that speaking requires, because the core product mostly hands you the language instead of asking you to produce it. Duolingo’s one feature that does train open conversation, Video Call with Lily, is real and worth crediting, but it lives behind the most expensive tier and runs in short bursts.

The path to speaking is not more recognition. It is production, often, with feedback, until your own sentences start arriving without a delay. That is exactly the gap Mintza was built to fill.

Mintza is available for iOS and Android.