2026-04-08 · 9 min read

The Research Behind Delivery: What Linguistics Actually Says About Being Understood

We built ShadowSpeak on a specific claim: that stress, chunking, and pitch matter more for being understood than pronunciation accuracy. This post lays out the actual research behind that claim — study by study, finding by finding.

No vague appeals to "research shows." Just the papers, the methods, and what they found.

The Core Question

When a non-native English speaker is hard to understand, what causes the breakdown?

Applied linguists break speech into two layers:

Segmental features — individual vowels and consonants (what pronunciation apps measure)
Suprasegmental features — stress, rhythm, intonation, and pausing (what most apps ignore)

The question is: which layer matters more for real-world comprehensibility?

Study 1: Anderson-Hsieh, Johnson & Koehler (1992)

What they did: Had native English listeners rate speech samples from 60 L2 speakers across multiple language backgrounds and proficiency levels. Listeners rated each sample on a 7-point accentedness scale, and the researchers then measured each sample's deviations in three areas: segmental accuracy, prosody, and syllable structure.

What they found: Prosody had the strongest correlation with accentedness ratings, stronger than segmental accuracy and stronger than syllable structure. Speakers whose prosodic patterns deviated the most from native norms consistently received the poorest ratings.

Why it matters: This was one of the first large-scale studies to show that the "melody" of speech (stress and intonation) predicts how listeners judge a speaker better than whether individual sounds are right or wrong.

Anderson-Hsieh, J., Johnson, R., & Koehler, K. (1992). The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning, 42(4), 529-555.

Study 2: Munro & Derwing (1995)

What they did: Had native English speakers listen to extemporaneous speech from Mandarin speakers. Listeners transcribed what they heard (measuring intelligibility) and rated each sample for degree of foreign accent and perceived comprehensibility.
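A transcription task like this can be scored in a few lines. The sketch below is a minimal illustration, not the scoring procedure from the paper: it assumes intelligibility is the share of intended words that show up, in order, in the listener's transcript. The function names and the sample sentences are invented for the example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def intelligibility_score(intended: str, transcript: str) -> float:
    """Fraction of intended words the listener recovered, in order.

    Assumption for this sketch: a word counts as understood if it is part
    of the longest common subsequence of the two word lists.
    """
    ref, hyp = tokenize(intended), tokenize(transcript)
    if not ref:
        return 0.0
    # Standard dynamic-programming table for longest common subsequence.
    lcs = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, ref_word in enumerate(ref, start=1):
        for j, hyp_word in enumerate(hyp, start=1):
            if ref_word == hyp_word:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return lcs[-1][-1] / len(ref)


# Invented example: the listener misses "three" and "the".
intended = "She bought three red apples at the market"
heard = "She bought tree red apples at market"
print(intelligibility_score(intended, heard))  # 0.75
```

An accent rating would be collected separately, on its own scale, which is what lets a study compare the two measures for the same speech sample.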

What they found: Accent and intelligibility are partially independent. Speech can be heavily accented but still highly intelligible. The correlation between accent strength and actual understanding was weaker than most people assume.

Why it matters: This undercut the assumption that "less accent = more understanding." A strong accent does not, by itself, stop listeners from understanding you. The studies below point to delivery patterns instead.

Munro, M. J., & Derwing, T. M. (1995). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning, 45(1), 73-97.

Study 3: Derwing, Munro & Wiebe (1998)

What they did: Randomly assigned 36 ESL learners into three groups. One group received explicit instruction on suprasegmental features (stress, rhythm, intonation). One group received instruction on four vowel sounds (segmental). A control group received the same content without explicit pronunciation instruction. All groups were tested before and after.

What they found: In spontaneous narratives, only the suprasegmental group was rated as significantly more comprehensible after training. It was also the only group whose fluency improved. The segmental group improved on its targeted vowel sounds but showed no gain in overall comprehensibility in spontaneous speech.

Why it matters: This is direct experimental evidence that teaching delivery (suprasegmentals) improves real-world understanding more than teaching pronunciation (segmentals). Improving individual sounds is not enough.

Derwing, T. M., Munro, M. J., & Wiebe, G. E. (1998). Evidence in favor of a broad framework for pronunciation instruction. Language Learning, 48(3), 393-410.

Study 4: Hahn (2004)

What they did: Created three versions of the same academic lecture delivered by a nonnative speaker: one with correct primary stress, one with incorrectly placed stress, and one with stress missing entirely. Native English-speaking university students listened to one version and were tested on content recall.

What they found: Students who heard correctly stressed speech recalled significantly more content and rated the speaker more favorably. When stress was missing, comprehension dropped measurably — even though every word was the same.

Why it matters: This shows that stress is not just about sounding natural. It directly affects whether listeners absorb information. In a real conversation, misplaced stress means your message doesn't land.

Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly, 38(2), 201-223.

Study 5: Field (2005)

What they did: Recorded two-syllable English words with standard stress and again with shifted stress. Native and non-native listeners heard the words and tried to identify them.

What they found: Shifting stress to the wrong syllable significantly reduced word recognition, even when every sound was produced correctly. Stress shifted to the right (e.g., from first to second syllable) caused more damage than stress shifted to the left.

Why it matters: This is word-level evidence that stress is structural, not decorative. Listeners rely on stress patterns as a cue for identifying words, so a misplaced stress can make a familiar word hard to recognize or, in pairs like REcord and reCORD, turn it into a different word.

Field, J. (2005). Intelligibility and the listener: The role of lexical stress. TESOL Quarterly, 39(3), 399-423.

Study 6: Kang, Rubin & Pickering (2010)

What they did: Analyzed 26 speech samples from iBT TOEFL examinees using acoustic measurement tools. Measured suprasegmental features: speech rate, pause frequency, pause duration, pitch range, and pitch variation. Then had 188 native English-speaking undergraduates rate each sample for oral proficiency and comprehensibility.
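To make these measures concrete, here is a minimal sketch of how speech rate, pause frequency, pause duration, and pitch range could be computed once a recording has word timings and a pitch track. It is not the measurement pipeline used in the study. The `Word` structure, the 0.25-second pause cutoff, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    syllables: int


# Hypothetical cutoff: silent gaps at least this long count as pauses.
PAUSE_THRESHOLD_SEC = 0.25


def suprasegmental_measures(words: list[Word], f0_hz: list[float]) -> dict[str, float]:
    """Compute simple timing and pitch measures for one speech sample.

    `f0_hz` is a pitch track in hertz; 0 marks unvoiced frames.
    """
    total_sec = words[-1].end - words[0].start
    gaps = [nxt.start - cur.end for cur, nxt in zip(words, words[1:])]
    pauses = [gap for gap in gaps if gap >= PAUSE_THRESHOLD_SEC]
    voiced = [f for f in f0_hz if f > 0]
    return {
        "speech_rate_syll_per_sec": sum(w.syllables for w in words) / total_sec,
        "pauses_per_minute": len(pauses) / total_sec * 60,
        "mean_pause_sec": mean(pauses) if pauses else 0.0,
        "pitch_range_hz": max(voiced) - min(voiced) if voiced else 0.0,
    }


# Invented sample: "I think ... so maybe", with one half-second pause.
sample = [
    Word("I", 0.0, 0.2, 1),
    Word("think", 0.2, 0.5, 1),
    Word("so", 1.0, 1.3, 1),
    Word("maybe", 1.4, 2.0, 2),
]
print(suprasegmental_measures(sample, [0.0, 110.0, 180.0, 0.0, 150.0]))
```

In practice the word timings and pitch track would come from acoustic analysis software; the sketch only shows what is done with them afterward.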

What they found: Suprasegmental measures alone accounted for 50% of the variance in how listeners judged comprehensibility and proficiency. This was before considering any segmental (pronunciation) measures. Pause patterns and pitch range were the strongest individual predictors.
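"Accounted for 50% of the variance" is a statement about R², the share of the spread in listener ratings that a statistical model can reproduce from its predictors. The sketch below shows the idea with a single predictor, where R² is the squared correlation. This is a simplification: the study combined several suprasegmental measures, and the numbers here are invented purely to show the arithmetic.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """R² for a one-predictor linear fit, i.e. the squared Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R² is undefined when either variable has no variance")
    return sxy ** 2 / (sxx * syy)


# Invented toy data, not from the study: one pitch-range score and one
# comprehensibility rating for each of five speakers.
pitch_range = [1, 2, 3, 4, 5]
comprehensibility = [2, 1, 4, 3, 5]
print(round(r_squared(pitch_range, comprehensibility), 2))  # 0.64
```

An R² of 0.64 would mean the predictor reproduces 64% of the variation in ratings, leaving 36% to everything else. The reported 50% is read the same way.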

Why it matters: This is the most quantitative evidence in the set. Half of what makes a listener judge you as "easy to understand" or "hard to follow" comes from suprasegmental features — not from whether you pronounce every sound correctly.

Kang, O., Rubin, D., & Pickering, L. (2010). Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English. The Modern Language Journal, 94(4), 554-566.

What This Means in Practice

The research converges on three points:

1. Suprasegmentals consistently predict comprehensibility more strongly than segmental accuracy. This is not one study. It is a pattern across multiple research groups, methods, and populations, in studies published from 1992 to 2010.

2. You can have a strong accent and be perfectly understood. Accent and intelligibility are partially independent (Munro & Derwing, 1995). The goal is not to erase your accent but to align your delivery patterns — stress, pausing, pitch — with what listeners expect.

3. Suprasegmental training transfers to real speech; segmental training often doesn't. Derwing et al. (1998) showed that learners who practiced stress, rhythm, and intonation improved in overall comprehensibility and fluency. Learners who practiced individual sounds improved on those specific sounds, but not in real-world comprehensibility.

Why We Built ShadowSpeak This Way

This research is why ShadowSpeak measures four axes (pronunciation, stress, chunking, and pitch) instead of just pronunciation. It is also why we compare your delivery to a specific speaker rather than an abstract standard.

If suprasegmental features predict comprehensibility more strongly than segmental accuracy, then a tool that only measures pronunciation is measuring the smaller piece of the puzzle. ShadowSpeak measures all of it.

Full Reference List

Anderson-Hsieh, J., Johnson, R., & Koehler, K. (1992). The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning, 42(4), 529-555.
Brown, A. (1988). Functional load and the teaching of pronunciation. TESOL Quarterly, 22(4), 593-606.
Catford, J. C. (1987). Phonetics and the teaching of pronunciation. In J. Morley (Ed.), Current perspectives on pronunciation (pp. 87-100). TESOL.
Derwing, T. M., Munro, M. J., & Wiebe, G. E. (1998). Evidence in favor of a broad framework for pronunciation instruction. Language Learning, 48(3), 393-410.
Field, J. (2005). Intelligibility and the listener: The role of lexical stress. TESOL Quarterly, 39(3), 399-423.
Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly, 38(2), 201-223.
Kang, O., Rubin, D., & Pickering, L. (2010). Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English. The Modern Language Journal, 94(4), 554-566.
Levis, J. M. (2005). Changing contexts and shifting paradigms in pronunciation teaching. TESOL Quarterly, 39(3), 369-377.
Munro, M. J., & Derwing, T. M. (1995). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning, 45(1), 73-97.
