Why the hardest part of overtone singing isn't what you think
You watched the tutorial. You positioned your tongue. You held your drone.
And… nothing. Or almost nothing. A vague shift in timbre, something that might be a harmonic, but you’re not sure. You try again. You adjust. You push harder. Still that doubt.
If this sounds familiar, you’re in good company. You’re not tone-deaf, you’re not doing it wrong, and your anatomy is not the problem. You’re hitting an obstacle that almost nobody names.
What tutorials get right
Good overtone singing tutorials teach real, useful things: how to sustain a stable drone, how to position the tongue, how to shift from “oo” to “ee” while searching for the whistling tone. If you’ve never tried, this is a solid place to start — and anyone can learn.
These instructions work. Many people find their first harmonic by following them.
But a significant number of learners get stuck despite doing everything correctly. On a spectrogram, a harmonic may already be visibly emerging from their voice. They don’t know it, because they can’t hear it.
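For readers who like numbers more than pictures, here is a minimal Python sketch of what a spectrogram does at a single instant: it measures how strong each harmonic of the drone is. Everything in it is an assumption made for illustration: the function names (`synth_drone`, `harmonic_levels`, `prominence_db`), the 100 Hz drone, and the synthetic "voice" whose 8th harmonic is boosted threefold. It is not a model of a real voice or of any particular analysis software.

```python
import math

def synth_drone(sample_rate, f0, amplitudes, duration):
    """Synthesize a drone as a sum of sine harmonics.

    amplitudes[k - 1] is the amplitude of harmonic k (frequency k * f0).
    """
    n = int(round(sample_rate * duration))
    return [
        sum(a * math.sin(2 * math.pi * k * f0 * i / sample_rate)
            for k, a in enumerate(amplitudes, start=1))
        for i in range(n)
    ]

def harmonic_levels(samples, sample_rate, f0, n_harmonics):
    """Estimate the amplitude of harmonics 1..n_harmonics of f0.

    Each harmonic is measured by correlating the signal with a cosine and
    a sine at that frequency (a single-bin DFT). The estimate is exact when
    the signal spans a whole number of periods of f0.
    """
    n = len(samples)
    levels = []
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k * f0 / sample_rate
        re = sum(s * math.cos(w * i) for i, s in enumerate(samples))
        im = sum(s * math.sin(w * i) for i, s in enumerate(samples))
        levels.append(2 * math.hypot(re, im) / n)
    return levels

def prominence_db(levels, k):
    """How far harmonic k stands above its stronger neighbour, in dB."""
    neighbours = [levels[j - 1] for j in (k - 1, k + 1) if 1 <= j <= len(levels)]
    return 20 * math.log10(levels[k - 1] / max(neighbours))

# Hypothetical voice: amplitudes fall off as 1/k, except harmonic 8, boosted 3x.
amps = [1.0 / k for k in range(1, 13)]
amps[7] *= 3.0

signal = synth_drone(8000, 100.0, amps, 0.1)   # 0.1 s = 10 full periods
levels = harmonic_levels(signal, 8000, 100.0, 12)

for k, level in enumerate(levels, start=1):
    print(f"harmonic {k:2d} ({k * 100} Hz): {level:.3f}")
print(f"harmonic 8 stands {prominence_db(levels, 8):.1f} dB above its neighbours")
```

In this toy signal the 8th harmonic stands several decibels above its neighbours. A measurement can confirm that it is there, whether or not the ear has learned to pick it out.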
The obstacle nobody names
After years of teaching, in workshops across France, Belgium, Greece, and Switzerland, with complete beginners and seasoned professionals, I’ve arrived at a conviction that shapes everything I do.
The hardest part of overtone singing is not the production of the sound. It’s the interpretation of what you hear.
Technique matters, and it matters enormously. Learning to sculpt the vocal tract with precision is the other essential dimension of mastery, and I’ll dedicate an entire article to it. But today, I want to talk about what resists the longest, and what nobody explains.
The shape-sorting game
You know those children’s toys, the shape sorters? A box with cut-out holes: a circle, a square, a triangle. The child picks up a block, finds the matching hole, pushes it through. Satisfaction.
I believe our brain does exactly this with the sounds of the voice.
Since early childhood, we’ve carved “holes” in our perception, holes shaped like vowels. One for “oo.” One for “ah.” One for “ee.” Every time the resonance of a voice changes, because the mouth opens, because the tongue shifts, our brain reads that change as a vowel change. It’s fast, efficient, and extraordinarily useful: it’s how we understand speech in real time, in noise, across accents.
Now, in overtone singing you change resonance for a completely different purpose: to focus the energy of your voice onto one specific harmonic, until it becomes audible as a separate tone above the drone. The resonance is the same tool, but the goal is different.
The problem is that your brain doesn’t know this yet. It does what it has always done: it reads every resonance shift as a vowel label. And as long as it’s reading vowels, it doesn’t perceive the harmonic that’s emerging.
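The physical half of this, resonance as a tool for emphasizing one harmonic, can be sketched in a few lines of Python. This is a toy source-filter model, not a description of a real vocal tract: it assumes a drone whose harmonics weaken as 1/k and a single bell-shaped resonance that boosts whatever lies near its center frequency. The function names, the 110 Hz drone, the 80 Hz bandwidth, and the boost factor are all hypothetical values chosen for illustration.

```python
def resonance_gain(freq, center, bandwidth, boost):
    """Gain of one bell-shaped (Lorentzian) resonance.

    Far from `center` the gain is about 1; at `center` it is 1 + boost.
    """
    detune = (freq - center) / (bandwidth / 2)
    return 1.0 + boost / (1.0 + detune ** 2)

def overtone_amplitudes(f0, n_harmonics, center, bandwidth=80.0, boost=10.0):
    """Map harmonic number k to its amplitude after the resonance.

    The source amplitude of harmonic k is assumed to be 1/k.
    """
    return {
        k: (1.0 / k) * resonance_gain(k * f0, center, bandwidth, boost)
        for k in range(1, n_harmonics + 1)
    }

def strongest_overtone(f0, n_harmonics, center, **resonance):
    """Harmonic number (k >= 2, so excluding the drone) with the largest amplitude."""
    amps = overtone_amplitudes(f0, n_harmonics, center, **resonance)
    return max((k for k in amps if k >= 2), key=amps.get)

# Sliding the resonance upward walks the emphasis from harmonic to harmonic.
for center in (660.0, 770.0, 880.0, 990.0):
    k = strongest_overtone(110.0, 12, center)
    print(f"resonance at {center:.0f} Hz -> harmonic {k} ({k * 110} Hz) stands out")
```

The drone never changes in this sketch. Only the resonance moves, and with it the harmonic that carries the most energy.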

This isn't just a metaphor
What I’ve described has a name in cognitive science.
In 1957, Alvin Liberman and his colleagues described categorical perception: when sounds vary along a physical continuum, listeners don’t perceive a smooth glide. They perceive an abrupt jump from one category to another. The differences between categories are amplified. The differences within a category are compressed, nearly erased.
In 1991, Patricia Kuhl showed that the prototype of each vowel functions as a perceptual magnet: it pulls neighboring sounds toward itself, making them harder to tell apart. Perceptual space is literally warped, compressed around the prototypes. The hole doesn’t just receive shapes. It pulls them in.
The most illuminating detail: newborns can discriminate all phonetic contrasts in all the world’s languages. Then, by six months, their perception reconfigures to retain only the categories of their native language (Kuhl et al., 1992). The holes deepen. The other distinctions fade.
We gain extraordinary linguistic efficiency, and pay for it with a loss of sensitivity to anything that doesn’t “serve” our language.
What I know from the inside
I’m Greek. I started speaking French at 28. More than twenty years later, my accent is still there.
Greek has 5 vowel sounds. French has 15. The French u in “lune,” the eu in “veux,” the difference between é and è: after two decades of daily life in French, these sounds still resist me. Not always, not everywhere, but enough that my “Southern” accent, as the French politely call it, fools nobody.
Here’s the fascinating part: my mouth can produce them all. If you asked me to glide across every possible resonance on the vowel chart, Greek, French, or otherwise, I could do it with ease. My vocal tract has no problem.
But the moment I speak, my Greek categories take over. The left brain, the one managing language, doesn’t take the time to fine-tune the timbre. It categorizes, compresses, rounds off. Twenty years haven’t changed much. This is exactly what Liberman and Kuhl describe: the holes carved in childhood resist time remarkably well.
Overtone singing poses the same challenge, only more radical. When you move from “oo” to “ee,” your brain reads a sequence of vowels, hole to hole. But the harmonic emerging from your resonance adjustments has no hole of its own. So it gets swallowed by the nearest vowel. And you don’t perceive it.
The shift
There’s a moment in learning where something changes.
It’s not a change in technique. The tongue hasn’t moved. The breath is the same. But suddenly the harmonic is there — clear, obvious, unmistakable. Like an image emerging from a stereogram.
Students who experience this describe it the same way: “It was there all along!”
The ear stopped reading the resonance as a vowel label, and began to perceive what the resonance was actually doing: amplifying a harmonic. Same sound, same mouth, same breath. But a different attention.
What’s remarkable (and a study involving Wolfgang Saus has shown this) is that overtone singing primarily activates the right hemisphere of the brain. Not the left, which manages language and its categories. The right, the hemisphere of musical perception, timbre, spatial hearing. The perceptual shift I’m describing may be, quite literally, a shift in hemispheres.

What this changes
When you understand that the deepest difficulty is perceptual, learning changes shape. You stop searching for the right position and start searching for the right state of listening.
Of course, listening alone isn’t enough. Learning to sculpt the vocal tract is the other essential side of mastery. Listening tells you where you are; sculpting lets you go elsewhere. The two advance together.
But listening unlocks the door. Without it, you sculpt blind. With it, every micro-adjustment becomes audible, meaningful, guided.
Technique will come. Listening will guide it.
If you’d rather experience than read, book a lesson or join me for a vocal retreat on Aegina Island. The best way to understand listening is still to listen.
The harmonic isn’t found. It’s recognized, when the ear stops filing it into a hole.
By Iannis Psallidakos, overtone singing teacher & vocal acoustics researcher.