Developer to Researcher: The Capo Features I Didn't Know Were Important

After 13 years of building ear learning software, I thought I understood how musicians learn from recordings. I'd been doing it myself for decades, built tools that people genuinely find useful, and had clear ideas for new features that seemed obviously helpful.

When I started my Master's program in 2022¹, I arrived, as I put it in my thesis, "guns blazing," ready to research and build something new for Capo. I had the technical expertise and what felt like a solid understanding of the problem I was solving.

But when I tried to justify why these features would actually help musicians, my foundation started to reveal its cracks.

I couldn't find research to support my assumptions. Worse, I couldn't articulate the connection between my solutions and what musicians actually needed. My working knowledge of ear learning—which had seemed perfectly adequate for over a decade—suddenly felt insufficient when I tried to build on it.

So I decided to study the problem from scratch. Actually, my hand was forced—there wasn't much relevant literature to be found.

Most academic research focused on simple melodies in isolation, nothing like the complex recordings that pop instrumentalists actually learn from. Additionally, studies were often targeted at formally trained (classical) music students, or wholly untrained non-musicians. So I turned to YouTube, analyzing videos of skilled musicians demonstrating their process as a way to bootstrap my understanding of what people are doing these days when they learn by ear. I studied 18 videos of musicians actually learning songs, plus 29 lesson videos teaching the skill.²

What I discovered was a gap between common assumptions about ear learning and what actually works.

The biggest challenge isn't what most people think. It's not primarily about hearing ability, music theory knowledge, or instrument technique. The real bottleneck is learning to work with your cognitive limitations, particularly memory, rather than fighting against them.

Successful ear learners had developed methods that respect how our brains actually process musical information. They work one note at a time when memory gets overloaded. They stop recordings immediately after hearing target notes to avoid interference. They use their voice to hold onto pitches while searching on their instrument.³

These weren't revolutionary techniques—they were strategies experienced musicians had discovered through trial and error. But they perfectly aligned with decades of research on tonal working memory and cognitive processing.

Here's what really surprised me: when I looked back at features I'd already built into Capo, some were accidentally addressing the same memory challenges that I'd observed in the research.

The Audio Freezer that extends notes indefinitely? It helps people who can't hold even a single note in memory long enough to find it on their instrument. The transcription playhead that restarts from the same location? It supports learning one element at a time without interference.⁴

I'd been building tools that worked, but I hadn't understood why they worked.

This taught me something important about the relationship between intuition and understanding. My working knowledge had been sufficient to build helpful features, but insufficient to explain why they helped. Understanding the underlying process matters—especially when you're trying to build better tools or improve your own learning.

The key insight isn't that memory is a problem to solve. It's that successful ear learning is about developing methods that work within the constraints of your cognition, and the best tools support these human-centered approaches rather than trying to automate them away.

But this is just one piece of what I discovered during two years of research. I'm planning to share more about these memory-oriented ear learning techniques and how they can improve the way musicians learn from recordings, so keep an eye out for future posts.

For now, here's one small tip you can use while learning a melody or solo: the next time you lose a phrase in your mind while searching for it on your instrument, try stopping the recording sooner and learning one note at a time. Sometimes the method matters more than the effort you put into it.

¹ This might come as a surprise to some of you, but I decided to try my hand at grad school almost 20 years after my undergrad. While I now have a nice piece of paper that claims I am a "Master of Mathematics," I can assure you that I still find it very challenging.
² I should clarify that when I say "musicians" I'm largely talking about "pop instrumentalists," and "ear learning" in this context refers specifically to learning from pop music recordings. These qualifiers appear to make this a difficult area to research.
³ The sensorimotor process that powers vocal pitch imitation is actually tied to your memory for pitches. So don't be afraid to practice singing, humming, or whistling along to songs to develop this further!
⁴ Part of the reason I pushed so hard to get the 4.4 update out was so I could improve upon these features in light of my findings. You can now create bindings to toggle Audio Freezer without scrubbing, and keyboard bindings that only play while a key is held down.