I’ve read a few articles about Capo 2 lately, and noticed that some authors claim Capo will “automatically tab out” music. Here’s what Capo’s product page says:

Tab it Out!

By simply drawing atop the spectrogram, Capo will generate tablature automatically for you. It really doesn’t get much easier than this!

The “automatic” part refers to translating the note data you enter (what I like to call “truth notes”) into the tablature shown below the spectrogram. In the future, I plan to add support for standard notation, though that’s a much tougher problem to solve.
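To make the “automatic” part a little more concrete, here is a minimal sketch of how a sequence of entered notes might be mapped to guitar tablature. It is not Capo’s actual fingering logic, which isn’t described here. It assumes notes arrive as MIDI pitch numbers, a standard-tuned guitar with a 22-fret neck, and a simple greedy rule (start on the lowest fret, then stay close to the previous fret). The names `TUNING`, `candidate_positions`, and `tab_notes` are hypothetical.

```python
# Hypothetical sketch: map MIDI pitches to (string, fret) tab positions.
# Standard tuning as MIDI note numbers (E2 A2 D3 G3 B3 E4).
# String 6 is the low E and string 1 the high E, matching tab convention.
TUNING = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}
MAX_FRET = 22  # assumption: a 22-fret neck


def candidate_positions(pitch):
    """Return every (string, fret) pair that can sound the given MIDI pitch."""
    return [
        (string, pitch - open_pitch)
        for string, open_pitch in TUNING.items()
        if 0 <= pitch - open_pitch <= MAX_FRET
    ]


def tab_notes(pitches):
    """Greedily map a melody to (string, fret) pairs.

    The first note takes its lowest available fret; each later note takes the
    position whose fret is closest to the previous one, to limit hand movement.
    Ties go to the lower-pitched string (the order of TUNING).
    """
    result = []
    prev_fret = None
    for pitch in pitches:
        options = candidate_positions(pitch)
        if not options:
            raise ValueError(f"MIDI pitch {pitch} is out of range for this tuning")
        if prev_fret is None:
            choice = min(options, key=lambda pos: pos[1])
        else:
            choice = min(options, key=lambda pos: abs(pos[1] - prev_fret))
        result.append(choice)
        prev_fret = choice[1]
    return result
```

For example, `tab_notes([40, 45, 52])` (E2, A2, E3) yields the open low E, the open A, and the second fret of the D string. A real tabbing engine would also weigh chords, playability, and the player’s preferences, which this greedy rule ignores.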

To me, the term “Auto Tabbing” means the same thing as “automatic transcription.” Many researchers are working to advance this technology, but it’s far from ready. I know, because I spent much of the past year researching it.

The Research

I started researching methods of automatic transcription in mid-2009, because I was curious how far along the technology was, and whether it could be integrated into a future version of Capo.

Each of these automatic transcription algorithms starts with some kind of intermediate representation of the audio data, and then translates that into a symbolic form (e.g., note onsets and durations).
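As a rough illustration of that last step, the sketch below turns a frame-by-frame pitch track (one MIDI pitch, or `None` for no pitch, per analysis frame) into notes with onsets and durations by merging runs of identical pitch. The frame-wise pitch input, the hop size, and the `min_frames` filter are assumptions for illustration only; published transcription systems use far more robust onset detection and note tracking than this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Note:
    pitch: int        # MIDI note number
    onset: float      # seconds
    duration: float   # seconds


def frames_to_notes(pitches: List[Optional[int]], hop_seconds: float,
                    min_frames: int = 2) -> List[Note]:
    """Merge consecutive frames with the same pitch into Note objects.

    `None` marks a frame with no detected pitch (silence or noise).
    Runs shorter than `min_frames` are dropped as likely spurious.
    """
    notes = []
    run_pitch = None
    run_start = 0
    # A trailing None acts as a sentinel that flushes the final run.
    for i, pitch in enumerate(list(pitches) + [None]):
        if pitch != run_pitch:
            if run_pitch is not None and i - run_start >= min_frames:
                notes.append(Note(run_pitch, run_start * hop_seconds,
                                  (i - run_start) * hop_seconds))
            run_pitch = pitch
            run_start = i
    return notes
```

With a hypothetical 0.5-second hop, the frames `[None, 60, 60, 60, 62, 62, None, 64]` become a C4 starting at 0.5 s lasting 1.5 s and a D4 starting at 2.0 s lasting 1.0 s; the lone final E4 frame is discarded as too short.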

This is where I encountered some computationally expensive spectral representations (the Continuous Wavelet Transform (CWT), the Constant-Q Transform (CQT), and others). I implemented each of these spectral transforms so that I could then implement the algorithms presented in the papers I was reading. That would give me an idea of whether they worked in practice.

Boy, was I in for a surprise: just implementing the front end of many of these transcription algorithms had me feeling defeated. In one paper, the authors claimed to have computed a CWT of a 30-second audio sample in only 1.9 seconds, while my own implementation was taking upwards of 15 minutes (on an 8-core Mac Pro!).

Sure enough, contacting the researchers revealed that they were using a modified version of the CWT (contrary to what the paper said), which they are keeping as a closely guarded secret. So that was the end of that…

I then (re-)stumbled on the Constant-Q Transform, which I had first encountered while researching FuzzMeasure back in 2004 or 2005. Some consider it a special case of a wavelet transform. My first implementation was promising: it took only about half the time of my CWT implementation, and used a tiny fraction of the RAM. I ran with that and kept making it better.
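For readers unfamiliar with the Constant-Q Transform, here is a deliberately naive sketch of its direct form, after Judith Brown’s 1991 formulation. Bin k is centered at f_min·2^(k/B) for B bins per octave, and its analysis window shrinks as frequency rises, so every bin has the same ratio of center frequency to bandwidth (Q). That frequency-dependent window is what makes it wavelet-like. It analyzes one frame starting at sample 0, in pure Python, to show the structure only; it is not the optimized implementation described above, and the function name and parameters are hypothetical.

```python
import cmath
import math


def constant_q(x, fs, f_min, bins_per_octave, n_bins):
    """Naive constant-Q magnitudes for one frame of `x` starting at sample 0.

    Bin k is centred at f_k = f_min * 2**(k / bins_per_octave). Its window
    length N_k = round(Q * fs / f_k) shrinks as f_k rises, which keeps the
    ratio of centre frequency to bandwidth (Q) the same for every bin.
    """
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    mags = []
    for k in range(n_bins):
        f_k = f_min * 2 ** (k / bins_per_octave)
        n_k = int(round(q * fs / f_k))
        if n_k > len(x):
            raise ValueError(f"bin {k} needs {n_k} samples, frame has {len(x)}")
        acc = 0j
        for n in range(n_k):
            # Hamming window of length n_k
            w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_k - 1))
            # Complex exponential completing q cycles over the window (about f_k)
            acc += w * x[n] * cmath.exp(-2j * math.pi * q * n / n_k)
        mags.append(abs(acc) / n_k)
    return mags
```

With 12 bins per octave, Q is about 16.8, so at an 8 kHz sample rate the 220 Hz bin needs a 612-sample window while the 880 Hz bin needs only 153. Evaluated directly like this, each frame costs roughly the sum of all the window lengths; a standard speedup precomputes the kernels in the frequency domain so that a single FFT per frame can be shared by every bin.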

I grafted some transcription approaches on top of this spectral representation, and realized very quickly that these algorithms are not ready for prime time.

Even the best automatic transcription algorithms today only work with a single instrument voice (e.g., just a violin, or just a flute). Some go further and transcribe multiple voices of the same instrument (e.g., three cellos, or two flutes), but their accuracy drops considerably. The best I encountered had accuracy in the 60-70% range.
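Accuracy figures like these depend heavily on how notes are scored. One common approach, used in evaluation campaigns such as MIREX note tracking, counts an estimated note as correct when its pitch matches a reference note and its onset falls within ±50 ms, then reports precision, recall, and F-measure. Here is a minimal sketch of that scoring, assuming notes are `(onset_seconds, midi_pitch)` pairs and greedy one-to-one matching; the function name is hypothetical, and the figures quoted above were not necessarily computed this way.

```python
def note_f_measure(reference, estimated, onset_tol=0.05):
    """Precision, recall and F-measure for note transcription.

    Notes are (onset_seconds, midi_pitch) tuples. An estimated note counts as a
    hit when it has the same pitch as an unmatched reference note and its onset
    is within `onset_tol` seconds. Matching is greedy, in estimated-note order,
    and each reference note can be matched at most once.
    """
    unmatched = list(reference)
    hits = 0
    for onset, pitch in estimated:
        for ref in unmatched:
            if ref[1] == pitch and abs(ref[0] - onset) <= onset_tol:
                unmatched.remove(ref)  # safe: we break out right after
                hits += 1
                break
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f
```

For instance, if a system finds three notes and two of them match a four-note reference, precision is 2/3, recall is 1/2, and the F-measure is 4/7 (about 57%).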

The Roadblock

I think the major problem facing automatic transcription right now is filtering and separation. Because single-voice algorithms are progressing steadily, it would seem that one simply has to separate the individual instruments into different streams, and then apply those algorithms to each stream.

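Unfortunately, you can’t unbake a cake. The stereo recordings we listen to are mixed down from many tracks and heavily processed to get the final result, and once everything has been summed into two channels, there is no unique way to pull the original tracks back apart.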
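A tiny numerical example shows why. If we model a mix as a linear combination of tracks, then three mono tracks panned into two channels give two equations per sample with three unknowns, so infinitely many different sets of tracks produce exactly the same stereo file. The pan gains and sample values below are made up, and this linear model ignores the heavy processing real mixes go through.

```python
# Hypothetical pan gains, (left_gain, right_gain) per track:
# guitar hard left, vocal centre, bass hard right.
PAN = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]


def mix(tracks):
    """Sum equal-length mono tracks into (left, right) using PAN."""
    left = [sum(g[0] * s for g, s in zip(PAN, frame)) for frame in zip(*tracks)]
    right = [sum(g[1] * s for g, s in zip(PAN, frame)) for frame in zip(*tracks)]
    return left, right


guitar, vocal, bass = [0.5, -0.25], [0.5, 0.5], [0.25, 0.0]
original = mix([guitar, vocal, bass])

# A different "band": move d out of the centre track and put half of it into
# each side track. The left channel gains 0.5*d from the guitar and loses
# 0.5*d from the vocal; the right channel does the same with the bass.
d = [0.25, 0.125]
alt_guitar = [g + 0.5 * di for g, di in zip(guitar, d)]
alt_vocal = [v - di for v, di in zip(vocal, d)]
alt_bass = [b + 0.5 * di for b, di in zip(bass, d)]
alternative = mix([alt_guitar, alt_vocal, alt_bass])
```

Both sets of tracks yield identical left and right channels, so nothing in the stereo file alone can tell a separation algorithm which set is the “real” one; it has to lean on assumptions about what instruments sound like.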

My opinion is that we’re stuck at a roadblock, and we’ll only get past it when music is distributed in multitrack form, with the mixing and processing done by the listener.

In short: Don’t hold your breath.

This afternoon I released FuzzMeasure 3.2.2, which you can download from http://fuzzmeasure.com.

The largest change in this release is an important one: I believe I may have finally found a ‘silver bullet’ for the widespread confusion about FuzzMeasure’s results. FuzzMeasure will now fail gracefully when a measurement is captured incorrectly, rather than giving you bad data.

You see, I get a lot of support email from customers who are new to FuzzMeasure, and to acoustic measurements in general. In many cases, their confusion stems from the fact that FuzzMeasure will happily let you measure silence and then give you a meaningless graph.

Not any more!

FuzzMeasure now checks that the measured impulse response has a distinct peak well above the signal’s noise floor. This built-in failsafe forces you to repeat the measurement until your levels are set up correctly.
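FuzzMeasure’s exact test isn’t spelled out here, but the idea can be sketched as a peak-to-noise-floor ratio: find the largest sample in the impulse response, estimate the noise floor from the RMS of its tail (where the response should have decayed), and reject the measurement if the peak doesn’t clear that floor by some margin. The 10% tail, the 30 dB threshold, and the names `peak_to_noise_db` and `impulse_is_valid` are all assumptions for illustration.

```python
import math


def peak_to_noise_db(impulse, tail_fraction=0.1):
    """Ratio, in dB, of the impulse peak to the RMS level of the final
    `tail_fraction` of the response (used here as a noise-floor estimate)."""
    peak = max(abs(s) for s in impulse)
    tail = impulse[-max(1, int(len(impulse) * tail_fraction)):]
    noise_rms = math.sqrt(sum(s * s for s in tail) / len(tail))
    if peak == 0.0:
        return float("-inf")  # pure digital silence was measured
    if noise_rms == 0.0:
        return float("inf")   # perfectly clean tail
    return 20 * math.log10(peak / noise_rms)


def impulse_is_valid(impulse, min_ratio_db=30.0):
    """Hypothetical failsafe: accept only a distinct peak well above the noise."""
    return peak_to_noise_db(impulse) >= min_ratio_db
```

A measurement of silence (or of noise alone) has a “peak” no higher than its own tail, so its ratio sits near 0 dB and fails the check, which is exactly the case that used to produce a meaningless graph.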

Setting levels may involve turning up your amplifier to produce a louder signal at the microphone, raising the gain on your microphone preamplifier to pick up more signal, or both. Either way, use the Level Meter window to keep an eye on how quiet (or loud) your input is during the measurement.
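A level meter typically reports the peak and RMS level of each block of input in dB relative to full scale (dBFS), where a sample value of ±1.0 is 0 dBFS. Here is a minimal sketch of that computation, assuming floating-point samples in [-1, 1]; the function name is hypothetical, and this is not necessarily how FuzzMeasure’s Level Meter window works internally.

```python
import math


def block_levels_dbfs(block):
    """Return (peak_dbfs, rms_dbfs) for a block of float samples in [-1, 1].

    0 dBFS is a full-scale sample. A silent or empty block reports -inf.
    """
    peak = max((abs(s) for s in block), default=0.0)
    rms = math.sqrt(sum(s * s for s in block) / len(block)) if block else 0.0

    def to_db(x):
        return 20 * math.log10(x) if x > 0 else float("-inf")

    return to_db(peak), to_db(rms)
```

A pure sine wave’s RMS sits about 3 dB below its peak, so watching both readings tells you whether you’re close to clipping (peak near 0 dBFS) or so quiet that the measurement will be buried in noise.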

In addition to this change, you can now choose to normalize the impulse graph “post calculation.” Just choose Impulse > Normalize from the main menu to enable this feature. It lets you view the Envelope Time Curve and Log Squared Impulse Response with their peaks at 0 dB. Normalizing the record (Measurement > Normalize) would not necessarily guarantee this.
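Normalizing “post calculation” can be pictured as shifting the computed curve so that its maximum lands at 0 dB. Here is a sketch for the Log Squared Impulse Response, 10·log10(h[n]²) in dB, assuming the impulse response is a plain list of samples; the function name and the −200 dB floor for zero-valued samples are hypothetical choices.

```python
import math


def normalized_lsir_db(impulse, floor_db=-200.0):
    """Log-squared impulse response, 10*log10(h[n]**2), shifted so its peak is 0 dB.

    Zero-valued samples (log of zero) and anything quieter than `floor_db`
    are clamped to `floor_db`.
    """
    energies = [s * s for s in impulse]
    peak = max(energies)
    if peak == 0.0:
        raise ValueError("impulse response is all zeros")
    return [max(floor_db, 10 * math.log10(e / peak)) if e > 0 else floor_db
            for e in energies]
```

Because the shift is applied to the curve itself, the peak is 0 dB by construction. One plausible reason the record-level normalization can’t promise the same for the ETC, assuming the ETC is computed as the magnitude of the analytic signal: that envelope is always at least as large as |h[n]| and can exceed the largest raw sample, so scaling the samples to a 0 dB peak may still leave the ETC peak above 0 dB.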

I also boosted performance considerably for users who view lots of Envelope Time Curve graphs. Because that calculation is very intensive, FuzzMeasure now uses a caching scheme that stashes already-calculated ETCs on disk for faster recall when you switch between records.
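The caching scheme isn’t described in detail, but a minimal on-disk cache might key each ETC by the record’s identity plus the parameters used to compute it, so a changed setting never returns a stale curve. In this sketch, the `ETCCache` class, the `record_id` string, the parameter dictionary, the JSON file format, and the `compute_etc` callback are all hypothetical.

```python
import hashlib
import json
import os


class ETCCache:
    """Stash computed curves on disk, keyed by record ID and parameters."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, record_id, params):
        # Sorting the keys makes the hash independent of dict ordering.
        key = json.dumps({"record": record_id, "params": params}, sort_keys=True)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".json")

    def get_or_compute(self, record_id, params, compute_etc):
        """Return the cached curve, or call compute_etc() and cache its result."""
        path = self._path(record_id, params)
        if os.path.exists(path):
            with open(path) as f:
                return json.load(f)
        curve = compute_etc()
        # Write to a temporary file, then rename, so a crash mid-write
        # never leaves a half-written cache entry behind.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(curve, f)
        os.replace(tmp, path)
        return curve
```

Switching back to a record you’ve already viewed then costs a file read instead of a full ETC calculation, while changing any parameter produces a new key and a fresh computation.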

If you hit any trouble, you can always email support@supermegaultragroovy.com with your concerns.

After nearly a year of research and development, Capo 2.0 is here!

Now that the cat’s out of the bag, you can see why I’ve been hacking on OpenCL so much. Because it uses OpenCL, Capo 2 requires Mac OS X 10.6 Snow Leopard.

Sometimes I need to step back and look at what I’ve accomplished here. Not only can you listen to what’s going on in a song, you can now see it, too. On top of that, you can actually tab out songs by clicking and dragging notes on top of the spectrogram!

Needless to say, this is a revolutionary application. I’d like to believe that I’ve completely changed the way people will learn to play their favorite songs, given nothing more than the studio recordings.

I spent way more time than I should have getting my Mac Pro to boot a version of the Ultimate Boot CD.

I have some WD Green Power drives in a ReadyNAS NV+, and the latest one I purchased was parking its heads far too often. Its load cycle count climbed very high in the few days I’d owned it.
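If you want to watch this happen on your own drives, smartmontools’ `smartctl -A` prints the SMART attribute table, where attribute 193, Load_Cycle_Count, carries its raw value in the RAW_VALUE column. The sketch below parses that row from saved `smartctl -A` output and projects how many days a drive would take to reach a rated limit at the observed rate. The 300,000-cycle default is an assumption (check your drive’s datasheet), and the function names are hypothetical.

```python
def load_cycle_count(smartctl_output):
    """Return the raw Load_Cycle_Count (attribute 193) from `smartctl -A` text."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw_value
        if len(fields) >= 10 and fields[1] == "Load_Cycle_Count":
            return int(fields[9])
    raise ValueError("Load_Cycle_Count not found")


def days_until_limit(count_then, count_now, days_between, rated_cycles=300_000):
    """Project days until `rated_cycles` at the rate seen between two readings."""
    per_day = (count_now - count_then) / days_between
    if per_day <= 0:
        return float("inf")  # the count isn't growing
    return (rated_cycles - count_now) / per_day
```

For example, hypothetical readings of 2,000 and 8,000 cycles taken three days apart work out to 2,000 cycles a day, or 146 more days before reaching 300,000.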

I followed these instructions to disable the idle timer on my drives, but I just couldn’t get the Ultimate Boot CD to boot on my Early 2008 Mac Pro. It would freeze before it ever presented its (large) options menu. The same CDs booted fine on my 13″ MacBook Pro, so the problem was clearly specific to the Mac Pro. (I can’t say I ever imagined I’d boot any flavour of DOS on an 8-core machine with 10GB of RAM. Crazy!)

After a whole lot of searching and many failed attempts, I decided to try the 5.0 beta of UBCD this morning, and it finally worked.

The key (for me, anyway) was to choose option #3 in the FreeDOS boot menu. It was the only option that detected my CD drive and actually got past the initial part of the boot process.

So now all my WD Green Power drives have their idle timers disabled, and they seem to be living happily in the ReadyNAS, for now…

Today I released FuzzMeasure 3.2 into the wild for the world to use. Here’s the official press release.

I think this is a very solid update to the app, and I’m happy to finally have it in my customers’ hands after such a long time. For me, the best part is that I now get to focus on all the exciting features I had to fight so hard to keep out of 3.2!! :)