Archive for the 'Software Development' Category

Math and Cocoa

A few years ago when I was working on FuzzMeasure, I found myself in need of a math library to work with large data sets, but I didn’t want to deal with building C++ classes to fill out my framework. Ideally, I’d build something that fit in with Cocoa as much as possible.

Why not C++?

Well, I really dislike C++, for starters. It has a whole lot of fancy OO features, but it’s an awful lot of rope to hang yourself with. Every time I’ve worked with C++, I’ve hated it. If I’m coding for fun, I’ll code with a language I like, thanks.

But Objective-C is missing some crucial things, right?

  • Syntactic sugar. You couldn’t write aVector + bVector to add two vectors together using Objective-C, as there is no operator overloading (not a bad thing, as you’ll see below).
  • Templates. Yes, I actually wrote separate SMUGRealVector, SMUGDoubleVector, and SMUGComplexVector classes.
  • Speed. Maybe, but I’m not so sure the win for C++ is cut & dry.

After using this library for over 4 years, I haven’t found these missing “features” to be a problem. In fact, I find the Objective-C language to be far better, in the long run, because I’ve gained these features (among others):

  • Extensibility. You can build specific features onto existing classes, which is especially powerful if you’re consuming the class as part of a framework. (i.e. You don’t own it.)
  • Readability. The code’s more verbose, but you can read the code again in a few years without much trouble.
  • Frameworks. Cocoa is amazing. It’s chock full of extremely useful classes that basically do what I want, anyway.

When I talk about readability above, I’m especially poking at overloading operators in C++. How do you know what v * C does, just by looking at it? Does it scale v by a constant? How can you be sure at first glance? Compare with [v multiplyBy:C] and [v scaleBy:C], for vector multiplication and scaling respectively.

Keeping it Cocoa

I started out by naïvely considering NSArrays of NSNumbers, but that fell over quickly. I wanted to utilize Cocoa, but not too much. I also wanted it to be fast (or give me an opportunity to optimize it easily later).

I had to operate primarily on long vectors, so I could carry over mountains of MATLAB code that are built with vectors in mind. Take a signal, stuff it in a vector, do FFTs on the vector, normalize it, multiply with another vector, work in the frequency domain, etc.

Well, it doesn’t get much faster than arrays of floats—big ol’ blobs of memory on your system—to deal with giant data sets like these. So, how do we get arrays of floats in a Cocoa-like way?

Storing Vectors In Memory

@interface SMUGRealVector : NSObject {
    NSMutableData *mData;
}
// ...lots of stuff
@end

That’s right—there’s not much to the classes. Just a blob of NSMutableData, which is a giant step up from a naked pointer to a blob of floats. NSMutableData gives us so much:

  • Operations on, and extraction of, ranges of data.
  • Simple increase/decrease operations on its size, even appending other blobs of data.
  • File serialization routines!
  • And more, of course…

Yes, I’m competent enough to write the above routines myself, but I know better than that. So how do we build these vectors?

Vector Construction

- (id)initWithLength:(unsigned int)N;
{
    if ( !( self = [super init] ) ) {
        return nil;
    }
    mData = [[NSMutableData alloc] 
        initWithLength:(N*sizeof(float))];
    if ( !mData ) {
        [self release]; // without ARC, release self so it doesn't leak
        return nil;
    }
    return self;
}

// And a whole whack of convenience functions...
+ (id)realVectorWithLength:(unsigned int)N;
+ (id)realVectorWithOnes:(unsigned int)N;
+ (id)realVectorWithIntegersRangingFrom:(int)begin to:(int)end;

Looks good, right? Some of you might recognize some MATLAB idioms in there, such as ones(), or replicating [1:5], which returns [1 2 3 4 5]. Great, what else can we do?
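For readers without the library at hand, the behaviour of those convenience constructors can be sketched in plain C. This is only an illustration: fvec, fvec_init, fvec_ones, fvec_range, and fvec_free are hypothetical names, not part of SMUGRealVector, and the inclusive range is an assumption based on the MATLAB [1:5] idiom.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the vector class: a length plus a heap blob of floats. */
typedef struct {
    size_t length;
    float *data;
} fvec;

/* Zero-filled vector of n floats. Returns 0 on success, -1 if allocation fails. */
static int fvec_init(fvec *v, size_t n)
{
    v->data = calloc(n ? n : 1, sizeof(float));
    v->length = v->data ? n : 0;
    return v->data ? 0 : -1;
}

/* Vector of n ones, like MATLAB's ones(). */
static int fvec_ones(fvec *v, size_t n)
{
    if (fvec_init(v, n) != 0) return -1;
    for (size_t i = 0; i < n; i++) v->data[i] = 1.0f;
    return 0;
}

/* Inclusive integer range, like MATLAB's [begin:end]; requires begin <= end. */
static int fvec_range(fvec *v, int begin, int end)
{
    assert(begin <= end);
    size_t n = (size_t)((long)end - (long)begin) + 1;
    if (fvec_init(v, n) != 0) return -1;
    for (size_t i = 0; i < n; i++) v->data[i] = (float)(begin + (long)i);
    return 0;
}

/* Release the blob and reset the vector to empty. */
static void fvec_free(fvec *v)
{
    free(v->data);
    v->data = NULL;
    v->length = 0;
}
```

Calling fvec_range(&v, 1, 5) fills v with 1 through 5, mirroring what realVectorWithIntegersRangingFrom:to: is described as doing.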

Accessing Vectors

To get at the bits & pieces of vectors, you have a few options:

- (float*)components;
- (unsigned int)length;
- (void)setComponent:(float)c atIndex:(unsigned int)i;
- (float)componentAtIndex:(unsigned int)i;

They do what you’d expect, wrapping the matching NSData routines in the case of components and length. You can also operate on ranges of vectors:

- (SMUGRealVector*)realVectorInRange:(NSRange)range;
- (void)appendVector:(SMUGRealVector*)v;
- (void)replaceComponentsInRange:(NSRange)range
    withRealVector:(SMUGRealVector*)v;

You can actually build a ‘vector queue’ of sorts, by using the routines above. In one case, you can build a standard queue by appending vectors on one end, and extracting a sub-range from the other end—discarding the original, longer vector as you go. This is certainly memory-intensive, but these are extremely handy to bootstrap some signal-processing algorithms (overlap-add and overlap-save, for instance).

(If you wanted to optimize overlap-add, and overlap-save, you would instead build a large circular buffer, of sorts, and just use a combo of replaceComponentsInRange: and realVectorInRange: to do your bidding, but I digress…)
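The circular-buffer idea can be sketched in a few lines of plain C. This is not the library's code: ring, ring_write, ring_read, and the tiny RING_CAPACITY are all made up for illustration, and a real version would copy whole ranges at once rather than one sample at a time.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAPACITY 8  /* arbitrary small size for the sketch */

typedef struct {
    float  data[RING_CAPACITY];
    size_t head;   /* index of the oldest stored sample */
    size_t count;  /* number of samples currently stored */
} ring;

static void ring_init(ring *r)
{
    r->head = 0;
    r->count = 0;
}

/* Append up to n samples; returns how many were written (stops when full). */
static size_t ring_write(ring *r, const float *src, size_t n)
{
    size_t written = 0;
    while (written < n && r->count < RING_CAPACITY) {
        size_t tail = (r->head + r->count) % RING_CAPACITY;
        r->data[tail] = src[written++];
        r->count++;
    }
    return written;
}

/* Remove up to n of the oldest samples into dst; returns how many were read. */
static size_t ring_read(ring *r, float *dst, size_t n)
{
    size_t taken = 0;
    while (taken < n && r->count > 0) {
        dst[taken++] = r->data[r->head];
        r->head = (r->head + 1) % RING_CAPACITY;
        r->count--;
    }
    return taken;
}
```

Unlike the append-and-extract queue, nothing is reallocated as samples flow through: the storage is fixed and only the head index and count move.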

Vector Math

This is where I think my math library really kicks ass. I built with simplicity in mind, but I also wanted to ensure I could take advantage of vecLib/vDSP.h as much as possible, because I think it’s an underused API from Apple.

Here are a few choice routines:

- (void)square;
{
    vDSP_vsq( [self components], 1, [self components], 1, 
        [self length] );
}
- (void)multiplyBy:(SMUGRealVector*)x
{
    NSParameterAssert( [self length] == [x length] );
    vDSP_vmul( [self components], 1, [x components], 1, 
        [self components], 1, [self length] );
}
- (void)scaleBy:(float)scalar;
{
    vDSP_vsmul( [self components], 1, &scalar, 
        [self components], 1, [self length] );
}
// and so on...

Isn’t this great? These are one-liner routines thanks to vDSP!

(I actually chose not to use the FFT library from vecLib, for a few reasons. First of all, I find the way it stores complex values, and the DC/Nyquist components to be strange. Second, I encountered a (now fixed) bug long ago with FFT lengths > 128K.)

Memory vs Speed

When I designed the vector class, I had to keep in mind that there was a tradeoff between memory and speed. For instance, to calculate z(Cv+w), where C is a constant and v, w, and z are vectors, this is valid:

float *vc = [v components];
float *wc = [w components];
float *zc = [z components];
unsigned int len = [z length];
// Gotta be careful!
NSParameterAssert( len == [w length] && len == [v length] );
for ( unsigned int i = 0; i < len; i++ ) {
    zc[i] = zc[i] * ( ( vc[i] * C ) + wc[i] );
}

However, instead I prefer to write:

[v scaleBy:C];
[v add:w];
[z multiplyBy:v];

(Note that the results of the calculations are first stored in v, and then in z. Thus, a copy of both v and z must be made in advance if you want to retain their values for later. This is something I have to think about constantly when using the library…)

In most cases, the latter way of writing the code turns out to be much faster. This is because vDSP is built to operate on large data sets very quickly, so it can often outperform hand-written loops. Furthermore, it's much easier to read and maintain this code!
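To make the in-place behaviour concrete, here is a small plain-C sketch of both styles. The three helpers are scalar stand-ins for scaleBy:, add:, and multiplyBy: (they are not vDSP calls and say nothing about performance); all of the function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Each helper overwrites its first argument, like the in-place methods. */
static void scale_by(float *v, float c, size_t n)
{
    for (size_t i = 0; i < n; i++) v[i] *= c;
}

static void add_to(float *v, const float *w, size_t n)
{
    for (size_t i = 0; i < n; i++) v[i] += w[i];
}

static void multiply_by(float *z, const float *v, size_t n)
{
    for (size_t i = 0; i < n; i++) z[i] *= v[i];
}

/* Fused single pass: z = z * (c*v + w). Leaves v and w untouched. */
static void fused(float *z, const float *v, const float *w, float c, size_t n)
{
    for (size_t i = 0; i < n; i++) z[i] = z[i] * ((v[i] * c) + w[i]);
}

/* Three-pass version: same result in z, but v now holds c*v + w. */
static void three_pass(float *z, float *v, const float *w, float c, size_t n)
{
    scale_by(v, c, n);
    add_to(v, w, n);
    multiply_by(z, v, n);
}
```

Both versions leave the same values in z. The difference is the side effect: after three_pass, v has been clobbered with the intermediate result, which is exactly why a copy is needed when the original matters.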

However, there are certainly instances where you can do better than the canned routines, and this is where I think Objective-C really shines for this library.

Categories Rule

In specific projects, I have specific math needs. For instance, FuzzMeasure has specific categories for generating swept sine waves. So let's say we wrote a highly-tuned version of the loop above, and we wanted to operate directly on z (i.e. we didn't care about the original value of z). We build a category called SMUGRealVector (MyOperation), and define this routine:

- (void)myOperation;
{
    // Replicate the above routine, but replace 'z' with 'self'.
}

Then, when we want to use it in our source, we #import "SMUGRealVector_MyOperation.h", and then call it:

[z myOperation];

This isn't news to many veteran Objective-C coders, but I found it to be a great tool for building an extensible library for doing math on large vectors. Furthermore, it lets me slowly evolve my class into one that closely resembles the MATLAB built-in functions.

That way, when I come across signal processing algorithms described in MATLAB code, I can quite easily port them to work in Objective-C. Even better, I can easily go back & forth between my code, and Octave (the free MATLAB clone), comparing results for operations as I code these algorithms.

So, Now What?

I'd really like to share this math library, but there are some problems I need to resolve before I give it away.

  • I lied above. FuzzMeasure doesn't have a swept sine generation category, because I've not split it out of the main framework yet. I do use categories in the way I described, but many of the extensions that are neatly organized are considered secrets right now.
  • It's a mess. When I started writing it, I sucked at Cocoa/Objective-C. So there are many silly coding mistakes, and things I'd rather not share.
  • A big mess. There are also many other routines in the framework that have nothing to do with math at all.
  • I'm really busy. When I do this, it's going to take time, and effort that I just don't have to spare right now.
  • The name sucks. SMUGFoundation makes sense for a general group of classes, but I need to split this out into a new SMUGMath framework.
  • I don't know how to build it. Do I branch my own copy, pushing changes to the mainline repository once in a while? Or, do I just split the code out and give it away, not ever consuming changes made by the public? Both have caveats.
  • I can't afford to support it. You are a great friend of open-source, and will help me improve the library. But we both know I'm going to be getting tons of emails from randoms asking about lame Xcode link failures, and how to stick it into their iPhone project (which won't work right now, due to a lack of vDSP, for starters).

I'll do my best, though, because I wanted to share this framework for at least two years. I've only recently reached a point where I'm comfortable taking the steps.

If nothing else, I hope this post helps people come to a similar conclusion that I did, which is that Objective-C, and Cocoa, can be used as a part of very sophisticated processing frameworks. There's no reason to force yourself into straight C, or C++, to achieve this.

Sending messages to objects might be slower than C or C++, but once you get into the method implementation, that's where you can really rock out. I've branched this framework off to do all sorts of advanced signal processing, including using OpenCL to further accelerate operations on large vectors of data.

Trust me, it's fast enough.

Please excuse the vague post, as I don’t have anything specific I’d like to share just yet. However, what I’d like to do here is call attention to my new favorite part of Mac OS X 10.6 Snow Leopard—OpenCL.

I’m working on some incredible technology for Capo lately, but it’s pretty heavyweight stuff. I’m processing audio data to produce a fancy visualization of its spectral content (not using the FFT). Unfortunately, running this operation is quite slow, so I’ve been trying to parallelize, and optimize it as best as I can.

In practice, I don’t intend to run the processing on entire audio files, but I’ve been using that as a worst-case example to test the throughput of a few approaches I’ve been working on. The input file for the tests below is a 45-second wave file, and the tool I built produces a detailed image file containing a time-vs-frequency view of the entire file.

All the test results were collected on my 8-core Mac Pro. I realize this isn’t representative of users’ machines, but it allows me to verify that all the system’s computing resources are being utilized—it’s not trivial to peg 8 cores. And, seeing how this is where computers are heading in the near future, this seems like a smart thing to focus on…

Initial Approach

I implemented the algorithm in question using a MATLAB source file for reference. I used my optimized math routines (which are sped up using vecLib/vDSP stuff), but it still took a while—492 seconds to process the file!

I fully expected this to be slow, so I wasn’t surprised with the result. It operated on only one core, and used up a modest amount of memory. At least I had a baseline in place, and output data to verify the optimized routines against.

(Note: This test was run at a lower resolution than the ones below, as it was unbearably slow as it stands. I think the true timing for this algorithm, with the same analysis parameters for the file, was more on the order of thousands of seconds. Running a test with the same resolution used in the rest of the tests, using a 0.91 second-long input file, resulted in 10 minutes of running time. Brutal!)

NSOperationQueue

I know this path very well—NSOperationQueue is employed to speed up waterfall calculations in FuzzMeasure, and it’s a very easy API to work with. After some intense optimization work that lasted over a day, I managed to get the process running in 115 seconds.

This was a huge improvement, and I even managed to integrate this algorithm into Capo with a preliminary UI wrapped around it. Unfortunately, I had to really turn down the resolution to get reasonable performance numbers.

There was also another interesting side effect, which is that the Capo audio engine would be bogged down as the file was being processed in the background. You see, NSOperationQueue seems to run at a normal priority, so it could preempt anything else that’s happening in your application. You can mitigate the problem by reducing the number of concurrent operations on the queue, but you don’t have much space to reduce that load on a 2-cpu machine.

Also, the way my math library is structured, I decided to trade memory for computation time, so I had to do a bunch of work to balance the memory load (e.g. how much of the audio file is loaded at once before spawning off lots of copies of its data so it can be read in parallel by all these threads) during runtime. By the end, my code wasn’t very pretty at all.

Finally, the way this code was all written, it wasn’t very easy to have on-demand updates of the parameters used to generate the image of the spectral data. So you couldn’t have a user-defined frequency range parameter that is manipulated in real time, as updating parameters would result in things being re-calculated again. These are design issues on my part (it was a tradeoff for overall algorithm run speed), but I made these decisions consciously.

OpenCL Attempt 1—Scalar Code

OpenCL excites a lot of people, and they seem to go ga-ga over the fact that you can schedule work for your GPU to do. However, it’s also an amazingly expressive way to write parallelized code for multi-core CPUs.

I didn’t try the OpenCL route until I had become sufficiently frustrated with my NSOperationQueue implementation. I had optimized it as much as I could handle (without severely obfuscating a ton of code, and making the whole implementation very fragile), and I really didn’t want to start thinking about making a future release of Capo 10.6-only so soon.

That said, I really wanted to know for sure that this would offer some kind of benefit over what I’ve been doing so far. Heck, maybe I could use the GPU to do my bidding…

Well, I whipped up a Cocoa wrapper for OpenCL (which I hope to share once I add some more features to it), and wrote my first kernel for OpenCL in a few hours (with the OpenCL spec at-hand). Once I wrapped my head around the whole process, I stepped back and realized that my code was much cleaner, and readable, than before.

Still, this was a very naïve implementation, so I wasn’t expecting magic out of it. After working out the bugs, I measured 30.87 seconds! Holy cow—that’s a huge gain!

At this point, I could have stopped. I had basically shaved >60% of the time off my NSOperationQueue implementation, but I wanted to push it a little further, because it still wasn’t running all that great on my dual-core 13″ MacBook Pro.

I have not yet integrated this into Capo, as I only just finished writing up the test code (it’s not hard to move over), but what I did notice is that OpenCL is scheduling these work items using the low-priority Grand Central Dispatch queue. This means that I will be playing very nicely with the rest of the system as this monstrous operation is happening. Score one more win for OpenCL!

OpenCL Attempt 2—Vectorized Code

The Intel CPUs ship with decent vector units these days, and OpenCL lets you write vectorized code very easily. You can cut a loop into a quarter of the operations, and can work on 4 elements at once, simply by switching to the float4 data type, and playing around with indexes into your data arrays.

This was tricky to get working—maybe 3-4 hours of toying around and debugging the code before I realized I had a mathematical error (I was combining the result of a non-linear operator—oops!) contributing to garbled output. After I got the bugs worked out, I was getting a result of 14.1 seconds.

Absolutely incredible: I basically halved my runtime by working with vectors.
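The "four elements at once" restructuring can be illustrated in plain C (OpenCL C's float4 does this natively, and the actual kernel isn't shown here). The function below is a made-up example, a sum of squares, chosen only because it shows the two things that tend to go wrong: the leftover elements when the length isn't a multiple of 4, and how the per-lane results get combined.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squares, four lanes per iteration (a stand-in for one float4 op). */
static float sum_squares_by4(const float *x, size_t n)
{
    float lane[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    size_t i = 0;

    /* Main loop: a quarter of the iterations, four elements each. */
    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 4; k++) {
            lane[k] += x[i + k] * x[i + k];
        }
    }

    /* Combine the four partial sums. */
    float total = lane[0] + lane[1] + lane[2] + lane[3];

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; i++) {
        total += x[i] * x[i];
    }
    return total;
}
```

Adding the lanes together at the end is fine here because addition can be regrouped freely (up to floating-point rounding). If a non-linear step such as a square root were applied to each lane before combining, the result would no longer match the scalar loop.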

OpenCL Non-Attempt—Running on the GPU

I’m not planning to ship code that runs on the GPU for this particular algorithm. The GPU is a dodgy thing to work with, and I’m dealing with an algorithm that runs much longer than you want to tie up the GPU for. For instance, I actually manage to completely lock up my system for a full minute as the algorithm runs.

Oh, that’s right—this takes a full minute to execute on a GeForce 8800GT. The type of algorithm I’m working with is far better suited to the memory layout of a general purpose computer, its caching strategy, etc.

Furthermore, there’s an issue of overhead here…

OpenCL Overhead

When you work with the CPU, you avoid all that overhead of moving data to/from the GPU. With some extra flags specified, you can tell OpenCL that you are supplying your own host memory pointer, and you wish to avoid the copying step.

In my testing experience, it takes almost no time to start up an OpenCL context, compile your OpenCL program, and set up your memory/parameters when you use the CPU. On the GPU, I was losing somewhere between 1 and 3 seconds for the round-trip.

Conclusion

Overall I’m extremely impressed with what OpenCL brings to the table. It’s really not that hard of an API to use (especially now that I have a Cocoa wrapper), and if you work at it, you can get some huge speed gains over a more “traditional” multi-core programming approach such as what you get using NSOperationQueue.

It’s not for everyone, for sure, but it’s going to make a lot of otherwise complex things easier to do.

I get asked about drawing waveforms from time to time. Over the years, I came to realize that this is a black art of sorts, and it requires a combination of some audio and drawing know-how on the Mac to get it right.

But first, a little story.

Once upon a time I used to write audio software for BeOS while I was in university. As almost every audio software author eventually does, I came to a point where I needed to render audio waveforms to the screen. I hacked up a straightforward drawing algorithm, and it worked well.

When I started working on a follow-on project, I decided to re-use the algorithm I wrote for the first application, but it didn’t work so well. The trouble is, when I originally wrote that algorithm, the audio clips in question were all very tiny—less than 2s. Now I was dealing with much longer clips (up to a few minutes, in practice), and the algorithm didn’t scale well at all.

Around this time, I interviewed with Sonic Foundry, with the hopes of joining the Vegas team. During my interview, I asked, “How do you guys draw waveforms on-screen for large audio clips, and so quickly!?”

“That’s proprietary information, sorry.”

At the time, I figured the guys were just avoiding a long, drawn-out response. I had coded this up myself, after all (except that mine wasn’t so fast), so it couldn’t be that difficult, right? Unfortunately, I got similar responses from other people I asked afterwards.

Regardless of whether you’re new to audio, or you’ve been doing it for a while, you are aware that there aren’t too many books on the topic. Furthermore, you probably aren’t going to find too much in the way of detailed algorithms, or even pseudocode, to help you out.

I’m starting to realize that the reason is two-fold.

First off, there really aren’t a lot of people out there who need to draw audio waveforms (or large data sets, for that matter) to screen. Second, it’s really not all that hard once you think about it for a while.

Overview

Drawing waveforms boils down to a few major stages: acquisition, reduction, storage, and drawing.

For each of the stages, you have many implementation options, and you’ll choose the simplest one that’ll serve your application. I don’t know what your application is, so I’ll use Capo as the main example for this post, and throw around some hypothetical situations where necessary.

Early on, you have to set some priorities: Speed, Accuracy, and Display Quality. The order of those priorities will help you decide how to build your drawing algorithm, down to the individual stages.

In Capo, I wanted to make Display Quality the top priority, followed by Speed, and then Accuracy. Because Capo would never be used to do sample-precise edits, I could throw away a whole lot of data, and then make the waveform look as good as possible in a short time frame.

If I were writing an audio editor, my priorities might be Accuracy, followed by Speed, and then Display Quality. For a sequencer (like GarageBand), I’d choose Speed, Display Quality, then Accuracy, because you’re only viewing the audio at a high level, and it’s part of a larger group of parts. Make sense?

Once you have an idea of what you need, you will have a clear picture of how to proceed.

Acquisition

This is almost worth a post of its own. I like using the ExtAudioFile{Open,Seek,Read,Close} API set from AudioToolbox.framework to open various audio file formats, but you may choose a combo of AudioFile+AudioConverter (ExtAudioFile wraps these for you), or QuickTime’s APIs, or whatever else floats your boat.

Your decision of API to get the source data is entirely up to your application. You can’t extract movie audio with (Ext)AudioFile APIs, for instance, so they might not help much when writing a video editing UI. Alternatively, you may have your own proprietary format, or record short samples into memory, etc.

Given the above, I’m going to assume you’re working with a list of floating-point values representing the audio, because that’ll be helpful later on. Using ExtAudioFile, or an AudioConverter, make sure that your host format is set for floats, and you should be good.

When you’re pulling data from a file, keep in mind that it’s not going to be very quick, even on an SSD drive, thanks to format conversions. I’d advise doing all this work in an auxiliary thread, no matter how you get your audio, because it’ll keep your application responsive.

In Capo’s case, there is a separate thread that walks the entire audio file, doing the acquisition, reduction, and storage steps all at once. Because Display Quality and Speed were high on the priority list, the drawing step is done only when needed.

Reduction

Audio contains tons of delicious data. Unfortunately, when accuracy isn’t the top priority, it’s far too much data to be shown on the screen. With 44,100 samples/second, a second of audio would span ~17 30″ Cinema Displays if you displayed one sample value per each horizontal pixel.

If accuracy is your top priority, you’re still going to be throwing lots of data away most of the time, except when your user wants to maintain a 1:1 sample:pixel ratio (or, in some cases, I’ve seen a sample take up more than 1 pixel, for very fine editing). If you’re writing an editor, or some other application that needs high-detail access to the source data, you will have to re-run the reduction step as the user changes the zoom level. When the user wants to see 1:1 samples:pixels, you won’t throw anything away. When the user wishes to see 200:1 samples:pixels, you’ll throw away 199 samples for every pixel you’re displaying.

In the case of Capo, I chose to create an overview data set for the ‘maximum zoom’ level, and keep that on the heap (a 5 minute song should take ~1MB RAM). In my case, I chose a maximum resolution of 50 samples per pixel, and created a data set from that. As the user zooms out, I then sample the overview data set to get the lower-resolution versions of the data. Accuracy isn’t great, but it’s pretty fast.
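The ~1MB figure is easy to check with a little arithmetic. The sketch below assumes a mono overview stored as one 4-byte float per bin and a 44,100 samples/second source; the function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of overview values for a clip: one per bin, rounding up so a
   partial final bin still gets a value. */
static size_t overview_count(size_t sample_count, size_t samples_per_bin)
{
    assert(samples_per_bin > 0);
    return (sample_count + samples_per_bin - 1) / samples_per_bin;
}

/* Bytes needed for the overview, assuming one float per bin. */
static size_t overview_bytes(double seconds, double sample_rate,
                             size_t samples_per_bin)
{
    size_t samples = (size_t)(seconds * sample_rate);
    return overview_count(samples, samples_per_bin) * sizeof(float);
}
```

For a 5 minute song, 300 s × 44,100 samples/s = 13,230,000 samples, which at 50 samples per bin is 264,600 values, or 1,058,400 bytes with 4-byte floats. A stereo overview, or a min/max pair per bin, would double that.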

Now, when I talk about “throwing away”, or “sampling” the data set, I’m not simply discarding data. In some cases, randomly choosing samples to include in the final output will work just fine. However, you may encounter some pretty annoying artifacts (missing transients, jumping peaks, etc) when you change zoom levels or resize the display. If Display Quality is low on your list—who cares?

If you do care, you have a few options. Within each “bin” of the original audio, you can take a min/max pair, just the maximum magnitude, or an average. I have found the maximum magnitude to work well for the majority of cases. Here’s an example of what I do in Capo (in pseudocode, of sorts):

// source_audio contains the raw sample data
// source_sample_count is the number of samples in source_audio
// overview_waveform will be filled with the 'sampled' waveform data
// N is the 'bin size' determined by the current zoom level
for ( i = 0; i < source_sample_count; i += N ) {
    // the last bin may hold fewer than N samples
    overview_waveform[i/N] = take_max_value_of( &(source_audio[i]), min( N, source_sample_count - i ) );
}
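For anyone who wants something they can compile, here is a minimal C version of the max-magnitude reduction. It is a sketch rather than Capo's actual code: max_magnitude and reduce_to_overview are names chosen for this example, and the caller is assumed to have allocated enough room in the output buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute value among n samples (returns 0 for n == 0). */
static float max_magnitude(const float *samples, size_t n)
{
    float peak = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float m = samples[i] < 0.0f ? -samples[i] : samples[i];
        if (m > peak) peak = m;
    }
    return peak;
}

/* Fill overview with one peak per bin and return the number of values
   written. overview needs room for ceil(count / bin_size) floats. */
static size_t reduce_to_overview(const float *source, size_t count,
                                 size_t bin_size, float *overview)
{
    assert(bin_size > 0);
    size_t out = 0;
    for (size_t i = 0; i < count; i += bin_size) {
        /* The final bin may be shorter than bin_size. */
        size_t n = (count - i < bin_size) ? count - i : bin_size;
        overview[out++] = max_magnitude(source + i, n);
    }
    return out;
}
```

Swapping max_magnitude for an average or a min/max pair changes only the inner helper; the binning loop stays the same.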

Once you have your reduced data set, then you can put it on the screen.

Display

Here's where you have the most leeway in your implementation. I use the Quartz API to do my drawing. I prefer the family of C CoreGraphics CG* calls, because they're portable to CoreAnimation/iPhone coding, the most feature-rich, and generally quicker than their Cocoa equivalents. I won't get into any alternatives here (e.g. OpenGL), to keep it simple.

If we stick with the Capo example, then we've chosen to use the maximum magnitude data to draw our waveform. By doing so, we can exploit the fact that the waveform is going to be symmetric along the X axis, and only create one half of the final waveform path using some CGAffineTransform magic.

In the past, developers would create waveforms in pixel buffers using a series of vertical lines to represent the magnitudes of the samples. I like to call this the "traditional waveform drawing". It's still used quite a bit today, and in some cases it works great (especially when showing very small waveforms, and pixels are scarce like in a multitrack audio editor).

[Image: Traditional Waveform]

I personally prefer to utilize Quartz paths so that I get some nice anti-aliasing to the waveform edge. Because Capo features the waveform so prominently in the display, I wanted to ensure I got top-notch output. Quartz paths gave me that guarantee.

To build the half-path, we'll also be exploiting the fact that both CoreAudio and Quartz represent points using floating-point values. Sadly, this code is slightly less awesome in 64-bit mode, since CGFloats become doubles, and you have to convert the single-precision audio floats over to double-precision pixels. Luckily there are quick routines for that conversion in Accelerate.framework (A whole 'nother blog post, I know...).

- (CGPathRef)giveMeAPath
{
    // Assume mAudioPoints is a float* with your audio points 
    // (with {sampleIndex,value} pairs), and mAudioPointCount 
    // contains the # of points in the buffer.
    // The cast below is only valid in 32-bit mode, where CGFloat is a float.
    // The caller owns the returned path, and must release it.
    CGMutablePathRef path = CGPathCreateMutable();
    CGPathAddLines( path, NULL, (const CGPoint *)mAudioPoints, mAudioPointCount ); // magic!
    return path;
}

Because magnitudes are represented in the range [0,1], and we're using Quartz, we can build a transform that'll scale the waveform path to fit inside half the height of the view, and then append another transform that'll translate/scale the path so it's flipped upside-down, and appears below the X axis line (which corresponds to a sample value of 0.0). Here's a zoomed in example of what I'm talking about.

[Image: Flipped Waveform]

And here's some code to give you an idea of what's going on to create the whole path:

// Get the overview waveform data (taking into account the level of detail to
// create the reduced data set)
CGPathRef halfPath = [waveform giveMeAPath];

// Build the destination path
CGMutablePathRef path = CGPathCreateMutable();

// Transform to fit the waveform ([0,1] range) into the vertical space 
// ([halfHeight,height] range)
double halfHeight = floor( NSHeight( self.bounds ) / 2.0 );
CGAffineTransform xf = CGAffineTransformIdentity;
xf = CGAffineTransformTranslate( xf, 0.0, halfHeight );
xf = CGAffineTransformScale( xf, 1.0, halfHeight );

// Add the transformed path to the destination path
CGPathAddPath( path, &xf, halfPath );

// Transform to fit the waveform ([0,1] range) into the vertical space
// ([0,halfHeight] range), flipping the Y axis
xf = CGAffineTransformIdentity;
xf = CGAffineTransformTranslate( xf, 0.0, halfHeight );
xf = CGAffineTransformScale( xf, 1.0, -halfHeight );

// Add the transformed path to the destination path
CGPathAddPath( path, &xf, halfPath );

CGPathRelease( halfPath ); // clean up!

// Now, path contains the full waveform path.
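If the order of the translate and scale calls looks backwards, it helps to work through the numbers. Building a transform with CGAffineTransformTranslate and then CGAffineTransformScale gives one that scales a point first and translates it second. The plain-C sketch below models only the Y coordinate; map_y, upper_y, and lower_y are made-up helpers, not Quartz calls.

```c
#include <assert.h>

/* Y-only model of the transforms above: scale by sy, then translate by ty. */
static double map_y(double magnitude, double sy, double ty)
{
    return magnitude * sy + ty;
}

/* First transform: [0,1] maps onto [halfHeight, height]. */
static double upper_y(double magnitude, double half_height)
{
    return map_y(magnitude, half_height, half_height);
}

/* Second transform: [0,1] maps onto [halfHeight, 0], i.e. flipped. */
static double lower_y(double magnitude, double half_height)
{
    return map_y(magnitude, -half_height, half_height);
}
```

With a 200-point-tall view (halfHeight of 100), a magnitude of 0 lands on the centre line at 100 in both halves, while a magnitude of 1 lands at 200 in the first half and at 0 in the flipped one.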

Once you have this path, you have a bunch of options for drawing it. For instance, you could fill the path with a solid color, turn the path into a mask and draw a gradient (that's how Capo does it), etc.

Keep in mind, though, that a complex path with lots of points can be slow to draw. Be certain that you don't include more data points in your path than there are horizontal pixels on the screen—they won't be visible, anyway. If necessary, draw in a separate thread to an image, or use CoreAnimation to ensure your drawing happens asynchronously.
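One simple way to respect that limit is to derive the bin size from the view width before reducing. The helper below is a hypothetical sketch (bin_size_for_width is not from Capo): it picks the smallest bin size that yields no more points than there are horizontal pixels.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest bin size that produces at most pixel_width points.
   Never returns less than 1, so short clips are shown sample-for-sample. */
static size_t bin_size_for_width(size_t sample_count, size_t pixel_width)
{
    assert(pixel_width > 0);
    size_t bin = (sample_count + pixel_width - 1) / pixel_width;
    return bin > 0 ? bin : 1;
}
```

For one second of 44,100 Hz audio in a 2,560-pixel-wide view, this gives a bin size of 18, which reduces the clip to 2,450 points.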

Use Shark/Instruments to help you decide whether this needs to be done—it's complicated work, and tough code to get working correctly with very few drawing artefacts. You don't even want to know the crazy code I had to get working in TapeDeck to have chunks of the waveform paged onto the screen. (Well, you might, but that's proprietary information, sorry. ;))

In Conclusion

People have suggested to me in the past that Apple should step up and hand us an API that would give waveform-drawing facilities (and graphs, too!). I disagree, and if Apple were to ever do this, I'd probably never use it. There are simply far too many application-specific design decisions that go into creating a waveform display engine, and whatever Apple would offer would probably only cover a small handful of use cases.

Hopefully the above information can help you build a waveform algorithm that suits your application well. I think that by breaking the problem up into separate sub-problems, you can build a solution that'll work best for your needs.

Over the past few days, I spent more time than should have been necessary trying to drag songs from iTunes to my application’s dock icon. There is already code out there to help folks handle a drag from iTunes to a custom NSView, but nothing has ever been said about handling a drag from iTunes to your application’s dock icon.

One might think that because you can drag a song out of iTunes into the Finder, and have the file copied there, that you can simply publish support for the public.audio UTI and everything will work fine. Of course, life is not so simple.

A drag out of iTunes puts a few different flavours of data onto the pasteboard—none of which appear to be natively accepted by the dock. The most intriguing of these data items is the one with the ‘itun’ OSType. It is an XML property list that can be stuffed into an NSDictionary and then read from—this is how folks currently access the song’s location in their drag handlers (see the code link above).

Now, in order to support dragging a song from iTunes to your application’s dock icon, life gets somewhat more complicated…

First of all, to handle a pasteboard drag to your application, you must expose a service in your application’s Info.plist. Check out Will Larson’s blog post about handling text drags for more information about how to do this. I started from this point.

One might jump to the conclusion that you can simply add itun or CorePasteboardFlavorType 0x6974756E to your list of NSSendTypes in your service definition. I did, and I was wrong: neither of these two things will cause your dock icon to accept the drag.

So, after a lengthy discussion with some other developers, I determined a bittersweet workaround. In order for this to work properly, I need you to promise me that you will do exactly as I prescribe in order to accept iTunes dock drags. So, pay close attention.

Because NSSendTypes accepts a list of NSPboardTypes or UTIs, and not OSTypes, we will have to wrap the OSType in a UTI. Unfortunately, we can’t all go around wrapping itun in different UTIs, because this will not work. The first UTI to claim itun will win out over the others, and only one application will accept these dragged songs from iTunes.

So, the ideal situation would be to wait for Apple to expose a UTI from the iTunes Info.plist, but then we would all die holding our breath. Instead, I’m asking you to define this small chunk of the Info.plist for your application.

<key>UTImportedTypeDeclarations</key>
<array>
    <dict>
        <key>UTTypeConformsTo</key>
        <array>
            <string>public.data</string>
        </array>
        <key>UTTypeIdentifier</key>
        <string>org.liscio.itun</string>
        <key>UTTypeTagSpecification</key>
        <dict>
            <key>com.apple.ostype</key>
            <string>itun</string>
        </dict>
    </dict>
</array>

Yep, I’ve gone and named it after myself—org.liscio.itun. That way, you all know where it came from. I can’t go around writing into the public.* domain, or the com.apple.* domain, so I didn’t. Using org.* instead seemed more community-oriented. :)

After you’ve imported this type, add org.liscio.itun to your list of NSSendTypes, and you’re done. Your application should now accept song drops from iTunes. (See below for some troubleshooting tips.)
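For reference, here is roughly what the matching service definition might look like. Treat it as a sketch, not a copy of my own plist: the menu title, the NSMessage value (openITunesSongs) and the port name (MyApp) are placeholders I made up for illustration, and only org.liscio.itun comes from the type declaration above.

```xml
<key>NSServices</key>
<array>
    <dict>
        <!-- Placeholder menu title; use whatever suits your app. -->
        <key>NSMenuItem</key>
        <dict>
            <key>default</key>
            <string>Open iTunes Song in MyApp</string>
        </dict>
        <!-- Hypothetical message name; your services provider would
             implement openITunesSongs:userData:error: to receive it. -->
        <key>NSMessage</key>
        <string>openITunesSongs</string>
        <!-- Placeholder; normally your application's name. -->
        <key>NSPortName</key>
        <string>MyApp</string>
        <!-- The imported UTI that wraps the 'itun' OSType. -->
        <key>NSSendTypes</key>
        <array>
            <string>org.liscio.itun</string>
        </array>
    </dict>
</array>
```

With a definition along these lines, the pasteboard handed to your service method should carry the same ‘itun’ property list you would otherwise read in a view’s drag handler.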

Note that you’re importing this type definition in your plist, and not exporting it. In fact, all my apps will be importing it as well. Nobody should really own it, as it doesn’t belong to any of us. If Apple ever does decide to add a UTI to wrap the ‘itun’ type, then we’ll all have to change our imported type definition accordingly. I’m OK with that, and it’s really a simple thing to fix…

I hope this helps you folks give your users a better experience in your apps, and I certainly don’t mind a “digital high-five” in your About box if it’s helped you as well. You can even drop me a line to let me know that you’re going to use it. That also gives me a way to let you know if Apple ever does add this UTI and it’s time to remove mine from your plist. ;)

Troubleshooting

I don’t expect this to happen to most of you if you get it right from the get-go, but I figured I might as well add this just in case…

The above change to your Info.plist may not take effect right away. You might have to kick the services system using NSUpdateDynamicServices() (via Ruby or Python is best), quit iTunes, or use some magic incantation of lsregister at the terminal. Sometimes you might even have to reboot and try again. This is a part of life when you’re messing around with UTIs and services during development, unfortunately.

If you’re still having trouble, first make sure you can register a simple service using NSStringPboardType, and that the service message actually reaches your app delegate. Don’t even think of asking me for help until you’ve verified this… ;)

Update: 2009/02/24

I filed a radar for this issue, rdar://6616686, so it’s tracked appropriately by Apple. I also added it to OpenRadar for all to see: http://openrdar.com/6616686

I’m pretty stoked about releasing TapeDeck 1.0.1 today. It contains some huge audio code overhaulin’ that I started about a week before we shipped 1.0…

Just before we shipped TapeDeck, I noticed there was some utterly horrible skipping during audio playback. I attributed the skipping to a small jump in our CPU usage, right after adding the new Core Animation-powered HUD. It was so bad that TapeDeck was nearly unusable. This caused me to spend a 12-hour day rewriting our audio playback code (in a branch!).

The new code followed Apple’s recommendation for high-performance glitch-free audio playback. So I created a separate high-priority feeder thread reading chunks of a file’s audio data into a ring buffer that was consumed on the high-priority AU render thread.
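To make the feeder/render split a bit more concrete, here is a minimal sketch of the kind of ring buffer that can sit between the two threads. This is not TapeDeck’s code; the names (RingBuffer, ring_write, ring_read, RING_CAPACITY) are made up for illustration, and it assumes exactly one writer (the feeder thread) and one reader (the render thread), using C11 atomics so that neither side ever takes a lock.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Illustrative size; must be a power of two so offsets can be masked. */
#define RING_CAPACITY 4096u

typedef struct {
    unsigned char data[RING_CAPACITY];
    atomic_size_t head; /* total bytes written; stored only by the feeder */
    atomic_size_t tail; /* total bytes read; stored only by the renderer  */
} RingBuffer;

static void ring_init(RingBuffer *rb) {
    atomic_init(&rb->head, 0);
    atomic_init(&rb->tail, 0);
}

/* Feeder thread: copy in up to len bytes, return how many actually fit. */
static size_t ring_write(RingBuffer *rb, const void *src, size_t len) {
    size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    size_t space = RING_CAPACITY - (head - tail);
    if (len > space) len = space;

    size_t offset = head & (RING_CAPACITY - 1);
    size_t first = RING_CAPACITY - offset; /* bytes before wrapping */
    if (first > len) first = len;
    memcpy(rb->data + offset, src, first);
    memcpy(rb->data, (const unsigned char *)src + first, len - first);

    /* Publish the new data only after the copies are complete. */
    atomic_store_explicit(&rb->head, head + len, memory_order_release);
    return len;
}

/* Render thread: copy out up to len bytes, return how many were available. */
static size_t ring_read(RingBuffer *rb, void *dst, size_t len) {
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
    size_t avail = head - tail;
    if (len > avail) len = avail;

    size_t offset = tail & (RING_CAPACITY - 1);
    size_t first = RING_CAPACITY - offset; /* bytes before wrapping */
    if (first > len) first = len;
    memcpy(dst, rb->data + offset, first);
    memcpy((unsigned char *)dst + first, rb->data, len - first);

    /* Release the space back to the feeder. */
    atomic_store_explicit(&rb->tail, tail + len, memory_order_release);
    return len;
}
```

In a setup like this, the feeder thread would call ring_write() with chunks read from the file, and the render callback would call ring_read() and fill whatever it could not get with silence. The point of the design is that the render thread never blocks and never touches the disk.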

I was about 80% done with this rewrite when I realized that it was a different bug that caused this unbearable skipping to occur. I don’t recall the specifics, but I think I was re-seeking the file accidentally when some unintentional extra KVO updates got triggered. It was caused by the new HUD changes I checked in. Are you surprised? :P

Content with the performance of the audio engine on my test systems (an 8-core Mac Pro, and a PowerBook G4 1.5), we shipped TapeDeck. Unfortunately, it still wasn’t perfect. I got reports that recent MacBook Pro machines were also experiencing the skipping, and some more heavy-duty testing on my PowerBook G4 was causing the skips to show up more often.

Well, it was a good thing I kept the rewritten audio engine in a branch! I merged that branch with what we released in 1.0, and completed the last 20% of the rewrite. TapeDeck 1.0.1 contains this new, more efficient playback engine, which should take up slightly less CPU time and skip a lot less frequently (never say never!).

Unfortunately, I can’t say that the skips in playback are completely gone on my PowerBook G4 when you heavily load the system (I was able to get a few skips while doing compiles in Xcode), but they’re greatly reduced. On the other hand, I am unable to get TapeDeck 1.0.1 to skip on my wife’s MacBook Pro, even when pushing it pretty hard by browsing our very large collection of RAW photos in iPhoto—an operation that brought TapeDeck 1.0 to tears.

Anyway, this was the fix that required the largest effort in TapeDeck 1.0.1. I hope it solves the skipping problem completely for our users!