Math and Cocoa

A few years ago when I was working on FuzzMeasure, I found myself in need of a math library to work with large data sets, but I didn't want to deal with building C++ classes to fill out my framework. Ideally, I'd build something that fit in with Cocoa as much as possible.

Why not C++?

Well, I really dislike C++, for starters. It has a whole lot of fancy OO features, but it's an awful lot of rope to hang yourself with. Every time I've worked with C++, I've hated it. If I'm coding for fun, I'll code with a language I like, thanks.

But Objective-C is missing some crucial things, right?

Syntactic sugar. You couldn't write aVector + bVector to add two vectors together using Objective-C, as there is no operator overloading (not a bad thing, as you'll see below).
Templates. Yes, I actually wrote separate SMUGRealVector, SMUGDoubleVector, and SMUGComplexVector classes.
Speed. Maybe, but I'm not so sure the win for C++ is cut & dry.

After using this library for over 4 years, I haven't found these missing "features" to be a problem. In fact, I find the Objective-C language to be far better, in the long run, because I've gained these features (among others):

Extensibility. You can build specific features onto existing classes, which is especially powerful if you're consuming the class as part of a framework. (i.e. You don't own it.)
Readibility. The code's more verbose, but you can read the code again in a few years without much trouble.
Frameworks. Cocoa is amazing. It's chock full of extremely useful classes that basically do what I want, anyway.

When I talk about readability above, I'm especially poking at overloading operators in C++. How do you know what v * C does, just by looking at it? Does it scale v by a constant? How can you be sure at first glance? Compare with: [v multiplyBy:C] or [v scaleBy:C] for vector multiplication, and scaling.

Keeping it Cocoa

I started out by naïvely considering NSArrays of NSNumbers, but that fell over quickly. I wanted to utilize Cocoa: but not too much. I also wanted it to be fast (or give me an opportunity to opimize it easily later).

I had to operate primarily on long vectors, so I could carry over mountains of MATLAB code that are built with vectors in mind. Take a signal, stuff it in a vector, do FFTs on the vector, normalize it, multiply with another vector, work in the frequency domain, etc.

Well, it doesn't get much faster than arrays of floats—big ol' blobs of memory on your system—to deal with giant data sets like these. So, how do we get arrays of floats in a Cocoa-like way?

Storing Vectors In Memory

@interface SMUGRealVector : NSObject <NSCopying,NSCoding> {
    NSMutableData *mData;
}
// ...lots of stuff

That's right—there's not much to the classes. Just a blob of NSMutableData, which is a giant step up from a naked pointer to a blob of floats. NSMutableData gives us so much:

Operations on, and extraction of, ranges of data.
Simple increase/decrease operations on its size, even appending other blobs of data.
File serialization routines!
And more, of course…

Yes, I'm competent enough to write the above routines myself, but I know better than that. So how do we build these vectors?

Vector Construction

- (id)initWithLength:(unsigned int)N;
{
    if ( !( self = [super init] ) ) {
        return nil;
    }
    mData = [[NSMutableData alloc] 
        initWithLength:(N*sizeof(float))];
    if ( !mData ) {
        return nil;
    }
    return self;
}

// And a whole whack of convenience functions...
+ (id)realVectorWithLength:(unsigned int)N;
+ (id)realVectorWithOnes:(unsigned int)N;
+ (id)realVectorWithIntegersRangingFrom:(int)begin to:(int)end;

Looks good, right? Some of you might recognize some MATLAB idioms in there, such as ones(), or replicating [1:5], which returns [1 2 3 4 5]. Great, what else can we do?

Accessing Vectors

To get at the bits & pieces of vectors, you have a few options:

- (float*)components;
- (unsigned int)length;
- (void)setComponent:(float)c atIndex:(unsigned int)i;
- (float)componentAtIndex:(unsigned int)i;

They do what you'd expect, wrapping the matching NSData routines in the case of component, and length. You can also operate on ranges of vectors:

- (SMUGRealVector*)realVectorInRange:(NSRange)range;
- (void)appendVector:(SMUGRealVector*)v
- (void)replaceComponentsInRange:(NSRange)range
    withRealVector:(SMUGRealVector*)v;

You can actually build a 'vector queue' of sorts, by using the routines above. In one case, you can build a standard queue by appending vectors on one end, and extracting a sub-range from the other end—discarding the original, longer vector as you go. This is certainly memory-intensive, but these are extremely handy to bootstrap some signal-processing algorithms (overlap-add and overlap-save, for instance).

(If you wanted to optimize overlap-add, and overlap-save, you would instead build a large circular buffer, of sorts, and just use a combo of replaceComponentsInRange: and realVectorInRange: to do your bidding, but I digress…)

Vector Math

This is where I think my math library really kicks ass. I built with simplicity in mind, but I also wanted to ensure I could take advantage of vecLib/vDSP.h as much as possible, because I think it's an underused API from Apple.

Here are a few choice routines:

- (void)square;
{
    vDSP_vsq( [self components], 1, [self components], 1, 
        [self length] );
}
- (void)multiplyBy:(SMUGRealVector*)x
{
    NSParameterAssert( [self length] == [x length] );
    vDSP_vmul( [self components], 1, [x components], 1, 
        [self components], 1, [self length] );
}
- (void)scaleBy:(float)scalar;
{
    vDSP_vsmul( [self components], 1, &scalar, 
        [self components], 1, [self length] );
}
// and so on...

Isn't this great? These are one-liner routines thanks to vDSP!

(I actually chose not to use the FFT library from vecLib, for a few reasons. First of all, I find the way it stores complex values, and the DC/Nyquist components to be strange. Second, I encountered a (now fixed) bug long ago with FFT lengths > 128K.)

Memory vs Speed

When I designed the vector class, I had to keep in mind that there was a tradeoff between memory, and speed. For instance, to calculate, z(Cv+w), where C is a constant, and v,w,z are vectors, this is valid:

float *vc = [v components];
float *wc = [w components];
float *zc = [z components];
unsigned int len = [z length];
// Gotta be careful!
NSParameterAssert( len == [w length] && len == [v length] );
for ( unsigned int i = 0; i < len; i++ ) {
    zc[i] = zc[i] * ( ( vc[i] * C ) + wc[i] );
}

However, instead I prefer to write:

[v scaleBy:C];
[v add:w];
[z multiplyBy:v];

(Note that the results of the calcuations first stuck to v, and then to z. Thus, a copy of both v and w must be made in advance if you want to retain their values for later. This is something I have to think about constantly when using the library…)

In most cases, the latter way of writing the code turns out to be much faster. This is because vDSP is built to operate on large data sets very quickly, so it can often outperform hand-written loops in many cases. Furthermore, it's much easier to read and maintain this code!

However, there are certainly instances where you can do better than the canned routines, and this is where I think Objective-C really shines for this library.

Categories Rule

In specific projects, I have specific math needs. For instance, FuzzMeasure has specific categories for generating swept sine waves. So let's say we wrote a highly-tuned version of the loop above, and we wanted to operate directly on z (i.e. we didn't care about the original value of z). We build a category called SMUGRealVector (MyOperation), and define this routine:

- (void)myOperation;
{
    // Replicate the above routine, but replace 'z' with 'self'.
}

Then, when we want to use it in our source, we #import "SMUGRealVector_MyOperation.h", and then call it:

[z myOperation];

This isn't news to many veteran Objective-C coders, but I found it to be a great tool for building an extensible library for doing math on large vectors. Furthermore, it lets me slowly evolve my class into one that closely resembles the MATLAB built-in functions.

That way, when I come across signal processing algorithms described in MATLAB code, I can quite easily port them to work in Objective-C. Even better, I can easily go back & forth between my code, and Octave (the free MATLAB clone), comparing results for operations as I code these algorithms.

So, Now What?

I'd really like to share this math library, but there are some problems I need to resolve before I give it away.

I lied above. FuzzMeasure doesn't have a swept sine generation category, because I've not split it out of the main framework yet. I do use categories in the way I described, but much of the extensions that are neatly organized are considered secrets right now.
It's a mess. When I started writing it, I sucked at Cocoa/Objective-C. So there are many silly coding mistakes, and things I'd rather not share.
A big mess. There are also many other routines in the framework that have nothing to do with math at all.
I'm really busy. When I do this, it's going to take time, and effort that I just don't have to spare right now.
The name sucks. SMUGFoundation makes sense for a general group of classes, but I need to split this out into a new SMUGMath framework.
I don't know how to build it. Do I branch my own copy, pushing changes to the mainline repository once in a while? Or, do I just split the code out and give it away, not ever consuming changes made by the public? Both have caveats.
I can't afford to support it. You are a great friend of open-source, and will help me improve the library. But we both know I'm going to be getting tons of emails from randoms asking about lame Xcode link failures, and how to stick it into their iPhone project (which won't work right now, due to a lack of vDSP, for starters).

I'll do my best, though, because I wanted to share this framework for at least two years. I've only recently reached a point where I'm comfortable taking the steps.

If nothing else, I hope this post helps people come to a similar conclusion that I did, which is that Objective-C, and Cocoa, can be used as a part of very sophisticated processing frameworks. There's no reason to force yourself into straight C, or C++, to achieve this.

Sending messages to objects might be slower than C or C++, but once you get into the method implementation, that's where you can really rock out. I've branched this framework off to do all sorts of advanced signal processing, including using OpenCL to further accelerate operations on large vectors of data.

Trust me, it's fast enough.