The Code

First, you can download the code and follow along if you like.

The function accepts a number of parameters to control the gradient probing and stopping criteria, as well as provide the mixed signals. The mixed signals are represented as a row-wise matrix with each row representing a particular mix. Here is the breakdown from the help section of the function:

function [y, K] = ppursuit(h, eta, tol, mxi, x)

%

% [y, K] = ppursuit(h, eta, tol, mxi, x)

%

% Projection Pursuit (probing to estimate gradient).

% This function uses projection pursuit to demix m

% signal mixtures into m estimated source signals.

%

% h Step size for probing the Kurtosis (K) of the

% demixed signals, relative to the magnitude

% of the current demixing vector. IOW, K is

% probed by looking at K(w + h) in the m

% dimensions of w, one at a time. It should be

% noted that w is a unit vector, so there is

% no need to scale the step size by the current

% size of w.

% eta Distance that demixing vector w is adjusted

% in the direction of the estimated gradient of

% K. IOW, wnext = w + eta*g where g is the

% estimated gradient of K, normalized to 1.

% tol Stopping criterion for gradient ascent. The

% ascent terminates when the relative change in

% K is < tol. (Change in K divided by K).

% mxi Maximum iterations to execute on the ascent

% of K for each recovered signal.

% x (m x n) matrix of signal mixtures. Each row

% is a 1 x n mixture of m source signals. The

% source signals may be anything, and there is

% no assumed relationship between the n samples

% of a given mixture. They may be temporal or

% spatial samples.

% y The (m x n) source estimates

% K The (m x ?) history of the ascent of Kurtosis

% for each recovered source. Having this history

% allows the caller to tune the search parameters.

%

The algorithm itself is very straightforward.

- First, we find the PCA (Principal Component Analysis) to minimize de-mixing errors. Because our two original sound sources are independent, they are uncorrelated. Casting our data into an uncorrelated basis (which is what PCA does) we can easily extract signals as we find them without interfering with the remaining signals. If we did not do this, then we would have to find all the signals simultaneously (by a different technique) so that we could solve the "parallelogram" that they would make (obviously, I'm leaving things out...). So, we do this by taking the SVD (Singular Value Decomposition) of the mix matrix and throwing away "U" and "S". What remains is the mixes represented in an ortho-normal basis (this is a fairly trivial derivation).
- Now we iterate for each row in the mixes:
- Whiten the mixes by setting the variance to 1
- Select a starting vector for demixing. In the original project, we used a random unit-length vector. I've changed that to a standard basis vector, i, for illustrative purposes.
- Project the mixes onto the trial demixing vector and measure the kurtosis
- Iterate until our stopping criteria are met: that is, we reach the maximum number of iterations, or the change in kurtosis from one iteration to the next fall below our threshold
- Probe the gradient in each direction, measuring the kurtosis
- Assemble the results of the probe into a gradient vector
- Adjust the trial demixing vector one increment (eta) in the direction of the gradient vector. By moving a fraction of the gradient, we minimize the risk of overshooting the peak of the kurtosis. If we moved by the full gradient, a localized spike in the gradient could easily cause a large displacement of the demixing vector.

- At this point, the demixing vector now points in the direction of maximum kurtosis. We project our mixes onto this vector and store the results as the first unmixed signal
- Subtract the unmixed signal from the original mixes to allow further demixing

Please note that I used a stripped down kurtosis calculation which ignores the denominator of the typical calculation. We can do this because the mixes have been whitened so the variance (and thus the denominator of the kurtosis calculation) will be 1.

Results

So, to test this we need a mixed sound signal like this mix of a bird chirping and a gong ringing. This sound clip only plays one "mix" of which we have two in this file. It's a horrific sound...

After projecting the mixes into an ortho-normal basis, we can plot the two mixes against each other like this:

It's pretty clear that there are two distinct signals here, one fairly well defined in a sort of "north-by-north-east" orientation, and another in a "north-west" orientation. I've only plotted about 1 in 5 samples here, otherwise the image just becomes a big blob.

After the first iteration of the algorithm, finding the first signal, we can see its progress here:

In this image, the red line represents the starting point, the green lines represent the intermediate trial demixing vectors and the blue line is the final vector for the current signal. Notice how the trial demixing vectors probe quickly at first and slow down as we approach the peak of the kurtosis curve and the gradient begins to flatten.

This first iteration extracts a pretty clean bird chirp signal, although if you listen closely, you can hear just a touch of the gong as well.

In the second iteration, we can see the algorithm finding the second, more diffuse, signal:

Again, you can see the initially quick probing that slows as we approach the peak of kurtosis. What is not shown here is that the mixes have had the first signal removed. Because there are only two signals here, removing the first signal will project the remaining signal onto a vector, making the extraction of the second signal trivial. But it's still fun to watch it work. The second signal extacted is a rather awful gong sound. Again, you can still hear just a touch of the bird chirping in the background.

Here is the kurtosis history of each probe, using ppursuit(.001, .01, .00001, 2000, mix) as the function call.

The first signal, in blue, took 47 iterations to reach the stopping criteria. The second signal, in green took only about 17 iterations to stop (the fall off in the green curve is just an artifact of the way the data is stored). As you can see, both of the curves start off fairly steep and rapidly level out. Each curve represents the slop of the gradient as it is pursued uphill.

You can see some interesting effects by changing the parameters to the function. It is possible, for example, to get the algorithm to oscillate around the peak of kurtosis as each probe overshoots the peak.

The reason we have less than perfect demixed signals is because we can never hit the peak kurtosis perfectly. As we get closer to the peak, the magnitude of the gradient gets smaller and smaller, so that the demixing vector moves by smaller and smaller amounts. The smaller we set the "tol" parameter, the closer we can get, at the expense of more iterations.

So, in retrospect, this example mix is not so great because the demixed signals appear to be orthogonal. However, they aren't. The resulting demixing vectors are [0.7665,-0.644225], and [0.70813,0.70608]. Their dot-product is 0.087907. So, they are close, but not quite, orthogonal. This is important. One of the benefits of projection pursuit is that it can find signals that are not orthogonal. Other simple analysis techniques like PCA (Principal Component Analysis) rely on the signals being orthogonal to produce meaningful results.

Here is the raw signal, before projection onto the ortho-normal basis in the first steps of the algorith:

As you can see, the original signals are not orthogonal. There is no guarantee that the projection onto the ortho-normal basis will produce orthogonal, or near-orthogonal signals. That is just a characteristic of this particular data.

Thanks to Dr. Paul Schimpf for comments and suggestions about this posting.

For the next installation, I'll examine how to get started writing this in haskell.