Wednesday, September 30, 2009

Projection Pursuit in Haskell, pt 3


Sometimes problem solving requires just a lot of thought. And I don't mean brute force thought, but the gentle, patient thought that comes from relaxing about a problem and just letting your brain work on it for a while. I find I come up with some of my best ideas in the shower. Yeah, I know... But, when something has really been bothering me -- remaining unsolved despite my best efforts -- the solutions typically come at night. Something wakes me up and then I'm up and my mind won't stop and then the solution comes.

Such it was last night. I haven't had a lot of time to put into this effort lately, and what effort I've made has been frustrating. If you look at the original Octave source code you'll see that there are a lot of parameters controlling the algorithm. We've got the raw data itself, the size of the gradient probe, the proportion of the gradient to use, the number of iterations. It's a lot of stuff to cart around. But, I'll get to that in a moment.

First, I had spent a lot of time writing some matrix and vector operators. This was a good exercise, though I've ultimately thrown all that code out. One crucial part of this algorithm is the use of the right singular vectors from the singular value decomposition. I certainly didn't want to write an svd function for this exercise. Google to the rescue! I found hmatrix, and a solution to all my matrix handling problems. So, using hmatrix caused me to dig back through the bits of code I'd already written. At the core of the projection pursuit algorithm is the calculation of the gradient. This solution uses an estimation of the gradient by probing in all dimensions and measuring the change in the kurtosis.

So, let's define some helper functions:

{-- calculate modified kurtosis --}
kurtosis :: Vector Double -> Double
kurtosis y = mean(y) ** 4 - 3

{-- calculate the mean of a vector --}
mean :: Vector Double -> Double
mean y = go 0 0 $ toList y
go :: Double -> Int -> [Double] -> Double
go s l [] = s / fromIntegral l
go s l (x:xs) = go (s+x) (l+1) xs

{- create a list of vectors to probe h distance in all directions -}
hs :: Double -> Int -> [Vector Double]
hs h r = toRows . diag . fromList . take r $ repeat h

Using these, we can calculate the estimated gradient:

{-- calculate the gradient of a function given a bunch of starting information                           
f - the function to compute gradient of
h - the probe distance... the smaller the better
z - the raw data set needed for future projections
w - a starting vector/point at which to calculate the gradient --}

gradient :: (Vector Double -> Double) -> Double -> Matrix Double -> Vector Double -> Vector Double
gradient f h z w = probe ws
ws = map ((+) w) . hs h $ rows z
k = f $ w <> z

probe :: [Vector Double] -> Vector Double
probe = fromList . probe'

probe' :: [Vector Double] -> [Double]
probe' [] = []
probe' (x:xs) = (deltaK x) / h : (probe' $ xs)

deltaK :: Vector Double -> Double
deltaK x = k - f (x <> z)

This is really pretty straightforward. We calculate a starting kurtosis value, generate a bunch a vectors that shift h-amount in all dimensions, calculate the kurtosis for each of those and concatenate them into a new vector for the gradient.

The real breakthrough for me came when I was struggling with how ugly it was going to be to pass around all this junk. Just ugly and I can't stand it. So last night, while I was lying awake fretting about this, it finally hit me. I don't have to pass around all this stuff. I can just partially apply the gradient function to all the algorithm control parameters, and the raw data, and then just pass that around. So at some point I'll do this:

{-- build a gradient function for *this* problem --}
grad :: Vector Double -> Vector Double
grad = gradient kurtosis h z

Now I can just pass around grad and it will carry around all the baggage for me. I know. This is a really simple part of haskell, and it's something that is done all the time. I've certainly done it many times. But for some reason it eluded me for a really long time. The real problem was one of not thinking about this function properly. Yes, all that data is needed to calculate the gradient. But I am not calculating the gradient of all of this stuff as variables. The function, the raw data and the h distance are all fixed points within the context of the problem. That is, there is only one true variable in this application of the gradient function and that is the current position, w. All along I had been thinking about h and f and z as variables and thus I had to keep passing them around. And they are variables, within the context of the whole algorithm, but at the level of the gradient, they aren't. Huh. It took me along time to bend my brain around to seeing this the right way.

Anyway, there it is. Now that school is back in, I get stolen moments on the bus to work on this, so hopefully the rest will come soon.

The Summer of Chaos

So, time to reminisce about the summer just past. Wow! It's amazing how life throws you curve balls.

The Plan. Heading out of last spring, I was anticipating a summer of general-ed requirements to speed me on my way towards graduation, a little programming work on the side, research on grad-school programs and lots of time for personal projects to be posted here. This was to go alongside taking care of my three daughters and generally being "summery." Cool. Good plan. A little ambitious, but doable.

The Reality. Business took a nose-dive. So much so that I had to go back behind the bar for the summer to keep things afloat. We pared down to a skeleton staff with me working 4 shifts a week slinging beers. That's just what needed doing.

Meanwhile, I did get some side work, technically an internship, though I'm not bothering to take the educational credit, just the CV listing. I spent some time learning Python, flexing my Debian admin skills and playing around with EWU's Living Laboratory project under Dr. Jeff Putnam. It was a fun project. I got pretty comfortable with Python, the basics of Postgresql, and Django. Also, I got to play with specifying and implementing the simplest of Domain Specific Languages to allow user-entered code to run on our server. That project is ongoing, though I've got precious little time for it now that school is back in. Hopefully, I'll get to extend (or create anew) that language for specifying graphs as well.

So, personal projects, grad-school research? All by the wayside. My plans to port my Projection Pursuit algorithm to Haskell got pushed back. A *ton* of reading (Computational Category Theory, A Transition to Advanced Mathematics, Goedel, Escher, Bach, etc.) postponed indefinitely. Grad school selection is only getting started now, and that is such a daunting task! Dr. Putnam suggests I find blogs I like, follow all of their blog links and assemble this graph of blogs and schools and then run a clustering algorithm looking for commonality and hope that helps me narrow down the choices. Hah! That would probably work really well, but given the time available, would take until long past the application deadlines. Plus, I have to prep for the GRE's in a week! Wah!

The Results. Well, what can I say? It is what it is, and I keep marching ahead. Things never go quite the way you expect, but in the end, they work out. I think. I'm getting that Projection Pursuit done. I'm keeping up with my studies. I'm gradually narrowing down the choices for grad-school applications. And somehow I'm still enjoying my family and keeping food on the table. Not bad :-)

Saturday, August 22, 2009

Projection Pursuit in Haskell, pt 2

In this installment, I'll review my original projection pursuit algorithm in Octave and provide some pretty pictures and sounds :)
The Code
First, you can download the code and follow along if you like.

The function accepts a number of parameters to control the gradient probing and stopping criteria, as well as provide the mixed signals. The mixed signals are represented as a row-wise matrix with each row representing a particular mix. Here is the breakdown from the help section of the function:
function [y, K] = ppursuit(h, eta, tol, mxi, x)
% [y, K] = ppursuit(h, eta, tol, mxi, x)
% Projection Pursuit (probing to estimate gradient).
% This function uses projection pursuit to demix m
% signal mixtures into m estimated source signals.
% h Step size for probing the Kurtosis (K) of the
% demixed signals, relative to the magnitude
% of the current demixing vector. IOW, K is
% probed by looking at K(w + h) in the m
% dimensions of w, one at a time. It should be
% noted that w is a unit vector, so there is
% no need to scale the step size by the current
% size of w.
% eta Distance that demixing vector w is adjusted
% in the direction of the estimated gradient of
% K. IOW, wnext = w + eta*g where g is the
% estimated gradient of K, normalized to 1.
% tol Stopping criterion for gradient ascent. The
% ascent terminates when the relative change in
% K is < tol. (Change in K divided by K).
% mxi Maximum iterations to execute on the ascent
% of K for each recovered signal.
% x (m x n) matrix of signal mixtures. Each row
% is a 1 x n mixture of m source signals. The
% source signals may be anything, and there is
% no assumed relationship between the n samples
% of a given mixture. They may be temporal or
% spatial samples.
% y The (m x n) source estimates
% K The (m x ?) history of the ascent of Kurtosis
% for each recovered source. Having this history
% allows the caller to tune the search parameters.

The algorithm itself is very straightforward.

  • First, we find the PCA (Principal Component Analysis) to minimize de-mixing errors. Because our two original sound sources are independent, they are uncorrelated. Casting our data into an uncorrelated basis (which is what PCA does) we can easily extract signals as we find them without interfering with the remaining signals. If we did not do this, then we would have to find all the signals simultaneously (by a different technique) so that we could solve the "parallelogram" that they would make (obviously, I'm leaving things out...). So, we do this by taking the SVD (Singular Value Decomposition) of the mix matrix and throwing away "U" and "S". What remains is the mixes represented in an ortho-normal basis (this is a fairly trivial derivation).

  • Now we iterate for each row in the mixes:

    • Whiten the mixes by setting the variance to 1

    • Select a starting vector for demixing. In the original project, we used a random unit-length vector. I've changed that to a standard basis vector, i, for illustrative purposes.

    • Project the mixes onto the trial demixing vector and measure the kurtosis

    • Iterate until our stopping criteria are met: that is, we reach the maximum number of iterations, or the change in kurtosis from one iteration to the next fall below our threshold

      • Probe the gradient in each direction, measuring the kurtosis

      • Assemble the results of the probe into a gradient vector

      • Adjust the trial demixing vector one increment (eta) in the direction of the gradient vector. By moving a fraction of the gradient, we minimize the risk of overshooting the peak of the kurtosis. If we moved by the full gradient, a localized spike in the gradient could easily cause a large displacement of the demixing vector.

    • At this point, the demixing vector now points in the direction of maximum kurtosis. We project our mixes onto this vector and store the results as the first unmixed signal

    • Subtract the unmixed signal from the original mixes to allow further demixing

Please note that I used a stripped down kurtosis calculation which ignores the denominator of the typical calculation. We can do this because the mixes have been whitened so the variance (and thus the denominator of the kurtosis calculation) will be 1.
So, to test this we need a mixed sound signal like this mix of a bird chirping and a gong ringing. This sound clip only plays one "mix" of which we have two in this file. It's a horrific sound...

After projecting the mixes into an ortho-normal basis, we can plot the two mixes against each other like this:

It's pretty clear that there are two distinct signals here, one fairly well defined in a sort of "north-by-north-east" orientation, and another in a "north-west" orientation. I've only plotted about 1 in 5 samples here, otherwise the image just becomes a big blob.

After the first iteration of the algorithm, finding the first signal, we can see its progress here:

In this image, the red line represents the starting point, the green lines represent the intermediate trial demixing vectors and the blue line is the final vector for the current signal. Notice how the trial demixing vectors probe quickly at first and slow down as we approach the peak of the kurtosis curve and the gradient begins to flatten.

This first iteration extracts a pretty clean bird chirp signal, although if you listen closely, you can hear just a touch of the gong as well.

In the second iteration, we can see the algorithm finding the second, more diffuse, signal:

Again, you can see the initially quick probing that slows as we approach the peak of kurtosis. What is not shown here is that the mixes have had the first signal removed. Because there are only two signals here, removing the first signal will project the remaining signal onto a vector, making the extraction of the second signal trivial. But it's still fun to watch it work. The second signal extacted is a rather awful gong sound. Again, you can still hear just a touch of the bird chirping in the background.

Here is the kurtosis history of each probe, using ppursuit(.001, .01, .00001, 2000, mix) as the function call.

The first signal, in blue, took 47 iterations to reach the stopping criteria. The second signal, in green took only about 17 iterations to stop (the fall off in the green curve is just an artifact of the way the data is stored). As you can see, both of the curves start off fairly steep and rapidly level out. Each curve represents the slop of the gradient as it is pursued uphill.

You can see some interesting effects by changing the parameters to the function. It is possible, for example, to get the algorithm to oscillate around the peak of kurtosis as each probe overshoots the peak.

The reason we have less than perfect demixed signals is because we can never hit the peak kurtosis perfectly. As we get closer to the peak, the magnitude of the gradient gets smaller and smaller, so that the demixing vector moves by smaller and smaller amounts. The smaller we set the "tol" parameter, the closer we can get, at the expense of more iterations.

So, in retrospect, this example mix is not so great because the demixed signals appear to be orthogonal. However, they aren't. The resulting demixing vectors are [0.7665,-0.644225], and [0.70813,0.70608]. Their dot-product is 0.087907. So, they are close, but not quite, orthogonal. This is important. One of the benefits of projection pursuit is that it can find signals that are not orthogonal. Other simple analysis techniques like PCA (Principal Component Analysis) rely on the signals being orthogonal to produce meaningful results.

Here is the raw signal, before projection onto the ortho-normal basis in the first steps of the algorith:

As you can see, the original signals are not orthogonal. There is no guarantee that the projection onto the ortho-normal basis will produce orthogonal, or near-orthogonal signals. That is just a characteristic of this particular data.

Thanks to Dr. Paul Schimpf for comments and suggestions about this posting.

For the next installation, I'll examine how to get started writing this in haskell.

Friday, August 21, 2009

Projection Pursuit in Haskell, pt 1

In the Spring Quarter, I took a class called Introduction to Scientific Computing. It was sort of a survey of scientific computing techniques focused on using Octave.

Our final project was to implement a Projection Pursuit algorithm to unmix two or more mixed sound signals using the same number of mixes as there were signals. The idea is simple: Given, say, two speakers and two microphones, all at different locations, if the two speakers each play different sounds then the two microphones will detect different mixes of those two sounds. Our job was to take these two mixes and "unmix" them. For simplicity's sake, we are ignoring the time differential that would occur between the different mixes.

To do this using Projection Pursuit, we project the two mixes onto a random vector in 2 dimensional space. Then we measure the kurtosis of the projected signal. Next, we do a gradient ascent, adjusting our vector to climb uphill in the direction of the greatest kurtosis. As we approach the peak, the change in kurtosis gets smaller and smaller. When we satisfy our stopping criteria (the change in kurtosis falls below some threshold) we consider a signal to be "found". Then we extract the signal that lies on that vector and we have our first unmixed sound. Next we remove that signal from the two mixes and what remains is the other sound. This works because sounds tend to be super-gaussian signals, so the kurtosis will climb fairly quickly as you search.

I have to say, it was a really cool project and sort of "magical" when the extracted sounds started playing.

My goal is to port this project to Haskell. Projection Pursuit can be implemented, I believe, in purely functional code as fairly straightforward recursion. Now there are certain niceties to doing the original in Octave: native matrix and vector manipulation, simple graphing capabilities (so you can watch it "pursue"), and fairly easy sound playing to listen to the results. I probably won't get to coding up the graphing and sound in Haskell.

I completed some portions of the matrix manipulation functions earlier in the summer, but it's not ready to post yet, and I want to write up my progress in this process.

Next time, I'll post my Octave code, and provide some graphics of the pursuit. It's pretty cool.

Thursday, July 2, 2009

How do you TODO?

Some time back there was an ask proggit looking for ideas for handling TODO's. Someone posted a simple bash script that handled a TODO file in the local directory. Now, I'm always looking for little things to do in Haskell as learning exercises and this one struck me. So, here it is, my TODO program. It actually has a neat bonus feature in that it doesn't display TODO items that are marked with a "DONE". Someday I need to add a date/time stamp, but it's good enough for now.
module Main( main ) where

import System ( getArgs, system, exitWith )
import System.Console.GetOpt
import System.Posix.Files

main:: IO()
main = do
  args <- getArgs

  case getOpt RequireOrder options args of
    ([Edit], _,       []) -> spawnEditor 
    ([Help], _,       _)  -> putStrLn \$ usageInfo header options ++ description
    ([],     [],      []) -> displayTodo 
    ([],     nonOpts, []) -> addTodo nonOpts
    ([], [],  msgs)       -> error \$ concat msgs ++ usageInfo header options ++ description
    _                     -> error \$ usageInfo header options ++ description

data Flag = Edit | Help

options :: [OptDescr Flag]
options = [ 
 Option ['e'] ["edit"] (NoArg Edit) "edit the todo list",
 Option ['h'] ["help"] (NoArg Help) "display this help"

header :: String
header = "Usage: todo [OPTIONS] [todo item]"

description :: String
description = unlines [ ""
                      , "The default operation is to just cat the TODO file. Calling todo with"
                      , "a non-option argument will add that argument to the TODO file."

todoFile :: String
todoFile = "TODO"

-- TODO -- needs to check for existing emacs process first and run emacsclient in that case...
spawnEditor :: IO ()
spawnEditor = system ("emacs " ++ todoFile) >>= exitWith

addTodo :: [String] -> IO ()
addTodo nonOpts = do
  print "Updating todo list..."
  appendFile todoFile \$ (unwords nonOpts) ++ "\n"

displayTodo :: IO ()
displayTodo =  fileExist todoFile >>= 
               (\a -> if a 
                      then readFile todoFile >>= putStr . removeDones
                      else putStrLn "nothing to do!") 

removeDones :: String -> String
removeDones = unlines . filter (not . contains "DONE") . lines  

contains :: String -> String -> Bool
contains target source = any (== target) \$ words source

It's not the greatest Haskell code I'm sure, but its the first I've done that wasn't pure functional code, it works, and was fun to sort out.

ps. Looks like my attempts at formatting code were less than fully successful. It's all there, just click and drag to select it... sorry

Tuesday, April 14, 2009

Linear linear everywhere....

We had the distinct pleasure of being lectured by Dr. Ron Gentle last week as our regular Calc IV(multi-variable) professor was out of town for two days. Dr. Gentle is known for viewing the entire universe as a linear construct of some sort or another and the subject matter for his lectures, multi-variable chain rule differentiation, was no exception, being distilled down to a simple vector multiplication for any possible type of differentiation out there. I'll probably talk more about that later, but the main point is that Dr. Gentle is right, linear algebra is *everywhere*.

A case in point was today's lecture (from the usual guy, Dr. Dale Garraway) on directional derivatives. To make a long story short, we have already worked through partial derivatives where given $f(x,y)=x^2+y^3$, the first partial derivative in the $x$ direction is $f_x(x,y)=2x$ and the first partial in the $y$ direction is $f_y(x,y)=3y^2$.

Great, but what if you want to know the derivative in some other direction, like, say, in the direction of the vector $\vec{v}=(a,b)$? The general answer is this: $f_{\vec{v}}(x,y)=f_x(x,y)a + f_y(x,y)b$. The astute reader will notice that this is just a linear combination of the two first derivatives. Could it be that the first derivatives just form a basis for a function space defining all the derivatives of this multi-variable function? It sure looks like it.

In fact it looks like essentially a basis conversion from a spatial basis (the vector in $\mathcal{R}^2$ or something like that) to a functional basis giving the derivative. Neat.

Sunday, March 29, 2009

Polynomials with Function Composition

This morning, reading in Computational Category Theory, by Rydeheard et al, I came across this little piece of ML (it's pretty early in the book. It takes me about 45 minutes to read a page, sheesh):

fun poly_eval(f) = f(f(3)) + f(3) + 3

And I got to thinking about how this looks like a linear combination of function compositions. I just finished Linear Algebra so everything looks linear. One of the things I've been trying to wrap my head around is how to express function composition in a linear manner, and while this posting won't address that directly, it may help me down that road.

The question today is how to express polynomials as composition, as in the code above.

Given a standard form for a degree two polynomial: $f(x)=a_1x^2+a_2x+a_3$, let's replace every occurrence of $x$ in the polynomial with a function that multiplies $x$ by its argument. Composing this function with itself would multiply the value of $x$ by something multiplied by the value of $x$.

Sort of like this:

let $f=\lambda a. ax$ where $a$ is the coefficient of the polynomial term. Here $x$ is a free variable and if we provide a coefficient $c$, the result is $cx$. If we compose these functions we can expand them and see what happens.
$\begin{eqnarray} f(f) & = & f (\lambda a. ax)\\& = & (\lambda b. bx)(\lambda\\\end{eqnarray}$

now apply a constant $c$ as our coefficient, and evaluate
$\begin{eqnarray} f(f c) & = & (\lambda b. bx)(\lambda c\\& = & (\lambda b.bx)(cx)\\& = & (cx)x\\& = & cx^2\end{eqnarray}$

and there we have our first term in our polynomial (I'm not up on $\lambda$-notation and such stuff, so please, someone correct me if I've made a mistake in the above).

We can express this in code (thanks to dolio on #haskell) by adopting the convention that $f$ will accept a function and a value where the value will become our free variable (using partial application) and the function will either be other $f$'s (composition) or $const$ $c$, the coefficient.

f::(Num a) => (a -> a) -> a -> a
f g x = g x * x

and these can be composed as many times as we like to get powers of $x$.

*Main> let two_x2 = f(f(const 2))
*Main> :t two_x2
two_x2 :: Integer -> Integer
*Main> two_x2 5

Unfortunately, this forces us to declare the coefficient in advance and then we're stuck with it. It would be nice to be able to decide the coefficient later and still have $x$ as a free variable.

We can do it like this:

x1 = \c -> f(const c)
x2 = \c -> f(f(const c))

Now both $x1$ and $x2$ (for $x$ and $x^2$) can be used to express a polynomial term with an arbitrary coefficient.

*Main> :t x2
x2 :: Integer -> Integer -> Integer
*Main> let five_x2 = x2 5
*Main> :t five_x2
five_x2 :: Integer -> Integer
*Main> five_x2 4
*Main> five_x2 2

Finally, it is simple to build arbitrary polynomials with a free variable

poly = \a1 a2 a3 x -> x2 a1 x + x1 a2 x + a3

where $a1, a2, a3$ are the coefficients. Using partial application we can create a specific polynomial with its coefficients and then apply it to any value for $x$ that we like.

*Main> let polyA = poly 1 2 3
*Main> :t polyA
polyA :: Integer -> Integer
*Main> polyA 5

Here, $polyA(x) = x^2 + 2x + 3$. There we go, arbitrary polynomial expression using function composition. I suppose this is pretty simple stuff, but I find it incredibly fascinating. I'm hoping this will have some application to my thoughts on linear transformations of functions with composition.

Sunday, March 22, 2009

Partial Fraction Decomposition with Linear Algebra

The first semester of linear algebra is an eye-opening lesson in the applicability of mathematics and in the concepts of abstraction. Moving from linear transformations and operators on simple vectors in $\mathcal{R}^2$ or $\mathcal{R}^3$ to the same in general vector spaces is amazing. Extending these ideas into more abstract realms clearly shows the math in something like compilation of computer programs. What is compilation other than a transformation from one vector space (the source code) to another (the object code)? Wow!

Anyway, there were a number of interesting little problems that presented themselves near the end of the semester. One fellow student asked if we could consider partial fraction decomposition as a linear operation. The question was never adequately answered in class (merely due to a misunderstanding, $M^2$ still rocks!), but I think I've worked out one decent version of it.

First, to review, partial fraction decomposition is the process of turning a rational fraction into a sum of simpler, partial, rational fractions. That is, moving from
$\begin{eqnarray} \frac{4x+6}{x^3+6x^2+11x+6} \end{eqnarray}$

$\begin{eqnarray} \frac{1}{x+1}+\frac{2}{x+2}-\frac{3}{x+3} \end{eqnarray}$

In this example, we are using a relatively straightforward, easily factorable, denominator with a degree greater than that of the numerator. Honestly, I haven't, at this point, worked out the details for some of the other types of partial fraction decomposition.

The above decomposition is typically worked out by first factoring the denominator into distinct linear factors, setting up a sum of those factors with variables in the numerators, and multiplying by a common denominator. The result is an equation with the original numerator on the left and polynomial expressions on the right.

$\begin{eqnarray} \frac{4x+6}{x^3+6x^2+11x+6} & = & \frac{4x+6}{(x+1)(x+2)(x+3)}\\ & = & \frac{A}{x+1}+\frac{B}{x+2}+\frac{C}{x+3}\\ 4x+6 & = & A(x+2)(x+3)+B(x+1)(x+3)\\ & & +C(x+1)(x+2)\end{eqnarray}$

At this stage, we expand all the expressions on the right, regroup like terms and extract a set of equations representing the coefficients of like terms as follows

$\begin{eqnarray} 4x+6 & = & A(x^2+5x+6)+B(x^2+4x+3)+C(x^2+3x+2)\\ 4x+6 & = & (A+B+C)x^2+(5A+4B+3C)x+(6A+3B+2C)\\\end{eqnarray}$

$\begin{eqnarray} A+B+C & = & 0\\ 5A+4B+3C & = & 4\\ 6A+3B+2C & = & 6\end{eqnarray}$

Now we just solve the system of equations using any method we like to find the three values to use as numerators in the decomposition. It is easy to see a linear solution. We have a system of three equations with three unknowns. We could easily solve this using an augmented coefficient matrix and row reduction.

But there is a better way. Back up to this step.

$\begin{eqnarray} 4x+6=A(x^2+5x+6)+B(x^2+4x+3)+C(x^2+3x+2)\end{eqnarray}$

If you think about this for a minute, this looks rather like a representation of one expression, the polynomial on the left, in terms of a different set of polynomials on the right. And in fact, it is. Also note that the polynomials on the right are the expansions of each multiplicative combination (less one) of the linear factors of our original denominator.

We can look at this as a coordinate change operation. The expression on the right is a coordinate vector of the expression on the left in some basis other than the standard basis for $P$ (polynomials). If you don't see it yet, you will, so follow along.

One way to represent polynomials is as a vector whose elements represent the various terms in a standard polynomial. This is a basic linear transformation often used in linear algebra texts. It looks like this.

$\begin{eqnarray} & T : P^2\rightarrow\mathcal{R}^3\\ & T(f(x)) = \left[ \begin{array}{c} a_1\\ a_2\\ a_3 \end{array}\right]\end{eqnarray}$

$\begin{eqnarray} f(x) = a_1 + a_2x + a_3x^2\end{eqnarray}$

Using this common transformation, we can rewrite each polynomial in our equation above as a vector in $\mathcal{R}^3$.

$\begin{eqnarray} 4x+6 & = & \left[ \begin{array}{c} 6\\4\\0 \end{array} \right] \\x^2+5x+6 & = & \left[ \begin{array}{c} 6\\5\\1 \end{array} \right] \\x^2+4x+3 & = & \left[ \begin{array}{c} 3\\4\\1 \end{array} \right] \\x^2+3x+2 & = & \left[ \begin{array}{c} 2\\3\\1 \end{array} \right]\end{eqnarray}$

And now it is quite clear that this is a basis change, expressing a vector in terms of a different basis.

$\begin{eqnarray} \left[ \begin{array}{c} 6\\4\\0 \end{array} \right] = A\left[ \begin{array}{c} 6\\5\\1 \end{array} \right] + B\left[ \begin{array}{c} 3\\4\\1 \end{array} \right] + C\left[ \begin{array}{c} 2\\3\\1 \end{array} \right]\end{eqnarray}$

We can develop a coordinate change matrix $[T]_{\mathcal{B}}$ as the inverse of the concatenation of the vectors on the righthand side. In coordinate changes, we use the equation $[T]_{\mathcal{B}}\vec{v_{\mathcal{B}}}=\vec{v}$. With this we can easily solve this partial fraction decomposition thus

$\begin{eqnarray}& [T]_\mathcal{B} = \left[ \begin{array}{c c c} 6 & 3 & 2\\ 5 & 4 & 3\\ 1 & 1 & 1 \end{array} \right]^{-1} = \left[ \begin{array}{c c c} \frac{1}{2} & -\frac{1}{2} & \frac{1}{2}\\ -1 & 2 & -4\\ \frac{1}{2} & -\frac{3}{2} & \frac{9}{2} \end{array} \right] \\& \vec{v_{\mathcal{B}}} = \left[ \begin{array}{c} 6\\4\\0 \end{array} \right] \\& [T]_\mathcal{B}\vec{v_{\mathcal{B}}} = \left[ \begin{array}{c c c} \frac{1}{2} & -\frac{1}{2} & \frac{1}{2}\\ -1 & 2 & -4\\ \frac{1}{2} & -\frac{3}{2} & \frac{9}{2} \end{array} \right] \left[ \begin{array}{c} 6\\4\\0 \end{array} \right] = \left[ \begin{array}{c} 1\\2\\-3 \end{array} \right] = \left[ \begin{array}{c} A\\B\\C \end{array} \right]\end{eqnarray}$

and use the $A$, $B$, $C$ result to complete our decomposition as

$\begin{eqnarray} \frac{4x+6}{x^3+6x^2+11x+6} & = & \frac{1}{x+1}+\frac{2}{x+2}+\frac{-3}{x+3}\\\end{eqnarray}$

Well, this is certainly a lot of work for a single application. But, using $[T]_\mathcal{B}$, we can perform any partial fraction decomposition for a rational fraction with this denominator by simple matrix multiplication. In other words, we can repeat this operation, using a different numerator, easily and quickly by plugging in the vector representation of the new numerator above. This should work for any numerator with a degree equal or less than that of the denominator.

Likewise, being an invertible operation, we can add any decomposed fractions with these factors as denominators without the tedious polynomial expansion commonly used. We simply multiply $[T]_{\mathcal{B}}^{-1}$ by the vector representation of $A$, $B$, $C$ and it will spit out the vector form of the numerator. Cool!

This technique really only works, at this point, for rational fractions with easily factored denominators. It may be extendable for use in other cases, but I haven't done the work yet to investigate how easy that might be.