Saturday, February 4, 2012

Give 'em the boot...

My computing career is rather different from that of a lot of people I know. I haven't been in computing my whole life but, rather, I have two distinct periods of computing -- the first when I was a teen and the second here in my 40s.

In the first iteration, I hacked many long hours on my C64, the school's Apple ][e machines and whatever else I could get my hands on. It was a given, at that time, that one spent quite a while basically poking the innards of your machine with a virtual stick. This produced variety of results, most of which were rather boring. But the occasional "spark" of meaningful response what enough fuel to maintain motivation. Luckily none of my sparks actually involved sparks or releasing the magic smoke from the hardware. Perhaps I didn't try hard enough?

Now in my second iteration, I have to make things work in a way that supports a family and maintains my interest without completely monopolizing my time. I think this means picking out little projects and experiments that can be done in a weekend afternoon -- things that are interesting, intellectually stimulating and scratch my particular itches. And those itches seem, more and more, to hearken back to the stick poking of the first iteration. Now, since I have to actually get meaningful work done on my machines, and maintain an environment where my family can do things like write school papers and get email, I can't just start poking values into memory and the like. The cost of breaking things is too high. Thankfully, the world is full of virtualization choices....

My current interests are bootstrapping, both of languages and computing environments (arguably, very much the same thing). In my undergrad automata class we played with a Universal Turing Machine. This machine behaved, initially, in a way that is very similar to the boot sector of an x86 machine -- it performed a couple of initialization instructions and then jumped over some intervening data to get to the real program to run. The typical DOS-style boot sector does a similar thing -- initializes a few things and then (hand-wavey) jumps into the real program. This realization really struck me and has intrigued me for years. So, the idea is to play with bootstrapping a computing environment from just about nothing. How close to nothing? I think starting at the boot sector is good enough. I suppose I could mess around with writing a BIOS, but that might be too minimal.

There's quite a lot of material on bootstrapping your way into various languages, particularly Scheme and Forth. I played with Forth back in the '80s, thought it was pretty cool, but never really learned it. However, it seems ideal for bootstrapping. Apparently, just a few lines of assembler will get enough of a minimal Forth to allow building up a system gradually. So here is a rough sketch of a plan:

  1. Build some kind of boot sector code with just enough to allow me to enter machine instructions and then jump to them -- sure to lead to bad results :)
  2. Use that boot sector to boot a VM and then start hacking. Ideally the next step is to write enough code to allow saving machine state to disk. This will enable me to increment my way to something more useful over multiple sessions. Part of this step is probably rewriting the boot sector to automagically reload the saved machine state -- a basic kernel image.
  3. Start building up enough environment to get something Forth-like and from there we're off and running!
I'm not positive how this will play out, and whether I will even actually start it, but it sounds good for now. :)


My kids saw me messing around here today and were horrified at the layout I was using. My 12 yr old proceeded to rework it. Hence, the new look... ;)

Tuesday, March 8, 2011

Silly little details

Sometimes you have these moments when some silly little detail you've missed for so long suddenly becomes clear and it changes your entire outlook. I just had one of those moments. My friend Harry came to ask me about the State monad. Now, I'm no monadic cowboy, but I've slowly grokked how to use them a little bit.

So, we were discussing the structure of return and bind for State, and started to talk about runState. In my attempts to explain it, I realized something silly -- it's an accessor function for the record type State!! Duh! Now it call becomes clear!

All along I had assumed that runState was some function floating around in some file somewhere that I just hadn't found yet. But no, it's just the field accessor for the runState field in the State record. No one ever constructs a State instance using that syntax, but they access it that way all the time. It turns out:

let m = State (\s -> let foo = ... in (foo a, s))

is synonymous with

let m = State { runState = (\s -> let ...)}

so when I do

let (a,s') = runState m s

I'm really just getting that function back out of the State record and applying it to s!!


I'm not really sure how I've missed this the whole time. Probably others who read this will wonder how I got this far without figuring it out. But, there it is.

Maybe this will help someone. It's sure helped me.

Wednesday, September 15, 2010

\$100 Words

My eleven year old daughter came home with a homework problem. The task is to find \$100 words. A \$100 word is a word whose letters, when assigned values, sum to 100. Each one she finds is worth "\$100" in their little game at school. Okay cool! But the values are assigned like this:

  • a = 1

  • b = 3

  • ...
  • m = 25

  • n = 26

  • o = 24

  • ...
  • z = 2

This is not exactly the most intuitive way to number the alphabet, and it certainly makes it hard to evaluate a given word as the typical mapping we're used to (a=1, b=2...) doesn't help. I gave up after a few minutes and told her she should learn to program to solve the problem.

She wasn't too thrilled with that idea, but here's my simple Haskell solution:

import Char (isUpper)

values :: [(Char, Int)]
values = zip (interleave ['a'..'m'] ['z','y'..'n']) [1..26]
interleave :: [a] -> [a] -> [a]
interleave (x:xs) (y:ys) = x : y : interleave xs ys
interleave _ _ = []

charValue :: Char -> Int
charValue c = case lookup c values of
Nothing -> 0
Just v -> v

wordValue :: String -> Int
wordValue = foldl (\s c -> s + charValue c) 0

goodWord :: String -> Bool
goodWord word@(x:_) = (not $ isUpper x) && wordValue word == 100
goodWord [] = False

main :: IO ()
main = interact (unlines . filter goodWord . lines)

Just feed it a list of words, like the handy /etc/dictionaries-common/words and away it goes...

$ ./DollarWords < /etc/dictionaries-common/words

$ ./DollarWords < /etc/dictionaries-common/words | wc -l

So, if I was to unleash this on her poor unsuspecting teachers, her team would rocket ahead with \$652,000! But I guess that wouldn't be fair since she didn't write the program...

I'd love to see other versions of solutions for this problem.

Wednesday, September 8, 2010

Decoding Huffman codes without the tree

Okay, so, last time I demonstrated how to serialize a Huffman decoding tree into a simple stack-based language for rebuilding the tree. This was pretty interesting in it's own right, in my opinion, but was only a step down the road to the material in this installment... how to decode the Huffman code without the Huffman decoding tree. That is, the intervening data structure, the Huffman decoding tree, is completely eliminated, replacing it with a call graph that does the decoding instead.

Like everything so far, a lot is owed to Heinrich Apfelmus' morse code example, from which this work is directly taken.

This technique of removing the intermediate data structure is called deforestation. In this case, when decoding a string of encoded characters, the Huffman decoding tree is built, and then traversed to find a decoded letter. The construction of the tree is a recursive application of the tree constructors starting at the root of the tree. This construction actually happens backards, starting with leaf nodes which are assembled into branch nodes repeatedly until there is only one branch node, the root, left. In a lazy language like Haskell, these nodes may not be evaluated, but instead left as a tree of thunks, unevaluated function calls, where the function calls are calls to the node constructor. Then the decoding is a recursive traversal of that tree looking for a leaf node to decode the letter.

The construction of the tree looks essentially like this:

(Branch (Branch (Leaf 'a') (Leaf 'b')) (Branch (Leaf 'c') (Branch (Leaf 'd') (Leaf 'e')))

(with some stuff left out for clarity).

Apfelmus' replaces the constructor calls with functions that tell what to do in the case of reading a particular character from the input stream. And this is where one of the fundamental differences between the morse code tree and the Huffman coding tree come to light. The morse code tree is essentially a trie -- the resulting character returned is based on where you are in the tree when you run out of input (morse code is broken into "words" for each letter). Each node in the morse code tree has a letter associated with it. Each dash or dot moves you further into the tree and to a different letter (allowing, I suppose, a form of partial result, though it has no meaning). When the symbols in a particular morse code "word" are used up, whatever node you're looking at is the character that has been decoded.

To implement this, Apfelmus' uses the following, (here I use 1's and 0's instead of dashes and dots):

branch c x y = \code -> case code of
'0':ds -> x ds
'1':ds -> y ds
[] -> c

leaf = undefined

The leaf can be undefined because any well-formed morse code will never reach the leaves of the tree. And if there is no input left, the last case in the branch function, then you're done, emit the character. I won't go into the details of how these functions are used, you can read his post yourself.

In the Huffman tree, this is not the case. First, the incoming code is not broken into "words". It is all one stream. Secondly, the decoding happens at the leaf nodes so the leaf function has to actually do something. Finally, in the decoding process, we have to keep track of what hasn't been decoded so far because we'll need it for the next letter.

The result looks like this:

type Code = String -- an encoded message
type Result = (Char, Code)

branch :: (Code -> Result) -> (Code -> Result) -> (Code -> Result)
branch x y = \code -> case code of
'0':ds -> x ds
'1':ds -> y ds

leaf :: Code -> (Code -> Result)
leaf [c] = \code -> (c, code)

Note here the difference. The branch function never returns a decoded character. That is pushed into the leaf function which also returns the unprocessed remaining portion of the code.

Now to put it together. In the last installment we had an interpret function that used HuffTree constructors to build a data structure. In this case, we replace those constructors with our two functions:

interpret' :: Program -> Code -> Result
interpret' = head . foldl exec []
exec (rt:lt:xs) '_' = branch lt rt : xs
exec xs c = leaf [c] : xs

Instead of building a data structure, we're building a call graph -- a graph of function calls that trace the decoding of a code into a character and a remaining code (the Result type). Each call to interpret' with a Program and a Code argument produces one decoded letter and the rest like this:

*Huffman> let prog = compile $ buildDecTree "hello world"
*Huffman> prog
"rw_eh__l d_o___"
*Huffman> let code = encode "hello world"
*Huffman> code
*Huffman> interpret' prog code

Subsequent calls to interpret' continue the process

*Huffman> interpret' prog (snd it)
*Huffman> interpret' prog (snd it)

So, to wrap it all up, we recursively build up the decoded message until we've run out of code:

decode' :: Program -> Code -> Message
decode' prog code = runInterp "" code
runInterp :: Message -> Code -> Message
runInterp s [] = s
runInterp s cd = uncurry (\c rest -> runInterp (s++[c]) rest) $ interpret' prog cd

And we can see that it works:

*Huffman> decode' prog code
"hello world"

Ta da!

Okay, now come the caveats.

First, this was begun as an exercise for myself, just to see if I could do it. It was pretty fun, and I'm happy with the results. But, being done just for me (despite my sharing with you) it's probably got all sorts of problems...

Second, this type of deforestation, as far as I can tell, is not necessarily a gain. For example, depending on the semantics of the language it's implemented in, you may end up just building a whole pile of thunks on the stack that don't actually do you any good. In other words, it may all be for naught... building the intermediate data structure may be just as good or better, and is certainly more clear and understandable.

Third, I really ran through the end here. This is at least partially because I wanted to get done, but also because Apfelmus does an excellent job, much better than I could do, of explaining this. I encourage you to read his post.

Comments, suggestions etc are very very welcome. And the code is available in various stages on my git server.

Sunday, August 22, 2010

Serializing Huffman Trees

If you've been following, then you know I'm playing with Huffman trees and attempting follow Heinrich Apfelmus' morse code example.

Previously, I built a Huffman Tree data type and some functions for encoding and decoding messages using that structure. This installment is rather short, covering a way to serialize and deserialize a HuffTree structure. This serves two purposes. First, it's a way to encode the Huffman tree so that the recipient of a message can decode it. I make no claims as to its efficiency or anything like that. It works, and is really more of an exercise for myself. The second reason is in my pursuit of mimicking Apfelmus' work, I want to effectively obscure the Huffman tree structure, making it more or less vanish into the call structure.

Compiling a HuffTree and Interpreting the Code

The technique here is pretty straightforward. We will traverse the tree, using a post-order traversal, generating a code to be later used for rebuilding the tree. The code is interpreted as a stack based language. In the language, any symbol, other than '_' (underscore) encodes a push operation of that character onto the stack. An underscore means pop two items off the stack, create a Branch node with them and push the new node onto the stack. As these operations are repeated, assuming a well-formed code, the result will be one node left on the stack containing the reconstructed tree.

Both functions are very simple:

-- here we emit an "_" to signal a branch. This "_" will be used to do
-- a pop/merge/push when we interpret the code later
compile :: HuffTree -> String
compile (Leaf sy _) = [sy]
compile (Branch _ lt rt) = compile lt ++ compile rt ++ "_"

interpret :: [Char] -> HuffTree
interpret = head . foldl exec []
exec :: [HuffTree] -> Char -> [HuffTree]
exec (rt:lt:xs) '_' = (Branch 0 lt rt) : xs
exec xs c = (Leaf c 0) : xs

For compiling, simple pattern matching in a recursive tree traversal does the trick. For interpreting, a foldl with a constructor function is all that's needed. Note that the interpret function is stolen almost directly from Apfelmus' work, except that the Branch and Leaf operations are swapped.

These functions work just fine:

*Huffman> let tree = buildDecTree "hello world"
*Huffman> tree
Leaf r
Leaf w
Leaf e
Leaf h
Leaf l
Leaf d
Leaf o

*Huffman> compile tree
"rw_eh__l d_o___"
*Huffman> interpret "rw_eh__l d_o___"
Leaf r
Leaf w
Leaf e
Leaf h
Leaf l
Leaf d
Leaf o

In this compile-interpret sequence, one thing is lost -- the weight of the nodes. This is acceptable because the weights are only used for building the initial tree based upon the character frequencies of the encoded message. Once that step is done, the weights are simply unnecessary.

It would be nice to be able to compare trees for equality, if for no other purpose than to get the machine to tell us that interpret and compile are inverses (modulo the weights). So our HuffTree type needs to be an instance of Eq with an (==) function that is meaningful for our purposes. In this case we want structural equality with the values in the leaf nodes being compared directly. Weights are ignored.

instance Eq HuffTree where
(==) (Branch _ l1 r1) (Branch _ l2 r2) = l1 == l2 && r1 == r2
(==) (Leaf s1 _) (Leaf s2 _) = s1 == s2
(==) _ _ = False

The goal here is trees that are equal should encode and decode in the same way. So the shape of the tree is important. That's what this equality function does.

*Huffman> buildDecTree "hello world" == (interpret . compile $ buildDecTree "hello world")
*Huffman> buildDecTree "hello world" == (interpret . compile $ buildDecTree $ reverse "hello world")

And we can see that this works. Even for the reversed message, because the tree built from this message has the same structure.

That's it for this installment -- a short one. Next time I'll start messing with the deforestation/fusion stuff.

Tuesday, August 17, 2010

Huffman Coding in Haskell

To follow up on my post from last week, here is the beginnings of some fun with Huffman Coding in Haskell. Remember that this was spurred by Heinrich Apfelmus' article using fusion and other fun things with Morse Code.

Huffman Coding

There is nothing revolutionary in here, just a reasonably straight-up Huffman coding algorithm that converts a message into a string of 1's and 0's and then can convert them back to the original message. I borrowed heavily from this blog posting and if you look at the history of my source code you'll see that it was a HUGE help! The code snippets below are just that, snippets. Look to the complete source code to get a working example.

Anyway, the basics are as follows.... Given a message, we count the frequency of each character and build up a dictionary of characters and their respective frequencies.

-- the list of character
type FreqList = [(Char, Int)]

-- function to count character frequency, storing the results in a map.
charFreq :: String -> FreqList
charFreq s = Data.Map.toList $ foldl charFreq' Data.Map.empty s

charFreq' :: Map Char Int -> Char -> Map Char Int
charFreq' m c = Data.Map.insertWith (+) c 1 m

These frequencies are later used as weights for building the Huffman Tree. A Huffman tree is a binary tree built so that higher frequency characters are more shallow leaves and lower frequency characters are deeper leaves. This means that high frequency characters are represented with fewer bits than low frequency characters. This gives an overall shorter stream of bits for encoding the message.

So, we need a data type to represent the Huffman Tree:

-- data structure for huffman trees
data HuffTree = Branch { wt :: Int, l :: HuffTree, r :: HuffTree }
| Leaf { symbol :: Char, wt :: Int }
deriving (Eq)

instance Ord HuffTree where
compare = compare `on` wt

mappend :: HuffTree -> HuffTree -> HuffTree
mappend x y = Branch (wt x + wt y) x y

We've made HuffTree an instance of Ord so that it can be sorted on its wt (weight) field. This is useful for building the tree -- we can keep the nodes in the order we want easily. And the function mappend is handy for "adding" HuffTrees together.

The process for building a Huffman Tree from a list of characters and weights is pretty straightforward. We create a list of HuffTrees, using just Leaf constructors and storing their weights. This list is sorted by weight so that the Leaf nodes that represent the characters with the lowest frequency or weight are first. These HuffTree elements are pulled off the list, combined with the `mappend` function and then sorted back into the list. This process continues until there is only one HuffTree left in the list. This is the Huffman coding tree for our message.

-- produce a decoding tree from a string
buildDecTree :: String -> HuffTree
buildDecTree = build . sort . map (uncurry Leaf) . charFreq
build :: [ HuffTree ] -> HuffTree
build [] = error "Empty Leaf list"
build (t:[]) = t
build (x:y:ts) = build $ Data.List.insert (mappend x y) ts

As usual, Haskell make this sort of operation ridiculously simple and intuitive.

with an instance of Show for HuffTree, we can see the results:

*Huffman> buildDecTree "hello world"
Branch 11
Branch 4
Branch 2
Leaf r 1
Leaf w 1
Branch 2
Leaf e 1
Leaf h 1
Branch 7
Leaf l 3
Branch 4
Branch 2
Leaf 1
Leaf d 1
Leaf o 2

To encode a character using the generated tree, we record the path taken to get to the character emitting a "0" for left branches taken and a "1" for right branches. In the above tree, where the first branch listed is the left branch, the path to the letter "e" is left -> right -> left. Thus the code for "e" is "010", only three bits to represent an 8-bit character. To encode an entire message, we repeat the process for each letter. But traversing the tree for each letter is neither efficient nor elegant. So we flatten the tree into a dictionary of characters mapped to their encodings. To do this, we traverse the tree once, recursively recording and emitting dictionary entries.

-- produce an encoding dictionary from the decoding tree
buildEncDict :: HuffTree -> [(Char, String)]
buildEncDict = buildEncDict' ""

buildEncDict' :: String -> HuffTree -> [(Char, String)]
buildEncDict' s (Leaf sy _) = [(sy,s)]
buildEncDict' s (Branch _ lt rt) = buildEncDict' (s ++ "0") lt
++ buildEncDict' (s ++ "1") rt

Now to encode a character, we simply to a lookup in the dictionary. In this case, it's just a list of character-code pairs but it could easily be, for a larger character set, a search tree keyed by the character storing code values. As an aside, this would be an interesting exercise, to traverse the Huffman tree transforming it into a binary search tree. Regardless, encoding is very straightforward:

-- here is an encoding function to turn a String into a Code
encode :: String -> Code
encode s = encode' (buildEncDict $ buildDecTree s) s

encode' :: [(Char, String)] -> String -> Code
encode' _ [] = []
encode' d (s:ss) = (fromJust $ Prelude.lookup s d) ++ encode' d ss

And finally, to decode a message, you traverse the HuffmanTree directly, turning 0's into left branches and 1's into right branches in the tree traversal. When the traversal reaches a Leaf node, then a character has been decoded.

-- and a decode function to turn a code and a tree into a string
decode :: HuffTree -> Code -> String
decode t code = decode' t code
decode' (Branch _ lt rt) (c:cs) =
case c of
'0' -> decode' lt cs
'1' -> decode' rt cs
_ -> [] -- "otherwise" produces a warning about unused variable
decode' (Leaf s _) cs = s : decode' t cs -- we need the whole tree again
decode' _ [] = []

Note the use of a helper function because to decode each letter, you have to start from the root of the tree. There are likely better ways to do this...

That's the basics of Huffman coding in Haskell, and it works just fine:

*Huffman> let tree = buildDecTree "hello world"
*Huffman> encode "hello world"
*Huffman> decode tree $ encode "hello world"
"hello world"

But there is a problem. To decode the message, you need the coding tree for that message. I'm sure there are a variety of techniques to transmit the tree as part of the message. One of the things Apfelmus did that intrigued me was develop a stack based language and interpreter to represent the morse code tree as a string of characters. I think this would make a fine way to encode the tree into the front of the message. We should be able to extract a string of characters from the Huffman tree for a particular message and use that string of characters to reconstruct the tree on the other side. Now this will add some significant overhead to the output for a given message, but I'm not necessarily concerned with space efficiency here as much as learning about these kinds of transformations.


I previously mentioned some differences between Apfelmus' implementation of the morse code decoder and a similar Huffman decoder. Having gone through the process of building this code, I"ve come to better recognize those differences.

First, as I mentioned before, in the Huffman tree, the leaves are important and the result is an encoding of the routes through the tree to obtain the desired characters. We use the Leaf nodes to know when a particular character has been decoded because the code is an undelineated string of bits. In the morse code solution, the "letters" are delineated. It is clear when a particular letter is complete. So the decoding works slightly differently and the tree is really a trie, where the result is whatever character you land on when the input stream for a particular character ends. I'm not sure what the implications will be for the application of the fusion technique, but we'll see.

When decoding, you don't strictly need the weights anymore, they are only used to build the tree. So transmitting the decoding tree is simpler than it seems at first glance. Again we'll see what comes of that as I progress through this.

Next installment should be a "compiler" to turn a HuffTree into a string of characters that can be used to reconstruct the tree again using an "interpreter". Should be fun!

Sunday, August 8, 2010

A Haskell approach to Huffman Coding

I just ran across Heinrich Apfelmus' discussion of morse code and I think it's pretty cool. I like the progression from a straightforward initial approach to a very clever result. I was really struck by the similarities of parsing morse code with a tree and Huffman Coding. My goal this week is to try to apply Apfelmus' technique to Huffman coding.

There are some important distinctions between the morse code tree and Huffman Coding. First, in Huffman coding, the leaves are important. The symbols are stored in the leaves of the tree instead of in the nodes as in the morse code solution. Additionally, the Huffman tree structure itself encodes the frequency of the symbols in the source, a consideration that is not required in the morse code. This changes the interpreter required for constructing the tree. Also, the Huffman coding tree should be constructed specifically for the data set it is being used to encode. For a reasonably consistent data set, like large amounts of English text, the frequency of symbols for the language in general can certainly be used with reasonable results. But, for encoding other kinds of data, ideally the source should be used to generate frequencies of symbols unique to that source. I'm probably going to ignore this and just use some reasonable standard frequencies instead.

So, that's my plan for the week, we'll see how it goes.

Monday, March 22, 2010

You know you're a geek when...

You know you're a geek when you catch yourself trying to explain to your 12 year old kid how the pig-latin transformation is not an isomorphism, but is a function (at least it seems so at first glance). And then things diverge into whether one is discussing typographical or aural pig-latin.

The anecdotal evidence is pretty clear: the English word "pay" becomes "ay-pay" while the English word "ape" becomes "ape-ay"*. Typographically they are quite distinct, but aurally they can be quite similar, even identical, depending on one's speech patterns. So, spoken pig-latin is not an isomorphism, relying on contextual clues to allow easy translation. But typographical pig-latin should be very straightforward, mechanical, to translate. I'm not even going to go into the occasional pig-latinized French that we have to bust out around here when the kids are eavesdropping.

Anyway, that's just a little food for thought ;)

* remember that words which begin with a vowel sound merely have the "ay" syllable appended to it.

Saturday, January 30, 2010

Grad School! and other stuff

I've been letting this languish for a variety of reasons, not the least because I'm just really busy. Carrying 18 credits at school while trying to prep a house for sale is not for the faint-of-heart. So, here is a bit of an update one what I'm not quite getting accomplished.

Grad School -- I just got my first grad school acceptance letter. That makes it official that I'm headed to grad school in the fall. Yay! But at the same time that makes everything else much more critical.

House, Business -- I'm trying to sell my businesses and our house. Both of these are fairly huge tasks in their own right. I've got some interest in the bar that may pan out to some kind of sale soon, but who knows. The house is an effort in trash hauling, mostly. It's mazing how much crap can accumulate in 10 years! We've got to get about 2/3 of our stuff *out* of the house to make it look acceptable for someone to buy. Then we have to somehow *live* without 2/3 of our stuff. Luckily, about half that 2/3's is probably trash of some kind or another. Can you imagine? 1/3 of our stuff is essentially trash! This is going to be good for us in the long run.

School -- I'm carrying a big load at the moment (and next quarter too) in order to get done this spring. It's going to work out, but it's a *lot* of work. Thankfully, some of it's fun. My Senior Project is a great little research project examining the latency of a video system in Linux using gStreamer and a real-time kernel. Pretty cool.

Side stuff -- the reading list gets longer all the time, especially with impending grad school admission. I've got to catch up on important stuff I feel I'm missing from my undergrad education. Specifically, I want to read up on programming language semantics, compiler design, and play around with some OS stuff. I may have to write the typical crappy little OS project that never gets beyond the bootloader. The projection pursuit project is actually pretty much done. I have one little problem to fix and I can call it a finished prototype, but I'm not sure when I'll get to that.

So, that's the update for now. Cheers everyone!

Wednesday, September 30, 2009

Projection Pursuit in Haskell, pt 3


Sometimes problem solving requires just a lot of thought. And I don't mean brute force thought, but the gentle, patient thought that comes from relaxing about a problem and just letting your brain work on it for a while. I find I come up with some of my best ideas in the shower. Yeah, I know... But, when something has really been bothering me -- remaining unsolved despite my best efforts -- the solutions typically come at night. Something wakes me up and then I'm up and my mind won't stop and then the solution comes.

Such it was last night. I haven't had a lot of time to put into this effort lately, and what effort I've made has been frustrating. If you look at the original Octave source code you'll see that there are a lot of parameters controlling the algorithm. We've got the raw data itself, the size of the gradient probe, the proportion of the gradient to use, the number of iterations. It's a lot of stuff to cart around. But, I'll get to that in a moment.

First, I had spent a lot of time writing some matrix and vector operators. This was a good exercise, though I've ultimately thrown all that code out. One crucial part of this algorithm is the use of the right singular vectors from the singular value decomposition. I certainly didn't want to write an svd function for this exercise. Google to the rescue! I found hmatrix, and a solution to all my matrix handling problems. So, using hmatrix caused me to dig back through the bits of code I'd already written. At the core of the projection pursuit algorithm is the calculation of the gradient. This solution uses an estimation of the gradient by probing in all dimensions and measuring the change in the kurtosis.

So, let's define some helper functions:

{-- calculate modified kurtosis --}
kurtosis :: Vector Double -> Double
kurtosis y = mean(y) ** 4 - 3

{-- calculate the mean of a vector --}
mean :: Vector Double -> Double
mean y = go 0 0 $ toList y
go :: Double -> Int -> [Double] -> Double
go s l [] = s / fromIntegral l
go s l (x:xs) = go (s+x) (l+1) xs

{- create a list of vectors to probe h distance in all directions -}
hs :: Double -> Int -> [Vector Double]
hs h r = toRows . diag . fromList . take r $ repeat h

Using these, we can calculate the estimated gradient:

{-- calculate the gradient of a function given a bunch of starting information                           
f - the function to compute gradient of
h - the probe distance... the smaller the better
z - the raw data set needed for future projections
w - a starting vector/point at which to calculate the gradient --}

gradient :: (Vector Double -> Double) -> Double -> Matrix Double -> Vector Double -> Vector Double
gradient f h z w = probe ws
ws = map ((+) w) . hs h $ rows z
k = f $ w <> z

probe :: [Vector Double] -> Vector Double
probe = fromList . probe'

probe' :: [Vector Double] -> [Double]
probe' [] = []
probe' (x:xs) = (deltaK x) / h : (probe' $ xs)

deltaK :: Vector Double -> Double
deltaK x = k - f (x <> z)

This is really pretty straightforward. We calculate a starting kurtosis value, generate a bunch a vectors that shift h-amount in all dimensions, calculate the kurtosis for each of those and concatenate them into a new vector for the gradient.

The real breakthrough for me came when I was struggling with how ugly it was going to be to pass around all this junk. Just ugly and I can't stand it. So last night, while I was lying awake fretting about this, it finally hit me. I don't have to pass around all this stuff. I can just partially apply the gradient function to all the algorithm control parameters, and the raw data, and then just pass that around. So at some point I'll do this:

{-- build a gradient function for *this* problem --}
grad :: Vector Double -> Vector Double
grad = gradient kurtosis h z

Now I can just pass around grad and it will carry around all the baggage for me. I know. This is a really simple part of haskell, and it's something that is done all the time. I've certainly done it many times. But for some reason it eluded me for a really long time. The real problem was one of not thinking about this function properly. Yes, all that data is needed to calculate the gradient. But I am not calculating the gradient of all of this stuff as variables. The function, the raw data and the h distance are all fixed points within the context of the problem. That is, there is only one true variable in this application of the gradient function and that is the current position, w. All along I had been thinking about h and f and z as variables and thus I had to keep passing them around. And they are variables, within the context of the whole algorithm, but at the level of the gradient, they aren't. Huh. It took me along time to bend my brain around to seeing this the right way.

Anyway, there it is. Now that school is back in, I get stolen moments on the bus to work on this, so hopefully the rest will come soon.

The Summer of Chaos

So, time to reminisce about the summer just past. Wow! It's amazing how life throws you curve balls.

The Plan. Heading out of last spring, I was anticipating a summer of general-ed requirements to speed me on my way towards graduation, a little programming work on the side, research on grad-school programs and lots of time for personal projects to be posted here. This was to go alongside taking care of my three daughters and generally being "summery." Cool. Good plan. A little ambitious, but doable.

The Reality. Business took a nose-dive. So much so that I had to go back behind the bar for the summer to keep things afloat. We pared down to a skeleton staff with me working 4 shifts a week slinging beers. That's just what needed doing.

Meanwhile, I did get some side work, technically an internship, though I'm not bothering to take the educational credit, just the CV listing. I spent some time learning Python, flexing my Debian admin skills and playing around with EWU's Living Laboratory project under Dr. Jeff Putnam. It was a fun project. I got pretty comfortable with Python, the basics of Postgresql, and Django. Also, I got to play with specifying and implementing the simplest of Domain Specific Languages to allow user-entered code to run on our server. That project is ongoing, though I've got precious little time for it now that school is back in. Hopefully, I'll get to extend (or create anew) that language for specifying graphs as well.

So, personal projects, grad-school research? All by the wayside. My plans to port my Projection Pursuit algorithm to Haskell got pushed back. A *ton* of reading (Computational Category Theory, A Transition to Advanced Mathematics, Goedel, Escher, Bach, etc.) postponed indefinitely. Grad school selection is only getting started now, and that is such a daunting task! Dr. Putnam suggests I find blogs I like, follow all of their blog links and assemble this graph of blogs and schools and then run a clustering algorithm looking for commonality and hope that helps me narrow down the choices. Hah! That would probably work really well, but given the time available, would take until long past the application deadlines. Plus, I have to prep for the GRE's in a week! Wah!

The Results. Well, what can I say? It is what it is, and I keep marching ahead. Things never go quite the way you expect, but in the end, they work out. I think. I'm getting that Projection Pursuit done. I'm keeping up with my studies. I'm gradually narrowing down the choices for grad-school applications. And somehow I'm still enjoying my family and keeping food on the table. Not bad :-)

Saturday, August 22, 2009

Projection Pursuit in Haskell, pt 2

In this installment, I'll review my original projection pursuit algorithm in Octave and provide some pretty pictures and sounds :)
The Code
First, you can download the code and follow along if you like.

The function accepts a number of parameters to control the gradient probing and stopping criteria, as well as provide the mixed signals. The mixed signals are represented as a row-wise matrix with each row representing a particular mix. Here is the breakdown from the help section of the function:
function [y, K] = ppursuit(h, eta, tol, mxi, x)
% [y, K] = ppursuit(h, eta, tol, mxi, x)
% Projection Pursuit (probing to estimate gradient).
% This function uses projection pursuit to demix m
% signal mixtures into m estimated source signals.
% h Step size for probing the Kurtosis (K) of the
% demixed signals, relative to the magnitude
% of the current demixing vector. IOW, K is
% probed by looking at K(w + h) in the m
% dimensions of w, one at a time. It should be
% noted that w is a unit vector, so there is
% no need to scale the step size by the current
% size of w.
% eta Distance that demixing vector w is adjusted
% in the direction of the estimated gradient of
% K. IOW, wnext = w + eta*g where g is the
% estimated gradient of K, normalized to 1.
% tol Stopping criterion for gradient ascent. The
% ascent terminates when the relative change in
% K is < tol. (Change in K divided by K).
% mxi Maximum iterations to execute on the ascent
% of K for each recovered signal.
% x (m x n) matrix of signal mixtures. Each row
% is a 1 x n mixture of m source signals. The
% source signals may be anything, and there is
% no assumed relationship between the n samples
% of a given mixture. They may be temporal or
% spatial samples.
% y The (m x n) source estimates
% K The (m x ?) history of the ascent of Kurtosis
% for each recovered source. Having this history
% allows the caller to tune the search parameters.

The algorithm itself is very straightforward.

  • First, we find the PCA (Principal Component Analysis) to minimize de-mixing errors. Because our two original sound sources are independent, they are uncorrelated. Casting our data into an uncorrelated basis (which is what PCA does) we can easily extract signals as we find them without interfering with the remaining signals. If we did not do this, then we would have to find all the signals simultaneously (by a different technique) so that we could solve the "parallelogram" that they would make (obviously, I'm leaving things out...). So, we do this by taking the SVD (Singular Value Decomposition) of the mix matrix and throwing away "U" and "S". What remains is the mixes represented in an ortho-normal basis (this is a fairly trivial derivation).

  • Now we iterate for each row in the mixes:

    • Whiten the mixes by setting the variance to 1

    • Select a starting vector for demixing. In the original project, we used a random unit-length vector. I've changed that to a standard basis vector, i, for illustrative purposes.

    • Project the mixes onto the trial demixing vector and measure the kurtosis

    • Iterate until our stopping criteria are met: that is, we reach the maximum number of iterations, or the change in kurtosis from one iteration to the next fall below our threshold

      • Probe the gradient in each direction, measuring the kurtosis

      • Assemble the results of the probe into a gradient vector

      • Adjust the trial demixing vector one increment (eta) in the direction of the gradient vector. By moving a fraction of the gradient, we minimize the risk of overshooting the peak of the kurtosis. If we moved by the full gradient, a localized spike in the gradient could easily cause a large displacement of the demixing vector.

    • At this point, the demixing vector now points in the direction of maximum kurtosis. We project our mixes onto this vector and store the results as the first unmixed signal

    • Subtract the unmixed signal from the original mixes to allow further demixing

Please note that I used a stripped down kurtosis calculation which ignores the denominator of the typical calculation. We can do this because the mixes have been whitened so the variance (and thus the denominator of the kurtosis calculation) will be 1.
So, to test this we need a mixed sound signal like this mix of a bird chirping and a gong ringing. This sound clip only plays one "mix" of which we have two in this file. It's a horrific sound...

After projecting the mixes into an ortho-normal basis, we can plot the two mixes against each other like this:

It's pretty clear that there are two distinct signals here, one fairly well defined in a sort of "north-by-north-east" orientation, and another in a "north-west" orientation. I've only plotted about 1 in 5 samples here, otherwise the image just becomes a big blob.

After the first iteration of the algorithm, finding the first signal, we can see its progress here:

In this image, the red line represents the starting point, the green lines represent the intermediate trial demixing vectors and the blue line is the final vector for the current signal. Notice how the trial demixing vectors probe quickly at first and slow down as we approach the peak of the kurtosis curve and the gradient begins to flatten.

This first iteration extracts a pretty clean bird chirp signal, although if you listen closely, you can hear just a touch of the gong as well.

In the second iteration, we can see the algorithm finding the second, more diffuse, signal:

Again, you can see the initially quick probing that slows as we approach the peak of kurtosis. What is not shown here is that the mixes have had the first signal removed. Because there are only two signals here, removing the first signal will project the remaining signal onto a vector, making the extraction of the second signal trivial. But it's still fun to watch it work. The second signal extacted is a rather awful gong sound. Again, you can still hear just a touch of the bird chirping in the background.

Here is the kurtosis history of each probe, using ppursuit(.001, .01, .00001, 2000, mix) as the function call.

The first signal, in blue, took 47 iterations to reach the stopping criteria. The second signal, in green took only about 17 iterations to stop (the fall off in the green curve is just an artifact of the way the data is stored). As you can see, both of the curves start off fairly steep and rapidly level out. Each curve represents the slop of the gradient as it is pursued uphill.

You can see some interesting effects by changing the parameters to the function. It is possible, for example, to get the algorithm to oscillate around the peak of kurtosis as each probe overshoots the peak.

The reason we have less than perfect demixed signals is because we can never hit the peak kurtosis perfectly. As we get closer to the peak, the magnitude of the gradient gets smaller and smaller, so that the demixing vector moves by smaller and smaller amounts. The smaller we set the "tol" parameter, the closer we can get, at the expense of more iterations.

So, in retrospect, this example mix is not so great because the demixed signals appear to be orthogonal. However, they aren't. The resulting demixing vectors are [0.7665,-0.644225], and [0.70813,0.70608]. Their dot-product is 0.087907. So, they are close, but not quite, orthogonal. This is important. One of the benefits of projection pursuit is that it can find signals that are not orthogonal. Other simple analysis techniques like PCA (Principal Component Analysis) rely on the signals being orthogonal to produce meaningful results.

Here is the raw signal, before projection onto the ortho-normal basis in the first steps of the algorith:

As you can see, the original signals are not orthogonal. There is no guarantee that the projection onto the ortho-normal basis will produce orthogonal, or near-orthogonal signals. That is just a characteristic of this particular data.

Thanks to Dr. Paul Schimpf for comments and suggestions about this posting.

For the next installation, I'll examine how to get started writing this in haskell.

Friday, August 21, 2009

Projection Pursuit in Haskell, pt 1

In the Spring Quarter, I took a class called Introduction to Scientific Computing. It was sort of a survey of scientific computing techniques focused on using Octave.

Our final project was to implement a Projection Pursuit algorithm to unmix two or more mixed sound signals using the same number of mixes as there were signals. The idea is simple: Given, say, two speakers and two microphones, all at different locations, if the two speakers each play different sounds then the two microphones will detect different mixes of those two sounds. Our job was to take these two mixes and "unmix" them. For simplicity's sake, we are ignoring the time differential that would occur between the different mixes.

To do this using Projection Pursuit, we project the two mixes onto a random vector in 2 dimensional space. Then we measure the kurtosis of the projected signal. Next, we do a gradient ascent, adjusting our vector to climb uphill in the direction of the greatest kurtosis. As we approach the peak, the change in kurtosis gets smaller and smaller. When we satisfy our stopping criteria (the change in kurtosis falls below some threshold) we consider a signal to be "found". Then we extract the signal that lies on that vector and we have our first unmixed sound. Next we remove that signal from the two mixes and what remains is the other sound. This works because sounds tend to be super-gaussian signals, so the kurtosis will climb fairly quickly as you search.

I have to say, it was a really cool project and sort of "magical" when the extracted sounds started playing.

My goal is to port this project to Haskell. Projection Pursuit can be implemented, I believe, in purely functional code as fairly straightforward recursion. Now there are certain niceties to doing the original in Octave: native matrix and vector manipulation, simple graphing capabilities (so you can watch it "pursue"), and fairly easy sound playing to listen to the results. I probably won't get to coding up the graphing and sound in Haskell.

I completed some portions of the matrix manipulation functions earlier in the summer, but it's not ready to post yet, and I want to write up my progress in this process.

Next time, I'll post my Octave code, and provide some graphics of the pursuit. It's pretty cool.

Thursday, July 2, 2009

How do you TODO?

Some time back there was an ask proggit looking for ideas for handling TODO's. Someone posted a simple bash script that handled a TODO file in the local directory. Now, I'm always looking for little things to do in Haskell as learning exercises and this one struck me. So, here it is, my TODO program. It actually has a neat bonus feature in that it doesn't display TODO items that are marked with a "DONE". Someday I need to add a date/time stamp, but it's good enough for now.
module Main( main ) where

import System ( getArgs, system, exitWith )
import System.Console.GetOpt
import System.Posix.Files

main:: IO()
main = do
  args <- getArgs

  case getOpt RequireOrder options args of
    ([Edit], _,       []) -> spawnEditor 
    ([Help], _,       _)  -> putStrLn \$ usageInfo header options ++ description
    ([],     [],      []) -> displayTodo 
    ([],     nonOpts, []) -> addTodo nonOpts
    ([], [],  msgs)       -> error \$ concat msgs ++ usageInfo header options ++ description
    _                     -> error \$ usageInfo header options ++ description

data Flag = Edit | Help

options :: [OptDescr Flag]
options = [ 
 Option ['e'] ["edit"] (NoArg Edit) "edit the todo list",
 Option ['h'] ["help"] (NoArg Help) "display this help"

header :: String
header = "Usage: todo [OPTIONS] [todo item]"

description :: String
description = unlines [ ""
                      , "The default operation is to just cat the TODO file. Calling todo with"
                      , "a non-option argument will add that argument to the TODO file."

todoFile :: String
todoFile = "TODO"

-- TODO -- needs to check for existing emacs process first and run emacsclient in that case...
spawnEditor :: IO ()
spawnEditor = system ("emacs " ++ todoFile) >>= exitWith

addTodo :: [String] -> IO ()
addTodo nonOpts = do
  print "Updating todo list..."
  appendFile todoFile \$ (unwords nonOpts) ++ "\n"

displayTodo :: IO ()
displayTodo =  fileExist todoFile >>= 
               (\a -> if a 
                      then readFile todoFile >>= putStr . removeDones
                      else putStrLn "nothing to do!") 

removeDones :: String -> String
removeDones = unlines . filter (not . contains "DONE") . lines  

contains :: String -> String -> Bool
contains target source = any (== target) \$ words source

It's not the greatest Haskell code I'm sure, but its the first I've done that wasn't pure functional code, it works, and was fun to sort out.

ps. Looks like my attempts at formatting code were less than fully successful. It's all there, just click and drag to select it... sorry

Tuesday, April 14, 2009

Linear linear everywhere....

We had the distinct pleasure of being lectured by Dr. Ron Gentle last week as our regular Calc IV(multi-variable) professor was out of town for two days. Dr. Gentle is known for viewing the entire universe as a linear construct of some sort or another and the subject matter for his lectures, multi-variable chain rule differentiation, was no exception, being distilled down to a simple vector multiplication for any possible type of differentiation out there. I'll probably talk more about that later, but the main point is that Dr. Gentle is right, linear algebra is *everywhere*.

A case in point was today's lecture (from the usual guy, Dr. Dale Garraway) on directional derivatives. To make a long story short, we have already worked through partial derivatives where given $f(x,y)=x^2+y^3$, the first partial derivative in the $x$ direction is $f_x(x,y)=2x$ and the first partial in the $y$ direction is $f_y(x,y)=3y^2$.

Great, but what if you want to know the derivative in some other direction, like, say, in the direction of the vector $\vec{v}=(a,b)$? The general answer is this: $f_{\vec{v}}(x,y)=f_x(x,y)a + f_y(x,y)b$. The astute reader will notice that this is just a linear combination of the two first derivatives. Could it be that the first derivatives just form a basis for a function space defining all the derivatives of this multi-variable function? It sure looks like it.

In fact it looks like essentially a basis conversion from a spatial basis (the vector in $\mathcal{R}^2$ or something like that) to a functional basis giving the derivative. Neat.

Sunday, March 29, 2009

Polynomials with Function Composition

This morning, reading in Computational Category Theory, by Rydeheard et al, I came across this little piece of ML (it's pretty early in the book. It takes me about 45 minutes to read a page, sheesh):

fun poly_eval(f) = f(f(3)) + f(3) + 3

And I got to thinking about how this looks like a linear combination of function compositions. I just finished Linear Algebra so everything looks linear. One of the things I've been trying to wrap my head around is how to express function composition in a linear manner, and while this posting won't address that directly, it may help me down that road.

The question today is how to express polynomials as composition, as in the code above.

Given a standard form for a degree two polynomial: $f(x)=a_1x^2+a_2x+a_3$, let's replace every occurrence of $x$ in the polynomial with a function that multiplies $x$ by its argument. Composing this function with itself would multiply the value of $x$ by something multiplied by the value of $x$.

Sort of like this:

let $f=\lambda a. ax$ where $a$ is the coefficient of the polynomial term. Here $x$ is a free variable and if we provide a coefficient $c$, the result is $cx$. If we compose these functions we can expand them and see what happens.
$\begin{eqnarray} f(f) & = & f (\lambda a. ax)\\& = & (\lambda b. bx)(\lambda\\\end{eqnarray}$

now apply a constant $c$ as our coefficient, and evaluate
$\begin{eqnarray} f(f c) & = & (\lambda b. bx)(\lambda c\\& = & (\lambda b.bx)(cx)\\& = & (cx)x\\& = & cx^2\end{eqnarray}$

and there we have our first term in our polynomial (I'm not up on $\lambda$-notation and such stuff, so please, someone correct me if I've made a mistake in the above).

We can express this in code (thanks to dolio on #haskell) by adopting the convention that $f$ will accept a function and a value where the value will become our free variable (using partial application) and the function will either be other $f$'s (composition) or $const$ $c$, the coefficient.

f::(Num a) => (a -> a) -> a -> a
f g x = g x * x

and these can be composed as many times as we like to get powers of $x$.

*Main> let two_x2 = f(f(const 2))
*Main> :t two_x2
two_x2 :: Integer -> Integer
*Main> two_x2 5

Unfortunately, this forces us to declare the coefficient in advance and then we're stuck with it. It would be nice to be able to decide the coefficient later and still have $x$ as a free variable.

We can do it like this:

x1 = \c -> f(const c)
x2 = \c -> f(f(const c))

Now both $x1$ and $x2$ (for $x$ and $x^2$) can be used to express a polynomial term with an arbitrary coefficient.

*Main> :t x2
x2 :: Integer -> Integer -> Integer
*Main> let five_x2 = x2 5
*Main> :t five_x2
five_x2 :: Integer -> Integer
*Main> five_x2 4
*Main> five_x2 2

Finally, it is simple to build arbitrary polynomials with a free variable

poly = \a1 a2 a3 x -> x2 a1 x + x1 a2 x + a3

where $a1, a2, a3$ are the coefficients. Using partial application we can create a specific polynomial with its coefficients and then apply it to any value for $x$ that we like.

*Main> let polyA = poly 1 2 3
*Main> :t polyA
polyA :: Integer -> Integer
*Main> polyA 5

Here, $polyA(x) = x^2 + 2x + 3$. There we go, arbitrary polynomial expression using function composition. I suppose this is pretty simple stuff, but I find it incredibly fascinating. I'm hoping this will have some application to my thoughts on linear transformations of functions with composition.

Sunday, March 22, 2009

Partial Fraction Decomposition with Linear Algebra

The first semester of linear algebra is an eye-opening lesson in the applicability of mathematics and in the concepts of abstraction. Moving from linear transformations and operators on simple vectors in $\mathcal{R}^2$ or $\mathcal{R}^3$ to the same in general vector spaces is amazing. Extending these ideas into more abstract realms clearly shows the math in something like compilation of computer programs. What is compilation other than a transformation from one vector space (the source code) to another (the object code)? Wow!

Anyway, there were a number of interesting little problems that presented themselves near the end of the semester. One fellow student asked if we could consider partial fraction decomposition as a linear operation. The question was never adequately answered in class (merely due to a misunderstanding, $M^2$ still rocks!), but I think I've worked out one decent version of it.

First, to review, partial fraction decomposition is the process of turning a rational fraction into a sum of simpler, partial, rational fractions. That is, moving from
$\begin{eqnarray} \frac{4x+6}{x^3+6x^2+11x+6} \end{eqnarray}$

$\begin{eqnarray} \frac{1}{x+1}+\frac{2}{x+2}-\frac{3}{x+3} \end{eqnarray}$

In this example, we are using a relatively straightforward, easily factorable, denominator with a degree greater than that of the numerator. Honestly, I haven't, at this point, worked out the details for some of the other types of partial fraction decomposition.

The above decomposition is typically worked out by first factoring the denominator into distinct linear factors, setting up a sum of those factors with variables in the numerators, and multiplying by a common denominator. The result is an equation with the original numerator on the left and polynomial expressions on the right.

$\begin{eqnarray} \frac{4x+6}{x^3+6x^2+11x+6} & = & \frac{4x+6}{(x+1)(x+2)(x+3)}\\ & = & \frac{A}{x+1}+\frac{B}{x+2}+\frac{C}{x+3}\\ 4x+6 & = & A(x+2)(x+3)+B(x+1)(x+3)\\ & & +C(x+1)(x+2)\end{eqnarray}$

At this stage, we expand all the expressions on the right, regroup like terms and extract a set of equations representing the coefficients of like terms as follows

$\begin{eqnarray} 4x+6 & = & A(x^2+5x+6)+B(x^2+4x+3)+C(x^2+3x+2)\\ 4x+6 & = & (A+B+C)x^2+(5A+4B+3C)x+(6A+3B+2C)\\\end{eqnarray}$

$\begin{eqnarray} A+B+C & = & 0\\ 5A+4B+3C & = & 4\\ 6A+3B+2C & = & 6\end{eqnarray}$

Now we just solve the system of equations using any method we like to find the three values to use as numerators in the decomposition. It is easy to see a linear solution. We have a system of three equations with three unknowns. We could easily solve this using an augmented coefficient matrix and row reduction.

But there is a better way. Back up to this step.

$\begin{eqnarray} 4x+6=A(x^2+5x+6)+B(x^2+4x+3)+C(x^2+3x+2)\end{eqnarray}$

If you think about this for a minute, this looks rather like a representation of one expression, the polynomial on the left, in terms of a different set of polynomials on the right. And in fact, it is. Also note that the polynomials on the right are the expansions of each multiplicative combination (less one) of the linear factors of our original denominator.

We can look at this as a coordinate change operation. The expression on the right is a coordinate vector of the expression on the left in some basis other than the standard basis for $P$ (polynomials). If you don't see it yet, you will, so follow along.

One way to represent polynomials is as a vector whose elements represent the various terms in a standard polynomial. This is a basic linear transformation often used in linear algebra texts. It looks like this.

$\begin{eqnarray} & T : P^2\rightarrow\mathcal{R}^3\\ & T(f(x)) = \left[ \begin{array}{c} a_1\\ a_2\\ a_3 \end{array}\right]\end{eqnarray}$

$\begin{eqnarray} f(x) = a_1 + a_2x + a_3x^2\end{eqnarray}$

Using this common transformation, we can rewrite each polynomial in our equation above as a vector in $\mathcal{R}^3$.

$\begin{eqnarray} 4x+6 & = & \left[ \begin{array}{c} 6\\4\\0 \end{array} \right] \\x^2+5x+6 & = & \left[ \begin{array}{c} 6\\5\\1 \end{array} \right] \\x^2+4x+3 & = & \left[ \begin{array}{c} 3\\4\\1 \end{array} \right] \\x^2+3x+2 & = & \left[ \begin{array}{c} 2\\3\\1 \end{array} \right]\end{eqnarray}$

And now it is quite clear that this is a basis change, expressing a vector in terms of a different basis.

$\begin{eqnarray} \left[ \begin{array}{c} 6\\4\\0 \end{array} \right] = A\left[ \begin{array}{c} 6\\5\\1 \end{array} \right] + B\left[ \begin{array}{c} 3\\4\\1 \end{array} \right] + C\left[ \begin{array}{c} 2\\3\\1 \end{array} \right]\end{eqnarray}$

We can develop a coordinate change matrix $[T]_{\mathcal{B}}$ as the inverse of the concatenation of the vectors on the righthand side. In coordinate changes, we use the equation $[T]_{\mathcal{B}}\vec{v_{\mathcal{B}}}=\vec{v}$. With this we can easily solve this partial fraction decomposition thus

$\begin{eqnarray}& [T]_\mathcal{B} = \left[ \begin{array}{c c c} 6 & 3 & 2\\ 5 & 4 & 3\\ 1 & 1 & 1 \end{array} \right]^{-1} = \left[ \begin{array}{c c c} \frac{1}{2} & -\frac{1}{2} & \frac{1}{2}\\ -1 & 2 & -4\\ \frac{1}{2} & -\frac{3}{2} & \frac{9}{2} \end{array} \right] \\& \vec{v_{\mathcal{B}}} = \left[ \begin{array}{c} 6\\4\\0 \end{array} \right] \\& [T]_\mathcal{B}\vec{v_{\mathcal{B}}} = \left[ \begin{array}{c c c} \frac{1}{2} & -\frac{1}{2} & \frac{1}{2}\\ -1 & 2 & -4\\ \frac{1}{2} & -\frac{3}{2} & \frac{9}{2} \end{array} \right] \left[ \begin{array}{c} 6\\4\\0 \end{array} \right] = \left[ \begin{array}{c} 1\\2\\-3 \end{array} \right] = \left[ \begin{array}{c} A\\B\\C \end{array} \right]\end{eqnarray}$

and use the $A$, $B$, $C$ result to complete our decomposition as

$\begin{eqnarray} \frac{4x+6}{x^3+6x^2+11x+6} & = & \frac{1}{x+1}+\frac{2}{x+2}+\frac{-3}{x+3}\\\end{eqnarray}$

Well, this is certainly a lot of work for a single application. But, using $[T]_\mathcal{B}$, we can perform any partial fraction decomposition for a rational fraction with this denominator by simple matrix multiplication. In other words, we can repeat this operation, using a different numerator, easily and quickly by plugging in the vector representation of the new numerator above. This should work for any numerator with a degree equal or less than that of the denominator.

Likewise, being an invertible operation, we can add any decomposed fractions with these factors as denominators without the tedious polynomial expansion commonly used. We simply multiply $[T]_{\mathcal{B}}^{-1}$ by the vector representation of $A$, $B$, $C$ and it will spit out the vector form of the numerator. Cool!

This technique really only works, at this point, for rational fractions with easily factored denominators. It may be extendable for use in other cases, but I haven't done the work yet to investigate how easy that might be.