Wednesday, September 30, 2009

Projection Pursuit in Haskell, pt 3

Breakthrough.

Sometimes problem solving requires just a lot of thought. And I don't mean brute force thought, but the gentle, patient thought that comes from relaxing about a problem and just letting your brain work on it for a while. I find I come up with some of my best ideas in the shower. Yeah, I know... But, when something has really been bothering me -- remaining unsolved despite my best efforts -- the solutions typically come at night. Something wakes me up and then I'm up and my mind won't stop and then the solution comes.

Such it was last night. I haven't had a lot of time to put into this effort lately, and what effort I've made has been frustrating. If you look at the original Octave source code you'll see that there are a lot of parameters controlling the algorithm. We've got the raw data itself, the size of the gradient probe, the proportion of the gradient to use, the number of iterations. It's a lot of stuff to cart around. But, I'll get to that in a moment.

First, I had spent a lot of time writing some matrix and vector operators. This was a good exercise, though I've ultimately thrown all that code out. One crucial part of this algorithm is the use of the right singular vectors from the singular value decomposition. I certainly didn't want to write an svd function for this exercise. Google to the rescue! I found hmatrix, and a solution to all my matrix handling problems. So, using hmatrix caused me to dig back through the bits of code I'd already written. At the core of the projection pursuit algorithm is the calculation of the gradient. This solution uses an estimation of the gradient by probing in all dimensions and measuring the change in the kurtosis.
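
For reference, here's roughly how I'm getting at those right singular vectors with hmatrix. The helper name is mine rather than anything from the final code, and the orientation of svd's result (whether the vectors you want are the rows or the columns of the third matrix) is worth double-checking against your hmatrix version. All of the snippets in this post assume an import Numeric.LinearAlgebra at the top.

{-- right singular vectors of the data matrix: svd m gives (u, s, v),
    with m recoverable as u times diag s times the transpose of v,
    so the vectors we want are the columns of v --}
rightSingularVectors :: Matrix Double -> [Vector Double]
rightSingularVectors m = toColumns v
  where
    (_, _, v) = svd m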

So, let's define some helper functions:

{-- calculate modified kurtosis: the mean of the fourth powers, less 3
    (the elementwise (**) comes from hmatrix's Vector instances) --}
kurtosis :: Vector Double -> Double
kurtosis y = mean (y ** 4) - 3

{-- calculate the mean of a vector --}
mean :: Vector Double -> Double
mean y = go 0 0 $ toList y
  where
    go :: Double -> Int -> [Double] -> Double
    go s l []     = s / fromIntegral l
    go s l (x:xs) = go (s+x) (l+1) xs


{- create a list of vectors to probe h distance in all directions -}
hs :: Double -> Int -> [Vector Double]
hs h r = toRows . diag . fromList . take r $ repeat h
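
As a quick sanity check (the h and the dimension here are just made-up values), hs produces one probe vector per dimension:

{-- each probe vector moves h along exactly one axis --}
probeCheck :: [[Double]]
probeCheck = map toList (hs 0.01 3)
{-- [[1.0e-2,0.0,0.0],[0.0,1.0e-2,0.0],[0.0,0.0,1.0e-2]] --}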



Using these, we can calculate the estimated gradient:

{-- calculate the gradient of a function given a bunch of starting information                           
f - the function to compute gradient of
h - the probe distance... the smaller the better
z - the raw data set needed for future projections
w - a starting vector/point at which to calculate the gradient --}

gradient :: (Vector Double -> Double) -> Double -> Matrix Double -> Vector Double -> Vector Double
gradient f h z w = probe ws
  where
    ws = map ((+) w) . hs h $ rows z
    k  = f $ w <> z

    probe :: [Vector Double] -> Vector Double
    probe = fromList . probe'

    probe' :: [Vector Double] -> [Double]
    probe' []     = []
    probe' (x:xs) = deltaK x / h : probe' xs

    deltaK :: Vector Double -> Double
    deltaK x = k - f (x <> z)


This is really pretty straightforward. We calculate a starting kurtosis value, generate a bunch of vectors that shift h-amount along each dimension, calculate the change in kurtosis for each of those, divide by h, and concatenate the results into a new vector: the estimated gradient.
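
To see why this starts to get unwieldy, a direct call has to thread every control parameter through every call site. The h value and the little 2x3 data matrix below are made up purely for illustration:

{-- calling gradient directly: everything rides along on every call --}
exampleGrad :: Vector Double
exampleGrad = gradient kurtosis 0.001 z w
  where
    z = fromLists [ [ 1.0,  2.0, -1.0 ]
                  , [ 0.5, -2.0,  1.5 ] ]  -- two rows, so w lives in 2-D
    w = fromList [1.0, 0.0]                -- starting projection direction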

The real breakthrough for me came when I was struggling with how ugly it was going to be to pass around all this junk. Just ugly and I can't stand it. So last night, while I was lying awake fretting about this, it finally hit me. I don't have to pass around all this stuff. I can just partially apply the gradient function to all the algorithm control parameters, and the raw data, and then just pass that around. So at some point I'll do this:


{-- build a gradient function for *this* problem --}
grad :: Vector Double -> Vector Double
grad = gradient kurtosis h z
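
Eventually the update loop will just nudge w along grad and renormalize. The step function, the alpha factor, and the renormalization below are my guesses at that loop, a sketch rather than the finished algorithm:

{-- one gradient step: move a fraction alpha along the estimated gradient,
    then rescale back to a unit direction --}
step :: Double -> (Vector Double -> Vector Double) -> Vector Double -> Vector Double
step alpha g w = unit (w + scale alpha (g w))
  where
    unit v = scale (1 / sqrt (v <.> v)) v

{-- e.g. run a fixed number of iterations from a starting direction w0:
    iterate (step 0.1 grad) w0 !! niters --}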


Now I can just pass around grad and it will carry around all the baggage for me. I know. This is a really simple part of Haskell, and it's something that is done all the time. I've certainly done it many times. But for some reason it eluded me for a really long time. The real problem was one of not thinking about this function properly. Yes, all that data is needed to calculate the gradient. But I am not calculating the gradient with respect to all of that stuff. The function, the raw data, and the h distance are all fixed within the context of the problem. That is, there is only one true variable in this application of the gradient function, and that is the current position, w. All along I had been thinking about h and f and z as variables, and thus I had to keep passing them around. And they are variables within the context of the whole algorithm, but at the level of the gradient, they aren't. Huh. It took me a long time to bend my brain around to seeing this the right way.

Anyway, there it is. Now that school is back in, I get stolen moments on the bus to work on this, so hopefully the rest will come soon.

The Summer of Chaos

So, time to reminisce about the summer just past. Wow! It's amazing how life throws you curve balls.

The Plan. Heading out of last spring, I was anticipating a summer of general-ed requirements to speed me on my way towards graduation, a little programming work on the side, research on grad-school programs and lots of time for personal projects to be posted here. This was to go alongside taking care of my three daughters and generally being "summery." Cool. Good plan. A little ambitious, but doable.

The Reality. Business took a nose-dive. So much so that I had to go back behind the bar for the summer to keep things afloat. We pared down to a skeleton staff with me working 4 shifts a week slinging beers. That's just what needed doing.

Meanwhile, I did get some side work, technically an internship, though I'm not bothering to take the educational credit, just the CV listing. I spent some time learning Python, flexing my Debian admin skills and playing around with EWU's Living Laboratory project under Dr. Jeff Putnam. It was a fun project. I got pretty comfortable with Python, the basics of Postgresql, and Django. Also, I got to play with specifying and implementing the simplest of Domain Specific Languages to allow user-entered code to run on our server. That project is ongoing, though I've got precious little time for it now that school is back in. Hopefully, I'll get to extend (or create anew) that language for specifying graphs as well.

So, personal projects, grad-school research? All by the wayside. My plans to port my Projection Pursuit algorithm to Haskell got pushed back. A *ton* of reading (Computational Category Theory, A Transition to Advanced Mathematics, Goedel, Escher, Bach, etc.) postponed indefinitely. Grad school selection is only getting started now, and that is such a daunting task! Dr. Putnam suggests I find blogs I like, follow all of their blog links and assemble this graph of blogs and schools and then run a clustering algorithm looking for commonality and hope that helps me narrow down the choices. Hah! That would probably work really well, but given the time available, would take until long past the application deadlines. Plus, I have to prep for the GRE's in a week! Wah!

The Results. Well, what can I say? It is what it is, and I keep marching ahead. Things never go quite the way you expect, but in the end, they work out. I think. I'm getting that Projection Pursuit done. I'm keeping up with my studies. I'm gradually narrowing down the choices for grad-school applications. And somehow I'm still enjoying my family and keeping food on the table. Not bad :-)