Sunday, August 8, 2010

A Haskell approach to Huffman Coding

I just ran across Heinrich Apfelmus' discussion of morse code and I think it's pretty cool. I like the progression from a straightforward initial approach to a very clever result. I was really struck by the similarities of parsing morse code with a tree and Huffman Coding. My goal this week is to try to apply Apfelmus' technique to Huffman coding.

There are some important distinctions between the morse code tree and Huffman Coding. First, in Huffman coding, the leaves are important. The symbols are stored in the leaves of the tree instead of in the nodes as in the morse code solution. Additionally, the Huffman tree structure itself encodes the frequency of the symbols in the source, a consideration that is not required in the morse code. This changes the interpreter required for constructing the tree. Also, the Huffman coding tree should be constructed specifically for the data set it is being used to encode. For a reasonably consistent data set, like large amounts of English text, the frequency of symbols for the language in general can certainly be used with reasonable results. But, for encoding other kinds of data, ideally the source should be used to generate frequencies of symbols unique to that source. I'm probably going to ignore this and just use some reasonable standard frequencies instead.

So, that's my plan for the week, we'll see how it goes.

No comments:

Post a Comment