by ferris

MUSHROOM MUSHRO... wait

I've been off and on with writing these lately; that kinda sucks, so gonna try to write some stuff today and be better at this moving forward. This is for a couple reasons, one of which being me trying to keep what I'm currently working on a bit secret. But the balance has to be a bit more on the open side I think, as I've been doing some really neat stuff (or at least neat to me, hehe)! :)

Basically it's stuff for Solskogen. I don't want to share too much about the project itself, but part of it involves a packer, and ironically it's not the 64k packer I've been working on previously, but an LZ-based one instead! Since I've been learning so much about compression in the last little while (and a bit last summer as well), it only feels natural that I work on multiple packers at once. PACKERS PACKERS PACKERS

One of the things about LZ packers that I always overlook when thinking about them is that the searching/parsing is actually somewhat interesting. Particularly with the packer I'm working on, we have to find possible match edges once, and then recompress the data multiple times, tweaking the encoding each time to try to find an optimal one. Since we don't want to take several minutes to recompress each iteration (ideally no more than a second or two in the optimal case), we have to be a bit smart here.

The scheme I ended up going with has an initial parse phase that essentially augments the input data with a DAG structure, where edges represent possible matches from a given position. The only thing we have to be careful about here is a really nasty degenerate case: if we try to perform a naive match search through a long string of the same value (which *is* going to be a problem since we're compressing binary data, where some data has to exist at specific offsets), our execution time actually goes up to O(N^3), since for each position (N) we have to look at each possible previous position where a potential match is (N) and then test all the matching positions after that for a match (N again)! However, this is fairly easy to work around if we allow ourselves to be *slightly* suboptimal here by just greedily accepting long runs as one big match. In my tests this only changes the ratio by a *very* small amount (+/- 8 bytes or so), so it's quite acceptable, and it can be the difference between taking 60 seconds and 0.6 seconds to parse, so well worth doing.
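A quick sketch of that run-collapsing workaround: when the match finder lands in a long run of identical bytes, measure the run once and greedily accept it as a single big match instead of testing every previous position. The function names here are made up for illustration, not from the actual packer.

```rust
// When the match finder hits a run of identical bytes, measure it once
// and emit a single offset-1 match covering the whole run, sidestepping
// the O(N^3) blowup of the naive search.
fn find_run_len(data: &[u8], pos: usize) -> usize {
    let b = data[pos];
    data[pos..].iter().take_while(|&&x| x == b).count()
}

fn main() {
    // A degenerate input: 10,000 identical bytes. A naive search would
    // test every (position, candidate, length) triple; here we emit one
    // greedy match instead.
    let data = vec![0u8; 10_000];
    let run_len = find_run_len(&data, 1);
    println!("greedy match: offset 1, length {}", run_len);
}
```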

Once we have our input and DAG, we can try some encodings and find bit-optimal parsings for those encodings. This turns out to be equivalent to a single-source shortest path problem over inherently topologically sorted DAG nodes, and there are plenty of nice algorithms for solving this (I went with the latter half of Bellman-Ford, as it was really convenient to implement, and I figured we'd end up exhaustively trying the edges with Dijkstra anyway due to how the graphs look). When we find these parsings (which are represented by paths through our graph), we can then try to tweak our encoding to see if we can optimize the data size further. This forms a little feedback loop that ends up being pretty effective.
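A minimal sketch of that bit-optimal parse: nodes are input positions (already in topological order), edges are the literal/match choices from the DAG, and we relax edges in one forward pass, which is the "latter half of Bellman-Ford" bit. The edge costs below are invented for illustration.

```rust
// One edge in the parse graph: taking it advances to `to` and spends
// `cost_bits` bits of encoded output.
struct Edge {
    to: usize,
    cost_bits: u32,
}

fn cheapest_parse_bits(edges_at: &[Vec<Edge>]) -> u32 {
    let n = edges_at.len();
    let mut best = vec![u32::MAX; n + 1];
    best[0] = 0;
    // Positions are visited in order, so each node's final cost is
    // settled before any of its outgoing edges are relaxed.
    for pos in 0..n {
        if best[pos] == u32::MAX {
            continue;
        }
        for e in &edges_at[pos] {
            best[e.to] = best[e.to].min(best[pos] + e.cost_bits);
        }
    }
    best[n]
}

fn main() {
    // Three input positions: literals cost 9 bits each, plus one match
    // edge covering positions 0..3 for 14 bits.
    let edges_at = vec![
        vec![Edge { to: 1, cost_bits: 9 }, Edge { to: 3, cost_bits: 14 }],
        vec![Edge { to: 2, cost_bits: 9 }],
        vec![Edge { to: 3, cost_bits: 9 }],
    ];
    // min(9 + 9 + 9, 14) = 14
    println!("cheapest parse: {} bits", cheapest_parse_bits(&edges_at));
}
```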

The thing I struggled a bit with was how to adaptively build an encoding scheme for a given set of edges, but it turns out to be not that difficult, although the implementation is admittedly a *bit* hairy because speed can vary wildly based on the approach. Basically, we want to encode our LZ distance/length pairs so that more common distance/length values are encoded with fewer bits than less common ones. This sounds like a job for Huffman codes, but at this data size we'd end up spending too much space encoding the Huffman table, so that's out. Universal codes are a step in the right direction (particularly Elias gamma codes), since typically smaller values are a lot more likely than larger ones.
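For reference, here's a tiny sketch of Elias gamma coding, the universal code just mentioned: a value n >= 1 is written as (bit-length - 1) zero bits followed by n itself in binary, so small values get short codes without any table being transmitted.

```rust
// Elias gamma code: (bit_length - 1) zeros, then the value in binary.
// 1 -> "1", 2 -> "010", 5 -> "00101", etc.
fn gamma_encode(n: u32) -> String {
    assert!(n >= 1, "gamma codes encode values >= 1");
    let bits = 32 - n.leading_zeros(); // number of significant bits in n
    let mut out = String::new();
    out.extend(std::iter::repeat('0').take((bits - 1) as usize));
    for i in (0..bits).rev() {
        out.push(if n & (1 << i) != 0 { '1' } else { '0' });
    }
    out
}

fn main() {
    for n in [1u32, 2, 5, 17] {
        println!("{:2} -> {}", n, gamma_encode(n));
    }
}
```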

It turns out that a good solution is going to be some sort of middle ground here. One compressor I ended up looking at (and stealing a lot of ideas from) was exomizer - a popular compressor in the demoscene due to its consistently very good ratios and the fact that its decompressors have been optimized to run in very limited execution environments. What exomizer does is represent an encoding as a set of ranges that form a contiguous range of integers. Each range has a minimum value along with a number of bits to read further to obtain an offset from the minimum value. This offset/bias scheme is very similar to universal codes (especially when we encode table indices as unary codes, similar to exponents in gamma coding), but since it's adaptive to the input, its implicit symbol probability ends up being much closer to the actual probabilities in the input. Like Huffman codes, it transmits the encoding info along with the compressed data (simply as the number of additional bits for each range, which ends up being 4 bits * 16 entries = 8 bytes in most cases). This ends up being much smaller than the equivalent Huffman code data at the expense of being slightly less optimal, but appears to be a very well-balanced compromise (and easily outperforms universal codes). Note that I'm leaving out some details here ofc; the rough idea is what matters. Also, interestingly enough, exomizer's author claims many of the ideas used in this particular kind of encoding actually came from previous demoscene packers, although I've had a hard time finding information about similar encodings elsewhere (perhaps there are some key terms here I'm missing).
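A sketch of that range/bias idea as described above: each table entry covers a contiguous slice of the integer range, and a value is sent as its entry index (e.g. in unary, like a gamma exponent) plus that entry's number of raw offset bits from the entry's base. The 4-entry table here is invented for the example; exomizer's real tables are 16 entries and chosen adaptively.

```rust
// Build each entry's base value so the ranges tile one contiguous run
// of integers: entry i covers [base[i], base[i] + 2^extra_bits[i]).
fn build_bases(extra_bits: &[u32]) -> Vec<u32> {
    let mut bases = Vec::with_capacity(extra_bits.len());
    let mut next = 1; // assume 1 is the smallest encodable value
    for &b in extra_bits {
        bases.push(next);
        next += 1u32 << b;
    }
    bases
}

// Find the entry whose range contains `value`; the pair (entry index,
// offset from the entry's base) is what gets transmitted.
fn encode(value: u32, bases: &[u32]) -> (usize, u32) {
    let i = bases.iter().rposition(|&b| value >= b).expect("below table minimum");
    (i, value - bases[i])
}

fn main() {
    let extra_bits = [0u32, 1, 2, 4];
    let bases = build_bases(&extra_bits); // [1, 2, 4, 8] -> covers 1..=23
    for v in [1u32, 3, 10] {
        let (i, offset) = encode(v, &bases);
        println!("{:2} -> entry {}, {} offset bits, offset {}", v, i, extra_bits[i], offset);
    }
}
```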

To find such a range set, exomizer (and my compressor as well) ends up performing a brute-force search of the range combinations that can encode the whole available range of integers, scored against the given integer frequencies. Since the space is large (but solution-rich), the implementation details are fairly crucial here. A basic recursive implementation, while not requiring much working memory, would take no less than several thousand human lifetimes to complete (roughly). So we have to be a bit clever. Luckily, we don't have to be *that* clever: if we extend a basic search to include sub-result caching and use the appropriate structures to represent these sub-results (i.e. immutable singly-linked lists), we can get this search down to something bearable (a second or two, max). And this ended up actually being pretty fun to do in Rust, especially with so many nice off-the-shelf libs available. :)
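A toy version of that search with sub-result caching, to show the shape of the trick: recursively pick a width for each table slot, and memoize on (first uncovered value, slot index) so identical subproblems are only solved once. The real thing differs (the post mentions immutable singly-linked lists for the cached sub-results; a plain HashMap stands in here), and the cost model and names are all illustrative.

```rust
use std::collections::HashMap;

// Choose extra-bit widths for each table slot so the ranges cover
// values 1..=max_value, minimizing total encoded size. Returns None if
// the remaining slots can't cover the remaining values.
fn search(
    freq: &[u64],   // freq[v] = occurrences of value v (index 0 unused)
    start: usize,   // first value not yet covered by a range
    slot: usize,    // current table slot
    slots: usize,   // total table slots available
    memo: &mut HashMap<(usize, usize), Option<u64>>,
) -> Option<u64> {
    let max_value = freq.len() - 1;
    if start > max_value {
        return Some(0); // everything covered, no further cost
    }
    if slot == slots {
        return None; // ran out of slots before covering the range
    }
    if let Some(&cached) = memo.get(&(start, slot)) {
        return cached; // identical subproblem already solved
    }
    let mut best: Option<u64> = None;
    for bits in 0..=8u32 {
        let end = (start + (1usize << bits)).min(max_value + 1);
        // Cost per occurrence in this range: unary slot index
        // (slot + 1 bits) plus `bits` raw offset bits.
        let occurrences: u64 = freq[start..end].iter().sum();
        let range_cost = occurrences * (slot as u64 + 1 + bits as u64);
        if let Some(rest) = search(freq, start + (1usize << bits), slot + 1, slots, memo) {
            let total = range_cost + rest;
            if best.map_or(true, |b| total < b) {
                best = Some(total);
            }
        }
    }
    memo.insert((start, slot), best);
    best
}

fn main() {
    // Values 1..=4 with skewed frequencies; cover them with 2 slots.
    let freq = [0u64, 10, 5, 2, 1];
    let mut memo = HashMap::new();
    println!("best total size: {:?} bits", search(&freq, 1, 0, 2, &mut memo));
}
```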

All in all, in a few weeks I've managed to build a pretty competitive packer that lands the stuff I've worked on so far almost exactly where I had estimated (and hoped), which is really nice. So the next phase will be on some more creative parts of the project, which you'll see at Solskogen :)

Phew, this was a bit more scatterbrained than I had hoped, but it's always good to braindump a bit, and that's part of the charm, right? :) If you have any additional questions, you know where to find me..

Last Edited on Mon Jun 19 2017 13:59:47 GMT-0400 (EDT)