PAQing it all up!

PAQing it all up!

by ferris

This week I did more packer work, specifically on compression. It mainly consisted of a lot of information absorption (and loss of sleep) trying to wrap my head around the terminology/workings of the PAQ series of compressors, but by the weekend I had gone through enough to start implementing :)

At this point my brain's a bit tired and I need a bit of a break from things, so this will be a quick writeup, but here goes.

I originally thought I'd go for a simple LZ77 + arithmetic coder scheme, as these are things I'm at least familiar with. But I thought it'd be better for learning and compression ratio if I dug into some more advanced algorithms.

What I ended up implementing was based on PAQ, crinkler and kkrunchy where it's an arithmetic coder that uses predictions from multiple predictors mixed together somehow. In my case I went with a neural network mixer (more or less ripped from PAQ), and the models are pretty straightforward context models, plus some ones I tried to build that were x86-aware but are still pretty naive, even though they gain a few percent. I'm skipping over a lot of the details as they can be found elsewhere, and my implementation isn't really mature enough to decide on them yet anyways. Also, I do plan to open-source this at one point, so yeah. Will probably do some more hearty explanation then.

All in all, at the end of the week I've got up to ~77% compression on my test executables (well, the part that gets compressed, but that's a detail for later), which is pretty decent for how simple the method is. It's still around 5% worse than kkrunchy (which is a few percent worse than PAQ), but for the time spent I'm pretty happy with it. I'm sure with another few weeks I can improve that considerably as well :)

I also spent some time porting the decompressor + model to ASM for the packer stub, and got through the arithmetic decoder in 98 bytes of very simple asm. But I realized very quickly that the method/tools I'm using to pack up the stub don't provide any mechanism for uninitialized data sections, which I'll need for the full model. So I think I'll need to support a simple object format and do a tiny linker to fix that. Currently I'm looking into nasm's RDOFF, as it's supposedly a very simple and minimal format, but documentation is a bit scarce, so we'll see how that goes.

Anyways, still pretty close to something usable, but given the time budget I've got until the next party, I might have to drop this project until then. Which I'm fine with; my original compression goal was 75% (which I beat in a couple days after learning some new techniques, really), and I've got some ideas brewing for how I can improve that, just need to spend a few more weeks on that, which I don't want to do over the visual tech I have to work on. So yeah. :)

In other news, I've also been working on finishing up a nice glitch/ambient track for a demo with some friends of mine. Stay tuned for that :)

Until next time!

Last Edited on Tue Apr 04 2017 10:18:09 GMT-0400 (EDT)