
LZEXE data compression format

Yesterday I added the LZEXEDAT compression method to inicomp. This currently uses a plain LZEXE format data stream, without any metadata.

The compression is done by lzexedat.exe, a DOS tool that generates the headerless compressed stream. I added the lzexedat.sh script, which handles leading dotdots and then calls dosemu2. (The redirected drive does not allow using a dotdot to go above its root directory, so the script changes the directory of the dosemu2 drive and strips the leading dotdots from the path.)
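The dotdot handling can be illustrated with a small sketch. This is not the actual lzexedat.sh (which is a shell script); it is a hypothetical Python helper, split_leading_dotdots, that assumes a relative POSIX-style path and shows the idea: every leading dotdot moves the drive's directory one level up, and the remaining path is what the DOS tool gets to see.

```python
import posixpath


def split_leading_dotdots(path, cwd):
    """Split a relative path into (drive_dir, remaining_path).

    Hypothetical helper, not taken from lzexedat.sh: drive_dir is the host
    directory the emulator drive would be pointed at, remaining_path is the
    path with all leading '..' components removed.
    """
    # Drop empty and '.' components so 'a//b' and './a' behave sensibly.
    parts = [p for p in path.split("/") if p not in ("", ".")]
    drive_dir = cwd
    while parts and parts[0] == "..":
        # Each leading dotdot moves the drive directory one level up.
        drive_dir = posixpath.dirname(drive_dir.rstrip("/")) or "/"
        parts.pop(0)
    return drive_dir, "/".join(parts)
```

For example, with a working directory of /home/u/ldebug/source, the path ../../tmp/t.bin would map to the drive directory /home/u and the remaining path tmp/t.bin.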

The depacker

The depacker worked immediately on the first attempt. There is a subsequent change to the file but it only changes a comment.

I implemented the depacker based on my description of the format in the LZEXE fork's documentation. Unlike the original online depacker, this uses the inicomp overlap detection and also checks for all sorts of overflows.
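To give an idea of what such checks look like, here is a generic sketch of a checked LZ-style match copy. It is not the inicomp code and does not parse the LZEXE bit stream; the names DepackError, copy_match and check_no_overlap are made up for illustration, and the in-place layout (packed data still waiting to be read at higher offsets of the same buffer) is an assumption.

```python
class DepackError(Exception):
    """Raised when the packed stream would read or write out of bounds."""


def copy_match(out, distance, length, capacity):
    """Append `length` bytes copied from `distance` bytes back in `out`.

    Illustrative only: rejects matches that reach before the start of the
    output and matches that would grow the output past `capacity`.
    """
    if distance <= 0 or distance > len(out):
        raise DepackError("match source lies before start of output")
    if len(out) + length > capacity:
        raise DepackError("match overflows destination buffer")
    start = len(out) - distance
    for i in range(length):
        # Byte-wise copy, so a match longer than its distance repeats data.
        out.append(out[start + i])


def check_no_overlap(write_pos, count, read_pos):
    """Assumed in-place layout: writing must stay below unread packed data."""
    if write_pos + count > read_pos:
        raise DepackError("destination would overwrite unread source data")
```

A depacker built this way fails with an error on a corrupted stream instead of silently overwriting memory, at the cost of a few extra comparisons per match.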

The benchmark

Despite the additional checks, the LZEXEDAT depacker is still reasonably fast. The following is a comparison of almost all inicomp methods on compressing the current lCDebugX as a triple-mode executable. The test was done on our amd64 Debian server, without KVM.

Scriptlet used to run the build, compression, and speed tests:

INICOMP_SPEED_TEST=128 use_build_help_external=1 use_build_help_compressed=1 INICOMP_METHOD="brieflz lz4 snappy exodecr x heatshrink lzd lzo lzsa2 apl bzp zerocomp mvcomp lzexedat" use_build_compress_only=0 build_name=cdebugx time -p ./mak.sh -D_PM=1 -D_DEBUG -D_DEBUG_COND -D_DEFAULTSHOWSIZE

lDebug revision: 0d6619e128e6

The results:

-rw-r--r-- 1 98816  ../bin/lcdebugx.com
real 490.89
user 484.39
sys 5.73
ldebug/source$ LC_ALL=C sort ../tmp/cdebugx.siz
   98816 bytes ( 68.19%), method              lzd
  102400 bytes ( 70.67%), method          exodecr
  104448 bytes ( 72.08%), method              apl
  105984 bytes ( 73.14%), method            lzsa2
  113664 bytes ( 78.44%), method              lzo
  114176 bytes ( 78.79%), method         lzexedat
  117248 bytes ( 80.91%), method              lz4
  117760 bytes ( 81.27%), method              bzp
  121344 bytes ( 83.74%), method       heatshrink
  121856 bytes ( 84.09%), method           mvcomp
  122368 bytes ( 84.45%), method          brieflz
  131072 bytes ( 90.45%), method           snappy
  140800 bytes ( 97.17%), method         zerocomp
  144896 bytes (100.00%), method             none
ldebug/source$ LC_ALL=C sort ../tmp/cdebugx.spd
    1.17s for 128 runs (    9ms / run), method         zerocomp
    4.65s for 128 runs (   36ms / run), method           snappy
   11.21s for 128 runs (   87ms / run), method              lz4
   11.22s for 128 runs (   87ms / run), method              bzp
   11.60s for 128 runs (   90ms / run), method           mvcomp
   11.63s for 128 runs (   90ms / run), method            lzsa2
   11.66s for 128 runs (   91ms / run), method         lzexedat
   14.43s for 128 runs (  112ms / run), method              lzo
   32.72s for 128 runs (  255ms / run), method          brieflz
   35.71s for 128 runs (  279ms / run), method          exodecr
   36.34s for 128 runs (  283ms / run), method              apl
   58.47s for 128 runs (  456ms / run), method       heatshrink
  132.34s for 128 runs ( 1033ms / run), method              lzd
ldebug/source$ grep -F 'warning: ini' ../tmp/*/pcdebugx.lst | perl -ne 's/^.*warning: (.*) \[-w\+user\]$/$1/; /^([a-zA-Z0-9]+): ([0-9]+)/; printf("%6u $1\n", $2)' | LC_ALL=C sort
  1456 inibzp
  1488 inizero
  1504 inimv
  1552 inilzexe
  1648 inihs
  1744 iniapl
  1840 iniexo
  1840 inilzsa2
  1936 inilz4
  1952 inisz
  2032 iniblz
  2592 inilzo
  3424 inilz
ldebug/source$

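The current LZEXEDAT has a smaller depacker than LZSA2 (1552 versus 1840 bytes), is almost as fast (91 ms versus 90 ms per run), and compresses about 5.6 percentage points worse (78.79% versus 73.14% of the uncompressed size). In comparison to heatshrink, its depacker is smaller (1552 versus 1648 bytes), it is much faster (91 ms versus 456 ms per run), and its output is about 5 percentage points smaller (78.79% versus 83.74%).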
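The size comparison can be recomputed from the byte counts in the listing above. The following sketch is only a convenience for checking the arithmetic; the function names are made up, and the sizes are copied from the cdebugx.siz output.

```python
# Sizes in bytes, copied from the sorted cdebugx.siz listing.
SIZES = {
    "none": 144896,
    "lzsa2": 105984,
    "lzexedat": 114176,
    "heatshrink": 121344,
}


def ratio_percent(method):
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * SIZES[method] / SIZES["none"]


def point_difference(a, b):
    """Difference of the two ratios in percentage points (a minus b)."""
    return ratio_percent(a) - ratio_percent(b)


def relative_size_percent(a, b):
    """How much larger (positive) or smaller (negative) a is than b."""
    return 100.0 * (SIZES[a] / SIZES[b] - 1.0)
```

Measured in percentage points of the uncompressed size, LZEXEDAT is about 5.7 points behind LZSA2 and about 4.9 points ahead of heatshrink. Measured relative to the other method's output, the LZEXEDAT result is roughly 7.7% larger than the LZSA2 one and roughly 5.9% smaller than the heatshrink one.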

Perhaps we may use LZEXEDAT for compressing the debugger's online help pages or the packed ELD library. It remains to be seen whether it will compress equally well for these tasks. Note, however, that the LZEXEDAT window is currently fixed at 8 KiB, whereas some of our heatshrink uses have a 4 KiB window.

blog/pushbx/2025/0717_lzexe_data_compression_format.txt · Last modified: 2025-07-17 14:21:16 +0200 Jul Thu by ecm