
blog:pushbx:2025:0717_lzexe_data_compression_format [2025-07-17 14:21:16 +0200 Jul Thu] (current)
ecm created
 +====== LZEXE data compression format ======
 +
 +Yesterday [[https://hg.pushbx.org/ecm/inicomp/rev/a8bcc2eb761f|I added the LZEXEDAT compression method]] to inicomp. It currently uses a plain LZEXE-format data stream without any metadata.
 +
 +The compression is done by [[https://hg.pushbx.org/ecm/lzexe/rev/a18c1831d7b3|lzexedat.exe]], a DOS tool that generates the headerless compressed stream. I added [[https://hg.pushbx.org/ecm/lzexe/rev/870593256cf9|the lzexedat.sh script]], which handles leading dotdots (''..'' path components) and then calls dosemu2. (The redirected drive does not allow using a dotdot to go above its root directory, so the script changes the directory of the dosemu2 drive and strips the leading dotdots.)
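
The dotdot handling can be pictured with a small sketch. This is not the code of lzexedat.sh; the function name ''split_leading_dotdots'' and the example paths are hypothetical, and it assumes POSIX-style paths with the dotdots appearing only at the start of the relative path.

```python
import posixpath


def split_leading_dotdots(path, cwd):
    """Fold leading '..' components of a relative path into the base directory.

    Returns (base_dir, remainder): base_dir is where a redirected drive could
    be rooted, remainder is the path below it without any leading dotdots.
    Hypothetical helper for illustration only.
    """
    parts = path.split("/")
    base = cwd
    while parts and parts[0] == "..":
        # Move the base one level up; stay at "/" once the root is reached.
        base = posixpath.dirname(base.rstrip("/")) or "/"
        parts.pop(0)
    return base, "/".join(parts)


if __name__ == "__main__":
    base, rest = split_leading_dotdots("../../tmp/input.bin", "/home/user/proj/src")
    print(base, rest)
```

With the base directory moved up front, the remaining path stays below the drive root, which is the property the redirected drive requires.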
 +
 +===== The depacker =====
 +
 +The depacker worked on the first attempt. There is [[https://hg.pushbx.org/ecm/inicomp/rev/68d05d3f620c|a subsequent change to the file]], but it only changes a comment.
 +
 +I implemented the depacker based on [[https://pushbx.org/ecm/doc/lzexe.htm#format|my description of the format]] in the LZEXE fork's documentation. Unlike the original online depacker, this uses the inicomp overlap detection and also checks for all sorts of overflows.
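
To illustrate what such checks guard against, here is a generic sketch of a bounds-checked match copy for an LZ77-style depacker that unpacks into the same buffer that still holds the unread packed data. It is not the inicomp depacker (which is written in assembly) and does not model the LZEXE bit stream; ''copy_match'', ''DepackError'', and all parameter names are assumptions made for this example.

```python
class DepackError(Exception):
    """Raised when the packed stream would make the depacker misbehave."""


def copy_match(buf, dst, src_pos, distance, length, limit):
    """Copy `length` bytes from `distance` bytes behind `dst`, with checks.

    buf      bytearray: output in [0, dst), unread packed data from src_pos on
    dst      current write position
    src_pos  position of the next unread packed byte
    limit    maximum allowed output size
    Returns the new write position.
    """
    if distance < 1 or distance > dst:
        raise DepackError("match reaches before the start of the output")
    if dst + length > limit:
        raise DepackError("output would exceed the destination size")
    if dst + length > src_pos:
        raise DepackError("output would overwrite unread packed data")
    # Copy byte by byte so that a match may overlap its own output
    # (distance < length repeats the most recent bytes).
    for i in range(length):
        buf[dst + i] = buf[dst + i - distance]
    return dst + length


if __name__ == "__main__":
    demo = bytearray(b"ab" + bytes(6) + b"XX")  # "XX" stands in for packed data
    end = copy_match(demo, 2, 8, 2, 6, 8)
    print(end, bytes(demo))
```

The third check is the overlap detection in this sketch: the write position must never pass the position of the packed data that has not been read yet.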
 +
 +
 +===== The benchmark =====
 +
 +Despite the additional checks, the LZEXEDAT depacker is still reasonably fast. The following is a comparison of almost all inicomp methods on compressing the current lCDebugX as a triple-mode executable. The test was done on our amd64 Debian server, without KVM.
 +
 +Scriptlet used to run the build, compression, and speed tests: ''INICOMP_SPEED_TEST=128 use_build_help_external=1 use_build_help_compressed=1 INICOMP_METHOD="brieflz lz4 snappy exodecr x heatshrink lzd lzo lzsa2 apl bzp zerocomp mvcomp lzexedat" use_build_compress_only=0 build_name=cdebugx time -p ./mak.sh -D_PM=1 -D_DEBUG -D_DEBUG_COND -D_DEFAULTSHOWSIZE''
 +
 +lDebug revision: [[https://hg.pushbx.org/ecm/ldebug/rev/0d6619e128e6|0d6619e128e6]]
 +
 +The results:
 +
 +<code>-rw-r--r-- 1 98816  ../bin/lcdebugx.com
 +real 490.89
 +user 484.39
 +sys 5.73
 +ldebug/source$ LC_ALL=C sort ../tmp/cdebugx.siz
 +   98816 bytes ( 68.19%), method              lzd
 +  102400 bytes ( 70.67%), method          exodecr
 +  104448 bytes ( 72.08%), method              apl
 +  105984 bytes ( 73.14%), method            lzsa2
 +  113664 bytes ( 78.44%), method              lzo
 +  114176 bytes ( 78.79%), method         lzexedat
 +  117248 bytes ( 80.91%), method              lz4
 +  117760 bytes ( 81.27%), method              bzp
 +  121344 bytes ( 83.74%), method       heatshrink
 +  121856 bytes ( 84.09%), method           mvcomp
 +  122368 bytes ( 84.45%), method          brieflz
 +  131072 bytes ( 90.45%), method           snappy
 +  140800 bytes ( 97.17%), method         zerocomp
 +  144896 bytes (100.00%), method             none
 +ldebug/source$ LC_ALL=C sort ../tmp/cdebugx.spd
 +    1.17s for 128 runs (    9ms / run), method         zerocomp
 +    4.65s for 128 runs (   36ms / run), method           snappy
 +   11.21s for 128 runs (   87ms / run), method              lz4
 +   11.22s for 128 runs (   87ms / run), method              bzp
 +   11.60s for 128 runs (   90ms / run), method           mvcomp
 +   11.63s for 128 runs (   90ms / run), method            lzsa2
 +   11.66s for 128 runs (   91ms / run), method         lzexedat
 +   14.43s for 128 runs (  112ms / run), method              lzo
 +   32.72s for 128 runs (  255ms / run), method          brieflz
 +   35.71s for 128 runs (  279ms / run), method          exodecr
 +   36.34s for 128 runs (  283ms / run), method              apl
 +   58.47s for 128 runs (  456ms / run), method       heatshrink
 +  132.34s for 128 runs ( 1033ms / run), method              lzd
 +ldebug/source$ grep -F 'warning: ini' ../tmp/*/pcdebugx.lst | perl -ne 's/^.*warning: (.*) \[-w\+user\]$/$1/; /^([a-zA-Z0-9]+): ([0-9]+)/; printf("%6u $1\n", $2)' | LC_ALL=C sort
 +  1456 inibzp
 +  1488 inizero
 +  1504 inimv
 +  1552 inilzexe
 +  1648 inihs
 +  1744 iniapl
 +  1840 iniexo
 +  1840 inilzsa2
 +  1936 inilz4
 +  1952 inisz
 +  2032 iniblz
 +  2592 inilzo
 +  3424 inilz
 +ldebug/source$</code>
 +
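 +The current LZEXEDAT has a smaller depacker than LZSA2 (1552 versus 1840 bytes), is almost as fast (91 ms versus 90 ms per run), and its output is larger by about 5 percentage points of the uncompressed size (78.79% versus 73.14%). Compared to heatshrink, its depacker is smaller (1552 versus 1648 bytes), it is much faster (91 ms versus 456 ms per run), and its output is smaller by about 5 percentage points (78.79% versus 83.74%).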
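
The size comparison can be recomputed from the listing above. The following sketch only uses the byte counts printed in the results; the helper names are made up for this example. It expresses each gap in percentage points of the uncompressed size, which is how the percentages in the size listing are stated.

```python
# Byte counts taken from the size listing above.
UNCOMPRESSED = 144896
SIZES = {"lzsa2": 105984, "lzexedat": 114176, "heatshrink": 121344}


def percent_of_original(size):
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * size / UNCOMPRESSED


def point_gap(method_a, method_b):
    """Difference between two methods in percentage points of the original."""
    return percent_of_original(SIZES[method_a]) - percent_of_original(SIZES[method_b])


if __name__ == "__main__":
    print("lzexedat vs lzsa2:      %+.2f points" % point_gap("lzexedat", "lzsa2"))
    print("lzexedat vs heatshrink: %+.2f points" % point_gap("lzexedat", "heatshrink"))
```

A positive gap means the first method's output is larger, a negative gap means it is smaller.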
 +
 +We may be able to use LZEXEDAT to compress the debugger's online help pages or the packed ELD library. It remains to be seen whether it compresses equally well for these tasks. Note, however, that the window is currently a fixed 8 KiB, whereas we use 4 KiB for some heatshrink uses.
 +
 +{{tag>lzexe inicomp lzexedat lzsa2 heatshrink ldebug eld}}
 +
 +
 +~~DISCUSSION~~
  
blog/pushbx/2025/0717_lzexe_data_compression_format.txt · Last modified: 2025-07-17 14:21:16 +0200 Jul Thu by ecm