I ran some more meaningful tests comparing the heatshrink and lzexedat depackers. All of these tests are on the amd64 server without KVM. The help and ELD depack tests are with the inline literal store optimisation enabled, apparently speeding up the runs by about 10%, for both heatshrink and lzexedat.
This test was run with the lDebug mak script using the following command line:
ldebug/source$ use_build_compress_only=1 build_name=cdebugx \
  INICOMP_SPEED_TEST=128 INICOMP_METHOD="heatshrink lzexedat" \
  time -p ./mak.sh -D_PM=1 -D_DEBUG -D_DEBUG_COND -D_DEFAULTSHOWSIZE
The results, cdebugx.spd:
58.80s for 128 runs ( 459ms / run), method heatshrink
11.81s for 128 runs ( 92ms / run), method lzexedat
For reference, the file sizes in cdebugx.siz:
144896 bytes (100.00%), method none
119808 bytes ( 82.68%), method heatshrink
113152 bytes ( 78.09%), method lzexedat
With the heatshrink faster optimisation applied, we instead get cdebugx.spd.faster:
35.45s for 128 runs ( 276ms / run), method heatshrink
11.91s for 128 runs ( 93ms / run), method lzexedat
With this optimisation the heatshrink time goes down from about 5 times the lzexedat time to about 3 times.
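The two factors can be reproduced from the numbers in the .spd files. The following is only a small sketch; the function name speed_factor is made up for illustration, and awk is used here rather than bc so that the result is rounded instead of truncated:

```shell
# Hypothetical helper: print how many times longer the first duration
# is than the second, rounded to two decimal places.
speed_factor() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

speed_factor 58.80 11.81   # heatshrink vs lzexedat, plain build
speed_factor 35.45 11.91   # heatshrink vs lzexedat, with the faster option
```

This prints 4.98 for the plain build and 2.98 for the build with the faster option.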
The size is in cdebugx.siz.faster, which is the same as without the faster option:
144896 bytes (100.00%), method none
119808 bytes ( 82.68%), method heatshrink
113152 bytes ( 78.09%), method lzexedat
The following file is run2.sh which is used to run the help depack test:
[ -z "$DOSEMU" ] && DOSEMU=dosemu
[ -z "$INICOMP_METHOD" ] && INICOMP_METHOD="hs lze"
[ -z "$INICOMP_SPEED_TEST" ] && INICOMP_SPEED_TEST="256"
rm -f test2.spd
for method in $INICOMP_METHOD; do
  start="$(date +%s.%3N)"
  "$DOSEMU" < /dev/null -K "${PWD}" \
    -E "$method\\lcdebugx /c=rv2:=#$INICOMP_SPEED_TEST;ytest2.sld" \
    -dumb -quiet -te 2> /dev/null
  end="$(date +%s.%3N)"
  duration="$(echo "scale=3; $end - $start" | bc)"
  printf " %7.2fs for %3u runs (%$((5 + ${INICOMP_SPEED_SCALE:-0}))sms / run), method %16s\n" \
    "$duration" "$INICOMP_SPEED_TEST" \
    "$(echo "scale=${INICOMP_SPEED_SCALE:-0}; $duration * 1000 / $INICOMP_SPEED_TEST" | bc)" \
    "$method" \
    | tee -a "test2.spd"
done
It was created by picking from lDebug's mak.sh.
The test2.sld used by the above contains this source text:
if(!v2)then,rv2:=#16
rv1:=0
uninstall,paging
ext quiet.eld install
quiet on
:loop
?
?f
?r
?c
?run
?e
?v
?re
?b
?l
?source
r,v1,+=1
if(v1!=v2)then,goto:loop
quiet off
q
It uses the quiet.eld to suppress all output during the run, so that the time spent is primarily in the depacking instead of the display of the depacked text. Then it depacks almost every text of the online help, in a loop which runs for V2 iterations.
A test result of this is in test2.spd, reading as follows:
23.80s for 1024 runs ( 23ms / run), method hs
17.17s for 1024 runs ( 16ms / run), method lze
As it turns out, in this test, which is better than the one in the prior blog post, heatshrink again takes roughly 1.4 to 1.5 times as long as lzexedat -4. This is about the same factor as in the extpak.eld test.
The file sizes are similar for both choices:
test$ ls -lgG lze/* hs/* --sort=time
-rw-r--r-- 1 104960 Jul 22 20:41 lze/lcdebugx.com
-rw-r--r-- 1 144896 Jul 22 20:41 lze/lcdebugxu.com
-rw-r--r-- 1 105984 Jul 22 20:40 hs/lcdebugx.com
-rw-r--r-- 1 144896 Jul 22 20:40 hs/lcdebugxu.com
test$
Again, as before, the packed size is 1024 bytes shorter for the lzexedat packed help, while the unpacked size matches exactly.
The script file for the ELD depack test is run3.sh:
[ -z "$DOSEMU" ] && DOSEMU=dosemu
[ -z "$INICOMP_METHOD" ] && INICOMP_METHOD="hs lze"
[ -z "$INICOMP_SPEED_TEST" ] && INICOMP_SPEED_TEST="1"
rm -f test3.spd
for method in $INICOMP_METHOD; do
  cp -a ext"$method".eld extpak.eld
  start="$(date +%s.%3N)"
  "$DOSEMU" < /dev/null -K "${PWD}" \
    -E "lze\\lcdebugx /c=rv2:=#$INICOMP_SPEED_TEST;ytest3.sld" \
    -dumb -quiet -te 2> /dev/null
  end="$(date +%s.%3N)"
  duration="$(echo "scale=3; $end - $start" | bc)"
  printf " %7.2fs for %3u runs (%$((5 + ${INICOMP_SPEED_SCALE:-0}))sms / run), method %16s\n" \
    "$duration" "$INICOMP_SPEED_TEST" \
    "$(echo "scale=${INICOMP_SPEED_SCALE:-0}; $duration * 1000 / $INICOMP_SPEED_TEST" | bc)" \
    "$method" \
    | tee -a "test3.spd"
done
The Script for lDebug file, test3.sld, is a little shorter than the one for the prior test:
if(!v2)then,rv2:=#1
rv1:=0
uninstall,paging
ext quiet.eld install
quiet on
:loop
ext list.eld extpak.eld lib verbose help
r,v1,+=1
if(v1!=v2)then,goto:loop
quiet off
q
This test, it turns out, takes more than a dozen seconds per run. These are the results, test3.spd, of running 4 iterations with either the heatshrink packed or the lzexedat -4 packed extpak.eld:
113.41s for 4 runs (28352ms / run), method hs
73.74s for 4 runs (18434ms / run), method lze
The following shows the sizes of the different ELD library files:
test$ ls -lgG extlib.eld exths.eld extlze.eld --sort=time
-rw-r--r-- 1 253072 Jul 24 14:09 extlib.eld
-rw-r--r-- 1 156556 Jul 23 21:22 exths.eld
-rw-r--r-- 1 151990 Jul 23 21:21 extlze.eld
test$
The depackers of the packed files are almost the same size, as listed by list.eld's verbose mode:
test$ dos 'lze\\lcdebugx /c=ext,list.eld,extlib.eld,verbose;q'
About to Execute : lze\lcdebugx /c=ext,list.eld,extlib.eld,verbose;q
~&ext,list.eld,extlib.eld,verbose
extlib.eld: EXT LIB: library of ELDs.
Format: "ELD1", ELD name: "EXTLIB"
3936 code, 3936 alloc, 1360 data, 1376 alloc
Library contains 60 ELDs, format 0 (uncompressed)
~&q
test$ dos 'lze\\lcdebugx /c=ext,list.eld,exths.eld,verbose;q'
About to Execute : lze\lcdebugx /c=ext,list.eld,exths.eld,verbose;q
~&ext,list.eld,exths.eld,verbose
exths.eld: EXT LIB: library of ELDs.
Format: "ELD1", ELD name: "EXTLIB"
4896 code, 4896 alloc, 1360 data, 5760 alloc
Library contains 60 ELDs, format 2 (heatshrink), packed size 150220
~&q
test$ dos 'lze\\lcdebugx /c=ext,list.eld,extlze.eld,verbose;q'
About to Execute : lze\lcdebugx /c=ext,list.eld,extlze.eld,verbose;q
~&ext,list.eld,extlze.eld,verbose
extlze.eld: EXT LIB: library of ELDs.
Format: "ELD1", ELD name: "EXTLIB"
4880 code, 4880 alloc, 1360 data, 5760 alloc
Library contains 60 ELDs, format 3 (lzexedat -4), packed size 145670
~&q
test$
I changed the help page and ELD library use cases to use the LZEXEDAT format (based on Fabrice Bellard's LZEXE), which compresses slightly better than heatshrink. The speed difference depends somewhat on the use case: for the full-application compression (inicomp), lzexedat does much better than heatshrink, but for the help pages heatshrink might be similar (at least until I get around to some better testing).
One thing is certain, though: The heatshrink compression, when running on the amd64 host, is much much faster than using LZEXE's packer that's run in dosemu2 on the same system (without KVM). For every help page I run heatshrink about 60 times and pick the best result, whereas lzexedat runs once per file and is still much slower.
Heatshrink's depack time is worse than lzexedat's by a factor of 1.5 to 3. The factor of 3 is with some minimal optimisations to heatshrink for the segmented inicomp depacker. LZEXE has the big advantage here of encoding the segment change command in the compressed stream every 40 KiB of depacked data, so it doesn't need to normalise pointers at any other point. (Our inicomp depacker for lzexedat still checks for overflows of the pointers, but this is likely much cheaper than normalising the pointers.)
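For readers unfamiliar with the term: normalising a real-mode segment:offset pointer means moving as much of the offset as possible into the segment, so that the offset ends up below 16 and subsequent accesses have nearly the full 64 KiB available before the offset wraps. The arithmetic can be sketched in shell as follows. This only illustrates the address calculation under that assumption; it is not the code of the inicomp depacker, and the function name is made up.

```shell
# Illustration only: normalise a real-mode segment:offset pair.
# Arguments are numbers as accepted by shell arithmetic (e.g. 0x1000).
normalise() {
  local seg=$(( $1 )) off=$(( $2 ))
  # Each segment unit is a 16-byte paragraph, so the upper 12 bits of
  # the offset are added to the segment and only the low 4 bits remain.
  printf '%04X:%04X\n' $(( (seg + (off >> 4)) & 0xFFFF )) $(( off & 0xF ))
}

normalise 0x1000 0xFFF0   # prints 1FFF:0000
normalise 0x2000 0x1234   # prints 2123:0004
```

Doing this after every pointer update costs several instructions each time, which is what a segment change command in the stream avoids.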
The factor of 1.5 presumably comes from heatshrink encoding everything into its bit stream, whereas the lzexedat format encodes literal bytes, short match distance bytes, the displacement/length combined word, and the additional escaped byte all as immediate bytes or words, not in the bitstream. The bitstream choice may be easier to handle in the depacker than the different choices of encoding elements. Still, for a heatshrink based format it could prove useful to adopt the immediate byte and word encoding.
The lzexedat compressed data is slightly smaller than heatshrink's best result: about 156 kB for heatshrink's extpak.eld vs 152 kB for lzexedat's, or 120 kB for heatshrink's lCDebugX executable vs 113 kB for lzexedat's. This is likely because the three different match commands allow optimising for different data patterns, whereas every heatshrink match command has the same -w and -l field widths encoded into the bitstream. Aside from the segment change command, it could be of interest for a heatshrink based format to allow changing the -w and -l parameters on the fly using special commands in the compressed stream.
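To put numbers on the fixed field widths: as far as I understand the heatshrink format, a literal costs 1 tag bit plus 8 data bits, while a match costs 1 tag bit plus -w bits of distance plus -l bits of length, no matter how small the actual values are. The sketch below works under that assumption. The function names are made up, and the parameter values are arbitrary examples, not necessarily the ones picked for lDebug.

```shell
# Sketch under the assumption stated above.
match_bits() {    # usage: match_bits W L
  echo $(( 1 + $1 + $2 ))
}
literal_bits() {  # usage: literal_bits COUNT
  echo $(( 9 * $1 ))
}

match_bits 11 4   # 16 bits for every match with -w 11 -l 4
match_bits 8 4    # 13 bits for every match with -w 8 -l 4
literal_bits 2    # 18 bits for two literals
```

A nearby short match thus costs exactly as much as a long, distant one, which is where a second, narrower match command or an on-the-fly parameter change could save bits.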
As noted, and quite understandably, lzexedat's packer runs much slower than the native heatshrink packer. It might be of interest to port the lzss.nas file to C to make use of a native port as well. My understanding of the packer is (as to be expected) much more limited than that of the depacker.
As for what else lzexedat could do better, I imagine that some additional commands could be introduced with currently redundant encodings of the segment change or end of stream commands. (The current depacker completely ignores the distance encoded in the combined word for these.) A literals-string command for literals of length 26 or more bytes could be of use, albeit I don't know how common this is in actual data. (2 bits + 16 bits + 8 bits gives 26 bits per escaped command, so 25 or fewer literals would encode shorter using the 9 bit per byte single-literal encoding.)
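The break-even point from the parenthesis can be checked with a little shell arithmetic. This assumes, as in the text, 26 bits of overhead for the hypothetical literals-string command plus 8 bits per literal byte, against 9 bits per literal for the existing single-literal encoding. The function names are made up for illustration.

```shell
single_bits() { echo $(( 9 * $1 )); }        # N separate literal commands
string_bits() { echo $(( 26 + 8 * $1 )); }   # one literals-string command

for n in 25 26 27; do
  echo "$n literals: $(single_bits "$n") bits single, $(string_bits "$n") bits string"
done
```

This gives 225 vs 226 bits for 25 literals, 234 vs 234 for 26, and 243 vs 242 for 27. So 26 literals is an exact tie, and the string command only wins outright from 27 literals on.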