2025-07-20
This week I got to put LZEXEDAT, the plain LZEXE format compression, to more uses.
Replace the even directive by alignb 2 to avoid initialising part of the data segment if alignment is needed. (Our macro for even tried to detect BSS segments, but the data segment wasn't matched.)
Add a -4 switch to lzexedat.
Add a -v switch, and add single-line output otherwise.
The percentage remaining display re-uses the 8086-compatible 32/32 division (with 32-bit dividend, divisor, and quotient) in lDebug's expression evaluator, itself copied from the book Art of Assembly, to create a 48/32 division (with 48-bit dividend and quotient, and 32-bit divisor).
I initially forgot the xor dx,dx before the division by 100. The long division code (adapted from the Art of Assembly example) worked as expected right away, which is somewhat surprising. Also, as mentioned in the book, a division by zero returns with the 48-bit quotient filled with an all-1s pattern. This happens if the input file to lzexedat is empty.
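The Art of Assembly style long division works bit by bit. It shifts the next dividend bit into a running remainder, and whenever the remainder is at least the divisor it subtracts the divisor and sets a quotient bit. The following is a minimal Python sketch of that shift-and-subtract idea, widened to a 48-bit dividend and 32-bit divisor. It is not the lDebug assembly, and the name div48_32 is hypothetical. The sketch also shows why a zero divisor yields an all-1s quotient: the test `remainder >= 0` always succeeds, so every quotient bit is set. The usage line assumes, purely for illustration, that the display computes part * 100 / whole. A 32-bit value times 100 needs up to 39 bits, which is why a wider dividend helps.

```python
MASK48 = (1 << 48) - 1
MASK32 = (1 << 32) - 1


def div48_32(dividend: int, divisor: int) -> tuple[int, int]:
    """Restoring shift-and-subtract division (sketch, not the lDebug code).

    Returns (48-bit quotient, remainder). Python ints are unbounded, so the
    running remainder never overflows here, unlike a fixed-width register.
    """
    dividend &= MASK48
    divisor &= MASK32
    quotient = 0
    remainder = 0
    # Walk the dividend from its most significant bit down to bit 0.
    for bit in range(47, -1, -1):
        # Shift the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # If the divisor "fits", subtract it and record a 1 bit.
        # With divisor == 0 this test always succeeds: all quotient bits are 1.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient & MASK48, remainder


# Hypothetical usage: part * 100 exceeds 32 bits but fits in 48 bits.
part, whole = 3_000_000_000, 4_000_000_000
pct, _ = div48_32(part * 100, whole)
print(pct)  # 75
```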
Adapting the ELD depack.asm's lzexedat depacker for the help depacker presented some problems:
Now back to the lDebug changes.
In the following tests, directory 20250718 is from revision f69d69ec9365 (still with heatshrink-compressed help) and 20250719 is from revision 2c94004cd387 (with lzexedat-compressed help).
With the lzexedat -4 format now used for the packed help instead of the heatshrink format, the compressed debugger executable shrank by 1024 bytes while the uncompressed debugger executable stayed the same size:
test$ ls -l 20250718
total 248
-rw-r--r-- 1 ecm ecm 105984 Jul 18 23:42 lcdebugx.com
-rw-r--r-- 1 ecm ecm 144896 Jul 18 23:41 lcdebugxu.com
test$ ls -l 20250719
total 248
-rw-r--r-- 1 ecm ecm 104960 Jul 19 23:56 lcdebugx.com
-rw-r--r-- 1 ecm ecm 144896 Jul 19 23:54 lcdebugxu.com
test$
The following tests were run using the bash alias:
alias dos='dosemu -K "$PWD" -dumb -td -kt -q -quiet -E'
I initially ran the test using a long /C= command line to lCDebugx:
test$ time -p dos '20250718\lcdebugx /c=uninstall,paging;:loop;?;?f;?r;?c;?run;?e;?v;?re;?b;?l;?source;r,v1,+=1;if(v1!=#16)then,goto:loop;q'
However, this proved unusable with the default dosemu2 revision on the server. It is also difficult to avoid all blanks, and blanks require quotes, which it seems cannot be passed properly to the dosemu script. So I adapted the above scriptlet into the following Script for lDebug file:
test$ cat test.sld
uninstall,paging
:loop
?
?f
?r
?c
?run
?e
?v
?re
?b
?l
?source
r,v1,+=1
if(v1!=#16)then,goto:loop
q
The speed of both formats is about equal, though lzexedat -4 is perhaps slightly slower. These tests were run on the AMD A10-7870K with KVM available.
With heatshrink:
test$ time -p dos '20250718\lcdebugx /c=ytest.sld'
real 2.91
user 1.62
sys 1.21
With lzexedat -4:
test$ time -p dos '20250719\lcdebugx /c=ytest.sld'
real 2.93
user 1.60
sys 1.25
The same test, run on the amd64 server without KVM, suggests that lzexedat -4 is perhaps slightly faster:
test$ time -p dos '20250718\lcdebugx /c=ytest.sld'
real 4.31
user 2.88
sys 0.73
test$ time -p dos '20250719\lcdebugx /c=ytest.sld'
real 4.13
user 2.64
sys 0.68
test$ dos '20250718\lcdebugx /c=ext,ldmem.eld,mem;q'
~&ext,ldmem.eld,mem
Entry segment: 039B Size: DF00 55.7 KiB
Message segment: 118B Size: 52D0 20.7 KiB
First code segment: 16B8 Size: CA20 50.5 KiB
Second code segment: 235A Size: 2120 8.2 KiB
Auxiliary buffer segment: 256C Size: 2010 8.0 KiB
History segment: 276D Size: 2000 8.0 KiB
Environment segment: 296D Size: 0800 2048 B
ELD code segment: 29ED Size: 4000 16.0 KiB
Total allocation: 039B Size: 2A520 169 KiB
~&q
test$ dos '20250719\lcdebugx /c=ext,ldmem.eld,mem;q'
~&ext,ldmem.eld,mem
Entry segment: 039B Size: DF00 55.7 KiB
Message segment: 118B Size: 5180 20.3 KiB
First code segment: 16A3 Size: CA10 50.5 KiB
Second code segment: 2344 Size: 2120 8.2 KiB
Auxiliary buffer segment: 2556 Size: 2010 8.0 KiB
History segment: 2757 Size: 2000 8.0 KiB
Environment segment: 2957 Size: 0800 2048 B
ELD code segment: 29D7 Size: 4000 16.0 KiB
Total allocation: 039B Size: 2A3C0 168 KiB
~&q
test$
The total process size shrank by 352 bytes. 336 of these bytes were saved in the message segment (indicating a slight improvement in compression rate), while the remaining 16 bytes were saved in the first code segment (indicating the depacker is slightly smaller).
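The following small Python check redoes these numbers from the hexadecimal sizes in the two ldmem listings. It also uses the segment addresses, which count 16-byte paragraphs, to confirm that each later segment moved down by exactly the bytes saved in front of it. The values are copied from the output above; nothing else is assumed.

```python
# Hexadecimal sizes copied from the two ldmem listings above.
before = {"message": 0x52D0, "first_code": 0xCA20, "total": 0x2A520}
after = {"message": 0x5180, "first_code": 0xCA10, "total": 0x2A3C0}

saved = {name: before[name] - after[name] for name in before}
print(saved)  # {'message': 336, 'first_code': 16, 'total': 352}

# Segment values count 16-byte paragraphs, so later segments start lower
# by exactly the bytes saved before them.
assert (0x16B8 - 0x16A3) * 16 == saved["message"]  # first code segment
assert (0x235A - 0x2344) * 16 == saved["message"] + saved["first_code"]  # second code segment
```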
Even though the heatshrink packer is run about 60 times per help file, the total time is still much shorter than calling lzexedat.exe (once per help file!) using dosemu2.
Heatshrink may actually be a little faster than this comparison implies. This is because _DEPACKINLINELITERAL was only added to the heatshrink depacker in the same changeset as the lzexedat -4 depacker, so the 2025-07-18 revision still runs without this speed optimisation.
Building the current (hg revision 33c874709eda) lCDebugX using the following scriptlet:
ldebug/source$ use_build_help_compressed_lzexedat=0 use_build_compress_only=0 build_name=cdebugx time -p ./mak.sh -D_PM=1 -D_DEBUG -D_DEBUG_COND -D_DEFAULTSHOWSIZE
Running the test with the resulting executable on the server gives:
test$ time -p dos 'hs\lcdebugx /c=ytest.sld'
real 4.32
user 2.88
sys 0.73
Versus the same build but with use_build_help_compressed_lzexedat=1:
test$ time -p dos 'lze\lcdebugx /c=ytest.sld'
real 4.61
user 2.72
sys 0.76
It does seem that heatshrink may be slightly faster, although the noise in this test is too high to make sweeping statements.
Perhaps the terminal output accounts for too large a share of the time spent in these tests.
The following comment is largely applicable to the three lzexedat depackers as well. As opposed to the -z switch I added to my fork of heatshrink, the lzexedat inicomp depacker adds the -4 switch, which is communicated to the inicomp.asm assembly using the -D_4 define on the NASM command line. The help and ELD depackers always use the lzexedat -4 format, in order to limit them to the same window size (4 KiB, -w 12) as used by the heatshrink depackers for the help and ELD.
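A window of -w 12 means 2**12 = 4096 bytes of history. A depacker can therefore keep its history in a 4 KiB circular buffer and wrap indices with a mask. The following minimal Python sketch illustrates this. It uses hypothetical token tuples, not the actual LZEXE or heatshrink bitstreams, and the names RingWindow and decode are illustrative only. It shows why a stream packed with a larger window cannot be decoded this way: a back-reference further than 4096 bytes would point at history that has already been overwritten.

```python
WINDOW_BITS = 12                # corresponds to "-w 12"
WINDOW_SIZE = 1 << WINDOW_BITS  # 4096 bytes = 4 KiB
WINDOW_MASK = WINDOW_SIZE - 1


class RingWindow:
    """History buffer that keeps only the last WINDOW_SIZE output bytes."""

    def __init__(self) -> None:
        self.buf = bytearray(WINDOW_SIZE)
        self.pos = 0  # total number of bytes written so far

    def put(self, byte: int) -> None:
        # Wrap the write index with the mask: old history gets overwritten.
        self.buf[self.pos & WINDOW_MASK] = byte
        self.pos += 1

    def copy_match(self, distance: int, length: int) -> bytes:
        """Replay `length` bytes starting `distance` bytes back.
        Copying one byte at a time makes overlapping matches work."""
        if not 1 <= distance <= min(WINDOW_SIZE, self.pos):
            raise ValueError("back-reference outside the 4 KiB window")
        out = bytearray()
        for _ in range(length):
            byte = self.buf[(self.pos - distance) & WINDOW_MASK]
            self.put(byte)
            out.append(byte)
        return bytes(out)


def decode(tokens) -> bytes:
    """Decode hypothetical tokens: ("lit", byte) or ("match", distance, length)."""
    window = RingWindow()
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            window.put(token[1])
            out.append(token[1])
        else:
            out += window.copy_match(token[1], token[2])
    return bytes(out)
```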
My 8086 DOS heatshrink uses (segmented inicomp depacker, single-segment help page depacker, streaming ELD library depacker) #82
Hi, I wanted to let you know I am using the heatshrink compression format for several parts of my 86 DOS debugger project, lDebug (that's an L).
The first use was in inicomp, my executable depacker for triple-mode executables (DOS kernel, DOS device driver, DOS application). The heatshrink format is used as one of many options. This depacker supports compressed as well as uncompressed data sizes beyond 64 KiB, using 8086 segmented addressing in Real/Virtual 86 Mode. Other than that, it is special in that the destination buffer is always below the source, and it is valid for the destination to partially overwrite the source as long as the source pointer always stays above or equal to the destination pointer. That means the entire data must be stored in memory, but less memory than the full source + full destination is needed.
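The overlap rule can be illustrated with a toy in-place depacker in Python. This sketch uses a hypothetical two-opcode format (00 byte = literal, 01 distance length = match), not heatshrink's bitstream, and a single bytearray in place of 8086 segments. The packed data sits at the top of the buffer and the output grows from the bottom. A write is refused whenever it would land on a byte that has not been read yet, which is the "source pointer above or equal to destination pointer" condition.

```python
LIT, MATCH = 0x00, 0x01  # hypothetical toy opcodes, not heatshrink's format


def depack_in_place(buf: bytearray, packed_len: int) -> int:
    """Decode packed data stored at the end of `buf` into the start of the
    same buffer. Returns the number of output bytes written."""
    src = len(buf) - packed_len  # read pointer, starts high
    dst = 0                      # write pointer, starts at the bottom
    while src < len(buf):
        op = buf[src]
        if op == LIT:
            value = buf[src + 1]
            src += 2  # consume the token before writing anything
            if dst >= src:  # would clobber unread input (or run off the end)
                raise ValueError("buffer too small for in-place depacking")
            buf[dst] = value
            dst += 1
        elif op == MATCH:
            distance, length = buf[src + 1], buf[src + 2]
            src += 3
            for _ in range(length):  # byte-wise so overlapping matches work
                if dst >= src:
                    raise ValueError("buffer too small for in-place depacking")
                buf[dst] = buf[dst - distance]
                dst += 1
        else:
            raise ValueError(f"bad opcode {op:#x}")
    return dst


# 7 packed bytes expand to 8 output bytes in an 8-byte buffer (not 7 + 8).
packed = bytes([LIT, ord("a"), LIT, ord("b"), MATCH, 2, 6])
buf = bytearray(8)
buf[1:] = packed
print(depack_in_place(buf, len(packed)), bytes(buf))  # 8 b'abababab'
```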
The second use is for lDebug help pages. This is ready, but not yet used by default. The help pages always fit within less than 64 KiB, so most of the segmentation handling has been taken out of this one. It comes with a standalone test program which uses a 256-byte file buffer to hold parts of the source file.
The third use is for the Extensions for lDebug packed library executable. I wrote a bit about this latest use on my blog. Like the help page depacker, this uses a 256-byte file buffer for the compressed input. It also has a standalone test program; this one supports input and output files > 64 KiB too.
Unlike the other two depackers, this one uses a 4 KiB circular decompression buffer (thus the window size must not be > -w 12), and its implementation of put_file_data only grabs data once a certain depackskip counter reaches zero. The compressed data stream is much larger than 64 KiB, but only the output data of interest is grabbed by put_file_data. If that function has filled its output buffer, it pauses the current depack call. A paused depack call can be resumed when more data is needed from later on in the decompressed data stream. To implement the pausing and resumption, I run depack on its own stack, separate from the main application's, and I save all needed working registers on either stack when switching stacks.
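The pause-and-resume scheme maps naturally onto a coroutine. In this Python sketch, a generator stands in for depack running on its own stack. Breaking out of the consumer loop leaves the generator suspended with all its state, roughly like saving registers and switching back to the main stack. The next read resumes it where it stopped.

The following are hypothetical and for illustration only:
- the class name Grabber
- its read method
- the token format

Only the depackskip counter name comes from the text. Unlike the real depacker, this sketch does not limit its history to a 4 KiB circular buffer.

```python
from typing import Iterator


def depack_stream(tokens) -> Iterator[int]:
    """Toy depacker as a generator: yields output bytes one at a time and
    stays suspended between bytes, keeping all of its local state."""
    history = bytearray()
    for token in tokens:
        if token[0] == "lit":
            history.append(token[1])
            yield token[1]
        else:
            _, distance, length = token
            for _ in range(length):
                byte = history[-distance]  # byte-wise copy handles overlap
                history.append(byte)
                yield byte


class Grabber:
    """Hypothetical put_file_data analogue: drops the first `depackskip`
    output bytes, then fills caller buffers, pausing depack when full."""

    def __init__(self, stream: Iterator[int], depackskip: int) -> None:
        self.stream = stream
        self.depackskip = depackskip

    def read(self, capacity: int) -> bytes:
        out = bytearray()
        if capacity <= 0:
            return bytes(out)
        # Iterating resumes the suspended generator where it last paused.
        for byte in self.stream:
            if self.depackskip > 0:
                self.depackskip -= 1  # not yet at the data of interest
                continue
            out.append(byte)
            if len(out) == capacity:
                break  # buffer full: leave depack paused, not finished
        return bytes(out)


tokens = [("lit", ord("a")), ("lit", ord("b")), ("match", 2, 6)]  # b"abababab"
grab = Grabber(depack_stream(tokens), depackskip=3)
first = grab.read(2)   # b"ba"
rest = grab.read(10)   # b"bab", resumed from the paused generator
```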