
Mid July work, mostly on LZEXE re-use

2025-07-20

This week I got to put LZEXEDAT, the plain LZEXE format compression, to more uses.

MSDebug

inicomp compressed payload stage

kernwrap (script to wrap a kernel in inicomp and iniload stages)

LZEXE

The percentage remaining display re-uses the 8086-compatible 32/32 division (with 32-bit dividend, divisor, and quotient) in lDebug's expression evaluator, itself copied from the book Art of Assembly, to create a 48/32 division (with 48-bit dividend and quotient, and 32-bit divisor).

I initially forgot the xor dx,dx before the division by 100. The long division code (adapted from the Art of Assembly example) worked as expected right away, which is somewhat surprising. Also, as mentioned in the book, a division by zero returns with the 48-bit quotient filled with an all-1s pattern. This happens if the input file to lzexedat is empty.
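
The all-1s quotient for a zero divisor falls out naturally if the routine is a shift-and-subtract long division. The following Python sketch illustrates that idea under this assumption. It is not the lzexedat code: the name div48_32 is made up for this example, and the real 8086 routine works on 16-bit registers rather than Python integers.

```python
def div48_32(dividend, divisor):
    """Bitwise long division of a 48-bit dividend by a 32-bit divisor.

    Illustrative sketch only; returns (quotient, remainder).
    """
    quotient = 0
    remainder = 0
    for bit in range(47, -1, -1):
        # Shift the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # With divisor == 0 this comparison is always true, so every
        # quotient bit gets set.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(div48_32(1000, 7))        # (142, 6)
print(hex(div48_32(5, 0)[0]))   # 0xffffffffffff
```

The remainder returned for a zero divisor is not meaningful in this sketch.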

lDebug

Adapting the ELD depack.asm's lzexedat depacker for the help depacker presented some problems:

Now back to the lDebug changes.

lDebug packed help format change

In the following tests, directory 20250718 is from revision f69d69ec9365 (still with heatshrink-compressed help) and 20250719 is from revision 2c94004cd387 (with lzexedat-compressed help).

Executable size

With the lzexedat -4 format now used for the packed help, rather than the heatshrink format, the compressed debugger executable shrank by 1024 bytes while the uncompressed debugger executable stayed the same size:

test$ ls -l 20250718
total 248
-rw-r--r-- 1 ecm ecm 105984 Jul 18 23:42 lcdebugx.com
-rw-r--r-- 1 ecm ecm 144896 Jul 18 23:41 lcdebugxu.com
test$ ls -l 20250719
total 248
-rw-r--r-- 1 ecm ecm 104960 Jul 19 23:56 lcdebugx.com
-rw-r--r-- 1 ecm ecm 144896 Jul 19 23:54 lcdebugxu.com
test$

Depack speed

The following tests were run using the bash alias:

alias dos='dosemu -K "$PWD" -dumb -td -kt -q -quiet -E'

I initially ran the test using a long /C= command line to lCDebugx:

test$ time -p dos '20250718\lcdebugx /c=uninstall,paging;:loop;?;?f;?r;?c;?run;?e;?v;?re;?b;?l;?source;r,v1,+=1;if(v1!=#16)then,goto:loop;q'

However, this proved unusable on the default dosemu2 revision on the server. It is also difficult to avoid all blanks, and blanks require quotes, which it seems cannot be passed to the dosemu script properly. So I adapted the above scriptlet into the following Script for lDebug file:

test$ cat test.sld 
uninstall,paging
:loop
?
?f
?r
?c
?run
?e
?v
?re
?b
?l
?source
r,v1,+=1
if(v1!=#16)then,goto:loop
q
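
The conversion from the /C= command line to the script file amounts to turning each semicolon into a line break. A minimal Python sketch of that step follows; the helper name command_line_to_script is invented here, and it assumes that no command contains a literal semicolon.

```python
def command_line_to_script(commands):
    """Split a semicolon-separated command string into script lines."""
    return "\n".join(commands.split(";")) + "\n"


cmdline = ("uninstall,paging;:loop;?;?f;?r;?c;?run;?e;?v;?re;?b;?l;"
           "?source;r,v1,+=1;if(v1!=#16)then,goto:loop;q")
print(command_line_to_script(cmdline), end="")
```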

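The speed of both formats is about equal, though lzexedat -4 is perhaps slightly slower. This test ran on the AMD A10-7870K with KVM available. (In the transcripts, the timing output comes before the prompt line showing the command that produced it.)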

With heatshrink:

real 2.91
user 1.62
sys 1.21
test$ time -p dos '20250718\lcdebugx /c=ytest.sld'

With lzexedat -4:

real 2.93
user 1.60
sys 1.25
test$ time -p dos '20250719\lcdebugx /c=ytest.sld'

The same test, run on the amd64 server without KVM available, suggests that lzexedat -4 is perhaps slightly faster:

real 4.31
user 2.88
sys 0.73
test$ time -p dos '20250718\lcdebugx /c=ytest.sld'
real 4.13
user 2.64
sys 0.68
test$ time -p dos '20250719\lcdebugx /c=ytest.sld'

Memory use of the debugger

test$ dos '20250718\lcdebugx /c=ext,ldmem.eld,mem;q'
~&ext,ldmem.eld,mem
Entry segment:            039B  Size:  DF00   55.7 KiB
Message segment:          118B  Size:  52D0   20.7 KiB
First code segment:       16B8  Size:  CA20   50.5 KiB
Second code segment:      235A  Size:  2120    8.2 KiB
Auxiliary buffer segment: 256C  Size:  2010    8.0 KiB
History segment:          276D  Size:  2000    8.0 KiB
Environment segment:      296D  Size:  0800     2048 B
ELD code segment:         29ED  Size:  4000   16.0 KiB
Total allocation:         039B  Size: 2A520    169 KiB
~&q
test$ dos '20250719\lcdebugx /c=ext,ldmem.eld,mem;q'
~&ext,ldmem.eld,mem
Entry segment:            039B  Size:  DF00   55.7 KiB
Message segment:          118B  Size:  5180   20.3 KiB
First code segment:       16A3  Size:  CA10   50.5 KiB
Second code segment:      2344  Size:  2120    8.2 KiB
Auxiliary buffer segment: 2556  Size:  2010    8.0 KiB
History segment:          2757  Size:  2000    8.0 KiB
Environment segment:      2957  Size:  0800     2048 B
ELD code segment:         29D7  Size:  4000   16.0 KiB
Total allocation:         039B  Size: 2A3C0    168 KiB
~&q
test$

The total process size shrank by 352 bytes. 336 of these bytes were saved in the message segment (indicating a slight improvement in compression ratio) while the remaining 16 bytes were saved in the first code segment (indicating the depacker is slightly smaller).
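
The byte counts can be checked against the hexadecimal sizes in the two listings. This short Python snippet only redoes the arithmetic; the dictionary keys are labels chosen for this example.

```python
# Segment sizes in bytes, copied from the two ldmem.eld listings.
before = {"message": 0x52D0, "first code": 0xCA20, "total": 0x2A520}
after = {"message": 0x5180, "first code": 0xCA10, "total": 0x2A3C0}

saved = {name: before[name] - after[name] for name in before}
print(saved)  # {'message': 336, 'first code': 16, 'total': 352}
```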

Pack performance

Even though the heatshrink packer is run about 60 times per help file, the total time is still much shorter than calling lzexedat.exe (once per help file!) using dosemu2.

Bogus result?

Heatshrink may actually be a little faster still than implied by this comparison. This is because _DEPACKINLINELITERAL was added to the heatshrink depacker only in the same changeset as the lzexedat -4 depacker, so the 2025-07-18 revision still runs without this speed optimisation.

Building the current (hg revision 33c874709eda) lCDebugX using the following scriptlet:

ldebug/source$ use_build_help_compressed_lzexedat=0 use_build_compress_only=0 build_name=cdebugx time -p ./mak.sh -D_PM=1 -D_DEBUG -D_DEBUG_COND -D_DEFAULTSHOWSIZE

Results on the server in:

real 4.32
user 2.88
sys 0.73
test$ time -p dos 'hs\lcdebugx /c=ytest.sld'

Versus the same but use_build_help_compressed_lzexedat=1:

real 4.61
user 2.72
sys 0.76
test$ time -p dos 'lze\lcdebugx /c=ytest.sld'

It does seem like heatshrink may be slightly faster, although the noise level for this test is too high to make sweeping statements.

Perhaps the terminal output is too much of a factor in the time actually spent on these tests.
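
One way to reduce both the noise and the terminal's share of the time would be to repeat each run and discard the output. The Python sketch below shows the idea with a placeholder command. It has not been tried with dosemu, and whether dosemu -dumb behaves the same with its output redirected is an assumption; time_runs is a name invented for this example.

```python
import statistics
import subprocess
import sys
import time


def time_runs(command, runs=5):
    """Run `command` several times and return the wall-clock durations."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal rendering does not dominate the timing.
        subprocess.run(command, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


# Placeholder command; substitute the invocation being measured.
samples = time_runs([sys.executable, "-c", "pass"], runs=3)
print(f"min {min(samples):.3f}s  median {statistics.median(samples):.3f}s")
```

Comparing the minimum or median of several runs is usually more stable than comparing single measurements.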

The three different depackers

The following comment is largely applicable to the three lzexedat depackers as well. As opposed to the -z switch I added to my fork of heatshrink, the lzexedat inicomp depacker adds the -4 switch which is communicated to the inicomp.asm assembly using the -D_4 define on the NASM command line. The help and ELD depackers always use the lzexedat -4 format, in order to limit them to the same window size (4 KiB, -w 12) as used by the heatshrink depackers for the help and ELD.

My 8086 DOS heatshrink uses (segmented inicomp depacker, single-segment help page depacker, streaming ELD library depacker) #82

Hi, I wanted to let you know I am using the heatshrink compression format for several parts of my 86 DOS debugger project, lDebug (that's an L).

The first use was in inicomp, my executable depacker for triple-mode executables (DOS kernel, DOS device driver, DOS application). The heatshrink format is used as one of many options. This depacker supports compressed as well as uncompressed data sizes beyond 64 KiB, using 8086 segmented addressing in Real/Virtual 86 Mode. Other than that, it is special in that the destination buffer is always below the source and it is valid for the destination to partially overwrite the source if the source pointer is always above-or-equal the destination pointer. That means the entire data must be stored in memory, but less memory than the full source + full destination is needed.
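
The overlap rule described above can be illustrated with a small Python sketch. It uses a toy run-length format (pairs of count and byte value) instead of heatshrink, and the name depack_in_place is invented for this example; only the pointer check mirrors the rule stated above.

```python
def depack_in_place(buf, src):
    """Expand (count, value) pairs stored at buf[src:] towards buf[0:].

    Toy run-length format, not heatshrink. Returns the output length.
    """
    dst = 0
    while src < len(buf):
        count, value = buf[src], buf[src + 1]
        src += 2  # these source bytes are consumed and may be overwritten
        if dst + count > src:
            raise ValueError("destination would overtake source")
        buf[dst:dst + count] = bytes([value]) * count
        dst += count
    return dst


packed = bytes([3, 65, 2, 66, 4, 67])   # expands to "AAA" "BB" "CCCC"
buf = bytearray(10)                     # smaller than 6 + 9 bytes
buf[-len(packed):] = packed             # source sits at the top
length = depack_in_place(buf, len(buf) - len(packed))
print(buf[:length])                     # bytearray(b'AAABBCCCC')
```

Here 10 bytes of memory suffice for 6 bytes of source and 9 bytes of output, because the output only ever overwrites source bytes that were already read.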

The second use is for lDebug help pages. This is ready, but not yet used by default. The help pages always fit within less than 64 KiB so most of the segmentation things have been taken out of this one. It comes with a stand alone test program which uses a 256-byte file buffer to hold parts of the source file.

The third use is for the Extensions for lDebug packed library executable. I wrote some about the latest use on my blog. Like the help page depacker this uses a 256-byte file buffer for the compressed input. It also has a stand alone test program; this one supports input and output files > 64 KiB too.

Unlike the other two depackers, this one uses a 4 KiB circular decompression buffer (thus window size must not be > -w 12), and the implementation of its put_file_data will grab data after a certain depackskip counter reaches zero. The compressed data stream is much larger than 64 KiB, but only the output data of interest is grabbed by put_file_data. If that function has filled its output buffer, it will pause the current depack call. A paused depack call can be resumed when more data is needed from later on in the decompressed data stream. To implement the pausing and resumption, I run depack on its own stack separate from the main application's, and I save all needed working registers on either stack when switching stacks.
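
The combination of a circular window, a skip counter, and pausing when the output buffer is full can be sketched with a Python generator. This is only an analogy: the token format below (integers for literals, (offset, length) tuples for back-references) is invented and is not the heatshrink bit stream, and the generator stands in for running depack on its own stack.

```python
WINDOW_SIZE = 1 << 12  # 4 KiB circular buffer, as with -w 12


def depack(tokens, skip, chunk_size):
    """Yield output chunks, ignoring the first `skip` decompressed bytes."""
    window = bytearray(WINDOW_SIZE)
    pos = 0
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):
            literal, offset, length = token, 0, 1
        else:
            literal = None
            offset, length = token
        for _ in range(length):
            if literal is None:
                # Copy byte by byte so overlapping references work.
                byte = window[(pos - offset) % WINDOW_SIZE]
            else:
                byte = literal
            window[pos] = byte
            pos = (pos + 1) % WINDOW_SIZE
            if skip > 0:
                skip -= 1  # not yet in the region of interest
                continue
            out.append(byte)
            if len(out) == chunk_size:
                yield bytes(out)  # pause here until more data is requested
                out.clear()
    if out:
        yield bytes(out)


tokens = [ord("a"), ord("b"), ord("c"), (3, 6), ord("d")]
chunks = depack(tokens, skip=2, chunk_size=4)
print(next(chunks))  # b'cabc'
print(next(chunks))  # b'abcd', produced only when requested
```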

blog/pushbx/2025/0722_mid_july_work_mostly_on_lzexe_re-use.txt · Last modified: 2025-07-22 21:08:38 +0200 Jul Tue by ecm