Working on my fork of the 2025 May LZEXE free software release, I also came into contact with Microsoft's EXEPACK. Much has been written about it elsewhere, but a few points remained unclear to me until today as I worked on LZEXE's UPACKEXE. UPACKEXE is an offline depacker for a subset of EXEPACK-packed executables.
So, what does the data stream look like?
The stream is read backwards and depacked backwards. Both the source pointer (DS:SI) and destination pointer (ES:DI) are advanced downwards from the high end of their respective areas towards the beginning of the process's executable image position (at segment PSP + 16). The Direction Flag is set throughout the depacking, so that lodsb, lodsw, stosb, and movsb all decrement their respective index registers.
To allow running rep stosb or rep movsb with almost all counter word values in CX (up to FFF0h at least), the index pointers are kept anti-normalised. Anti-normalised means that the offset of the pointer is kept as high as possible. Unlike the pointer anti-normalisation in the LZMA-lzip depacker of inicomp (prior link), many EXEPACK versions suffered the bug that their anti-normalisation could advance into "negative segments" if the pointers were meant to point within the first 64 KiB minus 16 Byte of the Low Memory Area. A negative segment in Real or Virtual 8086 Mode does, of course, wrap around to a high segment value (> F000h). To read the intended memory within the low 64 KiB, wraparound at the A20 address line used to suffice – until the 286 introduced its 24 address lines and consequently the High Memory Area, the name eventually given to the 64 KiB minus 16 Byte starting at linear address 10_0000h. With A20 not masked off, the incorrectly anti-normalised pointer now accesses the HMA.
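To make the arithmetic of the wraparound concrete, here is a minimal Python sketch. It models only the pointer maths described above and is not taken from the code of any EXEPACK version; the function names and the clamped variant are my own illustration.

```python
def anti_normalise_buggy(seg, off):
    """Raise the offset to FFFxh and lower the segment to compensate.

    No check is made for the segment dropping below zero, so it wraps
    around to a high 16-bit value (a "negative segment").
    """
    paras = (~off & 0xFFFF) >> 4      # paragraphs the offset can still grow by
    return (seg - paras) & 0xFFFF, off | 0xFFF0


def anti_normalise_clamped(seg, off):
    """Same idea, but never move more paragraphs than the segment holds."""
    paras = min((~off & 0xFFFF) >> 4, seg)
    return seg - paras, (off + paras * 16) & 0xFFFF


def linear(seg, off, a20_enabled):
    """Linear address of seg:off; with A20 masked, bit 20 is dropped."""
    addr = seg * 16 + off
    return addr if a20_enabled else addr & 0xFFFFF


# A pointer to linear address 01234h, given as 0100h:0234h.
seg, off = anti_normalise_buggy(0x0100, 0x0234)      # F124h:FFF4h
low = linear(seg, off, a20_enabled=False)            # 01234h, as intended
hma = linear(seg, off, a20_enabled=True)             # 101234h, in the HMA
```

The clamped variant yields 0000h:1234h for the same input, which addresses the intended byte regardless of the A20 state.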
Before the first packed element, the packed data stream is padded to a paragraph boundary using bytes that read FFh (up to 15 of them, although the depacker can also handle 16 of them). The packed data stream is never relocated. The depacked data is written starting at the end of the area for the final depacked image. (The end of the depacked data also must be aligned to a paragraph boundary.)
Every packed element to be processed in the stream starts (at its highest address in the packed stream) with a marker byte, called marqueur by LZEXE. Masking off the low bit, the marker is either B0h or B2h. Encountering any marker that is not between B0h and B3h (inclusive) is an error, and will usually lead to the dreaded "Packed file is corrupt" error message. (This is actually why the A20-related address bug usually aborts the depacking with a nice error message instead of crashing, looping forever, or corrupting random memory.)
Any marker is followed (at the next two bytes below the marker) by a count word. B0h and B1h markers encode Run Length Encoded bytes. (Excepting the relocation table encoding, this is the only actual compression that can be performed by EXEPACK.) The following (below) byte gives the byte value to store, and this value is to be prepended to the data depacked so far as often as indicated by the count. The B2h and B3h markers instead encode literal runs, and their count word is followed (below) by the literal data to prepend to the data depacked so far.
If the low bit of the marker is set, it indicates that this marker forms the last packed element of the compressed stream. It is expected that the last element will cause the depacker to overwrite the memory allocated to the element itself, to fully finish writing the depacked image. As the marker byte is cached in a CPU register, it can be checked for its low bit after the element's depacked data has been written.
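The element format described above can be sketched as an in-place depacker in Python. This is a model under stated assumptions, not a translation of the EXEPACK stub: memory is a flat bytearray instead of segmented pointers, the count word is assumed to sit little-endian in the two bytes directly below the marker, and the function name and parameters are hypothetical.

```python
def exepack_depack(mem, src, dst):
    """Depack backwards within mem (a bytearray).

    src: index just above the top byte of the packed stream (padding included).
    dst: index just above the last byte of the final depacked image.
    Returns the lowest index written; everything below it is left unchanged.
    """
    # Skip the FFh padding above the first element (at most 16 bytes).
    for _ in range(16):
        if mem[src - 1] != 0xFF:
            break
        src -= 1
    while True:
        marker = mem[src - 1]
        count = mem[src - 3] | (mem[src - 2] << 8)
        src -= 3
        kind = marker & 0xFE              # mask off the "last element" bit
        if kind == 0xB0:                  # RLE: one value byte below the count
            src -= 1
            value = mem[src]
            for _ in range(count):
                dst -= 1
                mem[dst] = value
        elif kind == 0xB2:                # literal run: count bytes below
            for _ in range(count):        # byte by byte, like rep movsb with DF set
                src -= 1
                dst -= 1
                mem[dst] = mem[src]
        else:
            raise ValueError("Packed file is corrupt")
        if marker & 1:                    # checked only after the data is written
            return dst


# Image: "AB" (never packed), 8 x "x" (final RLE), "hello" (literal), 6 x 00h (RLE).
packed = (b"AB" + b"x\x08\x00\xB1" + b"hello\x05\x00\xB2"
          + b"\x00\x06\x00\xB0" + b"\xFF\xFF")
mem = bytearray(packed) + bytearray(1)    # room for the 21-byte image
lowest = exepack_depack(mem, src=len(packed), dst=len(mem))
```

In this example the final RLE element expands over the four bytes it occupied itself, which is why the marker has to be remembered before the data is written.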
But then, I wondered, what does this code in UPACKEXE do? I machine translated the comment "envoi des données non compactées" as "sending uncompressed data". It takes the last address that points one byte below the last marker's destination and subtracts the MZ header size from that. (The header size used to be hardcoded to 512 Bytes here.) Then it seeks to just past the MZ header in the input file, and uses the calculated number as the length of a block of data to copy verbatim.
What no one seems to have specifically pointed out yet is that the packed stream can (and is indeed expected to) end in the middle of the target image's data. When the depacker detects the marker with its low bit set, it ends its depack loop then and there. But the last element need not be stored at the beginning of the target image. So all data below the last element is kept unchanged, meaning it is a block of data that is passed through as a never-packed literal block.
LZEXE's UPACKEXE depacks the main image in two passes: First, it finds the end of the packed stream, and loops through it calling ReadB0 for every element. For every element, it stores the position of the marker byte within the file into the array called marq. ReadB0 decrements the posi variable to point to the next marker byte, if any.
After the first pass, we know that posi points to the nonexistent post-final marker byte, so it points at the last byte of the never-packed initial part of the target image. Therefore, adding 1 and subtracting the MZ header size results in the size of the never-packed part. This data is known to be located at the beginning of the source image, so it is then transferred verbatim.
In the second pass, the marq array is processed in reverse (LIFO, Last In First Out). Every entry in the array points to its corresponding marker in the source file. From there it is trivial to process every element in the reverse order it was entered into the array, writing to the destination file in ascending order rather than the descending order in which the stream's elements had to be found.
The elegance of this is that during the first pass, the program need not concern itself with where in the destination file an element's data goes. It is implied by the subsequent second pass that the first block is the never-packed data prefix, followed by the reversed elements which are found in order from the marq array.
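The two passes can be sketched in Python as follows. This is not UPACKEXE's Pascal code: the names marq and posi are borrowed from it for readability, while read_element and depack_two_pass are hypothetical helpers. The input is assumed to hold only the packed image without the MZ header, so the header size to subtract is zero here, and the count word is assumed to be little-endian directly below the marker.

```python
def read_element(data, m):
    """Decode the element whose marker byte is at index m.

    Returns (payload, size of the element in the stream, last-element flag).
    """
    marker = data[m]
    count = data[m - 2] | (data[m - 1] << 8)
    kind = marker & 0xFE
    if kind == 0xB0:                                  # RLE element
        return bytes([data[m - 3]]) * count, 4, marker & 1
    if kind == 0xB2:                                  # literal run
        return bytes(data[m - 2 - count:m - 2]), 3 + count, marker & 1
    raise ValueError("Packed file is corrupt")


def depack_two_pass(data):
    posi = len(data) - 1
    skipped = 0
    while skipped < 16 and data[posi] == 0xFF:        # FFh padding
        posi -= 1
        skipped += 1

    # Pass 1: walk downwards, remembering where every marker is.
    marq = []
    while True:
        marq.append(posi)
        _, size, last = read_element(data, posi)
        posi -= size
        if last:
            break

    # posi now indexes the last byte of the never-packed prefix.
    out = bytearray(data[:posi + 1])

    # Pass 2: replay the markers last-in first-out, writing in ascending order.
    while marq:
        payload, _, _ = read_element(data, marq.pop())
        out += payload
    return bytes(out)


packed = (b"AB" + b"x\x08\x00\xB1" + b"hello\x05\x00\xB2"
          + b"\x00\x06\x00\xB0" + b"\xFF\xFF")
image = depack_two_pass(packed)
```

Because the output is only ever appended to, no destination position has to be tracked during the first pass.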
UPACKEXE finds the packed relocation table by scanning part of the file for the byte sequence "pt". I believe this refers to the end of the default English-language error message, "Packed file is corrupt". We know that if this message occurs, then the relocation table starts directly after it. However, it is reported that the message may be localised, which is why David Fifield's offline depacker scans for the error program termination code instead of matching the error message itself in part or in full.
The packed relocation table format is made up of 16 subtables. Each subtable starts with a word giving the number of relocation entries in it. (Every subtable may be empty, indicated by a zero word as the count.) The indicated number of entries follows this word. Each entry is a word, specifying the offset of the relocation. The segment of each subtable is implied by the order of the subtables; the first subtable addresses segment 0000h, the second one segment 1000h, and so on. This allows encoding any relocation address up to F000h:FFFFh. However, machines like the 286 and 386 may fault on a word access with offset FFFFh, and even on machines that do not fault, the access may tear and its second byte may wrap around to offset 0000h rather than accessing the byte behind offset FFFFh.
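As an illustration of the subtable layout, here is a short Python sketch that expands a packed relocation table into segment:offset pairs. The function name is hypothetical, and the input is assumed to start exactly at the first subtable's count word.

```python
import struct


def parse_packed_relocations(table):
    """Expand the 16 subtables into a list of (segment, offset) pairs."""
    relocs = []
    pos = 0
    for index in range(16):
        (amount,) = struct.unpack_from("<H", table, pos)   # entries in this subtable
        pos += 2
        for _ in range(amount):
            (offset,) = struct.unpack_from("<H", table, pos)
            pos += 2
            relocs.append((index << 12, offset))           # 0000h, 1000h, ..., F000h
    return relocs


# Subtable 0 holds offsets 0010h and FFFFh, subtable 1 holds 0004h, the rest are empty.
table = (b"\x02\x00\x10\x00\xFF\xFF"
         + b"\x01\x00\x04\x00"
         + b"\x00\x00" * 14)
relocs = parse_packed_relocations(table)
```

An offline depacker working on a flat file buffer can turn each pair into a linear position (segment times 16 plus offset), so an entry with offset FFFFh needs no special treatment there.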
Therefore, the original EXEPACK online depacker includes a special handler for relocations encoded with offset FFFFh to avoid any faults or undesired tearing. Although the EXEPACK online depacker's relocation table stage was copied for LZEXE v0.90, in LZEXE v0.91 the UPACKEXE tool did not display an understanding of this problem. I fixed this of course.
UPACKEXE does not change the MZ header's min alloc or max alloc fields yet. This is not optimal.
I added some debugging output to UPACKEXE, selected using a bitmask in the environment variable %DEBUG%.
To determine whether a file is packed by EXEPACK, UPACKEXE checks the following: mov ax, es. (I am certain that this does not detect all EXEPACK revisions as some preserve the incoming value in AX.)