====== Late July work ======

**2025-07-27**

===== ACEGALS =====

* [[https://hg.pushbx.org/ecm/acegals/rev/90d64d6ad6a7|Mention the additional indentation]] for pushing into lvar variables.
* [[https://hg.pushbx.org/ecm/acegals/rev/9d12ee54a77a|Link an example]] for pushing into an lvar variable.
* Note that a protocol comment's header [[https://hg.pushbx.org/ecm/acegals/rev/66365b7358b8|may be followed by an empty line]] (semicolon only) to separate it from the named sections.

===== inicomp =====

* [[https://hg.pushbx.org/ecm/inicomp/rev/c54f9cb6a782|Add _HEATSHRINK_FASTER option]] to add some code so the pointers are normalised less often. In the prior blog post we observed a depack time of 459ms prior to this change and 276ms with this change applied.
* lzexedat depacker:
  * [[https://hg.pushbx.org/ecm/inicomp/rev/7e81ded9bce2|Add testfile -e display]] of long literal runs. This served to inspect the cdebugx.lze image for long runs (>= 26 literals in a row) to determine whether a long literal command would be of use.
  * [[https://hg.pushbx.org/ecm/inicomp/rev/d1a6091f9ded|Correct the wording]] of the new -e message.
  * [[https://hg.pushbx.org/ecm/inicomp/rev/34b244f9ac2f|Implement the _LONGLITERAL option]] to the depacker.
  * [[https://hg.pushbx.org/ecm/inicomp/rev/8b92b5db1070|Drop the check for a leading single literal command]], as it is now valid for the compressed stream to start with a long literal command.
  * For testfile operation [[https://hg.pushbx.org/ecm/inicomp/rev/2adc652b1287|add display of current masque and the tag bits left]] to each -v and -e output line. This helped to chase down why source.lze grew with the -l option.

===== kernwrap =====

* [[https://hg.pushbx.org/ecm/kernwrap/rev/b6fab72d047b|Support lzexedat -l format]]. Partial pick [[https://hg.pushbx.org/ecm/ldebug/rev/b0b5fc6c63c6|from lDebug's mak.sh script]].
* [[https://hg.pushbx.org/ecm/kernwrap/rev/fb893bf3f015|Update cfg.sh]] with the fastest keyword for INICOMP_WINNER, and lzexedat.

===== lDOS kernel =====

* [[https://hg.pushbx.org/ecm/msdos4/rev/8ab91deb67db|Update cfg.sh]] with fastest, zerocomp, mvcomp, lzexedat.

===== LZEXE =====

* [[https://hg.pushbx.org/ecm/lzexe/rev/f9d073723a6a|Fix a comment]] in unlzexe.c
* [[https://hg.pushbx.org/ecm/lzexe/rev/15eacb6a495c|Detect and return error]] if CODEBUF1 overflow occurs. This happened to crash the packer during development of the long literal command. The CODEBUF1 buffer had a fixed length of 128 bytes and the code depended on the fact that the tag word (CODEBUF) would fill before CODEBUF1 had a chance to overflow. This assumption was broken by the long literals command. (The longest literal in lCDebugX's image is over 256 bytes in length.) This change also introduced a return code from the packer code to the caller ([[https://hg.pushbx.org/ecm/lzexe/rev/15eacb6a495c#l1.15|lzexe.pas]] or [[https://hg.pushbx.org/ecm/lzexe/rev/15eacb6a495c#l2.13|lzexedat.asm]]) to indicate a failure.
* [[https://hg.pushbx.org/ecm/lzexe/rev/5980468b7eee|Fix a comment]] indicating the length of medium match commands (an encoding with 0 is not a medium match command as it escapes long commands instead).
* [[https://hg.pushbx.org/ecm/lzexe/rev/4fa127125fef|Add new switches -l and -t]] to lzexedat tool. No-op placeholders as of this changeset.
* In the mak scripts, [[https://hg.pushbx.org/ecm/lzexe/rev/dcabcb006d31|pass parameters to NASM]].
* In the mak scripts, [[https://hg.pushbx.org/ecm/lzexe/rev/694602735c73|pass include paths for lmacros]] to lzss.nas assembly.
* [[https://hg.pushbx.org/ecm/lzexe/rev/d17fbb1a3a6e|Use the lmacros sectioning macros]] for lzss.nas
* [[https://hg.pushbx.org/ecm/lzexe/rev/c40db2d8c5aa|Implement long literals format]].
* [[https://hg.pushbx.org/ecm/lzexe/rev/c93b0d6e6284|Implement a test mode]] for verifying the length needed for CODEBUF1.
* [[https://hg.pushbx.org/ecm/lzexe/rev/7ac3fde006e8|Add formatlongliteral]] to manual. (In the manual on the web [[https://pushbx.org/ecm/doc/lzexe.htm#formatlongliteral|here]].)
* Add formatlongliteral [[https://hg.pushbx.org/ecm/lzexe/rev/731e79b0804d|note on how well the format can be applied]], and the possibility of growing a file by 1 or 2 bytes.

==== The CODEBUF1 mystery ====

It appears that the size of CODEBUF1 of 128 bytes was chosen without much reason. While it was **enough**, it was far more than the buffer size actually needed. I [[https://hg.pushbx.org/ecm/lzexe/rev/c40db2d8c5aa#l1.38|wrote a long comment on this]] in the long literal command changeset:
<code>
CODEPTR dw ?
MASQUE dw ?
CODEBUF dw ?
CODEBUF1 resb (3 + LONGESTLITERAL) * 8 + 1
; We do now check for CODEBUF1 overflow in PutCh, but we
; don't want that to happen. The following commands store
; immediates into the CODEBUF1 here:
;
; Single literal: 1 bit tags, 1 byte immediate
; Short match: 4 bit tags, 1 byte immediate
; Medium match: 2 bit tags, 2 bytes immediate
; Long match: 2 bit tags, 3 bytes immediate
; Segment change: 2 bit tags, 3 bytes immediate (seldom)
; End of stream: 2 bit tags, 3 bytes immediate (once)
; Long literal: 2 bit tags, 3 + N bytes immediate
;
; So 16 tag bits can at most contain:
;
; 16 single literals
; 4 short matches
; 8 medium or long commands,
; or 7 medium or long commands and 1 single literal
%if 0
Example of overflowing 24 bytes (no -l):
bytes bits index
2 bits 14 0
putcodebuf 0
3 bytes 0
2 bits 2 1
3 bytes 6 1
2 bits 4 2
3 bytes 9 2
2 bits 6 3
3 bytes 12 3
2 bits 8 4
3 bytes 15 4
2 bits 10 5
3 bytes 18 5
2 bits 12 6
3 bytes 21 6
2 bits 14 7
3 bytes 24 7
1 bit 15 8
1 bytes 25 8
%endif
;
; Prior to the long literal support, 2 tag bits could at
; most encode 3 immediate bytes. The first 3 bytes can be
; emitted immediately after flushing the tag word. Thus
; we have to account for 8 3-byte commands and then the
; possibility of a single literal command which leaves us
; with 7 times 2 tag bits (the 0th command already flushed
; its tag bits) plus 1 tag bit, fitting in the 16-bit tag
; word with the last command emitting its immediate byte
; with 15 tag bits stored.
; This indicates that (8 * 3) + 1 = 25 bytes were needed in
; the CODEBUF1. However, 128 bytes were originally used.
; This appears to have been an oversight.
;
; The long literal command allows to encode 515 bytes for
; every 2 tag bits, so the buffer needs to hold eight
; times that number plus 1 for the single-literal byte.
; This is what is calculated above.
.end:
</code>
Basically, the immediate buffer (aka CODEBUF1) gets fullest when it first sees 8 commands with the maximum number of immediate bytes. That maximum was 3 bytes prior to the long literals addition. The very first of the 8 commands must be positioned so that its two tag bits fill the tag word (CODEBUF) completely and cause a flush, so that its 3 bytes are entered into the immediate buffer with all 16 bits still available in the fresh tag word. The remaining 7 commands then fill 14 of those 16 tag bits.
Then, a single literal command must follow, which will not completely fill the tag word but rather leave it at 15 bits filled. This adds a single immediate byte to the immediate buffer. If a 9th long command follows instead, or a second single literal command, then the tag word is exhausted and both the tag word and the immediate buffer are flushed before that command's immediates are stored.
Therefore, the immediate buffer must hold eight times the immediate length of the longest command, plus 1 byte. This means that prior to the long literal command, 25 bytes would have sufficed. With LONGESTLITERAL set to 512, the longest command has an immediate length of 515 bytes (512 + 3), so the immediate buffer needs to hold 8 * 515 + 1 bytes, that is 4096 + 24 + 1 = 4121 bytes.
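The worst case described above can be checked by brute force. The following Python sketch is a model of the buffering, not the actual lzss.nas code. It assumes that each command first stores its tag bits and then its immediate bytes, and that the tag word and the immediate buffer are flushed together the moment the 16th tag bit is stored. The names max_immediate_bytes, CLASSIC, and WITH_LONG_LITERAL are made up for this illustration; the command table is taken from the source comment.

```python
from collections import deque

TAG_WORD_BITS = 16

# (tag bits, immediate bytes) per command, as listed in the source comment.
# Segment change and end of stream have the same shape as a long match.
CLASSIC = [
    (1, 1),  # single literal
    (4, 1),  # short match
    (2, 2),  # medium match
    (2, 3),  # long match / segment change / end of stream
]
LONGESTLITERAL = 512
WITH_LONG_LITERAL = CLASSIC + [(2, 3 + LONGESTLITERAL)]


def max_immediate_bytes(commands):
    """Search all reachable (tag bits used, bytes buffered) states and
    return the largest number of bytes ever held in the immediate buffer."""
    start = (0, 0)
    seen = {start}
    queue = deque([start])
    worst = 0
    while queue:
        bits, buffered = queue.popleft()
        for tag_bits, imm_bytes in commands:
            b, n = bits, buffered
            for _ in range(tag_bits):
                b += 1
                if b == TAG_WORD_BITS:
                    # Assumed behaviour: a full tag word flushes everything.
                    b, n = 0, 0
            n += imm_bytes  # immediates are stored after the tag bits
            worst = max(worst, n)
            if (b, n) not in seen:
                seen.add((b, n))
                queue.append((b, n))
    return worst


print(max_immediate_bytes(CLASSIC))
print(max_immediate_bytes(WITH_LONG_LITERAL))
```

Under these assumptions the search yields 25 bytes for the command set without long literals and 4121 bytes once the long literal command is added, which matches the numbers derived above.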
Initially I was annoyed at the uncommented size of the immediate buffer as a hardcoded 128. Now it seems like that number was just chosen as a "more than enough" solution without regard to the exact needed size. For the long literal command support, however, I wanted to know exactly how much memory would be needed.
I'm considering assembling lzss.nas three times: once each for the TPC and FPC builds of lzexe.pas, and once more for lzexedat.asm. This would save some memory that the lzexe.pas utility does not use. In that case, a build without the long literal command should allocate 25 bytes to the immediate buffer.
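What separate builds would save is simple arithmetic. The following is a minimal Python sketch, assuming the lzexe.pas builds would omit the long literal command while the lzexedat.asm build keeps it with LONGESTLITERAL at 512. The function name and the build labels are illustrative only and are not taken from the mak scripts.

```python
MAX_COMMANDS = 8       # worst case: eight maximal commands between flushes
OLD_HARDCODED = 128    # original fixed size of CODEBUF1
LONGESTLITERAL = 512


def codebuf1_size(longest_immediate):
    """Bytes needed in CODEBUF1 when the largest command stores
    longest_immediate immediate bytes, plus 1 for a trailing single literal."""
    return MAX_COMMANDS * longest_immediate + 1


# Hypothetical build labels, used for display only.
sizes = {
    "lzexe.pas, TPC build (no long literals)": codebuf1_size(3),
    "lzexe.pas, FPC build (no long literals)": codebuf1_size(3),
    "lzexedat.asm (long literals)": codebuf1_size(3 + LONGESTLITERAL),
}

for build, size in sizes.items():
    print(f"{build}: {size} bytes ({size - OLD_HARDCODED:+d} versus the old 128)")
```

With these assumptions a build without long literals needs 4096 bytes less buffer space than one with them, and 103 bytes less than the original hardcoded 128.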
===== lDebug =====
* doc: [[https://hg.pushbx.org/ecm/ldebug/rev/33c874709eda|Update cmdr with a description of the R F command]].
* errfix.eld: [[https://hg.pushbx.org/ecm/ldebug/rev/929c3b01934d|Fix puts getline handler]], in which resetting the errfix array corrupted the function number in bl.
* [[https://hg.pushbx.org/ecm/ldebug/rev/4e8e5f5fc1ce|Update eldputs-getline chapter]] of the manual with new functions. (In the manual on the web [[https://pushbx.org/ecm/doc/ldebug.htm#eldputs-getline|here]].)
* [[https://hg.pushbx.org/ecm/ldebug/rev/0519f98f0941|Add quiet.eld]] which silences any debugger output except for getinput. Needed to run the help depacking benchmark [[blog:pushbx:2025:0724_more_benchmarks_on_heatshrink_vs_lzexedat|of the prior blog post]], as the depack time was utterly swamped by the time it took to display the depacked text.
* mak.sh: [[https://hg.pushbx.org/ecm/ldebug/rev/b0b5fc6c63c6|Support lzexedat -l format]] for inicomp.
* [[https://hg.pushbx.org/ecm/ldebug/rev/536b25474532|Update cfg.sh]] with lzexedat and enable lzexedat -l by default.
* [[https://hg.pushbx.org/ecm/ldebug/rev/42a51d72023e|Add long literal option]] to lzexedat implementation of the help depacker. This changeset also unconditionally enabled the -l switch to lzexedat for compressing the help.
* [[https://hg.pushbx.org/ecm/ldebug/rev/49b9f0c8b13f|Fix help depacker]]: copy_data would corrupt the cx loop counter during literal processing (only if not using inline literal code).
* [[https://hg.pushbx.org/ecm/ldebug/rev/95c7c36a1915|Add long literal option]] to the ELD depacker.
* [[https://hg.pushbx.org/ecm/ldebug/rev/2f288d0c6f81|Add new format 4]] for lzexedat -4 -l compressed extpak.eld
* In list.eld [[https://hg.pushbx.org/ecm/ldebug/rev/00333df37772|support the new format 4]] in its multi-format depacker.
At this point in the night from Friday to Saturday (03:14), I knew that both the help depacker and the ELD depacker were broken, but I called it a day. At 08:25 on Saturday morning I continued:
* Fix to help depacker and ELD depacker: [[https://hg.pushbx.org/ecm/ldebug/rev/f9f8521dd44e|No longer check for a leading 1 bit]] in the compressed stream, as a long literal command may lead, which is encoded with a leading 0 bit.
* Fix to ELD depacker: [[https://hg.pushbx.org/ecm/ldebug/rev/187cbb4c3152|Do not depend on the Parity Flag after a rol instruction]]. This would have been a neat hack to distinguish 0FFh = heatshrink, 0 = lzexedat -4, and 1 = lzexedat -4 -l (only the last of these has odd parity), but it turns out that [[https://www.felixcloutier.com/x86/rcl:rcr:rol:ror|the rol instruction doesn't affect the Parity Flag]].
* [[https://hg.pushbx.org/ecm/ldebug/rev/d8a12069acde|Add an option to enable or disable using lzexedat -l option]] for the help compression. Defaults to off because, as the changeset message states: "there's about 5 bytes to be saved wih -l enabled, but at the cost of some additional depacker code."
* [[https://hg.pushbx.org/ecm/ldebug/rev/701f5e80449e|Update cfg.sh]] with fastest keyword for winner and lzexedat.
* doc: [[https://hg.pushbx.org/ecm/ldebug/rev/f6cb98772527|Update news-r10]] with lzexedat -4 -l used for ELD compression.
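To illustrate the Parity Flag idea from the ELD depacker fix above: on x86, the instructions that do update PF set it when the low 8 bits of the result contain an even number of 1 bits. The Python sketch below only models that rule for the three format bytes named above. The helper parity_flag is made up for this illustration, and nothing here reflects the actual depacker code.

```python
def parity_flag(value):
    """Model the x86 Parity Flag: 1 if the low byte of value has an
    even number of set bits, otherwise 0."""
    return 1 if bin(value & 0xFF).count("1") % 2 == 0 else 0


# Format bytes as named in the changeset description above.
FORMATS = {
    0xFF: "heatshrink",
    0x00: "lzexedat -4",
    0x01: "lzexedat -4 -l",
}

for byte, name in FORMATS.items():
    print(f"{byte:02X}h {name}: PF would be {parity_flag(byte)}")
```

Only the value 1 yields PF = 0 (odd parity), which is what would have made the trick work. Because rol changes only the Carry and Overflow flags, PF would still hold whatever an earlier instruction left there, so the value has to be tested by an instruction that does update PF before branching on it.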
{{tag>acegals inicomp kernwrap ldos lzexe ldebug}}
~~DISCUSSION~~