====== Chasing a WarpLink bug ======

Today I used the current WarpLink revision, "release 4 by ecm (2025 September)", trying to peruse some details in preparation for a new feature. Cue my surprise when it failed to link even the simplest examples!


===== The error =====

Observe:

  test/20260204$ cat warplink.sh
  #! /bin/bash
  [ -z "$DOSEMU" ] && DOSEMU=dosemu
  param=" ${*//\//\\}"
  param="${param// \\/ \/}"
  # echo "$DOSEMU" -dumb -td -kt -q -quiet -K "$PWD" -E "warplink $param"
  "$DOSEMU" -dumb -td -kt -q -quiet -K "$PWD" -E "warplink $param"
  test/20260204$ cat test.asm
          section code
          dw 0
  test/20260204$ nasm test.asm -f obj
  test/20260204$ ./warplink.sh /mx test.obj,test.exe,test.map\;
  About to Execute : warplink /mx test.obj,test.exe,test.map;
  WarpLink release 4 by ecm (2025 September), Michael Devore (1989-1993).
  Public Domain software, all copyrights surrendered.
  Warning in TEST.EXE
  Problem: No stack segment was found for the EXE file.
  Solution: This is possibly an error in your program. An EXE file, unlike a
            COM file, must internally setup its own stack if it has no stack
            segment. To create a COM file, use the /c option of WarpLink.
  FATAL error in TEST.OBJ:
  Problem: More than 32768 EXE file relocation table items.
  Solution: This is an error in your program. Your program cannot contain a
            total of more than 32768 far calls, far jumps, or other far
            pointers. Remove or replace some far references with near
            references.
  test/20260204$

The error is [[https://hg.pushbx.org/ecm/warplink/file/3d62e4618a5b/mlerrmes.mac#l314|RELOC_COUNT_ERR EQU 30]], which [[https://hg.pushbx.org/ecm/warplink/file/3d62e4618a5b/mlpass2c.nas#l703|is only used in a single spot]].
The check appears to be fine:

          inc     WORD PTR [es:0]         ; bump count of entries in relocation block
          inc     word [number_reloc]     ; bump global count of relocation items
          cmp     word [number_reloc],RELOC_MAX   ; check if too many relocation items
          ja      mre_bounds              ; too many items
          pop     es                      ; restore critical register
          pop     si
          pop     bx
          ret                             ; done
  mre_bounds:
          mov     dx,OFFSET filename wrt DGROUP
          mov     ax,RELOC_COUNT_ERR      ; more than 32768 .EXE file relocation table entries
          jmp     NEAR PTR link_error     ; transfer control to error handler
  make_reloc_entry ENDP

The number_reloc variable also [[https://hg.pushbx.org/ecm/warplink/file/3d62e4618a5b/mlglobal.mac#l230|is initialised properly]]. Adding a segment relocation, such as generated by a ''jmp far label'' instruction, made the error go away.


===== Nondeterminism =====

I could reliably reproduce the error by using [[https://hg.pushbx.org/ecm/msdos4/file/766df35bd8d7/warplink.sh|my WarpLink invocation script]], which runs dosemu2 (currently "dosemu2-2.0pre8-20191102-1332-g9cf2b7560") with the default FreeDOS kernel (currently "FreeDOS kernel - GIT (build 2043 OEM:0xfd) [compiled Oct 10 2025]").

However, when I booted dosemu2 ("Build 2.0pre9-dev-20250926-3823-g59b19ff26") into "lDOS (2026 January)" and ran the test case there, it only failed **once** per boot. Running the test case a second time, it seemed to succeed, not displaying the relocation count error.

Trying to run the test case the first time in the debugger also happened to avoid the error message, [[https://pushbx.org/ecm/dokuwiki/blog/pushbx/2023/0911_debugger_relocation_all_switches_explained|even with the /T switch]] (so the linker would load to the same address as without the debugger):

  C:\>cd 20260204
  C:\20260204>ldebug /t wl.exe /mx test.obj,test.exe,test.map;
  lDebug (2026-02-01)
  -g
  WarpLink release 4 by ecm (2025 September), Michael Devore (1989-1993).
  Public Domain software, all copyrights surrendered.
  Warning in TEST.EXE
  Problem: No stack segment was found for the EXE file.
  Solution: This is possibly an error in your program. An EXE file, unlike a
            COM file, must internally setup its own stack if it has no stack
            segment. To create a COM file, use the /c option of WarpLink.
  Total number of warnings: 1
  EXE load image size: 001K
  Program terminated normally (0001)
  -q
  C:\20260204>


===== Work around =====

I was stumped about how to study this bug for a bit. Then I came across a way to provoke the bug even with the debugger running:

Chainload the lDOS kernel [[https://pushbx.org/ecm/dokuwiki/blog/pushbx/2025/0304_interlude_current_ldebug_startup_files|from the bootable lDebug]], using the ''ldos'' alias, then running ''g'' and ''r f CY'' and another ''g'' command. That ends up breaking [[https://hg.pushbx.org/ecm/msdos4/file/766df35bd8d7/src/BIOS/sysinit1.nas#l811|on the breakpoint before calling NEARDOSINIT]] in the lDOS system init. Next, run a ''p'' command to proceed past this call. Afterwards, run ''bp at ptr ri21p when ah == 4B''. This sets a permanent breakpoint on the early DOS int 21h handler (before dosemu2 installs its handler). Then repeatedly run ''g'' until the FreeCOM command line input shows up.

With the WarpLink command entered, the conditional breakpoint will be activated again. Now enter ''bw 0'' to disable the breakpoint's WHEN condition. Run another ''g'' command, which ends up in an int 21h function 3300h call. This is the first int 21h call done by the linker. Run ''g ptr [ss:sp]'' and the debugger stops within the linker's code segment, e.g. at 00CD:0233. Now run ''bd 0'' to disable the breakpoint, so you can run WarpLink without int 21h calls breaking out of the run.

During the first run of WarpLink, the error comes up again. Score!


===== Analysis =====

I first set a temporary G breakpoint [[https://hg.pushbx.org/ecm/tlsfiles/file/4c9ebc65c51b/warplink/wl.tls#l39591|on offset 33B1]], near the check for the excessive relocations. As desired, this started out with the variable number_reloc as zero.
The fact that the breakpoint was actually hit also hinted at a problem: This code should only run during processing of a segment relocation, of which there are none. Repeatedly running with the same temporary breakpoint revealed that the code was called repeatedly. The next step was to find what was calling the make_reloc_entry function:

  lDebug connected to serial port. Enter KEEP to confirm.
  = keep
  -r
  AX=0652 BX=0004 CX=0002 DX=0000 SP=03E6 BP=FFF0 SI=0000 DI=0000
  DS=0D88 ES=1872 SS=1723 CS=00CD IP=33B1 NV UP EI PL NZ NA PO NC
  00CD:33B1 FF068000        inc     word [0080]             DS:0080=0000
  -t
  AX=0652 BX=0004 CX=0002 DX=0000 SP=03E6 BP=FFF0 SI=0000 DI=0000
  DS=0D88 ES=1872 SS=1723 CS=00CD IP=33B5 NV UP EI PL NZ NA PO NC
  00CD:33B5 813E80000080    cmp     word [0080], 8000       DS:0080=0001
  -
  AX=0652 BX=0004 CX=0002 DX=0000 SP=03E6 BP=FFF0 SI=0000 DI=0000
  DS=0D88 ES=1872 SS=1723 CS=00CD IP=33BB OV UP EI NG NZ NA PO CY
  00CD:33BB 7704            ja      33C1                    not jumping
  -
  AX=0652 BX=0004 CX=0002 DX=0000 SP=03E6 BP=FFF0 SI=0000 DI=0000
  DS=0D88 ES=1872 SS=1723 CS=00CD IP=33BD OV UP EI NG NZ NA PO CY
  00CD:33BD 07              pop     es
  -
  AX=0652 BX=0004 CX=0002 DX=0000 SP=03E8 BP=FFF0 SI=0000 DI=0000
  DS=0D88 ES=6640 SS=1723 CS=00CD IP=33BE OV UP EI NG NZ NA PO CY
  00CD:33BE 5E              pop     si
  -
  AX=0652 BX=0004 CX=0002 DX=0000 SP=03EA BP=FFF0 SI=0055 DI=0000
  DS=0D88 ES=6640 SS=1723 CS=00CD IP=33BF OV UP EI NG NZ NA PO CY
  00CD:33BF 5B              pop     bx
  -
  AX=0652 BX=D54E CX=0002 DX=0000 SP=03EC BP=FFF0 SI=0055 DI=0000
  DS=0D88 ES=6640 SS=1723 CS=00CD IP=33C0 OV UP EI NG NZ NA PO CY
  00CD:33C0 C3              retn
  -
  AX=0652 BX=D54E CX=0002 DX=0000 SP=03EE BP=FFF0 SI=0055 DI=0000
  DS=0D88 ES=6640 SS=1723 CS=00CD IP=3236 OV UP EI NG NZ NA PO CY
  00CD:3236 EBD8            jmp     3210
  -t
  AX=0652 BX=D54E CX=0002 DX=0000 SP=03EE BP=FFF0 SI=0055 DI=0000
  DS=0D88 ES=6640 SS=1723 CS=00CD IP=3210 OV UP EI NG NZ NA PO CY
  00CD:3210 833E926A00      cmp     word [6A92], +00        DS:6A92=ABAA
  -

That's [[https://hg.pushbx.org/ecm/warplink/file/3d62e4618a5b/mlpass2c.nas#l430|a comparison in the function
proc2_ledata]], reading the variable data_fixup_count. ABAAh is obviously a number much larger than the expected zero.

Where should the data_fixup_count variable be initialised? Turns out it is [[https://hg.pushbx.org/ecm/warplink/file/3d62e4618a5b/mlpass2d.nas#l173|in proc2_fixupp]]. But that function is never called if there are no fixupp entries!

This insight also reveals the cause of the nondeterminism: When loading the linker a second time or first running the debugger as a DOS application, the word that forms the data_fixup_count variable gets initialised differently, avoiding the error message. (The debugger's /T switch doesn't help against this because DOS loads the debugger into the beginning of the Low Memory Area first, where it presumably changes the memory that ends up backing the variable, even if the debugger goes resident at the end of the LMA later in its init.)


===== Cause =====

The bug was introduced [[https://hg.pushbx.org/ecm/warplink/rev/1599f5e217f4|by a 2025-09-21 changeset]] which along with [[https://hg.pushbx.org/ecm/warplink/rev/acbd17dd577f|its child]] decreased the on-disk image size of the WarpLink executable by some 20 kB (from 91 kB to about 70 kB), by moving initialised data out of the nobits / _BSS section so that the remaining data could truly become nobits rather than being filled with progbits all-zeroes data. (The child additionally added some arrays to the nobits part with explicit code to do the zero-initialisation at run time.)


===== Fix =====

The fix is to simply [[https://hg.pushbx.org/ecm/warplink/rev/7d4588211863|put data_fixup_count into the part of the data section that's explicitly zero-initialised]] by the startup code.
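
To illustrate why the outcome depends on stale memory contents, here is a loose Python model of the failure mode. It is not WarpLink's actual code: the names ''link_without_fixups'' and ''RelocCountError'' are made up for this sketch, and it assumes the loop handles one pending fixup per iteration until data_fixup_count reaches zero. Only data_fixup_count, number_reloc, and the 32768 limit are taken from the analysis above.

```python
RELOC_MAX = 0x8000  # 32768, the limit compared against number_reloc


class RelocCountError(Exception):
    """Hypothetical stand-in for the linker's RELOC_COUNT_ERR fatal error."""


def link_without_fixups(stale_word):
    """Model linking an object file that contains no fixupp entries.

    stale_word is whatever 16-bit value happens to sit in the memory
    backing data_fixup_count, because nothing initialises it in this case.
    Returns the number of (bogus) relocation entries that get created.
    """
    data_fixup_count = stale_word & 0xFFFF
    number_reloc = 0
    # Assumption: one relocation entry is made per pending fixup.
    while data_fixup_count != 0:
        number_reloc += 1
        if number_reloc > RELOC_MAX:  # mirrors the unsigned "ja" check
            raise RelocCountError(
                "More than 32768 EXE file relocation table items.")
        data_fixup_count -= 1
    return number_reloc


if __name__ == "__main__":
    print(link_without_fixups(0x0000))  # zero-initialised: prints 0
    print(link_without_fixups(0x2BAA))  # small stale value: prints 11178
    try:
        link_without_fixups(0xABAA)     # large stale value: fatal error
    except RelocCountError as error:
        print("FATAL:", error)
```

In this model a zero-initialised word gives the intended result, a large stale word trips the limit check, and a smaller stale word silently yields bogus entries without any error.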


===== Bonus =====

Running the test case (on dosemu2 / lDOS) repeatedly made the error go away. However, the second run created an executable with a large number of (bogus) relocation entries in its MZ executable header:

  C:\>cd 20260204
  C:\20260204>wl /mx test.obj,test.exe,test.map;
  WarpLink release 4 by ecm (2025 September), Michael Devore (1989-1993).
  Public Domain software, all copyrights surrendered.
  Warning in TEST.EXE
  Problem: No stack segment was found for the EXE file.
  Solution: This is possibly an error in your program. An EXE file, unlike a
            COM file, must internally setup its own stack if it has no stack
            segment. To create a COM file, use the /c option of WarpLink.
  FATAL error in TEST.OBJ:
  Problem: More than 32768 EXE file relocation table items.
  Solution: This is an error in your program. Your program cannot contain a
            total of more than 32768 far calls, far jumps, or other far
            pointers. Remove or replace some far references with near
            references.
  C:\20260204>wl /mx test.obj,test.exe,test.map;
  WarpLink release 4 by ecm (2025 September), Michael Devore (1989-1993).
  Public Domain software, all copyrights surrendered.
  Warning in TEST.EXE
  Problem: No stack segment was found for the EXE file.
  Solution: This is possibly an error in your program. An EXE file, unlike a
            COM file, must internally setup its own stack if it has no stack
            segment. To create a COM file, use the /c option of WarpLink.
  Total number of warnings: 1
  EXE load image size: 001K
  C:\20260204>dir test.exe
   Volume in drive C is EMU DRIVE_C
   Directory of C:\20260204
  TEST     EXE        45,058  02-04-26  5:18p
           1 file(s)         45,058 bytes
           0 dir(s)          49,991 Mega bytes free
  C:\20260204>infoexe test.exe
  INFOEXE.EXE (C) 1989 Fabrice BELLARD, ecm release 4
  Information of an EXE file
  Examining TEST.EXE
                                     Decimal  Hex
  Length on disk (bytes)               45058  0000B002
  Length of header + image (bytes)     45058  0000B002
  Image size (paragraphs)                  1  00000001
  Image size (bytes)                       2  00000002
  Minimum alloc (paragraphs)               0  0000
  Minimum alloc (bytes)                    0  00000000
  Maximum alloc (paragraphs)           65535  FFFF
  Maximum alloc (bytes)              1048560  000FFFF0
  Minimum alloc incl image (paras)         1  0001
  Minimum alloc incl image (bytes)        16  00000010
  Maximum alloc incl image (paras)   unlimited
  Maximum alloc incl image (bytes)   unlimited
  SS:SP                              0000:0000
  CS:IP                              0000:0000
  Amount relocation entries            11178  2BAA
  Base of relocation entry table          30  001E
  Header size (paragraphs)              2816  0B00
  Header size (bytes)                  45056  0000B000
  Overlay number                           0  0000
  C:\20260204>

{{tag>warplink bugspam}}
~~DISCUSSION~~