The lDebug debugger is a demanding application. It's driven NASM bug reports for a while now. There are 63 tickets filed by me in the NASM bugzilla, most of which cropped up during lDebug development. As described in 2022 July, the debugger's footprint has grown beyond 64 KiB segment limits twice now.
The debugger now requires three main segments to address parts of itself if a build utilising its full capabilities is assembled. These are:
The process/data/entry/stack section, holding the debugger's PSP, most data and messages, all assembler tables, all code entrypoints into the debugger, as well as the debugger's stack.
The code section, which holds most of the code. It is currently the only section patched using the 386+ and non-386 patch tables in the initialisation code. (The process/data/entry/stack section contains a few patch sites, but these are handled one by one without tables.)
The second code section. This is only included if the _DUALCODE build option is enabled. It can hold all the symbolic code included from the symsnip repo (insert.asm and friends), and optionally even most code assembled from the
(With the exception of the XMS detection and calling code, which use intcall functions and have a number of 386 patches so they go into the first code section.)
Additionally, the auxiliary buffer and history buffer are addressed as their own segments. Finally, the init section is used only for initial set-up.
Building the most recent lDebug revision with a command like this:
NASM=~/proj/nasmtest/nasm ./makexd -D_SYMBOLIC -D_DUALCODE -D_DEFAULTSHOWSIZE -D_SYMBOLASMDUALCODE -D_DEBUG_COND
Results in the following size messages:
debug.asm:1390: warning: msg holds 33835 bytes [-w+user]
expr.asm:2843: warning: word data exceeds bounds [-w+number-overflow]
init.asm:1432: warning: patch_no386_table: 946 (Method 2) [-w+user]
init.asm:1432: warning: 1B=318 repo=46 run=426 byte=996 [-w+user]
init.asm:1437: warning: patch_386_table: 50 (Method 2) [-w+user]
init.asm:1437: warning: 1B=4 repo=11 run=13 byte=59 [-w+user]
debug.asm:6745: warning: asmtables hold 8048 bytes [-w+user]
debug.asm:6756: warning: init segment holds 7120 bytes [-w+user]
debug.asm:6767: warning: code segment holds 61280 bytes [-w+user]
debug.asm:6782: warning: code segment 2 holds 14560 bytes [-w+user]
debug.asm:6793: warning: PSP segment holds 57184 bytes [-w+user]
(The auxiliary buffer and history buffer take up about 8 KiB each.)
This sums to 145 KiB used by the resident debugger.
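The arithmetic behind that figure can be sketched in a few lines of C. This is only an illustration: the three segment sizes are copied from the messages above, each buffer is assumed to be exactly 8 KiB (the text only says "about"), and the init segment is assumed not to count as resident because it is used only for initial set-up. The function names are made up for this sketch.

```c
#include <stddef.h>

/* Sizes in bytes, copied from the assembler's size messages. */
enum {
    CODE_SEGMENT_BYTES   = 61280,
    CODE2_SEGMENT_BYTES  = 14560,
    PSP_SEGMENT_BYTES    = 57184,
    /* Assumption: each buffer is taken as exactly 8 KiB. */
    AUX_BUFFER_BYTES     = 8 * 1024,
    HISTORY_BUFFER_BYTES = 8 * 1024,
    /* A real-mode segment can address at most 64 KiB. */
    SEGMENT_LIMIT_BYTES  = 64 * 1024
};

/* Total bytes occupied by the resident parts of the debugger. */
unsigned long resident_bytes(void)
{
    return (unsigned long)CODE_SEGMENT_BYTES
         + CODE2_SEGMENT_BYTES
         + PSP_SEGMENT_BYTES
         + AUX_BUFFER_BYTES
         + HISTORY_BUFFER_BYTES;
}

/* Bytes still free in a segment before it hits the 64 KiB limit. */
unsigned long segment_headroom(unsigned long segment_bytes)
{
    return SEGMENT_LIMIT_BYTES - segment_bytes;
}
```

Under these assumptions resident_bytes() gives 149408 bytes, which is 145 whole KiB, and segment_headroom(CODE_SEGMENT_BYTES) shows that the first code section has only 4256 bytes left below the segment limit.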
My most recent patch to NASM addresses a problem that cropped up during lDebugX symbolic development. It appeared to be caused by the sheer size of the sources more so than by the new dual code segments split specifically. I found that recent revisions of the assembler would be killed by the system's OOM killer. This was due to exceeding 3.3 GiB of memory use, out of the 6 GiB of memory allocated to our server. This would also be too close for comfort to the 4 GiB address space limit on 32-bit x86 hosts, although our server runs in amd64 long mode.
One cause of this was a change in how the preprocessor allocates tokens. Instead of storing a pointer to a separate allocation for each token's text content, now each token is allocated 64 bytes (on amd64 hosts) and text of up to 47 bytes is stored inline in the token's allocation. With thousands of tokens, most of which store text much shorter than 47 bytes, the memory use has risen a lot.
My patch reduces the token structure length from 64 bytes to 32 bytes.
Although this does not restore the oldnasm levels of memory use (below 800 MiB), it does drop it to below 2.5 GiB, which suffices to build lDebugX symbolic on our server.
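To make the trade-off concrete, here is a rough C sketch of the two kinds of token layout. It is not NASM's actual structure and not the actual patch: all field names are hypothetical, and the 15-byte inline capacity with a heap pointer fallback in the compact variant is purely an assumption chosen so that the structure comes out at 32 bytes on a typical LP64 host such as amd64.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    WIDE_INLINE_MAX    = 47, /* inline text capacity named in the text */
    COMPACT_INLINE_MAX = 15  /* assumption made for this sketch only   */
};

/* Hypothetical 64-byte token: header plus 47 bytes of text and a NUL. */
struct wide_token {
    struct wide_token *next;
    uint32_t type;
    uint32_t len;
    char text[WIDE_INLINE_MAX + 1];
};

/* Hypothetical 32-byte token: short text inline, longer text on the heap. */
struct compact_token {
    struct compact_token *next;
    uint32_t type;
    uint32_t len;
    union {
        char inline_text[COMPACT_INLINE_MAX + 1];
        char *heap_text;
    } text;
};

/* Returns nonzero if text of text_len bytes fits into inline_max bytes. */
int fits_inline(size_t text_len, size_t inline_max)
{
    return text_len <= inline_max;
}

/* Inline bytes left unused in a wide token holding text_len bytes of text. */
size_t unused_inline_bytes(size_t text_len)
{
    return fits_inline(text_len, WIDE_INLINE_MAX)
         ? WIDE_INLINE_MAX - text_len
         : 0;
}
```

A three-byte token text such as a short mnemonic leaves 44 of the 47 inline bytes unused in the wide layout, which is the kind of waste that adds up when most tokens are short.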
The fourth bug concerns parsing of multi-line macros to detect the use of the %00 "label of the macro" specifier. While testing this feature before using it for the debugger's dual code segment support, I found that it didn't work in recent NASM revisions. I checked the preprocessor sources to determine whether the feature had been removed intentionally, which turned out not to be the case.
The fifth bug I originally reported in 2020 December. It concerns the use of commas in %strcat directives, which are documented as optional but allowed. A patch to fix this bug was submitted to NASM's GitHub in 2022 February. I'm still waiting on it being merged.
Even older, now fixed, bugs include the bit-shift left shifting right instead %assign directives, and using a question mark prefix for a define name being rejected. (The lmacros2.mac stack frame macros use question mark prefixes for the defines created by their lequ directives.) There were also several problems with listing or warning line numbers being wrong in %rep blocks and included files.
I hope that all these bugfixes will be merged into NASM soon.