blog:pushbx:2022:0826_recent_segmentation_in_ldebug_and_not_yet_merged_nasm_patches [2022-08-26 01:06:20 +0200 Aug Fri] (current)
ecm created
 +====== Recent segmentation in lDebug and not yet merged NASM patches ======
 +
 +The lDebug debugger is a demanding application.
 +It's driven NASM bug reports for a while now.
 +There are [[https://bugzilla.nasm.us/buglist.cgi?f1=reporter&list_id=1912&o1=equals&v1=pushbx%40ulukai.org|63 tickets filed by me]]
 +in the NASM bugzilla, most of which cropped up during lDebug development.
 +As [[blog:pushbx:2022:0706_ldebug_symbolic_and_segment_splits|described in 2022 July]], the debugger's footprint has grown beyond 64 KiB segment limits twice now.
 +
 +
 +===== Segmentation =====
 +
 +The debugger now requires three main segments to address
 +parts of itself if a build utilising its full capabilities is assembled.
 +These are:
 +
 +The process/data/entry/stack section, holding the debugger's PSP,
 +most data and messages, all assembler tables,
 +all code entrypoints into the debugger,
 +as well as the debugger's stack.
 +
 +The code section, which holds most of the code.
 +It is currently the only section that the initialisation code
 +patches using either the 386+ or the non-386 patch table.
 +(The data entry section contains a few patch sites
 +but these are handled one by one without tables.)
 +
 +The second code section.
 +This is only included if the ''_DUALCODE'' build option is enabled.
 +It can hold all the symbolic code included from
 +the ''symsnip'' repo (''insert.asm'' and friends), and
 +optionally even most code assembled from the
 +''symbols.asm'' sources.
 +(The exception is the XMS detection
 +and calling code, which uses ''intcall'' functions and
 +has a number of 386 patches, so it goes into the first code section.)
 +
 +Additionally, the auxiliary buffer and history buffer are addressed
 +as their own segments. Finally, the init section is used only
 +for initial set-up.
 +
 +
 +===== Current segment use =====
 +
 +Building the most recent lDebug revision with a command like ''NASM=~/proj/nasmtest/nasm ./makexd -D_SYMBOLIC -D_DUALCODE -D_DEFAULTSHOWSIZE -D_SYMBOLASMDUALCODE -D_DEBUG_COND'' results in the following size messages:
 +
 +<code>debug.asm:1390: warning: msg holds 33835 bytes [-w+user]
 +expr.asm:2843: warning: word data exceeds bounds [-w+number-overflow]
 +init.asm:1432: warning: patch_no386_table: 946 (Method 2) [-w+user]
 +init.asm:1432: warning: 1B=318 repo=46 run=426 byte=996 [-w+user]
 +init.asm:1437: warning: patch_386_table: 50 (Method 2) [-w+user]
 +init.asm:1437: warning: 1B=4 repo=11 run=13 byte=59 [-w+user]
 +debug.asm:6745: warning: asmtables hold 8048 bytes [-w+user]
 +debug.asm:6756: warning: init segment holds 7120 bytes [-w+user]
 +debug.asm:6767: warning: code segment holds 61280 bytes [-w+user]
 +debug.asm:6782: warning: code segment 2 holds 14560 bytes [-w+user]
 +debug.asm:6793: warning: PSP segment holds 57184 bytes [-w+user]</code>
 +
 +(The auxiliary buffer and history buffer take up about 8 KiB each.)
 +
 +This sums to 145 KiB used by the resident debugger.
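The total can be checked by hand. The following C sketch adds up the three main segments from the messages above plus the two buffers, and also reports how much room each segment has left below the 64 KiB limit. It assumes that each buffer is exactly 8 KiB and that the init segment is not counted because it is used only for initial set-up. The function names are made up for this illustration and do not appear in the lDebug sources.

```c
#include <assert.h>

/* 64 KiB, the most that a single 16-bit segment can address. */
#define SEGMENT_LIMIT 65536UL

/* Sizes in bytes, copied from the build messages quoted above. */
static const unsigned long code_segment  = 61280UL;
static const unsigned long code_segment2 = 14560UL;
static const unsigned long psp_segment   = 57184UL;

/* Assumption: "about 8 KiB each" is taken as exactly 8 KiB. */
static const unsigned long aux_buffer     = 8UL * 1024UL;
static const unsigned long history_buffer = 8UL * 1024UL;

/* Bytes still free in a segment before it reaches the 64 KiB limit. */
unsigned long segment_headroom(unsigned long size)
{
    assert(size <= SEGMENT_LIMIT);
    return SEGMENT_LIMIT - size;
}

/* Resident total; the init segment is left out on purpose (assumption). */
unsigned long resident_bytes(void)
{
    return code_segment + code_segment2 + psp_segment
         + aux_buffer + history_buffer;
}
```

Under these assumptions ''resident_bytes()'' gives 149408 bytes, which is just under 146 KiB and rounds down to the 145 KiB quoted above. ''segment_headroom(61280)'' gives 4256, so the first code segment is within about 4 KiB of the limit.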
 +
 +
 +===== NASM bugs =====
 +
 +My most recent patch to NASM addresses a problem that
 +cropped up during lDebugX symbolic development.
 +It appeared to be caused by the sheer size of the sources more
 +than by the new dual code segment split specifically.
 +I found that recent revisions of the assembler
 +would be killed by the system's OOM killer.
 +The assembler's memory use exceeded 3.3 GiB
 +of the 6 GiB of memory allocated to our server.
 +This would also be too close for comfort to the 4 GiB address space limit
 +on 32-bit x86 hosts, although our server runs in amd64 long mode.
 +
 +One cause of this was a change in how the preprocessor
 +allocates tokens. Instead of storing a pointer to a separate allocation
 +for each token's text content, now each token is allocated 64 bytes
 +(on amd64 hosts) and text of up to 47 bytes is stored inline in the token's allocation.
 +With thousands of tokens, most of which store text much shorter
 +than 47 bytes, the memory use has risen a lot.
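To make the trade-off concrete, here is a minimal C sketch of a token with inline text storage. It is not copied from the NASM sources: the structure, field, and function names are hypothetical, and the layout is merely chosen so that the numbers match the description above (eight pointers' worth of storage per token, which is 64 bytes with 47 bytes of inline text on amd64).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds, only here to give the struct a type field. */
enum token_type { TOKEN_OTHER, TOKEN_IDENTIFIER };

/* Inline capacity: what is left of 8 pointers' worth of storage after
 * the link pointer, the type, the length, and the terminating NUL.
 * On amd64 this evaluates to 7 * 8 - 4 - 4 - 1 = 47. */
#define INLINE_TEXT \
    (7 * sizeof(char *) - sizeof(enum token_type) - sizeof(unsigned int) - 1)

struct inline_token {
    struct inline_token *next;   /* link to the following token */
    enum token_type type;
    unsigned int len;            /* length of text, without the NUL */
    char text[INLINE_TEXT + 1];  /* text lives inside the token itself */
};

/* Store s inline and return how many inline bytes remain unused. */
size_t token_store_inline(struct inline_token *t, const char *s)
{
    size_t len = strlen(s);

    assert(len <= INLINE_TEXT);  /* longer text would need a separate path */
    memcpy(t->text, s, len + 1);
    t->len = (unsigned int)len;
    return INLINE_TEXT - len;
}
```

With this layout, storing a three-byte mnemonic such as ''mov'' leaves 44 of the 47 inline bytes unused on amd64. Multiplied over every token the preprocessor keeps alive, that unused space is where the additional memory goes.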
 +
 +[[https://bugzilla.nasm.us/show_bug.cgi?id=3392774|My patch]] reduces the token structure length from 64 bytes to 32 bytes.
 +Although this does not restore the ''oldnasm'' levels of memory use (below 800 MiB),
 +it does drop it to below 2.5 GiB which suffices to build lDebugX symbolic
 +on our server.
 +
 +The next two bugs are minor problems: one is
 +[[https://bugzilla.nasm.us/show_bug.cgi?id=3392805|in a disabled code branch]],
 +the other is
 +[[https://bugzilla.nasm.us/show_bug.cgi?id=3392804|using a standard library function directly]] instead of the ''nasm_free'' function.
 +
 +The [[https://bugzilla.nasm.us/show_bug.cgi?id=3392803|fourth bug concerns parsing of multi-line macros]] to detect the use of the ''%00'' "label of the macro" specifier. While testing this feature before
 +[[https://hg.pushbx.org/ecm/ldebug/rev/04da020cae40|using it for the debugger's dual code segment support]],
 +I found that it didn't work in recent NASM revisions. I checked the preprocessor sources to determine whether the feature had been removed intentionally, which turned out not to be the case.
 +
 +The fifth bug I [[https://bugzilla.nasm.us/show_bug.cgi?id=3392732|originally reported in 2020 December]]. It concerns the use of commas in ''%strcat'' directives, which are documented as optional but allowed. A patch to fix this bug was
 +[[https://github.com/netwide-assembler/nasm/pull/25|submitted to NASM's GitHub]]
 +in 2022 February. I'm still waiting for it to be merged.
 +
 +Even older bugs, now fixed, include
 +[[https://bugzilla.nasm.us/show_bug.cgi?id=3392747|the left bit-shift operator shifting right instead]]
 +in preprocessor ''%assign'' directives, and
 +[[https://bugzilla.nasm.us/show_bug.cgi?id=3392733|using a question mark prefix for a define name]] being rejected. (The ''lmacros2.mac'' stack frame macros use question mark prefixes for the defines created by their ''lpar'', ''lvar'', and ''lequ'' directives.) There were also several problems with listing or warning line numbers being wrong in ''%rep'' blocks and included files.
 +
 +
 +I hope that all these bugfixes will be merged into NASM soon.
 +
 +
 +{{tag>ldebug segmentation nasm}}
 +
 +
 +~~DISCUSSION~~
  
blog/pushbx/2022/0826_recent_segmentation_in_ldebug_and_not_yet_merged_nasm_patches.txt · Last modified: 2022-08-26 01:06:20 +0200 Aug Fri by ecm