
MSDebug, lDebug, x2b2, Warplink (2023 June work)

2023-06-25

The last three weeks in part had me on vacation, but that doesn't mean we didn't get any work done.

I don't know when I will be able to upload this blog post, as our main desktop (Linux) box is experiencing some difficulties. RAM has already been swapped, and a new PSU has arrived by post.

As usual some work on two of the debuggers happened, as well as X2B (a public domain "EXE to binary" tool) and a port of WarpLink from the original TASM / MASM sources to sources that can be assembled with a recent NASM.

MSDebug

Other than some documentation updates, a bug in the debugger's M command was fixed. This bug affected overlapping moves in which the segments compare in the opposite order to the linear addresses. Previously, the comparison that decides whether to move forwards or backwards was done on the full 32 bits of the segmented address. This does not adequately deal with the fact that one linear address can be addressed with up to 4096 different segmented addresses.
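A minimal sketch of why the two comparisons can disagree, written in Python rather than the debugger's assembly. The function names are hypothetical, and the only assumption is plain 8086 real-mode addressing (linear address = segment * 16 + offset, no wraparound modelled).

```python
def linear(seg: int, off: int) -> int:
    """Real-mode linear address of seg:off (no address wrap is modelled)."""
    return (seg << 4) + off


def segmented_as_u32(seg: int, off: int) -> int:
    """The segmented address read as one 32-bit number (the old comparison)."""
    return (seg << 16) | off


def must_move_backwards(src, dst, length: int) -> bool:
    """True if dst starts inside the source block, so the copy must run
    from the end of the block towards its start."""
    s, d = linear(*src), linear(*dst)
    return s < d < s + length


src = (0x1000, 0x0010)  # linear 0x10010
dst = (0x0FFF, 0x0030)  # linear 0x10020, although the segment is lower
```

Here the 32-bit comparison sees dst below src and would pick a forward move, while the linear addresses show that dst lies 16 bytes above src, so a 32-byte move has to run backwards.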

The other code change makes most of the debugger's code run with Enabled Interrupts (EI). The saveints function disables interrupts during its IVT access. The solution is to enable them again after some calls to this function.

wwwecm.scr

The only change was to add the x2b2 current release to the build script.

lDebug

The debugger will now always write the "ALASAP" and CALL 5 fields of PSPs that it creates to ensure valid and compatible values.

Quote marks are now accepted as separators by the parser, allowing commands such as E 100 0"foo"0 to write 5 bytes. This improves compatibility with MSDebug.
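The effect on the example command can be illustrated with a small Python sketch of such a list parser. This is not the debugger's code: the function name is hypothetical, only blanks, commas and quote marks are treated as separators, and doubled quote marks inside strings are not handled.

```python
def parse_e_list(text: str) -> bytes:
    """Parse an E-style data list of hex byte values and quoted strings.

    A quote mark ends a pending hex number, just like a blank or comma.
    """
    out = bytearray()
    number = ""
    i = 0

    def flush():
        nonlocal number
        if number:
            # bytearray.append raises ValueError for values above 0xFF
            out.append(int(number, 16))
            number = ""

    while i < len(text):
        ch = text[i]
        if ch in "\"'":
            flush()
            end = text.index(ch, i + 1)  # ValueError if the string is unterminated
            out.extend(text[i + 1:end].encode("ascii"))
            i = end + 1
        elif ch in " ,\t":
            flush()
            i += 1
        else:
            number += ch
            i += 1
    flush()
    return bytes(out)
```

With this sketch, the list 0"foo"0 yields the same 5 bytes as 0 "foo" 0: a zero byte, the three letters, and another zero byte.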

Some bugs in the F RANGE command were fixed, particularly if the second range is in a small segment (limit < 64 KiB) and there is no length specified for the second range.

Several options for MSDebug compatibility were added:

  • Treat 0-length ranges as 64 KiB, except U which treats them as 1 B
  • Display additional blanks in 80-column 16-bit R dump
  • Display variable prompts for R F and R reg commands in the same way as MSDebug
  • Shorten disassembly opcode field width

x2b2

In response to a Stack Overflow question I recommended using either the exe2bin command of the 2018 free software MS-DOS v2 release (which is free but without sources) or the Public Domain (with sources) X2B. With my attention on that, I improved a variant of the latter, which I called X2B2.

Instead of a separate build that ignores some error conditions, I added parsing for some switches that select which specific conditions to ignore. I also fixed the IP ignore part to handle a wrong IP as if an IP of zero was specified. (Open question: Should we check for CS equal to +0 as well?)

The program also checks for a short read of the MZ EXE header now. Additionally, a short read of the image is only allowed if a switch is passed. Without the switch this short read is also handled as an error.
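The two short-read checks can be sketched as follows in Python. This is an illustration and not X2B2's code: the function names and the allow_short_image parameter (standing in for the switch) are hypothetical, and only the standard fixed 28-byte MZ header layout is assumed.

```python
import struct

MZ_HEADER = struct.Struct("<14H")  # the 28-byte fixed part of an MZ EXE header


def image_extent(header: bytes):
    """Return (file offset, length) of the load image described by the header."""
    if len(header) < MZ_HEADER.size:
        raise ValueError("short read of MZ EXE header")
    (magic, last_page_bytes, pages, _relocs, header_paras,
     *_rest) = MZ_HEADER.unpack_from(header)
    if magic not in (0x5A4D, 0x4D5A):  # "MZ", or the rarer "ZM"
        raise ValueError("not an MZ executable")
    total = pages * 512
    if last_page_bytes:
        total -= 512 - last_page_bytes  # the last page is only partly used
    offset = header_paras * 16
    if total < offset:
        raise ValueError("inconsistent header sizes")
    return offset, total - offset


def read_image(data: bytes, allow_short_image: bool = False) -> bytes:
    """Extract the image; a truncated image is an error unless allowed."""
    offset, length = image_extent(data[:MZ_HEADER.size])
    image = data[offset:offset + length]
    if len(image) < length and not allow_short_image:
        raise ValueError("short read of image")
    return image
```

A file that ends before the size stated in its header then fails by default and only yields the truncated image when the caller opts in.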

The buffers and variables are aligned variously. The pathname buffers cannot overflow any longer. An initial cl other than zero is no longer a problem.

Another switch allows files exceeding 64 KiB, with something similar to the prior limit implemented as an optional check.

The pathnames may now contain dollar signs without corrupting the output. Further, spurious NUL bytes are no longer displayed after the pathnames.

WarpLink

The WarpLink source release of version 2.70, released into the Public Domain as of 1999-11-05, was the basis for our repo. About 30 assembly source files are built together to form the main executable of the linker. These were written to target MASM or TASM.

I created a script called fixmem.pl which initially was intended to fix only the memory accesses done without square brackets. I started out editing the first file manually, but soon found I could automate some of my edits.

The core purpose of the script required parsing variable definitions as well as EXTRN directives to learn the sizes of symbols. For simplicity, the size must be learned before it is used, though running the script with the same input file specified twice could likely work around this requirement. The needed WarpLink sources are well-behaved, however.

Some accesses that already had brackets still needed a size to be added.

8086 instructions with one or two explicit operands needed to be handled. The one-operand form is simpler: If the operand is a single 8086 register name without SIZE PTR keywords and without brackets, then it is a register operand. Anything else requires brackets, if it doesn't have them yet.

Two-operand forms are more difficult. If an operand is not a register name, and not an equate name, and does not come with a prepended OFFSET keyword, then it must be a memory access. If it comes with SIZE PTR or brackets already, it is also a memory access. (This is always true, except for JMP NEAR PTR which is a direct branch, despite the PTR keyword.)

One-operand forms always have to specify the access size of a memory operand. Two-operand forms with a memory operand do not have to specify a size if the other operand is a register. The exception is a shift or rotate instruction whose destination (first) operand is a memory operand and whose source operand is cl; in that case the size of the memory operand also needs to be specified.
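The rules of the last three paragraphs can be condensed into a short Python sketch. It is not taken from fixmem.pl (which is a Perl script); all names are hypothetical, and treating numeric literals as non-memory operands is an added assumption that the text above does not spell out.

```python
import re

REGISTERS = {
    "al", "ah", "bl", "bh", "cl", "ch", "dl", "dh",
    "ax", "bx", "cx", "dx", "si", "di", "bp", "sp",
    "cs", "ds", "es", "ss",
}
SHIFTS = {"shl", "sal", "shr", "sar", "rol", "ror", "rcl", "rcr"}
# NEAR PTR is deliberately absent: JMP NEAR PTR is a direct branch.
SIZE_PTR = re.compile(r"\b(byte|word|dword)\s+ptr\b", re.IGNORECASE)
NUMBER = re.compile(r"^[0-9][0-9a-f]*[hbdoq]?$", re.IGNORECASE)  # assumption


def is_register(op: str) -> bool:
    return op.strip().lower() in REGISTERS


def is_memory(op: str, equates=frozenset()) -> bool:
    """Classify one operand as a memory access or not."""
    text = op.strip()
    if "[" in text or SIZE_PTR.search(text):
        return True
    if is_register(text) or text in equates or NUMBER.match(text):
        return False
    if text.lower().startswith("offset "):
        return False
    return True  # a bare symbol name: memory access without brackets


def needs_size(mnemonic: str, operands, equates=frozenset()) -> bool:
    """Whether a memory operand of this instruction needs an explicit size."""
    mem = [is_memory(op, equates) for op in operands]
    if not any(mem):
        return False
    if len(operands) == 1:
        return True
    dst, src = operands
    if mnemonic.lower() in SHIFTS and mem[0] and src.strip().lower() == "cl":
        return True
    other = src if mem[0] else dst
    return not is_register(other)
```

For example, needs_size("mov", ["counter", "ax"]) is false because ax fixes the size, while needs_size("shl", ["counter", "cl"]) is true because cl says nothing about the size of counter.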

Soon the next hurdle came up: Structure definitions. Instead of editing all the definitions, we relied on recent NASM supporting the db ? style directives for reserving space, as well as the db NUMBER dup (?) directives to reserve multiple units of space. That left us with the STRUC and ENDS directives that are prefixed by the structure names. This is not very difficult to handle with the little-known %00 parameter type of a NASM multi-line macro. The macros for that went into the nasm.mac macro file.

This file also got other extensions such as defining the OFFSET and PTR keywords as empty, as well as handling the PROC, PROC NEAR, ENDP, or LABEL directives. Further, the DOSSEG style segment definitions went there, along with .DATA, .DATA?, .CONST, and .CODE directives to switch to different segments.

Additionally, PUBLIC and EXTRN directives were implemented as macros too. The EXTRN macro is nontrivial, as it has to parse and drop the size or type specifications appended to the symbol's label name with a colon. It was considered less work to handle this in a NASM macro than to change all the directives in the ported sources. The build time performance is probably not very good, but is acceptable.

A particularly good part of nasm.mac is the EVEN directive. It checks whether it is used in the _BSS segment or not, and will automatically expand to either alignb 2 or align 2. This is better than TASM's implementation of the EVEN directive, because that one will emit a warning whenever it needs to reserve a byte in a BSS section. (I had this concern previously for MSDebug using EVEN directives in space-reserving parts of the sources.)

Subsequently, fixmem.pl learned a lot of new tricks. IFDEF directives get a percent sign prepended to be handled by the NASM preprocessor. COMMENT directives work the same way as in TASM. (I found that the handling of these is very line-oriented even in TASM, and presumably likewise in MASM.) Structure definitions are parsed and remembered specifically to emit proper istruc uses and "local" labels when a structure instance is to be expanded. Some directives are commented out. INCLUDE directives are changed to NASM preprocessor directives, with the filenames uncapitalised and .inc replaced by .mac as the filename extension. Assignments of the form label = $ are replaced by equates with equ. DGROUP: and _TEXT: prefixes are replaced by wrt specifiers for NASM.
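Three of the line-level rewrites named above can be sketched in a few lines of Python. This is an illustration, not an excerpt from fixmem.pl: the regular expressions, the function name, and the use of double quotes around the included filename are assumptions, and trailing comments are not handled.

```python
import re

INCLUDE_RE = re.compile(r"^(\s*)INCLUDE\s+(\S+)\s*$", re.IGNORECASE)
IFDEF_RE = re.compile(r"^(\s*)(IFDEF\b.*)$", re.IGNORECASE)
HERE_RE = re.compile(r"^(\s*\w+)\s*=\s*\$\s*$")


def rewrite_line(line: str) -> str:
    """Apply one of three rewrites to a source line without its newline."""
    m = INCLUDE_RE.match(line)
    if m:
        name = m.group(2).lower()          # uncapitalise the filename
        if name.endswith(".inc"):
            name = name[:-4] + ".mac"      # .inc becomes .mac
        return f'{m.group(1)}%include "{name}"'
    m = IFDEF_RE.match(line)
    if m:
        return f"{m.group(1)}%{m.group(2)}"  # prepend the percent sign
    m = HERE_RE.match(line)
    if m:
        return f"{m.group(1)} equ $"         # label = $ becomes an equate
    return line
```

As a usage example with a made-up filename, rewrite_line("INCLUDE EXAMPLE.INC") gives %include "example.mac", and rewrite_line("table_end = $") gives table_end equ $.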

jmp and call instructions with DWORD PTR keywords are changed to refer to far memory operands with brackets. Segment override prefixes in front of an opening bracket are moved into the brackets. lods instructions with explicit operands (usually to specify a segment override) are transformed into NASM-style instructions.
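A rough Python sketch of the first two transformations follows. The names and regular expressions are hypothetical, comments on the line are not handled, and the lods rewrite is left out because its exact output form is not stated here.

```python
import re

# A segment register followed by a colon and an opening bracket.
SEG_OVERRIDE = re.compile(r"\b([cdes]s):\[", re.IGNORECASE)
# jmp or call with a DWORD PTR operand, that is, a far memory operand.
FAR_BRANCH = re.compile(r"^(\s*(?:jmp|call)\s+)dword\s+ptr\s+(.+?)\s*$",
                        re.IGNORECASE)


def move_override(line: str) -> str:
    """Turn es:[bx] into [es:bx]."""
    return SEG_OVERRIDE.sub(r"[\1:", line)


def fix_far_branch(line: str) -> str:
    """Turn jmp/call DWORD PTR x into jmp/call far [x]."""
    m = FAR_BRANCH.match(line)
    if not m:
        return line
    operand = m.group(2)
    if not operand.startswith("["):
        operand = f"[{operand}]"
    return f"{m.group(1)}far {operand}"


def rewrite(line: str) -> str:
    # The override is moved first, so the branch fix sees a bracketed operand.
    return fix_far_branch(move_override(line))
```

For instance, rewrite("call DWORD PTR es:[bx]") gives call far [es:bx], while a line such as jmp NEAR PTR done is left alone.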

One special case that came up late into the work was that the file mlddl2.nas had structure definitions matching those in mlddl1.nas. But the global variables of these structure types were referenced in mlddl2.nas with simply a single symbol of byte type. Despite that, the sources referred to structure fields of these globals with the dot syntax. I invented a new directive for these cases, which dumps equates for all the structure fields of a given structure definition, prepended by a given label and a dot.

The new directive also causes fixmem.pl to infer the symbol size for all of these equates. It may be the better choice to dump these symbol sizes in a recoverable form into the output; for now the sizes are only recorded internally by the script.
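What such a directive produces can be pictured with the following Python sketch. The function name, the field list, and the exact text of the emitted equates are assumptions made for illustration; only the idea (one equate per field, named label plus dot plus field, with the size remembered alongside) comes from the description above.

```python
def field_equates(label: str, fields):
    """Build equates for every field of a structure placed at `label`.

    `fields` is a list of (field name, size in bytes) in declaration order.
    Returns the equate lines and a mapping from symbol name to size.
    """
    lines, sizes, offset = [], {}, 0
    for name, size in fields:
        symbol = f"{label}.{name}"
        lines.append(f"{symbol} equ {label} + {offset}")
        sizes[symbol] = size   # lets later accesses get the right size
        offset += size
    return lines, sizes


# Hypothetical structure with a word field followed by a dword field.
lines, sizes = field_equates("header", [("flags", 2), ("position", 4)])
```

The sizes mapping corresponds to the information that is currently only recorded internally; writing it into the output would mean emitting it in some parseable form next to the equates.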

Yesterday I finished the port of all source files required for the main linker executable. It can be built all in DOS (using the DPMI executable released by NASM) or mostly on the Linux host (using the host's native NASM). Only the last command, the linking, must be run inside DOS. This is the WarpLink command to link all OMF object files together to create a DOS .EXE file.

The resulting file has a similar size to the original, and the final link step can be carried out by running the resulting file itself. This latter operation results in exactly the same file as the one created by the released WarpLink executable. So it seems to work.

blog/pushbx/2023/0707_msdebug_ldebug_x2b2_warplink_2023_june_work.txt · Last modified: 2023-07-07 11:23:16 +0200 Jul Fri by ecm