2023-06-25
I was on vacation for part of the last three weeks, but that doesn't mean no work got done.
I don't know when I will be able to upload this blog post, as our main desktop (Linux) box is experiencing some difficulties. RAM has already been swapped, and a new PSU has arrived by post.
As usual, some work happened on two of the debuggers, as well as on X2B (a public domain "EXE to binary" tool) and on a port of WarpLink from the original TASM / MASM sources to sources that can be assembled with a recent NASM.
Other than some documentation updates, a bug in the debugger's M command was fixed. This bug affected overlapping moves in which the segments compared the opposite way from the linear addresses, for example a source with the higher segment but the lower linear address. Previously, the comparison that decides whether to move forwards or backwards was done on the full 32 bits of the segmented address. This does not adequately deal with the fact that one linear address can be addressed with up to 4096 different segmented addresses.
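To illustrate the M command problem, here is a minimal Python sketch. It is not the debugger's code; the function names are made up for this example, and it assumes plain 8086 real-mode addressing without any address wraparound.

```python
def linear(segment: int, offset: int) -> int:
    """Linear address of a real-mode segment:offset pair (no wraparound handling)."""
    return (segment << 4) + offset


def move_backwards_naive(src, dst) -> bool:
    """Hypothetical old check: compare the packed 32-bit segment:offset values."""
    return ((src[0] << 16) | src[1]) < ((dst[0] << 16) | dst[1])


def move_backwards(src, dst) -> bool:
    """Hypothetical fixed check: compare the linear addresses instead."""
    return linear(*src) < linear(*dst)


# The source has the higher segment but the lower linear address.
src = (0x1000, 0x0100)  # linear 0x10100
dst = (0x0FFF, 0x0120)  # linear 0x10110

print(move_backwards_naive(src, dst))  # decides to copy forwards
print(move_backwards(src, dst))        # decides to copy backwards
```

When the destination lies above the source and the ranges overlap, the copy has to start from the end. The naive comparison picks the wrong direction for this pair of addresses, which would clobber source bytes before they are read.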
The other code change runs most of the debugger's code with interrupts enabled (EI). The saveints function disabled interrupts during its IVT access. The solution is to enable them again after some calls to this function.
The only change was to add the current x2b2 release to the build script.
The debugger will now always write the "ALASAP" and CALL 5 fields of PSPs that it creates, to ensure valid and compatible values.
Quote marks are now accepted as separators by the parser, allowing commands such as E 100 0"foo"0 to write 5 bytes. This improves compatibility with MSDebug.
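As a rough illustration of what "quote marks as separators" means, the following Python sketch parses such a byte list. It is only a model of the idea, not the debugger's parser. The function name is invented, and details like doubled quotes inside a string are deliberately left out.

```python
def parse_byte_list(text: str) -> bytes:
    """Parse hex byte values and quoted strings; a quote mark also ends a number."""
    separators = " ,\t"
    quotes = "\"'"
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in separators:
            i += 1
        elif ch in quotes:
            end = text.index(ch, i + 1)  # ValueError if the string is unterminated
            out += text[i + 1:end].encode("ascii")
            i = end + 1
        else:
            j = i
            while j < len(text) and text[j] not in separators + quotes:
                j += 1
            out.append(int(text[i:j], 16))  # ValueError if not a byte value
            i = j
    return bytes(out)


data = parse_byte_list('0"foo"0')
print(len(data), data.hex(" "))
```

With the input from the example command, the result is the five bytes 00 66 6F 6F 00.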
Some bugs in the F RANGE command were fixed, particularly for the case that the second range is in a small segment (limit < 64 KiB) and no length is specified for the second range.
Several options for MSDebug compatibility were added, among them one to handle the R F and R reg commands in the same way as MSDebug.
In response to a Stack Overflow question I recommended using either the exe2bin command of the 2018 free software MS-DOS v2 release (which is free but without sources) or the Public Domain (with sources) X2B. With my attention on that, I improved a variant of the latter, which I called X2B2.
Instead of a separate build that ignores some error conditions, I added parsing for some switches that select which specific conditions to ignore. I also fixed the IP ignore part to handle a wrong IP as if an IP of zero was specified. (Open question: Should we check for CS equal to +0 as well?)
The program also checks for a short read of the MZ EXE header now. Additionally, a short read of the image is only allowed if a switch is passed. Without the switch this short read is also handled as an error.
The buffers and variables are aligned variously. The pathname buffers cannot overflow any longer. An initial value of cl other than zero is no longer a problem.
Another switch allows files exceeding 64 KiB, with something similar to the prior limit implemented as an optional check.
The pathnames may now contain dollar signs without corrupting the output. Further, spurious NUL bytes are no longer displayed after the pathnames.
The WarpLink source release of version 2.70, released into the Public Domain as of 1999-11-05, was the basis for our repo. About 30 assembly source files are built together to form the main executable of the linker. These were written to target MASM or TASM.
I created a script called fixmem.pl, which initially was intended to fix only the memory accesses done without square brackets. I started out editing the first file manually, but soon found I could automate some of my edits.
The core purpose of the script required parsing variable definitions as well as EXTRN directives to learn the sizes of symbols. For simplicity, the size must be learned before it is used, though running the script with the same input file specified twice could likely work around this requirement. The needed WarpLink sources are well-behaved however.
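The size-learning pass can be pictured roughly like the following Python sketch. The real script is written in Perl and is certainly more thorough. The function name, the supported directives, and the symbol names in the usage lines are all assumptions made for this example.

```python
import re

# Assumed mapping from data directives to NASM size keywords.
DEFINITION_SIZES = {"db": "byte", "dw": "word", "dd": "dword"}
# EXTRN types that denote data sizes; code types such as NEAR are skipped here.
EXTRN_SIZES = {"byte", "word", "dword"}


def learn_sizes(lines):
    """Collect symbol sizes from variable definitions and EXTRN directives."""
    sizes = {}
    for line in lines:
        code = line.split(";", 1)[0].strip()  # naive comment stripping
        m = re.match(r"(?i)extrn\s+(.*)", code)
        if m:
            for item in m.group(1).split(","):
                name, _, kind = item.strip().partition(":")
                if kind.lower() in EXTRN_SIZES:
                    sizes[name] = kind.lower()
            continue
        m = re.match(r"(?i)(\w+)\s+(db|dw|dd)\b", code)
        if m:
            sizes[m.group(1)] = DEFINITION_SIZES[m.group(2).lower()]
    return sizes


source = [
    "EXTRN foo:BYTE, bar:WORD, baz:NEAR",
    "counter dw 0 ; a comment",
    "buffer db 128 dup (?)",
]
print(learn_sizes(source))
```

Because the table is filled in one pass from top to bottom, a symbol used before its definition has no known size yet, which matches the "learned before it is used" restriction described above.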
Some accesses that already had brackets still needed a size to be added.
8086 instructions with one or two explicit operands needed to be handled. The one-operand form is simpler: If the operand is a single 8086 register name without SIZE PTR keywords and without brackets, then it is a register operand. Anything else requires brackets, if it doesn't have them yet.
Two-operand forms are more difficult. If an operand is not a register name, and not an equate name, and does not come with a prepended OFFSET keyword, then it must be a memory access. If it comes with SIZE PTR or brackets already, it is also a memory access. (This is always true, except for JMP NEAR PTR, which is a direct branch despite the PTR keyword.)
One-operand forms always have to specify the access size of a memory operand. Two-operand forms of memory accesses do not have to specify a size if the other operand is a register. The exception is a shift or rotate instruction whose destination (first) operand is a memory operand and whose source operand is cl; in that case the size of the memory operand also needs to be specified.
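The rules from the last three paragraphs can be condensed into a small Python sketch. It is a simplified model, not a translation of fixmem.pl. The names are invented, numeric literals are treated as immediates (an assumption the text above does not spell out), and the JMP NEAR PTR exception is left out.

```python
import re

REGISTERS = {"ax", "bx", "cx", "dx", "si", "di", "bp", "sp",
             "al", "ah", "bl", "bh", "cl", "ch", "dl", "dh",
             "cs", "ds", "es", "ss"}
SHIFTS = {"shl", "shr", "sal", "sar", "rol", "ror", "rcl", "rcr"}


def is_memory(operand: str, equates: set) -> bool:
    """Decide whether an operand is a memory access."""
    low = operand.lower()
    if "[" in low or re.search(r"\bptr\b", low):
        return True
    if low in REGISTERS or operand in equates:
        return False
    if re.fullmatch(r"[0-9][0-9a-f]*h?", low):  # assumption: numeric immediate
        return False
    if low.startswith("offset "):
        return False
    return True


def needs_size(mnemonic: str, operands: list, equates: set) -> bool:
    """Decide whether the memory operand must carry an explicit size."""
    if not any(is_memory(op, equates) for op in operands):
        return False
    if len(operands) == 1:
        return True
    dst, src = operands
    other = src if is_memory(dst, equates) else dst
    if other.lower() not in REGISTERS:
        return True
    # A cl shift count says nothing about the size of the shifted memory operand.
    return (mnemonic.lower() in SHIFTS and is_memory(dst, equates)
            and src.lower() == "cl")


print(needs_size("inc", ["counter"], set()))        # one operand, memory
print(needs_size("mov", ["counter", "ax"], set()))  # register gives the size
print(needs_size("shl", ["counter", "cl"], set()))  # shift by cl
```

Here counter is a hypothetical variable name. The first and third calls report that a size is needed, the second does not.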
Soon the next hurdle came up: structure definitions. Instead of editing all the definitions, we relied on recent NASM supporting the db ? style directives for reserving space, as well as the db NUMBER dup (?) directives to reserve multiple units of space. That left us with the STRUC and ENDS directives that are prefixed by the structure names. These are not very difficult to handle with the little-known %00 parameter type of a NASM multi-line macro. The macros for that went into the nasm.mac macro file.
This file also got other extensions, such as defining the OFFSET and PTR keywords as empty, as well as handling the PROC, PROC NEAR, ENDP, and LABEL directives. Further, the DOSSEG style segment definitions went there, along with the .DATA, .DATA?, .CONST, and .CODE directives to switch to different segments.
Additionally, PUBLIC and EXTRN directives were implemented as macros too. The EXTRN macro is nontrivial, as it has to parse and drop the size or type specifications appended to the symbol's label name with a colon. It was considered less work to handle this in a NASM macro than to change all the directives in the ported sources. The build time performance is probably not very good, but is acceptable.
A particularly good part of nasm.mac is the EVEN directive. It checks whether it is used in the _BSS segment or not, and will automatically expand to either alignb 2 or align 2. This is better than TASM's implementation of the EVEN directive, because that one will emit a warning whenever it needs to reserve a byte in a BSS section. (I had this concern previously for MSDebug using EVEN directives in space-reserving parts of the sources.)
Subsequently, fixmem.pl learned a lot of new tricks. IFDEF directives get a percent sign prepended to be handled by the NASM preprocessor. COMMENT directives work the same way as in TASM. (I found that the handling of these is very line-oriented even in TASM, and presumably likewise in MASM.) Structure definitions are parsed and remembered specifically to emit proper istruc uses and "local" labels when a structure instance is to be expanded. Some directives are commented out. INCLUDE directives are changed to NASM preprocessor directives, with the filenames uncapitalised and .inc replaced by .mac as the filename extension. Assignments of the form label = $ are replaced by equates with equ. DGROUP: and _TEXT: prefixes are replaced by wrt specifiers for NASM.
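A few of these line-level rewrites are easy to picture as regular expression substitutions. The Python sketch below covers only the IFDEF, INCLUDE, label = $, and group prefix cases. The function name, the file name, and the exact output spelling are assumptions, and the real Perl script may well produce different text.

```python
import re


def rewrite_line(line: str) -> str:
    """Apply a handful of MASM/TASM to NASM rewrites to one source line."""
    m = re.fullmatch(r"(\s*)IFDEF\b(.*)", line, re.I)
    if m:
        return f"{m.group(1)}%ifdef{m.group(2)}"
    m = re.fullmatch(r"(\s*)INCLUDE\s+(\S+)\s*", line, re.I)
    if m:
        # Lowercase the file name and swap the extension.
        name = re.sub(r"\.inc$", ".mac", m.group(2).lower())
        return f'{m.group(1)}%include "{name}"'
    m = re.fullmatch(r"(\s*)(\w+)\s*=\s*\$\s*", line)
    if m:
        return f"{m.group(1)}{m.group(2)} equ $"
    # DGROUP:symbol becomes symbol wrt DGROUP (likewise for _TEXT).
    return re.sub(r"\b(DGROUP|_TEXT):(\w+)", r"\2 wrt \1", line)


for text in ("IFDEF DEBUG", "INCLUDE EXAMPLE.INC", "buf_end = $",
             "    mov bx,DGROUP:table"):
    print(rewrite_line(text))
```

Trying the specific patterns first and falling through to the generic substitution keeps each rule independent, at the cost of handling only one rewrite per directive line.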
jmp and call instructions with DWORD PTR keywords are changed to refer to far memory operands with brackets. Segment override prefixes in front of an opening bracket are moved into the brackets. lods instructions with explicit operands (usually to specify a segment override) are transformed into NASM-style instructions.
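The three instruction rewrites in this paragraph could look something like the following Python sketch. It only handles operands that already have brackets, and both the function names and the chosen output forms (for example a segment prefix in front of lodsb) are my assumptions for illustration, not necessarily what fixmem.pl emits.

```python
import re


def _lods(match):
    """Turn an explicit-operand lods into a prefixed lodsb or lodsw."""
    suffix = "b" if match.group(1).lower() == "byte" else "w"
    prefix = f"{match.group(2).lower()} " if match.group(2) else ""
    return f"{prefix}lods{suffix}"


def nasm_style(line: str) -> str:
    # lods first, before its segment override is moved into the brackets.
    line = re.sub(r"(?i)\blods\s+(byte|word)\s+ptr\s+(?:(cs|ds|es|ss):)?\[si\]",
                  _lods, line)
    # Move a segment override in front of a bracket into the bracket.
    line = re.sub(r"(?i)\b(cs|ds|es|ss):\[", r"[\1:", line)
    # Indirect far jump or call through a dword memory operand.
    line = re.sub(r"(?i)\b(jmp|call)(\s+)dword\s+ptr\s*", r"\1\2far ", line)
    return line


for text in ("mov al,es:[bx+2]", "call DWORD PTR es:[bx]",
             "lods BYTE PTR es:[si]"):
    print(nasm_style(text))
```

The order of the substitutions matters here: the lods pattern expects the override outside the brackets, so it has to run before the override is moved.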
One special case that came up late into the work was that the file mlddl2.nas had structure definitions matching those in mlddl1.nas. But the global variables of these structure types were referenced in mlddl2.nas with simply a single symbol of byte type. Despite that, the sources referred to structure fields of these globals with the dot syntax. I invented a new directive for these cases, which dumps equates for all the structure fields of a given structure definition, prepended by a given label and a dot.
The new directive also causes fixmem.pl to infer the symbol size for all of these equates. It may be the better choice to dump these symbol sizes in a recoverable form into the output; for now the sizes are only recorded internally by the script.
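To make the idea of the new directive more concrete, here is a hypothetical Python sketch. The actual directive name and the exact text it emits are not shown in this post, so the equate format below is invented purely for illustration, as are the label and field names.

```python
def dump_field_equates(label: str, fields: list):
    """Emit one equate per structure field and record each field's size.

    fields is a list of (name, size_in_bytes) tuples in declaration order.
    The output format is made up for this example.
    """
    lines = []
    sizes = {}
    offset = 0
    for name, size in fields:
        lines.append(f"{label}.{name} equ {label} + {offset}")
        sizes[f"{label}.{name}"] = size  # kept only inside the script
        offset += size
    return lines, sizes


# Hypothetical structure with a word field followed by a byte field.
equates, sizes = dump_field_equates("header", [("length", 2), ("flags", 1)])
print("\n".join(equates))
print(sizes)
```

The sizes dictionary mirrors the point made above: the sizes are tracked internally and do not appear in the emitted equates.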
Yesterday I finished the port of all source files required for the main linker executable. It can be built entirely in DOS (using the DPMI executable released by NASM) or mostly on the Linux host (using the host's native NASM). Only the last command, the linking, must be run inside DOS. This is the WarpLink command to link all OMF object files together to create a DOS .EXE file.
The resulting file has a similar size to the original, and the final link step can be carried out by running the resulting file itself. This latter operation results in exactly the same file as created by the released WarpLink executable. So it seems to work.