
Mid late November work on porting the MS-DOS kernel

2024-11-24

This week I worked on porting some more of the MS-DOS kernel to NASM, starting on the msbio module.

x2b2

Bugfix: when the MZ .EXE header indicates a last page size of 0, the last page needs to be handled like a full page (size of 512). This didn't occur in practice, but WarpLink does write a 0 for a test with an image size of exactly 512 bytes (the WarpLink MZ header is also 512 bytes by default).
Keep track of actually written size, which can differ from calculated output size in case of the -E switch being specified. If this case occurs, display the "actually written" size on a new line.
Add switch -L which reads a number and applies this number as a relocation segment. This implies the -R switch as well. Number of "processed relocations" is displayed on another new line, if any. (-L 0 acts the same as just -R.) Relocations are needed for the msbio build.
Add -J switch to force a certain offset value from the .EXE image start. This allows more fine-grained control than the -I switch or the default behaviour absent both switches. (Note that with -J and -L both, relocations affecting the cut image prefix are not allowed yet.)
Bump release number.
Fix the get number routine to accept 0h. (The entire routine is fairly hackish, scanning the parameter twice to determine what base to use and then parse the digits. It works for the expected formats (decits only, 0x followed by hexits, or hexits followed by h) with this fix but it probably doesn't reject all invalid inputs.)
Make the errflag variable word-aligned as it is used as a word (for the process exit code).
Replace equates at the end of the code by labels in nobits space. This uses the absolute $ feature of NASM, which is not well documented. I once suggested it be documented.
Add alignment directives to nobits data. No effect with the current buffer sizes, as they're all even sized.
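
The last page size rule from the bugfix above can be sketched as follows. This is a minimal illustration in Python, not x2b2's actual code. The function names are made up for this example, and the field offsets assume the usual MZ header layout (last page size at offset 2, page count at offset 4, header size in paragraphs at offset 8).

```python
import struct

PAGE_SIZE = 512


def mz_file_size(header: bytes) -> int:
    """Size in bytes of header plus image, as claimed by the MZ header."""
    signature, last_page_bytes, page_count = struct.unpack_from("<2sHH", header, 0)
    if signature not in (b"MZ", b"ZM"):
        raise ValueError("not an MZ executable")
    if last_page_bytes == 0:
        # A stored 0 means that the last page is a full page.
        return page_count * PAGE_SIZE
    return (page_count - 1) * PAGE_SIZE + last_page_bytes


def mz_image_size(header: bytes) -> int:
    """Size in bytes of the image that follows the MZ header."""
    (header_paragraphs,) = struct.unpack_from("<H", header, 8)
    return mz_file_size(header) - header_paragraphs * 16
```

With a 512-byte header (32 paragraphs), two pages in total, and a last page size of 0, this yields an image size of exactly 512 bytes, which is the WarpLink test case described above.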

tractest/convlist.pl

Allow .map file class column to overflow into group column, assuming that no group is specified then. (Could use more testing of how WarpLink formats these lines.)
Allow MASM type listing files to format hexdump continuation lines to fit with the .tls format. So far this was only supported for RASM-86, TASM, and NASM. (Does JWasm not do these?)

ident86

The identicalisation tool was updated to support the MS-DOS porting.

Add debugging output, and skip unchanged= output in file editing after to change= line has been found.
Work around a Python oddity. See below.
Expand symlinks in inplace handler so that we edit (overwrite) the pointed-to file, not the symlink itself. Needed to support MS-DOS source includes that cross directories, mostly in the msdos module.
Add a debug mask to enable some output for setting the foundoffset variable.
If foundoffset is None, immediately start the second run in findsourceline's tls search loop. Would throw an exception trying to seek before.
Add detection of critpatchentry and short_addr mmacros from the MS-DOS port as .tls directives equivalent to DW (word-size data).
Add a hint and source file edit type to expand DD directive with a symbol to a DW directive with the same symbol and then a SEG keyword with the symbol repeated. MASM appears to do that automatically when parsing a DD directive, but I preferred to add this edit to ident86 rather than have fixmem guess it.
Add more debugging output with the masks 1024, 2048, and 4096.
Fix to not misdetect .tls lines without a dump if their suffix starts with two hexit letters. Described in more detail in the changeset message.
Harden the fix by requiring a whitespace or End Of Line after the hexdump. Still allows any combination of square brackets, round parens, and hexits. The prior fix checks that the dump ends within the first 40 columns of the .tls file. (Could be hardened further by requiring at least two valid codepoints in the dump.)
Add debugging output with the masks 8192 and 16384 to check hexdump continuation is parsed into tls directives properly.
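
The DD expansion edit mentioned in the list above can be pictured with a small sketch. This is not ident86's actual code. The regular expression, the function name expand_dd, and the example lines are assumptions made up for illustration, and the sketch only handles a DD directive with a single plain symbol as its operand.

```python
import re

# Hypothetical pattern: optional label, DD, then exactly one symbol operand.
DD_WITH_SYMBOL = re.compile(
    r"^(?P<head>\s*(?:\w+:?\s+)?)dd(?P<gap>\s+)"
    r"(?P<symbol>[A-Za-z_][\w.]*)(?P<tail>\s*(?:;.*)?)$",
    re.IGNORECASE,
)


def expand_dd(line: str) -> str:
    """Rewrite 'dd symbol' as 'dw symbol, seg symbol'; leave other lines alone."""
    match = DD_WITH_SYMBOL.match(line)
    if match is None:
        return line
    symbol = match["symbol"]
    return f"{match['head']}dw{match['gap']}{symbol}, seg {symbol}{match['tail']}"


print(expand_dd("vec: dd handler ; far pointer"))
print(expand_dd("count dd 0"))  # numeric operand, stays unchanged
```

Operands that start with a digit do not match the symbol pattern, so plain numeric DD data is left untouched.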

Form Feed disagreement

The Python open functions, along with loops of the form for line in file:, appear to disagree on whether the Form Feed (^L) counts as a linebreak or not.

Python's open without an encoding specified would not consider ^L to indicate a linebreak. The codecs.open function with an encoding parameter of latin-1 would consider it a linebreak however.

As it matches nano's and NASM's use of line numbers, it would be preferable to not treat Form Feed as a linebreak. However, codecs.open with the encoding as latin-1 doesn't corrupt the existing linebreaks (CR LF vs LF) whereas open does (regardless of the encoding). Specifying a "b" in the open mode (to open) probably wouldn't corrupt the linebreaks but turns all lines read from the file into bytes-like objects. I don't want to deal with changing all of these to fix the Form Feed problem.

The result of the disagreement was that findsourceline counted the lines differently than editsource, leading to corrupted edit spots if the source file contained at least one Form Feed prior to the edit spot.

Quoting the changeset message:

work around, open with latin-1 encoding treats FF as a linebreak

findsourceline used "open" without an encoding specified before. This would treat Form Feed (^L) codepoints as not splitting a line when looping through the file using "for line in file:".

inplace, as called by editsource, used "codecs.open" with the parameter "encoding='latin-1'" instead. Unlike findsourceline this treated Form Feed as a linebreak, with the same idiom of a "for line in file:" loop.

This caused the line number passed from findsourceline to the editsource function to mismatch between the two. The edit would be inserted by editsource too early if Form Feeds were present before the spot to be edited, as editsource's line count grew faster than findsourceline's.
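
The disagreement can be reproduced without touching any files. The sketch below is only an illustration, not code from ident86. It assumes that io.TextIOWrapper stands in for what the built-in open returns in text mode, and that a codec StreamReader stands in for what codecs.open wraps around the file. The helper names and the sample data are made up.

```python
import codecs
import io

# One Form Feed (0Ch) in the middle of what an editor shows as the first line.
SAMPLE = b"first\x0cstill first?\nsecond\n"


def lines_like_builtin_open(data: bytes) -> list:
    """Iterate the way a text-mode file object from open() does."""
    return list(io.TextIOWrapper(io.BytesIO(data), encoding="latin-1"))


def lines_like_codecs_open(data: bytes) -> list:
    """Iterate the way the stream reader used by codecs.open() does."""
    return list(codecs.getreader("latin-1")(io.BytesIO(data)))


print(len(lines_like_builtin_open(SAMPLE)))  # Form Feed does not split a line
print(len(lines_like_codecs_open(SAMPLE)))   # Form Feed counts as a linebreak
```

The first count is 2 and the second is 3, so any line number computed by one kind of loop is off by one per preceding Form Feed when it is used in the other.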

fixmem

Add comments to most blocks of the main loop. The use should be obvious.
Comment out .lall, .sall, and .xall directives.
Expand symlinks to process targeted file. This reads the directory from the name parameter in case it is a pathname, as symlinks are assumed to be relative to the directory in which they are placed. This hardcodes a certain set of codepoints as directory separators (colon, slash, backslash). It also assumes the symlink target path is not absolute.
Support WRT directive insertion when the clause ends in a closing angle bracket.
Add switches to the script.
First switch is --section-clean, which causes the script to drop section directive attributes for files parsed after the switch.
Second switch is --no-istruc-labels which disables an earlier attempt at supporting dotted labels by emitting labels for structure members of a structure instance. This should probably be the default, and the dotted label mechanism should be configurable to make the prior behaviour useful at all.
To avoid editing the three spots in which filenames are appended to the @nextfile array, re-use function processfilename. (Note that this also applies the --delete or --exclude filters, which may not be wanted for the include processing.)
Record that a structure instance's label is a label, avoiding changes in subsequent runs of fixmem.pl
In structure instances do not emit AT directives if not needed for initialisation nor "istruc labels".
Add port equate for _size to be equal to _struc_size in case istruc constructs are emitted. This fixes a problem introduced by an earlier change (2024-10-27).
Add the third switch, --debug-include, to enable some selected debugging output.
Add a switch called --debug-assume.
Fix, match I_NEED directive caps-insensitively for whether to use DATA segment.
Fix xlatb replacement thrice: Once to only change the instruction if there is anything to drop, and another time to fix the order of $pardrop and $parseg. The last time is to avoid eating blanks after an already translated xlatb, by matching the last text in the to-be-dropped part only if it is not whitespace.
Detect CODE_SEGMENT and include of msgroup.mac as entering the CODE section. (Not complete, the ASSUME directive is not used as it should be.)
Move up handling of COMMENT directives,
and skip commented lines,
and mark COMMENT directive replacements (using %if 0 trees) so that rerunning can detect them,
and insert a blank after the semicolon of the COMMENT comment.
Add more debugging output to processing includes (with AND?).
Add --delete switch, to change to filtering out the basenames given as subsequent parameters (until --no-delete)
Add --mactonas= switch to translate includes from .mac filenames to .nas filenames. This allows automating one step of the msbio port.
Add --exclude switch, like delete but only applicable to the very next parameter.
Add an implicit assumption that the current section is accessible as CS if all prior assumptions fail. In MASM it seems that an ASSUME CS: directive is needed instead. (Works around the problem of msgroup.mac including an ASSUME directive.)
In getlabel strip matching parens around the entire term.
Add brackets and a NEAR keyword for CALL WORD PTR
Allow blanks before colons in several spots (for instance)
Add a FAR keyword to branch (JMP or CALL) to label marked FAR. This uses a NASM feature that expands JMP FAR label to be the same as JMP SEG label:label. (Requires relocation in the output executable, which msbio allows.)
Normalise to allcaps in lookup of structure name for structure instance.
Allow SIZE operand to end in plus or minus sign (unclear why not just match a symbol)
Support an equate with segment override followed by size keyword (translated to labelsize) (shouldn't allow to produce equate with a segment override, which it does if /\boffset\b/ matches) (Offset keyword match should have flag /i)
Do not display an error if no assumption is able to reach a "labelsize with segment" symbol. (Can any labelsize ever be reached?)
Mark structure instance labels with "NASM structure instance" and detect that they are valid as dotted label base, despite not being recorded as BYTE sized labels.
Do not misdetect SIZE word in %warning directives or string data
Support branches with DWORD PTR then a variable name (no brackets?)
Allow double dotted labels, consisting of a label, dot, equate, dot, and another equate. Used by some of the msbio init/config sources.
Add --segdefine and --regdefine switches to expand the list of words that the script recognises as segregs or registers generally.
Attempt to fix LEA not detected for assumptions. This fix turned out to be wrong and spawned another fix, as the NOT, dot, and =~ operators appear to not interact as expected. This changeset therefore broke all processing of assumptions, which I noticed in subsequent ident86 runs. (I asked over at perlmonks.org about the fix, as I didn't understand why it needed two pairs of parens.)
Detect seg defines as register operands as well.
Allow INVOKE or TRANSFER macros to define a label, needed to emit a required port label.
Add FAR keyword if branching to a label with size DWORD and with brackets already present.
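
The symlink expansion described in the list above (directory taken from the name parameter, a hardcoded set of separators, relative targets only) can be sketched as follows. fixmem.pl is a Perl script, so this Python version is only an illustration of the idea under those stated assumptions. The function names and the example path are made up.

```python
import os

SEPARATORS = ":/\\"  # colon, slash, backslash


def resolve_link_target(link_path: str, target: str) -> str:
    """Join a relative symlink target with the directory part of link_path."""
    cut = max(link_path.rfind(separator) for separator in SEPARATORS)
    directory = link_path[: cut + 1]  # empty if link_path has no directory part
    return directory + target


def expand_symlink(path: str) -> str:
    """Return the pointed-to file name if path is a symlink, else path itself."""
    if os.path.islink(path):
        return resolve_link_target(path, os.readlink(path))
    return path


print(resolve_link_target("src/DOS/example.inc", "../INC/example.inc"))
```

As in the description, an absolute symlink target would be joined incorrectly. A more general version could check os.path.isabs on the target first.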

MS-DOS kernel

I didn't have time on Sunday to write about this week's changes to the MS-DOS kernel. Briefly, the port to NASM is ongoing.

Symlinks

I used a bunch of symlinks (with limited support by fixmem.pl and ident86 now) to handle files that include other files across directories, in the src/INC/ as well as src/DOS/ or src/BIOS/ directories. (I did not record these symlinks in the repo.)

Critical section patch table

In const2.nas I translated an IRP build time repetition using a NASM mmacro and two uses of that mmacro.

Fun fact: This table, part of the pre-SDA, is still an actual table of code entrypoints to patch in this version of the kernel. This contradicts what the Interrupt List says about "DOS 4.0+":

the DOS kernel does not invoke critical sections 01h and 02h unless it is patched. DOS 3.1+ contains a zero-terminated list of words beginning at offset -11 from the Swappable Data Area (see #01687 at INT 21/AX=5D06h); each word contains the offset within the DOS data segment of a byte which must be changed from C3h (RET) to 50h (PUSH AX) under DOS 3.x or from 00h to a nonzero value under DOS 4.0+ to enable use of critical sections. For DOS 4.0+, all words in this list point at the byte at offset 0D0Ch.
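
The DOS 3.x style of patching that the quoted text describes (a zero-terminated list of words, each pointing at a byte to change from C3h to 50h) can be sketched like this. It is a minimal illustration operating on a copy of the DOS data segment held in a bytearray. The function name and the table offset parameter are assumptions for this example, and nothing here is taken from the kernel sources.

```python
import struct

RET = 0xC3
PUSH_AX = 0x50


def enable_critical_sections(dos_data: bytearray, table_offset: int) -> list:
    """Patch every listed RET to PUSH AX and return the patched offsets."""
    patched = []
    position = table_offset
    while True:
        (entry,) = struct.unpack_from("<H", dos_data, position)
        if entry == 0:  # the list is terminated by a zero word
            break
        if dos_data[entry] == RET:
            dos_data[entry] = PUSH_AX
            patched.append(entry)
        position += 2
    return patched
```

Entries that do not currently hold a RET are skipped, so running the sketch twice does not change anything the second time.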

MS data/init overlapping parts

For ms_data.nas I added and used the I_AM_NOBITS mmacro to make the data nobits, as it overlaps with some of msinit.nas. This used to be done by utilising the re-writing org directive of MASM, which NASM doesn't support. (FASM can in principle rewrite parts it has already assembled, so it may support rewriting org?) In conjunction with this, msinit.nas is filled to the end of the data section size and a few non-zero initialisation values in ms_data.nas are recorded to replay them in msinit.nas later before the filling-to-the-end. This special facility is used only five times, which made it simple to spot and edit the specific spots.

Wrong dependency in IFSFUNC module

The src/CMD/IFSFUNC/MAKEFILE depended on msdata.asm for two of its object files. This appears to be a wrong dependency, as a rebuild didn't fail after deleting both the file and the dependencies listed in that makefile.

EQU THIS WORD

There was an equate that originally read EQU THIS WORD. It is unclear if there is any difference to a label word directive.

Nonsense equates added for ErrTab

In ms_table.nas I had to translate a macro used to create an error table. As part of translating this table I also added some nonsense equates to make sure that fixmem.pl will create the appropriate port equates, which it wouldn't otherwise do as it doesn't know the ErrTab macro's workings.

msbio module: linking and MZ .EXE flattening

Run fixupp utility on all MASM-generated object files
Create msbio.bin from the .fob fixed-up object files
Copy over mzstack.nas from msdos
Link and flatten file using WarpLink and x2b2, making use of x2b2's new -L switch to apply relocations to the generated image
Create msbio.bin from the WarpLink-linked file
Use the new msbiow.map map file for the convlist.pl run

msbio porting

Use a perl scriptlet to change .inc includes to .nas for a certain list of files. This busywork was replaced by the --mactonas switch to fixmem.pl at a later point.
Fix a mismatching size for the START_BDS variable. This is of no importance but was flagged as an error by fixmem.pl.
Swap text immediate bytes in a DW directive, which fixmem.pl doesn't handle yet. This is the first time it came up during the kernel port. The immediate is commented as "J.K. 11/7/86 Secrete code for DOS 3.30 IBMBIO."
Occasional fixes to specify a destination word size when the source is an immediate number that doesn't fit in a byte. On a 386+ assembler it would be ambiguous whether to use word size or dword size. Or it could just be an overflow bug and the intended use was to access a byte. But for this version of MASM it appears the word size can be implied.
Translate macros in the stack switching / IRQ handling code
Use a perl scriptlet to replace double-dollar sign by DD_ to fix labels starting with dollar signs
Use another perl scriptlet to translate switches from IFNDEF / equ to %ifndef / %iassign
Replace equates by %iassign for psdata_seg. This is the use case for fixmem.pl's --segdefine switch.
Replace equates by %iassign for the registers BH and BL. This is the use case for fixmem.pl's --regdefine switch.
Another case of implied word size with an immediate that doesn't fit in a byte.
Rerun fixmem.pl after a bugfix in the script, to fix an xlatb instruction that has a segment override. The changeset message lists a long series of scriptlets to re-apply certain patches before and after the fixmem.pl run.
Align SYSINITSEG start to a paragraph boundary. Unclear how this was done by MASM.
Rerun fixmem.pl with --exclude msequ.mac to avoid COUNT as an immediate source to be misdetected as a memory load. Shows more scriptlets to re-apply certain patches.
Rerun fixmem.pl yet again, after the bugfix to make it process assumptions again. During the ident86 runs I noticed a suspicious number of assumptions being added by ident86 source edits, which should have been caught by fixmem.pl already. Related to the LEA detection and the question to perlmonks.org listed for fixmem.pl above.
Within an identicalisation changeset: "The "jmp short EMS_SH_Ret" before "EMS_Stub_X:" had to be forced short manually, as it straddled the limit of a rel8 displacement."
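
For the short jump mentioned in the last item, the rel8 limit can be checked with a few lines. This is a generic sketch rather than anything from the port. It assumes a two-byte short jump (opcode plus one displacement byte) whose displacement is counted from the end of the instruction. The function name and the addresses are made up.

```python
def fits_rel8(jump_address: int, target_address: int, instruction_length: int = 2) -> bool:
    """True if a short jump at jump_address can reach target_address."""
    displacement = target_address - (jump_address + instruction_length)
    return -128 <= displacement <= 127


print(fits_rel8(0x100, 0x181))  # displacement +127, still short
print(fits_rel8(0x100, 0x182))  # displacement +128, needs a near jump
```

A jump that sits exactly at the edge of this range is the kind of case where a one-byte difference elsewhere in the code decides between the short and the near encoding.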

x2b2, convlist, ident86, fixmem, perlmonks, msdos, msdos4
