
Mid late November work on porting the MS-DOS kernel

2024-11-24

This week I worked on porting some more of the MS-DOS kernel to NASM, starting on the msbio module.

x2b2

  • Bugfix: when the MZ .EXE header indicates a last page size of 0, the last page must be handled as a full page (512 bytes). This didn't occur in practice, but WarpLink does write a 0 for a test with an image size of exactly 512 bytes (the WarpLink MZ header is also 512 bytes by default).
  • Keep track of the actually written size, which can differ from the calculated output size if the -E switch is specified. If this happens, display the "actually written" size on a new line.
  • Add switch -L, which reads a number and applies it as a relocation segment. This implies the -R switch as well. The number of "processed relocations", if any, is displayed on another new line. (-L 0 acts the same as just -R.) Relocations are needed for the msbio build.
  • Add the -J switch to force a certain offset value from the .EXE image start. This allows more fine-grained control than the -I switch or the default behaviour when neither switch is given. (Note that with both -J and -L, relocations affecting the cut image prefix are not allowed yet.)
  • Fix the get number routine to accept 0h. (The entire routine is fairly hackish: it scans the parameter twice, first to determine which base to use and then to parse the digits. With this fix it works for the expected formats (decits only, 0x followed by hexits, or hexits followed by h), but it probably doesn't reject all invalid inputs.)
  • Make the errflag variable word-aligned as it is used as a word (for the process exit code).
  • Replace equates at the end of the code by labels in nobits space. This uses NASM's absolute $ feature, which is not well documented; I once suggested that it be documented.
  • Add alignment directives to nobits data. This has no effect with the current buffer sizes, as they are all even-sized.
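Several of the x2b2 items above come down to MZ header arithmetic and number parsing. x2b2 itself is written in assembly; the Python below is only a sketch of the same ideas under stated assumptions, not a translation of its code, and the function names are hypothetical. mz_sizes derives the file size from the conventional e_cp (page count) and e_cblp (bytes in last page) header fields, treating an e_cblp of 0 as a full 512-byte page. apply_relocations adds a base segment to every word named in the relocation table, which is roughly what -L is described as doing. parse_number accepts the three expected formats of the get number routine but, unlike the two-pass scan described above, validates the whole token with full-match regexes. Real tools also have to cope with odd e_cblp values and truncated files, which this sketch ignores.

```python
import re
import struct

def mz_sizes(exe: bytes) -> tuple[int, int]:
    """Return (file size, load image size) as described by an MZ header."""
    if exe[:2] not in (b"MZ", b"ZM"):
        raise ValueError("not an MZ executable")
    # e_cblp = bytes used in the last 512-byte page, e_cp = page count.
    e_cblp, e_cp = struct.unpack_from("<HH", exe, 2)
    # e_cparhdr = header size in 16-byte paragraphs.
    (e_cparhdr,) = struct.unpack_from("<H", exe, 8)
    if e_cp == 0:
        raise ValueError("header declares no pages")
    last_page = e_cblp if e_cblp != 0 else 512  # 0 means a full last page
    file_size = (e_cp - 1) * 512 + last_page
    return file_size, file_size - e_cparhdr * 16

def apply_relocations(exe: bytes, base_segment: int) -> bytearray:
    """Return the load image with base_segment added to each relocated word."""
    _, image_size = mz_sizes(exe)
    (count,) = struct.unpack_from("<H", exe, 6)      # e_crlc: entry count
    (header_paras,) = struct.unpack_from("<H", exe, 8)
    (table,) = struct.unpack_from("<H", exe, 0x18)   # e_lfarlc: table offset
    start = header_paras * 16
    image = bytearray(exe[start:start + image_size])
    for i in range(count):
        # Each entry is an offset:segment pair relative to the image start.
        offset, segment = struct.unpack_from("<HH", exe, table + 4 * i)
        pos = segment * 16 + offset
        (word,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (word + base_segment) & 0xFFFF)
    return image

_HEX_PREFIXED = re.compile(r"0[xX]([0-9A-Fa-f]+)")
_HEX_SUFFIXED = re.compile(r"([0-9A-Fa-f]+)[hH]")
_DECIMAL = re.compile(r"[0-9]+")

def parse_number(text: str) -> int:
    """Accept decits only, 0x followed by hexits, or hexits followed by h."""
    m = _HEX_PREFIXED.fullmatch(text) or _HEX_SUFFIXED.fullmatch(text)
    if m:
        return int(m.group(1), 16)
    if _DECIMAL.fullmatch(text):
        return int(text, 10)
    raise ValueError(f"invalid number: {text!r}")

# Example: a 32-byte header plus a 480-byte body is exactly 512 bytes,
# so the header says e_cp = 1 and e_cblp = 0 (a full last page).
header = bytearray(32)
header[0:2] = b"MZ"
struct.pack_into("<HHHH", header, 2, 0, 1, 1, 2)  # e_cblp, e_cp, e_crlc, e_cparhdr
struct.pack_into("<H", header, 0x18, 0x1C)         # relocation table at 1Ch
struct.pack_into("<HH", header, 0x1C, 0x0004, 0x0001)  # 0001:0004 = image offset 20
body = bytearray(480)
struct.pack_into("<H", body, 20, 0x0010)
exe = bytes(header + body)

print(mz_sizes(exe))  # (512, 480)
print(hex(struct.unpack_from("<H", apply_relocations(exe, 0x70), 20)[0]))  # 0x80
print(parse_number("0h"), parse_number("0x10"), parse_number("10h"), parse_number("10"))  # 0 16 16 10
```

Note that "0h" is matched by the h-suffixed pattern, so it parses as 0 without any special case.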

tractest/convlist.pl

  • Allow the .map file class column to overflow into the group column, assuming that no group is specified in that case. (Could use more testing of how WarpLink formats these lines.)
  • Allow MASM-type listing files to format hexdump continuation lines to fit the .tls format. Previously this was only supported for RASM-86, TASM, and NASM. (Does JWasm not do these?)

ident86

The identicalisation tool was updated to support the MS-DOS port.

  • Add debugging output, and skip unchanged= output in file editing after the change= line has been found.
  • Work around a Python oddity. See below.
  • Expand symlinks in inplace handler so that we edit (overwrite) the pointed-to file, not the symlink itself. Needed to support MS-DOS source includes that cross directories, mostly in the msdos module.
  • Add a debug mask to enable some output for setting the foundoffset variable.
  • If foundoffset is None, immediately start the second run in findsourceline's tls search loop. Previously this would throw an exception when trying to seek.
  • Add detection of the critpatchentry and short_addr mmacros from the MS-DOS port as .tls directives equivalent to DW (word-size data).
  • Add a hint and source file edit type to expand a DD directive with a symbol into a DW directive with the same symbol, followed by a SEG keyword with the symbol repeated. MASM appears to do that automatically when parsing a DD directive, but I preferred to add this edit to ident86 rather than have fixmem guess it.
  • Add more debugging output with the masks 1024, 2048, and 4096.
  • Fix misdetection of .tls lines without a dump whose suffix starts with two hexit letters. Described in more detail in the changeset message.
  • Harden the fix by requiring whitespace or an End Of Line after the hexdump. This still allows any combination of square brackets, round parens, and hexits. The prior fix checks that the dump ends within the first 40 columns of the .tls line. (Could be hardened further by requiring at least two valid codepoints in the dump.)
  • Add debugging output with the masks 8192 and 16384 to check hexdump continuation is parsed into tls directives properly.
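Two of the ident86 items above are text-matching rules. The Python below sketches them under assumptions; it is not ident86's code, and the regular expressions, the function names, and the simplified line layout are all hypothetical. expand_dd rewrites a DD directive whose only operand is a symbol into a DW of the symbol followed by SEG of the same symbol (an offset word, then a segment word). find_dump models the hardened hexdump check: a run of hexits, square brackets, and round parens only counts as a dump if it is followed by whitespace or end of line and ends within the first 40 columns. The assumed layout (four hexits of offset, whitespace, then an optional dump) is a simplification, not the real .tls format.

```python
import re
from typing import Optional

# Hypothetical pattern: "[label[:]] DD symbol [; comment]" with a single
# identifier operand (numbers and operand lists are left alone).
DD_SYMBOL = re.compile(
    r"^(?P<prefix>\s*(?:[A-Za-z_.?@$][\w.?@$]*:?\s+)?)"
    r"(?P<kw>dd)\s+"
    r"(?P<sym>[A-Za-z_.?@][\w.?@$]*)"
    r"(?P<rest>\s*(?:;.*)?)$",
    re.IGNORECASE,
)

def expand_dd(line: str) -> str:
    """Rewrite "DD sym" as "DW sym, SEG sym"; leave other lines unchanged."""
    m = DD_SYMBOL.match(line)
    if m is None:
        return line
    # Keep the keyword case of the original directive.
    dw, seg = ("DW", "SEG") if m.group("kw").isupper() else ("dw", "seg")
    sym = m.group("sym")
    return f"{m.group('prefix')}{dw} {sym}, {seg} {sym}{m.group('rest')}"

# Hypothetical layout: four-hexit offset, whitespace, then either a dump or
# the source text directly. The lookahead requires whitespace or end of line.
DUMP = re.compile(r"^[0-9A-Fa-f]{4}\s+(?P<dump>[0-9A-Fa-f\[\]()]+)(?=\s|$)")
MAX_DUMP_END = 40  # the dump must end within the first 40 columns

def find_dump(line: str) -> Optional[str]:
    """Return the hexdump field of a listing line, or None if there is none."""
    m = DUMP.match(line)
    if m is None or m.end("dump") > MAX_DUMP_END:
        return None
    return m.group("dump")

print(expand_dd("EntryPtr DD Handler ; far pointer"))
# EntryPtr DW Handler, SEG Handler ; far pointer
print(find_dump("0000  B8[3412]    mov ax, foo"))  # B8[3412]
print(find_dump("0003  ABCDEF_label:"))           # None
```

In the last example "ABCDEF" is followed by an underscore rather than whitespace, so the whitespace requirement rejects it; a hexit-only word placed past column 40 is rejected by the column check instead.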

Form Feed disagreement

The Python open functions (open and codecs.open), combined with loops of the form for line in file:, disagree on whether the Form Feed (^L) counts as a linebreak or not.

Python's open without an encoding specified does not consider ^L to indicate a linebreak. The codecs.open function with an encoding parameter of latin-1 does consider it a linebreak, however, because its reader splits lines using str.splitlines(), which treats ^L as a line boundary.

As it matches nano's and NASM's use of line numbers, it would be preferable not to treat Form Feed as a linebreak. However, codecs.open with the encoding latin-1 doesn't corrupt the existing linebreaks (CR LF vs LF), whereas open does with its default newline translation (regardless of the encoding). Specifying a "b" in the open mode (to open) probably wouldn't corrupt the linebreaks, but it turns all lines read from the file into bytes-like objects. I don't want to deal with changing all of these uses just to fix the Form Feed problem.

The result of the disagreement was that findsourceline counted the lines differently than editsource, leading to corrupted edit spots if the source file contained at least one Form Feed prior to the edit spot.

Quoting the changeset message:

work around, open with latin-1 encoding treats FF as a linebreak

findsourceline used "open" without an encoding specified before. This would treat Form Feed (^L) codepoints as not splitting a line when looping through the file using "for line in file:".

inplace, as called by editsource, used "codecs.open" with the parameter "encoding='latin-1'" instead. Unlike findsourceline this treated Form Feed as a linebreak, with the same idiom of a "for line in file:" loop.

This caused the line number passed from findsourceline to the editsource function to mismatch between the two. The edit would be inserted by editsource too early if Form Feeds were present before the spot to be edited, as editsource's line count grew faster than findsourceline's.
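The disagreement is easy to reproduce. The sketch below writes a small file with CR LF line endings and one Form Feed, then reads it back three ways; the file name and contents are made up for the demonstration. The third variant, open with newline="", is not what ident86 does; it is only shown as a possible alternative, since it splits on CR, LF, or CR LF only and returns the line endings untranslated.

```python
import codecs
import os
import tempfile

# CR LF line endings, with a Form Feed (^L) starting the second line.
DATA = b"first\r\n\x0csecond\r\nthird\r\n"

def read_with_open(path):
    # Splits on newlines only; CR LF is translated to "\n" while reading.
    with open(path, encoding="latin-1") as f:
        return list(f)

def read_with_codecs(path):
    # The underlying file is opened in binary mode (CR LF kept), but the
    # reader splits with str.splitlines(), so ^L also ends a line.
    with codecs.open(path, encoding="latin-1") as f:
        return list(f)

def read_with_open_newline(path):
    # newline="": splits on CR, LF or CR LF only, endings left untranslated.
    with open(path, encoding="latin-1", newline="") as f:
        return list(f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.asm")
    with open(path, "wb") as f:
        f.write(DATA)
    by_open = read_with_open(path)
    by_codecs = read_with_codecs(path)
    by_open_newline = read_with_open_newline(path)

print(by_open)          # ['first\n', '\x0csecond\n', 'third\n']
print(by_codecs)        # ['first\r\n', '\x0c', 'second\r\n', 'third\r\n']
print(by_open_newline)  # ['first\r\n', '\x0csecond\r\n', 'third\r\n']
```

With one Form Feed, "third" is line 3 when counted with open but line 4 with codecs.open, which is the kind of mismatch that made editsource place edits too early.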

fixmem

MS-DOS kernel

I didn't have time on Sunday to write about this week's changes to the MS-DOS kernel. Briefly, the port to NASM is ongoing.

I used a bunch of symlinks (now with limited support in fixmem.pl and ident86) to handle files that include other files across directories, in the src/INC/ as well as the src/DOS/ or src/BIOS/ directories. (I did not record these symlinks in the repo.)

Critical section patch table

In const2.nas I translated an IRP build-time repetition into a NASM mmacro and two uses of that mmacro.

Fun fact: This table, part of the pre-SDA, is still an actual table of code entrypoints to patch in this version of the kernel. This contradicts what the Interrupt List says about "DOS 4.0+":

the DOS kernel does not invoke critical sections 01h and 02h unless it is patched. DOS 3.1+ contains a zero-terminated list of words beginning at offset -11 from the Swappable Data Area (see #01687 at INT 21/AX=5D06h); each word contains the offset within the DOS data segment of a byte which must be changed from C3h (RET) to 50h (PUSH AX) under DOS 3.x or from 00h to a nonzero value under DOS 4.0+ to enable use of critical sections. For DOS 4.0+, all words in this list point at the byte at offset 0D0Ch.
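As a rough model of the table the Interrupt List describes, the sketch below walks a zero-terminated list of word offsets starting 11 bytes before the Swappable Data Area and patches each byte it points to: C3h (RET) becomes 50h (PUSH AX) in the DOS 3.x style, or 00h becomes a nonzero value in the DOS 4.0+ style. The segment contents, the SDA offset, and the function name are hypothetical, and as noted above, this kernel's table still lists code entrypoints rather than a single byte at 0D0Ch.

```python
import struct

def enable_critical_sections(dos_seg: bytearray, sda_offset: int, dos4: bool) -> list[int]:
    """Patch every byte named in the word list at sda_offset - 11 and
    return the offsets that were actually changed."""
    changed = []
    pos = sda_offset - 11
    while True:
        (target,) = struct.unpack_from("<H", dos_seg, pos)
        if target == 0:  # a zero word terminates the list
            break
        if dos4:
            # DOS 4.0+ style: change 00h to a nonzero value.
            if dos_seg[target] == 0x00:
                dos_seg[target] = 0x01
                changed.append(target)
        elif dos_seg[target] == 0xC3:
            # DOS 3.x style: change RET (C3h) to PUSH AX (50h).
            dos_seg[target] = 0x50
            changed.append(target)
        pos += 2
    return changed

# Made-up DOS 3.x style segment: SDA at 100h, two code entrypoints to patch.
seg3 = bytearray(0x400)
struct.pack_into("<HHH", seg3, 0x100 - 11, 0x200, 0x210, 0)
seg3[0x200] = seg3[0x210] = 0xC3
print([hex(x) for x in enable_critical_sections(seg3, 0x100, dos4=False)])  # ['0x200', '0x210']

# Made-up DOS 4.0+ style segment: every entry points at the byte at 0D0Ch.
seg4 = bytearray(0x1000)
struct.pack_into("<HHH", seg4, 0x100 - 11, 0x0D0C, 0x0D0C, 0)
print([hex(x) for x in enable_critical_sections(seg4, 0x100, dos4=True)])  # ['0xd0c']
```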

MS data/init overlapping parts

For ms_data.nas I added and used the I_AM_NOBITS mmacro to make the data nobits, as it overlaps with part of msinit.nas. Originally this was done using MASM's re-writing org directive, which NASM doesn't support. (FASM can in principle rewrite parts it has already assembled, so it may support rewriting org?) In conjunction with this, msinit.nas is filled up to the end of the data section size, and a few non-zero initialisation values in ms_data.nas are recorded so that msinit.nas can replay them before filling to the end. This special facility is used only five times, which made the specific spots simple to find and edit.

Wrong dependency in IFSFUNC module

The src/CMD/IFSFUNC/MAKEFILE listed msdata.asm as a dependency of two of its object files. This appears to be a wrong dependency by all accounts, as a rebuild didn't fail after deleting the file and removing those dependencies from that makefile.

EQU THIS WORD

There was an equate that originally read EQU THIS WORD. It is unclear whether this differs in any way from a label word directive.

Nonsense equates added for ErrTab

In ms_table.nas I had to translate a macro used to create an error table. As part of this I also added some nonsense equates to make sure that fixmem.pl creates the appropriate port equates, which it wouldn't do otherwise as it doesn't know how the ErrTab macro works.

msbio module: linking and MZ .EXE flattening

msbio porting

  • Use a perl scriptlet to change .inc includes to .nas for a certain list of files. This busywork was later replaced by the --mactonas switch to fixmem.pl.
  • Fix a mismatching size for the START_BDS variable. This is of no importance but was flagged as an error by fixmem.pl.
  • Swap the bytes of a text immediate in a DW directive, which fixmem.pl doesn't handle yet. This is the first time it came up during the kernel port. The immediate is commented as "J.K. 11/7/86 Secrete code for DOS 3.30 IBMBIO."
  • Occasional fixes to specify a destination word size when the source is an immediate number that doesn't fit in a byte. On a 386+ assembler it would be ambiguous whether to use word or dword size. It could also just be an overflow bug where the intended access was to a byte. But for this version of MASM it appears the word size can be implied.
  • Use a perl scriptlet to replace double dollar signs with DD_ to fix labels starting with dollar signs.
  • Use another perl scriptlet to translate switches from IFNDEF / equ to %ifndef / %iassign.
  • Replace equates by %iassign for psdata_seg. This is the use case for fixmem.pl's --segdefine switch.
  • Replace equates by %iassign for the registers BH and BL. This is the use case for fixmem.pl's --regdefine switch.
  • Another case of implied word size with an immediate that doesn't fit in a byte.
  • Rerun fixmem.pl after a bugfix in the script, to fix an xlatb instruction that has a segment override. The changeset message lists a long series of scriptlets to re-apply certain patches before and after the fixmem.pl run.
  • Align the SYSINITSEG start to a paragraph boundary. It is unclear how MASM did this.
  • Rerun fixmem.pl with --exclude msequ.mac so that COUNT, used as an immediate source, is not misdetected as a memory load. The changeset shows more scriptlets to re-apply certain patches.
  • Rerun fixmem.pl yet again, after the bugfix to make it process assumptions again. During the ident86 runs I noticed a suspicious number of assumptions being added by ident86 source edits that should have been caught by fixmem.pl already. This is related to the LEA detection and the question to perlmonks.org listed for fixmem.pl above.
  • Within an identicalisation changeset: "The "jmp short EMS_SH_Ret" before "EMS_Stub_X:" had to be forced short manually, as it straddled the limit of a rel8 displacement."
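Three of the mechanical msbio steps above can be sketched in Python. The post used perl scriptlets and fixmem.pl; the function names and regular expressions here are hypothetical stand-ins, not those scriptlets. The first part shows why a two-character text immediate in DW needs its bytes swapped: MASM treats such a string as a number with the first character in the high byte, so after little-endian storage it comes second, while NASM stores a character constant in string order. rename_double_dollar mirrors the $$ to DD_ replacement, and translate_ifndef rewrites a three-line IFNDEF / EQU / ENDIF switch, in an assumed layout without trailing comments, into %ifndef / %iassign / %endif.

```python
import re
import struct

def masm_dw_bytes(text: str) -> bytes:
    # MASM: a two-character string in DW is a number with the first
    # character in the high byte, stored little-endian.
    value = (ord(text[0]) << 8) | ord(text[1])
    return struct.pack("<H", value)

def nasm_dw_bytes(text: str) -> bytes:
    # NASM: a character constant is stored in string order.
    return text.encode("latin-1")

def swap_for_nasm(text: str) -> str:
    # Operand text that makes NASM emit the same bytes as MASM did.
    return text[::-1]

def rename_double_dollar(line: str) -> str:
    # Replace every "$$" with "DD_", e.g. "$$Foo:" becomes "DD_Foo:".
    return line.replace("$$", "DD_")

# Assumed switch layout: "IFNDEF name" / "name EQU value" / "ENDIF".
IFNDEF_SWITCH = re.compile(
    r"^(?P<i1>[ \t]*)IFNDEF[ \t]+(?P<name>\w+)[ \t]*\n"
    r"(?P<i2>[ \t]*)(?P=name)[ \t]+EQU[ \t]+(?P<value>[^\n;]*?)[ \t]*\n"
    r"(?P<i3>[ \t]*)ENDIF\b",
    re.IGNORECASE | re.MULTILINE,
)

def translate_ifndef(text: str) -> str:
    # Keep each line's indentation; only the directives are rewritten.
    return IFNDEF_SWITCH.sub(
        r"\g<i1>%ifndef \g<name>\n\g<i2>%iassign \g<name> \g<value>\n\g<i3>%endif",
        text,
    )

print(masm_dw_bytes("AB"), nasm_dw_bytes("AB"), nasm_dw_bytes(swap_for_nasm("AB")))
# b'BA' b'AB' b'BA'
print(rename_double_dollar("$$Foo:"))  # DD_Foo:
print(translate_ifndef("IFNDEF MYSWITCH\nMYSWITCH EQU 1\nENDIF\n"), end="")
```

MYSWITCH is a made-up switch name; real MASM sources may lay these switches out differently, which is why the scriptlets had to be adapted to the actual files.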
Last modified: 2024-11-25 18:25:54 +0100 (Mon) by ecm