2024-11-24
This week I worked on porting some more of the MS-DOS kernel to NASM, starting on the msbio module.
-L 0 acts the same as just -R.) Relocations are needed for the msbio build.
0h. (The entire routine is fairly hackish, scanning the parameter twice to determine what base to use and then parse the digits. It works for the expected formats (decits only, 0x followed by hexits, or hexits followed by h) with this fix but it probably doesn't reject all invalid inputs.)
absolute $ feature of NASM, which is not well documented. I once suggested it be documented.
The identicalisation tool was updated to support the MS-DOS porting.
The Python open functions, along with loops of the form "for line in file:", appear to disagree on whether the Form Feed (^L) counts as a linebreak or not.
Python's open without an encoding specified would not consider ^L to indicate a linebreak. The codecs.open function with an encoding parameter of latin-1 would consider it a linebreak however.
As it matches nano's and NASM's use of line numbers, it would be preferable to not treat Form Feed as a linebreak. However, codecs.open with the encoding as latin-1 doesn't corrupt the existing linebreaks (CR LF vs LF) whereas open does (regardless of the encoding). Specifying a "b" in the mode passed to open probably wouldn't corrupt the linebreaks, but it turns all lines read from the file into bytes-like objects. I don't want to deal with changing all of these to fix the Form Feed problem.
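The differences described above can be reproduced with a small sketch. The sample bytes and the function names are made up for illustration. The third reader, open with newline="", is an assumption on my part and not something the tool uses: going by the Python documentation it should leave the linebreaks untranslated while still not splitting on Form Feed.

```python
import codecs
import os
import tempfile
import warnings

# Sample with CR LF endings, one LF ending, and a Form Feed inside line 2.
SAMPLE = b"first\r\nsecond\x0cstill second\r\nthird\n"

def read_default(path):
    # Text-mode open: FF is not a linebreak, but CR LF is translated to LF.
    with open(path, encoding="latin-1") as f:
        return list(f)

def read_codecs(path):
    # codecs.open: linebreaks are kept as they are, but FF ends a line.
    with warnings.catch_warnings():
        # Newer Python versions may warn that codecs.open is deprecated.
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, encoding="latin-1") as f:
            return list(f)

def read_untranslated(path):
    # Assumption: newline="" keeps CR LF and LF as-is and ignores FF.
    with open(path, encoding="latin-1", newline="") as f:
        return list(f)

def compare_readers():
    # Write the sample to a temporary file and read it back three ways.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(SAMPLE)
        return {
            "open": read_default(path),
            "codecs.open": read_codecs(path),
            "open newline=''": read_untranslated(path),
        }
    finally:
        os.remove(path)

if __name__ == "__main__":
    for name, lines in compare_readers().items():
        print(name, len(lines), lines)
```

Whether the third variant would fit the identicalisation tool also depends on how the file is written back out, which this sketch does not cover.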
The result of the disagreement was that findsourceline counted the lines differently than editsource, leading to corrupted edit spots if the source file contained at least one Form Feed prior to the edit spot.
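To make the failure mode concrete, here is a minimal in-memory sketch. find_line and insert_before are hypothetical stand-ins, much simpler than the real findsourceline and editsource; io.TextIOWrapper and codecs.getreader stand in for open and codecs.open respectively. The line number is found with one reader and applied with the other.

```python
import codecs
import io

# A Form Feed sits before the line that should be edited.
DATA = b"a\n\x0cb\nTARGET\nc\n"

def lines_like_open(data):
    # Stand-in for text-mode open: FF stays inside its line.
    return list(io.TextIOWrapper(io.BytesIO(data), encoding="latin-1"))

def lines_like_codecs_open(data):
    # Stand-in for codecs.open with latin-1: FF ends a line.
    return list(codecs.getreader("latin-1")(io.BytesIO(data)))

def find_line(lines, needle):
    """Return the 1-based number of the first line containing needle."""
    for number, line in enumerate(lines, start=1):
        if needle in line:
            return number
    raise ValueError("needle not found")

def insert_before(lines, number, text):
    """Return a copy of lines with text inserted before 1-based line number."""
    return lines[:number - 1] + [text] + lines[number - 1:]

if __name__ == "__main__":
    # The spot is located with one reader ...
    spot = find_line(lines_like_open(DATA), "TARGET")
    # ... and the edit is applied to lines produced by the other reader.
    edited = insert_before(lines_like_codecs_open(DATA), spot, "EDIT\n")
    print(spot, edited)
```

With this input the inserted line ends up ahead of the line containing "b" instead of directly ahead of TARGET, one line too early for the single Form Feed that precedes the spot.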
Quoting the changeset message:
work around, open with latin-1 encoding treats FF as a linebreak
findsourceline used "open" without an encoding specified before. This would treat Form Feed (^L) codepoints as not splitting a line when looping through the file using "for line in file:".
inplace, as called by editsource, used "codecs.open" with the parameter "encoding='latin-1'" instead. Unlike findsourceline this treated Form Feed as a linebreak, with the same idiom of a "for line in file:" loop.
This caused the line number passed from findsourceline to the editsource function to mismatch between the two. The edit would be inserted by editsource too early if Form Feeds were present before the spot to be edited, as editsource's line count grew faster than findsourceline's.
istruc constructs are emitted. This fixes a problem introduced by an earlier change (2024-10-27).
%if 0 trees) so that rerunning can detect them,
JMP FAR label to be the same as JMP SEG label:label. (Requires relocation in the output executable, which msbio allows.)
I didn't have time on Sunday to write about this week's changes to the MS-DOS kernel. Briefly, the port to NASM is ongoing.
I used a bunch of symlinks (with limited support by fixmem.pl and ident86 now) to handle files that include other files across directories, in the src/INC/ as well as src/DOS/ or src/BIOS/ directories. (I did not record these symlinks in the repo.)
In const2.nas I translated an IRP build time repetition using a NASM mmacro and two uses of that mmacro.
Fun fact: This table, part of the pre-SDA, is still an actual table of code entrypoints to patch in this version of the kernel. This contradicts what the Interrupt List says about "DOS 4.0+":
the DOS kernel does not invoke critical sections 01h and 02h unless it is patched. DOS 3.1+ contains a zero-terminated list of words beginning at offset -11 from the Swappable Data Area (see #01687 at INT 21/AX=5D06h); each word contains the offset within the DOS data segment of a byte which must be changed from C3h (RET) to 50h (PUSH AX) under DOS 3.x or from 00h to a nonzero value under DOS 4.0+ to enable use of critical sections. For DOS 4.0+, all words in this list point at the byte at offset 0D0Ch.
For ms_data.nas I added and used the I_AM_NOBITS mmacro to make the data nobits, as it overlaps with some of msinit.nas. This used to be done by utilising the re-writing org directive of MASM, which NASM doesn't support. (FASM can in principle rewrite parts it has already assembled, so it may support rewriting org?) In conjunction with this, msinit.nas is filled to the end of the data section size and a few non-zero initialisation values in ms_data.nas are recorded to replay them in msinit.nas later before the filling-to-the-end. This special facility is used only five times, which made it simple to spot and edit the specific spots.
The src/CMD/IFSFUNC/MAKEFILE depended on msdata.asm for two of its object files. By all appearances this is a wrong dependency, as a rebuild didn't fail after deleting both the file and the dependencies listed in that makefile.
There was an equate that originally read EQU THIS WORD. It is unclear whether this differs in any way from a label word directive.
In ms_table.nas I had to translate a macro used to create an error table. As part of translating this table I also added some nonsense equates to make sure that fixmem.pl will create the appropriate port equates, which it wouldn't otherwise do as it doesn't know the ErrTab macro's workings.
"jmp short EMS_SH_Ret" before "EMS_Stub_X:" had to be forced short manually, as it straddled the limit of a rel8 displacement.