MS-DOS: simpler buffers and DOSCODE split

2025-01-19

This week I first uploaded some changes from the last 20 days concerning an adaptation of the MS-DOS v2.11 buffers subsystem to the lMS-DOS code base. Then I worked on splitting the DOSCODE segment from DOSDATA. This work isn't finished: While the split is done, it is not yet useful. But the hardest part seems to be done.

MS-DOS v2

This older repo, based on the free software release in 2018, received an update. I used a perl scriptlet to replace non-printable ASCII control characters in the assembly language source files with spaces. This allows viewing the files as text using the hgweb server.

The scriptlet is perl -i -pe 's/[\x00-\x08\x0B\x0C\x0E-\x1F]/ /g' *.ASM

lDebug

lmacros

Pick a change first done on lmacros2.mac in the lMS-DOS repo. Allow the use of lframe -2 (in 16-bit code) to enable using the macros without pushing the BP register to the expected spot.

This is used in lMS-DOS's function restore_world, which has the final output BP within its stack frame, so we do not want to additionally push the input BP. This is achieved by combining lframe -2 with several lpar uses, and an lemit off followed by lenter. After the non-emitting lenter we place a manual mov bp, sp, which sets up the Base Pointer register at the right offset to allow access to the declared parameters. As is usual for more complex stack frames, we clean up with an lleave ctx mmacro call.

lDOS boot

lMS-DOS

Buffer subsystem

This work results in a build-time choice of two different buffer subsystems: The original MS-DOS v4 one with hash buckets, a secondary cache, and ability to use EMS for buffers, and the transplanted MS-DOS v2 one with some adjustments to work with MS-DOS v4 such as extending the sector numbers to 32 bits.

The choice is controlled using the src/INC/buf2sw.mac file, which either adds a define called BUF2 or does not do so. Using BUF2 saves about 2 KiB of resident memory use when comparing apples to apples (i.e. without BUFFERS /X EMS use).

I started this work in late December 2024, but only uploaded it within the last week. The individual steps:

This last change was what took me so long to finish. When I had almost prepared it, I tested assembling without BUF2 defined. I found that in one spot I had the condition backwards. Other than that it seems to have worked out of the box.

I noted in the changeset message that the change "still has bugs". This seems to have been wrong, however; a different bug seems to have affected my testing. I have not found any other bugs in the buffer conversion yet.

Other

Critical section patches

Prepare a transfer function (transfer_doscode_to_dosdata) to allow running critical section functions in DOSDATA from code running in DOSCODE. The transfer function already "works" despite being unneeded at this point.

I believe it was at this point that I made the error of writing xchg ax, [?in_ax_out_dosdata_retf] without using BP as the base address register. This, of course, led to a crash.

The interrupt list documents that MS-DOS v3.10+ has a table of patches, in the pre-SDA data, that specifies four offsets in DOSDATA to patch from retn to push ax to enable some critical section calls. It also documents that as of MS-DOS v4 all four words point to the same byte and the patch should be from 00h to a nonzero value instead.

This appears to be inaccurate: in the v4 release these are still the v3-style code patchsites. The different style was probably introduced with v5's DOSDATA/DOSCODE split. In lMS-DOS I chose to retain the old-style patchsites, hence the transfer function to pass the control flow from DOSCODE to DOSDATA and back.

The build promptly failed in IFSFUNC because (like SHARE before it) it includes some object files of the MSDOS kernel module to get the data layout right. Fixed by adding the new patchsite names to the list of placeholder entrypoints in IFSFLINK.ASM.

More other changes

  • In disp.nas rewrite restore_world and save_world to use only the stack, no more temporary variable in memory. This is needed because all the segment registers save CS may hold user values when running these. Solved with some good old stack juggling.
  • Add reloc-abs-byte error warning from lmacros to nasm.mac. This is a safer change than including lmacros everywhere.
  • Change the critical section transfer function to accept the target entrypoint in bx, rather than a pointer to a word containing the same. The pointer to a variable approach would be useful if we directly pushed it to the stack, however in this case I push BX and write the parameter to the BX register instead. That makes the indirection useless.
  • Move the int 23h caller and int 0 handler into DOSDATA for good. This involves several transfers to bridge the gap. To get from the int 23h handler in DOSCODE to the DOSDATA part, the transfer_doscode_to_dosdata function from the critical section functions is re-used, pushing a placeholder for the near return offset then the original BX value (twice push bx) and adding 6 to SP after the transfer to get rid of the two return addresses.
  • (Same changeset:) Replace PRA return code to modify the parent's user stack frame instead of pushing from CS-addressed variables. The old code, with an appropriate transfer to DOSDATA, remains commented out. (The use of CS-addressed variables is also the reason to put the int 23h caller into DOSDATA.)
  • In 2F.122A function (FastInit) access DOSDATA using ES segment rather than CS.

disp.nas CS uses

A single changeset with several changes:

  • User stack frame functions (33h, 50h, 51h, 62h, 64h) get DS on stack, and the DS register => DOSDATA. Most of them have to access DOSDATA so this is the smallest solution.
  • CALL 5 handler replaced by RxDOS v7.21's that I contributed. Avoids the need for a CS-addressed variable. (Note that CALL 5 fails in qemu because it initialises A20 to on, but the kernel doesn't know about the HMA yet. Other than that it works.)
  • Int 21h handler doesn't use CS addressing any longer except for the Dispatch table.
  • Int 21h and CALL 5 callers compare the maximum function numbers as hardcoded immediates. It seems we cannot get the linker to embed 8-bit relocations for equates here. So just hardcode them and add a check in ms_table.nas as a reminder.

More changes

  • Prepare 28 files for DOSCODE split. Mostly changing the CODE section to DOSDATACODE (intermediate placeholder), and changing CS uses to SS. In processing these files I started out with the command nano *.nas to work through all files, searching for section, \bCODE\b, and \bcs\b in every file.
  • Update ifs.nas, including changing IFS_DOSCALL to access DOSDATA using DS rather than CS. In this case using SS is not always possible as part of the code may run on a user stack. During this conversion I noticed that the 20h functions leave DS => DOSDATA. This is true of the original releases and I preserved it in my update. (Some other functions like 2F.122C also write other registers not documented in the interrupt list.)
    • Switching stacks back to the user stack is done by artificially pushing ds, si, and fl onto the user stack to allow popping them off after setting SS:SP = DS:SI.
  • Replace RETF 2 by proper stack frame manipulation involving BP and LAHF to implement Leave2F. This is a pet peeve of mine.
  • Delete some notes about functions used at DOSINIT time. In our NEARDOSINIT we no longer create the initial process overlapping DOS data so we do not need to call 21.55's code early. Only SETMEM is used before a process is set up, and it is a much leaner target.
  • Prepare 25 more files for DOSCODE split. This is the continuation of the earlier check for section, CODE, and CS.
  • exec.nas: The transfer mmacros had to be disabled to allow building exec.nas (DOSDATACODE) before proc.nas (CODE) was converted. Undo this change now that proc.nas has been converted.
  • Put divide overflow message into DOSCODE. It belongs there since it is not subject to change at runtime.
  • In cpmio.nas optimise an access to the "FETCHI tag". I don't rightly understand the purpose of this but we do use it for now. A likely improvement not yet included is to hang properly rather than call DOSINIT (leading to an uncontrolled crash).
  • Handle DOSGROUP references. This was done using nano *.nas again, searching only for DOSGROUP this time. The LEAVEDOS entrypoint in DOSDATA does a transfer to doscode_leavedos to preserve the prior protocol. (The interrupt list calls this the "offset in DOS CS of code to return from INT 21 call". Unclear what MS-DOS v5 does.)

lDOS entry

  • Adapt lDOS entry.asm. _DOSCODEHMA disabled for now, _RELOCATEDOSCODE always enabled. _INT28 disabled.
    • To aid in debugging, the entry stubs get a retf instruction directly behind the call. The relocating code returns to this retf with the stack set up to branch into DOSCODE's entrypoint.
    • Because of the retf inserted, the breakpoint detection is not needed any longer. It is dropped.
    • i00, i06, i19, i29, i31, casemap relocation entrypoints disabled for now. i13 and i1B were already disabled. i28 disabled using a define.
    • IRT data commented out for now.
    • Fixed segment 70h replaced by BIOCODE. Same thing after the kernel image is relocated.
    • Use of a DOSENTRY variable pointing to DOSDATA replaced by using only RI31S.
    • Preparations to allow accessing HMA DOSCODE as either segment FFFFh or FFFEh.
    • Stack juggling to implement the debugging help retf
    • Device entrypoint still included in DOSCODE entry, albeit unused
    • i20 entry in DOSENTRY directly chains to i21, QUIT entrypoint in disp.nas commented out
    • msinit.nas sets up the interrupt handlers (and CALL 5) to point to DOSENTRY
    • Add the relocated mmacro to nasm.mac, with part commented out and a global directive added. (Unlike the old lRxDOS sources it is expected to specify the label as a parameter without a colon.)
    • Add okcallentry and badcallentry (sharer hooks default handlers)
    • Replace NEARDOSINIT code to init fastentry, ifsentry, and sharer entries by hardcoding the entries in their appropriate fields
  • In msinit.nas init end of DOSDATA segment before calling DOSCODE first
  • Fix an annoyance, in part: IFSFUNC's IFSFLINK.ASM had to be updated to add all links used by msinit.nas. The purpose is to recreate the DOS data layout for use by the IFSFUNC module. The subsequent final DOSCODE split change caused fixup overflows after naively adding doscode_start and afterdoslabel to the link file. So IFSFUNC now gets its own variant of msdata.obj called msdatai.obj, which doesn't assemble any of the NEARDOSINIT code. This is incomplete and less ambitious than the corresponding changes to the sharer, but suffices for now.
  • In dosmac.mac remove a clause reading wrt DOSGROUP from short_addr mmacro. This was the last apparent roadblock to the DOSCODE split. (I tried to run the updated kernel and the int 21h dispatcher got the wrong addresses because of this wrt clause.)

Final DOSCODE split

Today's last changeset finally splits DOSCODE from DOSDATA. Most of the changes to 59 files were done using a perl scriptlet. Its purpose is to change all instances of DOSDATACODE or DOSDATATABLE to the respective DOSCODE sections, except where a line is marked with an "in DOSDATA" comment. The scriptlet reads:

perl -i -pe '
  if (not /in DOSDATA/) {
    s/DOSDATA(TABLE|CODE)/DOSCODE$1/g;
  }' src/DOS/*.nas src/INC/*.nas src/BIOS/*.nas src/BIOS/*.asm;
hg revert -C src/INC/dosseg.nas

A small manual change in entry.asm is to use the DOSCODETABLE section for the initial entry. This is relevant because of the doscode_start label which we want to be at offset 0 within the DOSCODE segment. (For HMA use later we will also want to accurately place the CALL 5 entry at 10_00C0h.)
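The address 10_00C0h and the two candidate HMA segments mentioned earlier (FFFFh and FFFEh) are related by ordinary real-mode segment arithmetic: linear = segment * 16 + offset. As a quick check, the following sketch computes the linear address for two segment:offset pairs. The offsets 00D0h and 00E0h are derived here from that formula rather than taken from the sources, and the helper name linear is made up. The last computation masks the address to 20 bits to show where it lands when A20 is off.

```shell
# linear SEG OFF: print the linear address of a real-mode SEG:OFF pair
# as six hex digits. (Hypothetical helper name, for illustration only.)
linear() { printf '%06X\n' $(( $1 * 16 + $2 )); }

via_ffff=$(linear 0xFFFF 0x00D0)   # FFFFh:00D0h
via_fffe=$(linear 0xFFFE 0x00E0)   # FFFEh:00E0h
# With A20 disabled, bit 20 is dropped and the address wraps into low memory.
wrapped=$(printf '%06X' $(( (0xFFFF * 16 + 0x00D0) & 0xFFFFF )))
echo "$via_ffff $via_fffe $wrapped"
```

Both pairs resolve to the same linear address, so the entry can be reached through either segment as long as the offset is adjusted by 10h. The wrapped value is the low-memory address that a CALL 5 far call reaches when A20 is off, which is why the same entry has to exist at the corresponding HMA address when A20 is on.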

msinit.nas places DOSCODE at its final position (final for now, that is) before its first call into DOS (either using transfers or int 21h).

The future

I want to place DOSCODE and later DOSDATA in temporary locations in the LMA initially, so as to allow relocating them to the UMA (either) or the HMA (DOSCODE only) later without creating memory fragmentation. Of course, if neither HMA nor UMA is available then the final relocation would be to a low part of the LMA.

I want to add lDOS's memory handling to allow operating on the UMA, along with handling for DEVICEHIGH=, SHELLHIGH=, and other data UMA or HMA relocations.

Further, the integration of msbio (BIOCODE aka DOSENTRY) and msdos (DOSDATA / DOSCODE) is still poor. Moving parts of msbio into DOSCODE or DOSDATA will shrink the parts that have to remain in DOSENTRY.

Another missing bit is the Interrupt Restoration Table of at least 5 entries at 70h:100h.

At this point I will likely change the reported DOS version to some v5.xx value, as feature parity with MS-DOS v5.00 will have been reached. (v5.00 and v5.50 have been used by official Microsoft releases, MS-DOS and the NTVDM DOS respectively, so I will likely avoid these specific numbers.) As part of this change, 21.3306 "get true version number" should be supported, although of course the "true" version number is some sort of fiction by now.

blog/pushbx/2025/0119_ms-dos_simpler_buffers_and_doscode_split.txt · Last modified: 2025-01-19 19:20:01 +0100 Jan Sun by ecm