TSR updates, kernel boot test, more V20 bug tests

2023-04-02

This last week, I worked on the updates to the TSR example, slightly extended the inst2d2f tool and set it up for current builds, made a few minor updates to lDebug, and finally merged the ecm-boot-test branch for the FreeDOS kernel. I also checked whether the long-form pop on the 95LX can be traced or proceeded past without a crash, testing all 7 GPRs other than sp.

TSR example updates

For the most part, these updates are about changing the parameter parsing and adding the new /J switch handling. The new /J switch takes individual letters, each of which "jettisons" a different feature. I suggested the options so far in an earlier blog post, except that I renamed T to B and did not add equivalents to /N or /O yet. Once finalised, this change can be picked up by all the TSRs that largely share the example's transient code base.

(I also posted about the new /J switch on the forum, to help decide whether /Jx (to disable a feature) should be equivalent to /J+x or to /J-x.)
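The open question about the sign convention can be illustrated with a small sketch. This is not the TSR's actual parser (which is written in assembly); the function name, the `plus_jettisons` parameter, and the rule that a sign applies only to the next letter are all assumptions made for illustration. A bare letter always jettisons the feature, and the parameter picks which of the two candidate conventions is modelled.

```python
def parse_jettison(arg, plus_jettisons=True):
    """Return the set of jettisoned feature letters for a '/J...' argument.

    Hypothetical model: a bare letter jettisons its feature. A '+' or '-'
    applies to the next letter only. If plus_jettisons is True, '/J+x' is
    the same as '/Jx'; otherwise '/J-x' is the same as '/Jx'.
    """
    if arg[:2].upper() != "/J":
        raise ValueError("not a /J switch: %r" % arg)
    jettisoned = set()
    sign = None
    for ch in arg[2:]:
        if ch in "+-":
            sign = ch
            continue
        if not ch.isalpha():
            raise ValueError("unexpected character: %r" % ch)
        letter = ch.upper()
        if sign is None:
            jettison = True
        else:
            jettison = (sign == "+") == plus_jettisons
        if jettison:
            jettisoned.add(letter)
        else:
            jettisoned.discard(letter)
        sign = None  # assumption: a sign does not carry over to later letters
    return jettisoned
```

With `plus_jettisons=True`, `/J-B` leaves feature B enabled; with `plus_jettisons=False`, the same argument jettisons it.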

A related change is that the "system check" now occurs only in the install handler. This check calls interrupt 21h function 4Dh with the Carry Flag set (CY) and verifies that DOS returns with it clear (NC). Before, this was done as the very first code in the application. That would not allow disabling this check with a /J flag, however. So it moved into the installer, before the process relocation and resident installation, and I ensured that nothing other than the installer requires an MS-DOS v2 compatible system.

The check now also displays an error message instead of silently terminating. Adding a way to disable this check at run time, without patching the program, is a nod to systems that may fail this particular check but be compatible enough regardless. It appears that the built-in DOS layer of DOSBox and older versions of DOSBox-X may be among those.

Part of the MS-DOS v1 compatibility is to check that interrupt 21h function 52h, which gets the "List of Lists", actually modifies bx. The register is now preinitialised to 0FFFFh before the call. We don't strictly need this function to be supported, as we only use it to learn the (informational) start of the UMCB chain of DOS UMBs.

Another change is to check that the int 2Fh and int 2Dh handlers are valid before we call them. This check occurs early in the application, before figuring out the first UMCB or parsing the command line, but after saving the memory allocation strategy and UMB link state and setting up the Control-C and Critical Error handlers. That means it cannot be disabled by a /J flag. It allows both the transient and the resident code to rely on being able to call these interrupt handlers without further checks. The error message suggests using inst2d2f, which may work to make the system compatible enough to run.

inst2d2f changes

The changes to the program itself are just to fail gracefully on MS-DOS v1. Additionally, a mak.sh script was added and the wwwecm.scr scripts were changed to add the current build of this program to the /ecm/download/ directory.

lDebug changes

The minor changes were to increase alignment of a few variables, and to drop an unused variable. The alignment, especially for the csrpos variable, is to avoid the remote possibility of an xchg instruction with a memory operand causing a "split lock" fault. It was recently found that dosemu2's alarm signal can hang the application if this occurs on a Linux kernel trying to "rate limit" the dosemu2 process.
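The alignment argument comes down to simple address arithmetic: a locked access (and xchg with a memory operand is implicitly locked) only becomes a split lock when the operand straddles two cache lines. The following sketch checks that condition. The 64-byte line size is an assumption that holds for common current x86 CPUs, and the helper names are made up for this example.

```python
CACHE_LINE = 64  # assumption: cache line size in bytes


def crosses_cache_line(address, size, line=CACHE_LINE):
    """True if an operand of `size` bytes at `address` spans two cache lines."""
    return address // line != (address + size - 1) // line


def align_up(address, alignment):
    """Round `address` up to a multiple of `alignment` (a power of two)."""
    return (address + alignment - 1) & -alignment


# A word at an odd address can straddle a line boundary ...
print(crosses_cache_line(63, 2))
# ... but a word-aligned word never can.
print(any(crosses_cache_line(a, 2) for a in range(0, 4096, 2)))
```

Aligning a variable to its own size is therefore enough to rule out the split for that variable.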

The unused variable was an artefact of the symbolic branch's setup of the XMS symbol tables. It had been unused for a while already but was forgotten until now.

FreeDOS kernel boot tests

Back in 2022 May I first suggested the boot test for the kernel. In its first iteration, now merged, this builds a 1440 KiB diskette image with the kernel and a small test application. The image is created using the bootimg assembler script, along with an ldosboot boot sector loader built with the FreeDOS load protocol. It is executed using qemu. The result file is read using an mtools program. Both the kernel built by gcc ia16 and the one built by OpenWatcom are tested.
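As a quick sanity check of the image size, the arithmetic below shows how 1440 KiB maps onto sectors. The geometry constants (512-byte sectors, 18 sectors per track, 2 heads, 80 cylinders) are the usual values for a 1440 KiB diskette and are assumed here; the parameters that the test script passes to bootimg are not shown in this post.

```python
IMAGE_KIB = 1440
BYTES_PER_SECTOR = 512   # assumption: standard sector size
SECTORS_PER_TRACK = 18   # assumption: usual 1440 KiB geometry
HEADS = 2
CYLINDERS = 80

image_bytes = IMAGE_KIB * 1024
total_sectors = image_bytes // BYTES_PER_SECTOR
chs_sectors = CYLINDERS * HEADS * SECTORS_PER_TRACK

print(image_bytes)     # 1474560
print(total_sectors)   # 2880
print(chs_sectors)     # 2880
```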

Before merging, I updated the lmacros, bootimg, and ldosboot revisions picked for the test. I also modified the test script to use the _BOOTPATCHFILE define rather than the older _BOOTFILE. This allows using any file system parameters: bootimg will pull out the code and data specific to the loader while retaining the BPB that it creates on its own. (Technically that means the ldosboot loader is not required and the test could use the kernel's own FAT12 loader.)

I was recently added to the FDOS developers team and was thus able to merge the Pull Request myself. However, I wasn't entirely satisfied with the result, as GitHub rewrote two fields, Commit and CommitDate, on all of my commits rather than using the commits as-is. Further, andrewbird suggested using the option that creates an explicit merge commit, as that is more accessible in another GitHub interface.

The NEC V20 bug test

I got the 1 MiB model of the HP 95LX, not containing any data of interest, and wrote a small test program that executes all 7 long-form pop ModR/M instructions in a row, excepting pop sp which would be difficult to use at best.
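For reference, the long-form pop is opcode 8Fh with a ModR/M byte whose mod field is 11b, whose reg field is 000b (the /0 opcode extension), and whose r/m field selects the register. The sketch below lists the encodings of the 7 instructions in question next to the usual single-byte short form. It only models the encodings; it is not the test program used on the 95LX, and the function names are made up for this example.

```python
# 16-bit general purpose registers in their ModR/M encoding order.
REGS16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def long_form_pop(reg):
    """Encode 'pop reg' as 8Fh /0 with a register-direct ModR/M byte."""
    modrm = 0xC0 | REGS16.index(reg)  # mod=11b, reg=000b, r/m=register
    return bytes([0x8F, modrm])


def short_form_pop(reg):
    """Encode 'pop reg' in the usual single-byte form, 58h plus register."""
    return bytes([0x58 + REGS16.index(reg)])


for reg in REGS16:
    if reg == "sp":
        continue  # pop sp is excluded, as in the test
    print(reg, long_form_pop(reg).hex(), short_form_pop(reg).hex())
```

For example, the long-form pop cx comes out as the two bytes 8Fh C1h, whereas the short form is the single byte 59h.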

All of the 7 GPRs behaved similarly: Running a g abo command (placing a breakpoint after the nop behind the pop) or executing the instructions without any breakpoints worked just fine, modifying sp but not the target register. Whenever I placed a breakpoint right after one of the instructions, or ran one of them with the Trace Flag set, crashes occurred.

The crashes seemed to vary among the different registers: Some, like cx, hung the machine, requiring a reset using Ctrl-Shift-On. (Ctrl-Alt-Del did not seem to be accepted.) Others returned control to the debugger, but with all registers holding seemingly random values, without any known pattern as to how they got that way. The control flow generally returned by way of an unexpected breakpoint interrupt, or a trace interrupt.

During this testing, whenever I had to reset the device using Ctrl-Shift-On, the machine asked whether to reset the RAM disk (drive C:). Each time I told it not to do that. However, eventually this prompt was preceded by a message stating that the RAM disk had been corrupted. So it does seem possible that resetting, or perhaps crashing, can harm the disk contents. (The debugger and test seemed to work just fine even after that message appeared, but obviously you shouldn't depend on it not eating your files.)

The oddest thing was that I also tested the exact sequence from the "V20 bug" file that introduced me to the bugged long-form pop instructions. In this sequence, debug (presumably Microsoft's Debug) was used to assemble the pop cx and then a nop instruction, and then the sequence was executed using a g =100 106 command – placing a breakpoint right behind the long-form pop cx. The author of that document did not comment on any crashes for this sequence, but in my test it crashed exactly the same as executing just the one instruction with a breakpoint behind it. Perhaps the crash is specific to HP 95LX devices? It is difficult to tell.

blog/pushbx/2023/0402_tsr_updates_kernel_boot_test_more_v20_bug_tests.txt · Last modified: 2023-04-02 16:20:41 +0200 Apr Sun by ecm