Following the bug report on the forum, I changed the PM int 21h handler of lDebugX to switch stacks first and only then transfer control to the code section, because the transfer function still requires a 16-bit stack (either the ss B bit or esph must be zero).
This involved moving some patch sites into the entry section. This is no longer a problem, as the 386-related patches now store separate tables for the entry section. It was a problem before that change, however, which is why the original int 21h entry in the entry section used only manual patches rather than the tables.
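The 16-bit stack condition mentioned above can be written down as a small predicate. This is only a Python model of the condition as stated in the text, not code from the debugger; the function and parameter names are made up for illustration.

```python
def stack_is_16_bit(ss_b_bit: bool, esp: int) -> bool:
    """Model of the stated condition for the transfer function.

    The stack counts as usable if either the B bit of the ss
    descriptor is clear, or the high word of esp ("esph") is zero.
    Names are hypothetical, chosen for this sketch only.
    """
    esph = (esp >> 16) & 0xFFFF  # high 16 bits of the 32-bit stack pointer
    return (not ss_b_bit) or esph == 0
```

With the B bit set and esp = 00010000h the predicate is false, which is the case in which the handler has to switch stacks before transferring control.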
To test that the entry section patches work, I rebuilt lDebugX with _CATCHPMINT214C=0. This required some changes before it would build. For instance, with _DEBUG=0 the excsave table wasn't included in the build. To avoid using this table I tried to build with _DEVICE=0 _TSR=0. This failed due to a recent change to inicomp involving the lCFG block, which didn't allow building with _IMAGE_EXE=1 _DEVICE=0.
After clearing these hurdles, tracing into Protected Mode led to a crash that killed the dosemu2 process upon terminating the DPMI client. Using dosemu2's dosdebug was not particularly helpful. Its bpintd 21 4C00 command did skip all the tedious tracing otherwise required, but the ti command immediately led to the crash rather than tracing into the PM int 21h handler or the subsequent 86 Mode execution of the Parent Return Address.
Aside: dosemu2's dosdebug doesn't come with a P command and its breakpoints don't seem to work particularly well. I also had to manually remove stale files to make it find its dosemu process. This involved deleting either ~/.dosemu/run/* (older dosemu2) or /var/run/user/1001/dosemu2/dosemu.dbg* (newer dosemu2) before starting the dosemu2 process to attach to.
Eventually, I tried patching the debugger built with _CATCHPMINT214C to nop out parts of its PM int 21h handler. This led to some progress. Patching away the call to pm_reset_handlers did not lead to a crash. Patching away the entire call to pmint21_4C_code (a near call plus its word data), or patching the dispatcher so it never executes its own handler (changing the conditional jne to an unconditional jmp), led to the crash. Nopping out parts of pmint21_4C_code revealed that clearing the canswitchmode flag is what keeps the machine from crashing.
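The two kinds of patch described above can be modelled on a byte buffer. This sketch assumes a 16-bit code segment, so that a near call is opcode E8h plus a 16-bit displacement (3 bytes), and assumes the dispatcher's jne is the short form (75h plus an 8-bit displacement); neither detail is confirmed by the text. The helper names are hypothetical.

```python
NOP = 0x90        # one-byte no-operation
CALL_NEAR = 0xE8  # call rel16 in a 16-bit code segment (assumed)
JNE_SHORT = 0x75  # jne rel8 (short form assumed)
JMP_SHORT = 0xEB  # jmp rel8


def nop_out(code: bytearray, offset: int, length: int) -> None:
    """Overwrite length bytes starting at offset with nops."""
    if offset < 0 or offset + length > len(code):
        raise ValueError("patch range outside of buffer")
    code[offset:offset + length] = bytes([NOP]) * length


def nop_near_call_with_word(code: bytearray, offset: int) -> None:
    """Remove a near call together with the word of data following it."""
    if code[offset] != CALL_NEAR:
        raise ValueError("no near call at this offset")
    nop_out(code, offset, 3 + 2)  # 3 bytes of call, 2 bytes of data


def force_short_jump(code: bytearray, offset: int) -> None:
    """Turn a short jne into an unconditional short jmp, keeping the target."""
    if code[offset] != JNE_SHORT:
        raise ValueError("no short jne at this offset")
    code[offset] = JMP_SHORT
```

Only the opcode byte changes in the jump patch, so the displacement and therefore the branch target stay the same.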
This led me to investigate all uses of this flag. In run.asm I found what was causing the crash: reset_interrupts (_DEBUG only) and handle_mode_changed will both try to switch back into Protected Mode if the debugger is re-entered in 86 Mode and the canswitchmode flag is set. The reset function serves to swap out the lDDebugX / lCDebugX PM handlers for the outer debugger's. The mode changed function takes care of, in this case, converting stored selectors to the corresponding segments if they have an 86 Mode compatible base. (There was an unrelated bug if a selector did not match a segment base, introduced much earlier in 2021.)
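As a rough model of the behaviour described above: the first function mirrors the stated condition for switching back, the second converts a selector base to a segment. What exactly counts as an "86 Mode compatible base" is an assumption here (paragraph-aligned and at most FFFF0h); the debugger's actual check may differ. All names are hypothetical.

```python
from typing import Optional


def wants_switch_to_pm(reentered_in_86_mode: bool, canswitchmode: bool) -> bool:
    """Condition under which both functions try to switch back, per the text."""
    return reentered_in_86_mode and canswitchmode


def segment_for_base(base: int) -> Optional[int]:
    """Return the 86 Mode segment for a selector base, or None.

    Assumption for this sketch: a base is compatible if it is a
    multiple of 16 and no larger than 0xFFFF0.
    """
    if base % 16 == 0 and 0 <= base <= 0xFFFF0:
        return base >> 4
    return None
```

After the DPMI client has terminated, the switch back that the first function gates is the one that crashes, which is why the flag has to be clear at that point.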
Manually clearing this flag in Protected Mode by running the command s dentrysel:0 l 2000 as dwords dif and then r dword [srs:sro] clr= 40000 also avoids the crash. Patching the _DEBUG=0 lDebugX to assume this flag is not set in handle_mode_changed also works.
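The effect of the R command above on the dword it addresses can be sketched as a read-modify-write with the mask 40000 taken as hexadecimal. This models only the bit clearing, not the preceding search; that the mask corresponds to the canswitchmode flag is taken from the context of the text, and the names are hypothetical.

```python
import struct

CANSWITCHMODE_MASK = 0x40000  # mask from the R command, read as hexadecimal


def clear_bits_dword(memory: bytearray, offset: int, mask: int) -> int:
    """Clear the bits in mask in the little-endian dword at offset.

    Returns the new dword value and updates memory in place.
    """
    (value,) = struct.unpack_from("<I", memory, offset)
    value &= ~mask & 0xFFFFFFFF  # keep the result within 32 bits
    struct.pack_into("<I", memory, offset, value)
    return value
```

All other bits of the dword are left as they were, so unrelated flags stored in the same dword are not disturbed.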
I also tried to move the call to handle_mode_changed to after the call to reset_interrupts in lDDebugX so as to allow tracing handle_mode_changed and observing the flag being set, which would allow a workaround at the exact position of the check. However, this failed because reset_interrupts also uses the canswitchmode flag and (naturally) cannot be traced by lDebugX. (If lDDebugX first reset its 86 Mode handlers and only then the Protected Mode handlers, the latter could be traced after the former had already run. But this is not how it is currently done.) Further, I undid this move because I am unsure whether calling handle_mode_changed only after reset_interrupts would work correctly, or whether the latter requires some of the housekeeping done by handle_mode_changed. So they should stay in the original order.
In conclusion, this is another reason that the _CATCHPMINT214C build option must be set. There is already an additional option, _OVERRIDE_BUILD_PM_DEBUG, which must be used to enable building with _PM=1 _DEBUG=1 _CATCHPMINT214C=0 lest an %error be emitted. I added a similar error to the build if the handle_mode_changed function in its current state is used along with _CATCHPMINT214C=0. I added another error if the exception handler uses the entry_to_code_sel transfer function, which assumes a 16-bit stack.