2023-01-08
Some more work went into lDOS boot to ensure that it will not accidentally leave the value "CL" in the command line signature field (at word [ss:bp - 14h]). If the "attrib save" is not used and the loader relocates itself, then the "last available sector" segment pointer could have ended up containing the unwanted signature value. To avoid this, the assembly inserts one or two push bx after the directory search, which will push the counter of directory entries per sector into the field. (This value is a power of two between 1 and 256.)
If the loader does not relocate itself and loads to below itself then the "last available sector" segment pointer will always be below 7C0h.
If the loader uses the attrib save then the first or second word of the directory search stack will end up in the signature field. This is chosen so that the count of remaining directory entries in the current root directory sector ends up in the signature field, which is always at most 256.
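As a quick sanity check of the value ranges above, here is a minimal Python sketch. It assumes the signature is the two ASCII bytes "CL" stored in memory order and read as a little-endian word; the helper name can_collide is made up for illustration and is not part of lDOS boot.

```python
import struct

# Assumption: the signature is the ASCII bytes "C", "L" in memory order,
# read as a little-endian 16-bit word.
SIGNATURE = struct.unpack("<H", b"CL")[0]


def can_collide(lowest, highest):
    """Return True if a field value in lowest..highest could equal the signature."""
    return lowest <= SIGNATURE <= highest


# Directory entry counts are at most 256.
print(can_collide(1, 256))
# A segment pointer below 7C0h.
print(can_collide(0, 0x7BF))
```

Both ranges stay far below the signature word, so neither kind of value can be mistaken for it.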
Yesterday I unearthed the old immasm branch of lDebug.
First I had to fight some with Mercurial (hg) to let me recreate the branch's modifications to existing files. Using the default merge tool setup (including the dreaded vimdiff) on the server wound up making all sorts of undesired changes to the files. So I used hg revert to restore these files to their default branch contents. Then I recreated the changes almost exactly, working from the diff of the latest immasm branch merge commit to its default branch parent. Then I used hg resolve --mark --all to tell Mercurial about the merge status. That allowed me to commit the merge.
Next, I had to fix the immasm.asm source file in several ways. I updated the usage conditions header, and changed the debugger name from "NDebug" to "lDebug". I added the sectioning directives to support the data-code split. I also changed things to use the auxiliary buffer in a separate segment.
Finally, I fixed a bug that may have been present all the way back then already: The immasm entrypoint wasn't *called* by anyone so returning from it was invalid. The return had to be replaced by a jump back to cmd3, the command loop entrypoint.
After all of this work I wrapped all of the branch's changes in conditional assembly constructs and added the build option _IMMASM, which defaults to off. Then I merged the branch into the default branch. Like the symbolic branch before it, this spelled the end of the immasm branch.
However, there are several problems remaining:
There is no code selector allocated for the auxbuff yet, so immasm does not work in lDebugX so far. Using the scratch selector is not valid because other code expects it to be an R/W data selector in any case.
Using the auxbuff is perhaps not the most desirable solution; allocating 32 or 16 bytes in the entry segment may be preferable. This would solve the selector problem because there is already an entry code selector.
If we go with a smaller allocation, we need to make sure that the assembler does not write past the end of the allocation. A possible solution would be to limit or prohibit the db, dw, and dd directives. This is not a problem with the current (> 8 KiB) auxbuff because the largest assembler output is achieved with dd, at 4 output bytes per 2 input bytes. As the input line is limited to 255 bytes, fewer than 512 bytes can be assembled this way.
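The bound can be worked out with a few lines of Python. This is only a sketch under stated assumptions: the directive and one blank ("dd ") count against the 255-byte line, and the densest possible input is a list of one-digit items separated by single commas. The function name max_dd_output is hypothetical.

```python
def max_dd_output(line_limit=255, directive="dd "):
    """Upper bound on bytes emitted by one dd line.

    Assumes the shortest item is one digit and that items are separated
    by a single comma, so n items need 2*n - 1 input characters.
    """
    room = line_limit - len(directive)  # characters left for the item list
    items = (room + 1) // 2             # largest n with 2*n - 1 <= room
    return items * 4                    # each dd item emits 4 bytes


print(max_dd_output())
```

Under these assumptions the result stays below 512 bytes, which matches the estimate above and fits easily into the auxbuff, but not into a 16-byte or 32-byte allocation.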
If the assembler is restricted to not accept repeated prefixes of the same kind, and no define data directives, then in theory it should never emit a single instruction extending past 15 bytes. That is so because 15 bytes is the instruction length limit on the 386 and later. (The 286 has a lower limit of 10 bytes.)
Finally, there is the remaining problem of handling special instructions. Those are all the ones that deal with cs, ip, or both. This includes all branch instructions, such as jmp, jcc, call, retn, retf, iret, loop*. It also includes mov and push with cs as source. And it includes all instructions with an applicable cs: prefix. And in particular, far indirect jmp and call can have both a cs: prefix as well as involving cs in another way. This is true of mov from cs into a memory operand, as well.
It is still to be determined if all these special instructions should be handled by the assembler, or by calling into the disassembler silently, or by T/P/taken style manual disassembly.
The disassembler is attractive because it already handles things such as determining the "jumping" notice for conditional branches, or calculating the referenced memory location for indirect branches.
The assembler likely needs to be modified for conditional branches in any case because they only reach the short distance (-128 to +127 bytes) on machines below the 386. Perhaps adding the 5-byte workaround would help with this, the same way NASM does it.
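To illustrate the 5-byte workaround, here is a small Python sketch of the encoding: a short conditional jump with the inverted condition that skips over a 3-byte near jmp to the real target. It assumes 16-bit code and a condition code number cc in 0..15 (opcode 70h + cc, so cc ^ 1 is the inverted condition). The function encode_jcc is invented for this example and is not lDebug's assembler.

```python
import struct


def encode_jcc(cc, origin, target):
    """Encode a conditional branch at offset origin to offset target.

    cc is the condition code 0..15 (short opcode 70h + cc).
    Returns the 2-byte short form if the target is in range, else the
    5-byte form: inverted short jump over a near jmp.
    """
    rel8 = target - (origin + 2)  # displacement counts from end of instruction
    if -128 <= rel8 <= 127:
        return bytes([0x70 + cc, rel8 & 0xFF])
    rel16 = (target - (origin + 5)) & 0xFFFF
    # jcc-inverted $+5 (skips the 3-byte jmp), then jmp near target
    return bytes([0x70 + (cc ^ 1), 0x03, 0xE9]) + struct.pack("<H", rel16)


print(encode_jcc(4, 0x100, 0x110).hex())  # jz, target within short range
print(encode_jcc(4, 0x100, 0x300).hex())  # jz, target out of short range
```

The second call produces a jnz over a near jmp, which runs on any 8086-compatible machine.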
There is also the question of how to handle repeated string instructions, though instead of tracing them the solution may simply be to run them until the breakpoint afterwards is reached. That cannot be the general case for all instructions, however, as branches may need to be handled differently. Though perhaps it could be the general case if all branches are handled in another way already.
The handling of cs segment override prefixes could probably be done by setting up either ds or es to hold the original cs segment or selector, whichever isn't used by the instruction otherwise.
That means ds for lods, movs, and cmps, as well as outs. (The other string instructions stos, scas, and ins never use an override even if one is given.)
A mov can use a ds override instead of cs except if the source or destination is itself ds. (What about cs as both override and source?)
lds can use an es override instead of cs.
Any other override use (eg push, pop, arithmetic instructions) can trivially use ds overrides instead of cs. Note that an explicit ds prefix should be used in case the addressing is with a bp, ebp, or esp base register.
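The override rules listed above could be collected into one selection routine. The following Python sketch is only a model of that idea, not existing lDebug code: the function name replacement_segment and its arguments are hypothetical, and picking es for a mov that involves ds is my reading of "whichever isn't used by the instruction otherwise". The open question of cs as both override and source is not covered.

```python
# String instructions whose memory operand is es:di only; a segment
# override has no effect on them, so nothing needs to be set up.
IGNORES_OVERRIDE = {"stos", "scas", "ins"}

# String instructions that read through ds:si and honour an override.
STRING_USES_DS = {"lods", "movs", "cmps", "outs"}


def replacement_segment(mnemonic, operands=()):
    """Pick the segment register to load with the original cs value.

    Returns "ds" or "es", or None if the override can simply be dropped.
    """
    m = mnemonic.lower()
    if m in IGNORES_OVERRIDE:
        return None
    if m in STRING_USES_DS:
        return "ds"
    if m == "lds":
        return "es"  # lds overwrites ds itself
    if m == "mov" and "ds" in operands:
        return "es"  # assumption: fall back to es when ds is an operand
    return "ds"      # push, pop, arithmetic and so on


print(replacement_segment("movs"))
print(replacement_segment("lds"))
print(replacement_segment("mov", ("ds", "[bx]")))
```

Whichever register is chosen, the debugger would have to restore its original value after the instruction ran, unless the instruction itself wrote to that register.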
Branches need to be handled specifically so as to set up the proper call frames and jump targets. Should calls and software interrupts be traced or proceeded past?