====== lDOS boot CL signature mismatch, lDebug immasm branch ======

**2023-01-08**

Some more work went into lDOS boot to ensure that it will not accidentally leave the value "CL" in the command line signature field (at ''word [ss:bp - 14h]''). If the "attrib save" is not used and the loader relocates itself, then the "last available sector" segment pointer could have ended up containing the unwanted signature value. To avoid this, the assembly source inserts one or two ''push bx'' instructions after the directory search, which push the counter of directory entries per sector into the field. (This value is a power of two between 1 and 256.)

If the loader does not relocate itself and loads to below itself, then the "last available sector" segment pointer will always be below 7C0h. If the loader uses the attrib save, then the first or second word of the directory search stack will end up in the signature field. This is chosen so that the count of remaining directory entries in the current root directory sector ends up in the signature field; that count is always at most 256.

===== lDebug immediate assembler =====

Yesterday I unearthed the old immasm branch of lDebug. First I had to fight with Mercurial (hg) a bit to let me recreate the branch's modifications to existing files. Using the default merge tool setup (including the dreaded vimdiff) on the server wound up making all sorts of undesired changes to the files. So I used ''hg revert'' to restore these files to their default branch contents. Then I recreated the changes almost exactly, working from the diff of the latest ''immasm'' branch merge commit to its default branch parent. Then I used ''hg resolve -%%%%-mark -%%%%-all'' to tell Mercurial about the merge status. That allowed me to commit the merge.

Next, I had to fix the ''immasm.asm'' source file in several ways. I updated the usage conditions header, and changed the debugger name from "NDebug" to "lDebug". I added the sectioning directives to support the data-code split.
I also changed things to use the auxiliary buffer in a separate segment. Finally, I fixed a bug that may have been present all the way back then already: The ''immasm'' entrypoint wasn't //called// by anyone, so returning from it was invalid. The return had to be replaced by a jump back to ''cmd3'', the command loop entrypoint.

After all of this work I wrapped all of the branch's changes in conditional assembly constructs and added the build option ''_IMMASM'', which defaults to off. Then I merged the branch into the default branch. Like the symbolic branch before it, this spelled the end of the immasm branch.

However, there are several problems remaining:

There is no code selector allocated for the auxbuff, so immasm does not work in lDebugX yet. Using the scratch selector is not valid, as other code expects it to be an R/W data selector in any case.

Using the auxbuff is perhaps not the most desirable solution; allocating 32 or 16 bytes in the entry segment may be preferable. This would solve the selector problem because there is already an entry code selector. If we go with a smaller allocation, we need to make sure that the assembler does not write past the end of the allocation. A possible solution would be to limit or prohibit the ''db'', ''dw'', and ''dd'' directives. This is not a problem with the current (> 8 KiB) auxbuff because the largest assembler output is achieved with ''dd'', at 2 input bytes per 4 output bytes. As the input line is limited to 255 bytes, fewer than 512 bytes can be assembled this way. If the assembler is restricted to accept neither repeated prefixes of the same kind nor any define data directives, then in theory it should never emit a single instruction longer than 15 bytes. That matters because 15 bytes is the instruction length limit on the 386 and later.

Finally, there is the remaining problem of handling special instructions. Those are all the ones that deal with ''cs'', ''ip'', or both.
This includes all branch instructions, such as ''jmp'', ''jcc'', ''call'', ''retn'', ''retf'', ''iret'', ''loop*''. It also includes ''mov'' and ''push'' with ''cs'' as source. And it includes all instructions with an applicable ''cs:'' prefix. In particular, far indirect ''jmp'' and ''call'' can both carry a ''cs:'' prefix and involve ''cs'' in another way. The same is true of ''mov'' from ''cs'' into a memory operand.

It is still to be determined whether all these special instructions should be handled by the assembler, by calling into the disassembler silently, or by T/P/''taken'' style manual disassembly. The disassembler is attractive because it already handles things such as determining the "jumping" notice for conditional branches, or calculating the referenced memory location for indirect branches. The assembler likely needs to be modified for conditional branches in any case, because on machines below the 386 they only reach a short distance (-128 to +127 bytes). Perhaps adding the 5-byte workaround (an inverted short conditional jump over a near ''jmp'') would help here, as NASM does.

There is also the question of how to handle repeated string instructions, though instead of tracing them the solution may simply be to run them until the breakpoint placed after them is reached. That cannot be the general case for all instructions, however, as branches may need to be handled differently. Though perhaps it could be the general case if all branches are already handled in another way.

The handling of ''cs'' segment override prefixes could probably be done by setting up either ''ds'' or ''es'' to hold the original ''cs'' segment or selector, whichever isn't otherwise used by the instruction. That means ''ds'' for ''lods'', ''movs'', and ''cmps'', as well as ''outs''. (The other string instructions ''stos'', ''scas'', and ''ins'' ignore a segment override if one is given.) A ''mov'' can use a ''ds'' override instead of ''cs'' except if the source or destination is itself ''ds''.
(What about ''cs'' as both override and source?) ''lds'' can use an ''es'' override instead of ''cs''. Any other override use (e.g. ''push'', ''pop'', arithmetic instructions) can trivially use a ''ds'' override instead of ''cs''. Note that an explicit ''ds'' prefix should be used if the addressing uses a ''bp'', ''ebp'', or ''esp'' base register.

Branches need to be handled specifically so as to set up the proper call frames and jump targets. Should calls and software interrupts be traced or proceeded past?

{{tag>ldosboot ldebug}}

~~DISCUSSION~~