
lDebug, symbolic, and segment splits

The Return of the Symbolic Branch!

Recent comments by Bret Johnson caused me to revisit the symbolic branch of lDebug. These were one and two, referencing my earlier reply to this thread on DOS assembly language resources.

In particular, the second message ends in:

I also sometimes use ECM's lDebug, but that is mostly when I'm trying to figure out what someones else's program is doing and don't have the source code or some other way to look at the symbols/names to help try and figure out what's going on. lDebug is much better/easier than D86/D386 for certain kinds of debugging/research.

So, the natural reaction is to consider: what is lDebug not (yet!) better and easier for than the competition? (Mind, I didn't actually ask Bret about this yet.)

One feature still sorely lacking in lDebug is to tie the source, where available, to what the debugger "knows". Some part of that can be bridged by using TracList, the trace listing companion application to lDebug. But this is still lacking.

In particular, all the communication is one way only, TracList listening to what the debugger does on its control I/O terminal. You cannot tell the debugger to, say, put a breakpoint on some particular function. Not, that is, without looking up the symbol and copying its address into the debugger's input manually. (Which TracList does make easier, just not fully so.)

Recent attempts

Having hacked parts of the symbolic branch into "run.asm: do not proceed past near imm call if its callee is retf" on 2022-05-02, I decided this Monday (2022-07-04) to first merge the symbolic branch's head of 2021-08-08 with the parent changeset of the "do not proceed past …" revision.

I started to work on this undertaking on the desktop system (also running a Debian Linux system as one does), where I can quickly handle most of the merge headaches using the graphical interface of Diffuse, instead of the terrible vimdiff. (I keep needing to look up its commands, and they're definitely more typing even knowing them.) The 3 screens, each 17 inches diagonally and one in vertical orientation, are also of use.

Testing the merge result as lDDebug, I quickly ran into not one but two previously unknown bugs. First, expr.asm's issymbol? does not promise es = ss to isstring? for non-access-slice 86M-memory symbol tables. Second, when allocating the full buffer to the XMS-memory symbol tables using z /s=max and then adding a symbol like z add symbol=qux offset=100, the zz commit insert fails. This is a nasty bug that makes the application unusable, as each non-insert command is canceled when it tries to commit the symbols.

I want to add a z abort command to get out of such a vicious state and reset the commit facility, but to do that I will have to reverse-engineer enough of the commit mechanism in the first place.

At least I am fairly certain both bugs are retained from the last work we did on the symbolic branch, not introduced by the merge attempt so far.

The code segment's 64 KiB limit

Assembling this symbolic branch candidate merge as of today, to the lDDebug build without DPMI support, fills the code segment to beyond F800h bytes (62 KiB). It is likely that for lDDebugX it won't fit into 64 KiB any longer. So we're looking at a future code segment split, tentatively the lDEBUG_CODE and lDEBUG_SYMCODE segments. We're considering using far calls, relocating them to the final layout manually. (Much the same way that MZ .EXE relocations work, but under our own direction to enable dynamic choice of layout, allow inicomp compression, and also still support the boot entrypoint.)
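To make the relocation idea concrete, here is a minimal Python sketch of MZ-style patching. It is not lDebug's code: the function name apply_relocations, the list-of-positions table format, and the example segment 2000h are all assumptions for illustration. Each far call is assembled with a segment word of 0, and the fixup adds the run-time segment to that word.

```python
import struct

def apply_relocations(image, relocs, base_segment):
    """Add base_segment to the 16-bit word at each recorded position.

    Hypothetical helper: 'relocs' is assumed to be a plain list of byte
    positions, each pointing at the segment word of one far call.
    """
    for pos in relocs:
        (word,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (word + base_segment) & 0xFFFF)

# One "call far 0:1234h" instruction: opcode 9Ah, offset word, segment word.
image = bytearray([0x9A, 0x34, 0x12, 0x00, 0x00])
relocs = [3]  # the segment word starts at byte 3 of the image

# 2000h is an arbitrary example for where the target segment ended up.
apply_relocations(image, relocs, 0x2000)
```

After the call, the instruction bytes encode call far 2000h:1234h. Doing this fixup ourselves rather than leaving it to the DOS loader is what keeps the layout choice in the debugger's own hands.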

We're probably looking at some NASM macros which expand to either call far 0:offset and a relocation entry, or push cs then call near offset, depending on which segment they're emitting into. And of course making all affected functions return far, plus specifying uses of lframe as far too. I have considered adding special relocating functions (thunks if you will) but this is likely more complicated for very little gain.
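The decision such a macro has to make can be modelled outside of NASM. The following Python sketch is only an illustration under stated assumptions: dual_call is a hypothetical name, the image is assumed to start at offset 0 of the calling segment, and the relocation entry is assumed to be a (position, target segment) pair. The opcodes are the standard 8086 ones: 0Eh for push cs, E8h for call near rel16, and 9Ah for call far ptr16:16.

```python
import struct

def dual_call(image, relocs, current_seg, target_seg, offset):
    """Emit a call to a far-returning function at target_seg:offset."""
    if current_seg == target_seg:
        # Same segment: push cs so the callee's retf finds a full CS:IP,
        # then call near with a displacement relative to the next instruction.
        image.append(0x0E)                    # push cs
        image.append(0xE8)                    # call near rel16
        next_ip = len(image) + 2              # IP after the rel16 operand
        image += struct.pack("<H", (offset - next_ip) & 0xFFFF)
    else:
        # Other segment: call far 0:offset plus a relocation entry that
        # says which segment value must be patched in later.
        image.append(0x9A)                    # call far ptr16:16
        image += struct.pack("<H", offset)
        relocs.append((len(image), target_seg))
        image += struct.pack("<H", 0)         # placeholder segment word

image, relocs = bytearray(), []
dual_call(image, relocs, "lDEBUG_CODE", "lDEBUG_CODE", 0x0100)
dual_call(image, relocs, "lDEBUG_CODE", "lDEBUG_SYMCODE", 0x0200)
```

The same-segment form costs four bytes and needs no relocation entry, while the cross-segment form costs five bytes plus one table entry. Either way the callee can always return with retf.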

This is probably going to be an undertaking similar to, though smaller than, the "data code split", which we did in 2019 as the combined entry/data/code/stack segment started exceeding 64 KiB. That was started on 2019-05-20 and ended up being merged on 2019-06-09.

Symbolic branch future

The first and foremost thing to note on it: There likely won't be any. That is, no more symbolic branch in the hg repo. We're looking into completing the merge (with the 2022 May revisions, then with the most recent developments as of 2022 July) and then gradually adding conditional assembly to the branch that will allow disabling all symbolic features with a build option. Then the branch can be closed and its updates merged into the default branch.

Other than that I'd like to see some documentation and to finish use cases for at least simple symbol support, that is with just static symbols sans relocation schemes. That won't suffice to debug the debugger itself, or my resident programs, or one of the free kernels, but it will be a step forwards at this point. And that's what counts.

Discussion

C. Masloch, 2022-07-06 23:47:42 +0200 Jul Wed

After several merges the symbolic branch has been merged with the current default branch. Bugs not yet changed. Did add a check for the code segment size though, as NASM happily assembles past the segment limit with only warnings on overflows in call and jump instructions.

As expected, lDDebugX fails to build. lDDebug now only just stays below 64 KiB (65,536 bytes), though it reaches above 65 kB (65,000 bytes).
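The unit arithmetic behind these numbers, and the kind of size check mentioned above, can be sketched in a few lines of Python. This is only an illustration: the actual check lives in the NASM sources, and the names headroom and SEGMENT_LIMIT are made up here.

```python
KIB = 1024                 # binary kilobyte (KiB)
KB = 1000                  # decimal kilobyte (kB)
SEGMENT_LIMIT = 64 * KIB   # 65536 bytes, the most one 16-bit segment can address

def headroom(code_size):
    """Return the free bytes left in the segment, or raise if it overflows."""
    if code_size > SEGMENT_LIMIT:
        raise ValueError("code segment exceeds 64 KiB by %d bytes"
                         % (code_size - SEGMENT_LIMIT))
    return SEGMENT_LIMIT - code_size

# F800h bytes is exactly 62 KiB, leaving 2 KiB of room.
free_at_f800 = headroom(0xF800)

# A size between 65 kB and 64 KiB still fits, with under 536 bytes to spare.
window = SEGMENT_LIMIT - 65 * KB
```

A hard failure like the ValueError here is the point of such a check, since warnings alone are easy to miss in a long build log.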

C. Masloch, 2022-07-08 20:38:21 +0200 Jul Fri

Actually we will have to use thunks, at least for DebugX. That's because far calls encode an immediate segment, but in Protected Mode we need to switch to using a selector instead. So it's either thunks or patching all relocations every time we switch modes.

C. Masloch, 2022-08-26 01:51:53 +0200 Aug Fri

Dual code segment support now implemented, also refer to "Recent segmentation in lDebug and not yet merged NASM patches" and "Dual code segments mechanisms". Did go with lDEBUG_CODE2 in the end, as perhaps we may use the second code segment for non-symbolic code too.

blog/pushbx/2022/0706_ldebug_symbolic_and_segment_splits.txt · Last modified: 2022-07-07 19:15:16 +0200 Jul Thu by ecm