blog:pushbx:2023:0828_late_august_work [2023-08-28 21:25:47 +0200 Aug Mon] (current)
ecm created
 +====== Late August work ======
 +
 +**2023-08-27**
 +
 +This week some development happened. I also finished an audit of all lDebug changes in recent months and did [[https://hg.pushbx.org/ecm/ldebug/rev/580633680725|a pass over the entire lDebug manual]] to update the worst outdated parts. Finally, I prepared [[https://sourceforge.net/p/freedos/mailman/freedos-user/thread/9b70d387-bac2-d223-00d0-d78628bde1dd%40ulukai.org/#msg37887910|the lDebug release 6]] yesterday.
 +
 +
 +===== MSDebug changes =====
 +
 +The range parameter type's manual entry [[https://hg.pushbx.org/ecm/msdebug/rev/6cc7a04368fc|was updated]] to specify that the default length is clamped to the end of the segment if the start address is close to it.
 +
 +
 +===== FreeDOS kernel =====
 +
 +While trying [[https://hg.pushbx.org/ecm/edrdos/|to build EDR-DOS]] using JWasm, WarpLink, x2b2, OpenWatcom 1.9, as well as the tools shipped with the OpenDOS release, I encountered a bug in WarpLink. When invoked from the mak.bat or command/make.bat scripts, WarpLink reported not finding a file despite it existing.
 +
 +First I attempted to debug the problem by running ''ldebug /p warplink @resp2'' but that made the problem disappear. Next I tried ''intercep c:\bin\warplink.exe @resp2'' but that also made the problem disappear. The same happened with ''intercep c:\command.com warplink @resp2''.
 +
 +I finally ran lDebug (with ''lh'', but this only puts the environment into the UMA), changed the allocation strategy to last fit (int 21h function 5801h with ''bx'' equal to 2), then ran an N command to load lCDebugU, then L, then G. In the lCDebug application I entered a TSR command and then G. Next, I instructed the first debugger (lDebug) to quit itself. This left the second debugger resident near the top of the Low Memory Area (using 116 KiB), though with a memory gap behind it (of 82 KiB for the compressed lCDebug, only 46 KiB for the uncompressed lCDebugU).
 +
 +This finally allowed the linker to exhibit the problem while we had a debugger resident. However, the problem would take some more time to debug. I first re-entered the debugger with a small 6-byte utility called ''int3.com'' and ran these commands:
 +
 +<code>install indos
 +uninstall debug
 +bp new ptr ri21p when ah == 3D</code>
 +
 +This allowed us to Go again and have the applications break into the debugger whenever they opened a file using int 21h service 3Dh. After this, I ran WarpLink again and continued running it until the offending open. Sure enough, DOS returned an error opening this file.
 +
 +First I checked the buffer with the pathname passed to DOS. It contained the expected name, ''.\BIN\CMDLIST.OBJ''. Next I assembled a little helper on the stack of the application, to get the current directory, like so:
 +
 +<code>r sp -= 80
 +a ss:sp
 + push ax
 + push dx
 + push ds
 + push si
 + mov ah, 47
 + mov dl, 0
 + push ss
 + pop ds
 + mov si, (sp + 30)
 + int 21
 + pop si
 + pop ds
 + pop dx
 + pop ax
 + jmp (cs):(ip)
 + .
 +r csip sssp</code>
 +
 +Tracing this turned up nothing unusual: the cwd was ''COMMAND'' as expected. The open still failed the same way afterwards.
 +
 +The next step was to trace into DOS. Just using the T command was not sufficient, as it seemed the dosemu2 handler would loop somehow. So I used a ''di 21'' command to find the DOS entrypoint, then a G command with a temporary breakpoint to trace into this handler. (I discovered during this part that the DOS code segment was in the Low Memory Area, which I did not fix immediately because the exact configuration was needed to reproduce the bug. [[https://pushbx.org/ecm/dokuwiki/blog/pushbx/2023/0706_how_to_set_up_dosemu2_freedos_and_ldebug#comment_18b35efe30ff13d02179d9430300503e|The fix was]] to add a ''dos=high'' directive to the FDCONFIG.SYS file.)
 +
 +I eventually traced into the DOS dispatcher, the DosOpen function, the DosOpenSft function, and the truename function. Finally I found that the truename function was underflowing the input buffer to check for a second dot before the dot that was the first text in the buffer. And sure enough, one byte before the pathname buffer, when the bug happened there was a dot in that spot in memory.
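The underflow can be modelled in a few lines. This is only a sketch in Python, not the kernel's C code: the function names and the flat ''mem'' layout are made up for illustration, and the real truename logic does much more than this one check.

```python
DOT = ord(".")

def second_dot_buggy(mem: bytes, buf_start: int, pos: int) -> bool:
    """Hypothetical model of the bug: always reads the byte before pos."""
    # When pos == buf_start this reads one byte in front of the buffer.
    return mem[pos] == DOT and mem[pos - 1] == DOT

def second_dot_fixed(mem: bytes, buf_start: int, pos: int) -> bool:
    """Hypothetical model of the fix: only look back inside the buffer."""
    return mem[pos] == DOT and pos > buf_start and mem[pos - 1] == DOT

path = b".\\BIN\\CMDLIST.OBJ\x00"

# One unrelated byte sits in front of the pathname buffer (buf_start == 1).
mem_with_stray_dot = b"." + path   # the byte before the buffer is a dot
mem_clean = b"\x00" + path         # the byte before the buffer is harmless

print(second_dot_buggy(mem_with_stray_dot, 1, 1))  # misreads ".\" as ".."
print(second_dot_fixed(mem_with_stray_dot, 1, 1))
print(second_dot_buggy(mem_clean, 1, 1))
```

The buggy variant only misbehaves when the stray byte happens to be a dot, which matches how the failure depended on the exact memory configuration.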
 +
 +I recalled that this bug may have already been fixed in fdpp and quickly found [[https://github.com/dosemu2/fdpp/commit/fe1c4dc7fe5a2218d3badf90c3a8e43550da5821|the relevant patch]] and [[https://github.com/dosemu2/fdpp/issues/212|issue]], by comparing the ''kernel/newstuff.c'' file to FreeDOS's and using GitHub blame to find the commit. It happened to be an issue in which I had commented, actually. My comments weren't related to that bug; rather, I was asking about financially contributing to dosemu2. However, I certainly must have scanned the patch back then.
 +
 +I have since [[https://github.com/FDOS/kernel/pull/113|adapted the patch]] to the FreeDOS kernel, with credit to stsp. It was merged last night.
 +
 +The changed kernel happened to no longer hit the bug case by itself. However, I tested the specific call with the buffer that had exhibited the bug before, and it is indeed fixed.
 +
 +The other work on the FreeDOS kernel involves [[https://github.com/FDOS/kernel/issues/112|four FCB find bugs]] that I found:
 +
 +  * FreeDOS defaults to search for any directory entry (all attributes except volume label) when FCB find first is used without an extended search FCB. EDR-DOS defaults to a zero attribute.
 +  * FreeDOS would truncate the current directory cluster written to and read from a search FCB to 16 bits, which would lose a high word of a 32-bit cluster number on a FAT32 FS.
 +  * The second bug was masked by the third, however: the kernel always retained its internal search DTA for FCB Find Next instead of reloading it from the search FCB. The logic to do just that was reversed, running for FCB Find First (in which case it was useless) but not for FCB Find Next. This prevented concurrent searches from ever working.
 +  * The logic to update the search DTA from the search FCB was local-drive-specific. A proper solution has to copy nearly all of the reserved fields of the DTA in order to support redirectors, which may use different fields in the DTA than DOS.
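The cluster truncation from the second item is easy to show with plain integer arithmetic. The sketch below is Python, not kernel code, and the helper names and the sample cluster number are made up for illustration.

```python
def store_cluster_16(cluster: int) -> int:
    """Model of the bug: only the low word survives the round trip."""
    return cluster & 0xFFFF

def store_cluster_32(cluster: int) -> int:
    """Model of the intended behaviour: keep the full 32-bit value."""
    return cluster & 0xFFFFFFFF

cluster = 0x00123456  # hypothetical directory cluster on a FAT32 FS

print(f"{store_cluster_16(cluster):08X}")  # high word is lost
print(f"{store_cluster_32(cluster):08X}")  # value is preserved
```

Any directory whose cluster number fits in 16 bits is unaffected, so the truncation only shows up on sufficiently large FAT32 file systems.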
 +
 +
 +I have not prepared patches for these problems yet, but I reported them to the FreeDOS kernel repo's issue tracker.
 +
 +
 +===== WarpLink =====
 +
 +Trying to run the EDR-DOS build utilising WarpLink on the local machine, where dosemu2 can use KVM V86M and KVM PM for running DOS software, I actually ran into a different WarpLink problem. It hung. Running ''ldebug /p warplink @wlbios.rsp'' I encountered the same hang. Next, I ran dosemu2 with a serial port connected to a local terminal application. Then in lDebug I ran the command ''install serial, timer''. Upon reaching the hang I sent two Control-C keypresses to the serial input of the debugger. It immediately disassembled a most suspect instruction: ''mov ax, [FFFF]''. Of course that would hang! But only on KVM, as dosemu2's software emulation of the CPU does not respect segment limits. Rerunning the debugger with ''install intfaults'' also caught the fault immediately, as expected.
 +
 +Now, what was the cause of that? It turned out that the memory access fix script I used to port WarpLink to NASM didn't properly recognise negative numbers as numeric. Thus ''mov ax, -1'' was wrongly converted to ''mov ax, [-1]''. I am lucky to have found this bug, actually.
 +
 +There was an entire class of related bugs. The cases other than [[https://hg.pushbx.org/ecm/warplink/rev/117ba491f355|simple negative numbers]] involved either [[https://hg.pushbx.org/ecm/warplink/rev/320471f70e07|an equate in some numeric expression]], or [[https://hg.pushbx.org/ecm/warplink/rev/9819ef1b664e|a numeric expression consisting of only numbers]]. All of these were added to the memory access fix script, though its support is not perfect. (For example, an expression involving two equates would not be recognised. Equates that are actually references to memory labels would also not be handled correctly.)
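The simple negative number case can be sketched as follows. The actual script is the Perl ''fixmem.pl''; the Python below, including both patterns and the helper name, is an assumption made for illustration and not a copy of that script's logic.

```python
import re

# Hypothetical "before" pattern: decimal or h-suffixed hex, no sign allowed.
NAIVE_NUMBER = re.compile(r"(?:[0-9][0-9a-f]*h|[0-9]+)", re.IGNORECASE)
# Hypothetical "after" pattern: the same, with an optional leading sign.
SIGNED_NUMBER = re.compile(r"[+-]?(?:[0-9][0-9a-f]*h|[0-9]+)", re.IGNORECASE)

def fix_operand(operand: str, number: re.Pattern) -> str:
    """Bracket the operand unless it is recognised as a plain number."""
    operand = operand.strip()
    if number.fullmatch(operand):
        return operand           # immediate value, leave it alone
    return f"[{operand}]"        # assumed to be a memory reference

print(fix_operand("-1", NAIVE_NUMBER))    # wrongly turned into a memory access
print(fix_operand("-1", SIGNED_NUMBER))   # kept as an immediate
print(fix_operand("0FFFFh", SIGNED_NUMBER))
print(fix_operand("buffer", SIGNED_NUMBER))
```

A real converter also has to exclude registers, equates, and full expressions, which is where the imperfect cases mentioned above come from.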
 +
 +Another fix concerns numeric expressions that contain an OFFSET keyword which does not appear at the very beginning of the operand. This was fixed by [[https://hg.pushbx.org/ecm/warplink/rev/f7eeaadaee3c|detecting an offset keyword]] with the regexp ''/\boffset\b/i'' where before there was an ''^'' anchor at the beginning.
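The effect of dropping the anchor can be checked with Python's ''re'' module, whose ''\b'' and ''^'' behave the same way for this case. The anchored pattern is my guess at the earlier form, and the operand strings are made-up examples.

```python
import re

ANCHORED = re.compile(r"^offset\b", re.IGNORECASE)     # assumed earlier form
UNANCHORED = re.compile(r"\boffset\b", re.IGNORECASE)  # form named in the text

def has_offset(operand: str, pattern: re.Pattern) -> bool:
    """Report whether the pattern finds an OFFSET keyword in the operand."""
    return pattern.search(operand) is not None

for operand in ("OFFSET table", "2 + OFFSET table", "my_offset"):
    print(operand, has_offset(operand, ANCHORED), has_offset(operand, UNANCHORED))
```

The word boundaries keep identifiers such as ''my_offset'' from matching, since the underscore counts as a word character.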
 +
 +I also learned to construct a scriptlet that copies the original files, applies the ''fixmem.pl'' script to all ported files all over again, and then re-applies several patches that contain manual fixes. In one case, a patch had to be applied before running ''fixmem.pl''. With the scriptlet completed, it was very easy to rerun the entire port and then find the differences from the prior revision using a ''hg d'' or ''hg d | diffrr'' command. To avoid having to re-invent such scriptlets, I [[https://hg.pushbx.org/ecm/warplink/rev/328ec73d4eb5|noted them down in the commit messages]] of the changes to the repo.
 +
 +Finally, I [[https://hg.pushbx.org/ecm/warplink/rev/4caf2487d9a5|shortened several jumps]] to largely identicalise the NASM build output to the TASM build.
 +
 +I did this by running the NASM build (using the host NASM with ''./mak.sh'', not the DOS NASM) then running ''bdiff wl.exe wltasm.exe''. Often, the bdiff result would not be very useful as is, because it stops when it encounters too many different bytes. For example, if it stopped after finding 426 different bytes, rerun it as ''bdiff -426 wl.exe wltasm.exe''. (It is important to put the number switch before the filename parameters.) Then scroll to the offset at which it previously stopped.
 +
 +Next, run ''ldebug /f wl.exe''. (Instead of re-running the debugger, I simply reloaded the changed file using a subsequent L command.) If the change was on, for example, offset 1138, disassemble the nearby bytes using a command like ''u 1138 + F0''. The F0h displacement is 100h for the offset in the PSP segment minus 10h to start disassembly a few instructions earlier. (Sometimes this requires retrying with a few different offsets to get the disassembly to synchronise properly.)
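The address arithmetic is simple enough to write down. The sketch below is Python with made-up names; it only restates the calculation from the paragraph above.

```python
PSP_SIZE = 0x100  # the file image starts at offset 100h in the PSP segment
LEAD_IN = 0x10    # start a few instructions before the changed byte

def disassembly_start(file_offset: int) -> int:
    """Offset to pass to the U command for a given offset into the file."""
    return file_offset + PSP_SIZE - LEAD_IN

print(f"{disassembly_start(0x1138):X}")  # same value as "1138 + F0"
```

If the disassembly does not synchronise, varying ''LEAD_IN'' by a few bytes corresponds to the retries described above.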
 +
 +Next, the hardest part: identify a particularly unique instruction. It is best if it involves only registers, as such an instruction is easiest to search for in the sources. Search for it in the *.nas files using grep with some context lines and pipe the output to less. Then locate the particular spot in the sources.
 +
 +All cases of wildly different bytes in the NASM build, after fixing all the bugs (some of which I found while identicalising), were jumps specified with ''NEAR PTR'' but actually optimised to short jumps by TASM. However, letting NASM optimise all jumps by simply dropping all the ''NEAR PTR'' uses also wasn't correct. Perhaps it depends on whether the jump is backwards or forwards.
 +
 +After fixing all the jumps, the only remaining differences are in the data segment. It appears that the alignment bytes differ, and that the TASM build creates a slightly longer load image. I assume this is due to it emitting some data that is nobits for the NASM build. There are some warnings related to this in the TASM build.
 +
 +
 +===== lDebug =====
 +
 +  * [[https://hg.pushbx.org/ecm/ldebug/rev/e735999c4d73|/PE switch and /PS switch]] for only filename extension guessing and only path search.
 +  * [[https://hg.pushbx.org/ecm/ldebug/rev/5f854bb9ab7a|/PW- switch]] to disable unknown filename extension warning.
 +  * [[https://hg.pushbx.org/ecm/ldebug/rev/95a84ea8bce5|Retire the additional nohook2F DIF]]. This allows the user to retry the DPMI hook by setting the DCO4 flag again.
 +  * In manual section on range parameter, [[https://hg.pushbx.org/ecm/ldebug/rev/cea851d6eb9a|list clamping default length]] at segment end. (Same change as in MSDebug.)
 +  * Loop over [[https://hg.pushbx.org/ecm/ldebug/rev/de4c5618fc2d|a table of length keywords]].
 +  * [[https://hg.pushbx.org/ecm/ldebug/rev/a6fb8ccd369c|Add length keywords]] for pages (512 bytes), KiB, MiB, and GiB.
 +  * [[https://hg.pushbx.org/ecm/ldebug/rev/9fa49a2e4240|Add DCO flags and INSTALL nouns]] for PAGINGRC and PAGINGSCRIPT. Also PAGINGRE, though enabling that is only useful when the run commands also enable paging.
 +  * [[https://hg.pushbx.org/ecm/ldebug/rev/7dcfe2851576|Add variable DDTEXTAND]]. Default is 0FFh. Setting it to 7Fh emulates MSDebug dumping top half text by stripping the most-significant bit.
 +  * [[https://hg.pushbx.org/ecm/ldebug/rev/7dcfe2851576|Add INSTALL nouns]] RX and TM.
 +  * Fix TM command [[https://hg.pushbx.org/ecm/ldebug/rev/597c6b2ed217|to accept any nonzero expression]] to enable tracing into interrupts. The manual already specified that this should work.
 +  * [[https://hg.pushbx.org/ecm/ldebug/rev/b67f0e406556|Do not parse]] ''T M...'' as a TM command, only ''TM...'' without a blank.
 +  * [[https://hg.pushbx.org/ecm/ldebug/rev/13377e096e44|Fix an lDebugX bug when in PM]], searching in a segment with 64 KiB limit, and the search pattern matches at the very end of the segment.
 +  * [[https://hg.pushbx.org/ecm/ldebug/shortlog/release6|Release maintenance]].
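The effect of the DDTEXTAND mask on the text column of a dump can be sketched like this. It is a simplified Python model with made-up names and sample bytes, not the debugger's code. In particular, replacing everything outside printable ASCII with a dot is an assumption of this sketch, not a statement about how lDebug renders bytes with the top bit set.

```python
def dump_text(data: bytes, text_and: int = 0xFF) -> str:
    """Render the text column of a dump after ANDing each byte with a mask."""
    chars = []
    for byte in data:
        masked = byte & text_and
        # Simplification: anything outside printable ASCII becomes a dot.
        chars.append(chr(masked) if 0x20 <= masked <= 0x7E else ".")
    return "".join(chars)

data = bytes([0xC8, 0xE9, 0x21])  # "Hi!" with the top bit set on two bytes

print(dump_text(data))            # default mask 0FFh
print(dump_text(data, 0x7F))      # mask 7Fh strips the most-significant bit
```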
 +
 +
 +{{tag>msdebug freedos warplink ldebug}}
 +
 +
 +~~DISCUSSION~~
  
blog/pushbx/2023/0828_late_august_work.txt · Last modified: 2023-08-28 21:25:47 +0200 Aug Mon by ecm