Early August work

2024-08-11

EIDL

EIDL was a simple TSR that installed an int 28h handler which runs a hlt instruction, then chains to the prior handler. My adaptation of it combines the simple handler with the TSR example's multiplexer, as well as the optimal installation and advanced deinstallation method.

Because the example does not leave a PSP allocated, the updated EIDL uses less memory than the original. Despite this, the program can detect that it is already installed (as requested in an issue), can be uninstalled if its handlers are reachable, and can be disabled or enabled without uninstalling.

I reported my adaptation of EIDL in another SvarDOS EDR-DOS issue:

I adapted Robert's EIDL to my tsr example. This needs 192 Bytes of resident memory, MCB included, and installs without creating fragmentation, using PSP relocation. It includes an AMIS-compliant multiplexer and thus will detect if it is already installed, and allows uninstalling a resident instance as well. It also allows to disable or re-enable a resident instance without uninstalling too. And it doesn't leak an SFT entry if you install it as eidl > NUL.

Repo is at https://hg.pushbx.org/ecm/eidl and current build should appear at https://pushbx.org/ecm/download/eidl.zip later today (in 6h as of writing).

Fix an annoying error which led the application to write a NUL byte and a lone CR in its load size display at the end of a run. Prior to this change I used to add an empty line after calling WarpLink to avoid overwriting the displayed size. The problem was caused by an off-by-one offset calculation in the loop that wrote the size.

Having added the first change I also changed the sign-on message to indicate ecm release 0 and the fact that the program is in the public domain.

TracList

Two small updates to the convlist.pl script:

The JWasm change was likely needed for Enhanced DR-DOS's fdos.asm.

kernwrap

lDebug

  • Add DAO flag to always display MODRM keyword. (Particularly for two-register instructions regardless of whether the order matches the line assembler's default.)
  • Fix: Do not show MODRM keyword for memory operand of an xchg ax, mem, as a memory operand must be encoded as the ModR/M operand. (The error was in assuming that any ModR/M operand paired with ax would need the keyword.)
  • Add a config.sys with switches=/n to the qemu image that boots a DOS to run the debugger as an application
  • Drop timestamps from the mak.sh output (same change as kernwrap)

Enhanced DR-DOS

The code to detect the boot partition using the hidden sectors passed by the boot loader was restored into bdosldr.asm via SvarDOS pick. It had been observed that this detection should run even for single-file load to change the default drive if not booting off what is detected as drive C:.

A subsequent change was to make this detection reach into the UPB to match the int 13h unit number to what was passed by the prior loader, too. I'd noticed during testing that with hda1 and hdb1 at the same hidden sectors, the kernel would detect hda1 as the boot drive even though it was actually loaded from hdb1. Any matching hidden sectors on two different units, where the wrong drive is found first, would hit this bug.

The change isn't particularly nice, but the BPB returned from a block device doesn't have a field for the int 13h unit, so we had to reach into the surrounding UPB (UDSC) to get the unit. I commented on this in a SvarDOS EDR-DOS issue.
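To illustrate why matching on hidden sectors alone is ambiguous, here is a minimal Python sketch of the lookup. It is not the kernel's code (which is assembly in bdosldr.asm); the `Unit` class, its field names, and `find_boot_drive` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    drive_letter: str
    int13_unit: int      # e.g. 0x80 for the first hard disk (assumed field)
    hidden_sectors: int  # start sector of the partition (assumed field)


def find_boot_drive(units: List[Unit], boot_hidden: int,
                    boot_unit: Optional[int] = None) -> Optional[str]:
    """Return the first drive that matches what the prior loader passed.

    With boot_unit=None only the hidden sectors are compared, which is
    the behaviour that picked the wrong drive.
    """
    for unit in units:
        if unit.hidden_sectors != boot_hidden:
            continue
        if boot_unit is not None and unit.int13_unit != boot_unit:
            continue
        return unit.drive_letter
    return None


# hda1 and hdb1 both start at the same sector.
units = [Unit("C", 0x80, 63), Unit("D", 0x81, 63)]
```

With these two units, comparing only the hidden sectors finds C: even when the loader ran from unit 81h; passing the unit as well finds D:.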

DRDOS port to NASM

I've written about this port on the SvarDOS issue tracker for EDR-DOS.

The original update to the public repo was lacking a macro file, nasmorg.mac, that was required for header.nas. Thus the build failed. I updated the repo the day after.

DRBIO port to NASM

This port is ongoing as of today. Like the DRDOS port, it caused several updates to both the fixmem.pl script and ident86.

x2b2

To better integrate with the EDR-DOS build scripts, x2b2 will now exit with a nonzero return code on errors. This code is always 255 for now.

ident86

Many smaller changes. The major improvements are as follows:

  • Detect source file and line to patch for several types of hints
  • Add switch to specify build scriptlet (-b)
  • Dump detected source file part (-S)
  • Edit detected source part (-E)
  • Specify patterns to find source file from trace listing source (-p)
  • Run and display checksums (-P)
  • Emit timestamps into reports (-t and -T)
  • Repeat after an edit (-r, requires -e -E -b and uses the same checksum as -P)

A few more recent, smaller changes:

fixmem

These scripts and macros were forked into a repo of their own after living in the WarpLink and MSDebug repos. The main script is fixmem.pl, which started out as a tool for fixing memory references so that they are valid for NASM.

Memory references

The basic task of fixing memory references involves detecting several types of operands:

  • Registers (all 8086 registers are recognised)
  • Register cl (this is special if it is the second operand of a shift or rotate)
  • Address already with brackets
  • Plain number
  • Equate mapped to a plain number
  • Variable to be used as an address
  • Variable with an offset keyword
  • Address marked by segment override but no brackets

Depending on which operand types are found, the script may have to add brackets and a size keyword to the instruction. Single-operand instructions are a little easier because they cannot operate on immediates. Two-operand instructions can operate on immediates (albeit technically only as source operands).
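The operand types listed above could be told apart roughly as follows. This is a simplified Python sketch, not the logic of fixmem.pl itself (which is a Perl script); the function name `classify`, the category strings, and the number formats accepted are assumptions for illustration.

```python
import re

# All 8086 registers, including segment registers.
REGISTERS = {
    "ax", "bx", "cx", "dx", "si", "di", "bp", "sp",
    "al", "ah", "bl", "bh", "cl", "ch", "dl", "dh",
    "cs", "ds", "es", "ss",
}
SEGMENT_OVERRIDE = re.compile(r"^(cs|ds|es|ss)\s*:", re.IGNORECASE)
NUMBER = re.compile(r"^(\d[0-9a-f]*h|\d+|0x[0-9a-f]+)$", re.IGNORECASE)


def classify(operand, equates=(), variables=()):
    """Return a rough category for one instruction operand."""
    op = operand.strip()
    low = op.lower()
    if low in REGISTERS:
        return "register"
    if "[" in op:
        return "bracketed address"
    if SEGMENT_OVERRIDE.match(op):
        return "segment override without brackets"
    if low.startswith("offset "):
        return "offset of variable"
    if NUMBER.match(op):
        return "number"
    if op in equates:
        return "equate"
    if op in variables:
        return "variable address"
    return "unknown"
```

In this sketch only the "variable address" and "segment override without brackets" categories would need brackets added, and a size keyword is needed when no register operand fixes the size.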

Memory operand sizes

A memory access's size may be implied by the size of a variable used in the access. An equate may also imply a size, specified as in foo equ word ptr 26h. Such an equate does not imply a memory access on its own, but it can give the size for an access that uses it. An equate may also include one or more registers for addressing, in which case it does imply a memory access.
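As a rough illustration, a sized equate like the one above could be picked apart as in the following Python sketch. The regular expressions, the function name `parse_sized_equate`, and the returned dictionary keys are assumptions for this example and do not reflect how fixmem.pl is actually written.

```python
import re

EQU_SIZED = re.compile(
    r"^\s*(?P<label>\w+)\s+equ\s+(?P<size>byte|word|dword)\s+ptr\s+(?P<body>.+?)\s*$",
    re.IGNORECASE,
)
# 16-bit addressing registers; their presence means the equate is an address.
REGISTER_IN_BODY = re.compile(r"\b(bx|bp|si|di)\b", re.IGNORECASE)


def parse_sized_equate(line):
    """Return label, size and body of a 'label equ size ptr body' line."""
    m = EQU_SIZED.match(line)
    if m is None:
        return None
    body = m.group("body")
    return {
        "label": m.group("label"),
        "size": m.group("size").lower(),
        "body": body,
        # An equate without registers only supplies a size.
        "implies_memory_access": bool(REGISTER_IN_BODY.search(body)),
    }
```

For `foo equ word ptr 26h` this yields the size `word` without implying a memory access, whereas an equate such as `bar equ byte ptr [bp+4]` does imply one.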

labelsize defines

As NASM equates have no way to store a size or an address involving registers, fixmem.pl rewrites these into a use of the labelsize macro. In the resulting NASM source text the second operand to this macro, the size, is actually ignored. It is used during fixmem.pl operation however. The companion macro file nasm.mac has an mmacro for labelsize definitions. This mmacro uses %define to allow the indicated label to expand to the given definition. This is required for definition texts involving any registers.

There are three possible pitfalls in using NASM's %define in this way: First, it won't work if the labelsize's label is referenced before its definition. Second, it may result in surprising order-of-evaluation problems if a labelsize's definition involves operators like a plus but the resulting label is used in a subtraction, multiplication, or division. Finally, it is possible the labelsize define could interfere with other uses of the label.

None of these three problems has actually occurred in the source texts that I have worked on yet. (ETA: The use before definition did actually crop up during the DRBIO port. I fixed it manually in that case.)
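The order-of-evaluation pitfall comes from the fact that a single-line macro is substituted as text. The following Python sketch simulates that substitution; it is not NASM's preprocessor, the `expand` helper is hypothetical, and plain numbers stand in for a definition that would involve registers in practice.

```python
import re


def expand(text, defines):
    """Replace whole-word labels by their definition text, unparenthesised."""
    def repl(match):
        word = match.group(0)
        return defines.get(word, word)
    return re.sub(r"\b\w+\b", repl, text)


defines = {"foo": "4+2"}

naive = expand("foo*2", defines)    # becomes "4+2*2"
safe = expand("(foo)*2", defines)   # becomes "(4+2)*2"
```

Evaluating `naive` gives 8 rather than the 12 that a reader treating `foo` as a single value would expect. Parenthesising either the use or the definition avoids this for pure numbers, though parentheses may not be appropriate around a definition containing registers.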

String instructions and xlatb

Some instructions may have explicit memory operands which are used only to select a size and/or a segment override. The script translates these instructions to NASM's native format in which segment overrides are used as mnemonic prefixes (without a colon) and the access size is indicated by a trailing size letter of the string instruction.

In MASM-compatible source texts, the size letter form of a string instruction may be combined with an explicit operand so that the operand only specifies a segment. Both string instructions and xlatb may be given with a variable to access. Segment overrides may be implied by this variable and the current assume directives.
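A translation of this kind could look like the following Python sketch. It deliberately handles only `lods` with an explicit `[si]` operand; the pattern, the function name `translate_lods`, and the choice to drop a redundant `ds:` override are assumptions of this example, not a description of what fixmem.pl does.

```python
import re

SIZE_LETTER = {"byte": "b", "word": "w"}
LODS = re.compile(
    r"^\s*lods\s+(?P<size>byte|word)\s+ptr\s+"
    r"(?:(?P<seg>[cdes]s)\s*:\s*)?\[\s*si\s*\]\s*$",
    re.IGNORECASE,
)


def translate_lods(line):
    """Turn 'lods size ptr seg:[si]' into NASM's 'seg lodsX' form."""
    m = LODS.match(line)
    if m is None:
        return None
    mnemonic = "lods" + SIZE_LETTER[m.group("size").lower()]
    seg = m.group("seg")
    if seg is None or seg.lower() == "ds":
        # ds is the default source segment, so no prefix is needed.
        return mnemonic
    return f"{seg.lower()} {mnemonic}"
```

For example, `lods byte ptr es:[si]` becomes `es lodsb`, and `lods word ptr [si]` becomes `lodsw`.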

Segment overrides

The fixmem.pl script will not detect segment prefixes based on assume lines yet, including in non-string instructions. To fill this gap, ident86 comes with specific handling of missing (or wrong) segment overrides in its -E mode.

Two-byte text literals

The MASM-compatible source texts assume that a 16-bit immediate operand written as two bytes of text stores the first byte of the text in the upper half of the 16-bit value. That means if such a value is stored to memory, or compared with data from memory, the two text bytes appear in memory in swapped order compared to writing them as a string.

NASM changes the order of text literals so that they match text stored in memory. For instance, mov word [bx], ":\" would store the expected string in memory for NASM but would store the string "\:" in memory for MASM.
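The difference can be checked with a little arithmetic. The Python sketch below computes the 16-bit value each convention assigns to a two-byte literal and the bytes a little-endian word store would then write; the function names are made up for this example.

```python
import struct


def masm_word(text):
    """First text byte goes into the upper half of the 16-bit value."""
    first, second = text.encode("ascii")
    return (first << 8) | second


def nasm_word(text):
    """First text byte goes into the lower half, matching memory order."""
    return int.from_bytes(text.encode("ascii"), "little")


def stored_bytes(value):
    """Bytes written to memory by a 16-bit little-endian store."""
    return struct.pack("<H", value)


literal = ":\\"
masm_memory = stored_bytes(masm_word(literal))  # backslash first, then colon
nasm_memory = stored_bytes(nasm_word(literal))  # colon first, then backslash
```

Here `masm_word` yields 3A5Ch and `nasm_word` yields 5C3Ah, so only the NASM value lands in memory in the order in which the literal was written.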

The fixmem.pl script tries to detect uses of such immediates and will swap them as appropriate. It will mark lines in which this occurred with a comment. If this comment is found in the input, or both bytes of the text literal are the same value, then no swapping takes place.
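The swap and its two guards can be sketched in Python as follows. The marker comment text, the pattern, and the function name are hypothetical; the real script uses its own comment and is more selective about which literals count as 16-bit immediates.

```python
import re

MARKER = "; two-byte literal swapped"  # hypothetical marker comment
LITERAL = re.compile(r'"(..)"')        # a quoted literal of exactly two bytes


def swap_literal(line):
    """Swap the bytes of a two-byte literal once and mark the line."""
    if MARKER in line:
        return line            # already processed by an earlier run
    m = LITERAL.search(line)
    if m is None:
        return line
    first, second = m.group(1)
    if first == second:
        return line            # swapping would change nothing
    swapped = line[:m.start(1)] + second + first + line[m.end(1):]
    return f"{swapped} {MARKER}"
```

Running the function on its own output leaves the line unchanged, which is what keeps a second pass over the same file from swapping the literal back.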

NASM port labels and equates

Port labels and port equates are emitted when fixmem.pl detects uses of a label or equate that do not match NASM's view of this value. The original case for this was capitalisation differences. Another, later case involved structure members accessed using the name of the structure followed by a dot and the name of a member of that structure.

Because of how NASM's structures work, all structure member names must carry unique (global) labels. This is a problem occasionally.

Unneeded segment prefixes and sizes

The les and lds instructions mustn't specify sizes in their memory operands for NASM. The lea instruction mustn't specify a size, and should be stripped of any segment overrides because MASM will ignore them whereas NASM will (uselessly) emit them into the output file.

Further, MASM will accept useless ds: and ss: segment overrides. The ds override is useless for addresses not involving bp. (Or for a32, addresses not involving ebp or esp as ModR/M or SIB bases. fixmem.pl has very little support for 386+ instructions as yet, so it only checks for bp.) In turn, the ss override is useless for addresses that do default to the stack segment (that is, they involve an a16 bp register).

The useless segment overrides may be used to disambiguate memory accesses. This is not needed for NASM syntax which always uses square brackets to indicate explicit memory operands.
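The rule for 16-bit addressing described above fits in a few lines. This Python sketch only looks for `bp` in the address text, mirroring the stated limitation; the function name `override_is_useless` is an assumption of this example.

```python
import re


def override_is_useless(segment, address):
    """Decide whether a segment override matches the default segment.

    Only 16-bit addressing is considered: addresses involving bp
    default to ss, all others default to ds.
    """
    uses_bp = re.search(r"\bbp\b", address, re.IGNORECASE) is not None
    seg = segment.lower()
    if seg == "ds":
        return not uses_bp
    if seg == "ss":
        return uses_bp
    return False  # cs: and es: overrides always change the segment
```

For instance, `ds:[bx+si]` and `ss:[bp+4]` carry useless overrides, while `ds:[bp+4]` and `ss:[bx]` do not.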

Rerunnability

All changes done by fixmem.pl are intended to be safely parseable as input for another fixmem.pl run. This is one reason that the script doesn't strip offset keywords off immediate operands. The output is expected to be used with fixmem's NASM macros, which define offset as an smacro with an empty definition. The ptr keyword is similarly preserved some of the time, albeit with less of a concrete reason.

Segments and groups

There is some support for converting segment and group directives. NASM requires a segment directive to start with the directive keyword (segment or section) followed by the segment name and attributes. In MASM, the segment name comes first. The specific attributes differ somewhat as well.
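The reordering can be sketched in Python as below. The attribute mapping (alignment keywords to `align=`, a quoted class name to `class=`) is my assumption of how MASM attributes correspond to those of NASM's obj format; it is not taken from fixmem.pl, and the function name is hypothetical.

```python
import re

MASM_SEGMENT = re.compile(
    r"^\s*(?P<name>[\w$?@]+)\s+segment\b(?P<attrs>.*)$", re.IGNORECASE
)
ALIGN = {"byte": 1, "word": 2, "dword": 4, "para": 16, "page": 256}
COMBINE = {"public", "private", "common", "stack"}


def convert_segment(line):
    """Rewrite 'name segment attrs' as 'segment name attrs' for NASM."""
    m = MASM_SEGMENT.match(line)
    if m is None:
        return line
    out = ["segment", m.group("name")]
    for token in m.group("attrs").split():
        low = token.lower()
        if low in ALIGN:
            out.append(f"align={ALIGN[low]}")
        elif low in COMBINE:
            out.append(low)
        elif token[0] in "'\"":
            out.append("class=" + token.strip("'\""))
        else:
            out.append(token)  # unknown attribute, left for manual review
    return " ".join(out)
```

With this sketch, `_TEXT segment word public 'CODE'` becomes `segment _TEXT align=2 public class=CODE`.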

NASM will warn if an existing segment is used again with any attributes, even if the attributes match those used originally. The fixmem scripting doesn't eliminate these warnings yet.

NASM's -f obj format requires a group to be defined in a single group directive, whereas MASM-compatible assemblers allow multiple group directives for the same target group. The fixmem.pl script will try to replace such group directives by a single one. However, this requires multi-pass operation.
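Collecting the members first is what makes this a multi-pass job: the single directive can only be written once every group line has been seen. A minimal Python sketch, with a hypothetical `merge_groups` helper and an assumed comma-separated member list on the MASM side:

```python
import re

MASM_GROUP = re.compile(
    r"^\s*(?P<name>\w+)\s+group\s+(?P<members>.+?)\s*$", re.IGNORECASE
)


def merge_groups(lines):
    """Collect all 'name group a, b' lines into one NASM directive per group."""
    groups = {}
    for line in lines:
        m = MASM_GROUP.match(line)
        if m is None:
            continue
        members = groups.setdefault(m.group("name"), [])
        for segment in re.split(r"\s*,\s*", m.group("members")):
            if segment not in members:
                members.append(segment)  # keep first-seen order, no duplicates
    return [f"group {name} {' '.join(members)}"
            for name, members in groups.items()]
```

Given the lines `DGROUP group _DATA` and `DGROUP group _BSS, _DATA`, the result is the single directive `group DGROUP _DATA _BSS`.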

Multi-pass operation

The fixmem.pl script does not support multi-pass operation by itself. However, it does come with some support for two ways to simulate it. Either the same input file may be specified twice to fixmem.pl, or an input file may be copied into a temporary file and this copy may be specified before the regular input file. If the temporary file is used, it should be named with a filename ending in .tmp to make use of fixmem's support for this model.

Both ways should work to pick up variable definitions after their uses in the source text. The multi-pass operation for group directives works better with the temporary file way. Variables with unknown sizes also cause the error messages to be prefixed with Temporary: when they occur in .tmp files.
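The effect of listing a copy of the input first can be simulated as in the Python sketch below. It keeps one table of known variable sizes across all input files, in order. The message format, the function name `unknown_uses`, and the restriction to `inc`/`dec` operands are assumptions for this example; only the `.tmp` suffix and the `Temporary:` prefix come from the description above.

```python
import re

VARIABLE = re.compile(r"^\s*(\w+)\s+(db|dw|dd)\b", re.IGNORECASE)
SIZES = {"db": "byte", "dw": "word", "dd": "dword"}
USE = re.compile(r"^\s*(?:inc|dec)\s+(\w+)\s*$", re.IGNORECASE)


def unknown_uses(inputs):
    """Process (filename, lines) pairs in order; report uses of unknown size."""
    known = {}
    report = []
    for name, lines in inputs:
        for line in lines:
            definition = VARIABLE.match(line)
            if definition:
                known[definition.group(1)] = SIZES[definition.group(2).lower()]
                continue
            use = USE.match(line)
            if use and use.group(1) not in known:
                prefix = "Temporary: " if name.endswith(".tmp") else ""
                report.append(f"{prefix}{name}: unknown size for {use.group(1)}")
    return report


# The variable is used before it is defined.
source = ["inc counter", "counter dw 0"]
```

Processing `source` once reports the unknown size for the real file. Processing a `.tmp` copy first moves that report to the temporary file, and the regular file is then handled with the size already known.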

Macro file support

To handle macro files needed to process an input file, the macro files should be specified as input files on the same command line, before the regular input file that is to be processed. Multiple input files that are to be assembled by one run of the assembler (ie, they are linked by include directives) can be specified together as well. Due to the rerunnability support, specifying the same macro file to multiple invocations of fixmem.pl should not result in further changes after the first invocation.

blog/pushbx/2024/0812_early_august_work.txt · Last modified: 2024-08-12 19:21:40 +0200 Aug Mon by ecm