2024-07-28
Over the last two weeks I improved ident86 and used it to identicalise the Enhanced DR-DOS drdos module (ported to JWasm) as well as a new port of MSDebug to NASM. This new port re-used the fixmem.pl script from the WarpLink port to NASM.
This repo contains a test case for assembling data in a segment that belongs to a group, created from the MSDebug sources. The NASM manual states that references to such data should default to using the group as a reference. But if the data is assembled using the old Microsoft Macro Assembler from the 2018 free software release of MS-DOS v2, then NASM's references to the data are by default addressed using the segment as a reference, not the group.
I used this test repo to report the problem to the NASM bugtracker.
I worked around this problem by porting debconst and debdata in the MSDebug sources first, so that they are assembled by NASM. In that case the references do work as expected.
This repo had fixmem.pl and nasm.mac added from the WarpLink repo. Both were enhanced to work with the old MASM sources of MSDebug. Some of the changes include:
db won't confuse the replacement of not, or, xor keywords that's only supposed to occur in expressions:
abs
size as an equate
db (require either colon or a blank to separate label from a db directive)
The final changes were just maintenance:
Most of these changes were picked from the SvarDOS repo.
Pick all SvarDOS changes to the drdos module to port it to JWasm. Identicalised using ident86 at every step.
Added cfg.bat and ovr.bat to select or deselect compression of the drdos and drbio modules. This allows optimising the edrpack build a little by disabling all original compression and dropping the code used to uncompress these modules.
Align the stack in biosinit.asm.
In DIR command show accumulated size of listed files.
Fix DIR /2 command if a directory entry has no date time stamp.
Add zerocomp tool that compresses a file without any framing data. Used for inicomp.
A bug was fixed thrice in linfo.eld and set.eld (1, 2) concerning the re-use of multi-purpose puts handlers. When re-using them, their downlink needs to be initialised anew to avoid the possibility of using a stale downlink (if the downlink was modified from the original extcall to puts_ext_done).
This bug was reported by a user via email.
Add support for Enhanced DR-DOS zerocomp as a compression method.
Fix a mistake in a size of the xchg instruction.
Add explanation that swapping operands for test and xchg instructions results in the same meaning of the instruction. Particularly double-register instructions may be encoded two different ways. In test an immediate operand always must come last but other than that operand order doesn't matter. (This I found out while improving ident86.)
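The two encodings of a double-register instruction can be illustrated with a small sketch. It is not taken from ident86 or the documentation mentioned above. It assumes 16-bit code and the opcode forms that take a ModRM byte (85h for test, 87h for xchg); the function and table names are made up for this example.

```python
# Register numbers of the 16-bit general purpose registers in ModRM fields.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

# Opcodes of the "r/m16, r16" forms, which are followed by a ModRM byte.
OPCODES = {"test": 0x85, "xchg": 0x87}


def encodings(mnemonic, first, second):
    """Return both register-direct encodings of a two-register instruction."""
    opcode = OPCODES[mnemonic]
    a, b = REG16[first], REG16[second]
    # mod = 11b (0xC0) selects register-direct addressing.
    one = bytes([opcode, 0xC0 | (b << 3) | a])  # first in r/m, second in reg
    two = bytes([opcode, 0xC0 | (a << 3) | b])  # second in r/m, first in reg
    return one, two


if __name__ == "__main__":
    for op in ("test", "xchg"):
        for encoding in encodings(op, "bx", "cx"):
            print(op, "bx, cx ->", encoding.hex(" "))
```

Both byte sequences returned for one instruction do the same thing, which is why a comparison tool can treat them as equivalent. Note that xchg with ax as one operand additionally has a one-byte short form that this sketch does not cover.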
In convlist.pl support hex dump continuation lines in NASM listings that do not fill the entire 40 columns of the dump area.
-o -256, a negative number to the -o switch)
xchg or test if this yields a no difference (only relevant for reg,reg encodings)
tee
Introduce fuzzy logic comparison to mark lines that differ only in the immediate numbers within a certain range. To do this, all numbers behind the instruction mnemonic are stored into a list and replaced in the text by placeholders starting with a string of Zs and continuing with a string of Ys next for a second immediate. Then the text after replacement is compared with the other instruction. If they now match, next the numbers in the lists are compared. Their delta is calculated, fed to the absolute function, and then it is checked that the delta does not exceed 32. If all of this is matched then the lines are considered to be "fuzzy same". This is only used if side by side mode is used.
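The fuzzy comparison described above can be sketched roughly as follows. This is not the ident86 code. The number syntax (decimal, 0x prefix, or h suffix), the placeholder length of four letters, and all function names are assumptions made for the illustration; only the Z then Y placeholders and the limit of 32 come from the description.

```python
import re

# Assumed number syntax: 0x-prefixed hex, hex with an "h" suffix, or decimal.
NUMBER = re.compile(r"\b(?:0x[0-9A-Fa-f]+|[0-9][0-9A-Fa-f]*h|[0-9]+)\b")
MAX_DELTA = 32


def parse_number(token):
    """Convert one matched number token to an integer."""
    token = token.lower()
    if token.startswith("0x"):
        return int(token[2:], 16)
    if token.endswith("h"):
        return int(token[:-1], 16)
    return int(token)


def split_numbers(line):
    """Replace numbers behind the mnemonic by placeholders.

    Returns the text after replacement and the list of numbers found.
    """
    mnemonic, _, operands = line.strip().partition(" ")
    numbers = []

    def replace(match):
        numbers.append(parse_number(match.group(0)))
        # First number becomes ZZZZ, the second one YYYY, and so on.
        return chr(ord("Z") - (len(numbers) - 1)) * 4

    return mnemonic + " " + NUMBER.sub(replace, operands), numbers


def fuzzy_same(left, right):
    """True if the lines differ only in numbers that are at most 32 apart."""
    left_text, left_numbers = split_numbers(left)
    right_text, right_numbers = split_numbers(right)
    if left_text != right_text:
        return False
    return all(abs(a - b) <= MAX_DELTA
               for a, b in zip(left_numbers, right_numbers))


if __name__ == "__main__":
    print(fuzzy_same("mov ax, 1234h", "mov ax, 1240h"))
    print(fuzzy_same("mov ax, 1234h", "mov ax, 1334h"))
```

Comparing the text first means that a differing register or mnemonic is never hidden by the number tolerance.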
db directives rather than trying to disassemble
db range doesn't get expanded to the trace listing hex dump length
db command dump doesn't lead to repeating data
dw and dd data items so no partial items are listed. (Changeset message incorrectly lists "db and dw".)
Add -e switch. This can be set to 0 (default, as before) or 1 or 2. If nonzero, all the hints are worded so as to specify what edits are needed in the specified file to match the other file. We'll usually use -e 2 for now.
Add -c switch. This specifies a filename to use as a cookie file. On initialisation, if the file exists then the last line is read. This line specifies a minimum offset like for the -m switch. On finding a definite difference (not no difference, nor samesame, nor fuzzy same) the offset of the current range's start is appended to the cookie file, except if the file already exists and the last line matches the new line to write already.
The idea is to automatically skip content that is likely already identicalised after finding and fixing a definite difference. I used to do this manually to lower the time needed to run ident86. And when you do something robotic and simple manually, repeatedly, why not automate it?
The idea is to work on a file with the -c switch for step by step work, then run a full comparison without -c, -m, or -M to ensure the entire file is identicalised.
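The cookie file handling of the -c switch can be sketched like this. It is not the ident86 code. The sketch assumes that offsets are stored as plain text, one per line, and the function names are made up for the illustration.

```python
import os


def read_minimum_offset(cookie_path):
    """Return the last line of the cookie file, or None if there is none."""
    if not os.path.exists(cookie_path):
        return None
    with open(cookie_path) as cookie:
        lines = cookie.read().splitlines()
    return lines[-1] if lines else None


def record_difference(cookie_path, range_start):
    """Append the start offset of the current range to the cookie file.

    Nothing is written if the last line already matches the new line.
    Returns True if a line was appended.
    """
    if read_minimum_offset(cookie_path) == range_start:
        return False
    with open(cookie_path, "a") as cookie:
        cookie.write(range_start + "\n")
    return True


if __name__ == "__main__":
    record_difference("cookie.txt", "1A00")
    print(read_minimum_offset("cookie.txt"))
```

On the next run the value returned by read_minimum_offset would be used like the argument to the -m switch, so the comparison resumes at the range that last showed a definite difference.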
The hints are fairly specific for a number of differences already. It is possible to automatically read the hints, relate them to the trace listing, relate that to the original listing file, and then try to heuristically find the corresponding spot in the original source text file. This would allow applying the edit to the source file programmatically without user intervention, then looping back to assembling and comparing the file again.