====== Mid to late July work ======

**2024-07-28**

The last two weeks I improved ident86 and used it to identicalise the Enhanced DR-DOS drdos module (ported to JWasm) as well as a new port of MSDebug to NASM. This new port re-used the fixmem.pl script from the WarpLink port to NASM.

===== testwrt =====

[[https://hg.pushbx.org/ecm/testwrt/|This repo]] contains a test case for assembling data in a segment that belongs to a group, created from the MSDebug sources. [[https://www.nasm.us/xdoc/2.16.03/html/nasmdoc8.html#section-8.4.2|The NASM manual states]] that references to such data should default to use the group as a reference. But if the data is assembled using the old Microsoft Macro Assembler from the 2018 free software release of MS-DOS v2 then NASM's references to the data are by default addressed using the segment as a reference, not the group. I used this test repo [[https://bugzilla.nasm.us/show_bug.cgi?id=3392916|to report the problem to the NASM bugtracker]].

I worked around this problem by porting debconst and debdata in the MSDebug sources first, so that they are assembled by NASM. In that case the references do work as expected.

===== MSDebug =====

This repo had fixmem.pl and nasm.mac added from the WarpLink repo. Both were enhanced to work with the old MASM sources of MSDebug.
Some of the changes include:

  * [[https://hg.pushbx.org/ecm/msdebug/rev/b99e36de1f33|Replace SIZE keyword]] followed by structure name
  * [[https://hg.pushbx.org/ecm/msdebug/rev/b99e36de1f33|Replace IF and IF NOT directives]]
  * [[https://hg.pushbx.org/ecm/msdebug/rev/0932e7ad57c6|Replace = expressions]] by equates
  * [[https://hg.pushbx.org/ecm/msdebug/rev/67565bfbaf70|Ignore subttl directive]]
  * [[https://hg.pushbx.org/ecm/msdebug/rev/0bffbf42d920|Allow for differing capitalisation]] in equate names
  * [[https://hg.pushbx.org/ecm/msdebug/rev/8cbf24547ff7|Allow uncapitalised include directives]]
  * [[https://hg.pushbx.org/ecm/msdebug/rev/a0c1b4adc200|Add a group macro]]
  * [[https://hg.pushbx.org/ecm/msdebug/rev/4595874d7883|Swap two-letter text bytes]] in word immediates
  * [[https://hg.pushbx.org/ecm/msdebug/rev/34a90d237b7d|Replace operator keywords]] in calculations
  * [[https://hg.pushbx.org/ecm/msdebug/rev/fe2c066e5bab|Extend supported calculations]] to detect numeric or equate operands
  * [[https://hg.pushbx.org/ecm/msdebug/rev/417eaaedc32c|Add org macro]]
  * [[https://hg.pushbx.org/ecm/msdebug/rev/b248d9ff436d|Accept DG: keyword]] and translate it to a WRT clause
  * [[https://hg.pushbx.org/ecm/msdebug/rev/b0e8ece86da5|Accept segment override]] before BYTE PTR
  * Add the addequate function to add aliases for differently-capitalised equates
  * [[https://hg.pushbx.org/ecm/msdebug/rev/91a45df34a3e|Call addequate]] if equate occurs in an equate
  * [[https://hg.pushbx.org/ecm/msdebug/rev/a891e74dc492|Hide segment macros from trace listing file]] to avoid confusing convlist.pl
  * [[https://hg.pushbx.org/ecm/msdebug/rev/4b9f55e5f384|Add macro]] to translate ''int 3'' to ''int3''
  * Several fixes to not mix up linebreaks ([[https://hg.pushbx.org/ecm/msdebug/rev/610d83ed5f01|1]], [[https://hg.pushbx.org/ecm/msdebug/rev/aed81a5078bd|2]], [[https://hg.pushbx.org/ecm/msdebug/rev/b2a8ed6fd82c|3]], [[https://hg.pushbx.org/ecm/msdebug/rev/0f44ab8168f3|4]])
  * [[https://hg.pushbx.org/ecm/msdebug/rev/6f97af5a137d|Detect wait and default]] as disallowed equate and label names, [[https://hg.pushbx.org/ecm/msdebug/rev/093749d5e7e7|prepend dollar sign]] for NASM
  * [[https://hg.pushbx.org/ecm/msdebug/rev/a665674e02d6|Allow BYTE PTR]] in parens
  * [[https://hg.pushbx.org/ecm/msdebug/rev/181fa8c29712|Allow WRT term]] in parens
  * [[https://hg.pushbx.org/ecm/msdebug/rev/8749f18ab7c9|Allow push or pop memory operand without a size]]
  * [[https://hg.pushbx.org/ecm/msdebug/rev/05ddd1ce5ce9|Detect dot followed by a label]]
  * [[https://hg.pushbx.org/ecm/msdebug/rev/bac281c1cf7d|Parse single operand]] with brackets but lacking size
  * [[https://hg.pushbx.org/ecm/msdebug/rev/e061ca0ab917|Call addequates more]]
  * [[https://hg.pushbx.org/ecm/msdebug/rev/92aaa96ed6e9|Fix mixup of all-caps equates or labels]]
  * [[https://hg.pushbx.org/ecm/msdebug/rev/78faed199e02|Make sure rerunning on the same file]] is allowed (sometimes passing the same file twice is useful)
  * [[https://hg.pushbx.org/ecm/msdebug/rev/20fb6cdb56df|Fix so left shift operator in an operand]] isn't misdetected as a structure instance
  * [[https://hg.pushbx.org/ecm/msdebug/rev/e899e4fc5453|Fix operating on the same file twice in a row]] (this used to use an inferior file change detection based on the name, leading to an infinite loop)
  * [[https://hg.pushbx.org/ecm/msdebug/rev/c741eb742f04|Delete size keywords]] for ''lds'' and ''les''
  * [[https://hg.pushbx.org/ecm/msdebug/rev/22adf16901e5|Fix]] so labels starting with ''db'' won't confuse the replacement of ''not'', ''or'', ''xor'' keywords that's only supposed to occur in expressions
  * [[https://hg.pushbx.org/ecm/msdebug/rev/a522fe463ad6|Accept BYTE PTR without brackets]] before a plain number, which does not indicate memory access
  * [[https://hg.pushbx.org/ecm/msdebug/rev/fb22340f2a9e|Use EXTRN directive]] with '':abs'' size as an equate
  * [[https://hg.pushbx.org/ecm/msdebug/rev/2c62a567f2e1|Fix misdetection of labels]] ending in ''db'' (require either colon or a blank to separate label from a ''db'' directive)

The final changes were just maintenance:

  * [[https://hg.pushbx.org/ecm/msdebug/rev/eb467c5f4386|Replace exe2bin by x2b2]]
  * [[https://hg.pushbx.org/ecm/msdebug/rev/f634856d708c|Build with NASM files]] instead of original
  * Drop obsolete files
  * [[https://hg.pushbx.org/ecm/msdebug/rev/d4712512c63e|Bump release number]]

===== Enhanced DR-DOS =====

Most of these changes were picked from the SvarDOS repo.

Pick all SvarDOS changes to the drdos module to port it to JWasm. Identicalised using ident86 at every step.

[[https://hg.pushbx.org/ecm/edrdos/rev/ded5f8218ef5|Added cfg.bat]] and ovr.bat [[https://hg.pushbx.org/ecm/edrdos/rev/fc2f68e0a75e|to select or deselect compression]] of the drdos and drbio modules. This allows optimising the edrpack build a little by disabling all original compression and dropping the code used to uncompress these modules.

[[https://hg.pushbx.org/ecm/edrdos/rev/9be23eccb2a4|Align the stack]] in biosinit.asm.

In DIR command [[https://hg.pushbx.org/ecm/edrdos/rev/13db4a578693|show accumulated size of listed files]].

[[https://hg.pushbx.org/ecm/edrdos/rev/c3a7e99c10bb|Fix DIR /2 command]] if a directory entry has no date time stamp.

[[https://hg.pushbx.org/ecm/edrdos/rev/988a9dd9add1|Add zerocomp tool]] that compresses a file without any framing data. Used for inicomp.

===== lDebug =====

A bug was fixed thrice [[https://hg.pushbx.org/ecm/ldebug/rev/0f0c1508125a|in linfo.eld]] and set.eld ([[https://hg.pushbx.org/ecm/ldebug/rev/ffa5d1e484b7|1]], [[https://hg.pushbx.org/ecm/ldebug/rev/02ff795f4316|2]]) concerning the re-use of multi-purpose puts handlers. When re-using them, their downlink needs to be initialised anew to avoid the possibility of using a stale downlink (if the downlink was modified from the original extcall to puts_ext_done). This bug was reported by a user via email.
===== inicomp =====

[[https://hg.pushbx.org/ecm/inicomp/rev/ef69f7eef438|Add support for Enhanced DR-DOS zerocomp]] as a compression method.

===== Instruction reference =====

[[https://hg.pushbx.org/ecm/insref/rev/8f65a575aae5|Fix a mistake in a size]] of the ''xchg'' instruction.

[[https://hg.pushbx.org/ecm/insref/rev/454f45e20c28|Add explanation that swapping operands]] for ''test'' and ''xchg'' instructions results in the same meaning of the instruction. Particularly double-register instructions may be encoded two different ways. In ''test'' an immediate operand always must come last but other than that operand order doesn't matter. (This I found out while improving ident86.)

===== TracList / tractest =====

In convlist.pl [[https://hg.pushbx.org/ecm/tractest/rev/13f709bea29a|support hex dump continuation lines in NASM listings]] that do not fill the entire 40 columns of the dump area.

===== Ident86 =====

  * In side by side mode [[https://hg.pushbx.org/ecm/ident86/rev/2ac299d92277|do same or samesame replacement in the next line]] after re-sync
  * [[https://hg.pushbx.org/ecm/ident86/rev/190202f27f4e|Display error if empty range selected]]
  * For -M switch [[https://hg.pushbx.org/ecm/ident86/rev/2723150d34d0|accept leading plus sign]] to add to -m number
  * [[https://hg.pushbx.org/ecm/ident86/rev/7286a3e9be0f|Display counters of differences detected]]
  * [[https://hg.pushbx.org/ecm/ident86/rev/03b904bdf01b|Allow to re-sync side by side view]] with multiple one-sided lines
  * [[https://hg.pushbx.org/ecm/ident86/rev/b364d20befbe|Add -o switch]] to change .tls offset (MSDebug needs ''-o -256'', a negative number to the -o switch)
  * [[https://hg.pushbx.org/ecm/ident86/rev/458108bdb2aa|Allow to swap operands]] of ''xchg'' or ''test'' if this yields a no difference (only relevant for reg,reg encodings)
  * [[https://hg.pushbx.org/ecm/ident86/rev/6a7d8ff080ca|Delete size keyword]] within brackets
  * [[https://hg.pushbx.org/ecm/ident86/rev/c060e484ec4d|Expand a16 byte displacements]] to unsigned words
  * [[https://hg.pushbx.org/ecm/ident86/rev/a924b56abf30|Delete a16 displacement]] equal to zero
  * [[https://hg.pushbx.org/ecm/ident86/rev/d80477609150|Allow segment override]] for a16 edits
  * [[https://hg.pushbx.org/ecm/ident86/rev/f6f2662bdcdc|Put hg revision hash of ident86 script]]
  * [[https://hg.pushbx.org/ecm/ident86/rev/bce9947fab1e|Add -v and -V switches]] (show only version and quit, or do not show version)
  * [[https://hg.pushbx.org/ecm/ident86/rev/ef0642732628|Enable line buffering]] for use with ''tee''
  * [[https://hg.pushbx.org/ecm/ident86/rev/adb3cc488634|Allow -v to work without filenames]]
  * [[https://hg.pushbx.org/ecm/ident86/rev/bcf2c9a4f11c|Drop empty initial line]] from disassembly result (did no harm but would have had to be handled specifically later)
  * [[https://hg.pushbx.org/ecm/ident86/rev/f6064e2c44a6|Repeat disassembly]] if malformed line detected (previously would just raise on an assert, aborting the run)
  * [[https://hg.pushbx.org/ecm/ident86/rev/ae1143d1792f|Limit repetition of disassembly]]

[[https://hg.pushbx.org/ecm/ident86/rev/26f78be5d5d3|Introduce fuzzy logic comparison]] to mark lines that differ only in the immediate numbers within a certain range. To do this, all numbers behind the instruction mnemonic are stored into a list and replaced in the text by placeholders starting with a string of Zs and continuing with a string of Ys next for a second immediate. Then the text after replacement is compared with the other instruction. If they now match, next the numbers in the lists are compared. Their delta is calculated, fed to the absolute function, and then it is checked that the delta does not exceed 32. If all of this is matched then the lines are considered to be "fuzzy same". This is only used if side by side mode is used.

  * [[https://hg.pushbx.org/ecm/ident86/rev/34ace3671441|Add display of earliest difference]] (first line that is not no difference, nor samesame, nor fuzzy same). Requires side by side mode
  * [[https://hg.pushbx.org/ecm/ident86/rev/34ace3671441|-d switch]] to limit disassembly after earliest difference
  * [[https://hg.pushbx.org/ecm/ident86/rev/a1ac327cd29d|List earliest difference address]] at the very end
  * [[https://hg.pushbx.org/ecm/ident86/rev/06386c072240|Extract function swapoperands]], and then [[https://hg.pushbx.org/ecm/ident86/rev/499c5098d9d4|call it during side by side view]]
  * [[https://hg.pushbx.org/ecm/ident86/rev/9658c2345e1a|Extract function markearliest]], call it also [[https://hg.pushbx.org/ecm/ident86/rev/9afcd983db57|when re-syncing side by side view]]
  * Display [[https://hg.pushbx.org/ecm/ident86/rev/8ae14e13e650|a hint of how many NOPs still expected]] but missing
  * [[https://hg.pushbx.org/ecm/ident86/rev/4cc40957a248|Record the seek offset]] of the best matching line in the trace listing (.tls) file
  * When .tls source indicates that source used define data directives, [[https://hg.pushbx.org/ecm/ident86/rev/9d9919a6af55|display the data using the same directive]] rather than trying to disassemble
  * Add support for WarpLink extended (/mx) .map files to detect alignment bytes and dump them using ''db'' directives [[https://hg.pushbx.org/ecm/ident86/rev/ec9c2c1d9a6f|rather than trying to disassemble]]
  * Fix so that a shorter ''db'' range [[https://hg.pushbx.org/ecm/ident86/rev/3b04b6cf8fd4|doesn't get expanded to the trace listing hex dump length]]
  * [[https://hg.pushbx.org/ecm/ident86/rev/3b04b6cf8fd4|Extract function handlenops]] and [[https://hg.pushbx.org/ecm/ident86/rev/7d577626ddf4|call it during side by side view]]
  * Fix to [[https://hg.pushbx.org/ecm/ident86/rev/8b1cdaf59722|correctly clear replace variable]] so a multi-line ''db'' command dump doesn't lead to repeating data
  * [[https://hg.pushbx.org/ecm/ident86/rev/bc27c3d93ef2|Use fuzzy comparison]] in detectnops
  * [[https://hg.pushbx.org/ecm/ident86/rev/c464b10bc57a|Note which file is missing a NOP]] in the detectnops hint
  * [[https://hg.pushbx.org/ecm/ident86/rev/e8f31341541e|Add a hint if one file contains a NOP]] and the other contains another instruction at this address
  * Do not repeat disassembly on [[https://hg.pushbx.org/ecm/ident86/rev/f2f4be749791|obviously bad match]], which is a D command dump with Xs in a data element. This is a stopgap solution to ease debugging. It doesn't detect if an element other than the first starts with Xs.
  * [[https://hg.pushbx.org/ecm/ident86/rev/8404b242c26b|Round up ranges]] of ''dw'' and ''dd'' data items so no partial items are listed. (Changeset message incorrectly lists "db and dw".)
  * [[https://hg.pushbx.org/ecm/ident86/rev/332545a81cdc|Give a hint if a segment override mismatch is detected]] in a different line

==== Edit which file hints ====

[[https://hg.pushbx.org/ecm/ident86/rev/ba4cca03e7fb|Add -e switch]]. This can be set to 0 (default, as before) or 1 or 2. If nonzero, all the hints are worded so as to specify what edits are needed in the specified file to match the other file. We'll usually use ''-e 2'' for now.

==== Cookies ====

[[https://hg.pushbx.org/ecm/ident86/rev/86703d36fa68|Add -c switch]]. This specifies a filename to use as a cookie file. On initialisation, if the file exists then the last line is read. This line specifies a minimum offset like for the -m switch. On finding a definite difference (not no difference, nor samesame, nor fuzzy same) the offset of the current range's start is appended to the cookie file, except if the file already exists and the last line matches the new line to write already.

The idea is to automatically skip content that is likely already identicalised after finding and fixing a definite difference. I used to do this manually to lower the time needed to run ident86. And when you do something robotic and simple manually, repeatedly, why not automate it?
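
The cookie file handling described above could look roughly like the following. This is only a minimal Python sketch and not the actual ident86 code: the function names ''read_cookie'' and ''record_difference'' are hypothetical, and storing each offset as one hexadecimal number per line is an assumption about the file format.

```python
import os


def _last_line(path):
    """Return the last non-empty line of the file, or None if absent/empty."""
    if not os.path.exists(path):
        return None
    with open(path, "r", encoding="ascii") as handle:
        lines = [line.strip() for line in handle if line.strip()]
    return lines[-1] if lines else None


def read_cookie(path):
    """On initialisation: read the minimum offset from the last line.

    Returns None when there is no cookie yet, meaning "start from the top".
    Assumption: offsets are stored as bare hexadecimal numbers.
    """
    last = _last_line(path)
    return int(last, 16) if last is not None else None


def record_difference(path, range_start):
    """On a definite difference: append the start offset of the current range.

    Nothing is written if the last line already matches the new line.
    Returns True if a line was appended, False otherwise.
    """
    new_line = format(range_start, "x")
    if _last_line(path) == new_line:
        return False
    with open(path, "a", encoding="ascii") as handle:
        handle.write(new_line + "\n")
    return True
```
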
The idea is to work on a file with the -c switch for step by step work, then run a full comparison without -c, -m, or -M to ensure the entire file is identicalised.

==== The future ====

The hints are fairly specific for a number of differences already. It is possible to automatically read the hints, relate them to the trace listing, relate that to the original listing file, and then try to heuristically find the corresponding spot in the original source text file. This would allow applying the edit to the source file programmatically, without user intervention, and then looping back to assembling and comparing the file again.

{{tag>nasm msdebug edrdos ldebug inicomp insref traclist ident86}}

~~DISCUSSION~~