Enhanced DR-DOS single-file load

2024-01-07

As I had previously mentioned on the FreeDOS FDISK bug tracker on GitHub, I went and combined the EDR-DOS kernel files to create a single-file load experience.

I'd still be interested in learning how to combine the BIO and the DOS file into a single kernel file to then combine this with my iniload and inicomp stages, changing the kernel to use the native lDOS / RxDOS.3 load protocol. This protocol is described some in the lDOS boot docs: https://pushbx.org/ecm/doc/ldosboot.htm#protocol-sector-iniload

One of the advantages to booting off of lDOS iniload is that it can be used as a number of different format kernels (unfortunately not as the current EDR-DOS DRBIO load protocol), quoth https://pushbx.org/ecm/doc/ldebug.htm#buildingprocess

The bootable executables can be used as MS-DOS 6 protocol IO.SYS, MS-DOS 7/8 IO.SYS, PC-DOS 6/7 IBMBIO.COM, FreeDOS KERNEL.SYS, RxDOS.3 RXDOS.COM, or as a Multiboot specification or Multiboot2 specification kernel. In any kernel load protocol case, the root FS that is being loaded from should be a valid FAT12, FAT16, or FAT32 file system on an unpartitioned (super)floppy diskette (unit number up to 127) or MBR-partitioned hard disk (unit number above 127). In addition, the bootable executables also are valid 86-DOS application programs that can be loaded in EXE mode either as application or as device driver. (Internally, all the .com files are MZ executables with a header, but they are named with a .COM file name extension for compatibility.)

I wrote about why to prefer a single-file kernel load in the forum, as well: https://www.bttr-software.de/forum/forum_entry.php?id=20762&page=1&category=0&order=time

I also described some possible advantages to a single-file load in the BTTR Software forum (1, 2). Not all of these are utilised by my single-file load as yet, as the build still creates two separate files. Also, each file is still created by linking a number of object files rather than by building the entire module in one big assembly.

I don't think it is required to combine the file. It is good for other reasons though.

What are the other reasons?

Several:

  • BIO and DOS file can get out of sync, potentially resulting in failures to boot or even data corruption.
  • One more file to keep track of.
  • Need a FAT FS read implementation in the BIO that is only ever used to read the DOS file. This is in bdosldr.a86 for EDR-DOS, called by the BIO init routines here.
  • The BIO kernel has to locate the DOS file, meaning it will have to include a directory scanner. (Arguably, lDOS iniload contains just as much of a FAT FS reader (to read the remainder of the kernel file) as EDR-DOS's BIO file, but lDOS iniload certainly does not need a directory scanner.)
  • Compression of kernel files also needs two bespoke solutions, one for each file, whereas lDOS iniload + inicomp has a single compression stage which depacks the entire remaining kernel. This can make for better compression ratio than compressing two files, too.

Back in the day I discussed this with Udo in the EDR-DOS forums. However, that thread is likely lost to time. I recall that Udo brought up you can update one of the files without the other, and vendors could get away with only providing a BIO file sharing a common DOS file that they wouldn't have to know much about. At the time I already noted that this last advantage is minor if the entire kernel is available as free software.

The second link that I included in my list also hints it is possible for the DOS to be resident somewhere already, and not have the BIO load it from a file. This means part of the work towards a single-file kernel may already be done.

Another advantage is you may be able to share more code between the entire kernel than if you split it into two files. And some build time calculations may be possible to optimise things that a two-file kernel cannot do. (In lRxDOS's single-assembly build, NASM can potentially calculate things that even a single-file kernel build cannot if you use a linker to link multiple object files into one executable.)

Intention

As described in the quoted text, a single-file load has some advantages:

  • File can be smaller when compressed
  • Updates can never install an incompatible set of kernel files
  • No disk read, directory scan, and file system read needed to load additional file

Implementation

drkernpl

The drkernpl stage of the lDOS boot ecosystem is a payload to lDOS's iniload. It is based on fdkernpl, the payload that passes control to an embedded FreeDOS kernel. In the case of drkernpl, two payload files are included, corresponding to the DRBIO and DRDOS modules.

Changes in the EDR-DOS repo's sources

The DRDOS module is in fact unchanged from the prior double-file load protocol. DRBIO contains some patches which concern keeping track of the position (segment address) and length of the loaded DRDOS module, as well as relocating it to the position it would be loaded to by the read_dos function.

Some additional code in DRBIO takes care of temporarily relocating the DOS module to segment 1070h first. This is only used if the compbios compression is used, as the depacker for this uses memory up to 70h:FFF0h. The compbios compression encodes consecutive zeroes using a simple run-length encoding. When compressing the entire kernel file, also enabling compbios is at best neutral but can actively increase the size of the final file.
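
The choice of segment 1070h can be checked with a little real-mode address arithmetic. The following is only a sketch: the helper name linear is made up for illustration, and it assumes plain 8086 addressing (linear address = segment * 16 + offset).

```shell
# Hypothetical helper: convert real-mode segment:offset to a linear
# address, printed in hexadecimal.
linear() {
    printf '%X\n' "$(( $1 * 16 + $2 ))"
}

linear 0x70 0xFFF0    # upper bound used by the compbios depacker: prints 106F0
linear 0x1070 0       # temporary position of the DOS module: prints 10700
```

The two addresses are 10h bytes (one paragraph) apart, so segment 1070h is the first segment that starts past 70h:FFF0h.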

Other than the additional DRDOS payload, the protocol of drkernpl to DRBIO mimics that of the original EDR-DOS load protocol: DRBIO is fully loaded at linear address 700h, entered at cs:ip = 70h:0, and passed a boot sector with (E)BPB at ds:bp == ss:bp. The boot load unit is passed in dl == bl, but can also be read from byte [ss:bp + 24h] (FAT12/FAT16) or byte [ss:bp + 40h] (FAT32). The DOS allocation is passed in ax ⇒ DOS file and si = length of DOS file in bytes. The stack is allocated up to 8 KiB in a segment behind the DOS file.
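
To illustrate where the load unit lives in the boot sector, here is a rough sketch that reads it from a boot sector image stored in a file. The function name boot_unit is hypothetical. The FAT32 test (16-bit sectors-per-FAT field at offset 16h equal to zero) is an assumption based on the usual BPB layout, not something taken from the DRBIO sources.

```shell
# Hypothetical helper: print the boot unit stored in a boot sector image.
# $1 = file holding the boot sector (must be at least 65 bytes long).
boot_unit() {
    img=$1
    # Read the two bytes of the 16-bit "sectors per FAT" field at 16h.
    # Word splitting of the od output is intended here.
    set -- $(od -An -tu1 -j 22 -N 2 "$img")
    if [ "$(( $1 + $2 ))" -eq 0 ]; then
        off=64    # 40h: unit field of a FAT32 EBPB
    else
        off=36    # 24h: unit field of a FAT12/FAT16 BPB
    fi
    od -An -tu1 -j "$off" -N 1 "$img" | tr -d ' \n'
    echo
}
```

Called as boot_unit bootsect.bin (file name made up), this prints the unit as a decimal number, for example 128 for the first hard disk.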

Code dropped from DRBIO

The compbios compression has been disabled to simplify the handling of the DRDOS module, and to avoid compressing the already-compressed stretch of data.

The entire read_dos function is dropped, including its file system login, FAT chain walking, directory scanning, and file reading. Several multiplication and division functions were also dropped. A DOS version check function remains in the bdosldr source file, and the DOS-related error message also remains as it is used by this function. The DRDOS module's filename in FCB format also got dropped.

iniload

The iniload stage is mostly unchanged from the recent revisions of lDOS boot. It can be loaded as a (first) kernel file for the following load protocols:

  • lDOS / RxDOS.3
  • FreeDOS
  • Enhanced DR-DOS
  • MS-DOS v6 / IBM-DOS
  • MS-DOS v7
  • Multiboot v1
  • Multiboot v2

The only change to iniload is for the addition of the EDR-DOS load protocol. This requires the full kernel file to be at least 32 KiB in size, to allow distinguishing EDR-DOS load from MS-DOS v6 / IBM-DOS load, all of which use an entrypoint of 70h:0.
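
A build script could guard against producing a file that is too small for this detection. The following is a minimal sketch of such a check; the function name is hypothetical and this is not taken from mak.sh. How iniload itself performs the detection at run time is not shown.

```shell
# Hypothetical build-time guard: succeed only if the kernel file $1 is
# at least 32 KiB (32768 bytes) in size.
big_enough_for_edrdos_load() {
    size=$(( $(wc -c < "$1") ))
    [ "$size" -ge 32768 ]
}
```

Both iniload-based files listed further below (80896 and 50176 bytes) would pass this check.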

(Secretly, the change also allows loading the kernel using the NTLDR, BOOTMGR, or DOS-C IPL.SYS load protocols. But these are not documented.)

inicomp

The inicomp stage allows compressing a kernel at build time and depacking it at runtime. In the lDOS boot ecosystem, inicomp is used to build a compressed kernel. For building the compressed EDR-DOS kernel, inicomp is assembled for single-mode operation, so that it can only be entered as a kernel. (When enabled, inicomp can also be entered as a DOS device driver or DOS application.)

No changes at all were made to inicomp for the EDR-DOS single-file load.

drload

The drload stage is a replacement for iniload. The advantage of drload is purely to optimise the resulting file for size. The disadvantage is that the file can only be used as a kernel for the FreeDOS or EDR-DOS load protocols.

This limitation allows dropping all disk read, FAT chain walking, and file read code from drload, as well as most file system related code. The initial, already working revision of drload was created almost exclusively by dropping lines from the copied iniload source file. What remains is mostly dedicated to setting up the stack and a few variables for the next stage.

As described in the ldosboot manual, drload uses a subset of the iniload-to-payload protocol to run its own payload. This subset suffices to run a kernel-mode-only inicomp as well as drkernpl.

Second payload executable

The single-file (uncompressed) kernel that uses iniload is named edrdos.com. To avoid the odd choice of a file named .COM without being a valid DOS application executable, this kernel includes the second payload executable option of the iniload stage. Basically, the MZ executable header is mostly valid and refers to an image separate from the kernel payload image.

In the first revision of the single-file load, this executable accepts an empty command line tail or one starting with the word "version" as a request to display the version string of the kernel. (This revision of the kernel identifies itself as the "2023 December" revision.)

Compression

Aside from the uncompressed files (edrdos.com for iniload, edrdos.sys for drload), the mak.sh script of the single-file kernel also builds two compressed files. These are named edrpack.com (for iniload) and edrpack.sys (for drload).

The files are compressed in the LZMA-lzip format. (The term "LZMA-lzip" is from the lzip version 1.21 manual; more recently the format is only called "LZMA-302eos".) From experience with lDebug, this likely results in the smallest output file of all the supported formats. However, depacking on a low-end machine (e.g. NEC V20) may take several minutes to complete.

The drload-using compressed file is smaller than the sum of the original double-file kernel files compressed using the pack101 method, supporting the idea of an advantage of the single-file load:

pack$ du drbio.sys.* drdos.sys.* --bytes --total
35575   drbio.sys.unpacked
36973   drdos.sys.unpacked
72548   total
pack$ du drbio.sys drdos.sys --bytes --total
18414   drbio.sys
27799   drdos.sys
46213   total
pack$
$ du * --bytes
80896   edrdos.com
76080   edrdos.sys
50176   edrpack.com
45680   edrpack.sys
$
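
The margin can be worked out from the listings above. A small sketch using the byte counts as printed (the variable names are made up):

```shell
# Sizes in bytes, copied from the du listings above.
two_file_packed=46213       # drbio.sys + drdos.sys, pack101
single_file_packed=45680    # edrpack.sys (drload, LZMA-lzip)
iniload_packed=50176        # edrpack.com (iniload, LZMA-lzip)

echo "$(( two_file_packed - single_file_packed ))"   # prints 533
echo "$(( iniload_packed - two_file_packed ))"       # prints 3963
```

So edrpack.sys is 533 bytes smaller than the packed two-file kernel, while edrpack.com, which carries the full iniload stage, is 3963 bytes larger.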

Of course, two files are also worse in that they use at least two directory entries and more cluster slack space in the file system. For example, with the smallest cluster size on a file system that uses 512 Bytes/sector:

pack$ echo "$(( (18414 + 511) / 512 * 512 ))"
18432
pack$ echo "$(( (27799 + 511) / 512 * 512 ))"
28160
pack$ echo "$(( 18432 + 28160 ))"
46592
pack$
$ echo "$(( (45680 + 511) / 512 * 512 ))"
46080
$
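
The same rounding can be repeated for other cluster sizes. This is a sketch only; the helper name alloc is hypothetical, and the cluster sizes are merely examples.

```shell
# Hypothetical helper: bytes allocated for a file of $1 bytes on a file
# system with $2 bytes per cluster (size rounded up to whole clusters).
alloc() {
    echo "$(( ($1 + $2 - 1) / $2 * $2 ))"
}

for cluster in 512 2048 4096; do
    two=$(( $(alloc 18414 "$cluster") + $(alloc 27799 "$cluster") ))
    one=$(alloc 45680 "$cluster")
    echo "$cluster: two files $two, single file $one"
done
```

For these particular file sizes the single file only allocates less space at 512 bytes per cluster (46080 versus 46592). At 2048 and 4096 bytes per cluster both variants come out equal (47104 and 49152 bytes respectively), though the two-file kernel still needs a second directory entry.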

Future

Dropping the early initialisation of sp to C000h (48 KiB within the stack segment) should be acceptable when compbios compression is not used.

The support for an UPX-packed DRDOS module in DRBIO can be dropped for the single-file kernel, as there is no advantage to using UPX in this way.

The dots progress display (_COUNTER) of the inicomp stage may be enabled to indicate the progress of the depacker, especially for slower machines.

Building using different inicomp compression methods may be added. The support exists within inicomp, using it is merely a question of scripting. (The canonical implementation of all of the methods is in lDebug's mak.sh script.)

Set up

Currently to use the provided mak.sh script to build the four single-file kernels, some setup is needed:

  • ldosboot repo accessible as ../ldosboot/
  • lmacros likewise
  • scanptab likewise
  • inicomp likewise
  • lzip and nasm must be found in the path

The script will default to building the entire kernel using dosemu2 and the mak.bat script. Passing the unquoted string "onlypl" as the first parameter skips the dosemu2 call.
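
The setup list above can be verified before starting a build. The following pre-flight check is a sketch and not part of mak.sh; the function name check_setup is made up, and it assumes the four repos sit next to the current directory as described.

```shell
# Hypothetical pre-flight check, meant to be run from the directory
# that holds mak.sh.
# $1 = directory containing the sibling repos (default: ..).
check_setup() {
    base=${1:-..}
    status=0
    for repo in ldosboot lmacros scanptab inicomp; do
        if [ ! -d "$base/$repo" ]; then
            echo "missing repo: $base/$repo"
            status=1
        fi
    done
    for tool in lzip nasm; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool"
            status=1
        fi
    done
    return "$status"
}
```

A possible use would be check_setup && ./mak.sh for a full build, or check_setup && ./mak.sh onlypl to skip the dosemu2 call (assuming the script is started from its own directory).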

Reliability

Currently the kernel does some things that may not be reliable:

  • Relocates sp to C000h early without relocating ss (could have been a problem before too)
  • Copies DRDOS module using forward rep movsb (Direction Flag UP) unconditionally, assumes that source and destination do not overlap
  • Assumes that the DRDOS module fits in the memory layout without checking that this is true (could have been a problem before too, in fact due to sector size the problem was probably worse before)
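
The last two points could be covered by range checks before the copy. The sketch below shows the arithmetic on linear addresses. Both function names are hypothetical, and the actual limit of the memory layout is not given here, so it is left as a parameter.

```shell
# Hypothetical check: is a forward copy (rep movsb with DF clear) of
# $3 bytes from linear address $1 to linear address $2 safe?
# A forward copy only corrupts data when the destination starts inside
# the source range above its start, because source bytes would then be
# overwritten before they are read.
forward_copy_safe() {
    [ "$(( $2 <= $1 || $2 >= $1 + $3 ))" -eq 1 ]
}

# Hypothetical check: does a block of $2 bytes placed at linear address
# $1 end at or below the limit $3?
fits_below() {
    [ "$(( $1 + $2 <= $3 ))" -eq 1 ]
}
```

Note that the first check is slightly weaker than "source and destination do not overlap": an overlapping move towards lower addresses is still safe with a forward copy. In the kernel itself the equivalent comparisons would have to be done on segment values, which this sketch does not attempt.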

On the other hand, in some ways the single-file kernel is definitely more reliable:

Use

Any one of the four kernel files suffices to load the kernel. FreeDOS SYS or EDR-DOS SYS may both be used to install the boot sector loader, using their /K switch to set the appropriate filename for the kernel. (EDR-DOS SYS is so old it does not contain loaders that support 256 sectors per cluster.) Other loaders that support either EDR-DOS or FreeDOS are usable as well.

When using the edrdos.com or edrpack.com file (iniload), lDOS's instsect may also be used, in which case the /F= switch is used to set the filename. A build of instsect (with the lDOS protocol loaders) is included in lDebug builds. Renaming either of the .COM (iniload) files to io.sys or ibmbio.com also allows loading them using boot loaders for MS-DOS v6/v7 or PC-DOS v6/v7.

Discussion

Andrew Bird, 2024-01-10 19:47:42 +0100 Jan Wed, 2024-01-10 19:54:27 +0100 Jan Wed

Hi Ecm,

I'm glad you are back to blogging! So this is an interesting write up. As I'm sure you know Dosemu2 boots DOS kernels on its created on the fly fatfs filesystem without using the native boot block. In your new single file scheme what changes to Dosemu2 are required to boot the new EDR-DOS?

Thanks,

Andrew

E. C. Masloch, 2024-01-10 20:19:09 +0100 Jan Wed

Two replies to your comment posted on this page. I think I didn't correctly reply to your comment so I don't know whether it will have notified you.

E. C. Masloch, 2024-01-10 20:20:17 +0100 Jan Wed

Ah by the way please don't start a paragraph with two blanks as you did here. I edited your post to make it show up correctly. With the blanks dokuwiki seems to interpret it as a code line (no line wrap and monospaced font).

E. C. Masloch, 2024-01-10 20:11:19 +0100 Jan Wed
I'm glad you are back to blogging!

Yeah, I didn't get to do it since early December. Still considering a longer post to list everything that I did since, in the style of the weekly updates.

So this is an interesting write up. As I'm sure you know Dosemu2 boots DOS kernels on its created on the fly fatfs filesystem without using the native boot block. In your new single file scheme what changes to Dosemu2 are required to boot the new EDR-DOS?

You can use dosemu's FreeDOS load protocol with just a different filename, or the old EDR-DOS load protocol. What I need is ss:bp → boot sector with hidden sectors and load unit, load address 600h or 700h, entered with segment adjustment : offset equal to 0:0, whole file loaded. To detect EDR-DOS load in iniload, ss:sp need to not overlap the loaded file data and must be above linear 700h too.

I haven't decided on a canonical / preferred name for now, among the four that I create in my script. edrdos.sys and edrpack.com are both not that useful because they save some file size in one way but spend it another way. So edrdos.com and edrpack.sys remain as the more useful names. I'd suggest edrdos.com because compression is not terribly useful to a system that can run dosemu2.

E. C. Masloch, 2024-01-10 20:17:52 +0100 Jan Wed
You can use dosemu's FreeDOS load protocol with just a different filename, or the old EDR-DOS load protocol.

Note, you should of course drop the second filename check / allocation for the single-file load.

Also, I noticed that you specify the sp is not used for EDR-DOS load. However, for iniload it is required. https://github.com/dosemu2/dosemu2/blob/1b8f049aa12d557768d632cc00ca9626a1ef81d0/src/base/misc/fatfs.c#L1732

Further, are there any checks that the file size doesn't overflow? I don't see any in the file load nor in the FreeDOS specific part.

Andrew Bird, 2024-01-11 15:17:03 +0100 Jan Thu
I think I didn't correctly reply to your comment so I don't know whether it will have notified you.

No, sorry I didn't get any notification.

Ah by the way please don't start a paragraph with two blanks as you did here.

That explains things, I usually indent the first paragraph as a sort of throwback to letter writing at school many years ago. After trying to get things to wrap and failing miserably I just gave up, so thanks for fixing it for me.

Further, are there any checks that the file size doesn't overflow?

In the past when we just loaded say the first four sectors or so I guess it wasn't important. Now that we load the whole file for some DOSes we probably should.

I haven't decided on a canonical / preferred name for now

I'm probably going to wait a while until things settle down, although I might run with this locally for testing as I see mentions in various commit logs regarding int21/71xx functions in EDR-DOS. I hadn't realised anybody had implemented LFN functions for DOS except DOSLFN. I'm keen to understand what int21/71a0 should return in the case that the filesystem is not VFAT. I'd been considering implementing int21/71a0 in FDPP (with callout to redirector if necessary), but at the moment it doesn't support LFN on its FAT drives. So for example, should it return flags == 0, filename length == 12, pathname length == 67, and name FAT / FAT32. On the other hand should int21/71a0 even exist if the filesystem doesn't support LFN.

E. C. Masloch, 2024-01-14 22:30:51 +0100 Jan Sun
No, sorry I didn't get any notification.

None at all?

In the past when we just loaded say the first four sectors or so I guess it wasn't important. Now that we load the whole file for some DOSes we probably should.

Yes, I agree. When I added the RxDOS.3 support I made sure to clamp size to what fits. (Unlike the FreeDOS/EDR-DOS load, lDOS/RxDOS.3 load is designed so that arbitrary data like a .ZIP archive may be appended to the kernel.) You could probably adapt that calculation but compare size for an error condition rather than to clamp it.

I'm probably going to wait a while until things settle down,

I just prepared my 2024 January revision which includes most patches from the SvarDOS EDR-DOS repo, except the ones to build with the Watcom make and wlink.

although I might run with this locally for testing as I see mentions in various commit logs regarding int21/71xx functions in EDR-DOS. I hadn't realised anybody had implemented LFN functions for DOS except DOSLFN.

As mentioned in https://hg.pushbx.org/ecm/edrdos/file/6497e3a1a0c7/doc/BUGS.TXT#l89 :

Enhanced DR-DOS 7.01.08 WIP (28.3.2009) includes partial support for several LFN functions (7143h,714E/4Fh,71A1h) which are used by COMMAND.COM or other tools, however, they do not actually support LFN functionality. If you need full LFN support, you have to load DOSLFN or a similar TSR.

The purpose of the pseudo-LFN functions is to allow working with 64-bit file sizes on FAT+ file systems (38 bits allocatable of course).

I'm keen to understand what int21/71a0 should return in the case that the filesystem is not VFAT. I'd been considering implementing int21/71a0 in FDPP (with callout to redirector if necessary), but at the moment it doesn't support LFN on its FAT drives. So for example, should it return flags == 0, filename length == 12, pathname length == 67, and name FAT / FAT32. On the other hand should int21/71a0 even exist if the filesystem doesn't support LFN.

https://fd.lod.bz/rbil/interrup/dos_kernel/2171a0.html

For EDR-DOS the function isn't supported at all. DOSLFN returns "?" or "FAT" or "FAT32" or "CDFS" as the name. If I get around to it I may try running Phantom on an MSW 95 VM (pcjs has one) and see what it says for the redirected drive.

Andrew Bird, 2024-01-15 14:01:22 +0100 Jan Mon
None at all?

No, I'm pretty sure I've never had any email from your blog. I just checked the profile held email address and it's correct. It's possible that it could be being dropped as spam as I do run spamassassin on my mx for the domain. I do think that's unlikely as I still do manage to receive an amount of spam even though.

Enhanced DR-DOS 7.01.08 WIP (28.3.2009) includes partial support for several LFN functions …

I never could get that WIP version to run properly under dosemu2, it would boot but randomly freeze at some point.

The purpose of the pseudo-LFN functions is to allow working with 64-bit file sizes on FAT+ file systems (38 bits allocatable of course).

Ahh, I get it now.

For EDR-DOS the function isn't supported at all. DOSLFN returns "?" or "FAT" or "FAT32" or "CDFS" as the name.

I think we already did some investigations here https://github.com/dosemu2/dosemu2/issues/770 and here https://github.com/dosemu2/dosemu2/issues/539 , they are pretty long threads. It seems I have a habit of raking this topic up every few years. I think this time I'm going to leave well alone and not waste any more people's time.

Thanks

Stuart Axon, 2024-01-18 00:03:15 +0100 Jan Thu, 2024-01-18 22:15:22 +0100 Jan Thu

I went down a search engine rabbit hole on DrDOS and Windows 95 not running it today (it seems they don't surface as much info as there used to be any more), anyway tried again and found this blog, great to see you are doing this work, having only seen you before DOSEMU2 github.

I can see there's a bunch to read to catch up on :)

I should really see if anyone has made any scripts to set DrDOS up on Qemu next.

[EDIT] I had unforgivably mixed up DOSEMU and Dosbox… the title of the window in Dosemu being "dos in a box" for years didn't help [/EDIT]

E. C. Masloch, 2024-01-19 22:27:51 +0100 Jan Fri

The single-file load is being discussed on the SvarDOS's EDR-DOS repo: https://github.com/SvarDOS/edrdos/issues/28

blog/pushbx/2024/0107_enhanced_dr-dos_single-file_load.txt · Last modified: 2024-01-08 09:42:28 +0100 Jan Mon by ecm