2022-09-20
Yesterday I loaded a bunch of files onto the HP 95LX, intending to read on the device. I used some nondescript "PDF to text" website to extract plain text from two PDFs. Next, I attempted to convert the resulting Big Text File from the UTF-8 encoding to Code Page 850 using iconv. It repeatedly complained about invalid codepoints. I manually search-and-replaced all the smartquotes and friends (using Pluma, the text editor of the MATE Desktop Environment). After that I wrote a small scriptlet to make iconv point me to the next invalid codepoint, so that I could replace those manually. (These were mostly CJK, Russian, or Greek snippets in the text.) After having processed about one third of the Big Text File I figured out that iconv has a switch, -c, which will make it skip invalid codepoints. (I am not sure whether they are omitted or replaced by a placeholder.)
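For illustration, here is a minimal Python sketch of the same two steps: locating the characters that the target code page lacks, and converting while skipping them. It is not what I ran (that was iconv), and the function names are made up for this example. Note that it reports character indices, whereas iconv reports byte positions. Python's errors="ignore" drops the offending characters outright; errors="replace" would substitute a question mark instead.

```python
def find_unencodable(text, encoding="cp850"):
    """Return (index, character) pairs for characters the code page lacks."""
    problems = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            problems.append((index, char))
    return problems


def convert_lossy(text, encoding="cp850"):
    """Encode the text, silently dropping characters that cannot be encoded."""
    return text.encode(encoding, errors="ignore")


# Smartquotes and a Greek letter are not part of Code Page 850; the e-acute is.
sample = "\u201cCaf\u00e9\u201d \u03b1 ok"
for index, char in find_unencodable(sample):
    print(f"cannot encode U+{ord(char):04X} at character {index}")
print(convert_lossy(sample))
```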
After finishing the conversion of the Big Text File I split it into smaller files using a perl scriptlet. I had to break up the files not only for each chapter but also after every 1000 lines, so as to get result files smaller than 64 KiB. That's because I anticipated there would be a file size limit in the Memo application.
Then, it was time to transfer the files to the 95LX. This was harder than it ought to be because there were several dozen files and so far I have only mastered transferring from the Linux box connected to the serial port using the Datacomm application's XMODEM protocol, which can only transfer one file at a time and requires entering the source and destination filenames on both sides.
I figured I could zip up the files and send the archive plus a DOS build of Info-ZIP's unzip tool. That wasn't entirely wrong. However, I initially started sending the whole distribution SFX file of unzip, which took way too long. So I unpacked the SFX using dosemu2 on the Linux host, yielding an unzip.exe that comes in at less than 60 KiB. (Judging from its performance this is also a packed executable.) Even with the sunk cost fallacy, I noticed it'd be faster to cancel the transfer now and send only unzip.exe instead of waiting for the whole SFX to arrive.
I decided to pack the Bunch of Smallish Text Files into several archives, and then pack the archives into one main archive. This was because unpacking all the text files alongside the whole archive might have exceeded the capacity of the 2 MiB SRAM card.
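The nesting can be sketched in Python with the standard zipfile module. This is only an illustration of the idea, not how I built the archives: the function name is hypothetical, and storing the inner archives uncompressed in the outer one is an assumption made here because they are already compressed.

```python
import io
import zipfile


def pack_nested(groups):
    """Pack each group of files into its own archive, then wrap those in one outer archive.

    groups maps an inner archive name to a dict of {member name: bytes}.
    Returns the outer archive as bytes.
    """
    outer_buffer = io.BytesIO()
    # Assumption: the outer archive only stores, since the inner ones are compressed.
    with zipfile.ZipFile(outer_buffer, "w", zipfile.ZIP_STORED) as outer:
        for inner_name, members in groups.items():
            inner_buffer = io.BytesIO()
            with zipfile.ZipFile(inner_buffer, "w", zipfile.ZIP_DEFLATED) as inner:
                for member_name, data in members.items():
                    inner.writestr(member_name, data)
            outer.writestr(inner_name, inner_buffer.getvalue())
    return outer_buffer.getvalue()


# Hypothetical file names, two text files per inner archive.
main_archive = pack_nested({
    "part1.zip": {"fic1.txt": b"one\r\n", "fic2.txt": b"two\r\n"},
    "part2.zip": {"fic3.txt": b"three\r\n", "fic4.txt": b"four\r\n"},
})
```

On the device one would then unpack the main archive, delete it, and unpack the inner archives one at a time, deleting each after use, so that the card never has to hold the full archive and all the text files at once.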
After transferring the archive, I noticed that unzip (after complaining about the lack of a TZ environment variable) errored out noting it did not have enough memory to unpack the first file that I attempted. I had to crank up the main memory from the 386 KiB I'd previously allocated to it. (On the 95LX, the internal memory is split with a configurable ratio between the main memory and the RAM disk that is used for the C: drive. The model that I have is equipped with 512 KiB of internal memory.) I have it set to 418 KiB of main memory now.
However, the Memo application complained about some files, saying it couldn't open them because they were too large. These files were smaller than 64 KiB but exceeded 50 kB. I went and loaded them in the debugger, then saved each of them as two separate halves. I didn't bother finding a linebreak or anything to split on, instead opting to simply divide the size in bytes by 2.
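Had I split on the Linux side instead, something like the following Python sketch would have cut at linebreaks. It is not what I did (I halved the files in the debugger), the function name is made up, and the limit is a parameter because I do not know Memo's exact size limit.

```python
def split_by_size(data, limit):
    """Split bytes into chunks of at most `limit` bytes, cutting after a linebreak if possible."""
    chunks = []
    start = 0
    while len(data) - start > limit:
        # Find the last LF within the next `limit` bytes and cut right after it.
        cut = data.rfind(b"\n", start, start + limit)
        if cut == -1:
            cut = start + limit  # no linebreak in the window: cut at the limit
        else:
            cut += 1
        chunks.append(data[start:cut])
        start = cut
    if start < len(data):
        chunks.append(data[start:])
    return chunks


print(split_by_size(b"aaa\nbbb\nccc\n", 8))
```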
That got us almost all the way. However, I noticed that the linebreaks were missing in the Memo application. Loading a file in the debugger showed that it had Linux-style, LF-only linebreaks. So I had to create an implementation of the unix2dos tool. I opted for a "filter" style program, that is, one which reads from its stdin and writes to stdout (much like the b16cat tool I entered with the help of echoify). This avoids the need for parsing filenames and opening the files in the tool. My first attempt appeared to work, but it was prohibitively slow (as shown by the progress indicator that I updated on every 256th iteration). That was probably due to making two DOS syscalls for every input byte. So I created the second revision, which makes two syscalls for every 8 KiB of the input! This one is fast enough, needing only a few seconds to run to completion on the larger files.
Unfortunately I overwrote the first program with the second one, noticing only later that I had used the filenames unix2dos.com and unix2dos2.com, which DOS conflated because of its 8.3 Short File Name limit. It wasn't much to write home about, just a loop reading one byte using interrupt 21h service 3Fh on handle 0 (stdin), and writing either one or two bytes using service 40h on handle 1 (stdout). The progress indicator was written to handle 2 (stderr).
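The filename clash is easy to reproduce with a simplified Python model of the truncation. This sketch only cuts the name to 8 characters and the extension to 3; it ignores invalid characters and names with several dots, and the function name is made up for this example.

```python
def to_8_3(filename):
    """Truncate a filename to at most 8 name and 3 extension characters, uppercased."""
    base, dot, ext = filename.partition(".")
    return (base[:8] + dot + ext[:3]).upper()


for name in ("unix2dos.com", "unix2dos2.com"):
    print(name, "->", to_8_3(name))
```

Both names come out as UNIX2DOS.COM, so writing the second program replaced the first.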
Here is the second program:
&r dco6 or= 1000
&u 100 l 100
36BE:0100 81FC0088      cmp     sp, 8800
36BE:0104 7301          jae     0107
36BE:0106 C3            retn
36BE:0107 BA0020        mov     dx, 2000
36BE:010A B90020        mov     cx, 2000
36BE:010D 31DB          xor     bx, bx
36BE:010F B43F          mov     ah, 3F
36BE:0111 CD21          int     21
36BE:0113 726B          jb      0180
36BE:0115 91            xchg    ax, cx
36BE:0116 E368          jcxz    0180
36BE:0118 89D6          mov     si, dx
36BE:011A BF0040        mov     di, 4000
36BE:011D 90            nop
36BE:011E 90            nop
36BE:011F 90            nop
36BE:0120 AC            lodsb
36BE:0121 3C0A          cmp     al, 0A
36BE:0123 741B          jz      0140
36BE:0125 AA            stosb
36BE:0126 E2F8          loop    0120
36BE:0128 BA0040        mov     dx, 4000
36BE:012B 89F9          mov     cx, di
36BE:012D 29D1          sub     cx, dx
36BE:012F B440          mov     ah, 40
36BE:0131 BB0100        mov     bx, 0001
36BE:0134 CD21          int     21
36BE:0136 EBC8          jmp     0100
36BE:0138 90            nop
36BE:0139 90            nop
36BE:013A 90            nop
36BE:013B 90            nop
36BE:013C 90            nop
36BE:013D 90            nop
36BE:013E 90            nop
36BE:013F 90            nop
36BE:0140 B80D0A        mov     ax, 0A0D
36BE:0143 AB            stosw
36BE:0144 EBE0          jmp     0126
36BE:0146 90            nop
...
36BE:017F 90            nop
36BE:0180 B8004C        mov     ax, 4C00
36BE:0183 CD21          int     21
36BE:0185 90            nop
...
36BE:01FF 90            nop
&q
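For readers who don't speak x86, the same logic can be sketched in Python. It is a rough equivalent rather than a translation: it reads up to 8 KiB at a time, writes the block with every LF expanded to CR LF, and stops on an empty read. The function name and the stream parameters are choices made for this sketch.

```python
import io

BLOCK_SIZE = 0x2000  # 8 KiB, the read size used in the listing


def unix2dos(source, sink, block_size=BLOCK_SIZE):
    """Copy source to sink block by block, expanding every LF to CR LF."""
    while True:
        block = source.read(block_size)
        if not block:
            break  # end of input, like the jcxz branch to the exit code
        # One write per block read, so two calls for every block of input.
        sink.write(block.replace(b"\n", b"\r\n"))


source = io.BytesIO(b"first line\nsecond line\n")
sink = io.BytesIO()
unix2dos(source, sink)
print(sink.getvalue())
```

To use it as a filter, call unix2dos(sys.stdin.buffer, sys.stdout.buffer) after importing sys. Like the program in the listing, it does not check for an existing CR, so input that already has CR LF linebreaks would end up with CR CR LF.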
2022-09-21
Today I managed to nearly brick the 95LX when I experimented with how to set environment variables automatically in its DOS.
I created a CONFIG.SYS file on the internal memory RAM disk drive C: and failed to use the /P or /K switches to COMMAND.COM in the SHELL= directive. I did use a /C switch however. This resulted in the shell running my A:\AUTOEXEC.BAT file (as desired) but then exiting back to DOS's initial process. In this MS-DOS version (3.22) this apparently leads to an error about not finding a shell, without any input possible to retry the same or a different command.
Luckily I was able to recover the device without a factory reset that would have led to the loss of all files on the RAM disk. Relevantly, the CR2325 battery I'd ordered for the SRAM card had arrived just yesterday, and I had promptly inserted it into the card. (As per the 95LX's manual you can replace the card battery while the card is inserted in the powered 95LX so as not to lose all its data.)
There would have been two ways to fix it: either write an A:\CONFIG.SYS that takes precedence with a vanilla SHELL= directive, or modify the A:\AUTOEXEC.BAT to do something sensible, like starting an interactive shell as a child, or the $SYSMGR program which (as I learned) provides access to all of the applications, including the Memo text editor.
Now, the modern-day peripheral for accessing the 95LX-style RAM card from a Linux desktop host goes for upwards of $500, so naturally I don't have one. But I do have something else: a second HP 95LX, though in slightly worse shape. (The screen hinges are too loose and the backup battery tray is missing. And I only have the one card, which actually came with this second device.) It was easy to insert the card into this computer. Even if the files had not permitted booting with the card inserted, I have found that one can just as well insert the card at any time after booting. In any case, I took to Memo and modified the batch file to end in a line reading "command". Swapping the card back to the first 95LX I was able to boot to a usable shell and start the $SYSMGR program manually to get back to its Memo.
Subsequently I tested a few more commands and settled on an A:\CONFIG.SYS with the line SHELL=C:\COMMAND.COM /P, deleting the CONFIG.SYS file on the C: drive. That means I can boot without the card to directly load the system manager or with it to load the shell. (With this directive the shell automatically loads the A:\AUTOEXEC.BAT file without specifying it explicitly.)
Other than that I have continued reading on the first 95LX and of course went to write this post today, late in the evening.
2022-09-23
Today I added a 40-column friendly mode to the D commands (DB, DW, and DD). It will list up to 8 bytes' worth of data per line, instead of the default 16. Further, it will not list the segment on each line, so that listing in a 16-bit segment (limit < 64 KiB) will fit within the 40 columns.
The segment can be displayed by enabling the header or the trailer, which will then show the segment (or selector) instead of the "header" or "trailer" labels.
There are two options (in DCO6) which are disabled by default: Indenting the data of the odd lines (starting with an address that modulo 16 equals 8) by an additional blank, and displaying the dash in the middle of the data. The indenting was planned earlier but considered not to be aesthetically pleasing. The dash was easy to add optionally.
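To show how the columns add up, here is a rough Python model of such a narrow dump line. It is not lDebug's code and the exact spacing is an assumption of this sketch: a 4-digit offset, two blanks, up to 8 hex bytes, one blank, and the printable characters. The optional indentation of the odd lines is modelled; the dash is left out because its exact position is not spelled out here.

```python
def dump_lines(data, start=0, indent_odd=False):
    """Format bytes as offset, up to 8 hex bytes and printable ASCII per line."""
    lines = []
    for pos in range(0, len(data), 8):
        chunk = data[pos:pos + 8]
        offset = (start + pos) & 0xFFFF
        # 8 bytes take 8 * 2 digits plus 7 separating blanks = 23 columns.
        hex_part = " ".join(f"{byte:02X}" for byte in chunk).ljust(23)
        text = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        # Odd lines are those whose address modulo 16 equals 8.
        indent = " " if indent_odd and offset % 16 == 8 else ""
        lines.append(f"{offset:04X}  {indent}{hex_part} {text}")
    return lines


for line in dump_lines(b"Hello, world!\x00\x01", start=0x100):
    print(line)
```

With this layout the longest line is 4 + 2 + 1 + 23 + 1 + 8 = 39 columns, so it still fits when the odd lines are indented.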
This marks the last of the three most common commands to be modified to respect the 40-column mode option: R, U, and D. It was easier than expected, requiring neither much copied code nor much additional branching. Next, we may revisit the U command to drop the indentation of the operands to the 8th column after the mnemonic offset. This would allow more instructions to be fully readable in 40 columns. Further, the default length of the D commands may be shortened to accommodate the 40x16 screen of the 95LX.
Web service used to extract the text from a PDF:
https://tools.pdfforge.org/en/extract-text
Scriptlet used to get iconv to point to invalid codepoints:
$ tail --bytes +"$(iconv -f UTF-8 -t CP858 fic.txt -o ficcp.txt 2>&1 | sed -re 's/.*at position (.*)$/\1/g')" fic.txt | head
Scriptlet used to split the big text file into several dozen smaller files:
$ perl -e 'my $OUTNR = 1; my $LINENR = 1; my $OUTOPEN = 0; while(<<>>) { if ($OUTOPEN == 0) { open($OUTHANDLE, ">", "fic".$OUTNR.".txt"); $OUTOPEN = 1; }; print $OUTHANDLE $_; if (/qqqq Chapter/ || $LINENR > 1000) { close($OUTHANDLE); $OUTOPEN = 0; $OUTNR = $OUTNR + 1; $LINENR = 0; }; $LINENR = $LINENR + 1; }' ficcp.txt
b16cat and echoify: