blog:pushbx:2023:0321_cpu_performance_comparison (created 2023-03-21 23:14:21 +0100 by ecm)
====== CPU performance comparison ======

''



===== The LZMA and LZSA2 depack test =====

I extracted the files ''

  * The 1 MiB model of the HP 95LX, which [[https://
  * The old box, running a single core 686 (Intel Pentium 3) at 1 GHz. This is running MS-DOS 7.10 and JemmEx v5.69, thus in Virtual 86 Mode.
  * The new box, running a quad core AMD A10-7870K at nearly 4 GHz. This is running Debian Linux, which in turn runs dosemu2 or qemu, both with KVM, and in both cases running a recent FreeDOS kernel in Real/


==== The new box ====

I uploaded [[https://

This is the lzip test:

<
-rw-rw-r-- 1 180736 Mar 8 18:21 ldebug5/
$ DEFAULT_MACHINE=qemu QEMU=./

$ DEFAULT_MACHINE=dosemu ./test.sh 1024 lz
Info: 1SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS [cut for brevity]

$ </

We can see that dosemu2 (it is ''

Next, the LZSA2 test:

<
-rw-rw-r-- 1 186368 Mar 8 18:21 ldebug5/
$ DEFAULT_MACHINE=qemu QEMU=./
2.15s for 1024 runs ( 2ms / run), method
$ DEFAULT_MACHINE=dosemu ./test.sh 1024 sa2
Info: 1SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS [cut for brevity]
2.92s for 1024 runs ( 2ms / run), method
$ </
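Both per-run figures in the output above read 2ms, although 2.92s for 1024 runs is about 2.85ms per run. That pattern is consistent with the per-run value being truncated to whole milliseconds. The Python sketch below only reproduces that arithmetic under this assumption; it is not how test.sh computes its output, and the function names are hypothetical.

```python
def per_run_ms(total_s: float, runs: int) -> float:
    """Exact average time per run, in milliseconds."""
    return total_s * 1000 / runs


def per_run_ms_truncated(total_s: float, runs: int) -> int:
    """Per-run time cut down to whole milliseconds (assumed display behaviour)."""
    return int(per_run_ms(total_s, runs))


# Totals for 1024 LZSA2 runs, taken from the output above.
for vm, total_s in (("qemu", 2.15), ("dosemu2", 2.92)):
    exact = per_run_ms(total_s, 1024)
    shown = per_run_ms_truncated(total_s, 1024)
    print(f"{vm}: {exact:.2f}ms per run, shown as {shown}ms")
```

At whole-millisecond resolution both VMs collapse to the same value, so telling them apart needs digits after the decimal point.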

The time is under 3ms on both VMs. At this point the per-run time is printed at too coarse a resolution to tell the two apart, so we can enhance the test using the ''

<
2.19s for 1024 runs (
$ DEFAULT_MACHINE=dosemu INICOMP_SPEED_SCALE=2 ./test.sh 1024 sa2
Info: 1SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS [cut for brevity]
2.94s for 1024 runs (
$ </

Writing of the variable, this ''

<
qemu-system-i386 --enable-kvm "

Finally, I also extracted lDebug'


==== The old box ====

To run the tests on the old box, I transferred the two ''

<
tdebug.com b %1
echo.|time</

The lzip test took from 20:32:59,91 to 20:33:30,61 for 256 runs. That's roughly 30s. 30_000ms per 256 runs results in 117.2ms per run. The LZSA2 test took from 20:38:18,15 to 20:38:23,75 **for 1024 runs**. That's about 5.5s. 5_500ms per 1024 runs results in 5.37ms per run.
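The elapsed times come from subtracting two outputs of the DOS TIME command, which on this system prints a comma before the hundredths of a second. A minimal Python sketch of that arithmetic, assuming both stamps fall on the same day (the helper names are hypothetical):

```python
from datetime import timedelta


def parse_dos_time(stamp: str) -> timedelta:
    """Parse 'HH:MM:SS' or 'HH:MM:SS,cc' (comma before the hundredths)."""
    hms, _, frac = stamp.partition(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    # Pad '9' to '90' so a single digit still means tenths of a second.
    centis = int(frac.ljust(2, "0")[:2]) if frac else 0
    return timedelta(hours=hours, minutes=minutes, seconds=seconds,
                     milliseconds=10 * centis)


def per_run_ms(start: str, end: str, runs: int) -> float:
    """Average milliseconds per run between two same-day TIME stamps."""
    elapsed = parse_dos_time(end) - parse_dos_time(start)
    return elapsed.total_seconds() * 1000 / runs


print(per_run_ms("20:32:59,91", "20:33:30,61", 256))   # lzip on the 686
print(per_run_ms("20:38:18,15", "20:38:23,75", 1024))  # LZSA2 on the 686
print(per_run_ms("19:03:59", "19:50:04", 16))          # lzip on the HP 95LX
```

Without rounding, the stamps give 30.70s (about 119.9ms per run) for lzip and 5.60s (about 5.47ms per run) for LZSA2 on the 686, and 46 minutes 5 seconds (about 172_800ms per run) for lzip on the HP 95LX. The figures in the text use the rounded durations (30s, 5.5s, 46 minutes), which differ from these by at most a few percent.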


==== The HP 95LX ====

To run these tests, I set up the 1 MiB model of the HP 95LX, as the tdebug test programs need in excess of 350 KiB of memory to run. As usual, I first transferred Public Domain ZModem, then used that to transfer all the other programs and scripts I needed. Without an SRAM card, the disk space on the internal RAM drive quickly filled up, so I actually transferred the lzip-compressed test program first, tested it, then deleted it. Only then did I transfer the LZSA2-compressed test program and test that.

The lzip test took a long time, as expected. It ran from 19:03:59 to 19:50:04 for 16 runs, just over 46 minutes. 46 minutes are 2760s, and 2_760_000ms per 16 runs results in 172_500ms per run. Nearly 3 minutes per run matches prior experience on another HP 95LX, which was the reason we switched the inicomp winner to the faster LZSA2.

That depacker, in turn, took from 20:02:51 to 20:04:42 for 16 runs, for a duration of 111s. 111_000ms per 16 runs results in 6_938ms per run, also matching our experience with compressed executables on the 95LX.


==== Comparison ====

For LZMA-lzip: The HP 95LX takes 1471 times as long as the 686 machine. And the 95LX takes 6388 times as long as the A10. The A10 is about 4.3 times as fast as the 686.

For LZSA2: The HP 95LX takes 1292 times as long as the 686. And the 95LX takes 2417 times as long as the A10. LZSA2 depacks only 1.87 times as fast on the A10 as it does on the 686.

For reference, the 5.37 MHz NEC V20 runs at about one 186th of the frequency of the 686, and one 740th of that of the A10. And the 686 of course runs at a quarter of the (maximum) frequency of the A10.
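The ratios above follow directly from the per-run times. The Python sketch below recomputes them from the rounded figures used in the text and sets each speedup against the clock ratio. Two assumptions: the A10 is taken as 4000 MHz (the text says nearly 4 GHz), and its LZSA2 time is the dosemu2 result of 2.94s for 1024 runs, which reproduces the 2417 factor. The A10 lzip time is not fully visible in the output above, so the 686-to-A10 lzip factor is derived from the two stated 95LX ratios.

```python
# Per-run times in milliseconds, using the rounded figures from the text.
HP95LX_MS = {"lzip": 172_500.0, "lzsa2": 6_938.0}
P3_686_MS = {"lzip": 117.2, "lzsa2": 5.37}
A10_LZSA2_MS = 2_940 / 1024  # dosemu2: 2.94s for 1024 runs

# Clock rates in MHz; 4000 for the A10 approximates "nearly 4 GHz".
MHZ = {"V20": 5.37, "686": 1000.0, "A10": 4000.0}

lzip_95lx_vs_686 = HP95LX_MS["lzip"] / P3_686_MS["lzip"]     # about 1471
lzsa2_95lx_vs_686 = HP95LX_MS["lzsa2"] / P3_686_MS["lzsa2"]  # about 1292
lzsa2_95lx_vs_a10 = HP95LX_MS["lzsa2"] / A10_LZSA2_MS        # about 2417
lzsa2_686_vs_a10 = P3_686_MS["lzsa2"] / A10_LZSA2_MS         # about 1.87
# Derived from the two stated 95LX lzip ratios (6388 and 1471).
lzip_686_vs_a10 = 6388 / 1471                                # about 4.3

comparisons = (
    ("lzip, V20 -> 686", lzip_95lx_vs_686, MHZ["686"] / MHZ["V20"]),
    ("LZSA2, V20 -> 686", lzsa2_95lx_vs_686, MHZ["686"] / MHZ["V20"]),
    ("lzip, 686 -> A10", lzip_686_vs_a10, MHZ["A10"] / MHZ["686"]),
    ("LZSA2, 686 -> A10", lzsa2_686_vs_a10, MHZ["A10"] / MHZ["686"]),
)
for name, speedup, clock_ratio in comparisons:
    print(f"{name}: {speedup:.1f}x the speed at {clock_ratio:.1f}x the clock, "
          f"{speedup / clock_ratio:.2f}x per clock")
```

A per-clock factor above 1 means the speedup exceeds the clock ratio. On these numbers that is strongly the case from the V20 to the 686 (about 7.90 for lzip and 6.94 for LZSA2), roughly the case for lzip from the 686 to the A10 (about 1.09), and not the case for LZSA2 there (about 0.47).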

Conclusion? CPU-bound tasks like depacking do greatly benefit from higher frequencies,


===== Bret Johnson'

There were three problems I ran into while using [[https://

<
mov dx, 21
in al, dx
.
r v0 := aao - (sp - 10)
s cs:100 length bxcx range ss:sp - 10 length v0
if (src == 0) then goto :error
f cs:sro length v0 90
goto :eof
:error
; Waste loop input instruction not found</

It assembles a signature code sequence on the stack, determines its length automatically,
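The script's approach (find a known instruction sequence in the loaded program and overwrite it with NOPs, byte 90h) can be sketched in Python on a byte image of the program. The signatures are hand-assembled assumptions: lDebug reads numbers as hex, so ''mov dx, 21'' is taken as ''BA 21 00'' followed by ''EC'' for ''in al, dx'', and the WBINVD instruction from the second problem encodes as ''0F 09''. This sketch patches only the first match and is not a model of exactly which match lDebug's search result variables refer to.

```python
NOP = 0x90

# Hand-assembled signatures (assumptions, see above).
WASTE_LOOP_SIG = bytes([0xBA, 0x21, 0x00,  # mov dx, 0021h
                        0xEC])             # in al, dx
WBINVD_SIG = bytes([0x0F, 0x09])           # wbinvd


def nop_out(image: bytearray, signature: bytes) -> int:
    """Overwrite the first occurrence of signature in image with NOPs.

    Returns the patched offset. Raises ValueError if the signature is
    missing, the equivalent of the scripts' ':error' branch."""
    offset = image.find(signature)
    if offset < 0:
        raise ValueError("signature not found")
    image[offset:offset + len(signature)] = bytes([NOP]) * len(signature)
    return offset


# Tiny made-up image: ret, the waste loop input sequence, ret.
program = bytearray([0xC3, 0xBA, 0x21, 0x00, 0xEC, 0xC3])
print(nop_out(program, WASTE_LOOP_SIG), program.hex(" "))  # prints: 1 c3 90 90 90 90 c3
```

Searching for the whole sequence rather than a single opcode byte makes it unlikely that unrelated code or data gets patched by accident.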

The second problem I encountered was that when running SLOWDOWN in dosemu2 KVM on the new box, it kept crashing while trying to execute a ''

<

.
r v0 := aao - (sp - 10)
s cs:100 length bxcx range ss:sp - 10 length v0
if (src == 0) then goto :error
f cs:sro length v0 90
goto :eof
:error
; WBINVD instruction not found</

The final problem was that after applying the two patches, the program was interrupted by a division overflow. The division itself is in the ''
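The faulting instruction is not shown here, so as general x86 background rather than a diagnosis of SLOWDOWN: an unsigned 16-bit DIV divides the 32-bit value in DX:AX by its operand and raises the divide error (interrupt 0) not only for a zero divisor but also when the quotient does not fit in AX, which is the "division overflow" case. A minimal Python model of that rule, with hypothetical names:

```python
class DivideError(ArithmeticError):
    """Stands in for the CPU's divide error exception (interrupt 0)."""


def div16(dx: int, ax: int, divisor: int) -> tuple[int, int]:
    """Model an unsigned 16-bit 'div r/m16': divide DX:AX by divisor.

    Returns (quotient, remainder), i.e. the new AX and DX. Like the CPU,
    it faults on a zero divisor and on a quotient above 0FFFFh."""
    if divisor == 0:
        raise DivideError("divide by zero")
    quotient, remainder = divmod((dx << 16) | ax, divisor)
    if quotient > 0xFFFF:
        raise DivideError("division overflow: quotient does not fit in AX")
    return quotient, remainder


print(div16(0x0001, 0x0000, 2))  # 10000h / 2 = 8000h, fits in AX
try:
    div16(0x0001, 0x0000, 1)     # 10000h / 1 does not fit in AX
except DivideError as err:
    print(err)
```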

My solution to this is to manually set ''

This is the command to run in order to use only the ''

<

This results in a Slowdown-Unit rating of around 620 in dosemu2 KVM on the A10, with the "MHz of an equivalent 80486" figure at about 50 MHz.

This is the command to run in order to use both patches, and continue past the overflowing division:

<

This results in a Slowdown-Unit rating, with both fixes applied, of up to 63400, and a 486 equivalent of 5350 MHz.


On the 686 box, neither the ''


On the NEC V20 we get 17 SUs, regardless of whether the ''


===== Conclusion =====

CPU-bound benchmarks run much faster on a modern machine than on older ones. The frequency increase alone does not explain the speedup: the 686 is clocked about 186 times as fast as the NEC V20, yet depacks lzip data about 1471 times as fast. Some things, like doing I/O, were not sped up nearly as much, however.

{{tag>


~~DISCUSSION~~