Reading through this entire thread reminds me of a discussion years ago on an Amiga-related forum. Many people ran speed tests with a very simple program; most got acceptable results, but a very few saw unusually high timing peaks in the test.
The issue finally turned out to be related to the processor's instruction cache, or more precisely to wrong MMU table setups, which marked some memory areas as "not cacheable". Whenever a program happened to be loaded into such a non-cacheable memory region (and hence executed from there), it ran extremely slowly, because the processor didn't cache/prefetch the program's instructions but had to make a memory access to fetch each next instruction. This was most obvious in tight loops, which would usually fit into the instruction cache entirely, so that no further memory accesses (for instructions, not data) would be required until the loop exits.
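For anyone who wants to check for such timing peaks, here is a rough sketch of that kind of speed test: it times the same tight loop many times and lists the runs that took far longer than the median. The function names, the run counts and the "3x the median" threshold are my own assumptions, not what was used back then. Also note that in Python the interpreter overhead dominates, so this only shows the measuring approach; it cannot isolate an instruction cache effect the way a native loop would.

```python
import statistics
import time


def tight_loop(iterations):
    """A small loop body; in native code this would fit in the instruction cache."""
    total = 0
    for i in range(iterations):
        total += i
    return total


def time_runs(runs=200, iterations=20_000):
    """Time the same tight loop repeatedly; return the durations in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        tight_loop(iterations)
        samples.append(time.perf_counter() - start)
    return samples


def find_peaks(samples, factor=3.0):
    """Return (run index, duration) for runs slower than factor * median.

    The factor of 3 is an arbitrary assumption; adjust it to taste.
    """
    median = statistics.median(samples)
    return [(i, s) for i, s in enumerate(samples) if s > factor * median]


if __name__ == "__main__":
    samples = time_runs()
    print(f"median run time: {statistics.median(samples) * 1e3:.3f} ms")
    for index, duration in find_peaks(samples):
        print(f"run {index}: {duration * 1e3:.3f} ms (peak)")
```

A few isolated peaks are normal on a multitasking system. A machine where every run is slow, or where peaks show up constantly, is the more suspicious case.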
Long story short, my best guess is a non-optimal BIOS/firmware configuration (cache-related settings) on those slow machines.
Another thought of my own: what about faulty or defective hardware, anything that (by error) throws lots of interrupt requests, which the processor must handle with high priority compared to the normal priority that programs usually run with?
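One way to test the interrupt idea, assuming the slow machine can boot Linux (my assumption, nobody in the thread said so), is to read /proc/interrupts twice and compare the counters. The sketch below is only a rough helper with names of my own choosing: it sums the per-CPU counts for each interrupt line and reports how much each one grew during the interval.

```python
import time


def parse_interrupt_totals(text):
    """Sum the per-CPU counters of each line in /proc/interrupts-style text."""
    totals = {}
    for line in text.splitlines()[1:]:  # first line is the CPU header
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        count = 0
        for token in rest.split():
            if not token.isdigit():
                break  # counters end where the controller/device names begin
            count += int(token)
        totals[label.strip()] = count
    return totals


def interrupt_deltas(before, after):
    """Return only the interrupt lines whose total count increased."""
    deltas = {}
    for irq, count in after.items():
        grown = count - before.get(irq, 0)
        if grown > 0:
            deltas[irq] = grown
    return deltas


def sample_interrupts(interval=5.0, path="/proc/interrupts"):
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as handle:
        before = parse_interrupt_totals(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_interrupt_totals(handle.read())
    return interrupt_deltas(before, after)


if __name__ == "__main__":
    deltas = sample_interrupts()
    for irq, grown in sorted(deltas.items(), key=lambda item: -item[1]):
        print(f"{irq:>6}: +{grown}")
```

A line that grows by a huge amount while the machine is idle would point towards a device (or driver) flooding the processor with interrupt requests.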