Thursday, January 2, 2020

My new Z80 emulator

So five years ago, I started speculating about how emulators could be improved and came up with several ideas.  One of them was to emulate the clock pin on CPUs (and other devices such as flip-flops or the PIA6821) for maximum accuracy.  I speculated that this would cause a major performance hit, but that multi-threading might be able to offset it.
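
To illustrate what "emulating the clock pin" means for a simple device, here is a minimal sketch of a positive-edge-triggered D flip-flop (roughly one half of a 74LS74, ignoring preset/clear and propagation delay).  The class and method names are hypothetical and are not taken from Daphne.  The point is that the device only reacts to a transition on its CLK pin, not to a "run one step" call.

```python
class DFlipFlop:
    """Hypothetical pin-level model of one positive-edge-triggered D flip-flop."""

    def __init__(self):
        self.d = 0      # level on the D input pin
        self.q = 0      # level on the Q output pin
        self._clk = 0   # last level seen on the CLK pin

    def set_clock(self, level):
        # Only a low-to-high transition on CLK copies D into Q.
        # Holding CLK high or low, or a falling edge, leaves Q unchanged.
        if level and not self._clk:
            self.q = self.d
        self._clk = level

    @property
    def q_bar(self):
        # The inverted output pin.
        return 1 - self.q
```

A caller drives the pins directly, for example by setting `ff.d = 1` and then calling `ff.set_clock(0)` followed by `ff.set_clock(1)` to produce a rising edge.  That is exactly why this approach is expensive: every device has to be visited on every clock transition.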

Well, I've been working on this secretly for several months now.  I've emulated almost every TTL chip on the Dragon's Lair logic board as a standalone device and tied them all together.  Performance at the moment is barely adequate on modern beefy hardware (75% CPU usage on a single thread on an AMD Ryzen) and not adequate on a Raspberry Pi 4.

I experimented with multi-threading to speed this up, but the critical timing and the need to keep the whole system running in lock-step led me to conclude that this must run on a single thread.  I'll keep looking at ways to use other threads to offload things like video and sound processing, which I think will work fine.
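
One common way to keep many clocked devices in lock-step on a single thread is a two-phase update: on each clock edge, every device first samples its inputs, and only then do all devices update their outputs.  The sketch below is an assumption about how such a loop could look, not a description of how my emulator does it; `ShiftStage`, `LockStepSystem`, and `build_shift_register` are hypothetical names.  It chains stages into a small shift register, which is the classic case where evaluation order would otherwise change the result.

```python
class ShiftStage:
    """Hypothetical edge-triggered stage (like a D flip-flop) updated in two phases."""

    def __init__(self, d_source):
        self.d_source = d_source  # callable returning the level on this stage's D pin
        self.q = 0
        self._next_q = 0

    def sample(self):
        # Phase 1: read the input pin; no output changes yet.
        self._next_q = self.d_source()

    def commit(self):
        # Phase 2: publish the new output level.
        self.q = self._next_q


class LockStepSystem:
    """Single-threaded driver: every device sees the same clock edge together."""

    def __init__(self):
        self.devices = []

    def rising_edge(self):
        for dev in self.devices:
            dev.sample()
        for dev in self.devices:
            dev.commit()


def build_shift_register(length):
    """Chain `length` stages; stage 0 reads a test-bench input pin."""
    system = LockStepSystem()
    serial_in = {"level": 0}
    stages = []
    source = lambda: serial_in["level"]
    for _ in range(length):
        stage = ShiftStage(source)
        stages.append(stage)
        system.devices.append(stage)
        source = lambda s=stage: s.q  # the next stage's D pin is this stage's Q pin
    return system, stages, serial_in
```

Because every stage samples before any stage commits, the order of `system.devices` does not matter.  With a single-phase update evaluated front to back, a 1 entering stage 0 would ripple through every stage on a single clock edge, which real hardware never does.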

So the good news is that I believe I'll be able to design the "perfect" Dragon's Lair emulator, and it will be able to run at full speed on modern CPUs (such as the aforementioned Ryzen) and possibly even on older CPUs like the Intel i5.  When I say perfect, I mean that every single pin on every single IC in the system (including the Z80) will be emulated and will have the exact timing of the original hardware.  Where a device is digital, there will be no shortcuts.  (Analog devices, such as the sound chip, will not be held to the same standard of perfection.)  I'm pretty excited about this because I sometimes study/repair original game board sets, and not being able to study how the hardware is supposed to work via emulation has been quite inconvenient.

The bad news is that Dragon's Lair is the simplest of the laserdisc games, so a game like Star Rider (which has three 6809E CPUs and a ton of TTL chips) probably won't be able to run at full speed even on the newest hardware using this approach.  I'll be looking at thoughtful optimizations/compromises for these scenarios since, obviously, with enough compromises any of these old games could be made to run at full speed.

Here's a brief description of how the Z80's pins behave during an arbitrary instruction:


Here's every single pin of a Z80 captured via my logic analyzer.  When I embarked on the journey to emulate this beast, I thought, "This is insane.  It will take so long to figure this stuff out."  But... this is the kind of emulator I want Daphne to be, so I did it anyway.  Now it's starting to make sense.

I had to do two separate captures because my logic analyzer can only capture 32 pins at a time (which is a ton!) and the Z80 has 40 pins.  That alone almost made me give up.

I've put colored lines to show how this instruction (LD (HL), 00h) works.

The first section ("M1") loads the instruction's opcode from the ROM program.  The Z80 sets the address bus to 1153h and lowers the RD and MREQ lines, and the EPROM responds by putting 36h on the data bus.  The Z80 then raises RD/MREQ, lowers MREQ and RFSH (refresh), and puts 0005h on the address bus; dynamic RAM uses this value to refresh itself (I think Dragon's Lair ignores it because it uses static RAM).  All of this takes 4 clock cycles and is known in Z80 speak as "M1".

In the next section ("M2"), the Z80 needs to read the rest of the instruction, so it sets the address to 1154h and lowers RD/MREQ again, and the EPROM puts 00h on the data bus.  M2 is complete, and the Z80 now has the full instruction.

Then comes the last section ("M3").  HL happens to hold A000h, so the Z80 sets the address bus to A000h and lowers MREQ, but instead of lowering RD, it puts 00h on the data bus and lowers WR, which tells the rest of the system, "Hey, I want to write 00h to address A000h."  The rest of the system's hardware maps that address to the RAM, and the RAM wakes up, grabs the byte from the data bus, and stores it.  M2 and M3 take 3 clock cycles each, so the whole instruction takes 10.
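
The same walkthrough can be summarized in code.  This is a minimal sketch at machine-cycle granularity, not a pin-by-pin, half-clock model: splitting M1 into a T1-T2 fetch part and a T3-T4 refresh part is a simplification (MREQ and RD actually go low partway through T1, and the opcode is sampled at the start of T3).  `BusCycle` and `ld_hl_n` are hypothetical names, and a plain dict stands in for the ROM and RAM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BusCycle:
    name: str             # label used in the text ("M1", "M2", "M3")
    t_states: int         # clock cycles spent in this part of the instruction
    address: int          # value on the address bus A0-A15
    data: Optional[int]   # value on the data bus D0-D7 (None during refresh)
    low_pins: tuple       # active-low control pins pulled low


def ld_hl_n(memory, pc, hl, i_reg, r_reg):
    """Hypothetical bus-cycle summary of LD (HL),n (opcode 36h).

    Returns the list of bus cycles and the address of the next instruction.
    """
    if memory[pc] != 0x36:
        raise ValueError("this sketch only models LD (HL),n")
    n = memory[(pc + 1) & 0xFFFF]
    cycles = [
        # M1, T1-T2: opcode fetch; the M1 pin is also low here.
        BusCycle("M1 fetch", 2, pc, 0x36, ("M1", "MREQ", "RD")),
        # M1, T3-T4: refresh; I register on A8-A15, R register on A0-A7.
        BusCycle("M1 refresh", 2, (i_reg << 8) | r_reg, None, ("MREQ", "RFSH")),
        # M2: ordinary memory read of the immediate operand.
        BusCycle("M2", 3, (pc + 1) & 0xFFFF, n, ("MREQ", "RD")),
        # M3: memory write; the CPU drives the data bus and pulls WR low.
        BusCycle("M3", 3, hl, n, ("MREQ", "WR")),
    ]
    memory[hl] = n  # the RAM mapped at HL latches the byte
    return cycles, (pc + 2) & 0xFFFF
```

Calling `ld_hl_n({0x1153: 0x36, 0x1154: 0x00}, 0x1153, 0xA000, 0x00, 0x05)` reproduces the capture: a fetch from 1153h, refresh address 0005h, a read from 1154h, and a write of 00h to A000h, for 10 clock cycles in total.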

The amazing thing is... how did they design the Z80 in 1976 or whenever?