My custom CPU (aka mCpu) is now working at a basic level on my Digilent Arty S7-50 FPGA.
This video shows the CPU up and running and doing something useful! It's an implementation of Conway’s classic Game of Life cellular automaton ‘game’, where flicking the switches runs the game.
As the CPU boots up, it has a fixed file loaded into ROM in the Block SRAM at address 0. The bootloader in the RAM knows enough to find the attached SD card, load data into memory and begin executing it. So it's easy to iterate on the program.
The good:
Has a super simple assembly syntax:
Main_Loop:
save.pc rstack, 0
br.always :DrawControllerStrings
; Read from btn array:
load.4 r9, 0xF003 ; 0xF003 is mem mapped IO for buttons
; Check for the first button
; 1'st bit is the first button
and r2, r9, 1<<1
cmp r2, r2, 0
li r1, 0
br.eq r2, :Main_DontTick
; The first button is pressed - run a normal tick
save.pc rstack, 0
br.always :TickLife
The above code saves the PC to the stack, runs ‘DrawControllerStrings’, reads the memory mapped IO for the buttons and checks whether the first one is pressed. If it isn't, it branches to ‘Main_DontTick’; otherwise it calls ‘TickLife’.
CPU runs at 200 MHz (no optimisations at all performed to increase this)
It's a 7-deep, super simple pipeline:
pc read
instruction fetch
instruction decode
register fetch
alu operation
memory operation (optional)
register/PC write
Has a simple (512 byte) BIOS/OS which can access switches & buttons, read from the memory card, handle errors and write characters to the screen.
Memory mapped IO supplies the necessary access to the outside world – the CPU itself knows nothing of attached peripherals.
It's a RISC-based CPU with a load/store architecture and 15 simple instructions, which are bit/arithmetic/load/store/compare/branch.
Has only 3 instruction forms, which makes decoding super easy.
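To make the memory mapped IO point concrete, here is a minimal Python sketch of the idea, not the actual mCpu hardware. The `Bus` class, `map_device` and `first_button_pressed` are hypothetical names invented for illustration; only the 0xF003 button address and the `1<<1` mask come from the assembly listing above. The sketch also simplifies `load.4` to a single-value read.

```python
# Hypothetical model of memory mapped IO. Only BUTTONS_ADDR and the
# 1 << 1 mask are taken from the assembly listing; the rest is assumed.
BUTTONS_ADDR = 0xF003


class Bus:
    """Routes loads either to plain RAM or to a mapped device."""

    def __init__(self, ram_size=0xF000):
        self.ram = bytearray(ram_size)
        self.devices = {}  # address -> function returning the current value

    def map_device(self, addr, read_fn):
        # Attach a peripheral at an address; the CPU side never sees this.
        self.devices[addr] = read_fn

    def load(self, addr):
        # The CPU just issues a load; the bus decides who answers it.
        if addr in self.devices:
            return self.devices[addr]()
        return self.ram[addr]


def first_button_pressed(bus):
    # Mirrors: load.4 r9, 0xF003 / and r2, r9, 1<<1 / compare against 0
    value = bus.load(BUTTONS_ADDR)
    return (value & (1 << 1)) != 0
```

The design point is that `load` is the only interface the CPU needs: swapping the buttons for another peripheral changes the bus mapping, not the instruction set.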
The not yet so good:
CPU pipeline is fixed and cannot yet stall. So no loading of DRAM data or integer divides. Memory mapped IO is carefully designed to have its data ready when needed.
Missing plenty of useful (but not strictly necessary) instructions.
No actual pipelining of instructions. Although there’s a 7 stage pipeline, only 1 stage is active at once. But the design should make this easy to add.
No flags, so overflow, carry, etc have to be performed manually in ASM. The compare instruction is very powerful however and makes up for some of this. I’m in two minds on whether to fix this – on one hand, having no special flags/registers is really nice and clean. On the other, it does mean doing extra work when needed.
No memory virtualisation
So many missing optimisations like branch prediction, speculative execution, out-of-order execution (OOE), etc. These will come with time.
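As an illustration of the "no flags" point above, here is a rough Python sketch of how carry and signed overflow can be recovered using only add, compare and bitwise operations. It assumes 32-bit registers, which the post does not state outright, and the function names are made up for this example. It models the arithmetic, not actual mCpu assembly.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers are 32 bits wide


def add_with_carry_out(a, b):
    """Unsigned add; the carry is recovered with a single compare."""
    total = (a + b) & MASK32
    # If the sum wrapped around, it ends up smaller than either operand.
    carry = 1 if total < a else 0
    return total, carry


def signed_add_overflows(a, b):
    """True if a + b overflows when both are read as signed 32-bit values."""
    total = (a + b) & MASK32
    # Overflow means both operands have a sign bit that differs from the sum's.
    return ((a ^ total) & (b ^ total)) >> 31 == 1


def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit add from two 32-bit halves, carrying manually between them."""
    lo, carry = add_with_carry_out(a_lo, b_lo)
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi
```

This is the "extra work" trade-off in miniature: each wide add costs an extra compare and add, but no hidden flag state has to be tracked between instructions.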
This post is mainly just a sampler of where I am right now with the mCPU part of the m32 project. As I gather the bits together I plan to do a breakdown of implementing the CPU, but that will come after I do the GPU breakdown. Colin Riley has a good starter course on designing a CPU from scratch, which is where I got the inspiration for this.