CPU, OS, SD and the Assembler come together

My custom CPU (aka mCpu) is now working at a basic level on my Digilent Arty S7-50 FPGA.

This video shows the CPU up and running and doing something useful! It's an implementation of Conway’s classic Game of Life cellular automaton ‘game’, where flicking the switches runs the game.

When the CPU boots up, it has a fixed file loaded as ROM in the block SRAM at address 0. The bootloader in that block RAM knows enough to find the attached SD card, load data into memory and begin executing it. So it's easy to iterate on the program.

The good:

Has a super simple assembly syntax:

Main_Loop:

save.pc rstack, 0
br.always :DrawControllerStrings

; Read from btn array:
load.4 r9, 0xF003 ; 0xF003 is mem mapped IO for buttons

; Check for the first button
; Bit 1 is the first button
and r2, r9, 1<<1
cmp r2, r2, 0
li r1, 0
br.eq r2, :Main_DontTick

; The first button is pressed - run a normal tick
save.pc rstack, 0
br.always :TickLife

The above code saves the PC to the stack, runs ‘DrawControllerStrings’, reads the memory-mapped IO for the buttons and checks whether the first one is pressed. If it isn't, it branches to ‘Main_DontTick’; otherwise it calls ‘TickLife’.
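As a rough illustration of the button check described above, here is the same decision in Python. This is only a sketch: `read_io` is a hypothetical stand-in for the `load.4` instruction, and the only values taken from the listing are the address 0xF003 and the mask 1<<1.

```python
BUTTONS_ADDR = 0xF003        # memory-mapped button register, from the listing
FIRST_BUTTON_MASK = 1 << 1   # same mask the assembly passes to `and`


def should_tick(read_io):
    """Return True when the first button is pressed.

    read_io is a hypothetical callable (address -> value) that plays the
    role of the load.4 instruction in the assembly.
    """
    buttons = read_io(BUTTONS_ADDR)
    # Mirrors `and r2, r9, 1<<1` followed by the compare against 0.
    return (buttons & FIRST_BUTTON_MASK) != 0


if __name__ == "__main__":
    # Pretend the button register currently reads 0b0010.
    print(should_tick(lambda addr: 0b0010))
```

When `should_tick` is false the assembly takes the branch to Main_DontTick; when it is true it falls through to the TickLife call.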

CPU runs at 200 MHz (no optimisations at all have been made to increase this)

It's a 7-deep, super simple pipeline:

pc read

instruction fetch

instruction decode

register fetch

alu operation

memory operation (optional)

register/PC write

Has a simple (512 byte) BIOS/OS which can access switches & buttons, read from the memory card, handle errors and write characters to the screen.

Memory mapped IO supplies the necessary access to the outside world – the CPU itself knows nothing of attached peripherals.
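To make the idea concrete, here is a minimal Python sketch of memory-mapped IO: the CPU only ever issues loads and stores, and a bus decides whether an address belongs to RAM or to a device. The `Bus` class and its method names are hypothetical and are not taken from the mCPU design; only the button address 0xF003 comes from the listing earlier in the post.

```python
class Bus:
    """Routes loads and stores to RAM or to whichever device owns the address."""

    def __init__(self):
        self.ram = {}       # sparse RAM model: address -> value
        self.devices = {}   # address -> (read_fn, write_fn)

    def map_device(self, addr, read_fn=None, write_fn=None):
        """Attach a device register at addr (hypothetical helper)."""
        self.devices[addr] = (read_fn, write_fn)

    def load(self, addr):
        if addr in self.devices:
            read_fn, _ = self.devices[addr]
            return read_fn() if read_fn else 0
        return self.ram.get(addr, 0)

    def store(self, addr, value):
        if addr in self.devices:
            _, write_fn = self.devices[addr]
            if write_fn:
                write_fn(value)
            return  # writes to a read-only device register are dropped
        self.ram[addr] = value


if __name__ == "__main__":
    bus = Bus()
    bus.map_device(0xF003, read_fn=lambda: 0b0010)  # buttons, read-only
    bus.store(0x0100, 42)
    print(bus.load(0x0100), bus.load(0xF003))
```

The point of the design is that the CPU side never changes when a peripheral is added; only the bus mapping does.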

It's a RISC-based CPU with a load/store architecture and 15 simple instructions, which are bit/arithmetic/load/store/compare/branch.

Has only 3 instruction forms, which makes decoding super easy.

The not yet so good:

CPU pipeline is fixed and cannot yet stall, so there is no loading of DRAM data and no integer divide. Memory-mapped IO is carefully designed to have its data ready when needed.

Missing plenty of useful (but not strictly necessary) instructions.

No actual pipelining of instructions. Although there’s a 7-stage pipeline, only 1 stage is active at once. The design is structured so that overlapping the stages should be easy to add.
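The difference between "seven stages, one active at a time" and a truly overlapped pipeline can be sketched in Python. The stage names come from the list earlier in the post. The cycle counts rest on two assumptions that are not confirmed by the design: each stage takes exactly one clock, and the optional memory stage still occupies its slot when unused.

```python
STAGES = [
    "pc read",
    "instruction fetch",
    "instruction decode",
    "register fetch",
    "alu operation",
    "memory operation",
    "register/PC write",
]


def run_unpipelined(num_instructions):
    """Yield (cycle, instruction_index, stage), one active stage per cycle."""
    cycle = 0
    for instr in range(num_instructions):
        for stage in STAGES:
            yield cycle, instr, stage
            cycle += 1


def cycles_unpipelined(num_instructions):
    """Each instruction walks all stages before the next one starts."""
    return num_instructions * len(STAGES)


def cycles_if_overlapped(num_instructions):
    """Idealised overlapped pipeline: no stalls, no hazards (hypothetical)."""
    if num_instructions == 0:
        return 0
    return len(STAGES) + num_instructions - 1


if __name__ == "__main__":
    for cycle, instr, stage in run_unpipelined(2):
        print(cycle, instr, stage)
    print(cycles_unpipelined(100), cycles_if_overlapped(100))
```

Under these assumptions the overlapped version approaches one instruction per clock for long runs, which is the headroom the current design leaves on the table.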

No flags, so overflow, carry, etc. have to be computed manually in ASM. The compare instruction is very powerful, however, and makes up for some of this. I’m in two minds on whether to fix this: on one hand, having no special flags/registers is really nice and clean; on the other, it does mean doing extra work when needed.
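For readers who have not worked without a flags register, this is the kind of extra work involved, sketched in Python rather than mCPU assembly. It assumes 32-bit registers, which is a guess based on the m32 name and is not stated in the post. Both tricks are standard: an unsigned add carried out if the wrapped result is smaller than an operand, and a signed add overflowed if the operands share a sign that the result does not.

```python
MASK32 = 0xFFFFFFFF       # assumed register width: 32 bits
SIGN_BIT = 0x80000000


def add_with_carry_out(a, b):
    """Add two unsigned 32-bit values; return (wrapped_sum, carry_out)."""
    total = (a + b) & MASK32
    carry = 1 if total < a else 0   # one unsigned compare replaces the flag
    return total, carry


def signed_overflow(a, b):
    """True if a + b overflows when both are read as signed 32-bit values."""
    total = (a + b) & MASK32
    # Operands have the same sign AND the result's sign differs from it.
    return (~(a ^ b) & (a ^ total) & SIGN_BIT) != 0


if __name__ == "__main__":
    print(add_with_carry_out(0xFFFFFFFF, 1))
    print(signed_overflow(0x7FFFFFFF, 1))
```

A 64-bit add on such a machine would add the low words, derive the carry with the compare, then add it into the sum of the high words.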

No memory virtualisation

So many missing optimisations, like branch prediction, speculative execution, out-of-order execution, etc. These will come with time.

This post is mainly just a sampler of where I am right now with the mCPU part of the m32 project. As I gather the bits together I plan to do a breakdown of implementing the CPU, but that will come after I do the GPU breakdown. Colin Riley has a good starter course on designing a CPU from scratch, which is where I got the inspiration for this.
