GPU running at 150fps!

The mGPU is now working at a basic level on my Digilent Arty A7-100 FPGA.

Frontend vertex processing and backend pixel rasterizing, with per-pixel perspective correction and z-cull, all work.
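For anyone unfamiliar with the two per-pixel steps mentioned above, here is a minimal software sketch of the usual technique. It is not the mGPU's actual logic (which is hardware and presumably fixed point); the function names, the floating-point maths and the "smaller z is nearer" convention are all assumptions made for illustration.

```python
def perspective_correct(bary, inv_w, attr):
    """Interpolate one per-vertex attribute with perspective correction.

    bary  : screen-space barycentric weights of the pixel (sum to 1)
    inv_w : 1/w for each of the three vertices after projection
    attr  : attribute value (e.g. texture u) at each vertex
    """
    # attr/w and 1/w are both linear in screen space, so interpolate
    # those linearly and divide once per pixel.
    num = sum(b * a * iw for b, a, iw in zip(bary, attr, inv_w))
    den = sum(b * iw for b, iw in zip(bary, inv_w))
    return num / den


def z_test(depth_buffer, x, y, z):
    """Depth test: keep the fragment only if it is nearer than what is stored.

    Assumes smaller z means nearer (a convention chosen for this sketch).
    """
    if z < depth_buffer[y][x]:
        depth_buffer[y][x] = z
        return True
    return False
```

With equal w at all three vertices this reduces to plain linear interpolation; the further away a vertex is (smaller 1/w), the less it contributes at a given screen position.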

It's not yet integrated with the CPU, so currently it runs a bunch of pre-calculated command buffers loaded from an SD card.

Current Setup:

The good:

It's a tile-based renderer that renders 64×64 pixels at a time and runs at 300 MHz on the Arty A7-100. It has 1 frontend vertex processor and 2 backend pixel pipelines. Currently each pixel pipeline can output 1 pixel per clock (so 2 total), with a latency of ~50 clocks. The number of pixels per clock is limited by the number of DSPs on the chip. I can easily make the pipelines 4 wide or add more of them if I can free up DSPs.
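As a back-of-the-envelope check on those figures, the sketch below works out the theoretical peak fill rate and how many 64×64 tiles a screen needs. The clock, pipeline count and tile size come from the numbers above; the 640×480 resolution in the usage example is purely hypothetical, and real throughput will be lower than this peak once memory stalls and partially covered tiles are counted.

```python
CLOCK_HZ = 300_000_000       # pixel pipeline clock
PIPELINES = 2                # backend pixel pipelines
PIXELS_PER_CLOCK = 1         # per pipeline
TILE = 64                    # tile edge in pixels


def peak_fill_rate(clock_hz=CLOCK_HZ, pipelines=PIPELINES, ppc=PIXELS_PER_CLOCK):
    """Theoretical peak pixels per second, ignoring stalls and latency."""
    return clock_hz * pipelines * ppc


def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def tiles_for(width, height, tile=TILE):
    """Number of tiles needed to cover a width x height framebuffer."""
    return ceil_div(width, tile) * ceil_div(height, tile)


# Peak of 600 Mpixels/s, i.e. a budget of 4 Mpixels per frame at 150 fps.
pixels_per_frame_budget = peak_fill_rate() // 150

# Hypothetical 640x480 framebuffer: 10 x 8 = 80 tiles.
tiles = tiles_for(640, 480)
```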

The GPU and display code use triple-buffered frames. One frame is displayed on screen whilst the GPU renders to the two others, swapping as it goes. When a frame is done displaying, the display code swaps to the most recent GPU render, and it continues like that.
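The swap scheme described above can be modelled in a few lines. This is only a software model of the bookkeeping, assuming three buffers indexed 0 to 2; the class and method names are made up for the sketch and say nothing about how the hardware signals are actually wired.

```python
class TripleBuffer:
    """Tracks which of three frame buffers is displayed, rendered to, or ready."""

    def __init__(self):
        self.display = 0    # buffer currently being scanned out
        self.render = 1     # buffer the GPU is drawing into
        self.ready = None   # most recently completed frame, if any

    def gpu_frame_done(self):
        """GPU finished a frame: mark it ready and move to the spare buffer."""
        self.ready = self.render
        # The GPU never touches the displayed buffer, so it takes whichever
        # buffer is neither on screen nor the newest completed frame.
        self.render = next(i for i in range(3)
                           if i not in (self.display, self.ready))

    def vsync(self):
        """Display finished a frame: switch to the newest render, if there is one."""
        if self.ready is not None:
            self.display = self.ready
            self.ready = None
```

If the GPU finishes several frames within one display refresh, older completed frames are simply overwritten and only the newest one is shown, which is why the render rate can exceed the display rate.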

The above demo runs at about 150fps, and uses 26% DSPs, 15% LUTs, 9% FFs and 21% BRAM.

The bad:

No integration with mCPU yet.

Currently the texture samplers (fixed at 64×64 pixels, 5-6-5 RGB format, with no mipmapping, multisampling or filtering) are loaded into fixed SRAM, so they can’t dynamically fetch more data during the pixel pipeline. This needs a lot of work: first fetching data on demand, probably using one of the S3TC texture formats, and then I can consider multisampling/mipmapping/filtering.
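To make the current sampler behaviour concrete, here is a rough software equivalent of an unfiltered lookup into a 64×64 5-6-5 texture. It assumes the common bit layout (red in the top 5 bits, blue in the bottom 5), a flat row-major texel list and wrap-around addressing; none of those details are confirmed above, and the function names are invented for the sketch.

```python
import math

TEX_SIZE = 64  # fixed texture edge in texels


def rgb565_to_rgb888(texel):
    """Expand a 16-bit 5-6-5 texel to 8 bits per channel."""
    r5 = (texel >> 11) & 0x1F
    g6 = (texel >> 5) & 0x3F
    b5 = texel & 0x1F
    # Replicate the high bits into the low bits so full scale maps to 255.
    r = (r5 << 3) | (r5 >> 2)
    g = (g6 << 2) | (g6 >> 4)
    b = (b5 << 3) | (b5 >> 2)
    return r, g, b


def sample_nearest(texture, u, v):
    """Nearest-texel lookup with wrap addressing; no filtering, no mipmaps."""
    x = math.floor(u * TEX_SIZE) % TEX_SIZE
    y = math.floor(v * TEX_SIZE) % TEX_SIZE
    return rgb565_to_rgb888(texture[y * TEX_SIZE + x])
```

A full texture in this format is 64 × 64 × 2 bytes = 8 KB, which is why it fits comfortably in on-chip memory.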

Also, DRAM access is currently running at half speed: I tried too hard to handle FWFT FIFO queues and ended up badly limiting throughput. It’s running 128 bits wide at 83 MHz but only processing every other clock. This is the first thing I’ll fix, as the above demo is pretty memory-bandwidth starved, and adding sampler data on demand will not happen until I’ve fixed that.
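Putting rough numbers on that: the sketch below computes peak bandwidth from the bus width and clock quoted above, with a duty factor of 0.5 standing in for "every other clock". It is just arithmetic on the stated figures and ignores refresh, row activation and other DRAM overheads, so real numbers will be somewhat lower.

```python
def dram_bandwidth(bus_bits=128, clock_hz=83_000_000, duty=0.5):
    """Bytes per second moved when only a `duty` fraction of clocks carry data."""
    return (bus_bits // 8) * clock_hz * duty


current = dram_bandwidth()            # every other clock: 664 MB/s
fixed = dram_bandwidth(duty=1.0)      # every clock: 1328 MB/s

# Memory traffic available per frame at 150 fps, in bytes (about 4.4 MB today).
per_frame_now = current / 150
```

For scale, a single 64×64 tile of 16-bit colour is 8 KB, so both colour writes and any future on-demand texture fetches have to share that per-frame budget.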

Published by mikevine_wp
