# caravel
j
I'm working on a systolic array design and I'm trying to find the most efficient way to connect it to the RISC-V processor in Caravel. I'm worried that if I use the Wishbone bus, moving data into the systolic array will be a bottleneck. Are there any tutorials available about using direct memory access in Caravel? It would be neat to have the systolic array read the data directly from the SPI flash: https://caravel-harness.readthedocs.io/en/latest/memory-mapped-io-summary.html
t
How would reading from the SPI flash be any less of a bottleneck than the Wishbone bus? How fast do you need to move data? The simplest choice that I can think of is to have a program running on the SoC that copies itself into memory and then runs a loop directly from memory, so that the SPI flash isn't being used to fetch the program. The program can then make rapid reads from the SPI flash and transfer the data directly to the user project through the Wishbone bus.
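The copy loop described above could look roughly like the following. This is a minimal sketch, not code from the Caravel repository: `stream_to_port` and `USER_DATA_PORT_ADDR` are hypothetical names, and the address assumes the user project decodes a single write port at the start of its Wishbone window, which to my knowledge begins at 0x30000000 in Caravel. Placing the function in RAM is a linker-script matter and is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the user project exposes one write-only data port at the start
 * of its Wishbone window. The exact offset depends on your own address
 * decoder, so treat this constant as a placeholder. */
#define USER_DATA_PORT_ADDR 0x30000000u

/* Push n_words 32-bit words from src into a single memory-mapped port.
 * The port pointer is volatile so the compiler emits one bus write per
 * word instead of collapsing the loop into a single store. */
void stream_to_port(volatile uint32_t *port, const uint32_t *src, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        *port = src[i];
    }
}
```

On the SoC the call would be something like `stream_to_port((volatile uint32_t *)USER_DATA_PORT_ADDR, buffer, count)`, where `buffer` points at the data to send (for example, a region of the memory-mapped flash). Taking the port as a parameter also lets the loop be exercised on a host machine against an ordinary variable.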
m
Yes, that makes a huge difference
j
Thanks for the fast reply and the point about the SPI flash also being a bottleneck. I don't have concrete numbers on how fast I need to move the data. I will try the approach that you suggested! Am I correct in thinking that you are suggesting I connect the systolic array to the Wishbone bus, and that I can then implement the rest of the approach by writing a C program for the SoC that copies itself into memory as you described?
t
I'd really need to see a block diagram of what you're proposing. Generally speaking, the existing setup of Caravel assumes that you would extend the function of the processor by putting everything in the memory-mapped space of the user project area; if you wanted to implement a systolic array as a peripheral of the SoC, then you would need to copy all data into the array through Wishbone bus transfers and read the result back the same way. There is no access back to the SoC memory from the user project, and likely not enough memory there anyway. You could stuff a bunch of SRAM into your user project and let that be an extension of the existing RAM for the SoC. You could also implement a separate SPI flash controller which would then give you access to a large amount of external memory without interfering with the SoC program; you would then have the option of using an external SPI RAM, not necessarily flash. The "best" option in some sense is to use the Openframe chip and implement the entire SoC yourself, including the systolic array, additional memory, and possibly RISC-V instruction extensions specifically for the systolic array operations. But that increases the size and complexity of the project, and it depends on how much time you want to spend doing timing closure.
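To make the peripheral option concrete, here is a rough sketch of what the processor side might look like if the array were exposed as a handful of memory-mapped registers. Everything about the register layout (`systolic_regs_t`, the start and done bits, the single input and output ports) is a hypothetical design chosen for illustration, not something Caravel defines; the real layout is whatever your Wishbone slave implements.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block for a systolic array peripheral. */
typedef struct {
    volatile uint32_t data_in;   /* write: next input word */
    volatile uint32_t control;   /* write SYSTOLIC_START to begin computing */
    volatile uint32_t status;    /* bit 0 reads as 1 once results are ready */
    volatile uint32_t data_out;  /* read: next result word */
} systolic_regs_t;

#define SYSTOLIC_START 0x1u
#define SYSTOLIC_DONE  0x1u

/* Copy inputs in over the bus, start the array, poll until it reports
 * done, then read the results back the same way. */
void systolic_run(systolic_regs_t *regs,
                  const uint32_t *in, size_t n_in,
                  uint32_t *out, size_t n_out)
{
    for (size_t i = 0; i < n_in; i++) {
        regs->data_in = in[i];
    }
    regs->control = SYSTOLIC_START;
    while ((regs->status & SYSTOLIC_DONE) == 0u) {
        /* busy-wait; an interrupt could replace this if one is wired up */
    }
    for (size_t i = 0; i < n_out; i++) {
        out[i] = regs->data_out;
    }
}
```

On hardware, `regs` would be a cast of the peripheral's base address inside the user project window. Every input and output word crosses the bus as a separate processor access, which is the data movement cost being discussed here.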
j
I like the idea of using Openframe and either implementing the entire SoC myself or using LiteX to generate most of the SoC and adding my systolic array block. Thanks for the advice!
t
In that case, my advice may have sent you on a long journey...
j
I appreciate that having LiteX generate an SoC which includes my systolic array design is a very big project, but I have used LiteX before and I think it would be a good project!