open-source-silicon.dev

With one of those soldered onto a board with the Caravel, I could at least accelerate a single layer of... GPT-2 or so...
<https://www.digikey.pt/en/products/detail/winbond-electronics/W9825G6KH-6/5001919>

That's 256 Mbit, not 256 Kbit... just saying.

Yes.
The biggest matrix operation to expect when accelerating LLaMA is a 4096x4096 matrix, which at 4 bytes per element makes 4096*4096*4/(1024*1024) = 64 MB = 512 Mbit... Yeah, I miscalculated, but differently than you wanted to point out.
I'd need 3x512 Mbit to accelerate LLaMA.
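
For anyone who wants to re-check the numbers, here is a rough sketch of the arithmetic in Python. It assumes 4 bytes per element (e.g. float32), takes the 256 Mbit chip capacity and the factor of three from the messages above, and uses helper names (`matrix_bits`, `chips_needed`) that are made up for illustration. The chip counts are just the result of dividing, not something stated in the thread.

```python
MIB = 1024 * 1024  # binary "mega", matching the 1024*1024 divisor used above


def matrix_bits(rows, cols, bytes_per_element=4):
    """Storage for a dense rows x cols matrix, in bits (assumes 4-byte elements)."""
    return rows * cols * bytes_per_element * 8


def chips_needed(total_bits, chip_bits):
    """Number of memory chips needed to hold total_bits (ceiling division)."""
    return -(-total_bits // chip_bits)


one_matrix = matrix_bits(4096, 4096)
one_matrix_mbyte = one_matrix // 8 // MIB  # size in MB
one_matrix_mbit = one_matrix // MIB        # size in Mbit

chip_bits = 256 * MIB                      # 256 Mbit part, per the thread
three_matrices = 3 * one_matrix            # the "3x512 Mbit" estimate

print(one_matrix_mbyte, "MB =", one_matrix_mbit, "Mbit per 4096x4096 matrix")
print(chips_needed(one_matrix, chip_bits), "chips for one matrix")
print(chips_needed(three_matrices, chip_bits), "chips for three matrices")
```

Under these assumptions a single 256 Mbit part holds half of one 4096x4096 matrix, so the full 3x512 Mbit would take several of them.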