Yes.
The biggest matrix operation to expect when accelerating LLaMA involves a 4096x4096 matrix. At 4 bytes per element, that makes 4096*4096*4/(1024*1024) = 64 MB = 512 Mbit. So yeah, I miscalculated, but not in the way you wanted to point out.
I'd need 3 x 512 Mbit = 1536 Mbit (192 MB) to accelerate LLaMA.
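
For anyone who wants to check the arithmetic, here is a quick sketch. It assumes 4 bytes per element (the *4 in the formula, i.e. 32-bit values) and binary units (1 MB = 1024*1024 bytes). The function and variable names are made up for illustration.

```python
# Rough memory estimate for square matrices, using binary units.
MIB = 1024 * 1024  # bytes per MB (binary), also bits per Mbit (binary)


def matrix_bits(rows, cols, bytes_per_element=4):
    """Return the storage size of a rows x cols matrix in bits.

    bytes_per_element=4 is an assumption (32-bit elements).
    """
    return rows * cols * bytes_per_element * 8


one_matrix_bits = matrix_bits(4096, 4096)
one_matrix_mb = one_matrix_bits // 8 // MIB   # size of one matrix in MB
one_matrix_mbit = one_matrix_bits // MIB      # size of one matrix in Mbit

# Three such matrices, as stated in the post.
total_mbit = 3 * one_matrix_mbit
total_mb = 3 * one_matrix_mb

print(f"one matrix: {one_matrix_mb} MB = {one_matrix_mbit} Mbit")
print(f"three matrices: {total_mb} MB = {total_mbit} Mbit")
```

With 16-bit elements instead (bytes_per_element=2), every figure halves.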