I read through this several times, and I think I now understand what you're doing: you have a program that executes reads and writes on your user project through the Caravel management SoC. When that code runs from the SPI flash, a single data read or write takes 250 core clock cycles, but when you copy the code into SRAM and run it from there, the same read or write takes only 25 cycles.
The main difference is still parallel vs. serial access. The SRAM delivers 32 bits in a single access, whereas the SPI flash is read one bit at a time, at a clock rate that is half the core clock rate, so yes, it's a lot slower. If you configure the SPI flash controller to run in QSPI + DDR mode, you'll get roughly an 8x speedup on the transfer (four data lines instead of one, and data on both clock edges). But it will still be slower than running the program from SRAM.
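Here's a back-of-the-envelope sketch in Python of where that 8x comes from. It assumes the flash clock runs at half the core clock, as described above, and it deliberately ignores the command, address, and dummy cycles the controller also has to clock out, so real numbers will be worse than these. The function name core_cycles_per_word is just for illustration.

```python
# Rough sketch: core clock cycles needed to shift one 32-bit word out of
# SPI flash. Assumption: command/address/dummy-cycle overhead is ignored.

WORD_BITS = 32
CORE_CYCLES_PER_SPI_CLOCK = 2  # SPI clock runs at half the core clock


def core_cycles_per_word(data_lines: int, ddr: bool) -> int:
    """Core cycles to transfer one 32-bit word over the flash interface."""
    # DDR moves data on both edges, doubling the bits per SPI clock.
    bits_per_spi_clock = data_lines * (2 if ddr else 1)
    spi_clocks = WORD_BITS // bits_per_spi_clock
    return spi_clocks * CORE_CYCLES_PER_SPI_CLOCK


modes = {
    "single SPI": core_cycles_per_word(1, ddr=False),
    "QSPI": core_cycles_per_word(4, ddr=False),
    "QSPI + DDR": core_cycles_per_word(4, ddr=True),
}

for name, cycles in modes.items():
    print(f"{name:12s} {cycles:3d} core cycles/word")

# For comparison, SRAM delivers the whole 32-bit word in one access.
```

Single SPI works out to 64 core cycles per word and QSPI + DDR to 8, which is the 8x. Even then, each instruction fetch from flash still costs several times what a single SRAM access does, which is why running from SRAM stays faster.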