(1) Have you tried just running the flash in full quad mode? See verilog/dv/caravel/mgmt_soc/qspi/qspi.c in the caravel repository. This is by far the easiest option to set up, provided the throughput is high enough for you.

(2) The picorv32 repository has an example that copies code from flash to SRAM and then jumps to SRAM to execute it, using the "flashio_worker" routine in start.s. See picosoc/firmware.c in that repository: for example, set_flash_qspi_flag calls the C routine flashio(), which copies flashio_worker from flash to SRAM and then jumps to the copy in SRAM.
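
For option (2), the copy-then-jump pattern can be sketched roughly as below. This is a minimal illustration, not the code from firmware.c: the helper copy_words, the RUN_ON_TARGET build flag, and run_worker_from_sram are hypothetical names, and the symbol names flashio_worker_begin / flashio_worker_end and the three-argument worker signature are my assumptions about what start.s exports, so check them against the repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the 32-bit words in [begin, end) to dst and return the word count.
 * On the target, begin/end would bracket the routine in flash and dst
 * would be a buffer in SRAM. */
static size_t copy_words(const uint32_t *begin, const uint32_t *end,
                         uint32_t *dst)
{
    assert(begin <= end);
    size_t n = 0;
    while (begin != end) {
        dst[n++] = *begin++;
    }
    return n;
}

#ifdef RUN_ON_TARGET /* hypothetical build flag, not from either repository */
/* Assumed labels bracketing the worker routine in start.s. */
extern uint32_t flashio_worker_begin;
extern uint32_t flashio_worker_end;

/* Assumed signature of the worker routine. */
typedef void (*worker_fn)(uint8_t *data, uint32_t len, uint32_t wrencmd);

void run_worker_from_sram(uint8_t *data, uint32_t len, uint32_t wrencmd)
{
    /* The stack lives in SRAM, so this buffer holds the SRAM copy. */
    uint32_t buf[&flashio_worker_end - &flashio_worker_begin];
    copy_words(&flashio_worker_begin, &flashio_worker_end, buf);

    /* Jump to the copy; the flash is free to be reconfigured meanwhile. */
    ((worker_fn)buf)(data, len, wrencmd);
}
#endif
```

The copied routine has to be position-independent (or at least not depend on its original flash address), since it runs from wherever the buffer happens to land. The copy helper builds and runs on a host as-is; the part under RUN_ON_TARGET only makes sense when linked against the real start.s.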