Is that 64b ICache implemented as 4 x 16b cache lines, with at least one of those lines always busy doing speculative instruction fetching? The cache line layout matters because running from flash might take 260 - 360 cycles to fill a cache line, while the processor can execute those 4 new instructions in 4 cycles with 4 cycles of latency, so it spends a lot of time starved.
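
To put a rough number on the starvation, here is a back-of-the-envelope sketch using the figures above. It assumes the worst case where the line fill and the execution do not overlap at all, and it reads "4 cycles with 4 cycles latency" as 8 useful cycles per line; both are my assumptions, and the function name is made up for illustration.

```python
def stall_fraction(fill_cycles, exec_cycles=4, latency_cycles=4):
    """Fraction of time the core waits on a cache line fill.

    Assumption: fill and execution are fully serialized (no overlap),
    so each line costs fill_cycles of waiting plus the useful cycles.
    """
    busy = exec_cycles + latency_cycles
    return fill_cycles / (fill_cycles + busy)


# The 260 - 360 cycle range quoted for a fill from flash.
for fill in (260, 360):
    print(f"fill={fill} cycles: starved about {stall_fraction(fill):.1%} of the time")
```

Under those assumptions the core sits idle roughly 97 to 98% of the time on straight-line code that misses the cache. Speculative prefetch into a free line would hide part of that, which is why it matters whether one of the lines is always tied up.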