@Matt Venn: My takeaway from doing the problematic serial shift register for configuring the GPIO on Caravel is not to trust the tools to solve any timing issues outside of the lowest level that you synthesize. My first solution for the GPIO serial register was to run the clock in and out of each individual block, so that the clock signal was always local and should always have a similar relationship to the data coming out of each block. My final solution for the GPIO serial register, which I only implemented recently, is to re-clock the last data output on the falling edge of the clock. That makes it very, very hard to have a hold violation, because the timing mismatch between blocks would have to be about half a clock cycle to cause one. That way, if you have any violation, setup or hold, between blocks, you can resolve it just by slowing down the clock. (This is a limited and cumbersome implementation of two-phase non-overlapping clocking, which is the ultimate way to avoid all timing issues if you don't care about performance.)

But note that the problems with the GPIO serial loader stem mostly from the fact that the GPIO cells are synthesized but are then connected together by an assembly process that has no timing information for them, because we don't have a tool that creates Liberty timing files for a synthesized macro block. If you are synthesizing the infrastructure for TinyTapeout in one go, then you can probably rely on the synthesis tools to handle the timing between all the scan flops, using the dual clock specification that @tnt showed above.

The pertinent question here is whether each user project has a clock, and how that clock relates to the scan clock. It looks like you are doing a 3-step process: (1) scanning in input data, (2) clocking the user DUT, and (3) scanning out output data. In that case, the only requirement on the clock driving the user projects is that it not overlap the scan clock.
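To put rough numbers on the falling-edge re-clocking argument, here is a minimal sketch of the setup/hold budget for one block-to-block transfer. It is not taken from the Caravel design: the function names, the simple delay model, the 50% duty cycle, and every value are assumptions for illustration. Skew is defined as capture-clock arrival minus launch-clock arrival, and all times are in picoseconds.

```python
# Hypothetical timing budget for one flop-to-flop hop between two blocks.
# skew_ps = (capture clock arrival) - (launch clock arrival).
# All names and numbers are illustrative assumptions, not design data.

def same_edge_slack(period_ps, clk_to_q_ps, wire_ps, skew_ps, setup_ps, hold_ps):
    """Launch and capture both happen on the rising edge."""
    arrival = clk_to_q_ps + wire_ps
    setup_slack = period_ps + skew_ps - setup_ps - arrival
    hold_slack = arrival - skew_ps - hold_ps  # no period term at all
    return setup_slack, hold_slack


def half_cycle_slack(period_ps, clk_to_q_ps, wire_ps, skew_ps, setup_ps, hold_ps):
    """Launch on the falling edge (assumed 50% duty), capture on the rising edge."""
    arrival = period_ps / 2 + clk_to_q_ps + wire_ps
    setup_slack = period_ps + skew_ps - setup_ps - arrival
    hold_slack = arrival - skew_ps - hold_ps  # now grows with the period
    return setup_slack, hold_slack


if __name__ == "__main__":
    # Capture clock arrives 600 ps late: a hold problem for same-edge transfer.
    hop = dict(clk_to_q_ps=300, wire_ps=100, skew_ps=600, setup_ps=100, hold_ps=50)
    for period_ps in (10_000, 100_000):
        print(period_ps,
              "same edge:", same_edge_slack(period_ps, **hop),
              "half cycle:", half_cycle_slack(period_ps, **hop))
```

In this model the same-edge hold slack has no dependence on the clock period, so a negative value stays negative however slowly you clock. With the half-cycle launch, both slacks contain a half-period term, which is why slowing the clock can fix either kind of violation.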