# timing-closure
So in the clock constraints above, what I ended up doing was forcing the tool to accept negative hold times at the input and to generate positive hold times at the output. To do that, it adds delays (and it does: I see `dly` blocks being added in the input/output paths). Of course there is still no guarantee, but there is 0.4 ns of margin, so if the nets between blocks (`clk_out` / `data_out` of the previous block to `clk_in` / `data_in` of the next) have less skew between them than that, it will work.
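
To make the margin argument concrete, here is a rough sketch of the check I have in mind. The function name and the example net delays are made up for illustration; only the 0.4 ns figure comes from the constraints. It also simplifies things by treating skew as the absolute difference between the clock-net and data-net delays, whereas a real hold analysis would care about the sign.

```python
MARGIN_NS = 0.4  # hold margin from the constraints discussed above


def skew_within_margin(clk_net_delay_ns: float,
                       data_net_delay_ns: float,
                       margin_ns: float = MARGIN_NS) -> bool:
    """Return True if the clock/data net skew is strictly below the margin.

    Simplifying assumption: skew is the absolute difference between the
    two inter-block net delays, regardless of which net is slower.
    """
    skew_ns = abs(data_net_delay_ns - clk_net_delay_ns)
    return skew_ns < margin_ns


if __name__ == "__main__":
    # Hypothetical net delays in ns, not measured values.
    print(skew_within_margin(1.20, 1.45))  # skew of about 0.25 ns
    print(skew_within_margin(1.20, 1.70))  # skew of about 0.50 ns
```

In practice the two delays would come from the post-route timing report for the nets between the blocks, so this is only as good as those numbers.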