There were hold violations in two main places, and I think neither of them applies to you. Within a block synthesized entirely by Openlane, the timing is handled properly, which explains why the processor itself works fine and doesn't have any hold violations that I'm aware of.
The first set of issues arose because we put together the top level as a hierarchy above the processor, and all of the GPIO control blocks were instantiated in the top level. The synthesis tools had no way to alter the timing between the blocks, since they were just doing a top-level routing job, and the paths between the GPIO blocks were long and subject to a lot of parasitic coupling and delay. On top of that, there was a second error: the back-annotation of extracted delays into the hierarchy failed, so Openlane wasn't reporting the timing violations that it couldn't fix (the extraction was fine, and the delays had been calculated, but they didn't get applied to the post-layout STA).
The other place we had errors was in the housekeeping unit. The SCK input to the housekeeping was not specified as a clock, so the tools didn't even try to analyze or correct the timing involving SCK (the housekeeping is set up to allow access from both the SPI and the wishbone, which made the clock setup non-trivial).
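To make the back-annotation problem concrete, here is a rough sketch of a hold check computed twice: once with the top-level wires treated as ideal, and once with extracted wire delays applied. This is only an illustration of the standard hold-slack arithmetic, not what Openlane does internally; the `HoldPath` class and every delay value are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HoldPath:
    """One flop-to-flop path. All values are in picoseconds and hypothetical."""
    launch_clock_delay: int   # clock arrival time at the launching flop
    capture_clock_delay: int  # clock arrival time at the capturing flop
    clk_to_q: int             # clock-to-output delay of the launching flop
    data_delay: int           # logic plus wire delay on the data path
    hold_time: int            # hold requirement of the capturing flop


def hold_slack(path: HoldPath) -> int:
    """Return hold slack; a negative value means a hold violation."""
    # Earliest time the new data reaches the capturing flop.
    arrival = path.launch_clock_delay + path.clk_to_q + path.data_delay
    # The old data must stay stable until this time after the same clock edge.
    required = path.capture_clock_delay + path.hold_time
    return arrival - required


# Assumed case 1: extracted delays are not applied, so the top-level
# clock and data wires look like they have no delay at all.
ideal = HoldPath(launch_clock_delay=0, capture_clock_delay=0,
                 clk_to_q=300, data_delay=100, hold_time=50)

# Assumed case 2: same path with extracted delays applied. The long
# top-level clock route makes the capture clock arrive much later.
annotated = HoldPath(launch_clock_delay=200, capture_clock_delay=900,
                     clk_to_q=300, data_delay=250, hold_time=50)

print("slack without back-annotation:", hold_slack(ideal), "ps")
print("slack with back-annotation:   ", hold_slack(annotated), "ps")
```

With these invented numbers the un-annotated analysis shows 350 ps of positive slack while the annotated one shows -200 ps, which is the kind of violation that goes unreported when the calculated delays never reach the post-layout STA.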
Once we understood the errors in setup and methodology, we finally got the tools to report errors which matched the violations we have seen on silicon.
All of these issues existed in MPW-one, but the fiasco with the clock tree synthesis was so bad that it overshadowed all the other errors.
There remain some unknowns related to any interface between blocks at the top level, which includes the wishbone interface between the processor and the user projects. That interface, in contrast to the other problems, was mitigated by adding timing margin at the interface in the setup. Whether that margin was enough remains to be seen; we'll have to get feedback from designers who made use of the wishbone interface.