@htamas: On every chip I have tested, only two conditions occur between any pair of GPIO configuration blocks, which you correctly identified as: (1) an independent hold violation, and (2) a dependent hold violation on falling edges. I have never observed the other two (no violation, or a dependent hold violation on rising edges). Seeing only one kind of dependent hold violation makes sense: the violation occurs in the equivalent spot for every GPIO configuration block, so the digital circuits generating the data and clock signals are the same cell type. The absolute delays may vary by wiring path, but the relative delays between rising and falling edges will be the same for every instance.
The violations occur in two long shift registers running up the sides of the chip to access all the GPIO pads. The shift registers work by holding the configuration values in the housekeeping block inside the processor, then copying them to a position near the GPIO pin using the shift register. So the GPIO configuration is two long serial bitstreams of 13 * 19 = 247 bits each: 13 bits per GPIO times 19 GPIOs per side.
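To make the stream layout concrete, here is a minimal sketch that packs one side's configuration words into a single bitstream. The function name `build_stream`, the MSB-first bit order within each word, and the order in which the GPIOs are concatenated are all assumptions made for illustration; only the 13-bit word size and the 19 GPIOs per side come from the description above.

```python
BITS_PER_GPIO = 13   # from the description above
GPIOS_PER_SIDE = 19  # from the description above

def build_stream(configs):
    """Concatenate 19 configuration words into one bit string.

    Bit order (MSB first) and GPIO order are assumptions for
    illustration, not a statement about the real hardware.
    """
    if len(configs) != GPIOS_PER_SIDE:
        raise ValueError("expected one configuration word per GPIO")
    for word in configs:
        if not 0 <= word < (1 << BITS_PER_GPIO):
            raise ValueError("configuration word does not fit in 13 bits")
    return "".join(format(word, "013b") for word in configs)

stream = build_stream([0] * GPIOS_PER_SIDE)
print(len(stream))  # 247
```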
When an independent hold violation occurs, the bits "slip" forward by one position in the chain. Effectively, that means that the first of each set of 13 configuration bits gets lost. To compensate for that, the initial configurations can be shifted so that the other 12 configuration bits end up in the right position. The words written to the configuration hold registers in the housekeeping module are then shifted versions of what was supposed to be in those registers, and the data cross over into neighboring configuration registers.
When a dependent hold violation occurs, the bits slip forward by one position, but only if there is a 1->0 transition. Otherwise it works as intended. The net effect is that when the bit stream passes through the point where there is a dependent hold violation, the last "1" bit in any run of 1s gets stripped off. So if you start with the bit stream "11100", after it goes through one dependent hold violation it becomes "11000", and after it goes through two dependent hold violations, it becomes "10000". That condition can be compensated for exactly by adding one additional "1" bit to the end of a run of 1s for every dependent hold violation that bit will pass through in the shift register before it reaches its final destination.
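The effect and its compensation can be sketched in a few lines of Python. This is a simplified model, not the actual calibration code: the function names `dependent_hold` and `precompensate` are made up for illustration, the stream is treated as a short string written left to right as in the examples above, and the whole string is assumed to pass through the same number of violations (on a real chain, each bit passes through a different number depending on its destination).

```python
def dependent_hold(bits: str) -> str:
    """Model one dependent hold violation: the last 1 of every run of 1s
    (a 1 immediately followed by a 0) is turned into a 0."""
    out = list(bits)
    for i in range(len(bits) - 1):
        if bits[i] == "1" and bits[i + 1] == "0":  # 1->0 transition
            out[i] = "0"
    return "".join(out)

def precompensate(bits: str, n: int) -> str:
    """Extend every run of 1s by n extra 1s, one per violation
    the stream is expected to pass through."""
    for _ in range(n):
        out = list(bits)
        for i in range(1, len(bits)):
            if bits[i] == "0" and bits[i - 1] == "1":  # first 0 after a run
                out[i] = "1"
        bits = "".join(out)
    return bits

stream = "11100"
print(dependent_hold(stream))                  # 11000
print(dependent_hold(dependent_hold(stream)))  # 10000

# Pre-compensating "11000" for two violations, then passing through two:
padded = precompensate("11000", 2)
print(padded)                                  # 11110
print(dependent_hold(dependent_hold(padded)))  # 11000
```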
So there are two issues that crop up:
(1) For every independent hold violation, a bit gets lost: The high bit of one configuration register (which determines the output mode) is always equal to the low bit of the next configuration register (which determines if the GPIO is management-controlled or user project-controlled). The trick here is figuring out how each GPIO can be configured so that those two bits can be the same value. Sometimes that forces an output to be configured in a weak pull-up or weak pull-down mode, but otherwise it is usually possible to find a set of configurations that works.
(2) The dependent hold violations can be corrected for exactly only up to a point: if you have to keep adding "1" bits, you eventually run out of bits that you can add. If you have a run "10001", then you can correct for two dependent hold violations by making the run "11101", but correcting for three dependent hold violations is impossible because the bitstream would be "11111", and suddenly there is no longer a falling edge transition to trigger a hold violation. Fortunately, there are plenty of valid configurations that have long runs of zeros, but if the chip has more than, I think, ten dependent hold violations on either side, then it becomes impossible to correct the ones at the end. Not only do you lose the ability to correct the GPIOs at the end of the chain, but you also lose the ability to figure out what the hold violations are past that point, so those GPIOs become unusable. That's when GPIOs get marked "H_UNKNOWN". If your user project is making use of a pin that got marked "H_UNKNOWN", then that chip is going to be untestable.
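Both issues can be illustrated with a small sketch. Everything here is a simplified model under stated assumptions, not the real calibration flow. The names `shared_bit_conflicts`, `extend_runs`, and `max_correctable` are hypothetical, configuration words are taken to be 13-bit integers with bit 12 as the high bit and bit 0 as the low bit, and the example words are arbitrary values rather than real GPIO modes.

```python
def shared_bit_conflicts(configs, slipped):
    """Issue (1): for each index i in `slipped` (an independent hold
    violation assumed between word i and word i+1), report i if the high
    bit of word i differs from the low bit of word i+1, since the two
    are forced to share one value."""
    return [i for i in sorted(slipped)
            if i + 1 < len(configs)
            and ((configs[i] >> 12) & 1) != (configs[i + 1] & 1)]

def dependent_hold(bits):
    """One dependent hold violation: a 1 followed by a 0 becomes 0."""
    return "".join("0" if b == "1" and bits[i + 1:i + 2] == "0" else b
                   for i, b in enumerate(bits))

def extend_runs(bits):
    """Add one extra 1 to the end of every run of 1s."""
    return "".join("1" if b == "0" and i > 0 and bits[i - 1] == "1" else b
                   for i, b in enumerate(bits))

def max_correctable(target):
    """Issue (2): largest n for which padding `target` for n violations
    and then passing it through n violations recovers `target`.
    Capped at len(target); patterns with no falling edge hit the cap."""
    padded, n = target, 0
    while n < len(target):
        padded = extend_runs(padded)
        received = padded
        for _ in range(n + 1):
            received = dependent_hold(received)
        if received != target:
            break
        n += 1
    return n

print(shared_bit_conflicts([0x1000, 0x0001], {0}))  # [] (both shared bits are 1)
print(shared_bit_conflicts([0x1000, 0x0000], {0}))  # [0] (1 versus 0)
print(max_correctable("10001"))                     # 2
```

In this model the number of correctable violations is one less than the length of the run of zeros that follows a run of 1s, which is why configurations with long runs of zeros tolerate more violations.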