< htamas> A relatively brief explanation of hold violations open-source-silicon.dev #mpw-3-silicon

Tim Edwards

07/20/2023, 9:11 PM

@htamas: A relatively brief explanation of hold violations and what I mean by "independent" vs. "dependent" hold violations:

A typical simple digital design runs on a single clock, and that clock is distributed all around the design layout, and the clock can be assumed in the best case to arrive everywhere at the same time. Generally speaking, hold violations tend to occur where that assumption fails and clocks get skewed relative to each other at various points around the system.

In the typical hold violation case, two registers are in series so that the output of one register is captured at the other. The registers are supposed to be clocked at the same time, and any change in the output of the first register will be ready to be latched into the 2nd register on the following clock cycle. However, if there is a large enough delay between the arrival of the clock at the 1st register vs. the arrival of the clock at the 2nd register, then a change in the output of the 1st register can be out the door and ready at the input of the 2nd register before the clock edge from the same clock cycle arrives there, and the wrong value gets latched into the 2nd register. You can find lots of descriptions of this kind of hold violation from a quick web search.

What you usually won't find is the following detail: Delays between cells are calculated differently for rising edges and falling edges. Because rising and falling edges are generated by different transistors, they can have very different characteristics, leading to rather different delays. Since a hold violation requires that clock arrival times get skewed by some delay amount, it is quite possible for a hold violation to occur on a rising data edge and not to occur on a falling data edge, or vice versa. That means that whether or not a hold violation occurs is data dependent, so I call that effect a "dependent hold violation". If the hold violation happens regardless of whether the data edge is rising or falling, I call that an "independent hold violation".

What is remarkable (and annoying) to me is that the hold violations between all of the GPIO blocks managed to end up right at the edge between being an independent or a dependent hold violation, such that every single GPIO channel (except 0 and 37, which are on the ends) of every single chip has about a 50/50 random chance of being one or the other, and every single chip is different, and there's no way to predict it; it can only be determined by brute-force testing.
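As a rough illustration of the distinction above, here is a minimal Python sketch that classifies one register-to-register path from four timing numbers. The `PathTiming` fields, the function names, and the simple comparison (data arrival vs. clock skew plus hold time) are assumptions made for illustration; they are not taken from any timing tool or from the Caravel sources.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Hypothetical timing numbers (in ns) for one launch/capture register pair."""
    clock_skew: float  # capture clock arrival minus launch clock arrival
    hold_time: float   # hold requirement of the capturing register
    rise_delay: float  # launch clock edge to a rising data edge at the capture input
    fall_delay: float  # launch clock edge to a falling data edge at the capture input


def violates_hold(data_delay: float, t: PathTiming) -> bool:
    # The new data must not reach the capture register until hold_time
    # after the (skewed) capture clock edge of the same cycle.
    return data_delay < t.clock_skew + t.hold_time


def classify(t: PathTiming) -> str:
    rising = violates_hold(t.rise_delay, t)
    falling = violates_hold(t.fall_delay, t)
    if rising and falling:
        return "independent"        # fails whichever way the data changes
    if falling:
        return "dependent-falling"  # fails only on a 1->0 data change
    if rising:
        return "dependent-rising"   # fails only on a 0->1 data change
    return "none"


# Falling data edge is faster than the rising one and just loses the race.
print(classify(PathTiming(clock_skew=0.5, hold_time=0.1,
                          rise_delay=0.7, fall_delay=0.55)))
```

With these made-up numbers only the falling edge arrives too early, so the path is classified as a dependent violation; shrinking `rise_delay` below 0.6 would turn it into an independent one.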

👍 2

htamas

07/21/2023, 1:45 AM

@Tim Edwards Ok, it's getting clearer. I understand hold violations in general as I had to work with them while hardening my designs. The tricky part is the dependency. Based on your explanation it would seem that there are independent violations plus two different kinds of dependent violations, one for rising edges and one for falling edges. Do I get it right that only one of the two occurs in practice and a dependent violation always means, say, a hold violation for falling edges? How does the Nucleo firmware work around independent and dependent hold violations? I can see writing some magic numbers into reg_mprj_xfer with appropriate delays, but I don't feel any wiser. Also, what happens at the chip level when the characterization returns H_UNKNOWN?

Matt Venn

07/21/2023, 1:38 PM

Great reply Tim, think I finally understand it now!

Tim Edwards

07/21/2023, 1:48 PM

@htamas: On every chip I have tested, there are only two conditions that occur between any pair of GPIO configuration blocks, which you correctly identified as: (1) an independent hold violation, and (2) a dependent hold violation on falling edges. I have never observed the other two (no violation, or dependent hold on rising edges). That I only see one kind of dependent hold violation makes sense, since the hold violation occurs in the equivalent spot for every GPIO configuration block, so the digital circuits generating the data and clock signals are the same cell type, and while the absolute delays may vary by wiring path, the relative delays between rising and falling edges will be the same for every instance.

The violations occur in two long shift registers running up the sides of the chip to access all the GPIO pads. The shift registers work by holding the configuration values in the housekeeping block inside the processor, then copying them to a position near the GPIO pin using the shift register. So the GPIO configuration is two long serial bitstreams of (13 * 19 = 247) bits each, 13 bits per GPIO times 19 GPIOs per side.

When an independent hold violation occurs, the bits "slip" forward by one position in the chain. Effectively, that means that the first of each set of 13 configuration bits gets lost. To compensate for that, the initial configurations can be shifted so that the other 12 configuration bits end up in the right position. The words written to the configuration hold registers in the housekeeping module are then shifted versions of what was supposed to be in those registers, and the data cross over into neighboring configuration registers.

When a dependent hold violation occurs, the bits slip forward by one position, but only if there is a 1->0 transition. Otherwise it works as intended. The net effect of that is that when the bit stream passes through the point where there is a dependent hold violation, the last "1" bit in any run of 1s gets stripped off. So if you start with the bit stream "11100", after it goes through one dependent hold violation it becomes "11000", and after it goes through two dependent hold violations, it becomes "10000". That condition can be compensated for exactly by adding additional "1" bits to the end of a run of 1s for every dependent hold violation that bit will pass through in the shift register before it reaches its final destination.

So there are two issues that crop up:

(1) For every independent hold violation, a bit gets lost: The high bit of one configuration register (which determines the output mode) is always equal to the low bit of the next configuration register (which determines if the GPIO is management-controlled or user project-controlled). The trick here is figuring out how each GPIO can be configured so that those two bits can be the same value. Sometimes that forces an output to be configured in a weak pull-up or weak pull-down mode, but otherwise it is usually possible to find a set of configurations that works.

(2) The dependent hold violations can be corrected for exactly only up to a point: if you have to keep adding "1" bits, then you eventually run out of bits that you can add. If you have a run "10001", then you can correct for two dependent hold violations by making the run "11101", but correcting for three dependent hold violations is impossible because the bitstream would be "11111", and suddenly there is no longer a falling edge transition to be a hold violation. Fortunately, there are plenty of valid configurations that have long runs of zeros, but if the chip has more than, I think, ten dependent hold violations on either side, then it becomes impossible to correct the ones at the end. Not only do you lose the ability to correct the GPIOs at the end of the chain, but you lose the ability to figure out what the hold violations are past that point, so those GPIOs become unusable. That's when GPIOs get marked "H_UNKNOWN". If your user project is making use of a pin that got marked "H_UNKNOWN", then that chip is going to be untestable.
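The dependent-violation behaviour and its compensation described above can be sketched in a few lines of Python. This is only a toy model of the bit stream, not the firmware: the function names are made up for illustration, and it assumes the bit strings are written in stream order (leftmost bit first) and that no transition follows the final bit.

```python
def dependent_hold(bits: str) -> str:
    """Pass a bit stream through one dependent (falling-edge) hold violation.

    A '1' that is immediately followed by a '0' is captured as '0', so the
    last '1' of every run of 1s is stripped off.
    """
    out = []
    for k, b in enumerate(bits):
        nxt = bits[k + 1] if k + 1 < len(bits) else b  # assume no edge after the last bit
        out.append("0" if (b == "1" and nxt == "0") else b)
    return "".join(out)


def through(bits: str, n: int) -> str:
    """Pass the stream through n dependent hold violations in series."""
    for _ in range(n):
        bits = dependent_hold(bits)
    return bits


def precompensate(bits: str, n: int) -> str:
    """Append n extra '1' bits to the end of every run of 1s (cut off at the stream end)."""
    out = []
    for k in range(len(bits)):
        window = bits[max(0, k - n): k + 1]  # this bit and the n bits before it
        out.append("1" if "1" in window else "0")
    return "".join(out)


def correctable(bits: str, n: int) -> bool:
    """True if pre-compensation survives n violations and restores the intended bits."""
    return through(precompensate(bits, n), n) == bits


print(through("11100", 1))          # 11000
print(through("11100", 2))          # 10000
print(precompensate("10001", 2))    # 11101
print(correctable("10001", 2))      # True
print(correctable("10001", 3))      # False: "11111" has no falling edge left
```

The last two lines reproduce the "10001" example: a gap of three zeros leaves room to absorb two dependent violations, but not three.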

Tim Edwards

07/21/2023, 1:53 PM

@htamas: FYI, what I described above is to adjust the initial configuration to compensate for the errors. The code we sent uses another, even more awkward mechanism, which is to use the bit-banging method. Instead of writing values to the configuration, the software drives the shift register directly through the reg_mprj_xfer register. I don't like that method at all; it's slow and opaque. I wrote a routine that uses the other method and reduces the calibration time from around 20 minutes to less than 1 second, but nobody seems to be interested in it.

Philipp Gühring

07/21/2023, 3:28 PM

Not quite true, I have been switching from the 20 minutes to the 1 second version, and I was very happy about the speedup! And I also explained it to Yatharth, so there are at least 2 happy users of the new method :-)

Tim Edwards

07/21/2023, 3:32 PM

@Philipp Gühring: Oh, that's nice to know, thanks!

Philipp Gühring

07/21/2023, 4:04 PM

Can these hold-violations be simulated and visualized in Waveforms? Are there tools to do that? Opensource?

htamas

07/21/2023, 6:10 PM

@Tim Edwards Thanks for the thorough explanation, I think now I understand what happens in the shift register and the theory behind the characterization and the workaround. However, based on this theory I would expect to always see the same fixed number of H_DEPENDENTs in a chain before everything turns into H_UNKNOWNs, but in practice I can see sometimes 3, other times 4 or 5 H_DEPENDENTs before the H_UNKNOWNs appear and in some cases up to 8 H_DEPENDENTs without any H_UNKNOWNs in the chain. Does that sound like a bug or am I missing something here? By the way, this is the "official" characterization, not your improved one. Also, sometimes the two versions of the characterization give incompatible answers. For instance, running them on chip #2 from my MPW-3 D7 slot at 1.6V I get:

"official" characterization
low chain - dependent: 3, 4, 5, unknown: 10-18
high chain - dependent: 33, 30, 27, 26

fast characterization
low chain - dependent: 3, 4, 5, 6, 9, 10, 13, 18
high chain - dependent: 33, 30, 27, 26, 25, 22

Tim Edwards

07/21/2023, 6:51 PM

@htamas: Oh, there is another condition other than too many dependent hold violations that can cause an H_UNKNOWN state, and that is if the hold violation is so close to the edge between working and not working that it's basically a toss-up which way it will go on any given clock. That makes the GPIOs untestable past that point. Some percentage of chips fall in that category and I have not tested enough parts to know what that percentage is. My "fast" characterization does not have the concept of an "unknown" state because it is using a single signal to convey the result (dependent or independent). I would need to come up with something a bit more sophisticated to communicate a result of two bits per pin. At least with the fast characterization it is convenient enough to run multiple times and see if the result changes at all between tests. If it persists in claiming that there are two extra dependent hold violations at the end of the high chain, it's worth testing which result is the correct one.
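The suggestion to rerun the fast characterization and watch for changing results could be automated along these lines. This is a sketch only: `run_once` is a hypothetical callable standing in for whatever actually runs the characterization and returns a per-GPIO result, and the result strings are placeholders.

```python
from typing import Callable, Dict, List, Tuple


def stable_results(run_once: Callable[[], Dict[int, str]],
                   repeats: int = 5) -> Tuple[Dict[int, str], List[int]]:
    """Run the characterization several times and separate stable pins from flaky ones.

    run_once is a hypothetical function returning {gpio_number: result}
    for one characterization run.
    """
    runs = [run_once() for _ in range(repeats)]
    stable: Dict[int, str] = {}
    flaky: List[int] = []
    for pin in sorted(runs[0]):
        seen = {r[pin] for r in runs}
        if len(seen) == 1:
            stable[pin] = seen.pop()   # same answer every time
        else:
            flaky.append(pin)          # answer changed between runs
    return stable, flaky
```

Pins that land in the flaky list would be the candidates for the toss-up condition described above, where the result depends on the individual clock edge.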
