how consistent and predictable is the propagation ...
# general
a
how consistent and predictable is the propagation delay of a signal in a chip design? can you use distance-induced delays to create pseudo-sequential logic within a single clock cycle? (possibly avoiding the need for a higher clock speed in some use-cases)
perhaps this is "possible" but not actually practical, because it would be too difficult to design?
a
The issue is on-chip variation. When you have one "best" case and one "worst" case, everything is easy. But on the chip the delay varies from about 75% to 125% of nominal, and different parts of the chip may have different delays, so it would be hard to design a chip that uses distance-induced delay for sequential logic. The benefits are also questionable. You gain something like 5% less delay since you don't have setup and clk-to-q delays, but is it actually worth the effort?
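A rough sketch of why that 75% to 125% spread hurts, assuming each path's delay can land anywhere in that range independently. The path delays (1.0 ns and 1.4 ns) and the function names are made up for illustration:

```python
def arrival_window(nominal_ns, low=0.75, high=1.25):
    """Return (earliest, latest) arrival time for a path with this nominal delay."""
    return nominal_ns * low, nominal_ns * high


def windows_overlap(a, b):
    """True if two (earliest, latest) windows overlap, i.e. ordering is not guaranteed."""
    return a[0] <= b[1] and b[0] <= a[1]


# Two hypothetical paths that are meant to arrive "in sequence" within one cycle.
first = arrival_window(1.0)
second = arrival_window(1.4)

# If the windows overlap, the "second" signal can arrive before the "first" one.
print("ordering ambiguous:", windows_overlap(first, second))
```

Under this simple model the second path needs a nominal delay more than 1.25 / 0.75 ≈ 1.67x the first before the order is guaranteed, and that padding is where the extra area goes.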
a
the benefit i'm thinking of is on a process that simply can't handle higher clock speeds. is that not as much of a problem as i'm thinking, or does this not really help that at all?
and my understanding is that sequential logic (generally) saves area compared to equivalent combinatorial logic, since you can reuse gates
but if delay isn't predictable enough, this wouldn't be feasible at all, so that would be an answer
a
When you have combinational logic that needs to span multiple clock cycles, multicycle paths are used. In cases where this is not feasible, additional pipelining and retiming are used. "and my understanding is that sequential logic (generally) saves area compared to equivalent combinatorial logic, since you can reuse gates" This sentence does not make sense. "but if delay isn't predictable enough, this wouldn't be feasible at all, so that would be an answer" It's predictable, but since you need additional area to support the difference of the delays, then it becomes impractical.
a
This sentence does not make sense.
I have effectively no electronics design experience, so that's not surprising. Though perhaps I'm just not communicating this correctly.
a minimal example i can think of is a single-cycle multi-bit adder (more area), vs using a single-bit adder (less area) over and over (once per cycle) to do the same computation. forgive me if this still makes no sense.
but since you need additional area to support the difference of the delays, then it becomes impractical
if that's what would be required for the timing to work, then fair enough
a
Okay, let me try to explain why it does not make sense. "a minimal example i can think of is a single-cycle multi-bit adder (more area), vs using a single-bit adder (less area) over and over (once per cycle) to do the same computation." In that particular case it makes sense to have a single-bit adder. But the catch is that the flip-flops needed to store the results are the biggest area contributor; the adder itself is pretty small. Another issue is that the flip-flop delay, which stays constant, becomes a major contributor to the clock period. The reason you want small combinational logic is that its delay limits the clock frequency: more combinational depth => lower frequency. When flip-flops become the major contributor to delay, that is good for performance, BUT since you have more pipeline depth you need more area. Everything is a tradeoff between power, performance, and area (price). You need to pick your priorities and balance them out. Besides the points above, multicycle paths are also rare, since CPUs want longer pipelines AND the ability to cancel mispredicted paths. You can't do that with multicycle paths, but it can be achieved with pipelines.
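To put rough numbers on the flip-flop overhead point, here is a minimal sketch using the usual simplified timing model (clock period = clk-to-q + combinational delay + setup). The delay values are invented, and it assumes the logic splits evenly across pipeline stages, which real designs rarely manage:

```python
def clock_period_ns(t_comb_ns, t_clk_to_q_ns, t_setup_ns, stages=1):
    """Simplified period: flop overhead plus one stage's share of the logic delay."""
    return t_clk_to_q_ns + t_comb_ns / stages + t_setup_ns


def max_frequency_mhz(t_comb_ns, t_clk_to_q_ns, t_setup_ns, stages=1):
    """Highest clock frequency this simplified model allows."""
    return 1000.0 / clock_period_ns(t_comb_ns, t_clk_to_q_ns, t_setup_ns, stages)


def flop_overhead_fraction(t_comb_ns, t_clk_to_q_ns, t_setup_ns, stages=1):
    """Share of the clock period spent on clk-to-q and setup rather than logic."""
    period = clock_period_ns(t_comb_ns, t_clk_to_q_ns, t_setup_ns, stages)
    return (t_clk_to_q_ns + t_setup_ns) / period


# Hypothetical numbers: 8 ns of logic, 0.1 ns clk-to-q, 0.1 ns setup.
for stages in (1, 2, 4, 8):
    f = max_frequency_mhz(8.0, 0.1, 0.1, stages)
    share = flop_overhead_fraction(8.0, 0.1, 0.1, stages)
    print(f"{stages} stage(s): {f:.0f} MHz, flop overhead {share:.1%}")
```

Each extra stage raises the frequency, but the fixed flop delay takes a growing share of the period, and every stage boundary costs another row of flip-flops.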
a
ok, thanks!
a
I forgot the key points. In the adder example you have the following options:
1. One-cycle adder. Used when the adder is not in the critical path but the circuit needs to be simple for the reader of the Verilog code. A pretty frequent case. Medium area usage.
2. Multi-cycle adder. Makes sense if you are calculating once in many cycles. Smallest possible delay, but rarely used in practice. You may find a use for it, but in a CPU, for example, it is likely to become the weak spot for total performance: you want as many operations PER CYCLE as possible, not the other way around. Low area usage.
3. Multicycle path. Again very useful, but the occasions when you need it are rare. Medium area usage, and you don't need flip-flops to store the result. You need to be careful because this is specified in a different file (the SDC), so it may be a risky design choice. The benefits are also questionable, because in long chains it becomes a big area contributor.
4. Pipelined adder. It accepts one pair of data inputs per cycle and produces one result N cycles later for each cycle in which it accepted an input. Best performance, but a big impact on area, and the flip-flops become the major critical-path contributor. Most frequently used.
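For anyone following along without a hardware background, options 1 and 2 can be modeled in software roughly like this. It is a behavioral sketch in Python, not synthesizable code, and the function and class names are made up for illustration:

```python
def ripple_carry_add(a, b, width):
    """One-cycle style: `width` full adders chained, all evaluated in a single pass."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


class BitSerialAdder:
    """Multi-cycle style: one full adder plus a carry flip-flop, one bit per clock."""

    def __init__(self):
        self.carry = 0  # models the carry flip-flop

    def clock(self, x, y):
        """Consume one bit of each operand and return one sum bit."""
        s = x ^ y ^ self.carry
        self.carry = (x & y) | (self.carry & (x ^ y))
        return s


def bit_serial_add(a, b, width):
    """Reuse the single full adder once per cycle, least significant bit first."""
    adder = BitSerialAdder()
    result = 0
    cycles = 0
    for i in range(width):
        result |= adder.clock((a >> i) & 1, (b >> i) & 1) << i
        cycles += 1
    return result, adder.carry, cycles


print(ripple_carry_add(13, 7, 8))  # one pass through 8 full adders
print(bit_serial_add(13, 7, 8))    # same sum, but takes 8 clock cycles
```

Both give the same sum. The serial version reuses one full adder, but in real hardware it also needs shift registers for the operands and the result, which is the flip-flop area cost mentioned earlier.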
👍 1
s
It seems like this discussion is sort of leading to wave-pipelining. https://www.cs.princeton.edu/courses/archive/fall01/cs597a/wave.pdf This removes intermediate result registers and lets two signals propagate through combinational logic simultaneously. It can work, but is very difficult to get right due to process variation.
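A very simplified sketch of the timing constraint behind wave pipelining, assuming the only limit is that a new wave must not catch up with the previous one. Real analyses also include clock skew, setup/hold times and more, which are lumped into a single made-up `overhead_ns` here, and the delay numbers are hypothetical:

```python
import math


def min_wave_clock_period(d_max_ns, d_min_ns, overhead_ns=0.0):
    """Simplified lower bound: the period must cover the spread between
    the slowest and fastest path through the logic, plus overhead."""
    return (d_max_ns - d_min_ns) + overhead_ns


def waves_in_flight(d_max_ns, period_ns):
    """Roughly how many data waves share the logic at once at this period."""
    return math.ceil(d_max_ns / period_ns)


# Hypothetical block: 4 ns nominal delay with a 75% to 125% spread.
d_min, d_max = 4.0 * 0.75, 4.0 * 1.25
period = min_wave_clock_period(d_max, d_min, overhead_ns=0.2)
print(f"min period {period:.1f} ns, about {waves_in_flight(d_max, period)} waves in flight")
```

The clock is limited by the difference between the longest and shortest path rather than by the longest path alone, so any process variation that widens that difference directly reduces the benefit.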
☝️ 1
a
wow, neat! this was published in 1998?
s
There always seems to be a two-paragraph section on it in digital logic textbooks. I think process scaling has made it unattractive for automated tools though.