# mpw-one-silicon
t
What's interesting is that, AFAICT from my mpw-1 design, (1) the tree is "balanced" in the sense that there is the same number of buffers in each branch involved in the hold violations, but the different parasitics of each branch made it unbalanced delay-wise, and (2) there was some SPEF extraction, so at least "some" parasitics were extracted from the design.
For instance:
Startpoint: _19752_ (rising edge-triggered flip-flop clocked by wb_clk_i)
Endpoint: _19222_ (rising edge-triggered flip-flop clocked by wb_clk_i)
Path Group: wb_clk_i
Path Type: min

  Delay    Time   Description
---------------------------------------------------------
   0.00    0.00   clock wb_clk_i (rise edge)
   0.00    0.00   clock source latency
   0.01    0.01 ^ wb_clk_i (in)
   0.08    0.09 ^ clkbuf_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_16)
   0.11    0.19 ^ clkbuf_1_0_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.09    0.28 ^ clkbuf_1_0_1_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.08    0.36 ^ clkbuf_1_0_2_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.10    0.46 ^ clkbuf_1_0_3_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.20    0.66 ^ clkbuf_2_0_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.09    0.75 ^ clkbuf_3_1_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.10    0.85 ^ clkbuf_3_1_1_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.13    0.99 ^ clkbuf_4_3_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.09    1.08 ^ clkbuf_5_7_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.09    1.17 ^ clkbuf_6_15_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.16    1.33 ^ clkbuf_7_31_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.00    1.33 ^ _19752_/CLK (sky130_fd_sc_hd__dfxtp_4)
   0.25    1.58 v _19752_/Q (sky130_fd_sc_hd__dfxtp_4)
   0.06    1.64 ^ _11426_/Y (sky130_fd_sc_hd__nor2_4)
   0.09    1.73 ^ _11427_/X (sky130_fd_sc_hd__a211o_4)
   0.02    1.75 v _11428_/Y (sky130_fd_sc_hd__inv_2)
   0.00    1.75 v _19222_/D (sky130_fd_sc_hd__dfxtp_4)
           1.75   data arrival time

   0.00    0.00   clock wb_clk_i (rise edge)
   0.00    0.00   clock source latency
   0.02    0.02 ^ wb_clk_i (in)
   0.20    0.22 ^ clkbuf_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_16)
   0.28    0.50 ^ clkbuf_1_1_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.27    0.76 ^ clkbuf_1_1_1_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.24    1.00 ^ clkbuf_1_1_2_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.24    1.24 ^ clkbuf_1_1_3_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.49    1.73 ^ clkbuf_2_3_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.35    2.08 ^ clkbuf_3_7_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.35    2.43 ^ clkbuf_3_7_1_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.40    2.83 ^ clkbuf_4_14_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.30    3.13 ^ clkbuf_5_28_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.24    3.37 ^ clkbuf_6_56_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.90    4.27 ^ clkbuf_7_112_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_1)
   0.00    4.28 ^ _19222_/CLK (sky130_fd_sc_hd__dfxtp_4)
  -0.13    4.14   clock reconvergence pessimism
  -0.01    4.14   library hold time
           4.14   data required time
---------------------------------------------------------
           4.14   data required time
          -1.75   data arrival time
---------------------------------------------------------
          -2.39   slack (VIOLATED)
Mmm... digging a little more into this, the above might be unnecessarily pessimistic.
It uses minimum timing for one path and maximum timing for the other, but realistically one part of the chip can't be at 100C while another is at -40C...
Running it again using only typical timing gives ... well, still a violation, but a smaller one (0.4 ns).
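To make the numbers in the report easier to follow, here is a minimal Python sketch that redoes the hold-slack arithmetic by hand. All values are copied from the report above; the variable and function names are made up for illustration and are not part of OpenSTA.

```python
# Values copied from the min-path report above (all in ns).
launch_clk_arrival = 1.33   # _19752_/CLK, launch clock path (min timing)
data_arrival = 1.75         # _19222_/D, end of the data path
capture_clk_arrival = 4.28  # _19222_/CLK, capture clock path (max timing)
crpr_credit = 0.13          # clock reconvergence pessimism removed by the tool
hold_time = -0.01           # library hold time as printed in the report


def hold_slack(data_arrival, capture_clk, crpr, hold):
    """Hold slack = data arrival time minus data required time."""
    required = capture_clk - crpr + hold
    return data_arrival - required


# Apparent skew between the two flip-flop clock pins in this report.
skew = capture_clk_arrival - launch_clk_arrival
slack = hold_slack(data_arrival, capture_clk_arrival, crpr_credit, hold_time)

print(f"apparent clock skew: {skew:.2f} ns")
print(f"hold slack:          {slack:.2f} ns")
```

Almost all of the violation comes from the apparent skew between the two clock pins: the data path itself only contributes 1.75 - 1.33 = 0.42 ns after the launch clock edge.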
m
@tnt what are the .libs you are using in your design?
t
This was
sky130_fd_sc_hd__ff_n40C_1v95.lib
/
sky130_fd_sc_hd__ss_100C_1v60.lib
(Note that I'm running this on my mpw-1 design using the mpw-1 tools, more as an academic exercise of "what should I have noticed a year ago and missed?")
m
Understood, and thanks. But you used those 2 corners for 2 different libs? They can't be used for one lib? Or am I misunderstanding something?
t
The default sdc does both a
read_liberty -min
and
read_liberty -max
to load both corners. It then uses the slow one for setup analysis and the fast one for hold analysis. So far that makes sense. However, it seems that when doing clock network propagation it uses different corners for the source and destination clocks, which sounds needlessly pessimistic.
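As a rough illustration of why the two clock paths end up with different libraries, the sketch below tabulates which delay flavor a typical min/max STA setup applies to each part of a check. It assumes the fast library was the one loaded with -min and the slow one with -max; the dictionary and function names are hypothetical and only serve the example.

```python
# Delay flavor used for each part of a timing check in min/max analysis.
# "early" delays come from the library loaded with -min,
# "late" delays from the library loaded with -max.
CHECK_DELAYS = {
    "setup": {"launch_clock": "late", "data": "late", "capture_clock": "early"},
    "hold": {"launch_clock": "early", "data": "early", "capture_clock": "late"},
}

# Assumption: ff was passed to -min and ss to -max, as the thread suggests.
LIB_FOR = {
    "early": "sky130_fd_sc_hd__ff_n40C_1v95.lib",
    "late": "sky130_fd_sc_hd__ss_100C_1v60.lib",
}


def libs_used(check):
    """Return the library applied to each part of the given check."""
    return {part: LIB_FOR[flavor] for part, flavor in CHECK_DELAYS[check].items()}


for check in ("setup", "hold"):
    print(check)
    for part, lib in libs_used(check).items():
        print(f"  {part:14s} {lib}")
```

In both checks the launch and capture clock paths use opposite libraries, which is why treating two full PVT corners as min/max makes the same clock tree look fast on one side and slow on the other.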
m
ahh .. can you point me where you see that please!
t
Look in the report I posted above.
The first segment of the source and destination clock paths is the same segment, but it is reported with different delays:
0.02    0.02 ^ wb_clk_i (in)
   0.20    0.22 ^ clkbuf_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_16)
0.01    0.01 ^ wb_clk_i (in)
   0.08    0.09 ^ clkbuf_0_wb_clk_i/X (sky130_fd_sc_hd__clkbuf_16)
m
Hmm, are you posting two different timing reports? I am actually not following (sorry). If you are just reporting timing (STA), then you'll get pessimistic timing with 100C and optimistic timing with 40C (nominal voltage). I think I missed something in your description (I have read it twice, though).
t
This is a single report entry. But it contains the clock tree path delay for both the source FF and the destination FF, and the beginnings of those paths have common segments.
m
Thanks! So you are saying in the same report the reported values are different? And the tool is mixing up both? Have you tried this with a recent version of OR? Can I have access to your testcase?
One issue I noticed during my own experiments is that some of the paths weren't correctly annotated and were missed by timing analysis. Those paths weren't critical though. But this is based on OR from August.
m
@Tom Spyrou there is confusion about how corners are handled in STA. I think you could help clarify how multi-corner analysis should be done. Perhaps we even need to add this to the docs.
m
@tnt using different .libs for the part of the path that they have in common would be outright wrong rather than just pessimistic, right? (I'm new to all this)
m
STA will handle the common clock portion correctly - the analysis is called clock reconvergence pessimism removal (crpr).
The key is to differentiate on-die variation (variation across a single die) from multi-corner analysis.
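A simplified sketch of the CRPR idea, using the first few clock stages from the report earlier in the thread: walk both clock paths while they share the same instances, and credit back the late-minus-early arrival difference at the last common point. This only illustrates the concept; it is not how OpenSTA implements it, and the function name is made up.

```python
# (instance, cumulative arrival in ns), copied from the report in this thread.
# Only the first stages are listed because the paths diverge after clkbuf_0.
launch_early = [
    ("wb_clk_i", 0.01),
    ("clkbuf_0_wb_clk_i", 0.09),
    ("clkbuf_1_0_0_wb_clk_i", 0.19),
]
capture_late = [
    ("wb_clk_i", 0.02),
    ("clkbuf_0_wb_clk_i", 0.22),
    ("clkbuf_1_1_0_wb_clk_i", 0.50),
]


def crpr_credit(early_path, late_path):
    """Late-minus-early arrival at the last instance shared by both paths."""
    credit = 0.0
    for (early_name, early_arr), (late_name, late_arr) in zip(early_path, late_path):
        if early_name != late_name:
            break  # the paths have diverged, nothing more is common
        credit = late_arr - early_arr
    return credit


print(f"CRPR credit: {crpr_credit(launch_early, capture_late):.2f} ns")
```

The result matches the 0.13 ns "clock reconvergence pessimism" line in the report, so the shared clkbuf_0 segment is indeed compensated; the remaining pessimism comes from the non-common branches.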
m
thanks! I was missing that info about crpr.
t
@Matt Liberty Yes, I saw the CRPR in OpenSTA, but even for the non-common part it's using the different min/max libs previously loaded. And as you say, I can see how this is useful if you load min/max libs representing on-die variations, but do we have those? AFAICT the only ones we have are timings for the various corners, and using those is overly pessimistic (that's what I was pointing out). I think this also means the various reports should be done in several OpenSTA calls and not in the same script, since you need to load different libs (unless re-issuing a
read_liberty -min
overrides the previous one?).
m
Yes, there is an ongoing discussion about getting OL to properly do multi-corner analysis. The current setup treats corners as on-die variation, which is wrong and gives too much pessimism.