# mpw-one-silicon
t
What I'm mostly curious about is why that wasn't caught by the timing analysis?
👍 3
t
That is of course what we are delving into right now. The main issue was that Open RCX, the OpenROAD tool for extracting wire parasitics, was not available at the time that MPW-one was made. Magic extracts pretty detailed parasitics, but only in SPICE. I did figure out that I could do pretty accurate timing simulation with IRSIM, but it took me a long time to create the setup for sky130, and there were so many unknowns down that path that I would not be able to trust the results without correlating them against some other tools.
t
Interesting, I wouldn't have expected parasitics to cause hold violations.
Although I guess in the clock tree, unexpected unbalanced delays in the branches will cause skew ... and then ... hold violations. Is that what's going on here?
t
The hold violations are all caused by clock skew. So the parasitics just make the clock skew worse.
👌 1
The whole issue was caused by three things: (1) A bad algorithm for generating clock trees, (2) our inability to analyze timing with parasitics, and (3) our putting way too much trust in the tools.
👍 4
t
I see. Thanks for the details!
g
out of curiosity was there a specific bug in the clock tree generation? or just general skew issues
t
It seems that the clock tree generator did not understand the concept of a balanced clock tree.
😂 3
😢 2
g
ouch!
j
Seems like a pretty fundamental misunderstanding! Sheesh...
t
huh ... I mean, even without parasitics, I would have expected just the gate delay of an unbalanced tree to show up as pretty high skew during timing analysis. Or do you mean balanced in the sense of placement? (TBH I don't remember if the clock tree is done after placement here to try and make it "spatially" good)
m
I had many issues with MPW two and trying to get a reasonable clock tree. And I was still skeptical
t
The openlane team is now working on adding methodology to the clock tree generation to use results of post-layout extracted timing to make adjustments to the timing of clocks at the leaf nodes via engineering change orders. That should be available for MPW-three.
👍 1
m
@Matthew Guthaus could you comment further on what a reasonable clock tree is, and why you were skeptical?
m
@Matt Venn it was mostly dealing with multiple different clocks. For example, if you want to create your own clock in the user space. If you don't get it recognized by OpenLane then Yosys will buffer it using its high fanout buffering algorithm which is not intended for clocks.
Or in my case, I wanted to have multiple clocks depending on whether we were using the GPIO or LA to interface for testing
m
but when you saw the buffered clock tree - it looked ok? I'm just wondering if there was something we could have done to have caught this earlier
m
I didn't look at it, just the timing reports
Honestly, I didn't do as much checking of things given the time crunch 😕
m
all my projects passed timing, but I don't think that actually meant much with the missing values
t
What I find weird is that the parasitics made so much of a difference. Even without accounting for exact wire delays, there are default values for net delays and the clock buffers also have delays in the library and those should have already highlighted a fairly high skew in the clock.
Looking at my own project timing report it seems it didn't account at all for the clock tree.
0.00    0.00   clock network delay (ideal)
👀 1
I guess I should have spotted that 😕
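The "clock network delay (ideal)" line quoted above is easy to miss in a long report. As a rough sketch (not part of OpenLane; the function name and the sample text are made up for illustration), a few lines of Python can flag every place where a report still uses an ideal clock:

```python
import re

# Matches report lines such as "   0.00    0.00   clock network delay (ideal)".
IDEAL_CLOCK = re.compile(r"clock network delay \(ideal\)")

def find_ideal_clock_lines(report_text):
    """Return (line_number, line) pairs where the clock network is ideal."""
    hits = []
    for number, line in enumerate(report_text.splitlines(), start=1):
        if IDEAL_CLOCK.search(line):
            hits.append((number, line.strip()))
    return hits

# Hypothetical excerpt in the style of the reports quoted in this thread.
sample = """\
   0.00    0.00   clock wb_clk_i (rise edge)
   0.00    0.00   clock network delay (ideal)
   1.33    1.33   clock network delay (propagated)
"""

for number, line in find_ideal_clock_lines(sample):
    print(f"line {number}: {line}")
```

Any hit means the clock tree delay was not included in that path, so a passing hold check there says little about the real silicon.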
m
Yeah, mine actually had ideal clock as well.
0.00   30.00   30.00   clock io_in[17] (rise edge)
        0.00   30.00   clock network delay (ideal)
(this was using a GPIO as the clock)
This is probably not good news for mine since I have a really big scan chain and hold times can very easily break that.
m
I'll check my timing reports and then re-run with the mpw3 tools, then we should at least get some info about whether that issue is fixed in mpw-3 tagged tools
image.png
./reports/synthesis/opensta_spef.min_max.rpt
that was MPW1
MPW3 it's looking like the same for synth timing, and after extraction the clock tree isn't found
so no report
@Matthew Guthaus even if it had worked you wouldn't have been able to set the io up as an input because of the caravel cts issues
m
Why is that? I spent time with some TCL code to do this
m
the only way to set up the GPIO is to use caravel's picorv32 to load some firmware that sets up the IOs. But that doesn't work
m
Oh, I'm not even commenting on that.
Yes, of course.
(Mine was on MPW two as well)
m
RE STA timing on extracted CTS of the design I'm just checking
I can see CTS is being made
image.png
then in step 25 when it runs opensta again on the routed design
image.png
no paths found
hmm
although it does print this in the logs
image.png
but I don't get this as a report. Either way, clock delay is still ideal
m
I think that is reporting violating paths which aren't found
m
ok ok.
but I think from this we can conclude that mpw3 tools will have the same inability to detect clock tree issues
m
yep
m
would be good if someone else could confirm.
Also, if someone could check these instructions for how to check: https://docs.google.com/document/d/1HENZGBncHYqhnuAxl9aIT4QG0rPr6-tEinpkSimsIOI/edit?usp=sharing
t
@Matt Venn Maybe try adding set_clock_skew -propagated to the sta.tcl script
m
I'm also interested to hear back from @Tim Edwards about how he actually tested the design and saw hold violations. It would be good to repeat that test
image.png
just looking at https://github.com/The-OpenROAD-Project/OpenLane/tree/master/scripts, the sta.tcl isn't there anymore
t
Err, set_propagated_clock wb_clk_i maybe.
It's in openroad/or_sta.tcl I think, since more stuff got moved over to the openroad version rather than the independent one IIUC.
m
image.png
so I will need to update my docker image I guess
t
Heh, I wasn't far 😅
🙌 1
m
I just added it to my script here, let's see what happens
same result, but probably I am missing stuff without updating openlane image
t
Got it to generate for mpw-1 with propagated clocks.
Startpoint: _19752_ (rising edge-triggered flip-flop clocked by wb_clk_i)
Endpoint: _19222_ (rising edge-triggered flip-flop clocked by wb_clk_i)
Path Group: wb_clk_i
Path Type: min

  Delay    Time   Description
---------------------------------------------------------
   0.00    0.00   clock wb_clk_i (rise edge)
   1.33    1.33   clock network delay (propagated)
   0.00    1.33 ^ _19752_/CLK (sky130_fd_sc_hd__dfxtp_4)
   0.25    1.58 v _19752_/Q (sky130_fd_sc_hd__dfxtp_4)
   0.06    1.64 ^ _11426_/Y (sky130_fd_sc_hd__nor2_4)
   0.09    1.73 ^ _11427_/X (sky130_fd_sc_hd__a211o_4)
   0.02    1.75 v _11428_/Y (sky130_fd_sc_hd__inv_2)
   0.00    1.75 v _19222_/D (sky130_fd_sc_hd__dfxtp_4)
           1.75   data arrival time

   0.00    0.00   clock wb_clk_i (rise edge)
   4.28    4.28   clock network delay (propagated)
  -0.13    4.14   clock reconvergence pessimism
           4.14 ^ _19222_/CLK (sky130_fd_sc_hd__dfxtp_4)
  -0.01    4.14   library hold time
           4.14   data required time
---------------------------------------------------------
           4.14   data required time
          -1.75   data arrival time
---------------------------------------------------------
          -2.39   slack (VIOLATED)
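The arithmetic behind this report can be re-derived by hand. The sketch below is only a worked check using the rounded values printed above (the function and argument names are chosen here for illustration); for a hold check, slack is data arrival time minus data required time.

```python
def hold_slack(launch_latency, data_path_delay, capture_latency, crpr, hold_time):
    """Hold slack in ns: data arrival time minus data required time."""
    arrival = launch_latency + data_path_delay
    required = capture_latency + crpr + hold_time
    return arrival - required

# Values copied from the report above (ns).
# Data path: dfxtp_4 CLK->Q, nor2_4, a211o_4, inv_2.
data_path = 0.25 + 0.06 + 0.09 + 0.02

slack = hold_slack(
    launch_latency=1.33,     # clock network delay at _19752_/CLK
    data_path_delay=data_path,
    capture_latency=4.28,    # clock network delay at _19222_/CLK
    crpr=-0.13,              # clock reconvergence pessimism
    hold_time=-0.01,         # library hold time
)

print(f"data path delay:          {data_path:.2f} ns")
print(f"clock latency difference: {4.28 - 1.33:.2f} ns")
print(f"hold slack:               {slack:.2f} ns")
```

The data path is only 0.42 ns long, while the capture flop's clock arrives almost 3 ns later than the launch flop's clock, so the violation comes from clock skew rather than from the logic between the two flops. The intermediate totals in the report differ by 0.01 in places because the tool rounds each printed value.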
m
great!
how did you do it?
t
Added set_propagated_clock [all_clocks] to the SDC file.
Here I manually re-ran sta, executing each command from sta.tcl "by hand" and replacing the env vars appropriately from the logs, so I could execute it on the "run" I had from a year ago.
🙌 1
m
would you mind adding that to the gdoc?
t
But if you modify scripts/base.sdc and re-run that should work.
And just add the set_propagated_clock after the create_clock
m
what is the sdc file?
t
It's the file describing the clock constraints.
m
I only set the clock constraints in the config.tcl
t
the base.sdc file is in openlane itself: scripts/base.sdc
✅ 1
(in mpw-1 at least ...)
What's interesting is that although I get like 3ns clock skew, the number of clock buffers to go from wb_clk_i to both clock inputs of the FF of the hold violation is the same. 13 buffers in each path.
% report_clock_skew
Clock wb_clk_i
Latency      CRPR       Skew
_19103_/CLK ^
   4.17
_19106_/CLK ^
   1.05     -0.13       2.99
That seems to report the minimum and maximum propagation.
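The three numbers in that output are consistent with skew being the difference between the largest and smallest clock latency, corrected by the CRPR term. A small check, assuming that reading of the columns is right (the function name is made up here):

```python
def clock_skew(max_latency, min_latency, crpr):
    """Skew in ns: latency difference plus the (negative) CRPR correction."""
    return max_latency - min_latency + crpr

# Values copied from the report_clock_skew output above (ns).
skew = clock_skew(max_latency=4.17, min_latency=1.05, crpr=-0.13)
print(f"skew: {skew:.2f} ns")
```

This only checks that the printed latencies reproduce the printed skew; it does not explain why two paths with the same number of buffers end up about 3 ns apart.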
m
image.png
openlane still finishes and doesn't report any error
t
openlane doesn't stop on timing error AFAIR
Also using -format full_clock_expanded on the report_checks command adds the details of the clock path which is nice.
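Since the flow finishes without an error even when paths are violated, one option is to scan the generated report yourself. This is a minimal sketch, assuming the report uses the "slack (VIOLATED)" line format shown earlier in the thread; the function name and sample text are illustrative and not part of OpenLane.

```python
import re

# Matches report lines such as "          -2.39   slack (VIOLATED)".
SLACK_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+slack \(VIOLATED\)\s*$")

def violated_slacks(report_text):
    """Return the slack value (ns) of every violated path in the report."""
    slacks = []
    for line in report_text.splitlines():
        match = SLACK_LINE.match(line)
        if match:
            slacks.append(float(match.group(1)))
    return slacks

# Hypothetical excerpt in the style of the reports quoted in this thread.
sample = """\
           4.14   data required time
          -1.75   data arrival time
---------------------------------------------------------
          -2.39   slack (VIOLATED)
           0.12   slack (MET)
"""

violations = violated_slacks(sample)
worst = min(violations, default=None)
print(f"{len(violations)} violated path(s), worst slack: {worst}")
```

A wrapper script could exit with a non-zero status whenever the list is non-empty, which would turn a silent timing failure into a visible one.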
t
@Matt Venn: I am still reading through this long thread, but to answer one of your questions: We detected and measured hold violations in two different ways. The first was using IRSIM, which needs a Tcl script setup to get working which I put in the caravel repository under the irsim/ directory; and the second was using OpenROAD's newer tool Open RCX to get parasitic extraction for the mgmt_core module, and then plugging those values back into gate-level simulation. But iverilog doesn't do delay annotation, so we used another verilog simulator called "cvc" that does, and then I had to write a script change to open_pdks that will put the "specify" section back into the verilog for all of the standard cells.
m
Cool, lots to learn here. Have you seen we can also see violations by adding the extra line to the openlane scripts? But I don't really know what I'm doing here.
t
I am not familiar with the SDC script commands for setting up the timing constraints, but you need to both propagate the clock, and to tell OpenSTA to output all violations (or 1000, or whatever; the default seems to be 1, which isn't very helpful).
t
@Tim Edwards yeah, something else I noticed is that if you read min/max timing liberty files, it seems to use both in the same path. So the source clock delay is computed with one corner and the destination clock delay with another corner, which sounds a bit pessimistic to me.
t
@tnt : Not sure about that. I know that generally, setup violations will be calculated at slow/low/hot while hold violations are calculated at fast/high/cold. If two different corners are used for the same calculation, though, I'd agree that's needlessly pessimistic.
m
@tnt I was able to analyze my design using propagated clocks, but interestingly there was a problem because inputs/outputs seem to always use "ideal" clocks even when you propagate clocks.
Startpoint: io_in[15] (input port clocked by io_in[17])
Endpoint: _3004_ (rising edge-triggered flip-flop clocked by io_in[17])
Path Group: io_in[17]
Path Type: min

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                  0.00    0.00    0.00   clock io_in[17] (rise edge)
                          0.00    0.00   clock network delay (ideal)
                          0.00    0.00 v input external delay
                  0.00    0.00    0.00 v io_in[15] (in)
     2    0.00                           io_in[15] (net)
                  0.00    0.00    0.00 v hold564/A (sky130_fd_sc_hd__dlygate4sd3_1)
                  0.04    0.35    0.36 v hold564/X (sky130_fd_sc_hd__dlygate4sd3_1)
     2    0.00                           net564 (net)
                  0.04    0.00    0.36 v hold4/A (sky130_fd_sc_hd__dlygate4sd3_1)
                  0.04    0.37    0.72 v hold4/X (sky130_fd_sc_hd__dlygate4sd3_1)
     2    0.00                           net4 (net)
                  0.04    0.00    0.72 v hold563/A (sky130_fd_sc_hd__dlygate4sd3_1)
                  1.40    1.35    2.07 v hold563/X (sky130_fd_sc_hd__dlygate4sd3_1)
     2    0.44                           net563 (net)
                  1.42    0.12    2.20 v _1317_/B (sky130_fd_sc_hd__and2b_2)
                  0.03    0.51    2.71 v _1317_/X (sky130_fd_sc_hd__and2b_2)
     2    0.00                           _0825_ (net)
                  0.03    0.00    2.71 v hold567/A (sky130_fd_sc_hd__dlygate4sd3_1)
                  0.04    0.36    3.07 v hold567/X (sky130_fd_sc_hd__dlygate4sd3_1)
     2    0.00                           net567 (net)
                  0.04    0.00    3.07 v hold6/A (sky130_fd_sc_hd__dlygate4sd3_1)
                  0.83    0.90    3.97 v hold6/X (sky130_fd_sc_hd__dlygate4sd3_1)
     4    0.26                           net6 (net)
                  0.83    0.00    3.97 v hold566/A (sky130_fd_sc_hd__dlygate4sd3_1)
                  1.00    1.12    5.09 v hold566/X (sky130_fd_sc_hd__dlygate4sd3_1)
     6    0.32                           net566 (net)
                  1.30    0.43    5.53 v _1702_/A (sky130_fd_sc_hd__buf_1)
                  0.04    0.30    5.82 v _1702_/X (sky130_fd_sc_hd__buf_1)
     2    0.00                           _0946_ (net)
                  0.04    0.00    5.82 v hold19/A (sky130_fd_sc_hd__dlygate4sd3_1)
                  0.65    0.74    6.56 v hold19/X (sky130_fd_sc_hd__dlygate4sd3_1)
    10    0.20                           net19 (net)
                  0.73    0.17    6.73 v _1703_/A (sky130_fd_sc_hd__buf_1)
                  0.45    0.53    7.26 v _1703_/X (sky130_fd_sc_hd__buf_1)
    10    0.11                           _0947_ (net)
                  0.45    0.01    7.27 v _1710_/A (sky130_fd_sc_hd__buf_1)
                  0.08    0.21    7.48 v _1710_/X (sky130_fd_sc_hd__buf_1)
    10    0.02                           _0949_ (net)
                  0.08    0.00    7.49 v _1712_/B (sky130_fd_sc_hd__and2_2)
                  0.03    0.14    7.62 v _1712_/X (sky130_fd_sc_hd__and2_2)
     2    0.00                           _0546_ (net)
                  0.03    0.00    7.62 v _3004_/D (sky130_fd_sc_hd__dfxtp_2)
                                  7.62   data arrival time

                  0.00    0.00    0.00   clock io_in[17] (rise edge)
                         19.86   19.86   clock network delay (propagated)
                          0.00   19.86   clock reconvergence pessimism
                                 19.86 ^ _3004_/CLK (sky130_fd_sc_hd__dfxtp_2)
                         -0.01   19.84   library hold time
                                 19.84   data required time
-----------------------------------------------------------------------------
                                 19.84   data required time
                                 -7.62   data arrival time
-----------------------------------------------------------------------------
                                -12.22   slack (VIOLATED)
(Yes, I'm using a GPIO pin as the clock for my design for this test mode)
t
@Matthew Guthaus: I assume it uses an ideal clock because it does not have information about the drive or load at the pin. You can specify default assumptions for such pins, although it would be better to have a liberty file definition for the GPIO. I assume that the pin input comes from the gpio_control_block cell, so you could get pretty reasonable assumptions about the slew of the signal, at least.
m
@Tim Edwards I understand what you are saying, but I think it's a little more general of a problem. The clock GPIO input has a big clock network delay compared to the data GPIO input, since the data input is "clocked" by a virtual clock signal without an actual implementation (it is external to my project and off chip). I'm just not sure how to model it in STA. Maybe I should just remove that input delay so it only analyzes flop-to-flop timing.
@tnt clock reconvergence pessimism removal should get rid of the pessimism from two corners by removing the shared part. This likely needs to be enabled, however.
t
@Matthew Guthaus: All of the GPIO pads share the same circuitry, so the only significant difference between timing of signals on external pins would be due to the finite drive strength of the gate. In this case, that gate would be the sky130_fd_sc_hd__buf_2 connected to user_gpio_in in the cell gpio_control_block. I don't know offhand how to set up the STA configuration to tell it to assume the drive strength of a buf_2 cell on all inputs, though.
Ideally, I guess, there should be a liberty file for the caravel padframe that specifies all the timing between the chip pads and the internal connections.
m
This isn't an issue with the GPIO models. I'm modeling user_project_wrapper in isolation. The issue is that the external clock is ideal but the internal clock is not. This results in a lot of clock skew.
If I just exclude set_input_delay it will not consider input timing paths.
t
When I wrote vesta for qflow, I just had it report input-pin-to-flop and flop-to-output-pin timing values and let the end-user decide whether or not they're acceptable. There's no way to automatically establish timing between externally-applied signals, but I would assume that for OpenSTA there's some way to do that in the SDC file. Otherwise, yes, you either have to tell it to ignore input timing paths or get very specific and detailed about which paths are false timing paths.