# design-review
a
@User @User I'm trying to resolve hold violations I see when doing hierarchical STA on my entire design. All openlane runs, including the top level integration, pass STA without any hold violations; the issue is at the interface between the top level and the macros.
Here's an example:
Startpoint: _134694_ (rising edge-triggered flip-flop clocked by user_clock2)
Endpoint: microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/_603_
          (rising edge-triggered flip-flop clocked by user_clock2)
Path Group: user_clock2
Path Type: min
Corner: tt

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock user_clock2 (rise edge)
                          0.00    0.00   clock source latency
                  0.47    0.34    0.34 ^ user_clock2 (in)
     1    0.10                           user_clock2 (net)
                  0.49    0.00    0.34 ^ repeater12/A (sky130_fd_sc_hd__buf_12)
                  0.40    0.26    0.60 ^ repeater12/X (sky130_fd_sc_hd__buf_12)
     1    0.39                           net587 (net)
                  0.69    0.28    0.88 ^ clkbuf_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.37    0.42    1.30 ^ clkbuf_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
     4    0.37                           clknet_0_user_clock2 (net)
                  0.47    0.14    1.44 ^ clkbuf_2_0_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_8)
                  0.28    0.37    1.82 ^ clkbuf_2_0_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_8)
     2    0.16                           clknet_2_0_0_user_clock2 (net)
                  0.28    0.03    1.84 ^ clkbuf_3_1_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_8)
                  0.76    0.55    2.39 ^ clkbuf_3_1_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_8)
     4    0.45                           clknet_3_1_0_user_clock2 (net)
                  0.80    0.12    2.52 ^ clkbuf_5_4__f_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.44    0.53    3.04 ^ clkbuf_5_4__f_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
    19    0.44                           clknet_5_4__leaf_user_clock2 (net)
                  0.45    0.05    3.09 ^ clkbuf_leaf_6_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.11    0.28    3.37 ^ clkbuf_leaf_6_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
    25    0.09                           clknet_leaf_6_user_clock2 (net)
                  0.12    0.00    3.37 ^ _134694_/CLK (sky130_fd_sc_hd__dfxtp_2)
                  0.11    0.39    3.76 v _134694_/Q (sky130_fd_sc_hd__dfxtp_2)
     1    0.04                           microwatt_0.soc0.processor.execute1_0.multiply_0._00_[116] (net)
                  0.11    0.00    3.76 v microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/input111/A (sky130_fd_sc_hd__clkbuf_1)
                  0.03    0.11    3.87 v microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/input111/X (sky130_fd_sc_hd__clkbuf_1)
     1    0.00                           microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/net111 (net)
                  0.03    0.00    3.87 v microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/_603_/D (sky130_fd_sc_hd__dfxtp_4)
                                  3.87   data arrival time

                          0.00    0.00   clock user_clock2 (rise edge)
                          0.00    0.00   clock source latency
                  0.47    0.38    0.38 ^ user_clock2 (in)
     1    0.10                           user_clock2 (net)
                  0.49    0.00    0.38 ^ repeater12/A (sky130_fd_sc_hd__buf_12)
                  0.40    0.29    0.66 ^ repeater12/X (sky130_fd_sc_hd__buf_12)
     1    0.39                           net587 (net)
                  0.69    0.31    0.97 ^ clkbuf_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.37    0.47    1.44 ^ clkbuf_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
     4    0.37                           clknet_0_user_clock2 (net)
                  0.47    0.16    1.59 ^ clkbuf_2_0_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_8)
                  0.28    0.41    2.01 ^ clkbuf_2_0_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_8)
     2    0.16                           clknet_2_0_0_user_clock2 (net)
                  0.28    0.03    2.04 ^ clkbuf_3_1_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_8)
                  0.76    0.61    2.64 ^ clkbuf_3_1_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_8)
     4    0.45                           clknet_3_1_0_user_clock2 (net)
                  0.80    0.14    2.78 ^ clkbuf_5_4__f_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.44    0.58    3.36 ^ clkbuf_5_4__f_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
    19    0.44                           clknet_5_4__leaf_user_clock2 (net)
                  0.45    0.06    3.42 ^ clkbuf_leaf_35_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.13    0.32    3.74 ^ clkbuf_leaf_35_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
    25    0.11                           clknet_leaf_35_user_clock2 (net)
                  0.13    0.00    3.74 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_0_clk/A (sky130_fd_sc_hd__clkbuf_16)
                  0.05    0.17    3.91 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_0_clk/X (sky130_fd_sc_hd__clkbuf_16)
     2    0.02                           microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clknet_0_clk (net)
                  0.05    0.00    3.91 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_1_1_0_clk/A (sky130_fd_sc_hd__clkbuf_2)
                  0.05    0.12    4.02 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_1_1_0_clk/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clknet_1_1_0_clk (net)
                  0.05    0.00    4.02 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_1_1_1_clk/A (sky130_fd_sc_hd__clkbuf_2)
                  0.18    0.21    4.23 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_1_1_1_clk/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.03                           microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clknet_1_1_1_clk (net)
                  0.18    0.00    4.23 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_2_3_0_clk/A (sky130_fd_sc_hd__clkbuf_2)
                  0.20    0.26    4.50 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_2_3_0_clk/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.04                           microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clknet_2_3_0_clk (net)
                  0.20    0.00    4.50 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_3_7_0_clk/A (sky130_fd_sc_hd__clkbuf_2)
                  0.96    0.81    5.31 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_3_7_0_clk/X (sky130_fd_sc_hd__clkbuf_2)
    14    0.18                           microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clknet_3_7_0_clk (net)
                  0.96    0.02    5.34 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_leaf_48_clk/A (sky130_fd_sc_hd__clkbuf_16)
                  0.08    0.32    5.66 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clkbuf_leaf_48_clk/X (sky130_fd_sc_hd__clkbuf_16)
    12    0.03                           microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clknet_leaf_48_clk (net)
                  0.08    0.00    5.66 ^ microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/_603_/CLK (sky130_fd_sc_hd__dfxtp_4)
                          0.25    5.91   clock uncertainty
                         -0.32    5.59   clock reconvergence pessimism
                         -0.03    5.56   library hold time
                                  5.56   data required time
-----------------------------------------------------------------------------
                                  5.56   data required time
                                 -3.87   data arrival time
-----------------------------------------------------------------------------
                                 -1.68   slack (VIOLATED)
The problem appears to be that the data delay to the macro FF is much shorter than the clock tree delay within the macro
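The slack arithmetic in the report above can be reproduced with a minimal sketch. All numbers are copied from the report; the function name `hold_slack` is only illustrative and is not part of any tool. The result comes out at about -1.69 ns rather than the reported -1.68 ns because the report rounds each column separately.

```python
def hold_slack(data_arrival, capture_clk, uncertainty, crpr, lib_hold):
    """Hold slack = data arrival - data required (all values in ns)."""
    # Required time: capture clock arrival at the flop, plus the adjustments
    # listed in the report (signed values taken exactly as printed).
    data_required = capture_clk + uncertainty + crpr + lib_hold
    return data_arrival - data_required


# Values from the report: _603_/CLK at 5.66, data arrival at _603_/D at 3.87.
slack = hold_slack(
    data_arrival=3.87,
    capture_clk=5.66,
    uncertainty=0.25,
    crpr=-0.32,
    lib_hold=-0.03,
)
print(f"hold slack = {slack:.2f} ns")
```

A negative result means the data reaches the flop before the required time, which is the hold violation being reported.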
m
What is the difference between "doing hierarchical STA" and "top level integration" ?
a
top level integration means I ran openlane to create my final design. That passed STA
But we have no information about the macro, since there is no liberty file generation yet
m
What was the macro timing modeling in the integration run?
so it was just a verilog module with no guts and no timing model?
a
That is where I might be going wrong, what should I be doing?
The macro inputs to openlane are a verilog file and a def file
m
We don't have a timing model generator so there is no hierarchical timing flow. It sounds like you just have a hole in your design from STA's perspective
a
Any way to resolve it? I can't add an SDC timing constraint on a macro I/O, can I?
m
@User will know better but I guess set_min_delay might work?
or make a very simple .lib by hand for the block
I'm glad at least you caught the problem
👍 1
a
efabless appear to do something in their SDC file, investigating: https://github.com/efabless/caravel_mgmt_soc_litex/blob/main/openlane/mgmt_core/base.sdc#L13
But I think they are hooking up the DFFRAM via external I/Os so they can constrain them via the SDC file
m
Tom (or Cherry) are the experts on SDC
a
Great, thanks @User
m
np, good luck
a
Thanks!
t
I think the best thing to do is ensure that the macro compilation is done with constraints that see the clock skew. In the macro OpenROAD run, does the SDC constrain the inputs? If so, the set_input_delay -min constraint or the clock uncertainty can be tightened to add buffers in the macro run. Another option is to run OpenROAD repair design flat at the end of the flow to fix these. This may need to be added as an OpenLane option in the future.
a
@User This is an issue with DFFRAM, which until recently ignored hold violations. @User and @User have been working on fixing it, but there are a number of configurations that are still broken. https://github.com/Cloud-V/DFFRAM/issues/139 has the details.
There is no concept of adding delays in DFFRAM, and it doesn't run the resizer at all to do timing repair. It sounds like DFFRAM needs to add some form of configurable delay to resolve the hold violations. I was trying to find a way around this issue, to resolve my MPW5 tape out issue. Assuming I can't easily modify DFFRAM (I tend to break the placer whenever I try modifying it 🙂 ), then I need some other method to get 3-4ns of delay on the data path into the DFFRAM.
d
We can schedule a working session if you'd like, Anton
a
Thanks @User! Right now I am just trying to add a number of delay stages to Di0 in RAM512. I've got my regex for those stages in rx.yml. I'm just not sure of what to do next. I presume I'm modifying HigherLevelPlaceable?
d
You would be correct. I'm fairly tired today, so are you available earlier tomorrow? I can walk you through modifying it and heck, even get a little feedback on how it can be made easier
a
@User I might have some time tomorrow, otherwise Monday. I'm in Australia, where it is 6AM Friday right now so earlier in the day is unlikely :)
Thanks!
d
No problem! Let's shoot for Monday then, I'm up late then anyway because I've got an evening lecture
👍 1
t
Try putting a set_min_delay constraint -to all of the DFFRAM data inputs in the SDC. That simulates what would happen if there were a liberty timing check.
a
@User am I allowed to specify SDC timing constraints on a macro? This is how the DFFRAM is being integrated:
RAM512 memory_0 (
`ifdef USE_POWER_PINS
    .VPWR(vccd1),
    .VGND(vssd1),
`endif
    .A0(addr_buf),
    .CLK(clk),
    .Di0(din_buf),
    .Do0(_1_),
    .EN0(_2_),
    .WE0(sel_qual)
  );
t
Put it in the top-level SDC, pointing to the instance's pins from that view.
a
Thanks @User will try that now
t
Wildcards will help.
a
Thanks for the suggestion @User, this looks to have worked:
+# Work around hold violations in DFFRAMs by forcing a delay on all inputs
+set ram512_delay 5
+set_min_delay -to microwatt_0.soc0.bram.bram0.ram_0.memory_0/EN0* $ram512_delay
+set_min_delay -to microwatt_0.soc0.bram.bram0.ram_0.memory_0/A0* $ram512_delay
+set_min_delay -to microwatt_0.soc0.bram.bram0.ram_0.memory_0/Di0* $ram512_delay
+set_min_delay -to microwatt_0.soc0.bram.bram0.ram_0.memory_0/WE0* $ram512_delay
+
+set ram32_delay 5
+set_min_delay -to microwatt_0.soc0.processor.dcache_0.rams:1.way.cache_ram_0/A0* $ram32_delay
+set_min_delay -to microwatt_0.soc0.processor.dcache_0.rams:1.way.cache_ram_0/A1* $ram32_delay
+set_min_delay -to microwatt_0.soc0.processor.dcache_0.rams:1.way.cache_ram_0/Di0* $ram32_delay
+set_min_delay -to microwatt_0.soc0.processor.dcache_0.rams:1.way.cache_ram_0/EN0* $ram32_delay
+set_min_delay -to microwatt_0.soc0.processor.dcache_0.rams:1.way.cache_ram_0/EN1* $ram32_delay
+set_min_delay -to microwatt_0.soc0.processor.dcache_0.rams:1.way.cache_ram_0/WE0* $ram32_delay
+
+set_min_delay -to microwatt_0.soc0.processor.icache_0.rams:1.way.cache_ram_0/A0* $ram32_delay
+set_min_delay -to microwatt_0.soc0.processor.icache_0.rams:1.way.cache_ram_0/A1* $ram32_delay
+set_min_delay -to microwatt_0.soc0.processor.icache_0.rams:1.way.cache_ram_0/Di0* $ram32_delay
+set_min_delay -to microwatt_0.soc0.processor.icache_0.rams:1.way.cache_ram_0/EN0* $ram32_delay
+set_min_delay -to microwatt_0.soc0.processor.icache_0.rams:1.way.cache_ram_0/EN1* $ram32_delay
+set_min_delay -to microwatt_0.soc0.processor.icache_0.rams:1.way.cache_ram_0/WE0* $ram32_delay
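Since the constraints above repeat the same pattern per instance and pin prefix, the lines could be generated rather than typed by hand. This is a minimal sketch only: the helper name `min_delay_lines` is hypothetical, the instance path, pin prefixes and delay variable are copied from the snippet above, and the script just prints SDC text to paste into the top-level SDC. It does not check that the pins exist.

```python
def min_delay_lines(instance, pins, delay_var):
    """Return one set_min_delay line per pin prefix, in the style used above."""
    return [f"set_min_delay -to {instance}/{pin}* ${delay_var}" for pin in pins]


# Instance path and pin prefixes copied from the RAM512 constraints above.
ram512 = "microwatt_0.soc0.bram.bram0.ram_0.memory_0"
lines = ["set ram512_delay 5"]
lines += min_delay_lines(ram512, ["EN0", "A0", "Di0", "WE0"], "ram512_delay")
print("\n".join(lines))
```

The same helper can be called once per RAM32 instance with its own pin list and delay variable.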
@User I notice a timing issue in the multiplier macro. This macro shouldn't have any hold violations in it, so I'm a bit surprised I see them when doing my full STA:
Startpoint: _130429_ (rising edge-triggered flip-flop clocked by user_clock2)
Endpoint: microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/_554_
          (rising edge-triggered flip-flop clocked by user_clock2)
Path Group: user_clock2
Path Type: min
Corner: tt

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock user_clock2 (rise edge)
                          0.00    0.00   clock source latency
                  0.50    0.36    0.36 ^ user_clock2 (in)
     1    0.11                           user_clock2 (net)
                  0.52    0.00    0.36 ^ repeater12/A (sky130_fd_sc_hd__buf_12)
                  0.44    0.35    0.71 ^ repeater12/X (sky130_fd_sc_hd__buf_12)
     1    0.44                           net637 (net)
                  0.74    0.29    1.00 ^ clkbuf_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.33    0.44    1.43 ^ clkbuf_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
     4    0.32                           clknet_0_user_clock2 (net)
                  0.37    0.09    1.52 ^ clkbuf_2_2_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_8)
                  0.47    0.47    2.00 ^ clkbuf_2_2_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_8)
     4    0.28                           clknet_2_2_0_user_clock2 (net)
                  0.47    0.03    2.03 ^ clkbuf_4_11_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_8)
                  0.20    0.33    2.36 ^ clkbuf_4_11_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_8)
     2    0.11                           clknet_4_11_0_user_clock2 (net)
                  0.20    0.01    2.37 ^ clkbuf_5_22__f_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.16    0.25    2.62 ^ clkbuf_5_22__f_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
     7    0.14                           clknet_5_22__leaf_user_clock2 (net)
                  0.16    0.01    2.63 ^ clkbuf_leaf_87_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.13    0.22    2.85 ^ clkbuf_leaf_87_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
    25    0.11                           clknet_leaf_87_user_clock2 (net)
                  0.13    0.00    2.85 ^ _130429_/CLK (sky130_fd_sc_hd__dfxtp_1)
                  0.04    0.32    3.17 v _130429_/Q (sky130_fd_sc_hd__dfxtp_1)
     1    0.01                           microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0._00_[67] (net)
                  0.04    0.00    3.17 v microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/U_HOLD_FIX_BUF_0_21/A (sky130_fd_sc_hd__dlygate4sd3_1)
                  0.08    0.56    3.74 v microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/U_HOLD_FIX_BUF_0_21/X (sky130_fd_sc_hd__dlygate4sd3_1)
     1    0.01                           microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/net_HOLD_NET_0_21 (net)
                  0.08    0.00    3.74 v microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/input87/A (sky130_fd_sc_hd__buf_2)
                  0.03    0.13    3.87 v microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/input87/X (sky130_fd_sc_hd__buf_2)
     1    0.00                           microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/net87 (net)
                  0.03    0.00    3.87 v microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/hold248/A (sky130_fd_sc_hd__dlygate4sd3_1)
                  0.11    0.58    4.45 v microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/hold248/X (sky130_fd_sc_hd__dlygate4sd3_1)
     1    0.02                           microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/net1178 (net)
                  0.11    0.00    4.45 v microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/hold249/A (sky130_fd_sc_hd__clkbuf_2)
                  0.11    0.19    4.64 v microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/hold249/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.03                           microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/net1177 (net)
                  0.11    0.00    4.64 v microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/_554_/D (sky130_fd_sc_hd__dfxtp_4)
                                  4.64   data arrival time

                          0.00    0.00   clock user_clock2 (rise edge)
                          0.00    0.00   clock source latency
                  0.50    0.40    0.40 ^ user_clock2 (in)
     1    0.11                           user_clock2 (net)
                  0.52    0.00    0.40 ^ repeater12/A (sky130_fd_sc_hd__buf_12)
                  0.44    0.38    0.79 ^ repeater12/X (sky130_fd_sc_hd__buf_12)
     1    0.44                           net637 (net)
                  0.74    0.32    1.10 ^ clkbuf_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.33    0.48    1.58 ^ clkbuf_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
     4    0.32                           clknet_0_user_clock2 (net)
                  0.37    0.10    1.68 ^ clkbuf_2_2_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_8)
                  0.47    0.52    2.21 ^ clkbuf_2_2_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_8)
     4    0.28                           clknet_2_2_0_user_clock2 (net)
                  0.47    0.03    2.24 ^ clkbuf_4_10_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_8)
                  0.29    0.43    2.67 ^ clkbuf_4_10_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_8)
     2    0.17                           clknet_4_10_0_user_clock2 (net)
                  0.30    0.03    2.70 ^ clkbuf_5_20__f_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.48    0.52    3.22 ^ clkbuf_5_20__f_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
    16    0.48                           clknet_5_20__leaf_user_clock2 (net)
                  0.48    0.02    3.24 ^ clkbuf_leaf_82_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.14    0.33    3.57 ^ clkbuf_leaf_82_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
    25    0.13                           clknet_leaf_82_user_clock2 (net)
                  0.14    0.00    3.57 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_0_clk/A (sky130_fd_sc_hd__clkbuf_16)
                  0.05    0.17    3.74 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_0_clk/X (sky130_fd_sc_hd__clkbuf_16)
     2    0.02                           microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clknet_0_clk (net)
                  0.05    0.00    3.75 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_1_1_0_clk/A (sky130_fd_sc_hd__clkbuf_2)
                  0.05    0.12    3.86 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_1_1_0_clk/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clknet_1_1_0_clk (net)
                  0.05    0.00    3.86 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_1_1_1_clk/A (sky130_fd_sc_hd__clkbuf_2)
                  0.17    0.20    4.06 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_1_1_1_clk/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.03                           microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clknet_1_1_1_clk (net)
                  0.17    0.00    4.07 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_2_2_0_clk/A (sky130_fd_sc_hd__clkbuf_2)
                  0.19    0.26    4.33 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_2_2_0_clk/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.03                           microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clknet_2_2_0_clk (net)
                  0.19    0.00    4.33 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_3_5_0_clk/A (sky130_fd_sc_hd__clkbuf_2)
                  1.12    0.93    5.26 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_3_5_0_clk/X (sky130_fd_sc_hd__clkbuf_2)
    15    0.21                           microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clknet_3_5_0_clk (net)
                  1.12    0.01    5.26 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_leaf_73_clk/A (sky130_fd_sc_hd__clkbuf_16)
                  0.08    0.34    5.60 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clkbuf_leaf_73_clk/X (sky130_fd_sc_hd__clkbuf_16)
     8    0.03                           microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/clknet_leaf_73_clk (net)
                  0.08    0.00    5.60 ^ microwatt_0.soc0.processor.with_fpu.fpu_0.fpu_multiply_0.multiplier/_571_/CLK (sky130_fd_sc_hd__dfxtp_4)
                          0.25    5.85   clock uncertainty
                         -0.21    5.64   clock reconvergence pessimism
                         -0.05    5.59   library hold time
                                  5.59   data required time
-----------------------------------------------------------------------------
                                  5.59   data required time
                                 -4.64   data arrival time
-----------------------------------------------------------------------------
                                 -0.95   slack (VIOLATED)
Data changes at the macro at 3.17, but the clock is later at 3.57. My question is, why didn't the resizer add delay buffers to fix this? Do all macros integrated in the design need constraints in the SDC file? I guessed it would fix this for me without an explicit constraint (just like it would for a FF)
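To see where the 0.95 ns goes, the report can be split at the macro boundary. This is a rough sketch using only numbers copied from the report above; the variable names are illustrative.

```python
# All values in ns, copied from the report above.
data_at_macro_pin = 3.17  # data reaches the multiplier input (U_HOLD_FIX_BUF_0_21/A)
clk_at_macro_pin = 3.57   # clock reaches multiplier/clkbuf_0_clk/A
data_at_ff = 4.64         # data arrival at the endpoint D pin
clk_at_ff = 5.60          # clock arrival at the capture flop CLK pin

# Delay accumulated inside the macro on the data side and the clock side.
data_inside = data_at_ff - data_at_macro_pin
clk_inside = clk_at_ff - clk_at_macro_pin

# Report adjustments: clock uncertainty, reconvergence pessimism, library hold.
adjust = 0.25 + (-0.21) + (-0.05)

# How early the data is at the boundary, and how much slower the clock is inside.
boundary_offset = data_at_macro_pin - clk_at_macro_pin
internal_margin = data_inside - clk_inside

slack = boundary_offset + internal_margin - adjust
print(f"boundary offset  = {boundary_offset:.2f} ns")
print(f"internal margin  = {internal_margin:.2f} ns")
print(f"hold slack       = {slack:.2f} ns")
```

With these numbers the data is 0.40 ns ahead of the clock at the macro pins, and the clock path inside the macro is a further 0.56 ns longer than the data path, which together with the small adjustments accounts for the -0.95 ns slack.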
t
The resizer will fix it if the path is constrained when it is run. If the macro has no timing model at the top level, then the path is not constrained. Openlane could add a final resizer run in the signoff script, which is also commonly done in the proprietary flows.
a
@User Ahh! So until we have the ability to create liberty files in openroad, all macros will need explicit constraints at the top level?
t
Yes that is right. A workaround is also to run a final repair design after the design is flattened and just before signoff.
a
How does that work? Do you have to then go back to detailed placement, global/detailed routing? Or are the tools intelligent enough to do local fixups?
t
We can repair using incremental global routing, leaving only detailed routing to be redone. This is already in use in openlane.
a
Makes sense, so we'd use the existing ECO flow? I guess we just need the verilog and SPEF files for the macros. I couldn't get hierarchical STA working with openlane (works with standalone sta). From @User:
You need to run it in standalone 'sta'. OR doesn't support verilog hierarchy but sta does
m
@User if we go back to verilog there will be no placement or routing.
a
@User I assumed you need the gate level verilog for the macros to do the hierarchical STA. Not thinking of going back to the original verilog.
m
Regardless, if you plan to do any optimization you won't want to go back to verilog. It's only good for verification with SPEF
a
@User Ok. I only mention verilog because that is what the hierarchical STA doc mentions (read macro verilog+ macro SPEF). The discussion here is about using the output of hierarchical STA to drive ECO fixes.
m
for timing signoff not design changes
a
Ahh yes
@User while this works:
set_min_delay -to microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/a $multiplier_delay
I was wondering if I could specify the constraint with reference to the clock that is presented to the macro:
set_min_delay -from microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clk -to microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/a $multiplier_delay
I couldn't get that to work.
@User I notice very large max slew and max capacitance violations on wires to/from the DFFRAM hard macro. I presume I need to add constraints. Instead of set_load it looks like I can use set_max_capacitance on a net. I'm not sure how to constrain the hard macro inputs though (set_driving_cell but on a net)
t
set_driving_cell is for primary inputs. You can use set_max_capacitance on the net attached to the input pin of the cell.
a
Thanks, Tom I will try that
Also, is it reasonable to set_min_delay a net with reference to another net, e.g.:
set_min_delay -from microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/clk -to microwatt_0.soc0.processor.execute1_0.multiply_0.multiplier/a $multiplier_delay
t
You can use get_nets -of_objects [get_pins instance/*] to get the nets automatically from the pins.
a
Oh. I wonder if that is a problem with my set_min_delay usage, I'm not using the nets
t
That set_min_delay should work, but it may not, depending on OpenSTA and how complete its SDC support is.
a
Thanks!