How can I debug a segfault with re-placement? (open-source-silicon.dev, #openroad)

Matthew Guthaus

11/09/2021, 7:41 PM

How can I debug a segfault with re-placement?
during executing: "openroad -exit /openlane/scripts/openroad/or_replace.tcl |& tee >&@stdout /project/openlane/user_project_wrapper/runs/user_project_wrapper/logs/placement/16-replace.log"
Last 10 lines:
child killed: segmentation violation

Matt Liberty

11/09/2021, 7:42 PM

is it an out of memory situation in docker? child killed sounds like an external agent

Matt Liberty

11/09/2021, 7:43 PM

large designs can trigger that

Matthew Guthaus

11/09/2021, 7:43 PM

possibly.

Matt Liberty

11/09/2021, 7:43 PM

OL has a DOCKER_MEMORY variable to allow a user defined amount

👍 1

Matthew Guthaus

11/09/2021, 7:46 PM

Hm, seems like it defaults to 64G

Matt Liberty

11/09/2021, 7:48 PM

a big enough design could exceed the limit. What is the tail of 16-replace.log

Matthew Guthaus

11/09/2021, 7:48 PM

This is 10 macros plus a few hundred gates

Matt Liberty

11/09/2021, 7:50 PM

so probably not memory then.

Matt Liberty

11/09/2021, 7:50 PM

log tail?

Matthew Guthaus

11/09/2021, 7:50 PM

There's a bunch of infos that have always existed:
[INFO GRT-0209] Ignoring an obstruction on layer met5 outside the die area.
[INFO GRT-0209] Ignoring an obstruction on layer met5 outside the die area.
[INFO GRT-0209] Ignoring an obstruction on layer met5 outside the die area.
[INFO GRT-0209] Ignoring an obstruction on layer met5 outside the die area.
[INFO GRT-0209] Ignoring an obstruction on layer met5 outside the die area.
Lots of warnings about pins outside the area (from the caravel harness)

Matt Liberty

11/09/2021, 7:53 PM

It could be in the global router (which placement calls) based on this. Can you try without PL_ROUTABILITY_DRIVEN ? That would narrow it down

Matt Liberty

11/09/2021, 7:54 PM

Either way I think we'll need a testcase to debug if possible.

Matthew Guthaus

11/09/2021, 7:54 PM

Yeah. I'm re-running my MPW2 design with the new timing updates, so I'll debug a bit more and provide something.

Matthew Guthaus

11/09/2021, 7:57 PM

Same error when PL_ROUTABILITY_DRIVEN is set to false

Matt Liberty

11/09/2021, 7:57 PM

what is the log tail then?

Matthew Guthaus

11/09/2021, 7:58 PM

Nothing seems to have changed...

Matt Liberty

11/09/2021, 7:58 PM

That doesn't make sense. You should get no GRT messages in that case

Matthew Guthaus

11/09/2021, 7:59 PM

Just this in my config, right? set ::env(PL_ROUTABILITY_DRIVEN) 0

Matthew Guthaus

11/09/2021, 8:00 PM

Oh the default is false anyways

Matt Liberty

11/09/2021, 8:02 PM

or_replace.tcl is a mess. I think it doesn't turn off correctly.

Matt Liberty

11/09/2021, 8:05 PM

https://github.com/The-OpenROAD-Project/OpenLane/issues/697

Matt Liberty

11/09/2021, 8:06 PM

Can you provide a test case for this and open an issue?

Matt Liberty

11/09/2021, 8:06 PM

https://github.com/The-OpenROAD-Project/OpenLane/blob/master/docs/source/using_or_issue.md

Matthew Guthaus

11/10/2021, 12:25 AM

Unfortunately, it seems to not fail any longer and gets to stage 29-opendp but then also segfaults

Matt Liberty

11/10/2021, 12:25 AM

what changed?

Matthew Guthaus

11/10/2021, 12:26 AM

I modified some verilog tests. That is all

Matthew Guthaus

11/10/2021, 12:26 AM

Completely unrelated

Matt Liberty

11/10/2021, 12:26 AM

that sounds suspicious.... what is the opendp failure?

Matthew Guthaus

11/10/2021, 12:26 AM

I also ran make clean?

Matthew Guthaus

11/10/2021, 12:30 AM

The opendp failure is related to escaping some signal names. My clock is io_in[17]. I need to escape the [] in config.tcl but not in base.sdc or else STA won't recognize the clock. However, or_opendp.tcl complains if I don't have it escaped:

invalid command name "17"
    while executing
"17"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 ${cmd}"
    (procedure "set_log" line 3)
    invoked from within
"set_log ::env($index) $escaped_env_var $::env(GLB_CFG_FILE) 1"
    (procedure "save_state" line 9)
    invoked from within
"save_state"
    (procedure "flow_fail" line 6)
    invoked from within
"flow_fail"
    (procedure "try_catch" line 25)
    invoked from within
"try_catch $::env(OPENROAD_BIN) -exit $::env(SCRIPTS_DIR)/openroad/or_opendp.tcl |& tee $::env(TERMINAL_OUTPUT) [index_file $::env(opendp_log_file_tag)..."

Matthew Guthaus

11/10/2021, 12:31 AM

This actually could be what changed for global routing too...

Matt Liberty

11/10/2021, 12:32 AM

opendb doesn't do anything with timing.

Matt Liberty

11/10/2021, 12:32 AM

opendp rather

Matthew Guthaus

11/10/2021, 12:32 AM

Why did it read my SDC?

Matt Liberty

11/10/2021, 12:34 AM

what is in the opendp log?

Matt Liberty

11/10/2021, 12:34 AM

I don't see a read_sdc in or_opendp.tcl

Matthew Guthaus

11/10/2021, 12:35 AM

Just a segfault with no real context

Matt Liberty

11/10/2021, 12:35 AM

where is invalid command name "17" coming from?

Matthew Guthaus

11/10/2021, 12:35 AM

That is the text in my signal name: io_in[17]

Matt Liberty

11/10/2021, 12:36 AM

I understand but who is trying to execute that name?

Matthew Guthaus

11/10/2021, 12:37 AM

or_opendp.tcl... Rerunning to get the remainder of the stack trace I didn't paste.

Matt Liberty

11/10/2021, 12:38 AM

Do you see anything in that script that would access the name of your clock? I'm looking at the version in master and I don't see anything

Matthew Guthaus

11/10/2021, 12:38 AM

invalid command name "17"
    while executing
"17"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 ${cmd}"
    (procedure "set_log" line 3)
    invoked from within
"set_log ::env($index) $escaped_env_var $::env(GLB_CFG_FILE) 1"
    (procedure "save_state" line 9)
    invoked from within
"save_state"
    (procedure "flow_fail" line 6)
    invoked from within
"flow_fail"
    (procedure "try_catch" line 25)
    invoked from within
"try_catch $::env(OPENROAD_BIN) -exit $::env(SCRIPTS_DIR)/openroad/or_opendp.tcl |& tee $::env(TERMINAL_OUTPUT) [index_file $::env(opendp_log_file_tag)..."
    (procedure "detailed_placement_or" line 6)
    invoked from within
"detailed_placement_or"
    (procedure "run_routing" line 32)
    invoked from within
"run_routing"
    (procedure "run_routing_step" line 10)
    invoked from within
"[lindex $step_exe 0] [lindex $step_exe 1] "
    (procedure "run_non_interactive_mode" line 43)
    invoked from within
"run_non_interactive_mode {*}$argv"
    invoked from within
"if { [info exists flags_map(-interactive)] || [info exists flags_map(-it)] } {
        puts_info "Running interactively"
        if { [info exists arg_values(-file)..."
    (file "/openlane/flow.tcl" line 356)
make[1]: *** [Makefile:43: user_project_wrapper] Error 1
make[1]: Leaving directory '/home/mrg/openram_testchip/openlane'
make: *** [Makefile:70: user_project_wrapper] Error 2

Matthew Guthaus

11/10/2021, 12:39 AM

The log output is useless.

Matthew Guthaus

11/10/2021, 12:40 AM

It's reporting overlaps then segfaults [WARNING DPL-0005] Overlap check failed (16972).

Matthew Guthaus

11/10/2021, 12:41 AM

repeater448 overlaps ANTENNA_repeater448_A
repeater449 overlaps ANTENNA_repeater449_A
repeater451 overlaps ANTENNA_repeater451_A
[ERROR]: during executing: "openroad -exit /openlane/scripts/openroad/or_opendp.tcl |& tee >&@stdout /project/openlane/user_project_wrapper/runs/user_project_wrapper/logs/placement/29-opendp.log"
[ERROR]: Exit code: 1
[ERROR]: Last 10 lines:
child process exited abnormally
[ERROR]: Please check openroad log file
[ERROR]: Dumping to /project/openlane/user_project_wrapper/runs/user_project_wrapper/error.log

Matt Liberty

11/10/2021, 12:42 AM

I guess the 17 is a red herring. For the crash a test case is best as I can't guess from this what the problem is

Matt Liberty

11/10/2021, 12:43 AM

you mentioned having macros - are there placement sites in the channels ?

Matt Liberty

11/10/2021, 12:43 AM

I have seen a case recently where the channel was so narrow no instances could be placed there

Matthew Guthaus

11/10/2021, 12:44 AM

It's big. This successfully routed for MPW2

Matthew Guthaus

11/10/2021, 12:45 AM

One dumb question. There is now a config.json in addition to config.tcl with duplicate information. Why are they both there?

Matt Liberty

11/10/2021, 12:45 AM

sorry but I guess I need to look at it

Matt Liberty

11/10/2021, 12:46 AM

I am not much of an openlane expert, I mostly work on openroad. @User can you explain config.json vs config.tcl?

Matthew Guthaus

11/10/2021, 12:46 AM

Thanks for your help. I'll add a test case and/or debug a bit more

Matt Liberty

11/10/2021, 12:47 AM

donn

11/10/2021, 8:00 AM

The .json is there so users can be allowed to customize things on platforms where freely modifiable Tcl would constitute a security concern, for example the efabless platform and the OpenLane cloud runner. It’s just an alternative that you’re free to pick.

Matthew Guthaus

11/10/2021, 11:55 AM

@User what happens if both are there like in the example?

donn

11/10/2021, 11:57 AM

tcl's prioritized

donn

11/10/2021, 11:59 AM

Do note that I mean only Tcl will be loaded. JSON will be ignored entirely. If the tcl config's missing, it will attempt to load a json config. If the json config's missing as well, flow.tcl will throw an error.
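
The lookup order described above can be sketched in Tcl. This is a minimal illustration and not OpenLane's actual code: the proc name `pick_config` and its `design_dir` argument are hypothetical, and only the order (Tcl first, then JSON, then an error) comes from the message.

```tcl
# Hypothetical helper; illustrates the described priority only.
proc pick_config {design_dir} {
    set tcl_cfg  [file join $design_dir config.tcl]
    set json_cfg [file join $design_dir config.json]
    if {[file exists $tcl_cfg]} {
        # A Tcl config wins; any JSON config next to it is ignored.
        return $tcl_cfg
    } elseif {[file exists $json_cfg]} {
        # JSON is only considered when the Tcl config is missing.
        return $json_cfg
    }
    error "no config.tcl or config.json found in $design_dir"
}
```

Calling `pick_config` on a directory that holds both files returns the path to `config.tcl`, which matches the "only Tcl will be loaded" behaviour described in the thread.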

Matthew Guthaus

11/10/2021, 5:12 PM

@User yeah, thanks for the clarification.

Matthew Guthaus

11/10/2021, 7:41 PM

After wrestling with a fresh install of openlane/pdk, I can reproduce this with or_opendp again:

invalid command name "17"
    while executing
"17"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 ${cmd}"
    (procedure "set_log" line 3)
    invoked from within
"set_log ::env($index) $escaped_env_var $::env(GLB_CFG_FILE) 1"
    (procedure "save_state" line 9)
    invoked from within
"save_state"
    (procedure "flow_fail" line 6)
    invoked from within
"flow_fail"
    (procedure "try_catch" line 25)
    invoked from within
"try_catch $::env(OPENROAD_BIN) -exit $::env(SCRIPTS_DIR)/openroad/or_opendp.tcl |& tee $::env(TERMINAL_OUTPUT) [index_file $::env(opendp_log_file_tag)..."
    (procedure "detailed_placement_or" line 6)
    invoked from within
"detailed_placement_or"
    (procedure "run_routing" line 32)
    invoked from within
"run_routing"
    (procedure "run_routing_step" line 10)
    invoked from within
"[lindex $step_exe 0] [lindex $step_exe 1] "
    (procedure "run_non_interactive_mode" line 43)
    invoked from within
"run_non_interactive_mode {*}$argv"
    invoked from within
"if { [info exists flags_map(-interactive)] || [info exists flags_map(-it)] } {
        puts_info "Running interactively"
        if { [info exists arg_values(-file)..."
    (file "/openlane/flow.tcl" line 356)
make[1]: *** [Makefile:43: user_project_wrapper] Error 1
make[1]: Leaving directory '/home/mrg/openram_testchip/openlane'
make: *** [Makefile:70: user_project_wrapper] Error 2

The "17" is in the name of my clock in my base.sdc or my config.tcl file:

set ::env(CLOCK_PORT) {io_in[17]}

If I don't use the base.sdc, it still does the above so it must be something with the config.tcl. I have the name escaped there:

set ::env(CLOCK_PORT) {io_in\[17\]}

If I look at the generated SDC files, however, the name is unescaped:

mrg@diode ~/openram_testchip/openlane/user_project_wrapper/runs/user_project_wrapper (main)$ find . -name \*.sdc -exec grep create_clock {} \; -print
create_clock -name io_in[17] -period 30.0000 [get_ports {io_in[17]}]
./results/cts/user_project_wrapper.cts.sdc
create_clock -name io_in[17] -period 30.0000 [get_ports {io_in[17]}]
./tmp/floorplan/4-verilog2def.sdc
create_clock -name io_in[17] -period 30.0000 [get_ports {io_in[17]}]
./tmp/placement/23-resizer_timing.sdc
create_clock -name io_in[17] -period 30.0000 [get_ports {io_in[17]}]
./tmp/placement/21-resizer_timing.sdc
create_clock -name io_in[17] -period 30.0000 [get_ports {io_in[17]}]
./tmp/placement/16-resizer.sdc

So there are two questions: 1. why isn't write_sdc escaping the name properly? 2. why is or_opendp using the SDC at all?

Matthew Guthaus

11/10/2021, 7:41 PM

@User ^^

Matt Liberty

11/10/2021, 7:52 PM

@User

Matthew Guthaus

11/10/2021, 7:54 PM

@User This may actually be a red herring like @User mentioned before. This looks like it is part of the "save_state" function which is probably trying to write out the SDC (or config.tcl) after an error. opendp probably doesn't use the SDC (or clock at all) but the fail triggers this saving. I'm running it now without any clock defined to see if I can identify and solve the real error.
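
The `invalid command name "17"` message itself is easy to reproduce in plain Tcl. The sketch below assumes the command is built by string interpolation and then re-evaluated, which is what the `uplevel #0 ${cmd}` frame in the stack trace suggests. The variable names are illustrative and this is not the OpenLane source.

```tcl
# Illustrative value; in the thread this would be the clock port name.
set value {io_in[17]}

# Interpolating the value into a command string and re-evaluating it makes
# Tcl parse [17] as command substitution, so it tries to run a command "17".
set cmd "set ::restored $value"
set failed [catch {uplevel #0 $cmd} msg]
puts "failed=$failed msg=$msg"

# Building the command with [list] quotes the value, so it survives
# re-evaluation unchanged.
set cmd [list set ::restored $value]
uplevel #0 $cmd
puts $::restored
```

The first attempt fails with the same message as in the stack trace, while the `list`-built command stores `io_in[17]` as intended. This would explain why a step that never touches the clock can still report the error once the flow tries to save its state.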

Matthew Guthaus

11/10/2021, 11:19 PM

OH, so this failure is actually during routing when it is trying to legalize the diodes. This is why timing is enabled...

Matthew Guthaus

11/10/2021, 11:21 PM

GAH, and it can't legalize the diodes because they are "sprayed" all over the macros and can't be moved outside of the macros.

Matt Liberty

11/10/2021, 11:23 PM

which DIODE_INSERTION_STRATEGY are you using?

Matthew Guthaus

11/10/2021, 11:25 PM

I was using spray because the others caused problems during the MPW2 tool flow

Matthew Guthaus

11/10/2021, 11:25 PM

But spray won't work if you have macros now

Matt Liberty

11/10/2021, 11:25 PM

which one is spray?

Matthew Guthaus

11/10/2021, 11:26 PM

Matt Liberty

11/10/2021, 11:26 PM

1. "A diode is inserted for each PIN and connected to it."?

Matthew Guthaus

11/10/2021, 11:26 PM

I'm uncertain how the others would work. If they put a diode on a macro it won't work

Matthew Guthaus

11/10/2021, 11:26 PM

"Spray diodes"

Matthew Guthaus

11/10/2021, 11:26 PM

Specifies the insertion strategy of diodes to be used in the flow. 0 = No diode insertion, 1 = Spray diodes, 2 = insert fake diodes and replace them with real diodes if needed. 3= use FastRoute Antenna Avoidance flow, 4 = Use Sylvian's Custom Script for diode insertion on design pins and smartly inserting needed diodes inside the design, 5 = a mix of strategy 2 and 4. (Default: 3)

Matt Liberty

11/10/2021, 11:26 PM

I'm looking at https://github.com/The-OpenROAD-Project/OpenLane/blob/master/docs/source/hardening_macros.md

Matthew Guthaus

11/10/2021, 11:26 PM

3 used to not work for some reason

Matt Liberty

11/10/2021, 11:27 PM

maybe you have an older version?

Matthew Guthaus

11/10/2021, 11:27 PM

https://github.com/The-OpenROAD-Project/OpenLane/blob/master/configuration/README.md

Matthew Guthaus

11/10/2021, 11:27 PM

Yes, I had an older version during MPW2 🙂

Matthew Guthaus

11/10/2021, 11:28 PM

Looks like those are in conflict with each other. Maybe spray puts them randomly and then tries to connect them to each pin?

Matthew Guthaus

11/10/2021, 11:29 PM

I think 3 used to not work because they were using a different router before, if I recall?

Matthew Guthaus

11/10/2021, 11:30 PM

TritonRoute vs FastRoute?

Matt Liberty

11/10/2021, 11:31 PM

I'm not sure of the history. @User would you clarify the doc discrepancy between the README and hardening_macros on DIODE_INSERTION_STRATEGY described above

Matt Liberty

11/10/2021, 11:32 PM

from what I can see in routing.tcl it looks to me like 1 is closer to "A diode is inserted for each PIN and connected to it." versus spraying but I guess that doesn't match your experience

Matt Liberty

11/10/2021, 11:32 PM

it looks like it is trying to put a diode on each pin directly and then let detailed placement legalize it

Matt Liberty

11/10/2021, 11:33 PM

that's potentially a lot of diodes so it might not be possible to legalize them in a dense enough design area

Matt Liberty

11/10/2021, 11:33 PM

however I would expect that would lead to diodes on the pins not deep inside the macros

Matthew Guthaus

11/10/2021, 11:59 PM

I see. I removed the diodes entirely and it seems to still have issues legalizing clock buffers (and filler?). I need to figure out that red herring save_state bug too though. I'm finally digging more into the openlane/openroad flow so I have a better understanding of things under the hood now.

Matt Liberty

11/10/2021, 11:59 PM

there should be no fillers when you are running detailed placement. They should happen after otherwise there will be no empty sites

Matthew Guthaus

11/11/2021, 12:00 AM

I get lots of:

repeater432 overlaps FILLER_2_3245
 repeater433 overlaps FILLER_406_3269
 repeater435 overlaps FILLER_631_2825
 repeater437 overlaps FILLER_2_4701

before it inelegantly gives up

Matthew Guthaus

11/11/2021, 12:00 AM

This is all after routing

Matt Liberty

11/11/2021, 12:00 AM

if you are inserting diodes after routing then you should delay filler insertion to after that

Matt Liberty

11/11/2021, 12:00 AM

the design will be 100% full after filler insertion and nothing else will fit

Matthew Guthaus

11/11/2021, 12:01 AM

That might be an openlane issue. This is the relevant stack:

invoked from within
"detailed_placement_or"
    (procedure "run_routing" line 32)
    invoked from within
"run_routing"
    (procedure "run_routing_step" line 10)

Matthew Guthaus

11/11/2021, 12:02 AM

There are calls to ins_fill_cells after ins_diode_cells but before detailed_placement_or

Matthew Guthaus

11/11/2021, 12:02 AM

So it may take up the space and not be able to legalize

Matthew Guthaus

11/11/2021, 12:04 AM

Yeah, it runs ins_fill_cells before legalization of the diodes. That is the problem.
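
The ordering problem identified above can be illustrated with a toy model. It is not OpenROAD's legalizer or OpenLane's flow: the procs `fill_row` and `legalize` are hypothetical, and a row is reduced to a list of sites where an empty string marks a free site.

```tcl
# Toy model only: a row is a list of sites, and "" marks a free site.
proc fill_row {row} {
    set out {}
    foreach site $row {
        if {$site eq ""} {
            lappend out FILLER
        } else {
            lappend out $site
        }
    }
    return $out
}

# Put a cell on the first free site, or raise an error if none is left.
proc legalize {row cell} {
    set idx [lsearch -exact $row ""]
    if {$idx < 0} {
        error "no free site for $cell"
    }
    return [lreplace $row $idx $idx $cell]
}

set row [list buf1 "" inv2 ""]

# Legalize the diode first, then fill the remaining gaps.
set good [fill_row [legalize $row diode1]]
puts $good

# Fill first: every site is taken, so legalization has nowhere to go.
set failed [catch {legalize [fill_row $row] diode1} msg]
puts "failed=$failed msg=$msg"
```

In the first order the diode lands on a free site and the filler takes what is left. In the second order the row is already full, which mirrors the "repeater overlaps FILLER" messages in the log.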

Matthew Guthaus

11/11/2021, 12:04 AM

@User ^^

Matt Liberty

11/11/2021, 1:07 AM

that sounds worth an issue if you don't hear from @User

Mitch Bailey

11/12/2021, 4:46 PM

@User @User Maybe this has been resolved, but I was able to reproduce the

invalid command name "17"
    while executing
"17"

error and have a workaround. It occurs when there is an existing <design>/runs/<tag>/config.tcl file. Deleting this file works for me. When this file is created, the clock port is defined as below, but it looks like the routine that reads this and rewrites it can't handle it:

set ::env(CLOCK_PORT) "io_in\[17\]"

I believe the permanent solution is to patch the save_state routine in scripts/tcl_commands/all.tcl with the following:

set escaped_env_var [string map {\[ \\\[} $escaped_env_var]
set escaped_env_var [string map {\] \\\]} $escaped_env_var]

I'll submit a PR once I test it.
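
To see what the two proposed `string map` lines do in isolation, here is a small stand-alone check. It assumes the escaped value is later written inside double quotes and re-read by Tcl, as in the `set ::env(CLOCK_PORT) "io_in\[17\]"` line above. `::restored_port` is a made-up variable, and the sketch does not show whether the patch fixes `save_state` itself.

```tcl
set escaped_env_var {io_in[17]}

# The two lines proposed in the thread: prefix each bracket with a backslash.
set escaped_env_var [string map {\[ \\\[} $escaped_env_var]
set escaped_env_var [string map {\] \\\]} $escaped_env_var]
puts $escaped_env_var

# Re-reading the value from a double-quoted word restores the original name.
set line "set ::restored_port \"$escaped_env_var\""
eval $line
puts $::restored_port
```

The escaped form carries a backslash before each bracket, and evaluating the generated line gives back `io_in[17]`. Note that this mapping only covers square brackets; values containing `$`, double quotes, or backslashes would need additional handling, or a command built with `list` instead.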

Matthew Guthaus

11/12/2021, 4:57 PM

Hi @User I had gotten to that point as well and even had that same fix but I don't see it resolving the issue. Sometimes I feel like the scripts are cached somewhere and don't seem to update when I run though...

Mitch Bailey

11/12/2021, 6:28 PM

@User I found another place (proc prep, all.tcl) that looks like it's trying to write out the config.tcl file. However, the same type of fix doesn't work as expected. I'll dig deeper tomorrow. Incidentally, I've noticed that the config.tcl file has a lot of duplicate entries and sometimes the values don't match.

👍 1

Matthew Guthaus

11/12/2021, 7:06 PM

@User Our day is starting so I'll let you know what I find today.
