# chipalooza
b
For some long-running CACE sims, I've seen this error popping up in the live log after a simulation test case completes, and data for that run is lost. Any idea what could be closing the pipes?
t
It looks like something killed the process. Did you monitor memory use while it was running? It is possible that the process was killed by the kernel's out-of-memory (OOM) manager. This is possible for a long-running simulation if you have opted to save everything. Could always be something else, but that's my hunch.
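One way to do that monitoring is a small logger left running in a second terminal. This is only a sketch, assuming a Linux-style `/proc/meminfo` is available (as it normally is inside WSL 2); the function names are made up for illustration and are not part of CACE.

```python
import time
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields: dict) -> str:
    """Format available RAM and free swap (converted to MiB) as one log line."""
    avail_mib = fields.get("MemAvailable", 0) // 1024
    swap_mib = fields.get("SwapFree", 0) // 1024
    return f"MemAvailable={avail_mib} MiB SwapFree={swap_mib} MiB"


def monitor(interval_s: float = 5.0, samples: int = 3,
            path: str = "/proc/meminfo") -> None:
    """Print a timestamped memory summary every interval_s seconds."""
    for _ in range(samples):
        fields = parse_meminfo(Path(path).read_text())
        print(time.strftime("%H:%M:%S"), summarize(fields))
        time.sleep(interval_s)
```

Calling `monitor(interval_s=5, samples=1000)` while the sims run would show whether `MemAvailable` and `SwapFree` both head toward zero just before the pipes close. If the kernel's OOM killer was responsible, the kernel log (for example via `dmesg`) usually records which process it killed.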
b
Will monitor next time. Yeah, I think WSL limits available memory to 12 GB or so, so it could well be.
t
If it's WSL, though, I'm even less sure how the OOM management works.
b
It does seem related to memory. On my machine, sets of ~50 long (400,000-row) sims in CACE can run out of memory. Exiting the CACE GUI between runs helps, but that makes it difficult to collect a full datasheet summary. There's a note in Microsoft's documentation about how WSL 2 can consume excess memory because it does not promptly release cached pages: https://github.com/microsoft/WSL/issues/4166 I can confirm idle memory usage is higher after use, even without actively running a CACE job. I'm trying some configuration settings with a `.wslconfig` file, described at https://learn.microsoft.com/en-us/windows/wsl/wsl-config and will report if this resolves my troubles.
t
I would suppose that a low-speed crystal oscillator transient startup would be a resource hog, but is there any way to reduce the data after the simulation? Or even during? The `linearize` command in ngspice might be useful here (or not; just spouting off ideas here...).
b
If I understand the manual right, `linearize` acts on a finished vector, but it doesn't accelerate the simulation or compress the data mid-sim. The post-simulation data handling is small enough; it's running the simultaneous transient sims that occupies significant memory. I haven't tried running single-threaded, but I expect the total runtime would be longer.
I added a `.wslconfig` file to my Windows home directory with a few lines that show improvements. `memory=20GB` gives a more generous chunk to WSL than the default 16GB for my system. `pageReporting=false` keeps Windows from nabbing memory from WSL whenever it can (I think). `autoMemoryReclaim=dropcache` is an experimental feature that seems to help the most. WSL seems to free up memory more aggressively after a job completes and I'm not seeing it climb over 6GB.
EDIT: It does climb over 6GB, but only during a batch of runs. Once WSL 2 runs out of swap space and memory, the running processes fail or their pipes close. With these configuration settings, the swap and memory clear after 10-15 seconds, and the last testbench run can be aborted from the CACE GUI. Resuming the run after memory clears at least gives a chance of success.
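The three keys are listed above without the section headers that `.wslconfig` expects. The sketch below renders them into file text using Python's `configparser`. Placing `memory` and `pageReporting` under `[wsl2]` and `autoMemoryReclaim` under `[experimental]` is an assumption based on my reading of the linked Microsoft page, so check it against that page before copying. `render_wslconfig` is a hypothetical helper name.

```python
import configparser
import io


def render_wslconfig(memory: str = "20GB") -> str:
    """Return .wslconfig text holding the three settings from this thread."""
    cfg = configparser.ConfigParser()
    # Preserve key capitalization (the default lowercases option names).
    cfg.optionxform = str
    # Section placement is an assumption; verify against the WSL docs.
    cfg["wsl2"] = {"memory": memory, "pageReporting": "false"}
    cfg["experimental"] = {"autoMemoryReclaim": "dropcache"}
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


print(render_wslconfig())
```

The printed text would go in `.wslconfig` in the Windows home directory. As far as I know, WSL only picks up changes after it has been shut down and restarted (for example with `wsl --shutdown`).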
t
Okay, so I guess the problem really was just memory management. I'm glad you figured it out (and hope that was pretty much the whole problem).
b
It still creeps up with the second or third testbench I run. Watching `htop`, there's a cace-gui process that climbs by 40-80 MB in resident memory for every completed ngspice run. I thought ngspice did the math (e.g. calculating the mean of a vector, or a .meas time/value) and sent a few words over to cace-gui through the `.data` file it writes with `wrdata`, but it seems like there's more going on than that.
When a testbench finishes, all the ngspice processes terminate, which releases 1-2 GB of memory. But mid-testbench, the memory clumping onto CACE is an issue. At the end of a testbench cycle, CACE does its thing and processes the data, then the resident memory of the highest-priority cace-gui process hikes up, and that carries forward into the next testbench. Eventually I'm all out of juice over the 45 or 135 corners.
Since adding more comprehensive corners, I haven't been able to generate results for every testbench in one sitting. I can still create an updated datasheet, but will do a piecewise data summary in the GitHub README. Is there a way for me to pare down the data CACE handles? I'm trying "Do not create plot files" (although I wasn't plotting anything). I am definitely wondering if there's anything else I should try.
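To put numbers on the per-run growth seen in `htop`, the resident set size can be read from `/proc/<pid>/status` after each completed run. This is a minimal sketch assuming a Linux `/proc` filesystem; the helper names are hypothetical and the sample values in the usage note are invented for illustration, not measured.

```python
import re
from pathlib import Path


def rss_kib(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from /proc/<pid>/status-style text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS field found")
    return int(match.group(1))


def read_rss_kib(pid: int) -> int:
    """Read the current resident set size of a live process."""
    return rss_kib(Path(f"/proc/{pid}/status").read_text())


def growth_per_run_mib(samples_kib: list) -> list:
    """Difference between consecutive RSS samples, converted to MiB."""
    return [(b - a) / 1024 for a, b in zip(samples_kib, samples_kib[1:])]
```

Sampling `read_rss_kib(pid)` for the cace-gui PID after every ngspice run and passing the list to `growth_per_run_mib` would show whether the growth is steady per run or jumps at the end of each testbench. For example, samples of 204800, 256000 and 337920 kB correspond to growth of 50 and 80 MiB.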
t
I really don't know, because CACE is not supposed to be handling much data at all; if you did 135 corners, then it should just be handling 135 values (or a small multiple of that, depending on how many values your simulation outputs). It sounds like Python in WSL isn't doing proper garbage collection, like it's grabbing memory for the buffered output from ngspice and not releasing it. You might try axing the ngspice output in CACE by changing `stdout=subprocess.PIPE` and `stderr=subprocess.PIPE` (or `stderr=subprocess.STDOUT`) to `stdout=subprocess.DEVNULL` and `stderr=subprocess.DEVNULL`. One downside: a simulation that hits an error and drops back to the ngspice interpreter prompt will cause CACE to hang. But if it works, then I can add it as an option setting.
(The changes would be in `cace_simulate.py`, lines 152 and 153; I don't think it's needed anywhere else.) I have a pretty low level of confidence that that will change anything. But I can't think of anywhere else that would be using so much memory. For that matter, the output of ngspice can't be using that much memory, either.
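For reference, here is what the suggested swap looks like in isolation. This is a standalone sketch, not the actual code in `cace_simulate.py`: `run_quiet` is a hypothetical helper, a noisy Python child stands in for ngspice, and the `timeout` argument is my own addition as one possible guard against the hang mentioned above.

```python
import subprocess
import sys
from typing import Optional


def run_quiet(cmd: list, timeout_s: Optional[float] = None) -> int:
    """Run cmd with stdout and stderr discarded; return its exit code.

    With DEVNULL the parent never buffers the child's output, so nothing
    accumulates in the parent process for it.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,  # instead of subprocess.PIPE
        stderr=subprocess.DEVNULL,  # instead of subprocess.PIPE / STDOUT
        timeout=timeout_s,          # raises subprocess.TimeoutExpired if exceeded
    )
    return proc.returncode


# Stand-in child that prints about 1 MB of text, in place of an ngspice command.
noisy_child = [sys.executable, "-c", "print('x' * 1_000_000)"]
exit_code = run_quiet(noisy_child, timeout_s=60)
```

The trade-off is that any error text from the child is lost too, so a non-zero `exit_code` or a `TimeoutExpired` exception is the only sign that a run went wrong.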