# analog-design
s
I have done a comparison between ngspice and (serial) Xyce on a big design. Note: this test circuit does not use sky130 models, but a generic 180nm CMOS process at 1.5V. The design has only 1 LV pmos and 1 LV nmos, only two models, BSIM3 level=49. The example is in the standard (standalone) xschem test schematics, `..../share/xschem/xschem_library/rom8k/rom8k.sch`; it is a 16KByte ROM macro cell. Part of the ROM array is populated with actual transistors, so some read cycles are simulated and data is read out. Circuit statistics:
```
***** Device Count Summary ...
       C level 1 (Capacitor)                   1035
       M level 9 (BSIM3)                      14287
       V level 1 (Independent Voltage Source)    22
       --------------------------------------------
       Total Devices                          15344
***** Setting up matrix structure...
***** Number of Unknowns = 6677
```
Xyce (serial) report:
```
***** Total Simulation Solvers Run Time: 977.709 seconds
***** Total Elapsed Run Time:            980.18 seconds
*****
***** End of Xyce(TM) Simulation
*****
```
Ngspice report:
```
Total elapsed time (seconds) = 771.449
```
So there is not much difference in execution time (at least not orders of magnitude). Simulation results are absolutely correct for both. I have set up the schematic such that it simulates unchanged in both ngspice and Xyce. @Steven Bos it would be nice if you could test this on your parallel Xyce installation. You need to run xschem as standalone (so not in a directory with a sky130 xschemrc file), get the above models file and place it in the simulation directory as explained in the COMMANDS element of the schematic. Also you need to place the stimuli file from `.../share/doc/xschem/rom8k/stimuli.rom8k` into the simulation directory and follow the instructions in the COMMANDS element. I have made some updates to xschem for better Xyce integration and more error checks (for example against corrupted or malformed .raw files), so please update if you want to try this test. CC (@Eric Keiter @Harald Pretl @Tim Edwards)
s
Cool @Stefan Schippers! I will test this with my setup. Do you have the openmp version of ngspice installed (and thus using a parallel version of ngspice) or without this flag? If with, how many threads are you using?
s
In this test, due to the models (BSIM3 rev. 3.1), the ngspice simulation used only one thread, and so did Xyce, so they used the same computing power. I know ngspice with BSIM4 can use 2 threads, one for the matrix solver and one for the device equation calculation. But in the above test this was not the case. My ngspice is compiled with `--enable-openmp`.
From the ngspice manual: "To state it clearly: OpenMP is installed inside the model equations of a particular model. It is available in BSIM3 versions 3.3.0 and 3.2.4, but not in any other BSIM3 model, in BSIM4 versions 4.5, 4.6.5, 4.7 or 4.8, but not in any other BSIM4 model".
s
Will ngspice behave differently with or without openmp compilation? I see some differences between xyce serial and xyce parallel with 1 core. To be precise: I have not inspected any functional changes (e.g. precision), only some run time differences
I translated the stimuli file into a .cir file and downloaded the .7z file. Should I then extract the .7z file and rename a library file to models_rom8k.txt? I also pulled, configured, built and installed the latest version
s
For the models, yes that is the way to go. I don't know if I can package the models (due to license issues) inside xschem, so I provide the instructions to get them.
s
Ah good to know that in your test it forces single core compute for a fair comparison.
Which file should I rename to models_rom8k.txt?
s
I believe there are two transistors, NMOS and PMOS. Put these into a single file. Ensure the name of the NMOS model is NMOS and the pmos name is PMOS. Or get it here :-)
sorry, the names are cmosn and cmosp, respectively. But get the file I uploaded,
s
I get several messages in the info window. Is this correct?
s
yes no problem
s
Ok, I'll make the model file and see if the results are similar to yours
it is running. If it runs even close to yours this will take at least 2x 10 min
s
My laptop is a 13 years old core i3, so you will be faster 🙂
s
Indeed ngspice was a bit faster on my laptop:
image.png
correct waves (i think)
s
Did you see the launchers in the top schematic? There are 2 for launching the simulation and 2 for loading the waveforms. I have set different netlist/raw filenames to avoid overwriting. xschem is structured such that you can run more simulations in parallel even on the same schematic if you do that with different output files. This is easy with a tcl script if you look at the launchers' embedded tcl code. From the GUI, if you press the Simulate button it turns red and is disabled until the simulation is finished. This is because it would be terrible if a user clicked the button 10 times....
ok, just for test please read the last LDQ[15:0] pattern
Shift + mouse wheel inside the graph will zoom in/out the waves
while mouse wheel with mouse outside the graphs will zoom in/out the schematic
s
image.png
s
I meant the whole output data bus, LDQ[15:0] (LDQ in the graph), but I am sure the results are correct
s
Great.
I do get some errors when trying xyce
and yes I am using the launcher buttons for both running the sim and loading data
s
when I was a designer I often used C1A0 as a pass code, and F16A as a fail code. In Italian CIAO means "Hello", and FIGA means ... ehm... "pussy"
s
hahahaha
i like those insider tidbits 😄
s
better than using FFFF or 0000 which often occur regardless of good/bad working parts 🙂
let me know the Xyce error... I have Xyce rev 7.5 installed,
s
error.log
maybe because my folder is not empty?
I also use xyce 7.5
s
Oh yes, maybe I forgot to check that in. Please regenerate the stimuli cir file, but first edit the source file via Simulation -> Utile stimuli editor (GUI) and change in the window the line `voltage VCC` to `voltage 1.5`. Xyce does not accept parametrized PWL functions. Then press Translate.
s
ok
s
The Utile thing is a side project I used to create complex stimuli for spice. You can describe waveforms in a more convenient way, declare busses and use macros. The help button in the Utile window explains the language. When doing Translate, everything is translated into single-bit voltage sources with PWL functions. You can also set signals to Hi-Z state, so it is quite useful for complex designs.
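For illustration, the Translate step produces plain per-bit PWL voltage sources, something like this sketch (node names and times here are hypothetical, not actual rom8k output):
```
* hypothetical Utile 'Translate' output: one PWL source per bus bit
VA0 A0 0 PWL(0 0 10n 0 10.1n 1.5 20n 1.5 20.1n 0)
VA1 A1 0 PWL(0 0 20n 0 20.1n 1.5 40n 1.5 40.1n 0)
```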
s
It is now running xyce serial. Interesting to know about that function; my simulations so far have only required simple waveforms. I mostly use the Pulse function for that
Interesting! Xyce was a fair bit slower than ngspice
s
yes, however doing a read cycle on a memory requires setting a 13 bit address bus and a number of signals (clock, enable, output enable) that must have specific timings. Once the macros are set, a read cycle in the source stimuli file is just:
```
;     add en oe
;=====================
cycle 0000 1 1
```
s
The output of xyce
I wonder why in my circuits i get the complete opposite.
Let me run this test with xyce parallel
s
Anyway a few minutes for a circuit with 14000 MOS and 16k total devices is quite good
This is a transistor level simulation; however the circuit is quite 'digital', the only real analog part being the sense amplifier. Maybe the switching logic does not give Xyce a big advantage.
s
What name is the parallel version of xyce in simulator_commands?
s
while a deep analog circuit like an ADC, PLL etc. may give different results
You must give the complete path of the simulator. Open Simulation -> configure simulators and tools and set the right path/filename
s
Yes, that could be an interesting explanation. Maybe you can confirm that my DAC test was also performed correctly?
e
Hi @Stefan Schippers and @Steven Bos this is interesting! Glad Xyce is working for you.
s
you also must edit the launcher to set the 4th simulator: `set sim(spice,default) 3 ;# 4th simulator: Xyce`
which must refer to the parallel Xyce
Hi @Eric Keiter, you guys did great work!
e
BTW, if you run parallel Xyce, this circuit is large enough that Xyce will probably attempt to use a parallel linear solver. However, at this scale, it is probably not large enough to beat the serial linear solver. So, I suggest trying the “parallel load, serial solve” option.
@Stefan Schippers thanks!
s
Ah yes, i was editing the wrong launcher. Thanks, now running xyce parallel.
s
from 7.2 to 7.5 there are a lot of improvements, and now we can really use Xyce/ngspice on the same designs with minimal changes
e
To do “parallel load serial solve”, you can just add “.options LINSOL type=klu”
s
Oh @Eric Keiter interesting! How can I set that option
e
To the netlist
s
ok will do after this test
e
That will force the linear solver to be KLU, which is a serial direct solver.
s
yes add the line in the Xyce command symbol
e
If you do that, then only the device evaluations and the parser will be done in parallel.
So, of course, based on Amdahl’s law, that will limit the theoretical parallel speed up you can get. But it is generally very hard to get a linear solver to scale perfectly for circuits. So, for moderate sized circuits keeping the linear solve serial-direct while making everything else parallel is usually best.
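For reference, Amdahl's law: if a fraction $p$ of the runtime is parallelized over $N$ processors, the overall speedup is bounded by

$$S(N) = \frac{1}{(1-p) + p/N}$$

so keeping the linear solve serial puts a hard ceiling on $p$ and thus on $S(N)$.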
@Stefan Schippers that is great to hear that the improvements from 7.2 to 7.5 have helped. I’ve been trying to track the various issues that have been reported here, and that definitely helped prioritize some things.
s
This was significantly faster than Xyce serial
e
@Steven Bos neat!
s
(the same as ngspice 169 sec exactly) I will now run with “.options LINSOL type=klu”
s
@Eric Keiter how does Xyce partition the circuit? By looking at the matrix? I know from experience with other (commercial) simulators that splitting the circuit for parallel execution is the very hard part, and if not done well (and it is difficult by looking at a spice netlist) it can get slower than a brute force flat solver.
Anyway, great simulator!
And 3 minutes to simulate several read accesses on a 16KB ROM at transistor level is impressive.
e
@Stefan Schippers Xyce does 2 different partitions, one for device evaluation, and one for the linear solver (if using a parallel linear solver). If using KLU the linear solve is just done on proc 0. For the device eval, the default partition is simply a first-come-first-served strategy, based on the ordering of devices in the netlist. For the linear solver, there are various options, but they are all based on graph partitioners like ParMETIS, where the graph just comes from the matrix.
s
42 seconds
s
Aha, here comes the hot stepper.
42 sec is the time with Xyce serial and the .option @Eric Keiter mentioned?
e
We’ve intended to look at parallel partitioning based on the (known) circuit hierarchy, but at this point Xyce doesn’t do that. The matrix graph is from (essentially) a flattened netlist.
s
will add in my test
@Eric Keiter this ROM circuit test uses `.options SCALE=0.1`. I didn't see any mention of it in the manual but I believe it works. Simulation would not succeed with 10x sizes. BTW sky130 also uses a SCALE factor, so it definitely is accepted by Xyce.
e
Yes, option SCALE is supported. I had thought we documented it, but possibly it was overlooked.
s
Hmm the data is wrong though
s
Oh, yes found it in the ref man. 2.1.21.3. Length Scaling
s
image.png
42 sec was with parallel xyce. Should i run this with serial xyce?
I also get these errors:
s
these are warnings, and yes there are some transistors in the ROM array with drain not connected to the bitline; there are also unrecognized model parameters, but these warnings appear with ngspice as well.
If you see Completed at the very beginning it means the process returned EXIT_SUCCESS.
s
maybe I added the .options command wrongly? Now the simulation is running a bit longer. `.options SCALE=0.10 .options LINSOL type=klu` vs `.options SCALE=0.10 LINSOL type=klu`
s
I put it on a separate .options line just to be sure it is accepted
I don't see big differences (on serial Xyce, the only one I have installed)
s
Ok. The longer simulation was serial xyce with the linsol option
which seems to give correct output
image.png
s
looks good
e
.options SCALE and .options LINSOL type=klu need to be on separate lines.
s
so the question is why the 42 sec simulation produced wrong results.
maybe your LINSOL option on the same line as the SCALE also invalidated the SCALE factor, and boom, nothing works
s
i think so
I am now running the parallel version again and it is past 42 seconds
e
If SCALE got ignored, it would certainly cause the results to be wrong. I think if Xyce sees an unrecognized .options command, it will print a warning and run anyway.
s
This time we got good data and ...
s
yes, to my knowledge no design is robust enough to survive a 10x increase in L and W
s
image.png
s
Good! success🍺
s
173 seconds, meaning no real speed up compared to 169 but at least quality results
e
In general, all the “.options” commands go to specific parts of the code. The long way to specify “.options scale” is to actually have “.options parser scale=#”.
So “options parser” and “options LINSOL” are different metadata, and would conflict.
Anyway, glad the result matches now!
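Putting that together, a minimal sketch of the corrected option lines (values taken from this thread):
```
* each .options statement on its own line
.options SCALE=0.1
.options LINSOL type=klu
* long form of the first line, per the note above:
* .options PARSER SCALE=0.1
```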
s
177 sec is a great result anyway. I am curious what HSIM would be able to do , lol
e
If you were running parallel I am a little surprised it wasn’t faster than serial. How many processors were you using?
s
6
e
To run parallel, the command would be `mpirun -np N Xyce netlist.cir`
OK, 6.
I would have expected some speed up over serial in that case.
Actually, I’ve lost track of how long the serial Xyce calculation took. Was it the 260sec result?
s
TL;DR: ngspice 169s; parallel xyce w/o linsol 169s; parallel xyce w/ linsol 173s; serial xyce 267s
e
AH, ok. Thanks!
s
I think so, yes (@Steven Bos confirm?)
s
these are single measurements, I don't know how deterministic this process is
s
I have similar results for ngspice and Xyce serial (scaled to my laptop cpu)
e
Understood, makes sense.
s
single measurements but long simulations, so if the machine is not busy I think the results are repeatable
Thank you @steven for testing Xyce parallel
s
maybe you can test my 5bit DAC @Stefan Schippers? That one is weirdly much faster on xyce than ng for some reason
e
Interesting.
s
Sure, can you share the schematics? Or the netlist if you prefer
s
No problem, it is fun to test these things and I learn a lot from the interactions
s
yes, doing these tests helps a lot. I have also fixed a crashing bug when xschem attempts to read a raw file Xyce has not finished writing, so always improving
s
Also, @Eric Keiter, in another thread I tested a 7 bit DAC where parallel xyce was significantly outperforming serial xyce. But the 8 bit DAC was the reverse: xyce parallel found 24 singularities and the full simulation would have taken 9 days (I stopped after 2%).
Possibly the scale of the circuit? I don't think it has more devices than this circuit though
e
Weird. How large (# of unknowns and/or devices)?
s
Let me look up the thread
e
If the circuit is under about 100k unknowns, it is usually best to use the “.options LINSOL type=klu” (parallel device eval, serial solve) setting
s
@Eric Keiter I started working seriously on Xyce integration when I saw the m= parameter was implemented on subcircuit calls. This is soooo much used in big system simulations for ganging together idle subcircuits without reducing the loading capacitances. One subcircuit with m=256 simulates much faster than 256 instances of the same subcircuit!
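As a sketch of the idea (subcircuit and node names here are made up):
```
* one multiplied instance of a hypothetical 'cell' subcircuit...
X1 vdd vss bitline cell m=256
* ...simulates much faster than 256 separate instances:
* X1 vdd vss bitline cell
* X2 vdd vss bitline cell
* ... and so on up to X256
```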
e
Oh, yes. I can see that “M=“ would be very important! I should have implemented it a long time ago, but better late than never!
s
well done!
e
Back to the linear solver, etc.: I think that when the number of unknowns is 10,000 or less, parallel Xyce automatically will use KLU (the serial direct solver). But, in reality, the trade-off point for the parallel solver is more like 100k.
s
I will try that option
e
So, between 10k and 100k, it needs to be set manually, and it is the right thing to do. Above 100k, it is harder to say which is best. I've seen some circuits where KLU is still the better choice at 500,000 unknowns. But I wouldn't count on it at that point.
s
500k at transistor level is a big beast, lol
e
Yes!
s
When it finds singletons, does it mean that xyce can't partition?
e
Singletons are actually fine. If Xyce is reporting them, however, that means you are using the parallel solver.
For the parallel solver, Xyce does several pre-processing steps to the matrix to make it easier to solve. The first step is something we call “singleton removal”.
s
Hmm i wonder why the parallel solver is then having trouble with the circuit while the serial solver can solve it in 700s
e
Basically, what it is doing is identifying matrix structures that are associated with ideal voltage sources (like power supplies, clocks, etc). For ideal sources, they tend to be connected to a lot of things, which causes communication problems for a parallel solver. However, since they are ideal, we already know the solution for those nodes and there isn’t any point in including them in the matrix solve. So, the “singleton removal” function factors them out before moving to the next step
s
Ah thanks for sharing that!
e
It is generally difficult to get parallel matrix solvers to work well with circuit matrices. There are a handful of specialized pre conditioners that we’ve developed (that are applied after singleton removal), but they tend to only be effective on certain types of circuits.
So, the default preconditioner is a fairly simple one, that has a “one-size-fits-all” quality to it.
s
Right, so results with parallel xyce really depend on the circuit, but if suitable can lead to great speed up
s
@Steven Bos can you do a TL;DR of the sim times for your 10 bit DAC with ngspice, Xyce serial and Xyce parallel?
s
Yes that was my plan!
but so far xyce parallel was not running anything higher than 7 bits. I will now test it with the suggestion @Eric Keiter gave above using KLU
e
In general, even when the parallel matrix solver is working well, it can't beat a good direct solver. The parallel solvers are based on GMRES, and even serial GMRES will be slower than a direct solve for small circuits. But GMRES is easy to do in parallel, and direct solver methods are very hard to do in parallel. Also they scale very differently: direct solvers (even the best ones) scale much worse. So, eventually, there is a problem size where the direct solver loses.
s
Yeah for my circuit that was around 6 bit DAC where parallel came out on top
e
So, up in the 100s of thousands of unknowns, the direct solver just can’t hack it any more. After that, the iterative (GMRES based) solvers are the only option.
The holy grail would be direct solver that actually scales well in parallel. We’ve worked on this, but it is hard.
s
Great, they should teach this practical wisdom
I will share my 6 bit DAC netlist with you @Stefan Schippers, if you have time please give it a comparison between ng and xyce
s
sure I will. (I only have Xyce serial; maybe in the next days I will take a big breath and build the parallel version as well)
s
yes that was quite a struggle :D
No idea if this will be useful to you, but this was my installation log of all the dependencies incl. trilinos serial, parallel, regression-suite, xyce serial and xyce parallel. It is not a step-by-step guide, but all the important steps should be in there
s
Thanks i have saved the file for reference. 👍
I am looking at your DAC10 netlist. I would suggest increasing the rise/fall times of the voltage sources. Since the period varies between 10u and 5120u, using 1ps rise/fall is probably slowing down the simulator, requiring very small timesteps around the transitions. I suggest using 100ps or something, but I am not sure if increasing the rise/fall in the pulse voltage source lines without shifting the start time will keep the signals correctly aligned.
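As a sketch of that change (source name, levels and timing here are hypothetical; the point is only the 1ps -> 100ps edges):
```
* before: 1ps edges force tiny timesteps at each transition
* VD0 D0 0 PULSE(0 1.8 0 1p 1p 5u 10u)
* after: relaxed 100ps edges
VD0 D0 0 PULSE(0 1.8 0 100p 100p 5u 10u)
```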
s
Good one, the parallel xyce complained about too small time steps
I slightly redesigned the DAC; it was built around a digitally controlled analog switch, but the design I used seemed weird when doing a DC sweep
so it is taking a bit longer to propagate all the changes and create new netlists
The design constraint was to switch analog voltages between 3.3V and 0V using 1.8V digital logic.
s
The classical voltage shifter is this one:
Usually with a buffer 3.3V inverter on the output
s
What are the drawback/benefits of this design compared to the one above?
s
Your switch has nfets with the body connected to OUT. This can be done using a triple insulated well, but usually the nfet body terminal is at 0V.
on cmos processes with no insulated p-wells (which require buried nwells) that switch cannot be done
s
Ah. so this design would not work for the sky130 process?
s
It can be done, sky130 has insulated p-well, but surely takes more space due to the numerous tap rings
s
Right. good to know. I will be experimenting with layout and post layout simulation in the next weeks
s
1.png
s
Wow! Same curve but a much cleaner design. I have so much to learn about designing 😄
Does the output need a 3.3V inverting buffer like you mentioned? Or can I chain this to the next one without a buffer, like so
s
yes if there is a capacitive loading this is the right structure:
if output load (the capacitance attached to the voltage shifter output) is low you can avoid the buffering inverter.
s
I was thinking of 1pF load capacitance
s
Oh, I see there is a problem. Your VREFL is not always 0 and VREFH is not always 3.3. Then the above structure is not good. I thought you needed to switch between 2 fixed levels, 0 and 3.3
ok delete my last posts 🙂
s
no problem, i found it very useful anyway!
s
well, if you need a 1.8 --> 3.3 voltage domain shifter the above is the way to go
s
The design i already made?
s
no i mean the one with the cross-coupled p channels
s
ah, for pure switching of 1.8 to 3.3. Gotcha!
s
if you need an analog switch that uses arbitrary VREFL and VREFH levels your implementation is probably the way to go.
s
Great, I think I will continue to layout then, after a comparison plot of the 1 to 10 bit DACs.
I noticed that the .raw files blow up quite a bit; the 7 bit one is 450MB
Thanks for all the help @Stefan Schippers and @Eric Keiter. I will call it a night and will produce the plot tomorrow.
s
This switch implementation is probably safe; it will switch between arbitrary VREFL / VREFH levels anywhere between 0 and 3.3V. It is similar to your design but uses 3.3V signals (X and XB) to drive the transmission gates instead of your ininv / inbuf signals at 1.8V. Also, if the IN signal is driving 2 switches, the X-XB generator can be shared between the 2 switches. Also no insulated pwell is required, and no static consumption occurs, for any switch state and any vrefh/vrefl combination.
... Always plot the voltage supply currents, so you see if there are static leakage paths. In this case there is consumption only during switching.
@Steven Bos I did a comparison of Xyce serial vs ngspice on your DAC10 test netlist. I changed all voltage sources to be 0 at time 0 and ramp up to their values after 1ns. I also set the rise and fall times to 100ps and used `uic` in the `.tran` lines. Also, to speed things up, I simulated only the first 128usec. ngspice report:
```
Transient analysis time = 269.679
Total analysis time (seconds) = 271.359
```
Xyce serial report:
```
***** Total Simulation Solvers Run Time: 831.252 seconds
***** Total Elapsed Run Time:            1277.15 seconds
*****
***** End of Xyce(TM) Simulation
```
Results are correct for both (out voltage = 3.3 * decimal(D[9:0]) / 1024). Ngspice used 2 cores (max 140% cpu usage) for the solver and equation calculations. Xyce used only one core (max 100%). Simulation time on Xyce serial was significantly higher. (cc @Eric Keiter)
@Steven Bos the following (long!) expression in a graph (RPN notation):
```
"ERROR;
d0
d1 2 * +
d2 4 * +
d3 8 * +
d4 16 * +
d5 32 * +
d6 64 * +
d7 128 * +
d8 256 * +
d9 512 * +
1.8 /
1024 /
3.3 *
out -
"
```
calculates the DAC error: it computes the theoretical voltage by summing the weighted binary bits, normalizing to 1V by dividing by the VCC level (1.8) and by 1024, and then scaling to 3.3V. The `out` node is subtracted from the result. This error will be significant if you do mismatch simulations.
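In formula form, my reading of the expression above, with $d_i$ the 1.8V logic bit waveforms:

$$\mathrm{ERROR} = \frac{3.3}{1024 \cdot 1.8} \sum_{i=0}^{9} 2^i\, d_i \;-\; \mathrm{out}$$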
s
Wow @Stefan Schippers! Thank you again for sharing your design experience; I think more people from this channel and other analog beginners like myself should take notice. I will go with your design (with full credit, and I will share the github repo when submitted for tapeout). The error graph is something I was planning to do in Excel or Matlab. Doing it automatically after graph loading in xschem is of course the proper way of doing it.
I am still investigating why my DACs show the inverse result, with ngspice being slower. Could you run this 5-bit DAC netlist (still based on my switch design)? It should be small enough to run a full sim. Something must be very wrong for ngspice as these are 10x differences. My logs and netlists are attached.
- ngspice (default, using 2 threads): 220s
- ngspice (set threads to 12): 215s
- xyce serial (serial load, serial solve w/ KLU linear solver): 19s
- xyce parallel (parallel load, parallel solve): 28s
- xyce parallel (parallel load, serial solve w/ `linsol klu`): 26s
My ngspice sims are done with your .spiceinit file in the netlist directory, but without any of the sky130 lib and corner optimizations mentioned in an earlier thread.
This is the schematic
Could this be a hardware difference, like a new cpu instruction or more memory? I am running Ubuntu 20.04 LTS using the Windows Subsystem for Linux v2 (Windows 11).
Though, the rom_8k test gave the same relative results on both our platforms. Yours: 771s (ngspice) vs 980s (xyce serial) means ngspice is +27% faster. Mine: 169s (ngspice) vs 267s (xyce serial) means ngspice is +58% faster. And xyce parallel load and solve was also 169s, so ngspice and xyce parallel tied perfectly on this circuit.
@Stefan Schippers I managed to get plots similar to your 2:1 analog mux design. My titles are a bit wrong though; it has 1 output, so it should be named a 1-channel analog switch.
Though, the current variable in xyce is loaded as V1#branch while in ngspice it is i(v1), meaning I cannot refresh my data in the graphs
The transient analysis of the same device
~100ps delay
ngspice does it in 10.5s while xyce serial in 0.8s
s
@Steven Bos about current names saved in the raw file: I do not save all variables with the `-r file.raw` given on the command line. Without this, I specify with `.print tran format=raw ...` the variables I want to save (the xschem `spice_probe.sym` attached to the nets does exactly that). For currents I explicitly save the ones I need, as in: `.print tran format=raw i(vvcc) i(vsa) i(vl) i(vdec)`, and these currents retain the name given on the .print line. If you remember, the rom8k reloads all variables including currents, both from ngspice raw and from Xyce raw. However, another thing I will do is to look for V1#branch (either upper/lower case) if the search for i(v1) / I(V1) fails.
s
@Stefan Schippers Thanks! It is a minor nuisance; just duplicating them for both ngspice and xyce is a good workaround for me. Other question: when designing my analog switch I frequently run both DC and tran analyses. I have currently assigned ngspice for tran and xyce for DC and thus reuse my graphs for both of them. Maybe a graph could be paired to a certain simulator or file such that both graphs can exist at the same time when doing a refresh?
I have not run more run time comparisons as I first want to learn about your experience with the 5-bit DAC netlists sent earlier.
In the meantime I have been exploring your level shifter combined with the transmission gates. I noticed you used the thick oxide FETs for the level shifter and transmission gates. Will the 1.8V devices be damaged when supplied with 3.3V? How did you decide on the W/L?
Since we use a transmission gate, I thought it would be interesting to experiment with negative voltages such that the DAC can supply both. My current design uses two level shifters, one for the plus rail and one for the negative rail, each connecting to a transmission gate. To make it work I had to tweak the L/W of the negative level shifter. Is my approach with 2 level shifters a good one for this requirement, and is it acceptable (manufacturable) to have different L/W transistors in a cell? As you can see in the leftmost graph, I still need some more tweaking of the negative level shifter. ATM it is more trial and error than science.
s
@Steven Bos I did the DAC05 comparison, DAC05_simulation_ngpsice_2threads.spice vs DAC05_simulation_xyce_linsol.spice. Got the same results as you, Xyce being much faster.
```
***** Total Simulation Solvers Run Time: 51.5038 seconds
***** Total Elapsed Run Time:            89.2508 seconds
*****
***** End of Xyce(TM) Simulation
*****
```
However, in the ngspice file you did a '`save all`' while only selected variables are saved in Xyce. Moreover, the `.tran 1n 160u uic` causes Xyce to simulate and save 3164 timesteps in the raw file while ngspice saves 160000 points! (No. of Data Rows : 160819). I changed the ngspice commands to save the same variables as Xyce, set `tran 50n 160u uic`, and the simulation completed in 55 seconds:
```
Total analysis time (seconds) = 24.99
Total elapsed time (seconds) = 55.076
```
In the above run I also added `.option chgtol=4e-16 method=gear` to increase precision and use a better integration method. This way ngspice saved 5233 points in the raw file, a number that is comparable to Xyce. The simulators take different timestep decisions: ngspice tends to use the given time step in the saved file, while Xyce makes its own decision regardless of this parameter.
For currents that are saved either as i(vvcc) or VVCC#branch: before I fix xschem to look for either one or the other (ngspice itself in some cases uses one or the other naming, which is really annoying and needs to be fixed), you can simply add i(vcc) and vcc#branch in the graph. The one that is found will be shown.
For having both Xyce and ngspice graphs at the same time in the same schematic: this is currently not possible. For a given schematic only one raw file can be loaded. Loading another file will unload the previous one. You can, however, open the same schematic in another tab (xschem gives a warning, since opening the same schematic in 2 tabs can be dangerous if editing and saving both). In the new tab you can load another file.
s
@Stefan Schippers I noticed that for the rom_8k test xyce produces a 222MB raw file while ngspice produces a 23MB one; that could explain why ngspice is faster in this test. Can you confirm this on your laptop? As you can see, this was due to xyce recording many more variables.
As you can see, for all my circuits xyce was significantly faster. Especially the first circuit, a simple analog switch made with the help of @Stefan Schippers, runs in 0.8s, giving it an almost real-time vibe vs 10s with ngspice. In the video you can also see the clear differences in .raw files: xyce records more variables, but ngspice more samples
s
for short simulations ngspice sim time is dominated by model file parsing. @Steven Bos if you use `sky130_fd_pr/corner.sym`, which feeds ngspice only the desired corner, things speed up considerably. This is of course a dirty workaround for something ngspice should fix.
@Steven Bos I have now removed in xschem the -r command line option for Xyce in the default configuration, leaving it as a comment so users can easily enable it if they wish. You can anyway easily save all voltages in a given hierarchy with `.print tran format=raw file=... v(*)`
s
I have rerun the ROM_8k test such that both ngspice and xyce only sample `i(vvcc) i(vsa) i(vl) i(vdec)`, by removing all the `.print tran format=raw v(xctrl:LDCPB)` lines from the xyce spice file. The raw file sizes are now more or less equal (~23MB), and both report almost the same variables and datapoints. The runtime statistics reported earlier do not change, meaning Xyce serial is a fair bit slower than ngspice with 2 threads, while Xyce parallel is just as fast.
I will see if I can improve my standard simulations with ngspice using the corner hack mentioned earlier next
@Stefan Schippers thanks for helping me look into this. I hope others will benefit from it as well. Since many new users will start with small circuits using the sky130 lib, having a 10x improvement in simulation time is too good to ignore. Also, xyce parallel is a great addition to xyce serial, especially since ngspice compiled with openmp is also doing work in parallel by default. Of the circuits tested so far, ngspice came out on top in none of them (incl. ROM_8k), and only tied at best. I would really like to see a formal benchmark with standard circuits (big and small, out-of-the-box and simulator tuned) for a more conclusive answer. I hope the developers of ngspice will address the corner issue soon. For the developers of xyce (@Eric Keiter): much improvement can be made in the building of xyce serial and parallel. The current build process will discourage new users from trying it.
@Stefan Schippers this comment of yours is probably the most important: "_The simulators take different timestep decisions. ngspice tends to use the given time step in the saved file, while Xyce takes its own decision regardless of this parameter_". So for example `.tran 1ns 160us` will not result in 160k samples from xyce. This is quite magical. How is this dynamic time stepping done while, at first glance, there is no loss of data (does it ignore redundant data points)? Can it be switched on in ngspice as well? Changing the stepsize in the netlist such that ngspice and xyce have the same number of datapoints seems like cheating for the comparison. The ROM_8k test has identical .tran settings, which gave good results for both ngspice and xyce. Maybe that circuit has much more signal variation and thus xyce cannot do dynamic stepsizes?
s
Yes, I think the rom8k has no idle periods. I have seen Xyce is much more efficient at jumping the simulation forward when the circuit is idle, while ngspice takes longer. For the step size calculation, modern iterative solvers also have an error estimator. If the estimated error (which is in most cases a lower limit of the real error) is far lower than the allowed error, then take longer timesteps. On the other hand, if the error estimate is too high, reduce the step and repeat the calculation.
s
Thanks for confirming!
I am quite surprised that ngspice doesn't even attempt to do this; this is a huge feature. It is great that it follows the analysis command to the letter (e.g. if you want 100 samples, you get 100 samples), but there should be an option to let this be dynamic and to tweak the allowable error limits. In my circuit, 160k data points vs 3k data points is like no compression vs lossless compression: the result is identical
s
well, ngspice also speeds up considerably when the circuit is idle, but it seems Xyce does that faster
s
I wonder if we can get a report about the step size change decisions so we can measure the total error
Based on your experience, is it more common to simulate circuits that idle than not?
s
I was not 100% precise when I said the rom has no idle: after the read cycles are completed @400ns, the simulation goes all the way to 480ns with no circuit activity. The time points saved by Xyce (xschem raw_query values time) between 400ns and the end are:
4.0220587039e-07 4.0714201077e-07 4.1701429154e-07 4.3675882466e-07 4.7624794774e-07 4.7999998287e-07
the time points in the same interval saved by ngspice are:
4.0035197912e-07 4.0055198269e-07 4.0075198626e-07 4.0095196141e-07 4.0115196498e-07 4.0135196855e-07 4.0155197212e-07 4.0175197569e-07 4.0195197926e-07 4.0215198283e-07 4.0235198639e-07 4.0255196154e-07 4.0275196511e-07 4.0295196868e-07 4.0315197225e-07 4.0335197582e-07 4.0355197939e-07 4.0375198296e-07 4.0395198653e-07 4.0415196167e-07 4.0435196524e-07 4.0455196881e-07 4.0475197238e-07 4.0495197595e-07 4.0515197952e-07 4.0535198309e-07 4.0555198666e-07 4.0575196181e-07 4.0595196538e-07 4.0615196895e-07 4.0635197252e-07 4.0655197608e-07 4.0675197965e-07 4.0695198322e-07 4.0715198679e-07 4.0735196194e-07 4.0755196551e-07 4.0775196908e-07 4.0795197265e-07 4.0815197622e-07 4.0835197979e-07 4.0855198336e-07 4.0875198692e-07 4.0895196207e-07 4.0915196564e-07 4.0935196921e-07 4.0955197278e-07 4.0975197635e-07 4.0995197992e-07 4.1015198349e-07 4.1035198706e-07 4.1055196220e-07 4.1075196577e-07 4.1095196934e-07 4.1115197291e-07 4.1135197648e-07 4.1155198005e-07 4.1175198362e-07 4.1195198719e-07 4.1215196234e-07 4.1235196591e-07 4.1255196948e-07 4.1275197304e-07 4.1295197661e-07 4.1315198018e-07 4.1335198375e-07 4.1355195890e-07 4.1375196247e-07 4.1395196604e-07 4.1415196961e-07 4.1435197318e-07 4.1455197675e-07 4.1475198032e-07 4.1495198388e-07 4.1515195903e-07 4.1535196260e-07 4.1555196617e-07 4.1575196974e-07 4.1595197331e-07 4.1615197688e-07 4.1635198045e-07 4.1655198402e-07 4.1675195916e-07 4.1695196273e-07 4.1715196630e-07 4.1735196987e-07 4.1755197344e-07 4.1775197701e-07 4.1795198058e-07 4.1815198415e-07 4.1835195930e-07 4.1855196287e-07 4.1875196644e-07 4.1895197000e-07 4.1915197357e-07 4.1935197714e-07 4.1955198071e-07 4.1975198428e-07 4.1995195943e-07 4.2015196300e-07 4.2035196657e-07 4.2055197014e-07 4.2075197371e-07 4.2095197728e-07 4.2115198085e-07 4.2135198441e-07 4.2155195956e-07 4.2175196313e-07 4.2195196670e-07 4.2215197027e-07 4.2235197384e-07 4.2255197741e-07 4.2275198098e-07 4.2295198455e-07 4.2315195969e-07 4.2335196326e-07 4.2355196683e-07 4.2375197040e-07 4.2395197397e-07 4.2415197754e-07 4.2435198111e-07 4.2455198468e-07 4.2475195983e-07 4.2495196340e-07 4.2515196697e-07 4.2535197053e-07 4.2555197410e-07 4.2575197767e-07 4.2595198124e-07 4.2615198481e-07 4.2635195996e-07 4.2655196353e-07 4.2675196710e-07 4.2695197067e-07 4.2715197424e-07 4.2735197781e-07 4.2755198137e-07 4.2775198494e-07 4.2795196009e-07 4.2815196366e-07 4.2835196723e-07 4.2855197080e-07 4.2875197437e-07 4.2895197794e-07 4.2915198151e-07 4.2935198508e-07 4.2955196022e-07 4.2975196379e-07 4.2995196736e-07 4.3015197093e-07 4.3035197450e-07 4.3055197807e-07 4.3075198164e-07 4.3095198521e-07 4.3115196036e-07 4.3135196393e-07 4.3155196749e-07 4.3175197106e-07 4.3195197463e-07 4.3215197820e-07 4.3235198177e-07 4.3255198534e-07 4.3275196049e-07 4.3295196406e-07 4.3315196763e-07 4.3335197120e-07 4.3355197477e-07 4.3375197833e-07 4.3395198190e-07 4.3415198547e-07 4.3435196062e-07 4.3455196419e-07 4.3475196776e-07 4.3495197133e-07 4.3515197490e-07 4.3535197847e-07 4.3555198204e-07 4.3575198561e-07 4.3595196075e-07 4.3615196432e-07 4.3635196789e-07 4.3655197146e-07 4.3675197503e-07 4.3695197860e-07 4.3715198217e-07 4.3735198574e-07 4.3755196089e-07 4.3775196445e-07 4.3795196802e-07 4.3815197159e-07 4.3835197516e-07 4.3855197873e-07 4.3875198230e-07 4.3895198587e-07 4.3915196102e-07 4.3935196459e-07 4.3955196816e-07 4.3975197173e-07 4.3995197530e-07 4.4015197886e-07 4.4035198243e-07 4.4055198600e-07 4.4075196115e-07 4.4095196472e-07 4.4115196829e-07 4.4135197186e-07 4.4155197543e-07 4.4175197900e-07 4.4195198257e-07 
4.4215198614e-07 4.4235196128e-07 4.4255196485e-07 4.4275196842e-07 4.4295197199e-07 4.4315197556e-07 4.4335197913e-07 4.4355198270e-07 4.4375198627e-07 4.4395196142e-07 4.4415196498e-07 4.4435196855e-07 4.4455197212e-07 4.4475197569e-07 4.4495197926e-07 4.4515198283e-07 4.4535198640e-07 4.4555196155e-07 4.4575196512e-07 4.4595196869e-07 4.4615197226e-07 4.4635197582e-07 4.4655197939e-07 4.4675198296e-07 4.4695198653e-07 4.4715196168e-07 4.4735196525e-07 4.4755196882e-07 4.4775197239e-07 4.4795197596e-07 4.4815197953e-07 4.4835198310e-07 4.4855198666e-07 4.4875196181e-07 4.4895196538e-07 4.4915196895e-07 4.4935197252e-07 4.4955197609e-07 4.4975197966e-07 4.4995198323e-07 4.5015198680e-07 4.5035196194e-07 4.5055196551e-07 4.5075196908e-07 4.5095197265e-07 4.5115197622e-07 4.5135197979e-07 4.5155198336e-07 4.5175198693e-07 4.5195196208e-07 4.5215196565e-07 4.5235196922e-07 4.5255197278e-07 4.5275197635e-07 4.5295197992e-07 4.5315198349e-07 4.5335198706e-07 4.5355196221e-07 4.5375196578e-07 4.5395196935e-07 4.5415197292e-07 4.5435197649e-07 4.5455198006e-07 4.5475198363e-07 4.5495198719e-07 4.5515196234e-07 4.5535196591e-07 4.5555196948e-07 4.5575197305e-07 4.5595197662e-07 4.5615198019e-07 4.5635198376e-07 4.5655195891e-07 4.5675196247e-07 4.5695196604e-07 4.5715196961e-07 4.5735197318e-07 4.5755197675e-07 4.5775198032e-07 4.5795198389e-07 4.5815195904e-07 4.5835196261e-07 4.5855196618e-07 4.5875196975e-07 4.5895197331e-07 4.5915197688e-07 4.5935198045e-07 4.5955198402e-07 4.5975195917e-07 4.5995196274e-07 4.6015196631e-07 4.6035196988e-07 4.6055197345e-07 4.6075197702e-07 4.6095198059e-07 4.6115198415e-07 4.6135195930e-07 4.6155196287e-07 4.6175196644e-07 4.6195197001e-07 4.6215197358e-07 4.6235197715e-07 4.6255198072e-07 4.6275198429e-07 4.6295195943e-07 4.6315196300e-07 4.6335196657e-07 4.6355197014e-07 4.6375197371e-07 4.6395197728e-07 4.6415198085e-07 4.6435198442e-07 4.6455195957e-07 4.6475196314e-07 4.6495196671e-07 4.6515197027e-07 4.6535197384e-07 4.6555197741e-07 4.6575198098e-07 4.6595198455e-07 4.6615195970e-07 4.6635196327e-07 4.6655196684e-07 4.6675197041e-07 4.6695197398e-07 4.6715197755e-07 4.6735198111e-07 4.6755198468e-07 4.6775195983e-07 4.6795196340e-07 4.6815196697e-07 4.6835197054e-07 4.6855197411e-07 4.6875197768e-07 4.6895198125e-07 4.6915198482e-07 4.6935195996e-07 4.6955196353e-07 4.6975196710e-07 4.6995197067e-07 4.7015197424e-07 4.7035197781e-07 4.7055198138e-07 4.7075198495e-07 4.7095196010e-07 4.7115196367e-07 4.7135196724e-07 4.7155197080e-07 4.7175197437e-07 4.7195197794e-07 4.7215198151e-07 4.7235198508e-07 4.7255196023e-07 4.7275196380e-07 4.7295196737e-07 4.7315197094e-07 4.7335197451e-07 4.7355197808e-07 4.7375198164e-07 4.7395198521e-07 4.7415196036e-07 4.7435196393e-07 4.7455196750e-07 4.7475197107e-07 4.7495197464e-07 4.7515197821e-07 4.7535198178e-07 4.7555198535e-07 4.7575196049e-07 4.7595196406e-07 4.7615196763e-07 4.7635197120e-07 4.7655197477e-07 4.7675197834e-07 4.7695198191e-07 4.7715195706e-07 4.7735198905e-07 4.7755196420e-07 4.7775199619e-07 4.7795197133e-07 4.7815194648e-07 4.7835197847e-07 4.7855195362e-07 4.7875198561e-07 4.7895196076e-07 4.7915199275e-07 4.7935196790e-07 4.7955199989e-07 4.7975197504e-07 4.7995195018e-07 4.7999998287e-07
s
Have you done this test with the xyce netlist without `.print tran format=raw v(xctrl:LDCPB)`? Ngspice has so many datapoints in that period, while the final .raw files are near identical in terms of total data points. That means that in other parts of the simulation xyce must be sampling a lot more while ngspice doesn't
s
the rom8k circuit contains spice probes; these automatically create a `.print tran format=raw v(<netname>)` for the net they are attached to. This way a limited set of nodes is saved. The above time values represent the times at which all variables are saved in the raw file. Unfortunately the raw file format is extremely simple (and this makes it extremely easy to read), but if there is one node changing in a circuit with 100k nodes, all the other 99999 nodes are saved at that time point too.
s
I had to remove all these `.print tran format=raw v(xctrl:LDCPB)` lines, otherwise the xyce raw file exploded to 222 MB (10x), see earlier threads. Without these lines, both have near identical variables and datapoints
s
... And yes, I believe Xyce does much better sampling around fast transitions. ngspice also does that, but with less deviation from the user specified time step. At least this is my impression.
s
sorry, not data points, but only the variables exploded, which was 10x (700 variables vs 7000 with those extra prints)
s
that is strange; the spice_probe elements cause ~700 variables to be saved. This works if Xyce is launched without the -r option. The -r was included in the default Xyce launch command; I have now removed it by default. There is also one thing that needs to be fixed: ngspice understands a .save xxx placed inside a subcircuit. If the subcircuit is instantiated as X1 at the top, then x1.xxx is saved. If there is also another instance x2 of it, x2.xxx is saved as well. Xyce does not understand spice probes placed in lower hierarchies, so I had to manually write the .print tran format=raw ... for these lower level nodes at the top.
s
The test with 700 variables for both xyce and ngspice was done today after I pulled your latest version. I ran xyce without the -r option, but removed those print lines before I ran it
s
Anyway, thanks to your tests and comments I have done some commits to make the handling of different simulators easier. The terrible combination of upper/lower/mixed case of saved nodes was slowing down node lookups in graphs. I have now decided that regardless of the simulator, when xschem loads the raw file all nodes are converted to lower case (ngspice convention) and all hierarchy separators are converted to "." (ngspice convention; xyce uses ":").
@Steven Bos if you did not have -r and removed the .print lines, where did Xyce get the list of nodes to be saved?
s
it saved the .raw files in .xschem/simulations
I start up xschem from /home, load rom_8k.sch from /usr/local/share/doc/xschem/rom_8k and then hit the simulate launchers. I don't check 'use simulation dir under current schematic dir'.
Sorry, you meant something else I think. The only print line I left out was `.print tran format=raw i(vvcc) i(vsa) i(vl) i(vdec)`, which is similar to the `save tran i(vvcc) i(vsa) i(vl) i(vdec)` used in the ngspice netlist
s
ok, I understand. The above line just adds 4 currents to the ~700 nodes, so it should not increase the output file that much.
s
I like the probe symbol feature vs writing it using save. It is much less error prone. I will use it in the future
s
@Steven Bos there is another big example in the xschem_sky130 tests: `sky130_tests/test_carry_lookahead.sch`. This example does a comparison between 32 bit and 256 bit ripple carry adders vs 32 bit and 256 bit carry lookahead adders. The design has 27k devices and 80k unknowns:
```
***** Device Count Summary ...
       C level 1 (Capacitor)                   3763
       M level 14 (BSIM4)                     23010
       V level 1 (Independent Voltage Source)   515
       --------------------------------------------
       Total Devices                          27288
***** Setting up matrix structure...
***** Number of Unknowns = 80343
```
ngspice and Xyce take approximately the same time to simulate this. Ngspice:
```
Total analysis time (seconds) = 2710.63
Total elapsed time (seconds) = 4191.515
```
Xyce:
```
***** Total Simulation Solvers Run Time: 2103.34 seconds
***** Total Elapsed Run Time:            4147.71 seconds
*****
***** End of Xyce(TM) Simulation
```
s
Excellent! I will run it later today. Since this is the second non-trivial test with surely more to come, a simulator benchmark suite in the xschem repo could be interesting!
s
nice! Get the updated version at https://github.com/StefanSchippers/xschem_sky130/blob/main/sky130_tests/test_carry_lookahead.sch since I updated it to work with Xyce and ngspice. One big thing that I need to test is to verify if and how mismatch simulations can be done with Xyce, like the one done with ngspice on `sky130_tests/test_comparator.sch`
e
Hi @Stefan Schippers and @Steven Bos, it looks like I missed a lot of posts here over the weekend! I’ll try to answer your questions, but I’m not sure if I’ve read them all.
It looks like a lot of the questions pertain to dynamic time-stepping and/or output files? If so I can make some very general comments.
s
Hi @Eric Keiter, yes indeed since that seems to be a big differentiator between ngspice and xyce
e
One comment that might be relevant to the output file size: when using the command line “-r”, the resulting raw file will contain every solution variable, whether you want them or not. Alternatively, using `.print tran format=raw v(…)` in the netlist, you can reduce the number of outputs to only be the ones you want. Also, you can then include outputs that aren't in the solution vector (like a lot of lead currents).
Regarding dynamic time stepping, and also outputs: one thing to understand is that many codes (including, possibly, ngspice) do sampled output. So, if you are running Hspice for example, I think the points that appear in the output file are not the specific time steps used by the solver. They've been sampled/interpolated to reduce the size of the output file. By default, Xyce simply outputs the results for every time step used by the solver. But Xyce can be instructed to do sampled output.
I'm not sure if that is the nature of the difference you observed between Xyce and ngspice, but I wanted to mention it.
s
@Stefan Schippers updated xschem to run xyce without the -r flag for this reason, since we tried to create a fair comparison with equally sampled variables
By default Xyce outputs every time step? In our test of a .tran 1n 160u we expected to see 160k datapoints, which ngspice recorded, but Xyce recorded only 3k (causing a huge speed up). The quality / output signal was (seemingly) identical
e
Regarding sampled output: if you want Xyce to just output at certain intervals (to reduce output filesize), you can add this command to the netlist: `.OPTIONS OUTPUT INITIAL_INTERVAL=1e-3`. If you do this, then it will output every 1e-3 seconds. It is also possible to have different intervals for different windows of time (although I don't recall the precise command at the moment)
@Steven Bos Interesting. I can’t speak to what ngspice is necessarily doing, but if I had to guess, their time integrator is doing dynamic time stepping, but it is sampling the output at every 1ns. Pretty much all circuit simulators are doing dynamic time stepping under the hood.
Xyce is doing dynamic time stepping as well, but we definitely were not the first to do it.
FYI, if you are curious about how the dynamic time stepping is done, it is mostly based on local truncation error analysis. At each step, the integrator makes an explicit prediction, which serves as the initial guess for the step. Then it does an implicit solve for the step, which is the actual solution candidate. It then compares the prediction to this “corrector”, and based on this comparison can make an estimate as to how big the truncation error was for this step. If it is too big, then it rejects the step and takes a smaller one. If it is really small, it increases the stepsize for the next step.
There are other constraints to time step, beyond local truncation error (LTE). For example, if you have PWL or Pulse sources, they create known discontinuities in the signal. The time stepping has to land precisely on those discontinuities and restart.
Also, if the Newton solve fails (which is the process by which it computes the “corrector”), then the step fails and it cuts the stepsize by a fixed fraction. (In Xyce the next attempted step is 1/8 the size of the failed one.) The LTE analysis won't be performed in this case, as there isn't a valid corrector to use.
Anyway, at a high level this is what most codes do.
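For reference, a textbook form of this LTE-based step control (not necessarily Xyce's exact formula): with estimated truncation error $\epsilon_{est}$, tolerance $\epsilon_{tol}$ and integrator order $p$, the next step is chosen roughly as

$$\Delta t_{\mathrm{new}} = \Delta t \left( \frac{\epsilon_{tol}}{\epsilon_{est}} \right)^{1/(p+1)}$$

with the step rejected and retried when $\epsilon_{est} > \epsilon_{tol}$.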
But I think that Xyce is the only one that by default outputs every time step, rather than interpolating. Most codes (like Hspice) can be forced to output every step if that is desired. And Xyce (as noted) can be forced to use interpolated output. But the default behavior is different.
I would guess that ngspice can be forced to output every step rather than sampling. But I don't know that code well enough to know the command.
s
For my circuits these speedups are not isolated events. Every circuit so far has benefitted from the better dynamic time stepping in Xyce (see video above). Out-of-the-box experiences for small circuits are 10x in my case, almost real-time (although this is in big part due to the parsing of the sky130 lib, which Xyce handles faster than ngspice). @Stefan Schippers and I will be testing both ngspice and xyce serial/parallel with more digital- and analog-focussed small/mid/large circuit tests to see how both fare. One comment from Stefan mentioned above: "... And yes, I believe Xyce does much better sampling around fast transitions. ngspice also does that, but with less deviation from the user specified time step. At least this is my impression."
e
OK! I haven't been able to digest this whole thread yet. I'm certainly always happy to hear that Xyce is faster. 🙂
Are these test cases with Sky130 or with the simpler model cards? I’ve been told anecdotally that Xyce seems to parse the Sky130 files more quickly. But that faster parsing would only matter for a large number of PDK files.
s
We users are too, if it doesn't lead to inferior data of course.
Yes, the tests are with sky130. I wouldn't just say more quickly; 0.8s vs 12s for a simple inverter or simple analog switch is out of the park. And for my DAC circuit the speedup persists. We are very curious why.
e
OK. If there is a way to compare the parse/setup times, I’d be curious about that. For a really long simulation, the parse time won’t matter very much, of course.
I have to run to a meeting, but I’ll check in here later.
s
Sorry, I mashed my questions into one post. "_... if the Newton solve fails (which is the process by which it computes the “corrector”), then the step fails and it cuts the stepsize by a fixed fraction. (In xyce the next attempted step is 1/8 the size of the failed one). The LTE analysis won't be performed in this case, as there isn't a valid corrector to use..._" 1) I wonder how ngspice implements this part. If the fixed fraction is, say, 1/4 in ngspice, that would be huge. Can we change this fraction in xyce and see how it affects simulation time? Also, we mostly use PWL and Pulse sources. "_But I think that Xyce is the only one that by default outputs every time step, rather than interpolating. Most codes (like Hspice) can be forced to output every step if that is desired. And, Xyce (as noted) can be forced to use interpolated output. But the default behavior is different._" 2) Would interpolated output result in a speed up (at the cost of uncertainty in the data)? If ngspice is doing this we should set xyce to the same setting (or change ngspice), because equal quality output is important for a fair comparison. Although TBH, so far we couldn't visually detect any difference in output quality. 3) Can the LTE be logged at every time step change decision? That way we can trace the time steps in a simulation.
e
Regarding (1), I don’t expect for most digital circuits that you’ll get a ton of Newton solve failures. The stepsize is much more likely to be adjusted based on LTE. Newton solve failures are much more likely in highly nonlinear analog circuits. Or if you have model implementation problems (which would be my fault, if true … )
Interpolated output can result in faster run times if the total number of outputs is significantly different otherwise. If you have a circuit where there are long periods of time in which not much happens, the adaptive step size algorithm should take a relatively small number of steps. If you do interpolated output with a very small interval, you’ll wind up outputting a lot more, which would probably slow things down.
Or, alternatively, you could have a complex waveform (like a high-q oscillator taking a long time to settle) where there is a really small time step. In that case, the solver will use a LOT of steps, and if you do interpolated output at a much larger interval, you’ll get a much smaller output file and possibly a faster simulation.
Regarding (3) In the more verbose builds of Xyce the results of the LTE calculation are output to the screen (stdout/terminal output). In the default build, however, I don’t think there is currently a way to do this.
However, generally, your LTE algorithm is controlled by the time integrator tolerances, reltol and abstol, which are set on the .options timeint line.
s
Thanks for these answers @Eric Keiter! This gives a few knobs to turn, especially interpolated output vs non-interpolated. I have to check whether reltol and abstol are identical to ngspice.
@Stefan Schippers I tried to run the test but it is missing the `stimuli_test_carry_lookahead.cir` file. Could you commit that?
s
@Steven Bos you can generate this file yourself by going to Simulation -> Utile Stimuli editor (GUI) and pressing '`Translate`'
s
@Stefan Schippers Yes, but doesn't that need some input file as well? When I open Simulation -> Utile Stimuli editor (GUI) it is empty. I recall that rom_8k had a stimuli.rom8k file, but I cannot find such a file for this test
s
@Steven Bos ok, then I have to check.
ok, @Steven Bos I have checked in `sky130_tests/stimuli.test_carry_lookahead`. Update your repo, then copy this file into the simulation directory and do the '`Translate`' step. This copying into the simulation directory is boring and I need to fix that; this file should be looked up in the directory where the schematic is. For now accept this extra step.
👍 1
e
@Steven Bos Glad I could help. I should mention that in my experience the main reason users turn on the interpolated output in Xyce is to reduce the size of the output file. Some of our users run a lot of very long running analog circuits, and they can wind up having millions of time steps by the time they are complete. For plotting purposes etc. you often just don't need that many points, and users would rather not fill up their drives with huge files. As far as overall runtime goes, I usually only expect these IO differences to matter when the circuits aren't too large. Once things get big enough the solver time dominates. But I've never really done a systematic study of it.
s
We try to mention solve time next to runtime in our tests so that eventually we can see them side by side for several circuits. Is there a quick way to find all the default settings (e.g. what is the default RELTOL in Xyce)?
e
For the time integrator, the current defaults are ABSTOL=1.0E-6 and RELTOL=1.0E-3.
s
it seems that ngspice has the same RELTOL (1e-3) but a much lower default ABSTOL (1e-12)
e
It is possible that they aren’t using it in quite the same way.
s
Implementation differences, you mean?
e
Some simulators set different tolerances for different types of variables. So, for example, they’d have a different ABSTOL for charge than for current. We have not done that.
The relative tolerance inherently handles whatever the natural/typical magnitude of each variable is.
s
Ah OK. I reckon at some point in the comparison we just have to accept some implementation differences.
e
Yes, I think so.
Two settings we also have, which may be of interest to you are:
.options timeint NEWLTE
and
.options timeint ERROPTION
The first one (NEWLTE) allows you to set what "reference" is used by RELTOL. The other one (ERROPTION) allows you to completely disable local truncation error control and rely only on nonlinear solver behavior to set the time step. This second one (ERROPTION) is mostly a last resort when someone has a really difficult circuit that won't run otherwise. But it often runs very fast (albeit probably less accurately).
The `.options timeint NEWLTE` option can be set to 0, 1, 2 or 3; the meaning of each value is spelled out in the Xyce Reference Guide.
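A hedged sketch of how these options would appear in a netlist (values are illustrative; the inline comments paraphrase my reading of the Reference Guide):
Copy code
* two independent knobs, shown separately (ERROPTION=1 makes NEWLTE moot, since LTE is off)
.options timeint NEWLTE=1     ; select the reference magnitude used with RELTOL (0-3)
.options timeint ERROPTION=1  ; 1 = ignore LTE, let nonlinear-solver success drive the step size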
s
Interesting! I will try them. I am now running another test made by @Stefan Schippers, the carry lookahead test.
@Stefan Schippers I assume that the Xyce simulation should be run with the -r option for this carry lookahead test?
s
@Steven Bos add the following command in the Xyce command block:
Copy code
.print tran format=raw file=carry_lookahead_xyce.spice.raw v(*)
I am restructuring all examples to run without the -r command line option, this example was not completed yet.
@Steven Bos the `test_carry_lookahead` example has 80k unknowns; using -r will save all internal nodes, which are not interesting and will generate a huge raw file. The .print above just saves the top-level voltages, which is enough for the test.
@Steven Bos regarding ngspice getting killed: are you running inside a container? If so, check the process/memory limits in the setup/preferences. If not using containers/sandboxes, check your limits with `ulimit -a`, though these are usually unrestricted as far as I know. If using a virtual machine / WSL, see if there are restrictions there. I am not familiar with these, so I cannot help much. If everything is set up correctly then it might be an ngspice problem.
e
Hello @Steven Bos, it looks like the parallel simulation is using the parallel iterative linear solver rather than KLU. You can tell this from the log (the singleton warnings and the Hypergraph message; if running with KLU you won't ever see either of them). If that was the intent, that is fine. But the reason that the parallel linear solve needs 4:22.302 and the serial linear solve needs 53.103 is this choice: at this size, serial KLU is more efficient than parallel preconditioned GMRES.
s
@Steven Bos I saw the screenshot. Based on past experience, the (terse!) message '`Killed`' means something outside ngspice killed it the rude way, with signal 9 (SIGKILL). This is the exact message I get if I send a kill -9 to the ngspice process while it is running. On Linux, the usual culprit when memory runs out is the kernel's OOM killer.
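If the OOM killer is the suspect, it leaves a trace in the kernel log; a quick check (exact log wording varies by kernel version):
Copy code
dmesg | grep -i -E 'out of memory|killed process'
# or on systemd machines:
journalctl -k | grep -i oom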
e
Regarding the runtime vs. simtime question: that is a good question. I notice in the logs that the “distribute devices” phase of the setup is taking 11 minutes in serial and 18 minutes in parallel. That is a long time; I'm not sure I can explain why that is happening.
Is this a circuit that would be easy to share with us? There might be a weird bottleneck in the setup.
Usually, even for very large circuits and PDKs the setup time is faster than that.
s
@Eric Keiter @Steven Bos I did the tests on Xyce serial, and a very long time is spent parsing the netlist. ngspice had similar results: ~50% of the time is spent processing/parsing before the simulation starts.
@Eric Keiter the circuit is highly hierarchical; it contains (among other things) a 256-bit adder, which is implemented as 4x 64-bit adders, each implemented as 4x 16-bit adders, each implemented as 4x 4-bit adders, each implemented as 4 1-bit full adders... I don't know if this means something for the simulator's ability to parse the whole thing.
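For illustration, the nesting described above looks roughly like this in SPICE terms (a hypothetical sketch with invented names, not the actual test_carry_lookahead netlist):
Copy code
* 1-bit full adder (transistor-level contents omitted in this sketch)
.subckt FA1 A B CI S CO
* ... MOSFET instances ...
.ends
* 4-bit adder = 4 instances of FA1; the same 4x pattern repeats at every level
.subckt ADD4 A0 A1 A2 A3 B0 B1 B2 B3 CI S0 S1 S2 S3 CO
XFA0 A0 B0 CI S0 N1 FA1
XFA1 A1 B1 N1 S1 N2 FA1
XFA2 A2 B2 N2 S2 N3 FA1
XFA3 A3 B3 N3 S3 CO FA1
.ends
* ADD16 = 4x ADD4, ADD64 = 4x ADD16, ADD256 = 4x ADD64: a few dozen
* subcircuit definitions expand into ~27k devices when flattened.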
e
@Stefan Schippers @Steven Bos Interesting. I had previously been told anecdotally that Xyce handled the setup a lot faster than ngspice. But that probably wasn't a systematic comparison, and it may have been a very problem-dependent observation.
Usually, I expect hierarchical netlists to be a bit faster to parse, if only b/c the file IO is less than it would be for its flattened equivalent. But that is just me speculating.
s
@Eric Keiter the results of Xyce (serial) and ngspice on this design (27k devices, 80k unknowns), run by me, were posted above. Ngspice:
Copy code
Total analysis time (seconds) = 2710.63
Total elapsed time (seconds) = 4191.515
Xyce:
Copy code
***** Total Simulation Solvers Run Time: 2103.34 seconds
***** Total Elapsed Run Time:            4147.71 seconds
*****
***** End of Xyce(TM) Simulation
so not very different. Results were good for both and raw file sizes were comparable (Xyce saves fewer time points when the circuit is idle, ngspice somewhat more).
e
Interesting result. Thanks!
s
As you can see, both take a considerable time to get the netlist down their throats.
s
@Stefan Schippers Indeed, ngspice rapidly hogs all of the 15GB main + 4GB swap memory in about 3 minutes and at that point gets killed.
I tried running it in batch mode, hoping that it would offload some memory usage to disk, but apparently that only helps on the output side (writing results directly to the .raw file instead of keeping them in memory). It gets killed while doing the parsing, so before doing any .tran.
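For reference, a batch-mode invocation along the lines described (flags per the ngspice manual; file names invented):
Copy code
ngspice -b -r out.raw -o out.log netlist.spice   # -b batch mode, -r rawfile, -o log file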
I will ask around on the ngspice forums on SourceForge; too bad the devs don't have a Slack channel in this community.
@Eric Keiter I will test Xyce parallel again with `.options LINSOL type=klu` and see how it performs. BTW, all tests we do are open source and are shared either here or in the xschem repo. I am thinking about a simulator benchmark/test suite, probably as a separate repo where all the schematics are centralized and users can submit PRs with their own schematics. But before that a good test method is needed; I can imagine parse time, file I/O, solve time and total run time being good candidates.
👍 1
For me Xyce starts up really fast with sky130 compared to ngspice, regardless of the circuit, even with only
Copy code
.lib /usr/local/share/pdk/sky130A/libs.tech/ngspice/sky130.lib.spice tt
but this test does a bit more with
Copy code
.include /usr/local/share/pdk/sky130A/libs.ref/sky130_fd_sc_hd/spice/sky130_fd_sc_hd.spice
.include /usr/local/share/pdk/sky130A/libs.tech/ngspice/corners/tt.spice
.include /usr/local/share/pdk/sky130A/libs.tech/ngspice/r+c/res_typical__cap_typical.spice
.include /usr/local/share/pdk/sky130A/libs.tech/ngspice/r+c/res_typical__cap_typical__lin.spice
.include /usr/local/share/pdk/sky130A/libs.tech/ngspice/corners/tt/specialized_cells.spice
s
@Steven Bos try giving ngspice only the corner it needs, using the component `sky130_fd_pr/corner.sym` (if you are using my test it is probably already done). Also ensure you have the `.spiceinit` file in the simulation directory with the following content:
Copy code
set ngbehavior=hsa
set ng_nomodcheck
Maybe it is the 2nd line that helps; not sure.
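For what it's worth, my understanding of those two lines (treat the comments as assumptions based on my reading of the ngspice manual):
Copy code
* HSPICE-compatible parsing of the PDK model libraries ('hs' = HSPICE mode, 'a' = apply to all files)
set ngbehavior=hsa
* skip the model parameter check while loading the huge PDK model files
set ng_nomodcheck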
I have only 4GB of RAM and both simulators peaked around 1.2-1.5GB resident (not virtual) memory during the run.
s
Again, great call @Stefan Schippers. I ran this example from the .xschem folder, which didn't have a .spiceinit file. With this file (and probably the `set ng_nomodcheck` line) memory never exceeded 1.8 GB. The result: ngspice 2 threads 1458s (773s) and raw file 1381 (1351)
I am now running the parallel version with linsol and will then make another TL;DR post
TL;DR: @Stefan Schippers' laptop and my laptop both report identical rankings and file sizes, but different durations due to hardware differences. Xyce parallel was not yet tested by Stefan. Two mid-sized circuits were tested; figures are total runtime (solver time) and raw file #variables (#data points).
Copy code
ROM_8K
------------------
ngspice 2 threads   169s (168s)
xyce parallel       170s (169s)
xyce parallel klu   174s (173s)
xyce serial         268s (267s)

CARRY_LOOKAHEAD (256-bit adder)
-------------------
xyce serial         1218s (515s)   raw file 10799 (1448)
ngspice 2 threads   1458s (773s)   raw file  1381 (1351)
xyce parallel klu   1500s (451s)   raw file 10799 (1448)
xyce parallel       1608s (509s)   raw file 10799 (1448)
CC @Eric Keiter!
I am now finishing up my new 1-to-10-bit pulsed DACs (thanks to your help @Stefan Schippers and @Mitch Bailey) and will compare ngspice and Xyce again, but now for these smaller circuits. When I find some time I will use the hspice version of the sky130 lib to test that as well, for the circuits mentioned here.
e
@Steven Bos thanks for the info. I am still a bit surprised that KLU isn't faster. One other thought, looking at the xyce_parallel_klu.log file: the number of warnings is really excessive. The file is 25MB and contains 473778 lines! If I edit out the warnings, the log file is only 246 lines. We've talked about adding a setting so that Xyce will optionally throttle or suppress warnings like this, as this many is not useful.
I am curious if you see similar warnings from ngspice
Part of why I ask is that I want to double check if these warnings represent something we need to fix. I’ve seen these warnings before but haven’t had time to dig into them. They are coming from one of the BSIM models (probably BSIM4). I know that commercial simulators have often added extra geometrical parameters to the BSIM models, that were not part of the original model from Berkeley. I also think that ngspice has implemented some of these extra geometrical params. So, I’d like to check if these warnings are happening b/c the Sky130 PDK is using those parameters and Xyce is incorrectly not using them. Other than the mountain of warnings, they don’t seem to be causing the answers to be substantially different, so there is that.
s
Ha! Did I forget to add the ngspice log file? I will add it first thing tomorrow. Note that parallel with KLU has the fastest solve time of all the configurations, just not the fastest total runtime. I don't recall seeing that many errors. TBF, @Stefan Schippers mentioned the .spiceinit file for ngspice; that file has a 'nomodcheck' line that solved a memory issue in the parse phase. Not sure if Xyce can do something similar. I know from Blender, for example, that the OBJ parser was recently sped up hugely because nobody had touched & analyzed that piece of code in a long time.
Great that you are looking into this. The fewer warnings, the more comfortable people are when simulating. We definitely noticed the effects of heavy I/O slowing down total runtime. Surprisingly, the number of variables in the second test was 7x that of ngspice, yet Xyce serial was 240s faster overall, and that was almost pure solving time.
e
One of my colleagues pointed out to me that the version of the BSIM4 model you are probably running is 4.6.1, which has until recently been the only version in the Xyce source code. However, there are two newer versions commonly in use: 4.7 and 4.8. My colleague very recently (last week) added 4.7.0 to the Xyce source and added 4.8 to Xyce over the past few days. According to him, the function that is producing all the warnings got changed in 4.7.0. So, possibly if you run this with a build of Xyce that includes the 4.7 model these warnings will go away. I haven’t tried this yet; it is just a theory. But it would be convenient if that turns out to be the case.
s
@Eric Keiter I have not found any reference to BSIM 4.8; has that been merged to master yet? BSIM 4.7 has indeed been added 10 days ago.
s
Results are in line with mine (if scaled to a slower computer, lol): ~50% of the time is spent parsing the netlist and building internal data structures. About the `-r` flag: in ngspice it not only forces raw writing, it also saves all variables if no `.save` lines are present in the netlist. This is (I believe) different from Xyce, which saves everything if `-r` is specified, even if `.print` lines are present in the netlist.
@Steven Bos I found a very different behavior between ngspice and Xyce when doing a .dc analysis with capacitors that have an IC=... condition in the netlist. Ngspice removes all capacitors (and shorts all inductors) in DC analysis, which makes sense since frequency is 0. Xyce instead replaces capacitors that have an IC condition with voltage sources equal to the IC voltage in a .DC simulation. This is something we must be very careful about: IC conditions are typically used in transient analysis to define an initial state, and in my experience nobody cared about IC conditions when doing a DC analysis, but for Xyce they affect the DC results. This is the behavior described in the Xyce reference manual: "If one is doing a transient with DC operating point calculation or a DC operating point analysis, the initial condition is applied by inserting a voltage source across the capacitor to force the operating point to find a solution with the capacitor charged to the specific voltage. The resulting operating point will be one that is consistent with the capacitor having the given voltage in steady state".
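A minimal sketch of the difference (hypothetical netlist, node names invented):
Copy code
V1 in 0 DC 1.8
R1 in n1 1k
C1 n1 0 1p IC=0.5
.op
* ngspice: C1 is an open circuit at DC, no current flows in R1, so V(n1) = 1.8 (the IC is ignored)
* Xyce: the IC inserts a source across C1 for the DC solve, so the operating point has V(n1) = 0.5
.end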
🌍 1
s
Good catch @Stefan Schippers!
k
@skandha deepsita @akhendra kumar padavala
❤️ 1
@Dr. P Akhendra Kumar
❤️ 1