C-based simulation faster than HDL-based simulation? 

Hi,
  Does anyone know why simulation based on compiled C code that is
converted from HDL is supposed to be faster than simulation based on
compiled HDL? Is it because C compilers are more efficient than HDL
compilers? Or is it because C can handle event scheduling more
efficiently than HDL? Or is it because of some limitation in HDL?

Thanks,
TigerShark



Mon, 12 Sep 2005 23:28:55 GMT  
It may be because C compilers are more "mature"; however, I've seen
some cycle-based simulators (I didn't say event-based) which
supposedly can emit better code than, say, gcc because their register
allocation algorithms, etc. are specifically tuned for CBS.  Of course,
as with all speed-based stuff, YMMV.

Sim based on compiled HDL also may simply be syntax tree walking or
whatever and might *not* be generating native machine code. It really
depends on your HDL compiler.
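
As a rough illustration of that difference (a sketch only, not code from any simulator mentioned in this thread; `Node`, `eval_tree` and `eval_compiled` are made-up names), here is the same expression evaluated by walking a syntax tree and as straight-line compiled C:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical expression-tree node, as a tree-walking simulator might hold it. */
typedef enum { OP_VAR, OP_AND, OP_OR } Op;

typedef struct Node {
    Op op;
    const uint32_t *var;            /* used when op == OP_VAR */
    const struct Node *lhs, *rhs;   /* used for OP_AND / OP_OR */
} Node;

/* Interpreted: one switch and up to two recursive calls per node, every cycle. */
static uint32_t eval_tree(const Node *n)
{
    switch (n->op) {
    case OP_VAR: return *n->var;
    case OP_AND: return eval_tree(n->lhs) & eval_tree(n->rhs);
    case OP_OR:  return eval_tree(n->lhs) | eval_tree(n->rhs);
    }
    return 0;
}

/* Compiled: the same expression, d = (b & c) | e, emitted as straight-line C. */
static uint32_t eval_compiled(uint32_t b, uint32_t c, uint32_t e)
{
    return (b & c) | e;
}
```

Both give the same result; the tree walker simply pays the dispatch and call overhead on every node of every cycle, which the compiled form lets the C compiler remove.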

-t

Quote:

> Hi,
>   Does anyone know why simulation based on compiled C code that is
> converted from HDL is supposed to be faster than simulation based on
> compiled HDL? Is it because C compilers are more efficient than HDL
> compilers? Or is it because C can handle event scheduling more
> efficiently than HDL? Or is it because of some limitation in HDL?

> Thanks,
> TigerShark



Tue, 13 Sep 2005 05:17:23 GMT  

Quote:

> Hi,
>   Does anyone know why simulation based on compiled C code that is
> converted from HDL is supposed to be faster than simulation based on
> compiled HDL?

Similar to a synthesis tool, an optimizing C compiler can
remove unused logic, factor out redundant computation, and
perform graph transformations that result in faster run times.

I had a chance to compare Tenison's VTOC with a few [anonymous]
HDL simulators.  Even though VTOC took a significant amount of
time converting large Verilog systems to C, the simulation
speed-up was impressive.

If you would like to try a comparison for yourself, you can
download a few of our Confluence generated cores from
www.opencores.org.  Confluence is a logic design language that
compiles into Verilog, VHDL, and bit-for-bit cycle accurate C.
The generated C models also include functions for recording VCD
data, so they work as stand-alone simulators.

The CF_FFT test-bench currently on OpenCores is a small
configuration, but I'll build and upload a larger model
that would serve as a good benchmark.

To compile and run the C just do the following:

$ gcc -Wall -o test_fft_c test_fft_testbench.c test_fft.c
$ ./test_fft_c

If you have Icarus, you can do the same with the Verilog:

$ iverilog -Wall -o test_fft_v test_fft_testbench.v test_fft.v
$ ./test_fft_v

Both produce VCD waveforms.  I use Dinotrace for viewing:

$ dinotrace test_fft_c.vcd & dinotrace test_fft_v.vcd

I'll build and upload the larger test-bench now; you
should see it at http://www.opencores.org/projects/cf_fft/
by the time this message hits the group.

Remember, no public announcements of benchmark results!

Regards,
Tom

--

Launchbird Design Systems, Inc.         http://www.launchbird.com/



Tue, 13 Sep 2005 23:58:19 GMT  

Quote:

> Hi,
>   Does anyone know why simulation based on compiled C code that is
> converted from HDL is supposed to be faster than simulation based on
> compiled HDL? Is it because C compilers are more efficient than HDL
> compilers? Or is it because C can handle event scheduling more
> efficiently than HDL? Or is it because of some limitation in HDL?

Some converters (such as skilled humans) eliminate more of the
event scheduling during the conversion than do HDL compilers.

Almost all race-free synchronous digital circuits can be simulated
without events, that is, by a program that is statically scheduled.
(The only exception to that rule that I'm aware of is actually due
to an organization/partitioning choice.  With slightly different
partitioning, its simulation can also be statically scheduled.)
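
A minimal sketch of what "statically scheduled" can look like in C, assuming a made-up 4-bit counter with an enable input (the `Counter` struct and `counter_cycle` are illustrative names, not taken from any tool):

```c
#include <stdint.h>

/* Hypothetical design state: a 4-bit counter with an enable input. */
typedef struct {
    uint32_t en;        /* primary input */
    uint32_t count_q;   /* flop output */
    uint32_t count_d;   /* flop input (next state) */
    uint32_t inc;       /* combinational: count_q + 1 */
    uint32_t wrap;      /* combinational: counter is at 15 */
} Counter;

/* One clock cycle with no event queue: the evaluation order is fixed at
 * translation time so every value is computed after the values it reads. */
static void counter_cycle(Counter *s)
{
    /* level 1: depends only on flop outputs */
    s->inc  = (s->count_q + 1u) & 0xFu;
    s->wrap = (s->count_q == 0xFu);
    /* level 2: depends on level 1 and primary inputs */
    s->count_d = s->en ? s->inc : s->count_q;
    /* clock edge: all flops update together */
    s->count_q = s->count_d;
}
```

Every statement runs exactly once per cycle in a fixed order, so there is nothing to schedule at run time.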



Wed, 14 Sep 2005 01:22:27 GMT  

Quote:

> Hi,
>   Does anyone know why simulation based on compiled C code that is
> converted from HDL is supposed to be faster than simulation based on
> compiled HDL? Is it because C compilers are more efficient than HDL
> compilers? Or is it because C can handle event scheduling more
> efficiently than HDL? Or is it because of some limitation in HDL?

> Thanks,
> TigerShark

This is a long answer

As the author of a working Verilog lite to C compiler I can say some
things about this & exactly why it is much faster but also where it
might be slower. This is not a product announcement, more of a comment
on work in progress.

I am aware of Tenison & Icarus, but I haven't studied them. I wrote
V2C because I needed a Verilog to C compiler that met my
DSP/CPU/engine needs that could be molded into other tools. I have no
interest in full compliant Verilog EDA as yet & no interest in monster
Verilog+C++ languages like SuperLog.

Generally it is convenient to model complex systems both in HDL & in
C. The HDL can be detail simulated all the way down to gates or even
switches & usually synthesized, and of course RTL HDL is highly
portable to ASIC/FPGA etc.

The C model can be written in a very direct way that leads to max
efficiency & is also portable to many cpus, but isn't usually useful
to any EDA tools.

For example, an FFT in C could be the radix-2 Cooley-Tukey algorithm, about 20
lines of code right out of any DSP book.

The same detailed model in HDL, written as RTL, could be a few hundred
lines, let's say 200 detail lines, in both cases not counting comments
or blanks.

It would be nice to only describe a problem once & both simulate &
synth it from HDL but also get the benefit of C code translation for
evaluation in a Matlab/C environment. We might already have a shipping
C code app and might be taking C code & turning it to HDL for speed up
in HW ie FPGA accelerator.

For DSP, CPUs & other structured engines that use one clock and where
all the pipelines are continuously clocking and where there are no
unusual cycle graphs in the code, i.e. no inferred latches, it becomes
practical to compile Verilog lite to cycle C. For mixed clock domains
it is more a problem of finding where & when the work is going on,
ultimately leading back to event simulation. Synchronized multiple
clocks can be "simulated" but become less efficient. It would be
simpler to use 1 clock and logically mux gate it for slower domains,
but then the resulting Q signals may be evaluated endlessly.

In V2C there is a direct conversion between every construct in the
Verilog language supported to the C output, mostly in same order. In
V2C I impose a few limits that may appear to be startling, but some of
the restrictions will likely get reduced or eliminated later on. V2C
has been used to design DSPs & a CPU in the works for several years.
By definition the Verilog supported is basically idiot proof for
synthesis. V2C never deals with blocking issues.

V2C supports only the following: a hierarchy of modules
& instances with the usual ports, wire declarations & continuous
assigns. A basic #'macro preprocessor just got added. Wires are
limited to 32 bits, and right now all assigns are manually levelized.
These are the 2 biggest limits awaiting fix before any minimal
release.


implies sequential behaviour. The input Verilog is assumed to link
with 2 libs. The Verilog lib would be used only by other Verilog tools

clocked engines can be built. The C lib would include similar
primitive cells to the Verilog lib and describes DFlops directly as a
Din master load. The master to Q slave copy is done at the bottom of
the cycle for all Flops in a magic clock fn.
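
To make the master/slave idea concrete, here is a small sketch with hypothetical names (`Dff`, `eval_cycle`, `clock_all`); it is not V2C's actual library, just one way the scheme described above could look for two flops wired in a ring:

```c
#include <stdint.h>

/* Hypothetical flop cell: d is the "master" value loaded during the cycle,
 * q is the "slave" value that the rest of the logic reads. */
typedef struct { uint32_t d, q; } Dff;

/* Two flops wired in a ring, so each one samples the other's output. */
static Dff ff_a = { 0, 1 };
static Dff ff_b = { 0, 2 };

/* Combinational phase: only the master side (d) is written; q is read-only
 * here, so the order of these two lines does not matter. */
static void eval_cycle(void)
{
    ff_a.d = ff_b.q;
    ff_b.d = ff_a.q;
}

/* "Clock" function, called at the bottom of the cycle: copy master to slave
 * for every flop in the design. */
static void clock_all(void)
{
    ff_a.q = ff_a.d;
    ff_b.q = ff_b.d;
}
```

Because q only changes inside `clock_all`, the two flops swap values each cycle, as real edge-triggered flops would, without any care about statement order in `eval_cycle`.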

At a minimum both libs must describe all the DFlop structures needed,
ie DFF,EFF,MFF or basic, enabled, muxed etc. Latches are to be
avoided. Memories are a special case since they contain private state.
All wire widths variations are also specified in the C lib but the
Verilog lib needs only parameterized models. It would be prudent to
add parameters or templates for the C code models too.

Both libs can arbitrarily contain any ssi,msi,lsi function you could
imagine from the TTL days or anything you could describe in both, even
a cpu. However the more that goes into these libs the more limiting is
the V2C use since everything must be written for both C & Verilog and
some things may not be synthesizable. On the other hand the more there
is in these libs, the faster the potential C simulation. If we started
with a hand written RTL cycle C model and corresponding RTL Verilog
model of an entire chip, the V2C top level Verilog would reduce to an
instance. For something like an AND gate the C model would be just Q =
A & B..; inside a fn wrapper.

All the C fns are inlined where ever possible, some might be macros.
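
For illustration only, a C-side cell library along these lines might look like the sketch below. The function names are invented, and the reading of EFF as "enabled" and MFF as "muxed" follows the description above rather than any released V2C source:

```c
#include <stdint.h>

/* Combinational cell: a 2-input AND over a whole 32-bit wire. */
static inline uint32_t and2(uint32_t a, uint32_t b) { return a & b; }

/* The same cell as a macro, for compilers that will not inline. */
#define AND2(a, b) ((uint32_t)(a) & (uint32_t)(b))

/* Next-state ("Din") functions for three assumed flop flavours. */

/* basic: always loads d */
static inline uint32_t dff_next(uint32_t d) { return d; }

/* enabled: loads d when en is non-zero, otherwise holds q */
static inline uint32_t eff_next(uint32_t q, uint32_t en, uint32_t d)
{
    return en ? d : q;
}

/* muxed: loads d1 when sel is non-zero, otherwise d0 */
static inline uint32_t mff_next(uint32_t sel, uint32_t d0, uint32_t d1)
{
    return sel ? d1 : d0;
}
```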

V2C uses a 4 stage process. The Verilog sources are 1st fed into the
comment {*filter*}. 2nd is the combined lexer & parser. Much of the
parser code reads very similar to the EBNF for the Verilog language
with a little extra thrown in. The 3rd stage processes the Tuple tree
from the 2nd stage and does all the various transforms desired, esp. hier
flattening & tree optimisation & semantic checks. The 4th stage
writes out the flat or hier Tuple trees to C, Verilog, or Tuple
diagrams. It wouldn't be that difficult to add VHDL, EDIF or other
output, but input is 2 orders of magnitude harder esp since I don't
know VHDL well enough.

If I want to include behavioural Verilog code, I can just write it as
a module and as a C funct with matching ports. The C fn can be
instanced as any normal module. So far these modules have been equiv
to TTL msi functions or less. I could include something like a
Quicksort or Wordprocessor, but the Verilog lib version would be
harder to write and neither would be synthesizable.

Generally a chunk of Verilog code produces about the same amount of C
code or proportionally more if a module is instanced many times. It is
just a tree walker after all.

V2C is pretty mechanical, and compiles about 1MB of source/sec/GHz on
a P3.

V2C C code generally runs at typical speeds of about 1 billion
lines/min, since typical Verilog lines emit about 60 x86 instructions
after C compiling. That avg comes from a DSP simulation that routinely
runs for millions of cycles in a few minutes.

The output C code is really a straight line branchless code that could
easily be 1Mil lines long and as such will break most C compilers. To
get past that, V2C artificially breaks it into n line chunk fns with
no internal vars. All wires become unsigned longs which are global to
all fns. The wires/ulongs have names that include their instance path
name joined by _ (very few chars possible).
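
A tiny sketch of what such generated output could look like. The wire names, the chunk size and the logic are all invented for illustration, and `uint32_t` stands in for the unsigned longs mentioned above:

```c
#include <stdint.h>

/* Wires are file-scope globals named after their instance path. */
static uint32_t top_a, top_b, top_c;
static uint32_t top_u1_sum, top_u1_masked, top_u2_out;

/* Chunk 0: the first n lines of the flattened, levelized netlist. */
static void chunk_0(void)
{
    top_u1_sum    = top_a + top_b;
    top_u1_masked = top_u1_sum & 0xFFu;
}

/* Chunk 1: the next n lines; it reads results through the globals. */
static void chunk_1(void)
{
    top_u2_out = top_u1_masked ^ top_c;
}

/* One simulated cycle: call every chunk in order. */
static void eval_all(void)
{
    chunk_0();
    chunk_1();
}
```

Since the chunks have no locals and no arguments, the split point can be moved freely without changing the result.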

The value n is chosen so that the fns can pass through a std C compiler
with highest optimization enabled, and so that the C compiler compiles as
fast as possible, typically 25-200. If n is too small, the optimizer
cannot see related code across the fn boundary. If n is too big, it will
{*filter*} & take forever.

Assignments are the only way to include case statements of the ?: form
and can be any size. They support any expression type and ?: is always
latch free.

Bit field operations such as [:] {,,} are handled with bit fns both
LHS & RHS.
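
As a hedged sketch of what such bit fns might look like for 32-bit wires (the names `bit_mask`, `bits_get`, `bits_set` and `bits_cat` are hypothetical, not V2C's):

```c
#include <stdint.h>

/* Mask with the low `width` bits set (0 <= width <= 32). */
static inline uint32_t bit_mask(unsigned width)
{
    return width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* RHS part-select: value of w[msb:lsb]. */
static inline uint32_t bits_get(uint32_t w, unsigned msb, unsigned lsb)
{
    return (w >> lsb) & bit_mask(msb - lsb + 1u);
}

/* LHS part-select: w with w[msb:lsb] replaced by v. */
static inline uint32_t bits_set(uint32_t w, unsigned msb, unsigned lsb, uint32_t v)
{
    uint32_t m = bit_mask(msb - lsb + 1u) << lsb;
    return (w & ~m) | ((v << lsb) & m);
}

/* Concatenation {hi, lo}, where lo is lo_width bits wide (lo_width < 32). */
static inline uint32_t bits_cat(uint32_t hi, uint32_t lo, unsigned lo_width)
{
    return (hi << lo_width) | (lo & bit_mask(lo_width));
}
```

For example, `x[15:8]` on the right-hand side becomes `bits_get(x, 15, 8)`, and `{a, b}` with a 4-bit `b` becomes `bits_cat(a, b, 4)`.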

I am pretty keen to really optimize the hell out of the C code and add
RISC32 x86 code and go to a direct run from the cmd line. V2C is
written in std C/C++ & has run on Windows & BeOS. Some use of classes
for strings & output.

As long as the Verilog assignments are word by word, the C code is
bound to be 2 orders faster since it completely ignores xz states, and
Verilog has to evaluate everything bit by bit. If the input Verilog is
gate level & bit by bit, the speed up would be nominal until the x86
optimizations are added.

So if I compare the C code that is generated by V2C with a direct RTL
C code that does exactly the same thing as the Verilog source, I would
expect V2C code to be 2-5 times slower. But RTL C code is very
difficult to write and can't easily be used by any EDA tools. With
more & more optimizations it will be possible to make V2C code almost
as efficient, and if x86 native code added, probably much faster than
RTL C compiled to x86, since the tool has far more knowledge available
than any C compiler.

I have been considering releasing under GPL, as well as commercial.
I got miles to go and many bills to pay.

Interested in people's thoughts.

JJ



Fri, 16 Sep 2005 13:35:01 GMT  
Tenison's VTOC does everything described for V2C here and also handles
  transparent latches
  any number of clocks, asynchronous, synchronous, gated
  arbitrary length vectors
  very big designs
  any synthesisable Verilog
  ANSI C or SystemC or 'native' C++ or PLI-callable C++ output
Doing the WHOLE language has been extremely hard - VTOC has been
under development for several years now, and has been used on real
big commercial designs.

VTOC does static scheduling of everything, the output code goes much faster
than compiled RTL because the compiler knows more.

If you can write your Verilog with your translator in mind and don't
mind sticking to Verilog Lite, then V2C may be all you need.
If you have large bodies of existing Verilog, or your design
needs some of the things above, then you need VTOC.



Mon, 19 Sep 2005 21:56:47 GMT  

Quote:

> Tenison's VTOC does everything described as for V2C here and also handles
>   transparent latches
>   any number of clocks, asynchronous, synchronous, gated
>   arbitrary length vectors
>   very big designs
>   any synthesisable Verilog
>   ANSI C or SystemC or 'native' C++ or PLI-callable C++ output
> Doing the WHOLE language has been extremely hard - VTOC has been
> under development for several years now, and has been used on real
> big commercial designs.

> VTOC does static scheduling of everything, the output code goes much faster
> than compiled RTL because the compiler knows more.

> If you can write your Verilog with your translator in mind and don't
> mind sticking to Verilog Lite, then V2C may be all you need.
> If you have large bodies of existing Verilog, or your design
> needs some of the things above, then you need VTOC.

Precisely. I hope the authors of these other Verilogs VTOC & Icarus
get some $ commercial reward for their efforts. I am curious what the
market share for vendors is after VCS, & XL.

Myself, I am more inspired by the smaller languages created by Wirth
etc, and when I have a tool that does enough, I can get back to using
it. I prefer to think of V2C as a way to add Verilog lite onto C so
that modelling can be done in a C env without growing the C language
directly and without trying to use C directly as a HDL.



Tue, 20 Sep 2005 06:58:05 GMT  

Quote:

> Precisely. I hope the authors of these other Verilogs VTOC & Icarus
> get some $ commercial reward for their efforts.

Not a cent.

(Not strictly true. I got a book from Stu Sutherland, and a copy
of Foundation 4.2i from Xilinx.)

--
Steve Williams                "The woods are lovely, dark and deep.
steve at icarus.com           But I have promises to keep,
steve at picturel.com         and lines to code before I sleep,
http://www.picturel.com       And lines to code before I sleep."



Tue, 20 Sep 2005 07:45:56 GMT  
Hi,
I have been toying with Verilator 3.103 for a few weeks. On a 1800 line
synthesizable Verilog RTL input file, it was 9X faster than a commercial
compiled Verilog simulator. I plan to try a larger block in the near future.
Here is the link for the source.

http://www.veripool.com/verilator.html

Has anyone else evaluated this free Verilog to C tool? It seems to be a bit
more mature than V2C. The project was started in 1994, and Verilator is now
in its third major revision. It does have limitations that may make it
unsuitable as a replacement for a commercial Verilog simulator product.

I also plan to evaluate VTOC, when I get the time. I expect VTOC to be even
faster than Verilator.

Robert A. Clark


Quote:
> Hi,
>   Does anyone know why simulation based on compiled C code that is
> converted from HDL is supposed to be faster than simulation based on
> compiled HDL? Is it because C compilers are more efficient than HDL
> compilers? Or is it because C can handle event scheduling more
> efficiently than HDL? Or is it because of some limitation in HDL?

> Thanks,
> TigerShark



Tue, 20 Sep 2005 14:53:26 GMT  

Quote:

>Hi,
>I have been toying with Verilator 3.103 for a few weeks. On a 1800 line
>synthesizable Verilog RTL input file, it was 9X faster than a commercial
>compiled Verilog simulator. I plan to try a larger block in the near future.
>Here is the link for the source.

>http://www.veripool.com/verilator.html

>Has anyone else evaluated this free Verilog to C tool? It seems to be a bit
>more mature than V2C. The project was started in 1994, and Verilator is now
>in its third major revision. It does have limitations, that may make it
>unsuitable as a replacement for a commercial Verilog simulator product.

>I also plan to evaluate VTOC, when I get the time. I expect VTOC to be even
>faster than Verilator.

>Robert A. Clark

thanks for the verilator link.  Where would one find VTOC and V2C?

Phil



Tue, 20 Sep 2005 15:41:56 GMT  

Interesting post...I wrote a similar set of verilog/simulator tools
back in '99 for use on the job (processor and IO chip design).  Some
comments follow...

Quote:

> implies sequential behaviour. The input Verilog is assumed to link
> with 2 libs. The Verilog lib would be used only by other Verilog tools

> clocked engines can be built. The C lib would include similar
> primitive cells to the Verilog lib and describes DFlops directly as a
> Din master load. The master to Q slave copy is done at the bottom of
> the cycle for all Flops in a magic clock fn.

For a CBS methodology like this, don't you *also* have to propagate Q
through the relevant combinatorial logic to handle gets for values of
primary outputs which aren't registerized?  (Of course if you do your
get at the "proper" time you don't need to do that.)

Quote:
> V2C uses a 4 stage process. The Verilog sources are 1st fed into the
> comment {*filter*}. 2nd is the combined lexer & parser. Much of the
> parser code reads very similar to the EBNF for the Verilog language
> with a little extra thrown in. The 3rd stage processes the Tuple tree
> from 2nd stage and does all the various transforms desired esp hier
> flattening & tree optimisation & semantic checks. The 4th stage
> writes out the flat or hier Tuple trees to C, Verilog, or Tuple
> diagrams. It wouldn't be that difficult to add VHDL, EDIF or other
> output, but input is 2 orders of magnitude harder esp since I don't
> know VHDL well enough.

Something that is extremely useful as a netlist optimization is single
sink reduction where you fold back nets used only once into another
eqn.  That is,

wire a,b,c,d;

assign a=b&c;
assign d=a|e;  // assume this is the only place a is used

can really be reduced down to d=(b&c)|e.  This cuts down on the number
of globals and allows better register usage in the c/c++ compiler.
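
Seen from the generated C side, the effect is roughly the following (a sketch with invented function names; the wires mirror the Verilog fragment above):

```c
#include <stdint.h>

static uint32_t b, c, e;          /* inputs */

/* Before: `a` is a global wire written once and read once. */
static uint32_t a, d_before;
static void eval_before(void)
{
    a        = b & c;
    d_before = a | e;
}

/* After single-sink reduction: `a` is folded into its only reader, so the
 * intermediate never has to be stored to memory. */
static uint32_t d_after;
static void eval_after(void)
{
    d_after = (b & c) | e;
}
```

The folded form computes the same `d`, with one fewer global and one fewer store for the C compiler to keep around.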

This works especially well when used to optimize ?: muxes as large
chunks of the model might not even need to be simulated.  This is from
the cpu6502 model in ver:

Cycle: 50000 Time: 1.32 sec (37921.88 cycles/sec)
Cycle: 100000 Time: 2.63 sec (37984.87 cycles/sec)
Cycle: 150000 Time: 3.95 sec (38009.20 cycles/sec)

vs

Cycle: 50000 Time: 0.65 sec (76463.35 cycles/sec)
Cycle: 100000 Time: 1.32 sec (75925.30 cycles/sec)
Cycle: 150000 Time: 1.97 sec (76135.60 cycles/sec)

...of course ?: precludes parallel simulation across the bits in a
machine word as your code is no longer branchless, but for you that
probably doesn't matter.

Quote:
> If I want to include behavioural Verilog code, I can just write it as
> a module and as a C funct with matching ports. The C fn can be
> instanced as any normal module. So far these modules have been equiv
> to TTL msi functions or less. I could include something like a
> Quicksort or Wordprocessor, but the Verilog lib version would be
> harder to write and neither would be synthesizable.

Sounds like you've reinvented what IBM's server group has been doing
for years.  I've been a big fan of C/C++ behaviorals for a long time.

Quote:
> The output C code is really a straight line branchless code that could
> easily be 1Mil lines long and as such will break most C compilers. To
> get past that, V2C artificially breaks it into n line chunk fns with
> no internal vars. All wires become unsigned longs which are global to
> all fns. The wires/ulongs have names that include their instance path
> name joined by _ (very few chars possible).

I discovered the C compiler breakage long ago, much to my chagrin.
I have found it interesting that if you don't run your c compiler in
optimizing mode, it usually *can* handle any arbitrarily large
function.  However, as you've discovered, you're better off breaking
into n-line functions and calling them one after the other.

Quote:
> I am pretty keen to really optimize the hell out of the C code and add
> RISC32 x86 code and go to a direct run from the cmd line. V2C is
> written in std C/C++ & has run on Windows & BeOS. Some use of classes
> for strings & output.

> As long as the Verilog assignments are word by word, the C code is
> bound to be 2 orders faster since it completely ignores xz states, and
> Verilog has to evaluate everything bit by bit. If the input Verilog is
> gate level & bit by bit, the speed up would be nominal until the x86
> optimizations are added.

Well, Verilog does not have to evaluate sequentially bit-by-bit and 4
state can be done in parallel for all the bits across a machine word
(assuming you have an extra word to store the X/Z information).
Ignoring strength information, this is one possible way to do it:

205 :/tmp/ver-1.3.36-10/experimental/fast4state> more vcs_gates.c
#include <stdio.h>
#include <stdlib.h>   /* exit() */

/*
 * Two-bit encoding of each four-state value:
 * 00 '0'
 * 01 '1'
 * 10 'Z'
 * 11 'X'
 */
#define MVL(a,b) "01ZX"[((a)&1)*2+((b)&1)]

/*
 * ITER2/ITER4 loop over every encoding of one or two operands.
 * TAIL2/TAIL4 mask the result to one bit per plane, print a table row,
 * and close the loop body.  TAIL4 prints '&' as the operator for every
 * two-input gate; the heading printed before each table names the gate.
 */
#define ITER2 for(t3=0;t3<2;t3++) for(t4=0;t4<2;t4++) {
#define TAIL2 t0&=1; t1&=1; \
                printf("%d%d -> %d%d \t ~%c -> %c\n", \
                       t3, t4, t0, t1, MVL(t3,t4), MVL(t0,t1)); \
                }printf("\n");
#define ITER4 for(t3=0;t3<2;t3++) for(t4=0;t4<2;t4++) \
              for(t5=0;t5<2;t5++) for(t6=0;t6<2;t6++) {
#define TAIL4 t0&=1; t1&=1; \
                printf("%d%d & %d%d -> %d%d \t %c & %c -> %c\n", \
                       t3, t4, t5, t6, t0, t1, \
                       MVL(t3,t4), MVL(t5,t6), MVL(t0,t1)); \
                }printf("\n");

int main(int argc, char **argv)
{
/* (t3,t4) and (t5,t6) are the operands, (t0,t1) is the result */
int c, d, t0, t1, t3, t4, t5, t6;

printf("NOT\n===\n");
ITER2
        t0=t3;
        t1=(t3|~t4);
TAIL2

printf("AND\n===\n");
ITER4
        d=(t3|t4)&(t5|t6);
        t0=d&(t3|t5);
        t1=d;
TAIL4

printf("OR\n==\n");
ITER4
        c=(t3^t5)^((t3|t4)&(t5|(t6&t3)));
        t0=c;
        t1=((t4|t6)|c);
TAIL4

printf("XOR\n===\n");
ITER4
        c=t3|t5;
        t0=c;
        t1=(c|(t4^t6));
TAIL4

printf("NAND\n====\n");
ITER4
        d = (t3 | t4) & (t5 | t6);
        c = d & (t3 | t5);
        d = c | (~d);
        t0 = c;
        t1 = d;
TAIL4

printf("NOR\n===\n");
ITER4
        c = (t3 ^ t5) ^ ((t3 | t4) & (t5 | (t6 & t3)));
        d = c | (~((t4 | t6) | c));
        t0 = c;
        t1 = d;
TAIL4

printf("XNOR\n====\n");
ITER4
        c = t3 | t5;
        d = c | (~(c | (t4 ^ t6)));
        t0 = c;
        t1 = d;
TAIL4

exit(0);
}

206 :/tmp/ver-1.3.36-10/experimental/fast4state>
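
The listing above exercises the equations one bit at a time. As a sketch of the word-parallel use described earlier in this post, the same NOT and AND equations can be applied to two 32-bit planes at once. The `Vec4` type and the function names are made up for this example, and strength information is ignored as before:

```c
#include <stdint.h>

/* One 32-bit four-state vector in the encoding used above:
 * per bit, (hi,lo) = 00 '0', 01 '1', 10 'Z', 11 'X'. */
typedef struct { uint32_t hi, lo; } Vec4;

/* NOT, same equations as the NOT case above, applied to whole words. */
static Vec4 vec4_not(Vec4 a)
{
    Vec4 r = { a.hi, a.hi | ~a.lo };
    return r;
}

/* AND, same equations as the AND case above. */
static Vec4 vec4_and(Vec4 a, Vec4 b)
{
    uint32_t d = (a.hi | a.lo) & (b.hi | b.lo);
    Vec4 r = { d & (a.hi | b.hi), d };
    return r;
}

/* Decode bit i as '0', '1', 'Z' or 'X'. */
static char vec4_bit(Vec4 v, unsigned i)
{
    return "01ZX"[((v.hi >> i) & 1u) * 2u + ((v.lo >> i) & 1u)];
}
```

Each call handles all 32 bit positions with a handful of bitwise operations and no branches, which is the point of storing the X/Z information in a second word.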

Quote:
> I have been considering releasing under GPL, as well as commercial.
> I got miles to go and many bills to pay.

Ain't that the truth.  =)

Later,
Tony



Thu, 22 Sep 2005 14:12:51 GMT  

Quote:


> >Hi,
> >I have been toying with Verilator 3.103 for a few weeks. On a 1800 line
> >synthesizable Verilog RTL input file, it was 9X faster than a commercial
> >compiled Verilog simulator. I plan to try a larger block in the near
> future.
> >Here is the link for the source.

> >http://www.veripool.com/verilator.html

> >Has anyone else evaluated this free Verilog to C tool? It seems to be a bit
> >more mature than V2C. The project was started in 1994, and Verilator is now
> >in its third major revision. It does have limitations, that may make it
> >unsuitable as a replacement for a commercial Verilog simulator product.

> >I also plan to evaluate VTOC, when I get the time. I expect VTOC to be even
> >faster than Verilator.

> >Robert A. Clark

> thanks for the verilator link.  Where would one find VTOC and V2C?

> Phil

I haven't released V2C yet; I would like to do more work on it to remove some
restrictions I wouldn't want anyone else to put up with. Since it will be part
of something much bigger, I am not sure about the license, so it may be binary
for now.

There may have been a prior V2C work as well, so I may have to get another name,
maybe Vpp. In the meantime, enjoy Icarus or Verilator. I am not sure if Tenison
VTOC is free or not. Try http://www.geda.seul.org/

JJ



Sat, 01 Oct 2005 09:08:19 GMT  
 