Cliff Cummings' LONG response to various nonblocking assignment discussions 

[I am posting this for Cliff Cummings. -mac]

I was on a Verilog Standards Group conference call last Thursday,
discussing new wording for nonblocking assignments in sections 5 and 9 of
the IEEE Verilog-2000 Standard, when some fellow committee members mentioned
that my name was being thrown around on comp.lang.verilog. Two committee
members e-mailed me most of the threads surrounding nonblocking assignments
and my Synopsys Users Group paper from last spring.

I am both surprised and honored that Jan Decaluwe (of the VHDL FAQ fame?)
is one of the principal detractors concerning my recommendations for
synthesizable coding styles with nonblocking assignments. We have always
been told how VHDL dominates the European design community. Jan, are you
actually using Verilog for a European design? (I think this falls into the
category of seeing the light!)   ;-)  ;-)  ;-)

BTW - Synopsys users can go to the Synopsys SNUG web site and download the
presentation that accompanied the paper. The presentation shows numerous
examples with events scheduled in the various queues, more than what is
shown in the paper.

An updated PDF version of my paper is also available at the following web
location (a few readers found some obvious typos in my original paper that
have been corrected):

http://www.*-*-*.com/

Now let me briefly address some of the issues, questions and opinions that
have been raised on comp.lang.verilog. To keep me involved in the
discussion, please e-mail copies of follow-up postings to

Having done both Verilog and VHDL synthesis design, I will try to relate
some of the Verilog nonblocking assignment concepts to similar VHDL concepts.

=====

Jamil Khatib asked what the difference is between blocking and nonblocking
assignments in Verilog, for both simulation and synthesis.

Brief answer:
Blocking Assignments: The RHS (Right-Hand Side) of a blocking assignment is
evaluated and the LHS (Left-Hand Side) is updated before any other
statements execute; hence, they "block" other assignments from executing
until the current blocking assignment has completed. (One exception, not
discussed in the paper, is a blocking assignment with a delay on the RHS of
the assignment operator.)

Nonblocking Assignments: The RHS of a nonblocking assignment is evaluated
immediately and the LHS is updated later in the same time step (in the
nonblocking updates event queue), giving the appearance of what VHDL-types
call a delta time.
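To make the brief answer concrete, here is a minimal sketch (the module and port names are hypothetical) that tries to swap two registers on a clock edge both ways. With nonblocking assignments both RHS values are read before either LHS is updated, so the values really swap. With blocking assignments the first statement overwrites `a_b` before the second statement reads it, so both registers end up holding the old value of `b_b`.

```verilog
// Hypothetical demo: the same "swap" written with '<=' and with '='.
module swap_demo (
  input  wire       clk,
  input  wire       load,
  input  wire [3:0] a_init, b_init,
  output reg  [3:0] a_nb, b_nb,   // updated with nonblocking assignments
  output reg  [3:0] a_b,  b_b     // updated with blocking assignments
);
  always @(posedge clk)
    if (load) begin
      a_nb <= a_init;
      b_nb <= b_init;
    end else begin
      a_nb <= b_nb;   // both RHS values are sampled first...
      b_nb <= a_nb;   // ...and both LHS updates happen later: a true swap
    end

  always @(posedge clk)
    if (load) begin
      a_b = a_init;
      b_b = b_init;
    end else begin
      a_b = b_b;      // a_b is overwritten immediately...
      b_b = a_b;      // ...so b_b reads the NEW a_b: both end up equal
    end
endmodule
```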

If no delays are used and if we restrict the discussion to assignments
within a Verilog always block or VHDL process, blocking assignments behave
very similarly to VHDL variable assignments (except that Verilog blocking
assignments are not restricted to variables declared within an always
block) and nonblocking assignments are similar to VHDL signal assignments.

Both are used in synthesis. Even though you can make a design work using
only blocking assignments or only nonblocking assignments if you are careful
and really know what you are doing, I highly recommend using blocking
assignments for purely combinational always blocks (guidelines #3 and part
of #5 from my paper) and nonblocking assignments for sequential always
blocks (guidelines #1, #2, #4 and part of #5 from my paper).
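A minimal sketch of the two recommendations used together, assuming a hypothetical registered 2:1 multiplexer: the purely combinational always block uses blocking assignments (with a default assignment first, so no latch is inferred), and the clocked always block uses only nonblocking assignments. The names `guideline_demo`, `sel`, `d` and `q` are illustrative, not from the paper.

```verilog
// Hypothetical example: combinational block with '=', sequential with '<='.
module guideline_demo (
  input  wire       clk,
  input  wire       rst_n,
  input  wire       sel,
  input  wire [7:0] a, b,
  output reg  [7:0] q
);
  reg [7:0] d;

  // Purely combinational logic: blocking assignments, complete sensitivity
  always @* begin
    d = a;            // default value avoids an inferred latch
    if (sel) d = b;
  end

  // Sequential logic: nonblocking assignments only
  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 8'h00;
    else        q <= d;
endmodule
```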

=====

Quote:

>I know I seem to be quite alone here, but I do mean Cliff's paper
>gives bad advice. Roughly speaking perhaps 20% is worth reading
>(Verilog's scheduling, nonblocking myths) but to me the rest is
>trivialities or bad advice.

Bad advice? I disagree (more discussion later).

Trivialities? Some - much of the material I presented in the paper is from
my Comprehensive and Advanced Verilog classes. There is so much confusion
surrounding nonblocking assignments that I make them a significant part of
every class I teach. While co-teaching for the first time with one of my
Sunburst Design sub-contract instructors (a very talented Verilog and VHDL
ASIC designer who had also taught Synopsys Chip Synthesis classes for two
years, between jobs doing 1M+ gate ASIC designs), I presented this material.
After seeing it for the first time, he strongly encouraged me to give it as
a paper at a SNUG conference. He said he had known of the recommendations
but did not know why they were important until he saw the nonblocking
presentation. At his request, I gave the paper and, to my surprise, it was
voted best paper at the conference. So, Jan, topics that you and I thought
somewhat trivial are not as trivial to others as I originally thought.

Quote:
>I have a problem with "use nonblocking for sequential logic"
>and I don't find any real argument in his paper for this.
>Since the early days, flip-flop inference from blocking
>assignments was perfectly supported, yet for some reason
>many people find this feature apparently too high level.

Synthesis is less of an issue than simulation. Before nonblocking
assignments existed, pipelines were often modeled as follows:

// non-synthesizable
fork
  qout = #1 q3;
  q3   = #1 q2;
  q2   = #1 q1;
  q1   = #1 qin;
join

or by following the guideline you gave: allow only one sequential always
block per module and order the blocking statements correctly.
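For comparison, here is a synthesizable sketch of the same four-stage pipeline with hypothetical names. One always block uses nonblocking assignments written in input-to-output order; the other uses blocking assignments and only works because the statements run from the output stage back to the input. If the blocking statements were written in input-to-output order instead, `qin` would fall through to `qout_b` in a single clock.

```verilog
// Hypothetical four-stage pipeline written two ways.
module pipe_demo (
  input  wire       clk,
  input  wire [7:0] qin,
  output reg  [7:0] qout_nb,  // nonblocking: statement order does not matter
  output reg  [7:0] qout_b    // blocking: correct only because of the order
);
  reg [7:0] q1_nb, q2_nb, q3_nb;
  reg [7:0] q1_b,  q2_b,  q3_b;

  always @(posedge clk) begin
    q1_nb   <= qin;
    q2_nb   <= q1_nb;
    q3_nb   <= q2_nb;
    qout_nb <= q3_nb;
  end

  // Blocking version: update from the output end back toward the input.
  always @(posedge clk) begin
    qout_b = q3_b;
    q3_b   = q2_b;
    q2_b   = q1_b;
    q1_b   = qin;
  end
endmodule
```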

A few years ago, I ran some benchmarks using Verilog-XL to examine
simulation efficiency based solely on coding styles. I knew of engineers
who recommended coding a separate small always block for each flip-flop in
the design, as opposed to combining multiple sequential elements into a
single always block.

Every time signals pass module boundaries, you pay a significant penalty
(which is why I think VCS has the ability to completely flatten a design -
to improve simulation speed). And breaking larger always blocks into
multiple always blocks for individual flip-flops also came with a
significant simulation penalty.

Nonblocking assignments are less simulation efficient than blocking
assignments (how much??? - I don't know, but I don't think the penalty is
as great as passing data through Verilog ports).

Quote:

>I find it hilarious when he says several times "Flawed coding
>style, but it works" or when he describes blocking assignments
>as "bad coding style" just because the model is not correct
>(Example 5)!

I'm glad I could bring a smile to your face!   ;-)

The "flawed" is with respect to my recommendations. In the SNUG
presentation I showed four sequential coding styles using blocking
assignments: 1 of 4 was guaranteed to simulate correctly and 3 of 4
synthesized to the correct pipeline. When I showed the exact same examples
using nonblocking assignments, 4 of 4 simulated correctly and 4 of 4
synthesized to the correct logic. You are right that example 5 is just
plain coded wrong, but if you replace its blocking assignments with
nonblocking assignments, the design simulates and synthesizes to the
desired pipeline logic. The point I made in the presentation is that if you
are careful, you can make blocking assignments work for sequential logic,
but nonblocking assignments worked for all four coding styles. The
question is, how hard do you want to work to get it right? Small simulation
inefficiencies are insignificant compared to the time engineers spend
debugging Verilog schedule-queue issues, which is why I give the
recommendations that I do.
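One style that shows why nonblocking assignments are more forgiving is to give each pipeline stage its own always block. The sketch below uses hypothetical names and is not claimed to be one of the four styles from the presentation. With `<=` every block samples its RHS before any stage is updated, so it does not matter which always block the simulator runs first; with `=` the same code would be a race whose result depends on simulator evaluation order, which is why only the nonblocking version is shown as runnable code.

```verilog
// Hypothetical pipeline with one always block per stage.
module pipe_split (
  input  wire       clk,
  input  wire [7:0] qin,
  output reg  [7:0] qout
);
  reg [7:0] q1, q2, q3;

  // Order-independent: all RHS values are read before any LHS is updated.
  // Changing '<=' to '=' here would create a simulation race.
  always @(posedge clk) q1   <= qin;
  always @(posedge clk) q2   <= q1;
  always @(posedge clk) q3   <= q2;
  always @(posedge clk) qout <= q3;
endmodule
```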

=====

Quote:

>The worst of all is that the most interesting, perfectly
>synthesizable, behavior-like coding styles are just not
>compatible with his guidelines. So anyone who follows this
>guidelines is deprived from taking advantage of such coding
>styles.

Examples?


(I added begin-end to the else statement, assuming that Jan did not want to
asynchronously reset and increment Count without a clock edge.)

Quote:

> if( ~reset_n )    
>     Count = 0;
> else begin
>     // reset logic
>     if (ReadCondition)
>         Count = 0;
>     // event increment logic
>     if (Event)
>         Count = Count + 1;
> end

>This is more modular: The reset logic only resets and need
>not care about events. To avoid missing events, the
>Event logic is put "late" in the always block.

Is this the same as:


  if (!reset_n) Count <= 0;         // async reset
  else begin
    Count <= Count;                 // default - no change
    case ({ReadCondition, Event})
      2'b01:    Count <= Count + 1; // increment
      2'b10:    Count <= 0;         // sync reset
      2'b11:    Count <= 1;         // reset w/ inc
    endcase
  end

I am having a hard time picturing the logic from Jan's coding style, but
if I interpreted it correctly, I know exactly what hardware I am trying to
infer with the latter coding style. Am I missing something? Is Jan trying
to infer hardware that actually resets and increments synchronously on the
same clock edge without any actual hardware races? When I get a few
minutes, I will try synthesizing both styles to see if there is any
difference.
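Cliff's question can at least be checked in simulation with a side-by-side sketch. The clock name `clk`, the module name and the separate outputs `CountJan`/`CountCliff` are assumptions, and the 8-bit width is arbitrary; the always-block headers are added because the quoted fragments omit them. In both styles a cycle with ReadCondition and Event both high leaves the counter at 1, so the two should track each other. This compares simulation behavior only and says nothing about the synthesized netlists. `CountJan` is only read after the clock edge by the test bench, so its blocking assignments cause no race here.

```verilog
// Hypothetical side-by-side comparison of the two counter styles.
module count_styles (
  input  wire       clk,           // assumed clock name
  input  wire       reset_n,
  input  wire       ReadCondition,
  input  wire       Event,
  output reg  [7:0] CountJan,      // Jan's style: blocking assignments
  output reg  [7:0] CountCliff     // Cliff's style: nonblocking + case
);
  // Jan's style: each "if" sees the value left by the previous statement.
  always @(posedge clk or negedge reset_n)
    if (!reset_n)
      CountJan = 0;
    else begin
      if (ReadCondition) CountJan = 0;             // reset logic
      if (Event)         CountJan = CountJan + 1;  // event increment logic
    end

  // Cliff's style: all four input combinations decoded explicitly.
  always @(posedge clk or negedge reset_n)
    if (!reset_n)
      CountCliff <= 0;
    else
      case ({ReadCondition, Event})
        2'b01:   CountCliff <= CountCliff + 1;  // increment
        2'b10:   CountCliff <= 0;               // sync reset
        2'b11:   CountCliff <= 1;               // reset w/ inc
        default: CountCliff <= CountCliff;      // no change
      endcase
endmodule
```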

Quote:

>Furthermore, also in VHDL we infer flip-flops from both
>signal assignments and variable assignments. We certainly
>don't use the rule that all sequential logic should
>be inferred from signal assignments, for the same reason:
>some sequential logic can only be elegantly described
>using variable assignments.

Many VHDL engineers I know like to mix VHDL variable assignments and VHDL
signal assignments in the same process. A similar coding style works in
Verilog, but it is easy to get wrong and difficult to debug unless you
really understand the Verilog event queues, which is why I discourage such
coding styles (guideline #5). I never warmed to mixing concurrent signal
assignments with sequential variable ...




Sat, 01 Mar 2003 23:38:24 GMT  


Quote:

......snipped.....

> Every time signals pass module boundaries, you pay a significant penalty
> (which is why I think VCS has the ability to completely flatten a design -
> to improve simulation speed). And breaking larger always blocks into
> multiple always blocks for individual flip-flops also came with a
> significant simulation penalty.

I've heard others say that breaking up a design into more modules
causes a significant penalty, but I haven't figured out for myself yet
exactly how it happens. Does it cause more events? Or does it
make the simulator 'elaborate' the network of procedures into a much
larger memory footprint? Or is there some special activity that comes
into play because a connection between two communicating procedures
happens to cross a module boundary instead of being within the same
module? Or is this, perhaps, a characteristic of compiled-and-linked
vs. interpreted Verilog? Or none of the above? Does anybody know?

By the way, Cliff, your recommendations on using '=' vs. '<=' are excellent.



Sun, 02 Mar 2003 00:18:09 GMT  


<snip>

Quote:

> ?? LFSRs (or rather CRC calculations) are actually one of my
> favourite examples of where you can use blocking (variable) assignments
> to do things which are otherwise very difficult. Question: how do you
> describe CRC logic in the case where you have to treat multiple data
> bits in the same clock cycle?

Believe it or not, you use *your* (Easics') CRC code.

It uses blocking assignments for the function which generates the CRC
as combinational logic. You then generate registers from the function
using non-blocking assignments. That's what I did.

I downloaded 8-bit and 16-bit parallel Ethernet-standard CRCs from
Easics a while ago. I wanted to compare your implementation with mine
and with one from an IP provider. All three gave identical results in
simulation (fortunately for me, I hadn't blown the math!), yours was
smallest (about 90% the size of mine), and mine was fastest when
synthesized to a Xilinx Virtex part (about 10% faster than yours). I
won't comment on the speed or size of the version we paid for, which
should be comment enough. (To be fair, the CRC generator was one small
part of a large design we bought.)

Putting aside the blocking/non-blocking assignment controversy, I
would like to compliment you on your CRC code. It's accurate and
compact, and I sure wish I had known about it 18 months ago!

Marc



Sun, 02 Mar 2003 02:23:46 GMT  

Quote:



> <snip>

> > ?? LFSRs (or rather CRC calculations) are actually one of my
> > favourite examples of where you can use blocking (variable) assignments
> > to do things which are otherwise very difficult. Question: how do you
> > describe CRC logic in the case where you have to treat multiple data
> > bits in the same clock cycle?

> Believe it or not, you use *your* (Easics') CRC code.

> It uses blocking assignments for the function which generates the CRC
> as combinational logic. You then generate registers from the function
> using non-blocking assignments. That's what I did.

This is clearly the best answer :-)
(http://www.easics.com/webtools/crctool)

But now suppose we hadn't done this job of predigesting CRC equations
into a very efficient, reusable format, and you had to design it
yourself. Then there is a second-best answer, using synthesis tool
capabilities and blocking assignments, that can give reasonable results
in many cases.
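The "second-best answer" can be sketched as a function that uses blocking assignments inside a `for` loop to process several data bits in one clock (synthesis tools can typically unroll such a loop into XOR logic), with the register updated by a nonblocking assignment, which is also the structure Marc describes. This is not the Easics-generated code: the polynomial (CRC-8, x^8 + x^2 + x + 1, MSB first, initial value 0, no final XOR) and all names are assumptions chosen for illustration.

```verilog
// Hypothetical byte-parallel CRC-8 (poly 0x07): combinational function with
// blocking assignments, registered with a nonblocking assignment.
module crc8_parallel (
  input  wire       clk,
  input  wire       rst_n,
  input  wire       en,
  input  wire [7:0] data,
  output reg  [7:0] crc
);
  function [7:0] crc8_next;
    input [7:0] crc_in;
    input [7:0] d;
    reg   [7:0] c;
    reg         fb;
    integer     i;
    begin
      c = crc_in;
      for (i = 7; i >= 0; i = i - 1) begin          // MSB of data first
        fb = c[7] ^ d[i];                            // feedback bit
        c  = {c[6:0], 1'b0} ^ (fb ? 8'h07 : 8'h00);  // shift, apply poly
      end
      crc8_next = c;
    end
  endfunction

  always @(posedge clk or negedge rst_n)
    if (!rst_n)  crc <= 8'h00;
    else if (en) crc <= crc8_next(crc, data);
endmodule
```

Feeding the ASCII string "123456789" one byte per clock should leave 8'hF4 in `crc`, the usual check value for this CRC-8 variant.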

Quote:
> I downloaded 8-bit and 16-bit parallel Ethernet-standard CRCs from
> Easics a while ago. I wanted to compare your implementation with mine
> and with one from an IP provider. All three gave identical results in
> simulation (fortunately for me, I hadn't blown the math!), yours was
> smallest (about 90% the size of mine), and mine was fastest when
> synthesized to a Xilinx Virtex part (about 10% faster than yours). I
> won't comment on the speed or size of the version we paid for, which
> should be comment enough. (To be fair, the CRC generator was one small
> part of a large design we bought.)

> Putting aside the blocking/non-blocking assignment controversy, I
> would like to compliment you on your CRC code. It's accurate and
> compact, and I sure wish I had known about it 18 months ago!

Thanks. I happen to know that many people are benefitting from it,
but it's very nice (and also rare) to hear that explicitly.

Regards, Jan

--
Jan Decaluwe           Easics              
Design Manager         System-on-Chip design services  
+32-16-395 600         Interleuvenlaan 86, B-3001 Leuven, Belgium



Sun, 02 Mar 2003 03:05:20 GMT  

Quote:

> Here is another example, but this time with zero delays.

>     initial
>       begin
>         CBN <= 0;
>         CBN <= 1;
>       end

> The value of CBN after the initial statement is indeterminate since
> multiple values are assigned to the same register at the same time. The
> Verilog HDL standard does not define the order in which the events
> are scheduled nor the order of the events that get cancelled.

On the contrary - this is just about the only place where the
LRM explicitly states the order of events within the same time-step
(and I mean EXPLICITLY, with language outlining exactly this case).

Non-blocking assignments to the same variable in the same block are
always retired in order, so:

        initial begin
                A <= 0;
                A <= 1;
        end

is well behaved and A ends up with 1 (after glitching to 0 for 0
time), while:

        initial A <= 0;
        initial A <= 1;

is undefined, with A ending up with either value.
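Paul's point can be checked with a small self-contained module (the module name is hypothetical). IEEE 1364 states that nonblocking assignments are performed in the order the statements were executed, which is what makes the single-block case deterministic. The two-initial-block variant is left as a comment because its result depends on the simulator.

```verilog
// Hypothetical demo of nonblocking-assignment ordering within one block.
module nba_order;
  reg A;

  // Two nonblocking assignments to A from the SAME block in the same time
  // step: the updates are applied in the order the statements executed,
  // so A settles to 1.
  initial begin
    A <= 0;
    A <= 1;
  end

  // By contrast, this would be a race with an undefined final value,
  // because the execution order of separate initial blocks is unspecified:
  //   initial A <= 0;
  //   initial A <= 1;

  initial #1 $display("A = %b", A);  // prints "A = 1"
endmodule
```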

        Paul Campbell



Sun, 02 Mar 2003 12:09:38 GMT  

Quote:


> > Here is another example, but this time with zero delays.

> >     initial
> >       begin
> >         CBN <= 0;
> >         CBN <= 1;
> >       end

> > The value of CBN after the initial statement is indeterminate since
> > multiple values are assigned to the same register at the same time. The
> > Verilog HDL standard does not define the order in which the events
> > are scheduled nor the order of the events that get cancelled.

Paul - I (Jan Decaluwe) did not write this. Please review my post.
At Cliff's request, I was quoting literally from a book. The book
that has it wrong is:

 J. Bhasker (Lucent Technologies): A Verilog HDL Primer, Star Galaxy Press, 1997
 -> Section 8.4.3 Non-blocking Procedural Assignment

Regards, Jan




Sun, 02 Mar 2003 16:40:28 GMT  

Quote:

> Paul - I (Jan Decaluwe) did not write this. Please review my post.
> Per request by Cliff, I was quoting literally from a book. The book
> that has it wrong is:

>  J. Bhaskher (Lucent Technologies): A Verilog HDL Primer, Star Galaxy Press, 1997
>  -> Section 8.4.3 Non-blocking Procedural Assignment

Yes - I realized this after I posted it. Sorry.

        Paul



Sun, 02 Mar 2003 22:16:56 GMT  
A couple of minor VHDL-related points:

Quote:
>[I am posting this for Cliff Cummings. -mac]

>If no delays are used and if we restrict the discussion to assignments
>within a Verilog always block or VHDL process, blocking assignments behave
>very similar to VHDL variable assignments (except that Verilog blocking
>assignments are not restricted to variables declared within an always
>block)

You can declare normal variables within processes and subprograms.
VHDL also has 'shared variables', which can be declared anywhere an
ordinary variable can't be, and which can be accessed by multiple
processes/always blocks. They're almost never used, however, for
exactly the reasons covered in this thread. If you were perverse
enough, you could duplicate the blocking/race problem in VHDL by using
shared variables.

Quote:
>I never warmed to mixing concurrent signal
>assignments with sequential variable assignments in the same VHDL process.
>I always figured that if my combinational logic was complex enough to merit
>multiple lines of code and/or case statements, that the combinational code
>deserved its own process statement. Simply a style issue.
> <snipped> .... Simple
>nonblocking assignments can solve the problem just like concurrent VHDL
>signal assignments within a single process.

'Concurrent signal assignments' are concurrent statements, i.e. outside
of a process. They're shorthand for a process containing an 'if'
statement (conditional form) or a 'case' statement (selected form). They
have some similarities to Verilog's continuous assignments. Signal and
variable assignments (and everything else) inside a process (or
subprogram) are sequential, in the sense that they are guaranteed to be
executed sequentially.

Quote:
>Where Paul and I differ is that I highly discourage using blocking
>assignments with delays on the RHS of the assignment operator, and in fact,
>if I could remove this from the Verilog language I would.

That would certainly guarantee that a lot of legacy code could never
be reused...

Quote:
>(Ka-ching$$$)

Interesting to see that contractors world-wide have a common
vocabulary...  :)

Evan Lavelle



Mon, 03 Mar 2003 03:00:00 GMT  
 