VHDL block and guarded input - what does this code do?

I have the following from a beginner's VHDL tutorial:
rising_edge: block(clk'event and clk = '1')
begin
result <= guarded input or force after 10ns;
end block rising_edge
The explanatory text is
"Essentially I have a block called rising_edge, and it's a block with a guard condition which does the following, it checks that we have an event on the clock, and that the clock is equal to one, so we're effectively looking for the so called rising_edge. We're looking for the event where the clock goes from 0 to 1, and if it does, then we can conditionally assign the results, so you'll see that the result variable here says that it is a guarded input or force after 10 ns might seem a bit confusing, but consider it without the guarded keyword. All we're doing is we're assigning the result of the evaluation of input or force, and we're doing it in a guarded setup. So, in this case, the assignment of the signal result is only executed if the guard signal is actually true, and in our example it means that the assignment of the expression, which is input or force, will only happen on the rising_edge of the clock because that's on guard condition."
Now I've read this over and over and searched on the net but have come up blanks as to what this is actually doing. Can someone please gently explain its purpose?

A block is essentially a grouping of concurrent statements. In terms of practical usage, it is very similar to a process, only it has a limited scope which allows component-style signal mapping (with port and port map). It can be used to improve readability (see this question) and really not much else. Blocks are fairly rarely used and frequently not supported for synthesis (see here). To my (limited) knowledge, the use of blocks has no other advantage than readability.
Because your block statement contains a guard condition (clk'event and clk = '1' is the guard condition here), it is a guarded block. Inside a guarded block, signal assignments that are marked guarded (like the one in your example) will only be executed if the guard condition evaluates to true.
The entire statement that has been guarded (i.e. in your case input or force after 10ns) will only be executed when the guard condition evaluates to true, i.e. on the rising edge of clk. Thus, for all intents and purposes this block has the same behaviour as
process(clk)
begin
  if clk'event and clk = '1' then
    result <= input or force after 10 ns;
  end if;
end process;
I will say though, this is a terrible example. For one thing, as others have stated, the usage of block is very rare and they are generally only used in quite advanced designs. The usage of clk'event and clk = '1' has been discouraged since 1993 (see here). It should also be mentioned again that the usage of rising_edge as a label is a terrible idea, as is the use of force for a signal name (in VHDL 2008, force is a reserved keyword that can be used to force a signal to a value).
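To make the equivalence described above concrete, here is a minimal simulation sketch that puts the guarded block and the clocked process side by side. The entity name, the port names (din and force_i stand in for input and force) and the two outputs are all made up for illustration, and whether a given synthesis tool accepts the guarded block is not guaranteed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical test entity; none of these names come from the tutorial.
entity guarded_demo is
  port (
    clk     : in  std_logic;
    din     : in  std_logic;   -- stands in for "input"
    force_i : in  std_logic;   -- stands in for "force" (reserved word in VHDL-2008)
    q_block : out std_logic;   -- driven by the guarded block
    q_proc  : out std_logic    -- driven by the equivalent process
  );
end entity guarded_demo;

architecture sim of guarded_demo is
begin
  -- Guarded block: the expression in parentheses becomes the implicit
  -- boolean signal GUARD. The guarded assignment is only executed while
  -- GUARD is true, i.e. in the simulation cycle of a clock event with clk = '1'.
  edge_blk : block (clk'event and clk = '1')
  begin
    q_block <= guarded din or force_i after 10 ns;
  end block edge_blk;

  -- Conventional clocked process with the same intent.
  edge_proc : process (clk)
  begin
    if rising_edge(clk) then
      q_proc <= din or force_i after 10 ns;
    end if;
  end process edge_proc;
end architecture sim;
```

Driving both outputs from the same stimulus in a simulator is an easy way to check for yourself that they track each other for ordinary '0' to '1' clock transitions.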

Working from the idea that this is supposed to be a beginners tutorial, and with the lack of any explanation as to why such an unusual style has been used, a much more conventional implementation would be:
process (clk)
begin
  if (rising_edge(clk)) then
    result <= input or force after 10 ns;
  end if;
end process;
A couple of points to note:
This assumes that input and force are either signals, or inputs to the entity.
It is unusual to model signal assignment delays if your code is going to be implemented in a real hardware device.
The code in your question uses after 10ns;, which is not valid; you need a space between the value and the units (as in my code).
The code in your question uses rising_edge as an identifier, when this is already the name of a function defined in the standard IEEE std_logic_1164 package (in, I believe, versions from VHDL-93 onwards).
The code in your question uses force as a signal name, when this has also been a reserved word since VHDL-2008.
My advice to you is to find a different tutorial. The quote you posted is not clearly written, and the code you posted appears to be sending you down a strange path. All I can think is that the tutorial is in fact very, very old.

Related

Is it necessary to separate combinational logic from sequential logic while coding in VHDL, while aiming for synthesis?

I am working on projects which require synthesis of my RTL code, specifically for ASIC development. Given that, how important is it to separate sequential logic from combinational logic when designing my RTL? And if it is important, what should my approach be, i.e. how should I partition my design into sequential and combinational logic?
I generally will separate sequential and combinational as much as possible until it results in too much code (which is usually rare, and possibly an indication of a poor design), or when something just makes more sense (which again is rare, but happens).
This segregation usually aids in initial design, brain->rtl->synthesis (what you think you're making is actually what is synthesized), CDC evaluation for multiclock designs, verification, and other things as well. It's hard for me to give a great example of something bad versus something I would call good, but here is a stab at it.
Say I have a counter that I want to reset on a certain value. I could do this (which I tend to see from people who have a strong software and/or FPGA background)
always @(posedge clk or posedge reset) begin
  if (reset) begin
    count <= 0;
  end else begin
    if ((count == reset_count_val) || (~enable)) count <= 0;
    else count <= count + 1;
  end
end
Or I would do this (which is what I would personally do):
//Combinational Path
assign count_in = enable ? ((count == reset_count_val) ? 4'd0 : count + 4'd1) : 4'd0;
//Sequential Path
always @(posedge clk or posedge reset) begin
  if (reset) count <= 4'd0;
  else count <= count_in;
end
The above, I would agree, is more typing and, to some, more difficult to read. It does, however, split up the circuit in a way that lets me see more easily what happens on each clock edge. I know that "count_in" is being set up prior to the posedge of clk. I can easily see (as can anyone else looking at the code) that I should expect a MUX selecting between zero and the incremented count based on reset_count_val, with a final MUX gating the count based on the enable signal. Now you could see the same with the first bit of code, however IMO it's not as clear. When you look at a sim, you can see what count_in looks like prior to the rising edge of the clk. This could help you if the conditional statement for count_in were rather complex.
Let's say you sent this through synthesis and place and route and you get a timing violation between the Q of the count reg and the D of the count reg (since you have a loopback based on the addition). It would generally be easier to see which path is causing the issue with the 2nd batch of code. This depends on the tool (Primetime more than likely). CDC could also be easier: let's say that reset_count_val is coming from a static register in another clock domain. The tool may try to synthesize/elaborate the OR in the 1st batch of code thinking that reset_count_val and enable are somehow related, giving you a weird looking CDC violation. Again, sometimes it's hard to come up with an example that exercises all of the "why you shouldn't do this" type of cases.
As an example about splitting combinational and sequential, I inherited a design where someone had written a state machine with the combinational and sequential logic in the same block (always @(posedge clk), where the if/else's would get deep and have next state logic in them). I'm no genius, but after days of staring at that thing and running sims, I just could not figure out what it was doing. It was quite large as well. I simply redid the design, keeping the same algorithm, but splitting up the logic in the format I described here. Even with me adding some features, the size went down ~15%. Other engineers who had the same problem of being lost with the other design could now understand what was going on with it. This won't always be the case, but more often than not it is.
TLDR;
I try to be extremely descriptive when designing RTL that is to be used in an ASIC. The more abstract the code, the more likely it is to build something you don't want, or is more complex than needed. Verification is often much easier, particularly when you get to gate sims. While I am in the camp of "the less code the better" that is not always the case with verilog, especially on an ASIC.
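Since the question asks about VHDL, here is a rough VHDL counterpart of the split counter described above. It is only a sketch: the entity name, the 4-bit width and the count_q / count_next signal names are my assumptions, not something from the answer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; width and names are assumptions for illustration.
entity split_counter is
  port (
    clk             : in  std_logic;
    reset           : in  std_logic;
    enable          : in  std_logic;
    reset_count_val : in  unsigned(3 downto 0);
    count           : out unsigned(3 downto 0)
  );
end entity split_counter;

architecture rtl of split_counter is
  signal count_q    : unsigned(3 downto 0);  -- register output (Q)
  signal count_next : unsigned(3 downto 0);  -- register input (D)
begin
  -- Combinational path: the value that will be loaded on the next clock edge.
  count_next <= (others => '0') when enable = '0' or count_q = reset_count_val
                else count_q + 1;

  -- Sequential path: nothing but the register and its asynchronous reset.
  process (clk, reset)
  begin
    if reset = '1' then
      count_q <= (others => '0');
    elsif rising_edge(clk) then
      count_q <= count_next;
    end if;
  end process;

  count <= count_q;
end architecture rtl;
```

In a waveform viewer, count_next shows what the register is about to capture before each rising edge, which is the visibility argument made above.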
In general, I would not hesitate to mix combinational with sequential logic. I come from an IC design background and have always mixed up combinational logic with sequential. I think that you are restricting yourself too much if you don't and are not fully using the power of your logic synthesiser.
For example, here is how I would design a simple, asynchronously-reset counter in VHDL:
process (Clock, Reset)
begin
if Reset = '1' then
Cnt <= (others => '0');
elsif Rising_edge(Clock) then
if Enable = '1' then
Cnt <= Cnt + 1;
end if;
end if;
end process;
This style of writing a counter in VHDL is ubiquitous. I personally can see no advantage to splitting the code up into two separate processes, one sequential the other combinational. I have just taught a room full of engineers to design a counter in exactly this way.
Here are some exceptions. I would split the combinational logic from the sequential logic if:
i) I were designing a state machine:
There is what I think is a really elegant way of coding a state machine, where you do split the combinational logic from the sequential:
Registers: process (Clock, Reset)
begin
if Reset = '1' then
State <= Idle;
elsif Rising_edge(Clock) then
State <= NextState;
end if;
end process Registers;
Combinational: process (State, Inputs)
begin
NextState <= State;
Output1 <= '0';
Output2 <= '0';
-- etc
case State is
when Idle =>
if Inputs(1) = '1' then
NextState <= State2;
end if;
when State2=>
Output1 <= '1';
if Inputs = "00" then
NextState <= State3;
end if;
-- etc
end case;
end process Combinational;
The advantage of coding a state machine like this is that the combinational process looks very like the state diagram. It is less error prone to write, less error prone to modify and less error prone to read.
ii) The combinational logic were complex:
For really big block of combinational logic, I would separate. The exact definition of "really big" is a matter of judgement.
iii) The combinational logic were on the Q output of a flip-flop:
Any signal driven in a sequential process infers a flip-flop. Therefore, if you wish to implement combinational logic that drives an output of an entity* then this combinational logic must be in a separate process.
*often not a good idea - be careful.
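As a small sketch of case iii), the counter from earlier can keep its single clocked process while a decode of the register outputs lives outside it. The Tc (terminal count) output, the entity name and the 4-bit width are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: the counter from above plus a decoded output.
entity counter_tc is
  port (
    Clock  : in  std_logic;
    Reset  : in  std_logic;
    Enable : in  std_logic;
    Tc     : out std_logic   -- assumed terminal-count output
  );
end entity counter_tc;

architecture rtl of counter_tc is
  signal Cnt : unsigned(3 downto 0);
begin
  -- Sequential process: every signal assigned here becomes a flip-flop.
  process (Clock, Reset)
  begin
    if Reset = '1' then
      Cnt <= (others => '0');
    elsif rising_edge(Clock) then
      if Enable = '1' then
        Cnt <= Cnt + 1;
      end if;
    end if;
  end process;

  -- Combinational logic on the Q outputs has to sit outside the clocked
  -- process; assigning Tc inside it would add another register stage.
  Tc <= '1' when Cnt = 15 else '0';
end architecture rtl;
```

Because Tc is decoded from several flip-flop outputs it can glitch briefly after a clock edge, which is presumably the reason for the "be careful" footnote.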
I would post this as a comment if I could, as I am not writing a full answer but giving you a source, and I also don't fully address the question (I don't know anything about ASICs). But there is a great pdf about this problem in general here. Generally, you don't have to completely separate sequential logic from combinational logic, but it is helpful for writing more readable and maintainable code.
Mixing sequential and combo condenses the code which almost always makes it easier to understand.
Separating makes ECOs easier.
Which you choose is a matter of personal style and organizational coding conventions and standards.

VHDL: Mealy machine and button press detection

Hi I'm a bit confused about the implementation of Mealy state machine using VHDL. My current work is like this:
process(clk, rst)
begin
if rst = '1' then
state <= s1;
elsif (clk'event and clk = '1') then
state <= next_state;
end if;
end process;
and another process like this:
process(state, op)
begin
case state is
when s1 =>
...some implementation
end process;
And now the problem is: I need to detect the press of the button from the user, but I'm not sure where to put it. Should it be inside the first process or the second process? Besides, I also looked through the following guide: implement state machine in FPGA, is it okay to use just one process for the Mealy machine as shown on the webpage? If it is so then I think the work will be easier. Thanks!
You should put it in the second process. The first process is only used to change states and the next_state is also calculated in the second.
There are several ways to write FSMs and people tend to favour one or the other for various reasons. Pick the one that works for you.
You cannot design a Mealy state machine with only one process. Even Moore state machines, in most cases, cannot be modelled with only one process.
A state machine always has a state register which must be modelled with a synchronous process, that is, a process whose sensitivity list contains only the clock (and set or reset signals if they are asynchronous).
Every output of a synchronous process will synthesize as the output of a register because its value changes only on an edge of the clock (plus states of asynchronous set or reset if any). So, you cannot describe the outputs of a Mealy state machine in the same synchronous process as the state register. If you were doing so, it would not be a Mealy machine any more because its outputs would not combinationally depend on the inputs.
For Moore machines, things are a bit more subtle but, except in very exceptional cases, you also need at least two processes. When I write "process", I include processes short-hands like concurrent signal assignments, concurrent procedure calls or component/entity instantiations.
To make it simple: VHDL modelling for synthesis is straightforward if you have a clear view of the hardware you want.
Draw a block diagram of your hardware with registers and combinatorial parts clearly identified.
Draw bubbles enclosing hardware elements, one bubble per process, respecting the rule that if a bubble contains a register, all its outputs must be register outputs.
The synchronous processes are those enclosing registers. Their code is exactly:
process(clk)
begin
if rising_edge(clk) then
<your code>
end if;
end process;
Put your code in <your code>, never put code elsewhere. If you have asynchronous set or reset the code must be something like:
process(clk, reset)
begin
if reset = '1' then
<initialize outputs>
elsif rising_edge(clk) then
<your code>
end if;
end process;
The other processes are combinatorial processes. List all their entering signals (INPUTS) and output signals (OUTPUTS). The code must be:
process(INPUTS)
begin
<your code>
end process;
with the constraint that each OUTPUT signal must be assigned a value in every execution of the process. The best way to guarantee this is to start the process with a default assignment of all OUTPUTS.
That's all. Draw and code what you see. Bonus: every arrow crossing the border of one of your process-bubbles is a signal that you will have to declare unless it is already a primary input or output of your design.
Exercise: draw the block diagram of a Mealy state machine and understand why it cannot be modelled with one single process. Understand also why it can always be modelled with two processes, even if it is not necessarily desirable. Finally, try to identify the rare cases where a Moore state machine can be modelled with one process only.
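Applied to the button question, the recipe above could look roughly like the following sketch. Everything specific in it is an assumption: the active-high button, the two-flop synchronizer, the state names and the pressed output. It also does no debouncing, which a real mechanical button would probably need.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; names and polarity are assumptions.
entity button_mealy is
  port (
    clk     : in  std_logic;
    rst     : in  std_logic;
    button  : in  std_logic;  -- asynchronous push button, assumed active high
    pressed : out std_logic   -- Mealy output: high when a new press is seen
  );
end entity button_mealy;

architecture rtl of button_mealy is
  type state_t is (released, held);
  signal state, next_state  : state_t;
  signal btn_meta, btn_sync : std_logic;
begin
  -- Synchronous process: two flip-flops bring the button into the clk domain.
  sync : process (clk, rst)
  begin
    if rst = '1' then
      btn_meta <= '0';
      btn_sync <= '0';
    elsif rising_edge(clk) then
      btn_meta <= button;
      btn_sync <= btn_meta;
    end if;
  end process sync;

  -- Synchronous process: the state register and nothing else.
  regs : process (clk, rst)
  begin
    if rst = '1' then
      state <= released;
    elsif rising_edge(clk) then
      state <= next_state;
    end if;
  end process regs;

  -- Combinational process: next state and Mealy output. Both outputs get
  -- default values first, so every path through the process assigns them.
  comb : process (state, btn_sync)
  begin
    next_state <= state;
    pressed    <= '0';
    case state is
      when released =>
        if btn_sync = '1' then
          pressed    <= '1';   -- depends on state AND input: Mealy
          next_state <= held;
        end if;
      when held =>
        if btn_sync = '0' then
          next_state <= released;
        end if;
    end case;
  end process comb;
end architecture rtl;
```

The button test sits in the combinational process, and pressed goes high in the same cycle that btn_sync rises, which is exactly what a single clocked process could not produce.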

What is the practical difference between implementing FOR-LOOP and FOR-GENERATE? When is it better to use one over the other?

Let's suppose I have to test different bits of a std_logic_vector. Would it be better to implement one single process that for-loops over each bit, or to instantiate 'n' processes using for-generate, in which each process tests one bit?
FOR-LOOP
my_process: process(clk, reset) begin
if rising_edge (clk) then
if reset = '1' then
--init stuff
else
for_loop: for i in 0 to n loop
test_array_bit(i);
end loop;
end if;
end if;
end process;
FOR-GENERATE
for_generate: for i in 0 to n generate begin
my_process: process(clk, reset) begin
if rising_edge (clk) then
if reset = '1' then
--init stuff
else
test_array_bit(i);
end if;
end if;
end process;
end generate;
What would be the impact on FPGA and ASIC implementations in these cases? Which is easier for the CAD tools to deal with?
EDIT:
Just adding a response I gave to someone who was helping, to make my question clearer:
For instance, when I ran a piece of code using for-loops in ISE, the synthesis summary gave me a fair result, but took a long while to compute everything. When I re-coded my design, this time using for-generate and several processes, I used a bit more area, but the tool was able to compute everything way, way faster and my timing result was better as well. So, does this imply a rule that it is always better to use for-generate, at the cost of extra area but with lower complexity, or is this one of those cases where I have to check every single implementation possibility?
Assuming relatively simple logic in the reset and test functions (for example, no interactions between adjacent bits) I would have expected both to generate the same logic.
Understand that since the entire for loop is executed in a single clock cycle, synthesis will unroll it and generate a separate instance of test_array_bit for each input bit. Therefore it is quite possible for synthesis tools to generate identical logic for both versions - at least in this simple example.
And on that basis, I would (marginally) prefer the for ... loop version because it localises the program logic, whereas the "generate" version globalises it, placing it outside the process boilerplate. If you find the loop version slightly easier to read, then you will agree at some level.
However it doesn't pay to be dogmatic about style, and your experiment illustrates this : the loop synthesises to inferior hardware. Synthesis tools are complex and imperfect pieces of software, like highly optimising compilers, and share many of the same issues. Sometimes they miss an "obvious" optimisation, and sometimes they make a complex optimisation that (e.g. in software) runs slower because its increased size trashed the cache.
So it's preferable to write in the cleanest style where you can, but with some flexibility for working around tool limitations and occasionally real tool defects.
Different versions of the tools remove (and occasionally introduce) such defects. You may find that ISE's "use new parser" option (for pre-Spartan-6 parts) or Vivado or Synplicity get this right where ISE's older parser doesn't. (For example, passing signals out of procedures, older ISE versions had serious bugs).
It might be instructive to modify the example and see if synthesis can "get it right" (produce the same hardware) for the simplest case, and re-introduce complexity until you find which construct fails.
If you discover something concrete this way, it's worth reporting here (by answering your own question). Xilinx used to encourage reporting such defects via its Webcase system; eventually they were even fixed! They seem to have stopped that, however, in the last year or two.
The first snippet would be equivalent to the following:
my_process: process(clk, reset) begin
if rising_edge (clk) then
if reset = '1' then
--init stuff
else
test_array_bit(0);
test_array_bit(1);
............
test_array_bit(n);
end if;
end if;
end process;
While the second one will generate n+1 processes, one for each i, each with its own copy of the reset logic and everything else (which might be a problem if that logic ends up driving the same signals from different processes).
In general, the for loops are sequential statements, containing sequential statements (i.e. each iteration is sequenced to be executed after the previous one). The for-generate loops are concurrent statements, containing concurrent statements, and this is how you can use it to make several instances of a component, for example.
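Here is a self-contained sketch of the two forms side by side. The per-bit "test" is just a placeholder that registers each input bit, and the entity, generic and port names are invented for the example. Note that each generated process only drives its own bit, which avoids the multiple-driver concern mentioned above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; the per-bit operation is only a placeholder.
entity bit_test is
  generic (N : positive := 8);
  port (
    clk      : in  std_logic;
    reset    : in  std_logic;
    data     : in  std_logic_vector(N - 1 downto 0);
    hit_loop : out std_logic_vector(N - 1 downto 0);
    hit_gen  : out std_logic_vector(N - 1 downto 0)
  );
end entity bit_test;

architecture rtl of bit_test is
begin
  -- for-loop: ONE process; the loop is a sequential statement that
  -- synthesis unrolls into N copies of the loop body.
  loop_version : process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        hit_loop <= (others => '0');
      else
        for i in data'range loop
          hit_loop(i) <= data(i);   -- placeholder per-bit test
        end loop;
      end if;
    end if;
  end process loop_version;

  -- for-generate: N separate processes, elaborated concurrently.
  -- Each one drives only element i of hit_gen.
  gen_version : for i in data'range generate
    bit_proc : process (clk)
    begin
      if rising_edge(clk) then
        if reset = '1' then
          hit_gen(i) <= '0';
        else
          hit_gen(i) <= data(i);    -- same placeholder test
        end if;
      end if;
    end process bit_proc;
  end generate gen_version;
end architecture rtl;
```

For logic this simple I would expect both halves to come out as N flip-flops with a synchronous reset, but as discussed above that is up to the tool.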

VHDL state machine differences (for synthesization)

I am taking a class on embedded system design and one of my class mates, that has taken another course, claims that the lecturer of the other course would not let them implement state machines like this:
architecture behavioral of sm is
type state_t is (s1, s2, s3);
signal state : state_t;
begin
oneproc: process(Rst, Clk)
begin
if (Rst = '1') then
-- Reset
elsif (rising_edge(Clk)) then
case state is
when s1 =>
if (input = '1') then
state <= s2;
else
state <= s1;
end if;
...
...
...
end case;
end if;
end process;
end architecture;
But instead they had to do like this:
architecture behavioral of sm is
type state_t is (s1, s2, s3);
signal state, next_state : state_t;
begin
syncproc: process(Rst, Clk)
begin
if (Rst = '1') then
--Reset
elsif (rising_edge(Clk)) then
state <= next_state;
end if;
end process;
combproc: process(state)
begin
case state is
when s1 =>
if (input = '1') then
next_state <= s2;
else
next_state <= s1;
end if;
...
...
...
end case;
end process;
end architecture;
To me, who is very inexperienced, the first method looks more fool proof since everything is clocked and there is less (no?) risk of introducing latches.
My class mate can't give me any reason for why his lecturer would not let them use the other way of implementing it so I'm trying to find the pros and cons of each.
Is any of them preferred in industry? Why would I want to avoid one or the other?
The single process form is simpler and shorter. This alone reduces the chance that it contains errors.
However the fact that it also eliminates the "incomplete sensitivity list" problem that plagues the other's combinational process should make it the clear winner regardless of any other considerations.
And yet there are so many texts and tutorials advising the reverse, without properly justifying that advice or (in at least one case I can't find atm) introducing a silly mistake into the single process form and rejecting the entire idea on the grounds of that mistake.
The only thing (AFAIK) the single-process form doesn't do well is un-clocked outputs. These are (IMO) poor practice anyway as they can be races at the best of times, and could be handled by a separate combinational process for that output only if you really had to.
I'm guessing there was originally some practical reason behind it; maybe a mid-1990s synthesis tool that couldn't reliably handle the single process form, and that made it into the original documentation that the lecturers learned from. Like those blasted non-standard std_logic_arith libraries. And so the myth has been perpetuated...
Those same lecturers would probably have a fit if they saw what can pass through a modern synthesis tool : integers, enumerations, record types, loops, functions and procedures updating signals (Xilinx ISE is now fine with these. Some versions of Synplicity have trouble with functions, but accept an identical procedure with an Out parameter).
One other comment : I prefer if Rst = '1' then over if (Rst = '1') then. It looks less like line noise (or C).
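A sketch of what that can look like in practice: one clocked process for the state and a registered output, plus a single concurrent assignment for the one output that really has to be un-clocked. The busy and ack outputs, the transitions for s2 and s3, and the entity name are invented for illustration and are not from the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; outputs and transitions are assumptions.
entity sm_one_proc is
  port (
    Clk, Rst : in  std_logic;
    input    : in  std_logic;
    busy     : out std_logic;  -- registered output
    ack      : out std_logic   -- un-clocked output
  );
end entity sm_one_proc;

architecture behavioral of sm_one_proc is
  type state_t is (s1, s2, s3);
  signal state : state_t;
begin
  -- Single clocked process: state and busy are both flip-flops, so an
  -- unassigned branch simply holds the old value and cannot create a latch.
  oneproc : process (Rst, Clk)
  begin
    if Rst = '1' then
      state <= s1;
      busy  <= '0';
    elsif rising_edge(Clk) then
      case state is
        when s1 =>
          if input = '1' then
            state <= s2;
            busy  <= '1';
          end if;
        when s2 =>
          state <= s3;
        when s3 =>
          state <= s1;
          busy  <= '0';
      end case;
    end if;
  end process oneproc;

  -- The one output that needs zero latency is handled separately,
  -- as a concurrent (combinational) assignment.
  ack <= '1' when state = s2 and input = '1' else '0';
end architecture behavioral;
```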
I agree with Brian on this. The only issue with the one process state machine is you cannot have un-clocked outputs, which is an issue if you need 0 latency on input to output. Otherwise the one process model helps to minimize bugs as it clearly relates the outputs to the state.
I was taught the two process model in school, but have discovered that the one process model is what is generally accepted in industry. I believe the reasoning for using the two process model in school is it gives students an understanding of how the placement of combinational logic relative to registers changes based on how the code is written (which IMO is very important when starting out) and what it means for their design. However simply forcing you to use the two process model with no explanation does not accomplish this.
the first method looks more fool proof since everything is clocked and there is less (no?) risk of introducing latches.
Yes, the first method where everything is clocked has no chance of introducing latches. It may introduce flipflops, but that's fine.
The 2nd method can introduce true asynchronous latches, which even in the best case are not very well handled by the back end FPGA tools I've used, and are not supported at all in some architectures, so would have to be built out of gates or lookup-tables.
In addition, if you get your sensitivity list wrong in the second process, your simulation can differ from your synthesis result! This is because synthesisers (for reasons I've given up trying to understand) treat the sensitivity list as if it were populated with all the signals you read (completely ignoring the VHDL language spec in the process) whereas the simulator will do exactly what you said.
Ugh. I hate the dual process state machine thing personally. He is probably an old guy and this was the most reliable way to do it 20 years ago. The tools understand your way and I personally like that approach better.
Your classmate is absolutely right. The problem here is that your question is not complete. The reason for your colleague's code to be better than yours is that people normally define the output values and the next state values in the same process, as shown below (this is the same as your own code, just with the output values added to it, which results in "bad" code):
elsif (rising_edge(clk)) then
case state is
when s1 =>
--define outputs:
outp1 <= ...;
outp2 <= ...;
...
--define next state:
if (input = '1') then
state <= s2;
else
state <= s1;
end if;
when s2 =>
...
...
end case;
end if;
Recall that in an FSM the output is produced by the combinational logic section, therefore it is memoryless. However, in the code above, the output values get registered, which is not what the FSM must produce. Indeed, registering the outputs is a case-by-case decision EXTERNAL to the FSM (the outputs could be registered, for example, for glitch removal, which is a PARTICULAR, PLANNED decision, not a FORCED situation, as in the code above).

Is the use of records the solution to all latch problems in VHDL

I was recently told that the solution to all (most) problems with unintended latches during VHDL synthesis is to put whatever the problematic signal is in a record.
This seems like it's a little bit too good to be true, but I'm not that experienced with VHDL so there could be something else that I'm not considering.
Should I put all my signals in records?
No, you should not put all your signals in records. This will quickly become very confusing and you will not gain anything by using the record.
One way that a record may help you avoid latches, is if you register an entire record in a clocked process, you are really registering all of the components of the record. This takes one line of code, instead of possibly tens of lines. In the case where you have many elements which all need to be treated the same, a record can save you "silly mistakes", and possibly save you from creating a latch.
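A minimal sketch of that "one line registers everything" idea follows. The package, the record type and its fields are all hypothetical, chosen only to show the pattern.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package; the record fields are made up for the example.
package bus_pkg is
  type bus_t is record
    valid : std_logic;
    addr  : std_logic_vector(7 downto 0);
    data  : std_logic_vector(15 downto 0);
  end record bus_t;

  constant BUS_RESET : bus_t := (valid => '0',
                                 addr  => (others => '0'),
                                 data  => (others => '0'));
end package bus_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.bus_pkg.all;

entity bus_reg is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;
    bus_d : in  bus_t;
    bus_q : out bus_t
  );
end entity bus_reg;

architecture rtl of bus_reg is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        bus_q <= BUS_RESET;  -- resets every field at once
      else
        bus_q <= bus_d;      -- registers every field at once
      end if;
    end if;
  end process;
end architecture rtl;
```

Adding a field to bus_t later needs no change in the process, so there is no per-signal assignment to forget. That is the only sense in which the record "helps"; the same record assigned incompletely in a combinational process would still produce latches.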
As stated by others, a record doesn't have any specific synthesis interpretation. It is simply a group of signals that you are grouping together for coding-convenience.
I don't see how this would help - a record (or even just parts of a record) can become a latch just as easily as a signal. A latch is generated if a signal keeps its state through some combinatorial process (i.e., is not assigned a value on ALL paths through the process). The same holds for constituents of a record.
Records can be useful to group related signals for readability, but synthesis-wise a record is pretty much equivalent to a bunch of individual signals.
My personal suggestion to avoid latches: avoid combinatorial processes. Make all processes clocked, and do combinatorial logic at the architecture level.
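To illustrate the difference, here is a tiny sketch with one output that I would expect a synthesis tool to turn into a latch and one that cannot become a latch. The entity and signal names are made up.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity contrasting the two styles.
entity latch_demo is
  port (
    sel, a     : in  std_logic;
    y_latch    : out std_logic;
    y_no_latch : out std_logic
  );
end entity latch_demo;

architecture rtl of latch_demo is
begin
  -- Combinatorial process with a missing else branch: when sel = '0'
  -- y_latch is not assigned, so it must hold its value. That is a latch.
  latchy : process (sel, a)
  begin
    if sel = '1' then
      y_latch <= a;
    end if;
  end process latchy;

  -- Same logic at the architecture level: a conditional assignment with a
  -- final unconditional else always produces a value, so nothing is held.
  y_no_latch <= a when sel = '1' else '0';
end architecture rtl;
```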
A record is just another way of grouping other types, similar to using an array
for grouping of a std_logic to std_logic_vector, so there is nothing
magical about records that make them better for avoiding latches in a design.
If you get unintended latches in your design, what I guess you think of as
"latch problems", it is because your coding style specifies latches, and you
should change the coding style, as @zennehoy also suggests.
One approach can be to define some code templates for different constructions
that you use, and then stick to these known and working templates.
The template for a flip-flop (FF) with asynchronous reset can be:
process (clk_i, rst_i) is
begin
  -- Clock
  if rising_edge(clk_i) then
    -- ... Control structures with Qs assigned as a function of Ds
    -- ... Synchronous reset is just another branch
  end if;
  -- Reset (asynchronous) if required
  if rst_i = '1' then
    -- ... Qs assigned constant reset values, for some or all Qs
  end if;
end process;
Use concurrent signal assigns when possible, and more complex expressions can
be done through use of concurrent function call, where a function is used
outside a process like:
z_o <= fun(a_i, b_i);
If a process is used to create combinatorial logic, then a common pitfall and
cause for latches in VHDL is to forget a signal in the sensitivity list.
However, VHDL-2008 has a solution for this, since you can use (all) as
sensitivity list, whereby all signals used in the process are implicitly
included in the sensitivity list. So if you use VHDL-2008, then your template
for combinatorial processes can be:
process (all) is
begin
z_o <= a_i and b_i;
end process;
These templates should be all you need for typical synthesizable design, and
they will keep your design latch free.

Resources