I have two processes like the ones below.
Say A=1, B=2 and C=3. What happens in simulation is that on the rising edge B=1 and C=2, which is the result I want.
But am I guaranteed that this also holds when the design is implemented in an FPGA?
What worries me is the delay associated with the extra if statement in process BC.
AB : process(A, clk)
begin
    if rising_edge(clk) then
        B <= A;
    end if;
end process;

BC : process(B, clk)
begin
    if rising_edge(clk) then
        if (some_statement) then
            C <= B;
        end if;
    end if;
end process;
On the rising edge, B will take on the value of A (B=1) and C will take on the old value of B (C=2).
However, I suspect you are not actually describing what you intend to. The problem is that you have A and B in the sensitivity lists of the two processes. This means that process AB wakes up every time A changes as well as on every clock edge; the rising_edge(clk) guard prevents B from changing at the wrong moment, but the extra entry is misleading and wastes simulation effort. The same holds for process BC. Assuming you want to describe two registers in series, your code should be
AB : process(clk)
begin
    if rising_edge(clk) then
        B <= A;
    end if;
end process;

BC : process(clk)
begin
    if rising_edge(clk) then
        if (some_statement) then
            C <= B;
        end if;
    end if;
end process;
In this case, if you synthesize this code onto an FPGA, you will infer two registers. The register in process BC will use the register's clock-enable input, which is connected to the boolean output of some_statement. If some_statement is already a single std_logic signal, this will not introduce additional delay, but it will require some routing resources, so you should still avoid using the enable signal where you don't really need it.
I think Simon answered the question perfectly; just to clarify the issue a bit further:
If the initial values of your signals are A=1, B=2 and C=3, then you will see the following during simulation:
At the start: A=1, B=2 and C=3
After the first rising edge of the clock: A=1, B=1 and C=2
After the second rising edge of the clock: A=1, B=1 and C=1
After that, all signals remain 1.
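If you want to reproduce this trace yourself, here is a minimal testbench sketch (my own illustration, not from the question; it uses integer signals and assumes some_statement is always true, so the guard in process BC is dropped):

library ieee;
use ieee.std_logic_1164.all;

entity two_regs_tb is
end entity;

architecture sim of two_regs_tb is
    signal clk : std_logic := '0';
    signal A   : integer := 1; -- initial values from the question
    signal B   : integer := 2;
    signal C   : integer := 3;
begin
    clk <= not clk after 5 ns; -- free-running 100 MHz clock

    AB : process(clk)
    begin
        if rising_edge(clk) then
            B <= A;
        end if;
    end process;

    BC : process(clk)
    begin
        if rising_edge(clk) then
            C <= B; -- some_statement assumed true here
        end if;
    end process;
end architecture;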
For the extra if statement to cause a problem, the combinational delay it adds would have to push the register-to-register path beyond the clock period minus the register setup time. Unless you have extremely complicated logic there, or signals coming from multiple clock domains, there is little risk of running into trouble with your code (the more accurate code is the one posted by Simon).
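Put as a formula, the setup-timing constraint that the tools check on every register-to-register path (standard textbook form, not from the original answer) is:

t_clk_to_q + t_logic + t_routing + t_setup <= t_clock_period

As long as place-and-route reports that timing is met, the enable logic inferred from the if statement is safe.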
But am I guaranteed that this also holds when the design is implemented in an FPGA?
The synthesizer ought to produce an FPGA implementation which matches the behaviour of your VHDL in the simulator. If not, it's a bug!
Note that there are some "accepted" deviations. For example, if you leave a right-hand-side signal off the sensitivity list, the synthesizer will assume you meant to put it there, but the simulator will assume you know what you're doing, and there will be a mismatch. Personally, I regard that behaviour as a bug, but it is so firmly entrenched in so many tools that I don't see it ever changing.
What worries me is the delay associated with the extra if statement in process BC.
Everything in a clocked process like yours "executes" within a single clock tick. If there is too much logic (for example, each nested if introduces a new layer of logic), you may find that a clock tick has to last longer than you desire.
(Not like software on most modern micros, where everything "takes as long as it takes", and is often unpredictable, depending on the state of caches, TLBs etc.)
Related
When I learnt how to express finite state machines in VHDL, it was with a two-process architecture. One process handles the clock/reset signals, and another handles the combinatorial logic of updating the state and output. An example is below.
I've seen this style criticised (see the comments and answer to this question for example), but never in any detail. I'd like to know whether there are objective(ish) reasons behind this.
Are there technical reasons to avoid this style? Xilinx's synthesiser seems to detect it as a state machine (you can see this in the output and verify the transitions), but do others struggle with it, or generate poor-quality implementations?
Is it just not idiomatic VHDL? Remember to avoid opinion-based answers; if it's not idiomatic, is there a widely used teaching resource or reference that uses a different style? Idiomatic styles can also exist because, eg. there are classes of mistakes that are easy to catch with the right style, or because the code structure can better express the problem domain, or for other reasons.
(Please note that I'm not asking for a definition or demonstration of the different styles, I want to know if there are objective reasons to specifically avoid the two-process implementation.)
Example
Some examples can be found in Free Range VHDL (p89). Here's a super simple example:
library ieee;
use ieee.std_logic_1164.all;

-- Moore state machine that transitions from IDLE to WAITING, WAITING
-- to READY, and then READY back to WAITING each time the input is
-- detected as on.
entity fsm is
    port(
        clk    : in  std_logic;
        rst    : in  std_logic;
        input  : in  std_logic;
        output : out std_logic
    );
end entity fsm;

architecture fsm_arc of fsm is
    type state is (idle, waiting, ready);
    signal prev_state, next_state : state;
begin
    -- Synchronous/reset process: update state on clock edge and handle
    -- reset action.
    sync_proc: process(clk, rst)
    begin
        if (rst = '1') then
            prev_state <= idle;
        elsif (rising_edge(clk)) then
            prev_state <= next_state;
        end if;
    end process sync_proc;

    -- Combinatorial process: compute next state and output.
    comb_proc: process(prev_state, input)
    begin
        case prev_state is
            when idle =>
                output <= '0';
                if input = '1' then
                    next_state <= waiting;
                else
                    next_state <= idle;
                end if;
            when waiting =>
                output <= '1';
                if input = '1' then
                    next_state <= ready;
                else
                    next_state <= waiting;
                end if;
            when ready =>
                output <= '0';
                if input = '1' then
                    next_state <= waiting;
                else
                    next_state <= ready;
                end if;
        end case;
    end process comb_proc;
end fsm_arc;
(Note that I don't have access to a synthesiser right now, so there might be some errors in it.)
I always recommend one-process state machines, because they avoid two classes of basic errors that are exceedingly common among beginners:
Missing items in the combinational process's sensitivity list cause the simulation to misbehave. The design may even work in the lab, since most synthesizers don't care about the sensitivity list.
Using one of the combinational results as an input instead of the registered version, causing unclocked loops or just long paths/skipped states.
Less importantly, the extra combinational process reduces simulation efficiency.
Less objectively, I find one-process machines easier to read and maintain; they require less boilerplate and I don't have to keep the sensitivity list in sync with the logic.
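To make the comparison concrete, here is a one-process rewrite of the fsm example from the question (my sketch, keeping the same ports; the Moore output is registered here, chosen from the state being entered so it stays aligned with the state register):

architecture fsm_one_proc of fsm is
    type state_t is (idle, waiting, ready);
    signal state : state_t;
begin
    fsm_proc: process(clk, rst)
    begin
        if rst = '1' then
            state  <= idle;
            output <= '0';
        elsif rising_edge(clk) then
            case state is
                when idle =>
                    if input = '1' then
                        state  <= waiting;
                        output <= '1'; -- output for the waiting state
                    end if;
                when waiting =>
                    if input = '1' then
                        state  <= ready;
                        output <= '0'; -- output for the ready state
                    end if;
                when ready =>
                    if input = '1' then
                        state  <= waiting;
                        output <= '1'; -- output for the waiting state
                    end if;
            end case;
        end if;
    end process fsm_proc;
end architecture fsm_one_proc;

There is only one sensitivity list to get right (clock and reset), and every signal the machine drives is registered.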
The only two objective reasons I see concern readability and simulation efficiency, in the case of Moore state machines (where primary outputs depend only on the current state, not on the primary inputs).
Readability: if you merge the output and next-state computations together in a single process, it may be more difficult to read, understand and maintain than with separate combinatorial processes for separate concerns.
Simulation efficiency: in the two-process solution, your combinatorial process will be triggered on every primary-input and/or current-state change. This makes sense for the part of the process that computes the next state, but not for the part that computes the outputs. The latter should be triggered only on current-state changes.
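For a Moore machine, that means splitting the output computation into its own process that is sensitive only to the state. A sketch, reusing the signals from the fsm example earlier in this thread:

-- Output logic: a Moore output depends only on the state, so this
-- process needs to wake up only when prev_state changes.
output_proc: process(prev_state)
begin
    case prev_state is
        when waiting => output <= '1';
        when others  => output <= '0';
    end case;
end process output_proc;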
There is a detailed description of this in ref. [1] below. First, FSMs are classified into 3 categories (the first time this has been done), then each is examined thoroughly, with many complete examples. You can find the exact answer to your question on pages 107-115 for category 1 (regular) finite state machines, on pages 185-190 for category 2 (timed) machines, and on pages 245-248 for category 3 (recursive) state machines. The templates are described in detail for both the Moore and Mealy versions in each of the three categories.
[1] V. Pedroni, Finite State Machines in Hardware: Theory and Design (with VHDL and SystemVerilog), MIT Press, Dec. 2013.
I am taking a class on embedded system design, and one of my classmates, who has taken another course, claims that the lecturer of that course would not let them implement state machines like this:
architecture behavioral of sm is
    type state_t is (s1, s2, s3);
    signal state : state_t;
begin
    oneproc: process(Rst, Clk)
    begin
        if (Rst = '1') then
            -- Reset
        elsif (rising_edge(Clk)) then
            case state is
                when s1 =>
                    if (input = '1') then
                        state <= s2;
                    else
                        state <= s1;
                    end if;
                ...
                ...
                ...
            end case;
        end if;
    end process;
end architecture;
Instead, they had to do it like this:
architecture behavioral of sm is
    type state_t is (s1, s2, s3);
    signal state, next_state : state_t;
begin
    syncproc: process(Rst, Clk)
    begin
        if (Rst = '1') then
            -- Reset
        elsif (rising_edge(Clk)) then
            state <= next_state;
        end if;
    end process;

    combproc: process(state)
    begin
        case state is
            when s1 =>
                if (input = '1') then
                    next_state <= s2;
                else
                    next_state <= s1;
                end if;
            ...
            ...
            ...
        end case;
    end process;
end architecture;
To me, being very inexperienced, the first method looks more foolproof, since everything is clocked and there is less (no?) risk of introducing latches.
My classmate can't give me any reason why his lecturer would not let them use the other way of implementing it, so I'm trying to find the pros and cons of each.
Is either of them preferred in industry? Why would I want to avoid one or the other?
The single-process form is simpler and shorter. That alone reduces the chance that it contains errors.
However, the fact that it also eliminates the "incomplete sensitivity list" problem that plagues the other style's combinational process should make it the clear winner, regardless of any other considerations.
And yet there are so many texts and tutorials advising the reverse, without properly justifying that advice, or (in at least one case I can't find at the moment) introducing a silly mistake into the single-process form and rejecting the entire idea on the grounds of that mistake.
The only thing (AFAIK) the single-process form doesn't do well is un-clocked outputs. These are (IMO) poor practice anyway, as they can cause races at the best of times, and they could be handled by a separate combinational process for that output only, if you really had to.
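For instance, something along these lines (my sketch, reusing the state machine from the question; grant is a made-up output that needs zero input-to-output latency):

-- Clocked single-process state machine, as in the question.
oneproc: process(Rst, Clk)
begin
    if Rst = '1' then
        state <= s1;
    elsif rising_edge(Clk) then
        case state is
            when s1 =>
                if input = '1' then
                    state <= s2;
                end if;
            when others =>
                null; -- remaining transitions elided
        end case;
    end if;
end process;

-- Separate combinational process for just the one un-clocked output.
-- Assigning grant in every branch avoids inferring a latch.
fast_out: process(state, input)
begin
    if state = s1 and input = '1' then
        grant <= '1';
    else
        grant <= '0';
    end if;
end process;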
I'm guessing there was originally some practical reason behind it; maybe a mid-1990s synthesis tool that couldn't reliably handle the single process form, and that made it into the original documentation that the lecturers learned from. Like those blasted non-standard std_logic_arith libraries. And so the myth has been perpetuated...
Those same lecturers would probably have a fit if they saw what can pass through a modern synthesis tool: integers, enumerations, record types, loops, functions and procedures updating signals. (Xilinx ISE is now fine with these. Some versions of Synplicity have trouble with functions, but accept an identical procedure with an out parameter.)
One other comment: I prefer if Rst = '1' then over if (Rst = '1') then. It looks less like line noise (or C).
I agree with Brian on this. The only issue with the one-process state machine is that you cannot have un-clocked outputs, which is a problem if you need zero latency from input to output. Otherwise, the one-process model helps to minimize bugs, as it clearly relates the outputs to the state.
I was taught the two-process model in school, but have discovered that the one-process model is what is generally accepted in industry. I believe the reasoning for using the two-process model in school is that it gives students an understanding of how the placement of combinational logic relative to registers changes based on how the code is written (which IMO is very important when starting out) and what it means for their design. However, simply forcing you to use the two-process model with no explanation does not accomplish this.
the first method looks more foolproof, since everything is clocked and there is less (no?) risk of introducing latches.
Yes, with the first method, where everything is clocked, there is no chance of introducing latches. It may introduce flip-flops, but that's fine.
The second method can introduce true asynchronous latches, which even in the best case are not handled very well by the back-end FPGA tools I've used, and which are not supported at all in some architectures, so they would have to be built out of gates or lookup tables.
In addition, if you get your sensitivity list wrong in the second process, your simulation can differ from your synthesis result! (Note that the combproc in your example reads input but does not list it, so it has exactly this problem.) This is because synthesisers, for reasons I've given up trying to understand, treat the sensitivity list as if it were populated with all the signals you read (completely ignoring the VHDL language spec in the process), whereas the simulator will do exactly what you said.
Ugh. Personally, I hate the dual-process state machine style. The lecturer is probably an old guy, and this was the most reliable way to do it 20 years ago. The tools understand your way, and I like that approach better.
Your classmate is absolutely right. The problem here is that your question is not complete. The reason your colleague's code is better than yours is that people normally define the output values and the next-state values in the same process, as shown below (this is the same as your own code, just with the output values added to it, which results in "bad" code):
elsif (rising_edge(clk)) then
    case state is
        when s1 =>
            -- define outputs:
            outp1 <= ...;
            outp2 <= ...;
            ...
            -- define next state:
            if (input = '1') then
                state <= s2;
            else
                state <= s1;
            end if;
        when s2 =>
            ...
            ...
    end case;
end if;
Recall that in an FSM the output is produced by the combinational logic section, and is therefore memoryless. However, in the code above, the output values get registered, which is not what the FSM must produce. Indeed, registering the outputs is a case-by-case decision EXTERNAL to the FSM (the outputs could be registered, for example, for glitch removal, which is a PARTICULAR, PLANNED decision, not a FORCED situation, as in the code above).
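To keep the outputs memoryless, the two-process template computes them in the combinational process instead. A sketch using the hypothetical outputs outp1/outp2 from above:

combproc: process(state, input)
begin
    case state is
        when s1 =>
            outp1 <= '0'; -- combinational, memoryless outputs
            outp2 <= '1'; -- (example values only)
            if input = '1' then
                next_state <= s2;
            else
                next_state <= s1;
            end if;
        when others =>
            outp1 <= '0'; -- assign every output in every branch
            outp2 <= '0'; -- to avoid inferring latches
            next_state <= s1;
    end case;
end process;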
In VHDL, all the steps in a process are executed sequentially, but I wonder how an FPGA can execute steps sequentially. I am very confused about how sequential assignments, functions and the like end up being implemented in an FPGA, so can anyone shed some light on this topic?
process(d, clk)
begin
    if (rising_edge(clk)) then
        q <= d;
    else
        q <= q;
    end if;
end process;
This is just code for a simple D-Latch, but how will this be implemented in an FPGA?
It is not "executed" sequentially as such - but the synthesizer interprets the code sequentially, and creates the hardware design to fit such an interpretation.
For instance, if you assign a value to a signal twice during a clocked process, the first assignment is simply ignored, while the second takes effect (remember that a signal is only assigned at the end of a process statement, not immediately):
signal a : UNSIGNED(3 downto 0) := (others => '0');

(...)

process(clk)
begin
    if (rising_edge(clk)) then
        a <= a - 1;
        a <= a + 1;
    end if;
end process;
The above process will always increment a by 1. Similarly, if you put the second assignment inside an if statement, the synthesizer will simply create two paths for a: a decrement for when the if condition is not fulfilled, and an increment for when it is.
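Concretely, something like this (my illustration of the case just described; en is an assumed enable signal):

process(clk)
begin
    if rising_edge(clk) then
        a <= a - 1;     -- default path: decrement
        if en = '1' then
            a <= a + 1; -- overrides the default: increment
        end if;
    end if;
end process;

The hardware is a multiplexer choosing between a - 1 and a + 1 based on en, feeding the register that holds a.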
If you use variables, the idea is the same, although intermediate values come into play, since variables take on their new value immediately.
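For example (my sketch, with a of type unsigned as above), a variable lets the second statement see the result of the first within the same clock edge:

process(clk)
    variable v : unsigned(3 downto 0);
begin
    if rising_edge(clk) then
        v := a + 1; -- variable updates immediately
        a <= v + 1; -- uses the updated value: a becomes a + 2
    end if;
end process;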
But it all boils down to this: the synthesizer does all the "magic" of interpreting your process in a sequential way, and then generates hardware that does what you have described.
Your example basically describes a D flip-flop, although in a different way than is typically recommended (the Xilinx FPGA tools, IIRC, distinguish latches from flip-flops in that flip-flops are edge-sensitive and latches are level-sensitive).
You can basically write the same code as:
process(clk)
begin
    if (rising_edge(clk)) then
        q <= d;
    end if;
end process;
It will automatically keep its value in all other cases. This will be implemented as a flip-flop inside the FPGA. Most FPGAs consist of blocks of look-up tables and flip-flops, onto which quite a lot of different hardware can be mapped. The above code will simply bypass the look-up table and just use the flip-flop of one of the blocks.
You can learn more about the internal workings by having a look at the datasheet for your particular FPGA. For the Spartan-3 series, for instance, have a look at page 24 of the Xilinx Spartan3 FPGA Family Data Sheet.
I would like to latch a signal, but when I try to do so I get a delay of one cycle. How can I avoid this?
myLatch: process(wclk, we) -- Can I omit the we in the sensitivity list?
begin
    if wclk'event and wclk = '1' then
        lwe <= we;
    end if;
end process;
However, when I try this and look at the waves during simulation, lwe is delayed by one cycle of wclk. All I want to achieve is to sample we on the rising edge of wclk and keep it stable until the next rising edge. I then assign the latched signal to another entity's port map, which is defined in the architecture.
==============================================
Well, I figured out that I have to omit the wclk'event to get a latch instead of a flip-flop. This seems rather unintuitive to me. By simply shortening the time during which I sample the signal to be latched, I go from a latch to a flip-flop. Can anyone explain why this is, and where my perception is wrong? (I am a VHDL beginner.)
First off, a few observations on the process you pasted above:
myLatch: process(wclk, we)
begin
    if wclk'event and wclk = '1' then
        lwe <= we;
    end if;
end process;
The signal we can be omitted from the sensitivity list because you have described a clocked process. The only signals required in the sensitivity list of a process like this are the clock and the asynchronous reset, if you choose to use one (a synchronous reset would not need to be added to the sensitivity list).
Instead of using if wclk'event and wclk = '1' then, you should use if rising_edge(wclk) then or if falling_edge(wclk) then; there's a good blog post on the reasons why here.
By omitting the wclk'event you changed the process from a clocked process to a combinatorial process, like so:
myLatch: process(wclk, we)
begin
    if wclk = '1' then
        lwe <= we;
    end if;
end process;
In a combinatorial process, all inputs should be present in the sensitivity list, so you would be correct to have both wclk and we in the list, as they both influence the output. Normally you would ensure that lwe is assigned in all branches of your if statement to avoid inferring a latch; however, a latch appears to be your intention in this case.
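For reference, if you did not want the latch, the complete-assignment version would look like this (a sketch; with every branch covered, the logic becomes a plain multiplexer rather than storage):

no_latch: process(wclk, we)
begin
    if wclk = '1' then
        lwe <= we;
    else
        lwe <= '0'; -- assigning in every branch avoids the latch
    end if;
end process;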
Latches in general should be avoided, so if you find yourself needing one you should perhaps pause and consider your approach. Doulos have a couple of articles on latches here and here that you might find useful.
You stated that all you want to achieve is to sample we on the rising edge of wclk and keep it stable until the next rising edge. The process below will accomplish this:
store : process(wclk)
begin
    if rising_edge(wclk) then
        lwe <= we;
    end if;
end process;
With this process, lwe will be updated with the value of we upon every rising edge of wclk, and it will remain stable until the next rising edge.
Let me know if this clears things up for you.
Believe it or not, the issue is actually in your testbench. This has to do with how the VHDL simulation model works.
VHDL is usually used for synchronous hardware design -- that means using flip-flops that sample on the rising edge and set their outputs on the falling edge, so that there are no race conditions between reading and writing. But in VHDL, this master/slave logic is not actually simulated using opposite clock edges.
Consider a process
process (clock) begin
    if rising_edge(clock) then
        a <= b;
    end if;
end process;
At the start of a simulation timestep, if clock has just risen, the if will execute. Then the assignment a <= b will be executed; this does not immediately cause an assignment to take place, but schedules the assignment for the end of the timestep.
After all processes have been run, then all scheduled assignments take place. This means that no process will "see" the new value of a until the next timestep.
Time            a    b    Actions
Start of ts 1   '0'  '1'  a <= '1' is scheduled
End of ts 1     '1'  '0'  a <= '1' is executed
Start of ts 2   '1'  '0'  a <= '0' is scheduled
End of ts 2     '0'  '1'  a <= '0' is executed
So what you will see in the waveform viewer is a apparently being set on the rising edge of the clock, following b delayed by one clock cycle; you don't see the intermediate scheduling of assignments that causes this to happen.
Of course, in real life there is no "end of the timestep"; the actual change of signal a happens when the slave part of the flip-flop triggers, i.e. on the negative edge. (Maybe it would have been less confusing for VHDL to just use the negative edge; but, oh well, this is how it works.)
Here are two testbenches for your latch code:
Test bench 1, using rising edges
Test bench 2, using falling edges
In the first, if you look in the waveform viewer you will see exactly what you describe -- lwe appears to be delayed by 1 clock cycle -- but really, the delay is happening in the non-blocking assignment that sets counter -- so when the rising edge happens, we does not actually have its new value yet. And in the second, you see no such delay; lwe is set exactly on the rising edge to the value of we at that time.
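Since the testbenches are only linked above, the key difference boils down to something like this (my sketch; the signal names match the question): drive we on the falling edge, so it is stable well before the rising edge that samples it.

stimulus: process
begin
    wait until falling_edge(wclk);
    we <= '1'; -- changes mid-cycle, safely before the next rising edge
    wait until falling_edge(wclk);
    we <= '0';
    wait; -- suspend forever
end process;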
For a related topic in Verilog, see Nonblocking Assignments in Verilog Synthesis, Coding Styles That Kill!
The process you have is what you want, according to your description, although 'we' should be removed from the sensitivity list. If this doesn't work as you believe it should, it is almost certainly a problem with your test bench/simulation (see Owen's answer). Specifically, you are probably changing the value of 'we' too late, so that the flip-flop latches the previous value instead of the new one.
I'm interested to know what the source of this signal is, though; if it's an asynchronous signal that can change at any time, you will have to add some logic to protect against metastability.
To answer your second question about latches: it is correct that omitting wclk'event will result in a latch. This process will not do what you want, however, because it will propagate changes on 'we' to 'lwe' during the whole positive half-period of the clock. The short answer to your question is that implementing this type of behaviour requires a latch, while the behaviour described by the original process requires a flip-flop.
I'm going through the phases of learning VHDL for the second or third time now (this time armed with a very good and free e-book), and I'm finally starting to "get" quite a bit of it. Now I'm learning about behavioural styles and the process statement, and most of it makes sense. However, I've read in many places that processes are to be avoided except in certain cases. I mean, in theory, can't everything be implemented in dataflow style instead of behavioural?
When exactly should it be obvious that a process statement should be used?
The process statement is extremely useful; in what situations have you been told not to use it?
There are many different cases where you would use a process statement. I'll outline a few of them below.
One of the most common usages of the process statement (for synthesis) is to describe logic which is synchronous to a clock signal. For example, a simple counter that increments every clock cycle when not in reset could be described as:
DATA_REGISTER : process(CLOCK)
begin
    if rising_edge(CLOCK) then
        if RESET = '1' then
            COUNTER <= (others => '0');
        else
            COUNTER <= COUNTER + 1; -- COUNTER is assumed to be of type 'unsigned'
        end if;
    end if;
end process;
As your designs grow more complex, you will inevitably implement a state machine at some point; this will employ one or more processes, depending on the style of state machine you choose to implement.
For behavioural code, you can use processes in conjunction with wait statements to generate test vectors or to model the behaviour of a real system. Here's a really basic example of a 100 MHz clock generator taken from one of my testbenches:
architecture BEH of ethernet_receive_tb is
    signal s_clock : std_logic := '0'; -- Initial assignment to clock kicks off the process.
begin
    CLOCKGEN : process(s_clock)
    begin
        s_clock <= not s_clock after 5 NS;
    end process CLOCKGEN;
...
You can also describe asynchronous logic with processes. In this case, you need to include in the sensitivity list all signals which are read in the process, and you need to make sure that all outputs are always assigned, to avoid inferred latches.
IF_ELSE: process (SEL, A, B)
begin
    F <= B; -- Default assignment
    if SEL = '1' then
        F <= A;
    end if;
end process;
Hopefully you can see that the process statement is very useful and that you will use it in many different situations. I hope this answered your question!
Process blocks are your friend.
They provide a way of saying "this block of code is related; its inputs are X, Y, Z and it drives A, B, C". The inputs are documented by the sensitivity list (unless it's a clocked process, in which case they should be in your comments). If anything else drives the same signals, you'll get warnings, errors, or 'X's in simulation (depending on your tools). Whatever you get, it's pretty obvious.
Personally, I would be quite happy writing multiple processes in a single entity, but everyone has their own style. For example, if I have multiple pipeline stages, each stage is a process; if I have parallel non-interfering paths, each will be in a separate process (see the sketch below). By doing it this way, the code is structured in small, easy-to-read blocks. Small, simple logic synthesizes into small, fast blocks (in general).
You could view my style as using them as lightweight entities.
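As an illustration of the pipeline-stage style (my sketch; clk, in_a, in_b, sum_reg and avg_reg are made-up unsigned signals, using ieee.numeric_std):

-- Each pipeline stage lives in its own small clocked process.
stage1: process(clk)
begin
    if rising_edge(clk) then
        sum_reg <= resize(in_a, sum_reg'length) + resize(in_b, sum_reg'length); -- stage 1: add
    end if;
end process;

stage2: process(clk)
begin
    if rising_edge(clk) then
        avg_reg <= shift_right(sum_reg, 1); -- stage 2: halve to get the average
    end if;
end process;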
In synthesisable code, processes are required any time you need to keep information from one clock cycle to another. "To store state" in the jargon.
(Note that a process can be implied by code such as

q <= d when rising_edge(clk);

)
In non-synthesisable code, processes are useful for getting events to happen in a particular order:
p1: process
begin
    data <= "--------";
    WE <= '0';
    wait until reset = '1';
    wait until processor_initialised = '1';
    assert ACK = '0' report "ACK should be low!" severity error;
    data <= X"16";
    WE <= '1';
    wait until ACK = '1';
end process;
Most of my code has a single process per entity. Each entity does some useful, well-defined and small-enough-to-be-testable task.