About sequential code in FPGAs - vhdl

In VHDL, in a process all steps will be executed sequentially, but I wonder how an FPGA can execute steps sequentially. I am very confused about how sequential assignments, functions and similar are being generated in an FPGA, so can anyone throw some light on this topic?
process(d, clk)
begin
if(rising_edge(clk)) then
q <= d;
else
q <= q;
end if;
end process;
This is just code for a simple D-Latch, but how will this be implemented in an FPGA?

It is not "executed" sequentially as such - but the synthesizer interprets the code sequentially, and creates the hardware design to fit such an interpretation.
For instance, if you assign a value to a signal twice during a clocked process, the first assignment is simply ignored, while the second takes effect (remember that a signal is only assigned at the end of a process statement, not immediately):
signal a : UNSIGNED(3 downto 0) := (others => '0');
(...)
process(clk)
begin
if(rising_edge(clk)) then
a <= a - 1;
a <= a + 1;
end if;
end process;
The above process will always increment a by 1. Similarly, if you have the second assignment inside an if statement, the synthesizer will simply create two paths for a - a decrement for when the if statement is not fullfilled, and an increment for when it is.
If you use variables, the idea is the same - although intermediate values are used, as variables take on their new value immediately.
But it all boils down to that the synthesizer does all the "magic" of interpreting your process in a sequential way, then generating hardware that does what you have described.
Your example basically describes a d-flip-flop (the Xilinx FPGA tools iirc distinguish latches and flip-flops in that flip-flops are edge-sensitive, and latches are level-sensitive), although in a different way than typically recommended.
You can basically write the same code as:
process(clk)
begin
if(rising_edge(clk)) then
q <= d;
end if;
end process;
It will automatically keep its value in the other cases. This will be implemented as a flip-flop inside the FPGA. Most FPGAs consist of blocks of look-up tables and flip-flops, to which quite a lot of different hardware can be mapped. The above code will simply by-pass the look-up table, and just use the flip-flop of one of the blocks.
You can learn more about the internal workings by having a look at the datasheet for your particular FPGA. For Spartan3-series FPGAs for instance, have a look at page 24 of the Xilinx Spartan3 FPGA Family Data Sheet

Related

Synchronous counter in VHDL with compare match and load

I created the following Counter with a compare match functionality:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;
entity Counter is
generic (
N : natural := 24
);
port (
-- Input counter clock
clk : in std_logic := '0';
-- Enable the counter
enable : in std_logic := '0';
-- Preload value loaded when clk is rising and load is 1
load_value : in std_logic_vector((N-1) downto 0) := (others => '0');
-- Set to 1 to load a value
load : in std_logic := '0';
-- Compare match input is compared with the counter value
compare_match_value : in std_logic_vector((N-1) downto 0) := (others => '0');
-- Is 1 when compare_match_value = counter_value
compare_match : out std_logic := '0';
output_value : out std_logic_vector((N-1) downto 0) := (others => '0')
);
end Counter;
architecture Behavioral of Counter is
signal counter_value : unsigned((N - 1) downto 0) := to_unsigned(0, N);
begin
output_value <= std_logic_vector(counter_value);
process (clk) is
begin
if rising_edge(clk) then
if enable = '1' then
if load = '1' then
counter_value <= unsigned(load_value);
else
counter_value <= counter_value + 1;
end if;
else
if load = '1' then
counter_value <= unsigned(load_value);
end if;
end if;
end if;
end process;
process (counter_value) is
begin
if unsigned(compare_match_value) = counter_value then
compare_match <= '1';
else
compare_match <= '0';
end if;
end process;
end Behavioral;
The behavior of my counter is to be fully synchronous with the input clk signal. Disabling the counter is always possible and the value is held at the current count value. A load value can be assigned with the load and load_value signal. Whenever the load signal is high and a rising edge is detected, the counter value is updated to the load_value.
Another feature is the compare unit which outputs high on compare_match output. The simulation works as expected but I have a few questions when synthesizing this design on spartan 3 fpga.
Is this considered a good design of my counter because I'm still not much experienced in VHDL.
Are there any undefined states when using the compare unit in further logic in my design? As I see it compare_match is calculated whenever the counter_value is updated.
When using a large number for N, is there anything special about the delay I need to consider?
In general it seems to me a quite good description.
However, I would like to point out some minor things (that might me some answers to your 1st question).
1) As, I see right now your counter does not contain any reset (neither asynchronous nor synchronous). In general you cannot predict the starting point of your counting (even if, probably, it will be all '0's at start-up).
In my opinion, it would be a neater design if you could have a reset signal.
I also noticed that the loading is activated regardless of the fact that the counter is enabled or not. I have no comment about this since it could be a specification for your design. Maybe you can compact the code by moving the "if load" part outside of the "if enable" (i.e. changing the order to the comparisons).
To improve the readability (especially when the designs will be more complex), I advise you to label the process. This will help you to identify the different part of the design.
You can skip lot of the extra-typing if you use the VHDL-mode of emacs. It has built in templates that would take care of the "boring" part related to coding.
I also see that you have default values for your input ports. In my opinion, this is not a very good practice; they would be ignored by synthesizer leading to an IP that might behave differently than what you expect. In general, do not make assumptions (a part those that are specified) on the external signals.
Finally, I have a comment about the compare part.
This goes for both question 1) & 2)
1-2)
In the compare process, you have just listed counter_value in the sensitivity-list.
This means that the process would be activated only when counter_value changes.
Since you compare it with a signal (compare_match_value) that is an input to the block (hence it can change values) it would be better to have it too in the sensitivity-list. Otherwise, the comparison would not apply (i.e. the process would not be activated) when you change compare_match_value.
Linting tools and synthesizer might complain about it (stating warning like incomplete sensitivity-list). As matter of fact it is good practice to list all the signals that may change in the list for combinatorial processes.
Regarding the comparison itself, the way you described it is absolutely fine and you would not have uncovered states. Basically you have specified all possible conditions so no surprises there.
3)
Regarding your 3rd question, since you are targeting an FPGA, you could "relax" about it. FPGAs have dedicated structure for fast arithmetic operations and (as long as you do not use all of them) the synthesizer would use them to close time.
Also in ASIC, synthesizer would probably select an appropriate arithmetic structure to close time.
If you want to be on the safe side, you can add a register at the output of the comparison block. This will prevent creating a long combinatorial path especially if you IP has to be integrated with other blocks. Of course this extra-register would add an 1-clock-cycle latenc, but it will improve your overall timing.
I hope these suggestions could be useful to you and cover (at least partially) your doubts.
Keep on coding :)

Why not a two-process state machine in VHDL?

When I learnt how to express finite state machines in VHDL, it was with a two-process architecture. One process handles the clock/reset signals, and another handles the combinatorial logic of updating the state and output. An example is below.
I've seen this style criticised (see the comments and answer to this question for example), but never in any detail. I'd like to know whether there are objective(ish) reasons behind this.
Are there technical reasons to avoid this style? Xilinx' synthesiser seems to detect it as a state machine (you can see it in the output, and verify the transitions), but do others struggle with it, or generate poor quality implementations?
Is it just not idiomatic VHDL? Remember to avoid opinion-based answers; if it's not idiomatic, is there a widely used teaching resource or reference that uses a different style? Idiomatic styles can also exist because, eg. there are classes of mistakes that are easy to catch with the right style, or because the code structure can better express the problem domain, or for other reasons.
(Please note that I'm not asking for a definition or demonstration of the different styles, I want to know if there are objective reasons to specifically avoid the two-process implementation.)
Example
Some examples can be found in Free Range VHDL (p89). Here's a super simple example:
library ieee;
use ieee.std_logic_1164.all;
-- Moore state machine that transitions from IDLE to WAITING, WAITING
-- to READY, and then READY back to WAITING each time the input is
-- detected as on.
entity fsm is
port(
clk : in std_logic;
rst : in std_logic;
input : in std_logic;
output : out std_logic
);
end entity fsm;
architecture fsm_arc of fsm is
type state is (idle, waiting, ready);
signal prev_state, next_state : state;
begin
-- Synchronous/reset process: update state on clock edge and handle
-- reset action.
sync_proc: process(clk, rst)
begin
if (rst = '1') then
prev_state <= idle;
elsif (rising_edge(clk)) then
prev_state <= next_state;
end if;
end process sync_proc;
-- Combinatorial process: compute next state and output.
comb_proc: process(prev_state, input)
begin
case prev_state is
when idle =>
output <= '0';
if input = '1' then
next_state <= waiting;
else
next_state <= idle;
end if;
when waiting =>
output <= '1';
if input = '1' then
next_state <= ready;
else
next_state <= waiting;
end if;
when ready =>
output <= '0';
if input = '1' then
next_state <= waiting;
else
next_state <= ready;
end if;
end case;
end process comb_proc;
end fsm_arc;
(Note that I don't have access to a synthesiser right now, so there might be some errors in it.)
I always recommend one-process state machines because it avoids two classes of basic errors that are exceedingly common with beginners:
Missing items in the combinational process's sensitivity list cause the simulation to misbehave. It even works in the lab since most synthesizers don't care about the sensitivity list.
Using one of the combinational results as an input instead of the registered version, causing unclocked loops or just long paths/skipped states.
Less importantly, the combinational process reduces simulation efficiency.
Less objectively, I find them easier to read and maintnain; they require less boiler plate and I don't have to keep the sensitivity list in sync with the logic.
The only two objective reasons I see are about readability and simulation efficiency in the case of Moore state machines (where primary outputs depend only on the current state, not the primary inputs).
Readability: if you merge together in a single process outputs and next state computations it might be more difficult to read / understand / maintain than with separate combinatorial processes for separate concerns.
Simulation efficiency: in the 2-processes solution your combinatorial process will be triggered on every primary input and / or current state change. This makes sense for the part of the process that computes the next state but not for the part that computes the outputs. The latter shall be triggered only on current state changes.
There is a detailed description about this in ref. [1] below. First, FSMs are classified in 3 categories (first time this is done), then each is examined thoroughly, with many complete examples. You can find the exact answer to your question on pages 107-115 for category 1 (regular) finite state machines; on pages 185-190 for category 2 (timed) machines; and on pages 245-248 for category 3 (recursive) state machines. The templates are described in detail for both Moore and Mealy version in each of the three categories.
[1] V. Pedroni, Finite State Machines in Hardware: Theory and Design (with VHDL and SystemVerilog), MIT Press, Dec. 2013.

Is rising_edge in VHDL synthesizable

In my coding when I write this statement, it is simulated, but not synthesizable. why? Now what should I do to solve this problem???
IF ((DS0='1' OR DS1='1')and rising_edge(DS0) and rising_edge(DS1) AND DTACK='1' AND BERR='1') THEN
RV0 <= not RV;
else
RV0 <= RV;
The most important thing when doing FPGA-designs is to think hardware.
An FPGA consists of a number of hardware blocks, with a predetermined set of inputs and outputs - the code you write needs to be able to map to these blocks. So even if you write code that is syntactically correct, it doesn't mean that it can actually map to the hardware at hand.
What your code tries to do is:
IF ((DS0='1' OR DS1='1')and rising_edge(DS0) and rising_edge(DS1) AND DTACK='1' AND BERR='1') THEN
(...)
If DS0 and DS1 currently have a rising edge (implying that they're also both '1', making the first part with (DS='1' OR DS1='1') redundant), and if DTACK and BERR are both 1, then do something.
This requires an input block that takes two clock inputs (since you have two signals that you want to test for rising edges simultaneously), and such a block does not exist in any FPGA I've encountered - and also, how close together would such two clock events need to be to be considered "simultaneous"? It doesn't really make sense, unless you specify it within some interval, for instance by using a real clock signal (in the sense of the clock signal going to the clock input of a flip-flop), to sample DS0 and DS1 as shown in Morten Zilmers answer.
In general, you'd want to use one dedicated clock signal in your design (and then use clock enables for parts that need to run slower), or implement some cross-clock-domain synchronization if you need to have different parts of your design run with different clocks.
Depending on the IDE environment you use, you may have access to some language templates for designing various blocks, that can help you with correctly describing the available hardware blocks. In Xilinx ISE you can find these in Edit > Language Templates, then, for instance, have a look at VHDL > Synthesis Constructs > Coding Examples > Flip Flops.
An addition to sonicwave's good answer about thinking hardware and synthesis to
the available elements.
The rising_edge function is generally used to detect the rising edge of a
signal, and synthesis will generally use that signal as a clock input to a
flip-flop or synchronous RAM.
If what you want, is to detect when both DS0 and DS1 goes from '0' to
'1' at the "same" time, then such check is usually made at each rising edge
of a clock, and the change is detected by keeping the value from the
previous rising clock.
Code may look like:
...
signal CLOCK : std_logic;
signal DS0_PREV : std_logic;
signal DS1_PREV : std_logic;
begin
process (CLOCK) is
begin
if rising_edge(CLOCK) then
if (DS0 = '1' and DS0_PREV = '0') and -- '0' to '1' change of DS0
(DS1 = '1' and DS1_PREV = '0') and -- '0' to '1' change of DS1
DTACK = '1' AND BERR = '1' then
RV0 <= not RV;
else
RV0 <= RV;
end if;
DS0_PREV <= DS0; -- Save value
DS1_PREV <= DS1; -- Save value
end if;
end process;
...

VHDL event keyword on std_logic_vector

I am implementing a digital design in VHDL which has to be low power. The design has a lot of inputs that are declared as multiple standard logic vectors. The device is allowed to wake up if anything changes on any input. This has to be combinatorical logic because the device is in power down. The code of what I am trying to do says it all: (ToggleSTDBY is a signal so this is legal)
P_Wakeup: PROCESS (VEC1, VEC2, VEC3, Rst_N) IS
BEGIN
IF Rst_N = '0' THEN
ToggleSTDBY <= '0';
ELSIF VEC1'event OR VEC2'event OR VEC3'event THEN
ToggleSTDBY <= NOT(ToggleSTDBY);
END IF;
END PROCESS P_Wakeup;
This is legal in simulation, but upon synthesis it says "'event is only supported for single bit signals". How can I fix this? There are a total of 66 bits in the vectors all together and I really don't want to write 66 processes for waking the device up. A bitwise OR on all bits will not solve anything, since most signals will be high, so the OR on all bits will always result in a high. The following code:
P_Wakeup: PROCESS (VEC, Rst_N) IS
BEGIN
IF Rst_N = '0' THEN
ToggleSTDBY <= '0';
ELSE
FOR i IN VEC'RANGE LOOP
IF VEC(i)'EVENT THEN
ToggleSTDBY <= NOT(ToggleSTDBY);
END IF;
END LOOP;
END IF;
END PROCESS P_Wakeup;
gives error "The prefix of signal attribute 'EVENT must be a static signal name". How can I fix it AND keep the code readable?
The HDL part of VHDL is abbreviation for Hardware Description Language (HDL),
so the VHDL constructions you can use must be possible to map by the synthesis
tool to the target. The use of 'event is typically tied to sequential
(clocked) hardware elements like flip flops or RAMs, and the synthesis tool
typically requires that you write the VHDL in a specific way, so the tool can
identify that a particular hardware elements is to be used.
Even though you may write VHDL code for a simulator, for example ModelSim, that
can compile and simulate use of 'event as in your examples, the synthesis
tool will typically not be able to map this to any available target hardware
element, since there is probably no such hardware elements as an 'event
detector.
But an 'event actually indicates a change in signal value, so you can maybe
write the signal value change detector explicitly in VHDL like:
change <= '1' if (vec_now /= vec_previous) else '0';
Depending on the low-power hardware elements available, you may start the clock
when an '1' is detected asynchronously on change, maybe through
ToggleSTDBY, and then process the change. The last thing before going into
sleep mode is then to capture the current vec value in vec_previous, so
another change can be detected while in sleep mode.
The possibility for doing low-power design of the kind I assume you are doing
based on the description, depends entirely on the features provided in the
target FPGA/ASIC technology. So before trying to get the VHDL syntax right,
you may want to determine how the resulting hardware should look like, based on
the available low power features.
Even if it is possible to write a VHDL code that models your intended behavior, I believe it won't work as you expect. I suggest that before writing the code you try to sort out the details of how exactly your ToggleSTDBY would be set, tested, and reset (a circuit diagram could help).
If you decide to implement ToggleSTDBY as a vector, one solution for the "event is only supported for single bit signals" problem is to move the loop to outside the process, using a for-generate:
gen: for i in ToggleSTDBY'range generate
p_wakeup : process (vec, rst_n) is
begin
if rst_n = '0' then
ToggleSTDBY(i) <= '0';
else
if vec(i)'event then
ToggleSTDBY(i) <= not (ToggleSTDBY(i));
end if;
end if;
end process p_wakeup;
end generate;

In VHDL when is the right time to use a Process statement?

I'm going through the phases of learning VHDL for the second or third time now. (this time armed with a very good and free e-book ) and I'm finally starting to "get" quite a bit of it. Now I'm learning about behavioral styles and the process statement and most of it makes sense. However, I've read in many places that processes are to be avoided except for in certain cases. I mean, in theory can't everything be implemented in data-flow instead of behavioral?
When exactly should it be obvious that a process statement should be used?
The process statement is extremely useful, in what situations have you been told not to use them?
There are many different cases where you would use a process statement, I'll outline a few of these below:
One of the most common usages of the process statement (for synthesis) is to describe logic which is synchronous to a clock signal, for example a simple counter that increments every clock cycle when not in reset could be described as:
DATA_REGISTER : process(CLOCK)
begin
if rising_edge(CLOCK) then
if RESET = '1' then
COUNTER <= (others => '0');
else
COUNTER <= COUNTER + 1; --COUNTER is assumed to be of type 'unsigned'
end if;
end if;
end process;
As your designs grow more complex you will inevitably implement a state machine at some point, this will employ one or more processes depending on the style of state machine you choose to implement.
For behavorial code you can use processes in conjunction with wait statements to generate test vectors or to model the behaviour of a real system. Here's a really basic example of a 100MHz clock generator taken from one of my testbenches:
architecture BEH of ethernet_receive_tb is
signal s_clock : std_logic := '0'; --Initial assignment to clock kicks off the process.
begin
CLOCKGEN : process(s_clock)
begin
s_clock <= not s_clock after 5 NS;
end process CLOCKGEN;
...
You can also describe asynchronous logic with processes, in this case you need to include all signals which are read in the process in the sensitivity list and you need to make sure that any outputs are always defined to avoid inferred latches.
IF_ELSE: process (SEL, A, B)
begin
F <= B; -- Default assignment
if SEL = '1' then
F <= A;
end if;
end process;
Hopefully you can see that the process statement is very useful and that you will use it in many different situations. I hope this answered your question!
Process blocks are your friend.
They provide a way of saying "This block of code is related. It's inputs are X,Y,Z and it drives A,B,C". The inputs are documented by the sensitivity list (unless it's a clocked process in which case it should be in your comments). If anything else drives the same signals then you'll get warnings, errors, X's in simulation (depending on your tools). Whatever you get it's pretty obvious.
Personally I would be quite happy writing multiple processes in a single entity, but everyone has their styles. For example, if I have multiple pipe-line stages, each stage is a process. If I have parallel non-interfering paths each will be in a separate process. By doing it this way the code is structured in small, easy to read blocks. Small simple logic synthesizes into small fast blocks (in general).
You could view my style as using them as lightweight entities.
In synthesisable code, processes are required any time you need to keep information from one clock cycle to another. "To store state" in the jargon.
(Note that a process can implied by code such as
d <= q when rising_edge(clk);
)
If non-synthesisable code, processes are useful for getting events to happen in a particular order:
p1: process
begin
data <= "--------";
WE <= '0';
wait until reset = '1';
wait until processor_initialised = '1';
assert ACK = '0' report "ACK should be low!" severity error;
data <= X"16";
WE <= '1';
wait until ACK = '1';
end process;
Most of my code has a single process per entity. Each entity does some useful, well-defined and small-enough-to-be-testable task

Resources