sensitivity list VHDL process - vhdl

I'm trying to learn VHDL using Peter Ashenden's book 'The Designer's Guide to VHDL', but can't seem to shake the feeling that I have missed a fundamental item related to sensitivity lists.
For example, one exercise reads: "Write a model that represents a simple ALU with integer inputs and output, and a function select input of type bit. If the function select is '0', the ALU output should be the sum of the inputs; otherwise the output should be the difference of the inputs."
My solution to this is
entity ALU is
port (
a : in integer; -- A port
b : in integer; -- B port
sel : in bit; -- Fun select
z : out integer); -- result
end entity ALU;
architecture behav of ALU is
begin -- architecture behav
alu_proc: process is
variable result : integer := 0;
begin -- process alu_proc
wait on sel;
if sel = '0' then
result := a + b;
else
result := a - b;
end if;
z <= result;
end process alu_proc;
end architecture behav;
with the test bench
entity alu_test is
end entity alu_test;
architecture alu_tb of alu_test is
signal a, b, z : integer;
signal sel : bit;
begin -- architecture alu_tb
dut: entity work.alu(behav)
port map (a, b, sel, z);
test_proc: process is
begin -- process test_proc
a <= 5; b <= 5; wait for 5 ns; sel <= '1'; wait for 5 ns;
assert z = 0;
a <= 10; b <= 5; wait for 5 ns; sel <= '0'; wait for 5 ns;
assert z = 15;
wait;
end process test_proc;
end architecture alu_tb;
My issue has to do with the sensitivity list in the process. Since it is sensitive only to changes of the select bit, I must alternate the functions in the test bench: first a subtraction, then an addition, then a subtraction again. From the question I get the feeling that you should be able to do several additions in a row, with no subtraction in between. Of course I can add an enable signal and have the process be sensitive to that, but I think the question would then say so. Am I missing something in the language or is my solution "correct"?

The problem with the ALU process is that the wait on sel; does not include
a and b, thus the process does not wake up and the output is not
recalculated at changes to these inputs. One way to fix this is to add a and
b to the wait statement, like:
wait on sel, a, b;
However, the common way to write this for processes is with a sensitivity list,
which is a list of signals after the process keyword, thus not with the
wait statement.
Ashenden's book (3rd edition, page 68) describes the sensitivity list as follows:
The process statement includes a sensitivity list after the keyword process.
This is a list of signals to which the process is sensitive. When any of
these signals changes value, the process resumes and executes the sequential
statements. After it has executed the last statement, the process suspends
again.
The use of a sensitivity list as an equivalent to a wait statement is also described
in Ashenden's book on page 152.
If the process is rewritten to use a sensitivity list, it will be:
alu_proc: process (sel, a, b) is
begin -- process alu_proc
if sel = '0' then
z <= a + b;
else
z <= a - b;
end if;
end process alu_proc;
Note that I removed the result variable, since the z output can just as
well be assigned directly in this case.
The above will recalculate z when any of the values used in the calculation
changes, since all the arguments for calculating z are included in the
sensitivity list. The risk of doing such continuous calculations in this way,
is that if one or more of the arguments are forgotten in the sensitivity list,
a new value for z is not recalculated if the forgotten argument changes.
VHDL-2008 allows automatic inclusion of every signal and port read by the
process in the sensitivity list if all is used, like:
alu_proc: process (all) is
A final comment: for a simple process doing asynchronous calculation, like
the ALU shown, it is possible to do without a process if the generation of
z is written like:
z <= (a + b) when (sel = '0') else (a - b);
Using a concurrent assignment like the above makes it possible to skip the
sensitivity list, and thus avoids the risk of forgetting one of the signals or
ports that are part of the calculation.
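To check the behaviour the question was after (several additions in a row, with no subtraction in between), a self-checking test bench along the following lines could be used. This is only a sketch, not the book's solution: the names alu_sens and alu_sens_tb and the stimulus values are made up for illustration, and the test bench signals are initialised to 0 so that the first evaluation does not add two integer'low values.

```vhdl
-- ALU with a, b and sel all in the sensitivity list (hypothetical name alu_sens)
entity alu_sens is
  port (
    a   : in  integer;
    b   : in  integer;
    sel : in  bit;
    z   : out integer);
end entity alu_sens;

architecture behav of alu_sens is
begin
  alu_proc : process (sel, a, b) is
  begin
    if sel = '0' then
      z <= a + b;
    else
      z <= a - b;
    end if;
  end process alu_proc;
end architecture behav;

-- Test bench: two additions back to back, then one subtraction
entity alu_sens_tb is
end entity alu_sens_tb;

architecture sim of alu_sens_tb is
  signal a, b, z : integer := 0;  -- initialised to avoid integer'low + integer'low
  signal sel     : bit;           -- defaults to '0' (addition)
begin
  dut : entity work.alu_sens(behav)
    port map (a => a, b => b, sel => sel, z => z);

  test_proc : process is
  begin
    a <= 5;  b <= 5;  wait for 5 ns;
    assert z = 10 report "first addition failed" severity error;
    a <= 10; b <= 5;  wait for 5 ns;   -- sel unchanged, process still resumes
    assert z = 15 report "second addition failed" severity error;
    sel <= '1';       wait for 5 ns;
    assert z = 5 report "subtraction failed" severity error;
    wait;
  end process test_proc;
end architecture sim;
```

With the original wait on sel; version, the second assertion would be expected to fail, because nothing wakes the process when only a and b change.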

Related

Multiplier via Repeated Addition

I need to create a 4 bit multiplier as a part of a 4-bit ALU in VHDL code, however the requirement is that we have to use repeated addition, meaning if A is one of the four bit number and B is the other 4 bit number, we would have to add A + A + A..., B number of times. I understand this requires either a for loop or a while loop while also having a temp variable to store the values, but my code just doesn't seem to be working and I just don't really understand how the functionality of it would work.
PR and T are temporary buffer standard logic vectors and A and B are the two input 4 bit numbers and C and D are the output values, but the loop just doesn't seem to work. I don't understand how to loop it so it keeps adding the A bit B number of times and thus do the multiplication of A * B.
WHEN "010" =>
PR <= "00000000";
T <= "0000";
WHILE(T < B)LOOP
PR <= PR + A;
T <= T + 1;
END LOOP;
C <= PR(3 downto 0);
D <= PR(7 downto 4);
This will never work, because when a line with a signal assignment (<=) like this one:
PR <= PR + A;
is executed, the target of the signal assignment (PR in this case) is not updated immediately; instead an event (a future change) is scheduled. When is this event (change) actioned? When all processes have suspended (reached wait statements or end process statements).
So, your loop:
WHILE(T < B)LOOP
PR <= PR + A;
T <= T + 1;
END LOOP;
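just schedules updates on PR and T, but these updates never get actioned because the process is still executing. Every iteration therefore reads the same old values of PR and T, so the condition T < B never changes and the loop, once entered, never terminates. There is more information here.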
So, what's the solution to your problem? Well, it depends what hardware you are trying to achieve. Are you trying to achieve a block of combinational logic? Or sequential? (where the multiply takes multiple clock cycles)
I advise you to try not to think in terms of "temporary variables", "for loops" and "while loops". These are software constructions that can be useful, but ultimately you are designing a piece of hardware. You need to try to think about what physical pieces of hardware can be connected together to achieve your design, then how you might describe them using VHDL. This is difficult at first.
You should provide more information about what exactly you want to achieve (and on what kind of hardware) to increase the probability of getting a good answer.
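If a purely combinational block is what is wanted, one possible sketch uses a variable (which updates immediately) and a loop with fixed bounds, so that it can be unrolled. The entity name repeat_add_mult, the port names and the choice of unsigned operands are assumptions made for illustration; they are not taken from the question's ALU.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4x4 unsigned multiplier built from repeated addition
entity repeat_add_mult is
  port (
    a : in  unsigned(3 downto 0);
    b : in  unsigned(3 downto 0);
    p : out unsigned(7 downto 0));
end entity repeat_add_mult;

architecture comb of repeat_add_mult is
begin
  mult_proc : process (a, b) is
    variable acc : unsigned(7 downto 0);  -- variable: updated immediately
  begin
    acc := (others => '0');
    -- Fixed bounds (b is at most 15); the condition decides
    -- whether each of the 15 possible additions takes part.
    for i in 1 to 15 loop
      if i <= to_integer(b) then
        acc := acc + a;
      end if;
    end loop;
    p <= acc;  -- single signal assignment at the end of the process
  end process mult_proc;
end architecture comb;
```

This describes a chain of up to 15 adders, which is wasteful as hardware but shows the difference between variable and signal assignment inside a loop.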
You don't mention whether your multiplier needs to operate on signed or unsigned inputs. Let's assume signed, because that's a bit harder.
As has been noted, this whole exercise makes little sense if implemented combinationally, so let's assume you want a clocked (sequential) implementation.
You also don't mention how often you expect new inputs to arrive. This makes a big difference in the implementation. I don't think either one is necessarily more difficult to write than the other, but if you expect frequent inputs (e.g. every clock cycle), then you need a pipelined implementation (which uses more hardware). If you expect infrequent inputs (e.g. every 16 or more clock cycles) then a cheaper serial implementation should be used.
Let's assume you want a serial implementation, then I would start somewhere along these lines:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity loopy_mult is
generic(
g_a_bits : positive := 4;
g_b_bits : positive := 4
);
port(
clk : in std_logic;
srst : in std_logic;
-- Input
in_valid : in std_logic;
in_a : in signed(g_a_bits-1 downto 0);
in_b : in signed(g_b_bits-1 downto 0);
-- Output
out_valid : out std_logic;
out_ab : out signed(g_a_bits+g_b_bits-1 downto 0)
);
end loopy_mult;
architecture rtl of loopy_mult is
signal a : signed(g_a_bits-1 downto 0);
signal b_sign : std_logic;
signal countdown : unsigned(g_b_bits-1 downto 0);
signal sum : signed(g_a_bits+g_b_bits-1 downto 0);
begin
mult_proc : process(clk)
begin
if rising_edge(clk) then
if srst = '1' then
out_valid <= '0';
countdown <= (others => '0');
else
if in_valid = '1' then -- (Initialize)
-- Record the value of A and sign of B for later
a <= in_a;
b_sign <= in_b(g_b_bits-1);
-- Initialize countdown
if in_b(g_b_bits-1) = '0' then
-- Input B is positive
countdown <= unsigned(in_b);
else
-- Input B is negative
countdown <= unsigned(-in_b);
end if;
-- Initialize sum
sum <= (others => '0');
-- Set the output valid flag if we're already finished (B=0)
if in_b = 0 then
out_valid <= '1';
else
out_valid <= '0';
end if;
elsif countdown > 0 then -- (Loop)
-- Let's assume the target is an FPGA with efficient add/sub
if b_sign = '0' then
sum <= sum + a;
else
sum <= sum - a;
end if;
-- Set the output valid flag when we get to the last loop
if countdown = 1 then
out_valid <= '1';
else
out_valid <= '0';
end if;
-- Decrement countdown
countdown <= countdown - 1;
else
-- (Idle)
out_valid <= '0';
end if;
end if;
end if;
end process mult_proc;
-- Output
out_ab <= sum;
end rtl;
This is not immensely efficient, but is intended to be relatively easy to read and understand. There are many, many improvements you could make depending on your requirements.
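A small test bench can show how the handshake is meant to be used. The following is a sketch under assumptions: the name loopy_mult_tb, the 10 ns clock period and the operands 3 and -2 are invented for illustration, and loopy_mult is assumed to be compiled into library work.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity loopy_mult_tb is
end entity loopy_mult_tb;

architecture sim of loopy_mult_tb is
  signal clk       : std_logic := '0';
  signal srst      : std_logic := '1';
  signal in_valid  : std_logic := '0';
  signal in_a      : signed(3 downto 0) := (others => '0');
  signal in_b      : signed(3 downto 0) := (others => '0');
  signal out_valid : std_logic;
  signal out_ab    : signed(7 downto 0);
  signal done      : boolean := false;
begin
  -- Free-running clock that stops once the test has finished
  clk <= not clk after 5 ns when not done else '0';

  dut : entity work.loopy_mult
    generic map (g_a_bits => 4, g_b_bits => 4)
    port map (
      clk => clk, srst => srst,
      in_valid => in_valid, in_a => in_a, in_b => in_b,
      out_valid => out_valid, out_ab => out_ab);

  stim : process is
  begin
    -- Hold reset for two clock edges
    wait until rising_edge(clk);
    wait until rising_edge(clk);
    srst <= '0';
    -- Present the operands for exactly one clock cycle
    in_a     <= to_signed(3, 4);
    in_b     <= to_signed(-2, 4);
    in_valid <= '1';
    wait until rising_edge(clk);
    in_valid <= '0';
    -- Wait for the clock edge at which out_valid is seen high
    wait until rising_edge(clk) and out_valid = '1';
    assert out_ab = to_signed(-6, 8)
      report "unexpected product" severity error;
    done <= true;
    wait;
  end process stim;
end architecture sim;
```

Since in_b is -2, the design subtracts in_a twice, so the result appears two loop cycles after the initialisation cycle.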

For the following VHDL Code

For the following VHDL code, assume that D changes to '1' at time 5 ns. Give the values of A, B, C, D, E, and F each time a change occurs. That is, give the values at time 5 ns, 5 + delta, 5 + 2(delta), etc. Carry this out until either 20 steps have occurred, until no further change occurs, or until a repetitive pattern emerges.
entity prob4 is
port (D: inout bit);
end prob4;
architecture q1 of prob4 is
signal A,B,C,E,F: bit;
begin
C <= A;
A <= (B and not E) or D;
P1: proecess (A)
begin
B <= A;
end prcoess P1;
P2: process
wait until A <= '1';
wait for 0 ns;
E <= B after 5 ns;
D < = '0';
F <= E;
end process P2;
end architecture q1;
The problem as presented
There are some obvious syntax errors:
entity prob4 is
port (D: inout bit);
end prob4;
architecture q1 of prob4 is
signal A,B,C,E,F: bit;
begin
C <= A;
A <= (B and not E) or D;
P1: proecess (A) -- misspelled process
begin
B <= A;
end prcoess P1; -- misspelled process
P2: process
-- begin -- missing begin
wait until A <= '1';
wait for 0 ns;
E <= B after 5 ns;
D < = '0'; -- should be "<=" (a single delimiter token signifying assignment)
F <= E;
end process P2;
end architecture q1;
Answering this problem appears to require you've paid attention to lectures, taken notes and/or done the required reading.
Note that there are two drivers for D: one in process P2, which assigns D, and the other from the mode inout port D. There is no resolution function for type BIT, implying that some event not shown by the VHDL design specification is responsible for D being assigned a value of '1' at 5 ns, and that the actual for port D isn't driven. A resolution function can be associated with any subtype declaration (and a port declaration declares a subtype).
That isn't the case here:
entity tb_prob4 is
end entity;
architecture foo of tb_prob4 is
signal D: bit;
begin
DUT:
entity work.prob4 port map (D);
STIMULUS:
process
begin
D <= '1' after 5 ns;
wait;
end process;
end architecture;
ghdl -a prob4.vhdl
ghdl -e tb_prob4
ghdl -r tb_prob4
./tb_prob4:error: several sources for unresolved signal
for signal: .tb_prob4(foo).ghdl: compilation error
You could theoretically answer the 'problem' supplied in your 'question' by stepping the simulation of an analyzed version of prob4 with the appropriate VHDL tool. This would require forcing D to '1' at 5 ns and releasing it in the next delta cycle (after a step) in the device under test. Otherwise it's a nonsense problem, the VHDL design specification is invalid (see above). You could surmise the bit about "assume that D changes to '1'" was added to stave off questions about the validity.
Solving the problem
The problem can also be done by hand (paper and pencil).
"... assume that D changes to '1' at time 5 ns" sounds like it implies a single event (no lasting force).
In VHDL an uninitialized signal takes the leftmost value of its type, which for the enumerated type bit is '0' (see package std.standard). This tells you what everything is up to 5 ns. (What's the initial value of A, B, C, D, E, F?)
Delta cycles arise from assignments scheduled for the current simulation time. Simulation time does not advance while further signal assignments are scheduled for the current simulation time; only once none remain is simulation time advanced to the next time at which a scheduled transaction is present in a projected output waveform for the modeled design hierarchy.
An after clause schedules a transaction in the projected output waveform for its target. In the example, that is the current value of B, which is scheduled to be assigned to E 5 ns later.
Simulation time advances to the next time with an event (wait, after). wait for 0 ns refers to the current simulation time; will that cause a delta cycle?
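The last question can be checked with a few lines of simulation-only code. This is a sketch with invented names (delta_demo, s), separate from prob4: it shows that a signal assignment is not visible until the process has suspended, and that wait for 0 ns suspends the process for one delta cycle without advancing simulation time.

```vhdl
entity delta_demo is
end entity delta_demo;

architecture sim of delta_demo is
  signal s : bit;  -- starts at '0', the leftmost value of type bit
begin
  demo : process is
  begin
    s <= '1';
    -- The update is only scheduled; s still holds its old value here
    assert s = '0' report "s updated immediately" severity error;
    wait for 0 ns;  -- suspend for one delta cycle
    -- After the delta cycle the new value is visible
    assert s = '1' report "s not updated after one delta cycle" severity error;
    -- Simulation time itself has not moved
    assert now = 0 ns report "simulation time advanced" severity error;
    wait;
  end process demo;
end architecture sim;
```

None of the assertions is expected to fire, which is the behaviour needed when stepping through P2 by hand.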

Why use concurrent statements in VHDL?

I am just starting with learning vhdl.
Consider the code here: http://esd.cs.ucr.edu/labs/tutorial/jkff.vhd
I can't understand what are concurrent statements and why are they needed here?
Will it be correct if we modify Q and Qbar directly in process p without using internal signal 'state'? Also why are J,K not in sensitivity list of process p in the snippet?
Concurrent statements, as you may know, in a pure functional sense (i.e. not considering hardware implementation) do not incur any delay. So when you write
Q <= state;
Functionally, Q exactly follows state without any delay.
I am going to guess that the reason an intermediate signal state was used, instead of directly assigning Q inside the process, is that if you directly assign one of your outputs Q in the process, then you cannot "read" the output to derive your Qbar signal.
That is, you couldn't do this:
Qbar <= not Q;
This is because, before VHDL-2008, reading a port of mode out inside the architecture is not allowed. By using "state" you have an internal signal from which you can derive both Q and Qbar.
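The intermediate-signal pattern can be sketched as follows. This is not the code from the linked tutorial: the entity name jk_ff_sketch is invented, and the port names (clock, reset, J, K, Q, Qbar) are assumed from the discussion here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical JK flip-flop using an internal state signal
entity jk_ff_sketch is
  port (
    clock : in  std_logic;
    reset : in  std_logic;
    J     : in  std_logic;
    K     : in  std_logic;
    Q     : out std_logic;
    Qbar  : out std_logic);
end entity jk_ff_sketch;

architecture behv of jk_ff_sketch is
  signal state : std_logic;  -- internal signal, so it can be read freely
begin
  p : process (clock, reset) is
  begin
    if reset = '1' then
      state <= '0';
    elsif rising_edge(clock) then
      if J = '1' and K = '1' then
        state <= not state;  -- toggle
      elsif J = '1' then
        state <= '1';        -- set
      elsif K = '1' then
        state <= '0';        -- clear
      end if;                -- J = K = '0': hold
    end if;
  end process p;

  -- Concurrent statements: both outputs follow the internal state
  Q    <= state;
  Qbar <= not state;
end architecture behv;
```

Because both outputs are derived from state, neither output port ever has to be read.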
An alternative, equivalent implementation to this would be to assign both outputs Q and Qbar in each of the cases in the state machine, and eliminate the intermediate state signal completely. However, this seems a bit more complicated since you will have nearly twice as many lines of code for an equivalent functionality.
To answer your second question: J,K are not in the sensitivity list because the process p is a synchronous process. You are describing a memory element (JK flip-flop), which by definition only updates its outputs when clock or reset changes. Input signals J and K can change and the process will not update its outputs. Every time there is a clock edge, or reset is asserted, the process "wakes up", evaluates the inputs, and determines what the output should be. Even if J,K were included in the sensitivity list, provided your outputs were only updated on rising_edge(clock), the overall function would be the same (although your code would be confusing).
There is no reason not to have the Q and Qbar assignments inside the process. You need to be slightly careful though.
Whenever a signal is assigned to, the value does not update until the simulator moves on to the next "delta-cycle". This means that within processes, when you assign to a signal, you are actually only scheduling an update, and if you read the signal you will get the "old" value. In order to have the sort of sequential updates you might expect, you use a variable. So you could model the JKFF like this:
architecture behv of JK_FF is
begin
p : process(clock, reset) is
variable state : std_logic;
variable input : std_logic_vector(1 downto 0);
begin
if (reset = '1') then
state := '0';
elsif (rising_edge(clock)) then
input := J & K;
case (input) is
when "11" =>
state := not state;
when "10" =>
state := '1';
when "01" =>
state := '0';
when others =>
null;
end case;
end if;
Q <= state;
Qbar <= not state;
end process;
end behv;
A synthesis note: the assignments to Q and Qbar occur outside of the if rising_edge(clock) branch, so they will be interpreted just like concurrent assignments.

VHDL if condition not working properly

I'm having some difficulties determining why my code is not working properly. I'm trying to create an ALU with a 3-bit op-code.
All conditions but one work properly; the exception is op code 011 (SEQ). It's defined as if(a==b) z<='0' and output<='0'. a and b are the inputs, and z is the zero flag. I expected to get this functionality with the following VHDL code
....
BEGIN
result <= dummy_result;
PROCESS (a, b, op)
VARIABLE carry: STD_LOGIC:='0';
--DEFINE ALIAS TO SEPERATE ALU-OP SIGNAL
alias NEG_TSEL : STD_LOGIC IS op(2);
alias ALU_SELECT : STD_LOGIC_VECTOR(1 downto 0) IS op(1 downto 0);
BEGIN
if ALU_SELECT="11" THEN
if NEG_TSEL='0' THEN -- SEQ
if a = b THEN
dummy_result <="00000";
end if;
elsif NEG_TSEL='1' THEN --SCO
cout <= '1';
result <= "XXXXX";
end if;
elsif ALU_SELECT="00" THEN...
With this code, when op = 011, results is always set to zero.
When I change the code to:
.....
if a = b THEN
dummy_result <="00000";
else
dummy_result <= "10101";
end if;
.....
it works fine, but results must not change so instead of the "10101" vector, I change it to "dummy_result <= dummy_result;" but that gives me the same results as the original case gives me.
Any suggestions? Am I doing something wrong?
Here are my issues with your code:
Personally, I feel that on every evaluation you should be outputting something to result, cout, and zero. Currently, you always output to zero, but you only conditionally output to the other two ports. This likely is creating latches, which is probably not what you want. So, for example, the SEQ operation should also push something to cout, and the SCO operation should push something to the dummy_result signal.
Your subtraction implementation is not working how you might expect.
when "110" => -- SUB
tmp_b <= NOT b;
carry := '1';
for i in 0 to 4 loop
dummy_result(i) <= carry XOR a(i) XOR tmp_b(i);
carry := (a(i) AND tmp_b(i)) OR (tmp_b(i) AND carry) OR (carry AND a(i));
end loop;
cout <= carry;
tmp_b is a signal, so the assignment tmp_b <= NOT b does not take effect until the process suspends; the loop that follows still reads the previous value of tmp_b rather than NOT b. You likely want that to be a variable, just like carry is.
I also wanted to let you know about case statements (versus if chains). Your code could look like this:
PROCESS (a, b, op)
VARIABLE carry: STD_LOGIC:='0';
BEGIN
case op is
when "011" => -- SEQ
dummy_result <= "00000";
when "111" => -- SCO
cout <= '1';
when "000" => -- AND
dummy_result <= a AND b;
...
when others =>
dummy_result <= "00000";
cout <= '0';
end case;
end process;
To get back to the original problem, SEQ, your original code looked like:
when "011" => -- SEQ
if a = b then
dummy_result <= "00000";
end if;
The problem here, as I mentioned above, is that this is likely a latch. You need to output what you expect the value to be when a /= b, and that can't just be dummy_result <= dummy_result. What do you expect that to become if you pushed it to physical wires and chips?
Instead, you should pass into this entity the value of the last dummy_result, or if a particular operation should hold the value of result, you should output "00000" and also output an additional signal saying that whatever is holding the previous value (in a register), shouldn't update it.
The intent here is that unless otherwise assigned, dummy_result is intended to retain the previous value of "result". Unfortunately, this unit has been implemented as a combinational process, without a clock.
Therefore the storage cannot be reliably implemented in this unit.
Therefore it must be implemented outside this unit.
It almost certainly already is; in a register implemented as a clocked process.
So, bring that register's output back in as a new input port "prev_result" and use a default assignment to dummy_result. That will overcome not only the specific failure you have found so far, but all the other missing assignments to "dummy_result" (there is another) preserving the old value for "result" in a synchronous manner.
PROCESS (a, b, op, prev_result)
-- declarations here
BEGIN
-- default assignment
dummy_result <= prev_result;
if ALU_SELECT="11" THEN
if NEG_TSEL='0' THEN -- SEQ
if a = b THEN
dummy_result <="00000";
end if;
...
END PROCESS;
I think you would be better restructuring the design to make the ALU a clocked process, but if you are under instructions not to, then you will have to adopt this (or similar) solution.
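Put together, a latch-free combinational ALU with default assignments might look like the sketch below. Several things here are assumptions for illustration only: the entity name alu_hold_sketch, the 5-bit operand width, the choice to hold the previous result for SCO, and the fact that only three op-codes are filled in.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical cut-down ALU: every output is assigned on every evaluation
entity alu_hold_sketch is
  port (
    a           : in  std_logic_vector(4 downto 0);
    b           : in  std_logic_vector(4 downto 0);
    op          : in  std_logic_vector(2 downto 0);
    prev_result : in  std_logic_vector(4 downto 0);  -- from an external register
    result      : out std_logic_vector(4 downto 0);
    cout        : out std_logic);
end entity alu_hold_sketch;

architecture comb of alu_hold_sketch is
begin
  alu_proc : process (a, b, op, prev_result) is
  begin
    -- Default assignments: no branch can leave an output unassigned
    result <= prev_result;
    cout   <= '0';

    case op is
      when "011" =>            -- SEQ: clear result when the inputs match
        if a = b then
          result <= "00000";
        end if;
      when "111" =>            -- SCO: set carry, result keeps its default
        cout <= '1';
      when "000" =>            -- AND
        result <= a and b;
      when others =>           -- remaining op-codes omitted in this sketch
        null;
    end case;
  end process alu_proc;
end architecture comb;
```

The register that produces prev_result lives outside this unit, in a clocked process, as described above.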

How to represent sequential algorithm in VHDL

I'm coming from software land and trying to find out how to code a sequential algorithm in VHDL. The textbook says that the statements inside a process are executed sequentially, but I realized that's only true for variables, not signals. Signals inside a process get updated at the end of the process, and expressions are evaluated using the right-hand operands' previous values. So to my understanding, it's still concurrent. For performance reasons, I cannot always use variables for complex computation.
But how to use signals to present sequential algorithm? My initial thoughts are using FSM. Is that true? Is FSM the only way to properly code sequential algorithm in VHDL?
If I'm right that the signals statements within a process is kind of concurrent, then what's the difference between this and the signal concurrent assignment in the architecture level? Does the process's sequential nature only apply to variable assignment?
As you are trying to execute steps of an algorithm in different cycles, you have realised that the "sequential" constructs within a process do not, by themselves, do this - and in fact, variables do not help. A sequential program - unless it uses an explicit wait on some event, e.g. wait until rising_edge(clk) - will be unrolled and execute in a single clock cycle.
As you have probably discovered using variables, this may be rather a long clock cycle.
There are three main ways of sequentialising execution in VHDL, with different purposes.
Let's try them to implement a linear interpolation between a and b,
a, b, c, x : unsigned(15 downto 0);
x <= ((a * (65536 - c)) + (b * c)) / 65536;
(1) is the classic state machine; the best form being the single process SM.
Here the computation is broken down into several cycles which ensure that at most one multiply is in progress at a time (multipliers are expensive!) but C1 is computed in parallel (addition/subtraction is cheap!). It could safely be re-written with variables instead of signals for the intermediate results.
type state_type is (idle, step_1, step_2, done);
signal state : state_type := idle;
signal start : boolean := false;
signal c1 : unsigned(16 downto 0); -- range includes 65536!
signal p0, p1, s : unsigned(31 downto 0);
process(clk) is
begin
if rising_edge(clk) then
case state is
when idle => if start then
p1 <= b * c;
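c1 <= to_unsigned(65536, 17) - c; -- 17-bit constant, so the result is 17 bits wide
state <= step_1;
end if;
when step_1 => p0 <= resize(a * c1, 32); -- 33-bit product always fits in 32 bits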
state <= step_2;
when step_2 => s <= p0 + p1;
state <= done;
when done => x <= s(31 downto 16);
if not start then -- avoid retriggering
state <= idle;
end if;
end case;
end if;
end process;
(2) is the "implicit state machine" linked by Martin Thompson (excellent article!).
Same remarks apply to it as for the explicit state machine.
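process is -- no sensitivity list: a process with wait statements cannot have one
begin
wait until rising_edge(clk); -- idle: sample start once per clock
if start then
p1 <= b * c;
c1 <= to_unsigned(65536, 17) - c; -- 17-bit constant, so the result is 17 bits wide
wait until rising_edge(clk);
p0 <= resize(a * c1, 32); -- 33-bit product always fits in 32 bits
wait until rising_edge(clk);
s <= p0 + p1;
wait until rising_edge(clk);
x <= s(31 downto 16);
while start loop -- avoid retriggering
wait until rising_edge(clk);
end loop;
end if;
end process;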
(3) is a pipelined processor. Here, execution takes several cycles, yet everything happens in parallel! The depth of the pipeline (in cycles) allows each logically sequential step to happen in sequential manner. This allows high performance as long chains of computations are broken into cycle-sized steps...
signal start : boolean := false;
signal c1 : unsigned(16 downto 0); -- range includes 65536!
signal pa, pb, pb2, s : unsigned(31 downto 0);
signal a1 : unsigned(15 downto 0);
process(clk) is
begin
if rising_edge(clk) then
-- first cycle
pb <= b * c;
c1 <= to_unsigned(65536, 17) - c; -- 17-bit constant, so the result is 17 bits wide
a1 <= a; -- save copy of a for next cycle
-- second cycle
pa <= resize(a1 * c1, 32); -- NB this is the LAST cycle copy of c1 not the new one!
pb2 <= pb; -- save copy of product b
-- third cycle
s <= pa + pb2;
-- fourth cycle
x <= s(31 downto 16);
end if;
end process;
Here, resources are NOT shared; it will use 2 multipliers, since there are 2 multiplies in each clock cycle. It will also use a lot more registers for the intermediate results and copies. However, given new values for a, b, c in every cycle, it will spit out a new result every cycle, four cycles delayed from the inputs.
Most multi-cycle algorithms can be implemented either by using an FSM as you suggest, or by using pipelined logic. Pipelined logic is probably the better choice if the algorithm consists of strictly sequential steps (i.e., no loops); an FSM would typically only be used for more complex algorithms that require different control flows depending on the input.
Pipelined logic is effectively a very long chain of combinatorial logic split into multiple "stages" using registers, with data flowing from one stage to the next. The registers are added to reduce the delay of each stage (between two registers), allowing higher clock frequencies at the cost of increased latency. Note however that higher latency does not mean lower throughput, since new data can begin processing before the previous data item has completed! This is generally not possible with an FSM.
The biggest difference between signal assignment within a process as opposed to the architecture is that you may assign a value to a signal in multiple places within the process, with the last assignment "winning". At the architecture level, only a single assignment statement to a signal is possible. Many control flow statements (if, case/when, etc.) are also only available within a process, not at the architecture level.
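The "last assignment wins" rule can be seen in a small simulation. This is a sketch with invented names (last_wins_demo, sel, y); it only illustrates the rule and is not part of any design discussed here.

```vhdl
entity last_wins_demo is
end entity last_wins_demo;

architecture sim of last_wins_demo is
  signal sel : bit;
  signal y   : integer := -1;
begin
  -- Two assignments to y in one process: the later one overrides the earlier
  comb : process (sel) is
  begin
    y <= 0;      -- default value
    if sel = '1' then
      y <= 1;    -- executed later, so this is the value y receives
    end if;
  end process comb;

  check : process is
  begin
    wait for 1 ns;
    assert y = 0 report "default assignment not applied" severity error;
    sel <= '1';
    wait for 1 ns;
    assert y = 1 report "last assignment did not win" severity error;
    wait;
  end process check;
end architecture sim;
```

Writing the same two assignments as separate concurrent statements would instead create two drivers for y, which is an error for an unresolved type such as integer.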

Resources