What's wrong with this simple VHDL for loop? - vhdl

For some reason the OutputTmp variable will always be uninitialized in the simulation. I can make it work without a for loop but I really want to automate it so I can later move on to bigger vectors. The intermediate variable works fine.
Note: I'm a DBA and C# programmer, really new to VHDL, sorry if this is a stupid question.
architecture Arch of VectorMultiplier4 is
signal Intermediate : std_logic_vector(0 to 4);
signal OutputTmp : std_logic;
begin
process (Intermediate)
begin
for i in 0 to 4 loop
Intermediate(i) <= (VectorA(i) AND VectorB_Reduced(i));
end loop;
--THIS IS WHAT DOES NOT WORK APPARENTLY
OutputTmp <= '0';
for i in 0 to 4 loop
OutputTmp <= OutputTmp XOR Intermediate(i);
end loop;
Output <= OutputTmp;
end process;
end architecture;
Thanks!

This is slightly different from the answer fru1tbat points to.
A signal assignment schedules an update for the current or a future simulation time. No signal update actually takes effect while any process is still pending in the current simulation cycle (during elaboration, every statement that involves signals is reduced to processes, possibly inside block statements that preserve the hierarchy).
You can't rely on the signal value you have just assigned (scheduled for update) during the same simulation cycle.
The new signal value isn't available in the current simulation cycle.
A signal assignment without a delay in the waveform (no after Time) takes effect in the next simulation cycle, which is a delta cycle. Until then you can only 'see' the current value of the signal.
Because OutputTmp appears to be named as an intermediary value you could declare it as a variable in the process (deleting the signal declaration, or renaming one or the other).
process (VectorA, VectorB_Reduced)
variable OutputTmpvar: std_logic;
variable Intermediate: std_logic_vector (0 to 4);
begin
for i in 0 to 4 loop
Intermediate(i) := (VectorA(i) AND VectorB_Reduced(i));
end loop;
-- A variable assignment takes effect immediately
OutputTmpvar := '0';
for i in 0 to 4 loop
OutputTmpvar := OutputTmpvar XOR Intermediate(i);
end loop;
Output <= OutputTmpvar; -- Output is a signal, so it needs a signal assignment
end process;
This produces the parity of the elements of Intermediate: the result is '1' when an odd number of them are '1'.
Note that Intermediate has also been made a variable for the same reason and VectorA and VectorB_Reduced have been placed in the sensitivity list instead of Intermediate.
And all of this can be further reduced.
process (VectorA, VectorB_Reduced)
variable OutputTmpvar: std_logic;
begin
-- A variable assignment takes effect immediately
OutputTmpvar := '0';
for i in 0 to 4 loop
OutputTmpvar := OutputTmpvar XOR (VectorA(i) AND VectorB_Reduced(i));
end loop;
Output <= OutputTmpvar;
end process;
Deleting Intermediate.
Tailoring for synthesis and size extensibility
And if you need to synthesize the loop:
process (VectorA, VectorB_Reduced)
variable OutputTmp: std_logic_vector (VectorA'RANGE) := (others => '0');
begin
for i in VectorA'RANGE loop
if i = VectorA'LEFT then
OutputTmp(i) := (VectorA(i) AND VectorB_Reduced(i));
else
OutputTmp(i) := OutputTmp(i-1) XOR (VectorA(i) AND VectorB_Reduced(i));
end if;
end loop;
Output <= OutputTmp(VectorA'RIGHT);
end process;
This assumes that VectorA and VectorB_Reduced have the same dimensionality (bounds).
What this does is provide every node of the synthesis result 'netlist' with a unique name, and it will generate a chain of four XOR gates fed by five AND gates.
This process also shows how to deal with input arrays of any size with matching bounds (VectorA and VectorB_Reduced in the example) by using attributes. If you need to deal with the case of the two inputs having different bounds but the same length, you can create variable copies of them with the same bounds, something you'd likely do as a matter of form if this were implemented in a function.
Flattening the chain of XORs is something handled in the synthesis domain using performance constraints. (For a lot of FPGA architectures the XORs will fit in one LUT because XOR is commutative and associative.)
(The above process has been analyzed, elaborated and simulated in a VHDL model).
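The remark about making same-bounds copies inside a function can be sketched as follows. This is only an illustration under stated assumptions: the package name parity_sketch_pkg and the function name and_xor_reduce are hypothetical and do not appear in the original design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package parity_sketch_pkg is
  -- Hypothetical helper: XOR-reduce the bitwise AND of two equal-length vectors.
  function and_xor_reduce (a, b : std_logic_vector) return std_logic;
end package parity_sketch_pkg;

package body parity_sketch_pkg is
  function and_xor_reduce (a, b : std_logic_vector) return std_logic is
    -- Normalized copies: both run 0 to length-1, whatever bounds the caller used.
    variable a_n : std_logic_vector(0 to a'length - 1) := a;
    variable b_n : std_logic_vector(0 to b'length - 1) := b;
    variable acc : std_logic := '0';
  begin
    assert a'length = b'length
      report "and_xor_reduce: operands differ in length" severity failure;
    for i in a_n'range loop
      -- Variable assignment: acc is updated immediately on every iteration.
      acc := acc xor (a_n(i) and b_n(i));
    end loop;
    return acc;
  end function and_xor_reduce;
end package body parity_sketch_pkg;
```

With the package made visible, the whole process could then collapse to a single concurrent assignment such as Output <= and_xor_reduce(VectorA, VectorB_Reduced);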

When you enter a VHDL process, signals keep their value until the process is done (or a wait is reached). So, all the lines that assign OutputTmp can be replaced by
OutputTmp <= OutputTmp XOR Intermediate(4);
This clearly keeps OutputTmp unknown if it is unknown when you enter the process.
In software, statements are executed one after the other. In an HDL, all statements are executed at the same time. You can use variables in VHDL to get the same behavior as in C, but I would not recommend it for a beginner who wants to learn VHDL for synthesis.

Related

Assign multiple values to a signal during 1 process

If you assign a value to a signal in a process, does it only become the correct value of the signal at the end of the process?
So there would be no point in assigning a value to a signal more than once per process, because the last assignment would be the only one that would be implemented, correct?
I'm a bit desperate because I'm trying to implement the booth algorithm in VHDL with signals and I can't get it baked. It wasn't a problem with variables, but signals make it all more difficult.
I tried a for loop, but that doesn't work because I have to update the values within the loop.
My next idea is a counter in the testbench.
I would be very thankful for an idea!
My current code looks like this:
architecture behave of booth is
signal buffer_result1, buffer_result2, buffer_result3: std_logic_vector(7 downto 0) := "0000"&b;
signal s: std_logic:= '0';
signal count1, count2: integer:=0;
begin
assignment: process(counter) is
begin
if counter = "000" then
buffer_result1 <= "0000"&b;
end if;
end process;
add_sub: process(counter) is
begin
if counter <= "011" then
if(buffer_result1(0) = '1' and s = '0') then
buffer_result2 <= buffer_result1(7 downto 4)-a;
else if (buffer_result1(0) = '0' and s = '1') then
buffer_result2 <= buffer_result1(7 downto 4)+a;
end if;
end if;
end process;
shift:process(counter) is
begin
if counter <= "011"
buffer_result3(7) <= buffer_result2(7);
buffer_result3(6 downto 0) <= buffer_result2(7 downto 1);
s<= buffer_result3(0);
else
result<=buffer_result3;
end if;
end behave;
Short answer: that's correct. A signal's value will not update until the end of your process.
Long answer: A signal will only update when its assignment takes effect. Some signal assignments will use after and specify a time, making the transaction time explicit. Without an explicit time given, signals will update after the default "time-delta," an "instant" of simulation time that passes as soon as all concurrently executing statements at the given sim time have completed (e.g. a process). So your signals will hold their initial values until the process completes, at which point sim time moves forward one "delta," and the values update.
That does not mean that multiple signal assignment statements to the same signal don't accomplish anything in a process. VHDL will take note of all assignments, but of a series of assignments given with the same transaction time, only the last assignment will take effect. This can be used for a few tricky things, although I've encountered differences of opinion on how often they should be tried. For instance:
-- Assume I have a 'clk' coming in
signal pulse : std_ulogic;
signal counter : unsigned(2 downto 0);
pulse_on_wrap : process(clk) is
begin
clock : if rising_edge(clk) then
pulse <= '0'; -- Default assignment to "pulse" is 0
counter <= counter + 1; -- Counter will increment each clock cycle
if counter = 2**3-1 then
pulse <= '1'; -- Pulse high when the counter drops to 0 (after this cycle)
end if;
end if clock;
end process pulse_on_wrap;
Here, the typical behavior is to assign the value '0' to pulse on each clock cycle. But if counter hits its max value, there will be a following assignment to pulse, which will set it to '1' once simulation time advances. Because it comes after the '0' assignment and also has a "delta" transaction delay, it will override the earlier assignment. So this process will cause the signal pulse, fittingly, to go high for one cycle each time the counter wraps to zero and then drop the next - it's a pulse, after all! :^)
I provide that example only to illustrate the potential benefit of multiple assignments within a process, as you mention that in your question. I do not advise trying anything fancy with assignments until you're clear on the difference between variable assignment and signal assignment - and how that needs to be reflected in your code!
Try to think of things in terms of simulation time and hardware when it comes to signals. Everything is static until time moves forward, then you can handle the new values. It's a learning curve, but it'll happen! ;^)
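The delta-cycle behaviour described in this answer can be observed with a very small simulation-only test. This is a minimal sketch; the entity name delta_demo and the signal s are made up for the illustration and are not part of the question's design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delta_demo is
end entity delta_demo;

architecture sim of delta_demo is
  signal s : std_logic := '0';
begin
  stimulus : process is
  begin
    s <= '1';  -- schedules an update; s still reads '0' in this cycle
    report "before delta: s = " & std_logic'image(s);
    wait for 0 ns;  -- suspend the process for one delta cycle
    report "after delta:  s = " & std_logic'image(s);
    wait;  -- suspend forever so the simulation can end
  end process stimulus;
end architecture sim;
```

The first report shows the old value '0' and the second shows '1', although no simulation time has passed between them.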

Warning about missing signal in VHDL process sensitivity list

I'm currently designing a simple multiple input SPI master in Quartus. Given it is a serial protocol, I have a serial clock and a signal that stores the current bit index.
One of the processes I have written looks like this:
store_bits : process(bit_clk) is
begin
if rising_edge(bit_clk) and bit_index >= LEADING_BITS and bit_index < LEADING_BITS+DATA_BITS then
data_valid <= '0';
for input in 0 to INPUTS-1 loop
data(input)(bit_index - LEADING_BITS) <= spi_miso(input);
end loop;
if bit_index = LEADING_BITS+DATA_BITS-1 then
data_valid <= '1';
end if;
end if;
end process store_bits;
Now, bit_index is incremented in a separate process on the falling edge of bit_clk. The process above shouldn't be sensitive to bit_index transitions, so I've left it out of the sensitivity list.
Unfortunately, Quartus II throws a warning during analysis:
10492 VHDL Process Statement warning at multi_spi.vhd(68): signal "bit_index" is read inside the Process Statement but isn't in the Process Statement's sensitivity list
Is it correct? Should I add it to the sensitivity list even though the actual process will only do anything when there is also a bit_clk rising edge?
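One commonly used way to write such a process is to keep the clock-edge test alone in the outer if and move the bit_index conditions into a nested if, so that the process matches the usual clocked template with only bit_clk in the sensitivity list. The sketch below rests on several assumptions: the entity name, the generic defaults, the bit_index range and the data_array type are invented to make the example self-contained, and whether this form silences warning 10492 in a particular Quartus version has not been verified here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity store_bits_sketch is
  generic (
    INPUTS       : positive := 2;  -- hypothetical default values
    LEADING_BITS : natural  := 1;
    DATA_BITS    : positive := 8
  );
  port (
    bit_clk    : in  std_logic;
    bit_index  : in  natural range 0 to 31;  -- incremented elsewhere in the real design
    spi_miso   : in  std_logic_vector(INPUTS - 1 downto 0);
    data_valid : out std_logic
  );
end entity store_bits_sketch;

architecture rtl of store_bits_sketch is
  type data_array is array (0 to INPUTS - 1) of std_logic_vector(DATA_BITS - 1 downto 0);
  signal data : data_array;
begin
  store_bits : process (bit_clk) is
  begin
    -- Only the clock edge is tested in the outer if.
    if rising_edge(bit_clk) then
      -- bit_index is sampled on the edge, like any other synchronous input.
      if bit_index >= LEADING_BITS and bit_index < LEADING_BITS + DATA_BITS then
        data_valid <= '0';
        for input in 0 to INPUTS - 1 loop
          data(input)(bit_index - LEADING_BITS) <= spi_miso(input);
        end loop;
        if bit_index = LEADING_BITS + DATA_BITS - 1 then
          data_valid <= '1';  -- last assignment wins on the final data bit
        end if;
      end if;
    end if;
  end process store_bits;
end architecture rtl;
```

In simulation this behaves the same as the combined condition, because the inner if is only evaluated when a rising edge of bit_clk has occurred.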

Why use concurrent statements in VHDL?

I am just starting to learn VHDL.
Consider the code here : - http://esd.cs.ucr.edu/labs/tutorial/jkff.vhd
I can't understand what concurrent statements are and why they are needed here.
Will it be correct if we modify Q and Qbar directly in process p without using internal signal 'state'? Also why are J,K not in sensitivity list of process p in the snippet?
Concurrent statements, as you may know, in a pure functional sense (i.e. not considering hardware implementation) do not incur any delay. So when you write
Q <= state;
Functionally, Q exactly follows state without any delay.
I am going to guess that the reason an intermediate signal state was used, instead of directly assigning Q inside the process, is that if you directly assign one of your outputs Q in the process, then you cannot "read" the output to derive your Qbar signal.
That is, you couldn't do this:
Qbar <= not Q;
This is because, before VHDL-2008, a port of mode out could not be read inside the architecture. By using "state" you have an internal signal from which you can derive both Q and Qbar.
An alternative, equivalent implementation to this would be to assign both outputs Q and Qbar in each of the cases in the state machine, and eliminate the intermediate state signal completely. However, this seems a bit more complicated since you will have nearly twice as many lines of code for an equivalent functionality.
To answer your second question: J,K are not in the sensitivity list because the process p is a synchronous process. You are describing a memory element (JK FlipFlop), which by definition only updates its outputs when clock or reset change. Input signals J and K can change and the process will not update its outputs. Every time there is a clock edge, or reset is asserted, the process "wakes up", evaluates its inputs, and determines what the output should be. Even if J,K were included in the sensitivity list, provided your outputs were only updated on rising_edge(clock), the overall function would be the same (although your code would be confusing).
There is no reason not to have the Q and Qbar assignments inside the process. You need to be slightly careful though.
Whenever a signal is assigned to, the value does not update until the simulator moves on to the next "delta-cycle". This means that within processes, when you assign to a signal, you are actually only scheduling an update, and if you read the signal you will get the "old" value. In order to have the sort of sequential updates you might expect, you use a variable. So you could model the JKFF like this:
architecture behv of JK_FF is
begin
p : process(clock, reset) is
variable state : std_logic;
variable input : std_logic_vector(1 downto 0);
begin
if (reset = '1') then
state := '0';
elsif (rising_edge(clock)) then
input := J & K;
case (input) is
when "11" =>
state := not state;
when "10" =>
state := '1';
when "01" =>
state := '0';
when others =>
null;
end case;
end if;
Q <= state;
Qbar <= not state;
end process;
end behv;
A synthesis note: the assignments to Q and Qbar occur outside of the if rising_edge(clock), so they will be interpreted just like concurrent drivers.

Why can't I synthesize this VHDL program?

I am new to VHDL, and I am trying to build a Binary to BCD converter. I have searched on the Internet and now I am trying to make my own in order to understand it and VHDL. Here is my program:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
use IEEE.NUMERIC_STD.ALL;
-- Uncomment the following library declaration if instantiating
-- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity Binary_to_BCD is
--generic(n: integer := 2);
Port ( data : in unsigned (7 downto 0);
bcdout : out unsigned (11 downto 0));
end Binary_to_BCD;
architecture Behavioral of Binary_to_BCD is
-- Start the conversion process
begin
convert : process (data) is
variable i : integer := 0;
variable bin : unsigned (7 downto 0) := data;
variable bcd : unsigned (11 downto 0) := to_unsigned(0, 12);
begin
-- Repeat for the number of bits
for i in 0 to 7 loop
bcd := bcd sll 1; -- Shift the BCD one place to the left
bcd(0) := bin(7); -- Shift the new bit into the BCD
bin := bin sll 1; -- Shift the bit just taken out to the left
-- Check each 4-bit group of the BCD; if it exceeds 4, add 3
if(bcd(11 downto 8) > "0101") then
bcd(11 downto 8) := bcd(11 downto 8) + "0011";
end if;
if(bcd(7 downto 4) > "0101") then
bcd(7 downto 4) := bcd(7 downto 4) + "0011";
end if;
if(bcd(3 downto 0) > "0101") then
bcd(3 downto 0) := bcd(3 downto 0) + "0011";
end if;
end loop;
bcdout := bcd;
end process convert;
end Behavioral;
I get this error on line 66 which is bcdout := bcd;:
Signal 'bcdout' bcdout is at left hand side of variable assignment statement.
After reading on the web and books I used unsigned instead of std_logic_vector because I need to rotate bits and arithmetic operations but still it doesn't synthesize.
Tried changing unsigned to integer and := to <= but nothing works. It should be something very stupid but I don't realize. Thank you very much in advance.
The immediate problem is the incorrect use of variable assignment := instead of signal assignment <= for the bcdout signal - exactly as the error message and other answers point out.
However there is an underlying confusion about where you are in a VHDL process, that is not unusual when starting out - as revealed in the comments about functions.
A common response to this confusion is to point out that "VHDL is used for hardware design and not programming", which, while useful in some ways, can lead to artificially primitive and painfully low-level uses of VHDL that really hold it back.
Writing VHDL in a "software way" CAN work - and very well - however it does require a wider perspective on software AND hardware engineering than you can pick up through merely learning C.
The above code is probably synthesisable and will probably work - but it will almost certainly NOT do what you think it does. However a few small changes are in order rather than a completely different approach.
A couple of pointers may help :
the VHDL equivalent of a C function is a VHDL function.
the C equivalent of VHDL procedure is a void function.
(yes, C has procedures : it just calls them void functions to be contrary! :-)
the C equivalent of a VHDL process is ... a process. In other words, an entire C program as long as it doesn't use pthreads or fork/join.
And now you can see that VHDL is designed for parallel computation in a vastly more streamlined way than any dialect of C - processes are just building blocks, and signals are reliable forms of message passing or shared storage between processes.
So, within a process, you can (to a certain extent) think in software terms - but it is a HUGE mistake to think about "calling" a process as if it were a function.
Apologies if you've seen this Q&A before but it will help understand the semantics of a VHDL process, and the use of signals between processes.
Now, as to the specific problems with your code:
1) It is asynchronous, i.e. unclocked. That means, guaranteeing how it responds to glitches on the input is ... difficult ... and knowing when the result is valid is harder than you need. Like uncontrolled use of global variables in C - not best practice!
So move to a clocked process for a safer, more analyzable design. This is also a step towards increasing its speed later. But for now, think of a VHDL clocked process as an event loop or perhaps an interrupt handler in C. It wakes up when told to, executes in (effectively) zero time, and sleep()s until next time.
convert : process (clk) is
variable bin : unsigned (7 downto 0);
...
begin
if rising_edge(clk) then
bin := data;
for i in 0 to 7 loop
...
end loop;
end if;
bcdout <= bcd;
end process convert;
2) the loops will be unrolled and generate a lot of hardware. This may not be a problem : it will deliver a result reasonably quickly (unlike the software equivalent!) There are ways to reduce the hardware use (state machines) or increase its speed (pipelining, link above) but they can wait for now...
3) This is actually the biggest problem with your original : your assignment of data to bin is actually a process variable initialisation not an assignment! It is only executed once, at t=0... And this is the most likely cause of any mis-operation you have seen.
The modified clocked example above assigns the latest data value every time the process is woken : i.e. every clock cycle, and is thus more likely to do what you want.
4) Minor niggle : your declaration of "i" is redundant and actually hidden by a new implicit "i" created by the loop statement. This implicit declaration is both safer and better than an explicit one because it takes its type explicitly from the loop bounds. Imagine what might happen with for(int i; i<= 100000; i++) when int is a 16-bit type...
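Putting points 1 to 4 together, a complete clocked version might look like the sketch below. Several things here are assumptions rather than the original poster's design: the entity name bin2bcd_sketch and the clk port are invented, bcdout is assigned inside the clocked branch so that it is a registered output, and the loop body uses the usual shift-and-add-3 ("double dabble") ordering, which adjusts every digit that is 5 or more before each shift, instead of the ordering in the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bin2bcd_sketch is
  port (
    clk    : in  std_logic;  -- assumed clock, not present in the original entity
    data   : in  unsigned(7 downto 0);
    bcdout : out unsigned(11 downto 0)
  );
end entity bin2bcd_sketch;

architecture rtl of bin2bcd_sketch is
begin
  convert : process (clk) is
    variable bin : unsigned(7 downto 0);
    variable bcd : unsigned(11 downto 0);
  begin
    if rising_edge(clk) then
      bin := data;             -- a real assignment, executed on every clock edge
      bcd := (others => '0');  -- restart the conversion from zero each time
      for i in 0 to 7 loop     -- i is declared implicitly by the loop
        -- Add 3 to every BCD digit that is 5 or more, before shifting.
        if bcd(3 downto 0) > 4 then
          bcd(3 downto 0) := bcd(3 downto 0) + 3;
        end if;
        if bcd(7 downto 4) > 4 then
          bcd(7 downto 4) := bcd(7 downto 4) + 3;
        end if;
        if bcd(11 downto 8) > 4 then
          bcd(11 downto 8) := bcd(11 downto 8) + 3;
        end if;
        -- Shift the next binary bit (MSB first) into the BCD register.
        bcd := bcd(10 downto 0) & bin(7);
        bin := bin(6 downto 0) & '0';
      end loop;
      bcdout <= bcd;  -- signal assignment, so <= rather than :=
    end if;
  end process convert;
end architecture rtl;
```

The loop is still unrolled into combinational logic in front of the output register, as point 2 describes; the clock only determines when data is sampled and when bcdout changes.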
Huh, strange. Have you tried making bcd a signal instead of a variable?
However, I think your main problem here is that you are trying to write VHDL in a "software" way, using a for loop and sequential logic. That is generally not the way you should write hardware descriptions. You should either use combinational logic, which involves concurrent assignment, or sequential logic, which involves doing things on the rising edge of the clock. It seems that what you are trying to implement is a combinational circuit. In that case, you should write separate concurrent assignments for each of your decimal digits. Take a look at http://www.csee.umbc.edu/portal/help/VHDL/concurrent.html for some examples of concurrent signal assignments. You will probably want to use either selected or conditional signal assignment.
bcdout is a signal, and you are using the variable assignment operator := with it
replace line
bcdout := bcd;
with
bcdout <= bcd;
I've not tried to compile to see if there are any other problems, but that should answer your question.

How to represent sequential algorithm in VHDL

I'm coming from software land, and trying to find out how to code a sequential algorithm in VHDL. The textbook says that the statements inside a process are executed sequentially. But I realized that's only true for variables, not for signals. Signals inside a process get updated at the end of the process, and each expression is evaluated using the previous values of its right-hand operands. So to my understanding, it's still concurrent. For performance reasons, I cannot always use variables for complex computation.
But how do I use signals to represent a sequential algorithm? My initial thought is to use an FSM. Is that right? Is an FSM the only way to properly code a sequential algorithm in VHDL?
If I'm right that the signal statements within a process are kind of concurrent, then what's the difference between this and a concurrent signal assignment at the architecture level? Does the process's sequential nature only apply to variable assignment?
As you are trying to execute steps of an algorithm in different cycles, you have realised that the "sequential" constructs within a process do not, by themselves, do this - and in fact, variables do not help. A sequential program - unless it uses an explicit wait for some event, e.g. wait until rising_edge(clk) - will be unrolled and execute in a single clock cycle.
As you have probably discovered using variables, this may be rather a long clock cycle.
There are three main ways of sequentialising execution in VHDL, with different purposes.
Let's try them to implement a linear interpolation between a and b,
a, b, c, x : unsigned(15 downto 0);
x <= ((a * (65536 - c)) + (b * c)) / 65536;
(1) is the classic state machine; the best form being the single process SM.
Here the computation is broken down into several cycles which ensure that at most one multiply is in progress at a time (multipliers are expensive!) but c1 is computed in parallel (addition/subtraction is cheap!). It could safely be re-written with variables instead of signals for the intermediate results.
type state_type is (idle, step_1, step_2, done);
signal state : state_type := idle;
signal start : boolean := false;
signal c1 : unsigned(16 downto 0); -- range includes 65536!
signal p0, p1, s : unsigned(31 downto 0);
process(clk) is
begin
if rising_edge(clk) then
case state is
when idle => if start then
p1 <= b * c;
c1 <= 65536 - c;
state <= step_1;
end if;
when step_1 => P0 <= a * c1;
state <= step_2;
when step_2 => s <= p0 + p1;
state <= done;
when done => x <= s(31 downto 16);
if not start then -- avoid retriggering
state <= idle;
end if;
end case;
end if;
end process;
(2) is the "implicit state machine" linked by Martin Thompson (excellent article!) ... edited to add link as Martin's answer disappeared.
Same remarks apply to it as for the explicit state machine.
process is -- no sensitivity list: a process that contains wait statements cannot have one
begin
wait until rising_edge(clk);
if start then
p1 <= b * c;
c1 <= 65536 - c;
wait until rising_edge(clk);
p0 <= a * c1;
wait until rising_edge(clk);
s <= p0 + p1;
wait until rising_edge(clk);
x <= s(31 downto 16);
while start loop -- avoid retriggering
wait until rising_edge(clk);
end loop;
end if;
end process;
(3) is a pipelined processor. Here, execution takes several cycles, yet everything happens in parallel! The depth of the pipeline (in cycles) allows each logically sequential step to happen in sequential manner. This allows high performance as long chains of computations are broken into cycle-sized steps...
signal start : boolean := false;
signal c1 : unsigned(16 downto 0); -- range includes 65536!
signal pa, pb, pb2, s : unsigned(31 downto 0);
signal a1 : unsigned(15 downto 0);
process(clk) is
begin
if rising_edge(clk) then
-- first cycle
pb <= b * c;
c1 <= 65536 - c;
a1 <= a; -- save copy of a for next cycle
-- second cycle
pa <= a1 * c1; -- NB this is the LAST cycle copy of c1 not the new one!
pb2 <= pb; -- save copy of product b
-- third cycle
s <= pa + pb2;
-- fourth cycle
x <= s(31 downto 16);
end if;
end process;
Here, resources are NOT shared; it will use 2 multipliers since there are 2 multiplies in each clock cycle. It will also use a lot more registers for the intermediate results and copies. However, given new values for a,b,c in every cycle it will spit out a new result every cycle - four cycles delayed from the inputs.
Most multi-cycle algorithms can be implemented either by using an FSM as you suggest, or by using pipelined logic. Pipelined logic is probably the better choice if the algorithm consists of strictly sequential steps (i.e., no loops); an FSM would typically only be used for more complex algorithms that require different control flows depending on the input.
Pipelined logic is effectively a very long chain of combinatorial logic split into multiple "stages" using registers, with data flowing from one stage to the next. The registers are added to reduce the delay of each stage (between two registers), allowing higher clock frequencies at the cost of increased latency. Note however that higher latency does not mean lower throughput, since new data can begin processing before the previous data item has completed! This is generally not possible with an FSM.
The biggest difference between signal assignment within a process as opposed to the architecture is that you may assign a value to a signal in multiple places within the process, with the last assignment "winning". At the architecture level, each concurrent assignment creates its own driver, so a signal normally gets only a single assignment statement. Sequential control flow statements (if, case, loops) are also only available within a process or subprogram; at the architecture level the closest equivalents are conditional (when/else) and selected (with/select) signal assignments.
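The contrast between the two styles can be illustrated with a small sketch. The entity last_wins_sketch and its ports are hypothetical; both outputs are meant to behave identically, one described with a default assignment that is overridden inside a process, the other with a single concurrent conditional assignment.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity last_wins_sketch is
  port (
    sel    : in  std_logic;
    y_proc : out std_logic;
    y_conc : out std_logic
  );
end entity last_wins_sketch;

architecture rtl of last_wins_sketch is
begin
  -- Inside a process: two assignments to the same signal, the last one executed wins.
  in_process : process (sel) is
  begin
    y_proc <= '0';  -- default value
    if sel = '1' then
      y_proc <= '1';  -- overrides the default in the same delta cycle
    end if;
  end process in_process;

  -- At the architecture level: one concurrent assignment expresses the same choice.
  y_conc <= '1' when sel = '1' else '0';
end architecture rtl;
```

Writing two separate concurrent assignments to y_conc instead would create two drivers for the same signal, which is usually not what is wanted.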

Resources