VHDL Loops - Only last increment is done - vhdl

I have a problem with the for loops in the code below: in simulation it looks as if only the last iteration of the loop takes effect. For example:
On the inputs I give (obviously in 8-bit SIGNED for the w0, w1, w2):
x1 = 1; x2 = 1; w0 = -32; w1 = 63; w2 = 63
and on the output I receive u = 31 instead of u = 94.
So it seems the equation is:
u = (x2 * w2) + w0
Instead of:
u = (x1 * w1) + (x2 * w2) + w0
I know that loops in VHDL work differently than in C, but using variables should do the trick. Unfortunately, I'm missing something. What might it be?
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;
ENTITY NeuronBehavioral IS
GENERIC ( n: INTEGER := 1;
m: INTEGER := 2;
b: INTEGER := 8);
PORT ( x1 : in STD_LOGIC;
x2 : in STD_LOGIC;
w0 : in SIGNED (b-1 downto 0); --11100000 (-32)
w1 : in SIGNED (b-1 downto 0); --00111111 (63)
w2 : in SIGNED (b-1 downto 0); --00111111 (63)
u : out STD_LOGIC_VECTOR (b-1 downto 0));
END NeuronBehavioral;
ARCHITECTURE Behavioral OF NeuronBehavioral IS
TYPE weights IS ARRAY (1 TO n*m) OF SIGNED(b-1 DOWNTO 0);
TYPE inputs IS ARRAY (1 TO m) OF SIGNED(b-1 DOWNTO 0);
TYPE outputs IS ARRAY (1 TO n) OF SIGNED(b-1 DOWNTO 0);
BEGIN
PROCESS (w0, w1, w2, x1, x2)
VARIABLE weight: weights;
VARIABLE input: inputs;
VARIABLE output: outputs;
VARIABLE prod, acc: SIGNED(b-1 DOWNTO 0);
BEGIN
input(1) := "0000000" & x1;
input(2) := "0000000" & x2;
weight(1) := w1;
weight(2) := w2;
L1: FOR i IN 1 TO n LOOP
acc := (OTHERS => '0');
L2: FOR j IN 1 TO m LOOP
prod := input(j)*weight(m*(i-1)+j);
acc := acc + prod;
END LOOP L2;
output(i) := acc + w0;
END LOOP L1;
u <= STD_LOGIC_VECTOR(output(1));
END PROCESS;
END Behavioral;
Testbench:
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;
ENTITY NeuronTB IS
END NeuronTB;
ARCHITECTURE behavior OF NeuronTB IS
-- Component Declaration for the Unit Under Test (UUT)
COMPONENT NeuronBehavioral
PORT(
x1 : IN std_logic;
x2 : IN std_logic;
w0 : IN SIGNED(7 downto 0);
w1 : IN SIGNED(7 downto 0);
w2 : IN SIGNED(7 downto 0);
u : OUT std_logic_vector(7 downto 0)
);
END COMPONENT;
--Inputs
signal x1 : std_logic := '0';
signal x2 : std_logic := '0';
signal w0 : SIGNED(7 downto 0) := (others => '0');
signal w1 : SIGNED(7 downto 0) := (others => '0');
signal w2 : SIGNED(7 downto 0) := (others => '0');
--Outputs
signal u : std_logic_vector(7 downto 0);
BEGIN
-- Instantiate the Unit Under Test (UUT)
uut: NeuronBehavioral PORT MAP (
x1 => x1,
x2 => x2,
w0 => w0,
w1 => w1,
w2 => w2,
u => u
);
-- Stimulus process
stim_proc: process
begin
-- hold reset state for 100 ns.
wait for 100 ns;
x1 <= '1';
x2 <= '1';
w0 <= "11100000";
w1 <= "00111111";
w2 <= "00111111";
wait for 100 ns;
x1 <= '1';
x2 <= '0';
w0 <= "11100000";
w1 <= "00111111";
w2 <= "00111111";
wait for 100 ns;
x1 <= '0';
x2 <= '1';
w0 <= "11100000";
w1 <= "00111111";
w2 <= "00111111";
wait for 100 ns;
x1 <= '0';
x2 <= '0';
w0 <= "11100000";
w1 <= "00111111";
w2 <= "00111111";
wait;
end process;
END;

The question did not originally provide a Minimal, Complete and Verifiable example: it lacked the means to replicate the error, much less the expected result. That is not a barrier to finding the actual problem, though.
There's an out-of-bounds error in
prod := input(j) * weight( m * (i - 1) + j);
The right-hand expression, of type signed, has a length equal to the sum of the lengths of the multiplicand (input(j)) and the multiplier (weight(m * (i - 1) + j)). Here that is 2 * b = 16 bits, while prod is declared with only b = 8 bits.
The standard requires a check that the effective value produced by evaluating the right-hand expression of an assignment statement has a matching element for each element of the target (see IEEE Std 1076-2008 14.7.3.4 Signal update; -1993 through -2002 12.6.2 Propagation of signal values).
(When a tool allows this check to be suspended through a command-line flag or configuration, the expectation is that the check has already been done at some point and that eliminating it improves performance.)
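The length rule can be checked in isolation. The following is a minimal self-checking sketch, not part of the original design: the entity name product_length_demo and the constant names are made up for illustration. It relies only on the numeric_std rule that "*" on two signed operands returns a result whose length is the sum of the operand lengths.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity product_length_demo is
end entity product_length_demo;

architecture sim of product_length_demo is
    constant b: natural := 8;
begin
    process
        -- hypothetical operands mirroring input(j) = 1 and weight = 63
        constant x_val: signed(b - 1 downto 0) := to_signed(1, b);
        constant w_val: signed(b - 1 downto 0) := to_signed(63, b);
        -- unconstrained constant: its bounds come from the initial value
        constant product: signed := x_val * w_val;
    begin
        -- two 8-bit operands give a 16-bit product
        assert product'length = 2 * b
            report "unexpected product length" severity failure;
        assert product = to_signed(63, 2 * b)
            report "unexpected product value" severity failure;
        report "product'length = " & integer'image(product'length);
        wait;
    end process;
end architecture sim;
```

Assigning that 16-bit value to an 8-bit target is what trips the bounds check described above.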
As for not needing an MCVe: some simulators allow running a model that has top-level ports. The problem can then be identified by providing default values for all inputs. Depending on the VHDL revision, a report statement with to_string(output(1)) can show the result cited in the question.
port (
x1: in std_logic := '1'; -- default added
x2: in std_logic := '1'; -- default added
w0: in signed (b-1 downto 0) := to_signed(-32,b); --11100000 (-32) -- default added
w1: in signed (b-1 downto 0) := to_signed(63, b); --00111111 (63)
w2: in signed (b-1 downto 0) := to_signed(63, b); --00111111 (63)
u: out std_logic_vector (b-1 downto 0)
);
When run with ghdl the design specification produced a bounds failure in loop L2.
In the unlabeled process changing the declaration of prod:
variable prod: signed(b * 2 - 1 downto 0);
And the assignment to acc:
acc := acc + prod (b - 1 downto 0);
Allowed the calculation to complete, producing
neuronbehavioral.vhdl:58:9:#0ms:(report note): u = 01011110
With an added last statement to the process:
report "u = " & to_string (output(1));
For non VHDL-2008 compliant simulators a to_string function can be added to the declarative region of the process statement:
function to_string (inp: signed) return string is
variable image_str: string (1 to inp'length);
alias input_str: signed (1 to inp'length) is inp;
begin
for i in input_str'range loop
image_str(i) := character'value(std_ulogic'image(input_str(i)));
end loop;
return image_str;
end function;
Note the report value is the 8 bit signed value for 94.
Also, the declarations for prod, acc and u should be examined to ensure the design can produce a result that stays within bounds for the full range of input values of w0, w1 and w2.
Not only is VHDL strongly typed, it's particular about mathematical meaning. It's an error if a result is incorrect, hence the product of the "*" operator has a length sufficient to produce a valid mathematical result. This can be seen in the numeric_std package body.
With the above patches to the design specification the testbench produces:
ghdl -r neurontb
neuronbehavioral.vhdl:58:9:#0ms:(report note): u = 00000000
neuronbehavioral.vhdl:58:9:#100ns:(report note): u = 01011110
neuronbehavioral.vhdl:58:9:#200ns:(report note): u = 00011111
neuronbehavioral.vhdl:58:9:#300ns:(report note): u = 00011111
neuronbehavioral.vhdl:58:9:#400ns:(report note): u = 11100000
Because input(j) can only be "00000000" or "00000001" (based on the inputs x1 and x2), there's an alternative to the above changes:
prod := resize(input(j) * weight( m * (i - 1) + j), b);
The multiplier result can be resized to b bits (for signed, resize keeps the sign bit and the b - 1 least significant bits). Nothing is lost here because the left operand of the multiply is either 0 or 1.
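A small self-checking sketch of that claim follows. The entity and constant names are invented for illustration, and the check only holds under the assumption stated above, namely that one operand is 0 or 1; for larger operands resize would silently truncate.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity resize_demo is
end entity resize_demo;

architecture sim of resize_demo is
begin
    process
        constant b: natural := 8;
        constant zero:  signed(b - 1 downto 0) := to_signed(0, b);
        constant one:   signed(b - 1 downto 0) := to_signed(1, b);
        constant w_pos: signed(b - 1 downto 0) := to_signed(63, b);
        constant w_neg: signed(b - 1 downto 0) := to_signed(-32, b);
    begin
        -- multiplying by 1 and resizing back to b bits returns the weight
        assert resize(one * w_pos, b) = w_pos
            report "positive weight changed by resize" severity failure;
        assert resize(one * w_neg, b) = w_neg
            report "negative weight changed by resize" severity failure;
        -- multiplying by 0 gives 0
        assert resize(zero * w_neg, b) = zero
            report "zero product expected" severity failure;
        wait;
    end process;
end architecture sim;
```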
Because the value of input(j) is either zero or one (as an 8-bit signed value), the first multiply can be eliminated:
architecture foo of neuronbehavioral is
type weights is array (1 to n*m) of signed(b-1 downto 0);
-- type inputs is array (1 to m) of signed(b-1 downto 0); -- CHANGED
type inputs is array (1 to m) of std_logic;
type outputs is array (1 to n) of signed(b-1 downto 0);
begin
process (w0, w1, w2, x1, x2)
variable weight: weights;
variable input: inputs;
variable output: outputs;
-- variable prod: signed(b * 2 - 1 downto 0); -- RESTORED:
variable prod: signed(b - 1 downto 0);
variable acc: signed(b - 1 downto 0);
function to_string (inp: signed) return string is
variable image_str: string (1 to inp'length);
alias input_str: signed (1 to inp'length) is inp;
begin
for i in input_str'range loop
image_str(i) := character'value(std_ulogic'image(input_str(i)));
end loop;
return image_str;
end function;
begin
-- input(1) := "0000000" & x1; -- CHANGED
-- input(2) := "0000000" & x2; -- CHANGED
input := x1 & x2; -- ADDED
weight(1) := w1;
weight(2) := w2;
l1:
for i in 1 to n loop
acc := (others => '0');
l2:
for j in 1 to m loop
if input(j) = '1' then -- ADDED
-- prod := input(j) * weight( m * (i - 1) + j); -- CHANGED
prod := weight(m * (i - 1) + j); -- ADDED
else -- ADDED
prod := (others => '0'); -- ADDED
end if; -- ADDED
-- acc := acc + prod (b - 1 downto 0); -- RESTORED:
acc := acc + prod;
end loop l2;
output(i) := acc + w0;
end loop l1;
u <= std_logic_vector(output(1));
report "u = " & to_string (output(1));
end process;
end architecture foo;
For the second multiplier, the one calculating the index for weight, observe that all the variables are either generic constants or declared implicitly in loop statements. While the latter are dynamically elaborated at execution time in VHDL, their value is considered static during traversal of the sequence of statements in each loop statement.
The sequence of statements in a loop statement is unrolled in synthesis. The equivalent in concurrent statements would be a for generate statement replicating the various statements as concurrent statements. Note this would require signals (shared variables are neither portable nor guaranteed to be supported across vendor tool chains).
A concurrent statement version would look something like:
architecture foo of neuronbehavioral is
type weights is array (1 to n*m) of signed(b - 1 downto 0);
type inputs is array (1 to m) of std_logic;
type outputs is array (1 to n) of signed(b - 1 downto 0);
signal weight: weights;
signal input: inputs;
signal output: outputs;
function to_string (inp: signed) return string is
variable image_str: string (1 to inp'length);
alias input_str: signed (1 to inp'length) is inp;
begin
for i in input_str'range loop
image_str(i) := character'value(std_ulogic'image(input_str(i)));
end loop;
return image_str;
end function;
begin
weight <= w1 & w2;
input <= x1 & x2;
l1:
for i in 1 to n generate
type accums is array (1 to m) of signed (b - 1 downto 0);
signal accum: accums;
function acc (inp: accums) return signed is
variable retval: signed (b - 1 downto 0) := (others => '0');
begin
for i in accums'range loop
retval := retval + inp(i);
end loop;
return retval;
end function;
begin
l2:
for j in 1 to m generate
accum(j) <= weight(m * (i - 1) + j) when input(j) = '1' else
(others => '0');
end generate;
output(i) <= acc(accum) + w0;
end generate;
u <= std_logic_vector(output(1));
MONITOR:
process
begin
wait on x1, x2, w0, w1, w2;
wait for 0 ns;
wait for 0 ns;
wait for 0 ns;
wait for 0 ns;
report "u = " & to_string (output(1));
end process;
end architecture foo;
Here no multiply is used and all the statically indexed elements are accumulated in two places. The wait for 0 ns; statements in the MONITOR process are there to get past the delta delays of zero-delay assignments through successive signals. (Somewhere there's something doing discrete events, for x1 and x2 if for no other purpose.)
This gives the same answer as above:
ghdl -r neurontb
neuronbehavioral.vhdl:169:9:#100ns:(report note): u = 01011110
neuronbehavioral.vhdl:169:9:#200ns:(report note): u = 00011111
neuronbehavioral.vhdl:169:9:#300ns:(report note): u = 00011111
neuronbehavioral.vhdl:169:9:#400ns:(report note): u = 11100000
and represents the same hardware.
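The delta delays that the MONITOR process waits out can be seen in isolation. This is a minimal sketch with made-up names (delta_demo, a, b, c), not taken from the design above: each zero-delay concurrent assignment adds one delta cycle before the value arrives at the next signal in the chain.

```vhdl
entity delta_demo is
end entity delta_demo;

architecture sim of delta_demo is
    signal a, b, c: bit := '0';
begin
    b <= a;  -- one delta cycle behind a
    c <= b;  -- two delta cycles behind a

    process
    begin
        a <= '1';
        wait for 0 ns;  -- first delta: a has been updated, b has not
        assert a = '1' and b = '0'
            report "expected only a to be updated" severity failure;
        wait for 0 ns;  -- second delta: b has been updated, c has not
        assert b = '1' and c = '0'
            report "expected b updated and c still old" severity failure;
        wait for 0 ns;  -- third delta: c has been updated
        assert c = '1'
            report "expected c to be updated" severity failure;
        wait;
    end process;
end architecture sim;
```

The number of wait for 0 ns; statements needed therefore depends on how many signals a value passes through on its way to the one being reported.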

Related

Assign 2d std_logic_vector with another 1d std_logic_vector in VHDL

I have this port
PORT (
A : IN STD_LOGIC_VECTOR(31 downto 0);
B : IN STD_LOGIC_VECTOR(31 downto 0);
C : IN STD_LOGIC_VECTOR(31 downto 0);
F : OUT STD_LOGIC_VECTOR(31 downto 0)
);
and this signal
SIGNAL data : std_logic_2d(31 downto 0, 2 downto 0);
I need to assign A to data(all,0), B to data(all,1), and so on,
like this:
data(?,0) <= A;
data(?,1) <= B;
data(?,2) <= C;
What can I put in place of "?" in the code to do this?
Solution 1 - process and for-loop:
A process is used to host a sequential for loop. You need to add all read signals to the sensitivity list: A, B, C.
process(A, B, C)
begin
for i in A'range loop
data(i, 0) <= A(i);
data(i, 1) <= B(i);
data(i, 2) <= C(i);
end loop;
end process;
Solution 2 - for-generate:
A generate loop is used to create lots of concurrent assignments.
gen: for i in A'range generate
data(i, 0) <= A(i);
data(i, 1) <= B(i);
data(i, 2) <= C(i);
end generate;
Solution 3 - an assignment procedure:
A procedure is used to encapsulate the assignment to a column.
procedure assign_col(signal slm : out T_SLM; slv : std_logic_vector; constant ColIndex : natural) is
variable temp : std_logic_vector(slm'range(1));
begin
temp := slv;
for i in temp'range loop
slm(i, ColIndex) <= temp(i);
end loop;
end procedure;
Source: PoC.vectors.assign_col
Usage:
assign_col(data, A, 0);
assign_col(data, B, 1);
assign_col(data, C, 2);
The package PoC.vectors contains a lot of new types, functions and procedures to handle true std_logic based 2D arrays in VHDL.

Generating sine using cordic algorithm

I want to apologize for this more or less popular question, but I could not find a specific VHDL implementation anywhere. I am writing the algorithm from scratch and I have a problem with the math implementation. The output is invalid. nothing counts, but just shows 1 value. If someone knows what I need to do to fix it, I would be very grateful for any help.
Math part
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
--use IEEE.NUMERIC_STD.ALL;
-- Uncomment the following library declaration if instantiating
-- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity massive is
port (
clk : in std_logic;
reset : in std_logic;
sinus : out std_logic_vector (15 downto 0));
end massive;
architecture Behavioral of massive is
type my_type is array (0 to 16) of signed (15 downto 0);
signal x : my_type;
signal y : my_type;
signal z : my_type;
signal j : my_type := ("1010111111001000", "0110011111000101", "0011011011010100", "0001101111010101", "0000110111111000",
"0000011011111110", "0000001101111111", "0000000111000000", "0000000011100000", "0000000001110000",
"0000000000111000", "0000000000011100", "0000000000001110", "0000000000000111", "0000000000000100",
"0000000000000010", "0000000000000001");
begin
process(clk)
begin
x(0) <= "0000010100000110";
y(0) <= "0000000000000000";
z(0) <= "0000000000000000";
if rising_edge(clk) then
if reset <= '1' then
For n in 0 to 15 loop
if (z(n) >= 0) then
x(n+1) <= x(n) - (y(n)/2**n);
y(n+1) <= y(n) + (x(n)/2**n);
z(n+1) <= z(n) + j(n);
else
x(n+1) <= x(n) +(y(n)/2**n);
y(n+1) <= y(n) -(x(n)/2**n);
z(n+1) <= z(n) - j(n);
end if;
end loop;
sinus <= std_logic_vector(y(16));
end if;
end if;
end process;
end Behavioral;
Rotation part
entity step_control is
generic (
first : integer := 0;
second : integer := 1;
third : integer := 2;
fourth : integer := 3;
);
Port ( clk : in STD_LOGIC;
Angle : out STD_LOGIC_VECTOR (12 downto 0);
quarter_in : out STD_LOGIC_VECTOR (1 downto 0));
end step_control;
architecture Behavioral of step_control is
signal Ang : std_logic_vector (12 downto 0) := (others => '0');
signal state : unsigned (1 downto 0) := to_unsigned(first,2);
signal count_ang : std_logic_vector (11 downto 0) := (others => '0');
begin
process (clk)
begin
if (rising_edge(clk)) then
case(state) is
when to_unsigned(first,2) => if (count_ang >= 3999) then --00
state <= to_unsigned(second,2);
count_ang <= "000000010000";
quarter_in <= "01";
Ang <= Ang - 16;
else
state <= to_unsigned(first,2);
quarter_in <= "00";
Ang <= Ang + 16;
count_ang <= count_ang + 16;
end if;
when to_unsigned(second,2) => if (count_ang >= 3999) then --01
state <= to_unsigned(third,2);
count_ang <= "000000010000";
quarter_in <= "10";
Ang <= Ang + 16;
else
state <= to_unsigned(second,2);
quarter_in <= "01";
Ang <= Ang - 16;
count_ang <= count_ang + 16;
end if;
when to_unsigned(third,2) => if (count_ang >= 3999) then
state <= to_unsigned(fourth,2);
count_ang <= "000000010000";
quarter_in <= "11";
Ang <= Ang - 16;
else
state <= to_unsigned(third,2);
quarter_in <= "10";
Ang <= Ang + 16;
count_ang <= count_ang + 16;
end if;
when to_unsigned(fourth,2) => if (count_ang >= 3999) then
state <= to_unsigned(first,2);
count_ang <= "000000010000";
quarter_in <= "00";
Ang <= Ang + 16;
else
state <= to_unsigned(fourth,2);
quarter_in <= "11";
Ang <= Ang - 16;
count_ang <= count_ang + 16;
end if;
when others => count_ang <= (others => '0');
end case;
end if;
end process;
Angle <= Ang;
end Behavioral;
And the testbench (I'm not sure about it: more or less everything is done inside the module, so my testbench comes out "empty"):
ENTITY testmass IS
END testmass;
ARCHITECTURE behavior OF testmass IS
-- Component Declaration for the Unit Under Test (UUT)
COMPONENT massive
PORT(
clk : IN std_logic;
reset : IN std_logic;
sinus : OUT std_logic_vector(15 downto 0)
);
END COMPONENT;
--Inputs
signal clk : std_logic := '0';
signal reset : std_logic := '0';
--Outputs
signal sinus : std_logic_vector(15 downto 0);
-- Clock period definitions
constant clk_period : time := 10 ns;
BEGIN
-- Instantiate the Unit Under Test (UUT)
uut: massive PORT MAP (
clk => clk,
reset => reset,
sinus => sinus
);
-- Clock process definitions
clk_process :process
begin
clk <= '0';
wait for clk_period/2;
clk <= '1';
wait for clk_period/2;
end process;
-- Stimulus process
stim_proc: process
begin
-- hold reset state for 100 ns.
wait for 100 ns;
wait for clk_period*10;
-- insert stimulus here
wait;
end process;
END;
Your question isn't a Minimal, Complete and Verifiable example; it's not verifiable:
Describe the problem. "It doesn't work" is not a problem statement. Tell us what the expected behavior should be. Tell us what the exact wording of the error message is, and which line of code is producing it. Put a brief summary of the problem in the title of your question.
The output is invalid. nothing counts, but just shows 1 value.
What's the one value? When someone attempts to duplicate your problem one thing we see is assertion warnings for each evaluation of z(n) in the process in entity massive:
if (z(n) >= 0) then
The issue is subtle, yet basic to how VHDL signals work.
You assign values to signals in the process and expect them to be immediately available. That does not occur: no signal is updated until every process that resumed in the current simulation cycle has suspended again.
For each simulation time there is only one entry in the projected output waveform (a queue). Successive assignments for the same time (which don't occur here) would result in only the last value being queued.
More important is that the future value isn't available in the current simulation cycle.
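A minimal sketch of the difference, using invented names (signal_vs_variable_demo, s, v) rather than the arrays from the question: inside one process execution a variable accumulates across loop iterations, while a signal keeps its old value and only receives the last scheduled value once the process suspends.

```vhdl
entity signal_vs_variable_demo is
end entity signal_vs_variable_demo;

architecture sim of signal_vs_variable_demo is
    signal s: integer := 0;
begin
    process
        variable v: integer := 0;
    begin
        for i in 1 to 4 loop
            v := v + 1;  -- takes effect immediately
            s <= s + 1;  -- s still reads 0 here, so every pass schedules 0 + 1
        end loop;
        assert v = 4
            report "variable should have counted to 4" severity failure;
        assert s = 0
            report "signal must not change before the process suspends"
            severity failure;
        wait for 0 ns;   -- suspend for one delta cycle so s is updated
        assert s = 1
            report "only the last scheduled value should arrive"
            severity failure;
        report "v = " & integer'image(v) & ", s = " & integer'image(s);
        wait;
    end process;
end architecture sim;
```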
x, y and z can be variables declared in the process instead:
architecture foo of massive is
-- not predefined before -2008:
function to_string (inp: signed) return string is
variable image_str: string (1 to inp'length);
alias input_str: signed (1 to inp'length) is inp;
begin
for i in input_str'range loop
image_str(i) := character'VALUE(std_ulogic'IMAGE(input_str(i)));
end loop;
return image_str;
end function;
begin
process (clk)
type my_type is array (0 to 16) of signed (15 downto 0);
variable x: my_type;
variable y: my_type;
variable z: my_type;
constant j: my_type := ("1010111111001000", "0110011111000101",
"0011011011010100", "0001101111010101",
"0000110111111000", "0000011011111110",
"0000001101111111", "0000000111000000",
"0000000011100000", "0000000001110000",
"0000000000111000", "0000000000011100",
"0000000000001110", "0000000000000111",
"0000000000000100", "0000000000000010",
"0000000000000001");
begin
x(0) := "0000010100000110";
y(0) := "0000000000000000";
z(0) := "0000000000000000";
if rising_edge(clk) then
if reset = '0' then -- reset not driven condition was <=
report "init values:" & LF & HT &
"x(0) = " & to_string(x(0)) & LF & HT &
"y(0) = " & to_string(y(0)) & LF & HT &
"z(0) = " & to_string(z(0));
for n in 0 to 15 loop
if z(n) >= 0 then
x(n + 1) := x(n) - y(n) / 2 ** n;
y(n + 1) := y(n) + x(n) / 2 ** n;
z(n + 1) := z(n) + j(n);
else
x(n + 1) := x(n) + y(n) / 2 ** n;
y(n + 1) := y(n) - x(n) / 2 ** n;
z(n + 1) := z(n) - j(n);
end if;
report "n = " & integer'image(n) & LF & HT &
"x(" & integer'image(n + 1) & ") = " &
to_string(x(n + 1)) & LF & HT &
"y(" & integer'image(n + 1) & ") = " &
to_string(y(n + 1)) & LF & HT &
"z(" & integer'image(n + 1) & ") = " &
to_string(z(n + 1));
end loop;
sinus <= std_logic_vector(y(16));
report "sinus = " & to_string(y(16));
end if;
end if;
end process;
end architecture foo;
The report statements are added to write the values to the simulation console. Because no time passes between successive variable assignments, the values of variables in a waveform display have no useful meaning, and some simulators won't include variables in waveform dumps at all.
And the above architecture produces:
ghdl -a testmass.vhdl
ghdl -e testmass
ghdl -r testmass
testmass.vhdl:86:17:#5ns:(report note): init values:
x(0) = 0000010100000110
y(0) = 0000000000000000
z(0) = 0000000000000000
testmass.vhdl:100:21:#5ns:(report note): n = 0
x(1) = 0000010100000110
y(1) = 0000010100000110
z(1) = 1010111111001000
testmass.vhdl:100:21:#5ns:(report note): n = 1
x(2) = 0000011110001001
y(2) = 0000001010000011
z(2) = 0100100000000011
testmass.vhdl:100:21:#5ns:(report note): n = 2
x(3) = 0000011011101001
y(3) = 0000010001100101
z(3) = 0111111011010111
testmass.vhdl:100:21:#5ns:(report note): n = 3
x(4) = 0000011001011101
y(4) = 0000010101000010
z(4) = 1001101010101100
testmass.vhdl:100:21:#5ns:(report note): n = 4
x(5) = 0000011010110001
y(5) = 0000010011011101
z(5) = 1000110010110100
testmass.vhdl:100:21:#5ns:(report note): n = 5
x(6) = 0000011011010111
y(6) = 0000010010101000
z(6) = 1000010110110110
testmass.vhdl:100:21:#5ns:(report note): n = 6
x(7) = 0000011011101001
y(7) = 0000010010001101
z(7) = 1000001000110111
testmass.vhdl:100:21:#5ns:(report note): n = 7
x(8) = 0000011011110010
y(8) = 0000010010000000
z(8) = 1000000001110111
testmass.vhdl:100:21:#5ns:(report note): n = 8
x(9) = 0000011011110110
y(9) = 0000010001111010
z(9) = 0111111110010111
testmass.vhdl:100:21:#5ns:(report note): n = 9
x(10) = 0000011011110100
y(10) = 0000010001111101
z(10) = 1000000000000111
testmass.vhdl:100:21:#5ns:(report note): n = 10
x(11) = 0000011011110101
y(11) = 0000010001111100
z(11) = 0111111111001111
testmass.vhdl:100:21:#5ns:(report note): n = 11
x(12) = 0000011011110101
y(12) = 0000010001111100
z(12) = 0111111111101011
testmass.vhdl:100:21:#5ns:(report note): n = 12
x(13) = 0000011011110101
y(13) = 0000010001111100
z(13) = 0111111111111001
testmass.vhdl:100:21:#5ns:(report note): n = 13
x(14) = 0000011011110101
y(14) = 0000010001111100
z(14) = 1000000000000000
testmass.vhdl:100:21:#5ns:(report note): n = 14
x(15) = 0000011011110101
y(15) = 0000010001111100
z(15) = 0111111111111100
testmass.vhdl:100:21:#5ns:(report note): n = 15
x(16) = 0000011011110101
y(16) = 0000010001111100
z(16) = 0111111111111110
testmass.vhdl:109:17:#5ns:(report note): sinus = 0000010001111100
Here we see the values of your array elements changing, instead of 'X's assigned in the previous loop iteration propagating through the addition (or subtraction when z(n) < 0).
Also note that reset doesn't change value in the testbench, and that the original massive process evaluates it erroneously with the relational operator "<=" (less than or equal), a condition that is true for both '0' and '1'.
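Why the "<=" condition goes unnoticed can be shown with a short self-checking sketch (the entity name relop_demo and the variable r are made up). In an expression, "<=" on std_logic is the predefined less-than-or-equal operator, which compares positions in the enumeration ('U', 'X', '0', '1', 'Z', 'W', 'L', 'H', '-'), so the test passes for every value up to and including '1'.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity relop_demo is
end entity relop_demo;

architecture sim of relop_demo is
begin
    process
        variable r: std_logic;
    begin
        r := '0';
        -- less than or equal: '0' comes before '1' in the enumeration
        assert r <= '1'
            report "'0' <= '1' should be true" severity failure;
        assert not (r = '1')
            report "'0' = '1' should be false" severity failure;
        r := 'U';
        assert r <= '1'
            report "'U' <= '1' should be true" severity failure;
        r := 'Z';
        -- 'Z' comes after '1', so the comparison fails
        assert not (r <= '1')
            report "'Z' <= '1' should be false" severity failure;
        wait;
    end process;
end architecture sim;
```

With reset stuck at '0' in the testbench, the original condition is therefore always true, which is why the rewritten architecture tests reset = '0' explicitly.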
j is not assigned other than an initial value and is shown as a constant in the above architecture.
I'm personally somewhat skeptical you can perform these 16 chained additions or subtractions, along with selecting which operation to perform, in one 10 ns clock period.

Vivado synthesis: complex assignment not supported

I implemented a Booth modified multiplier in vhdl. I need to make a synthesis with Vivado but it's not possible because of this error:
"complex assignment not supported".
This is the shifter code that causes the error:
entity shift_register is
generic (
N : integer := 6;
M : integer := 6
);
port (
en_s : in std_logic;
cod_result : in std_logic_vector (N+M-1 downto 0);
position : in integer;
shift_result : out std_logic_vector(N+M-1 downto 0)
);
end shift_register;
architecture shift_arch of shift_register is
begin
process(en_s)
variable shift_aux : std_logic_vector(N+M-1 downto 0);
variable i : integer := 0; -- only for convenience
begin
if(en_s'event and en_s ='1') then
i := position;
shift_aux := (others => '0');
shift_aux(N+M-1 downto i) := cod_result(N+M-1-i downto 0); --ERROR!!
shift_result <= shift_aux ;
end if;
end process;
end shift_arch;
The Booth multiplier works with any operand size, so I cannot replace this generic code with a size-specific version.
Please help me! Thanks a lot
There's a way to make your index addressing static for synthesis.
First, based on the loop we can tell position must have a value within the range of shift_aux, otherwise you'd end up with null slices (IEEE Std 1076-2008 8.5 Slice names).
That can be shown in the entity declaration:
library ieee;
use ieee.std_logic_1164.all;
entity shift_register is
generic (
N: integer := 6;
M: integer := 6
);
port (
en_s: in std_logic;
cod_result: in std_logic_vector (N + M - 1 downto 0);
position: in integer range 0 to N + M - 1 ; -- range ADDED
shift_result: out std_logic_vector(N + M - 1 downto 0)
);
end entity shift_register;
What's changed is the addition of a range constraint to the port declaration of position. The idea is to support simulation, where the default value of an integer is integer'left. Simulating your shift_register would fail on the rising edge of en_s if position (the actual driver) did not provide a value within the index range of shift_aux.
From a synthesis perspective an unbounded integer requires you to take both positive and negative integer values into account. Your for loop is only using positive integer values.
The same can be done in the declaration of the variable i in the process:
variable i: integer range 0 to N + M - 1 := 0; -- range ADDED
To address the immediate synthesis problem we look at the for loop.
Xilinx support issue AR# 52302 tells us the issue is using dynamic values for indexes.
The solution is to modify what the for loop does:
architecture shift_loop of shift_register is
begin
process (en_s)
variable shift_aux: std_logic_vector(N + M - 1 downto 0);
-- variable i: integer range 0 to N + M - 1 := 0; -- range ADDED
begin
if en_s'event and en_s = '1' then
-- i := position;
shift_aux := (others => '0');
for i in 0 to N + M - 1 loop
-- shift_aux(N + M - 1 downto i) := cod_result(N + M - 1 - i downto 0);
if i = position then
shift_aux(N + M - 1 downto i)
:= cod_result(N + M - 1 - i downto 0);
end if;
end loop;
shift_result <= shift_aux;
end if;
end process;
end architecture shift_loop;
If i becomes a static value when the loop is unrolled in synthesis it can be used in calculation of indexes.
Note this gives us an N + M input multiplexer where each input is selected when i = position.
This construct can actually be collapsed into a barrel shifter by optimization, although for large values of N and M the number of variables involved might take a prohibitive synthesis effort or simply fail.
When synthesis is successful, each output element in the assignment collapses into a separate multiplexer, matching Patrick's barrel shifter.
For sufficiently large values of N and M we can define the depth of the barrel shifter, in number of multiplexer layers, from the number of bits in a binary representation of the integer range of distance.
That requires either a declared integer type or subtype for position, or finding the log2 value of N + M. We can use the log2 value because it is only used statically. (XST supports log2(x), where x is a real, for determining static values; the function is found in IEEE package math_real.) This gives us the binary length of position: how many bits are required to describe the shift distance, which is also the number of levels of multiplexers.
architecture barrel_shifter of shift_register is
begin
process (en_s)
use ieee.math_real.all; -- log2 [real return real]
use ieee.numeric_std.all; -- to_unsigned, unsigned
constant DISTLEN: natural := integer(ceil(log2(real(N + M)))); -- binary length, rounded up
type muxv is array (0 to DISTLEN - 1) of
unsigned (N + M - 1 downto 0);
variable shft_aux: muxv;
variable distance: unsigned (DISTLEN - 1 downto 0);
begin
if en_s'event and en_s = '1' then
distance := to_unsigned(position, DISTLEN); -- position in binary
shft_aux := (others => (others =>'0'));
for i in 0 to DISTLEN - 1 loop
if i = 0 then
if distance(i) = '1' then
shft_aux(i) := SHIFT_LEFT(unsigned(cod_result), 2 ** i);
else
shft_aux(i) := unsigned(cod_result);
end if;
else
if distance(i) = '1' then
shft_aux(i) := SHIFT_LEFT(shft_aux(i - 1), 2 ** i);
else
shft_aux(i) := shft_aux(i - 1);
end if;
end if;
end loop;
shift_result <= std_logic_vector(shft_aux(DISTLEN - 1));
end if;
end process;
end architecture barrel_shifter;
XST also supports ** if the left operand is 2 and the value of i is treated as a constant in the sequence of statements found in a loop statement.
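One detail of the DISTLEN computation deserves care: converting a real to integer in VHDL rounds to the nearest value, so the bit count has to be rounded up when N + M is not a power of two. As an alternative to math_real, the sketch below uses an integer-only helper. The function name clog2 and the entity clog2_demo are hypothetical, not part of the answer's code, and whether a given synthesis tool accepts the while loop for static evaluation is an assumption to check against its documentation.

```vhdl
entity clog2_demo is
end entity clog2_demo;

architecture sim of clog2_demo is
    -- hypothetical helper: smallest number of bits able to encode
    -- 'count' distinct values (0 to count - 1)
    function clog2 (count: positive) return natural is
        variable bits: natural := 0;
        variable span: positive := 1;  -- values representable with 'bits' bits
    begin
        while span < count loop
            span := span * 2;
            bits := bits + 1;
        end loop;
        return bits;
    end function;
begin
    process
    begin
        assert clog2(12) = 4  -- N + M = 12, positions 0 to 11
            report "clog2(12) should be 4" severity failure;
        assert clog2(10) = 4  -- rounding to nearest would give 3 here
            report "clog2(10) should be 4" severity failure;
        assert clog2(16) = 4
            report "clog2(16) should be 4" severity failure;
        assert clog2(6) = 3   -- M = 6, positions 0 to 5
            report "clog2(6) should be 3" severity failure;
        wait;
    end process;
end architecture sim;
```

In the barrel_shifter architecture this would be used as constant DISTLEN: natural := clog2(N + M); with the function declared in the process or in a package.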
This could be implemented with signals instead of variables or structurally in a generate statement instead of a loop statement inside a process, or even as a subprogram.
The basic idea here with these two architectures derived from yours is to produce something synthesis eligible.
The advantage of the second architecture over the first is in reduction in the amount of synthesis effort during optimization for larger values of N + M.
Neither of these architectures have been verified lacking a testbench in the original. They both analyze and elaborate.
Writing a simple case testbench:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity shift_register_tb is
end entity;
architecture foo of shift_register_tb is
constant N: integer := 6;
constant M: integer := 6;
signal clk: std_logic := '0';
signal din: std_logic_vector (N + M - 1 downto 0)
:= (0 => '1', others => '0');
signal dout: std_logic_vector (N + M - 1 downto 0);
signal dist: integer := 0;
begin
DUT:
entity work.shift_register
generic map (
N => N,
M => M
)
port map (
en_s => clk,
cod_result => din,
position => dist,
shift_result => dout
);
CLOCK:
process
begin
wait for 10 ns;
clk <= not clk;
if now > (N + M + 2) * 20 ns then
wait;
end if;
end process;
STIMULI:
process
begin
for i in 1 to N + M loop
wait for 20 ns;
dist <= i;
din <= std_logic_vector(SHIFT_LEFT(unsigned(din),1));
end loop;
wait;
end process;
end architecture;
And simulating reveals that the range of position and the number of loop iterations only needs to cover the number of bits in the multiplier and not the multiplicand. We don't need a full barrel shifter.
That can be easily fixed in both shift_register architectures, and it has the side effect of making the shift_loop architecture much more attractive: it would be easier to synthesize based on the multiplier bit length (presumably M) and not the product bit length (N + M).
And that would give you:
library ieee;
use ieee.std_logic_1164.all;
entity shift_register is
generic (
N: integer := 6;
M: integer := 6
);
port (
en_s: in std_logic;
cod_result: in std_logic_vector (N + M - 1 downto 0);
position: in integer range 0 to M - 1 ; -- range ADDED
shift_result: out std_logic_vector(N + M - 1 downto 0)
);
end entity shift_register;
architecture shift_loop of shift_register is
begin
process (en_s)
variable shift_aux: std_logic_vector(N + M - 1 downto 0);
-- variable i: integer range 0 to M - 1 := 0; -- range ADDED
begin
if en_s'event and en_s = '1' then
-- i := position;
shift_aux := (others => '0');
for i in 0 to M - 1 loop
-- shift_aux(N + M - 1 downto i) := cod_result(N + M - 1 - i downto 0);
if i = position then -- This creates an N + M - 1 input MUX
shift_aux(N + M - 1 downto i)
:= cod_result(N + M - 1 - i downto 0);
end if;
end loop; -- The loop is unrolled in synthesis, i is CONSTANT
shift_result <= shift_aux;
end if;
end process;
end architecture shift_loop;
Modifying the testbench:
STIMULI:
process
begin
for i in 1 to M loop -- WAS N + M loop
wait for 20 ns;
dist <= i;
din <= std_logic_vector(SHIFT_LEFT(unsigned(din),1));
end loop;
wait;
end process;
gives a result showing that the shifts are over the range of the multiplier value (specified by M).
So the moral here is you don't need a full barrel shifter, only one that works over the multiplier range and not the product range.
The last bit of code should be synthesis eligible.
You are trying to create a range using a run-time varying value, and this is not supported by the synthesis tool. cod_result(N+M-1 downto 0); would be supported, because N, M, and 1 are all known at synthesis time.
If you're trying to implement a multiplier, you will get the best result using x <= a * b, and letting the synthesis tool choose the best way to implement it. If you have operands wider than the multiplier widths in your device, then you need to look at the documentation to determine the best route, which will normally involve pipelining of some sort.
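As a minimal sketch of the x <= a * b suggestion, assuming unsigned operands and reusing the N and M generics from the code above (the entity name mult_sketch is hypothetical):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: a plain combinational multiplier that leaves the
-- implementation choice to the synthesis tool.
entity mult_sketch is
    generic (
        N: positive := 6;
        M: positive := 6
    );
    port (
        a: in  std_logic_vector (N - 1 downto 0);      -- multiplicand
        b: in  std_logic_vector (M - 1 downto 0);      -- multiplier
        x: out std_logic_vector (N + M - 1 downto 0)   -- product
    );
end entity;

architecture rtl of mult_sketch is
begin
    -- numeric_std "*" returns a'length + b'length bits, so the product
    -- fits x without any resizing.
    x <= std_logic_vector(unsigned(a) * unsigned(b));
end architecture;
```

For signed operands the same form applies with signed(a) * signed(b). Whether this maps onto a dedicated multiplier block depends on the device and tool.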
If you need a run-time variable shift, look for a 'Barrel Shifter'. There are existing answers on these.
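One way to describe such a run-time variable shift without a run-time varying range is numeric_std's shift_left, where only the shift count varies. This is a sketch under assumptions: the entity name shift_sketch is hypothetical, the port names are borrowed from the shift_register entity above, it is combinational rather than clocked on en_s, and support for a non-constant shift count should be checked against your synthesis tool's documentation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: combinational left shift by a run-time amount.
entity shift_sketch is
    generic (
        N: positive := 6;
        M: positive := 6
    );
    port (
        cod_result:   in  std_logic_vector (N + M - 1 downto 0);
        position:     in  integer range 0 to M - 1;
        shift_result: out std_logic_vector (N + M - 1 downto 0)
    );
end entity;

architecture rtl of shift_sketch is
begin
    -- Every slice bound here is static; only the count argument varies.
    -- Vacated low-order bits are filled with '0'.
    shift_result <= std_logic_vector(shift_left(unsigned(cod_result), position));
end architecture;
```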

Output is always zeros (quotient and remainder) in divider code VHDL

Output is always zeros (quotient and remainder) in the code shown below.
Even if I assign the value of b to remainder, it gives 0. I have checked many times, but I am not able to understand what the issue is. While compiling, it shows 2 warnings:
- Initial value of "b" depends on value of signal "divisor".
What is the problem?
-- DIVIDER
library ieee;
use ieee.numeric_bit.all;
entity unsigned_divider is
port(
-- the two inputs
dividend: in bit_vector(15 downto 0);
divisor : in bit_vector(15 downto 0);
-- the two outputs
quotient : out bit_vector(15 downto 0);
remainder : out bit_vector(15 downto 0)
);
end entity unsigned_divider;
architecture behave of unsigned_divider is
begin
process
variable a : bit_vector(15 downto 0):=dividend;
variable b : bit_vector(15 downto 0):=divisor;
variable p : bit_vector(15 downto 0):= (others => '0');
variable i : integer:=0;
begin
for i in 0 to 15 loop
p(15 downto 1) := p(14 downto 0);
p(0) := a(15);
a(15 downto 1) := a(14 downto 0);
p := bit_vector(unsigned(p) - unsigned(b));
if(p(15) ='1') then
a(0) :='0';
p := bit_vector(unsigned(p) + unsigned(b));
else
a(0) :='1';
end if;
wait for 1 ns;
end loop;
quotient <= a after 1 ns;
remainder <= p after 1 ns;
end process;
end behave;
You should have explicit assignments to the variables a and b inside the process statement part (as sequential variable assignments). The declarations:
variable a : bit_vector(15 downto 0):=dividend;
variable b : bit_vector(15 downto 0):=divisor;
Should be:
variable a : bit_vector(15 downto 0);
variable b : bit_vector(15 downto 0);
And in the process statement part (following the begin in the process):
a := dividend;
b := divisor;
These overcome the issue natipar mentions, that the values are only assigned to a and b during initialization.
Further, should you desire to have a 1 ns delay, you should have an explicit wait statement as the last sequential statement of the process statement part:
wait on dividend, divisor;
These make your process statement look something like this (with indentation added):
process
variable a : bit_vector(15 downto 0); -- := dividend;
variable b : bit_vector(15 downto 0); -- := divisor;
variable p : bit_vector(15 downto 0) := (others => '0');
variable i : integer := 0;
begin
a := dividend;
b := divisor;
for i in 0 to 15 loop
p(15 downto 1) := p(14 downto 0);
p(0) := a(15);
a(15 downto 1) := a(14 downto 0);
p := bit_vector(unsigned(p) - unsigned(b));
if p(15) = '1' then
a(0) :='0';
p := bit_vector(unsigned(p) + unsigned(b));
else
a(0) := '1';
end if;
wait for 1 ns;
end loop;
quotient <= a after 1 ns;
remainder <= p after 1 ns;
wait on dividend, divisor;
end process;
(Note the space between the numeric literal and the units, required by IEEE Std 1076-2008, 15.3 Lexical elements, separators and delimiters paragraph 4, the last sentence "At least one separator is required between an identifier or an abstract literal and an adjacent identifier or abstract literal.", despite Modelsim not requiring it).
Writing a simple testbench we find at least one error in your restoring division algorithm:
entity unsigned_divider_tb is
end entity;
architecture foo of unsigned_divider_tb is
signal dividend, divisor: bit_vector (15 downto 0) := (others => '0');
signal quotient, remainder: bit_vector (15 downto 0);
function to_string(inp: bit_vector) return string is
variable image_str: string (1 to inp'length);
alias input_str: bit_vector (1 to inp'length) is inp;
begin
for i in input_str'range loop
image_str(i) := character'VALUE(BIT'IMAGE(input_str(i)));
end loop;
return image_str;
end;
begin
DUT:
entity work.unsigned_divider
port map (
dividend,
divisor,
quotient,
remainder
);
MONITOR:
process (quotient, remainder)
begin
report "quotient = " & to_string (quotient) severity NOTE;
report "remainder = " & to_string (remainder) severity NOTE;
end process;
end architecture;
ghdl -a unsigned_divider.vhdl
ghdl -e unsigned_divider_tb
ghdl -r unsigned_divider_tb
unsigned_divider.vhdl:83:9:@0ms:(report note): quotient = 0000000000000000
unsigned_divider.vhdl:84:9:@0ms:(report note): remainder = 0000000000000000
unsigned_divider.vhdl:83:9:@17ns:(report note): quotient = 1111111111111111
unsigned_divider.vhdl:84:9:@17ns:(report note): remainder = 0000000000000000
(And a note on interpretation, the transactions reported at time 0 ms are the default assignments performed as a result of elaboration).
Your algorithm gives a wrong answer for division by 0.
Adding a stimulus process to the testbench:
STIMULUS:
process
begin
wait for 20 ns;
dividend <= x"ffff";
divisor <= x"000f";
end process;
Shows it can get the right answer too:
unsigned_divider.vhdl:83:9:@37ns:(report note): quotient = 0001000100010001
unsigned_divider.vhdl:84:9:@37ns:(report note): remainder = 0000000000000000
And with the testbench and added wait statements and assignments in the stimulus process you can explore further.
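For exploring the shift-and-subtract loop on its own, here is a hedged, self-contained sketch that wraps it in a procedure and checks two cases with assertions. The entity name, the procedure divide, and its parameter names are hypothetical. It also departs from the process above in three labeled ways: p is cleared on every call, p is one bit wider than the divisor, and the subtract-then-restore step is replaced by a compare-then-subtract, which produces the same quotient and remainder.

```vhdl
library ieee;
use ieee.numeric_bit.all;

entity restoring_div_sketch is
end entity;

architecture sim of restoring_div_sketch is
    -- Hypothetical helper, not part of the original design.
    procedure divide (
        dividend, divisor:   in  unsigned (15 downto 0);
        quotient, remainder: out unsigned (15 downto 0)
    ) is
        variable a: unsigned (15 downto 0);
        variable p: unsigned (16 downto 0);  -- assumption: one extra bit
    begin
        a := dividend;
        p := (others => '0');                -- assumption: cleared per call
        for i in 0 to 15 loop
            p := p(15 downto 0) & a(15);     -- shift the {p, a} pair left
            a := a(14 downto 0) & '0';
            if p >= ('0' & divisor) then     -- subtract only when it fits
                p := p - ('0' & divisor);
                a(0) := '1';                 -- quotient bit for this step
            end if;
        end loop;
        quotient  := a;
        remainder := p(15 downto 0);
    end procedure;
begin
    CHECK:
    process
        variable q, r: unsigned (15 downto 0);
    begin
        -- Same operands as the stimulus above: 65535 / 15 = 4369 rem 0.
        divide(x"ffff", x"000f", q, r);
        assert to_integer(q) = 4369 and to_integer(r) = 0
            report "65535 / 15 mismatch" severity error;
        -- A case with a non-zero remainder: 100 / 7 = 14 rem 2.
        divide(to_unsigned(100, 16), to_unsigned(7, 16), q, r);
        assert to_integer(q) = 14 and to_integer(r) = 2
            report "100 / 7 mismatch" severity error;
        wait;
    end process;
end architecture;
```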
I've always been a fan of non-restoring division myself, because the adds or subtracts take a clock in a clocked divider.
Variable assignments take effect immediately; but the signal, at the moment of the creation of that variable, has no value, so you cannot expect the assignments
variable a : bit_vector(15 downto 0):=dividend;
variable b : bit_vector(15 downto 0):=divisor;
to work correctly. I'm a bit surprised that there are no complaints for the assignment to the variable a though. Perhaps it is your second warning. You should define the variables the way you do, but leave the assignment for later, in the begin segment of your process.
P.S. Also, you might want to change remainder <= p after 1ns; to remainder <= p after 1 ns;.

Subtractor Module VHDL generating wrong values

I have the code below, which is designed to do subtraction and addition. Basically, when Binv is set, it should subtract, and when Binv is 0, it should add. Unfortunately, it sometimes seems to add when Binv is set and to subtract when it isn't. Here is a snapshot of the simulation:
entity ADD_SUB is
Port ( A : in STD_LOGIC_VECTOR (31 downto 0);
B : in STD_LOGIC_VECTOR (31 downto 0);
Binv : in STD_LOGIC;
C_in: in STD_LOGIC;
S : out STD_LOGIC_VECTOR (31 downto 0);
TEST : out STD_LOGIC_VECTOR (31 downto 0);
C_out : out STD_LOGIC
);
end ADD_SUB;
architecture ADD_SUB_ARCH of ADD_SUB is
signal S_wider : std_logic_vector(32 downto 0);
begin
process (A,B,C_in,Binv)
begin
if Binv = '0' then
S_wider <= ( A(31) & A) + ( B(31) & B) + C_in;
elsif Binv = '1' then
S_wider <= (A(31)& A) + ('1'& not B) + '1';
else
S_wider <= std_logic_vector(to_signed(0,32));
end if;
S <= S_wider(31 downto 0);
C_out <= S_wider(32);
end process;
I am getting nonsensical results. In the first case, you can see that I tried to do (50 - 30) (Binv is 1). I get 80, which is wrong. You can see, however, that it works on (30 - 50), where I get -20. The second problem is where I try to do (30 + (-50)); it shows up as 20.
The results are completely off and I can't see where I am going wrong.
Jim is absolutely correct.
There are a couple of points that may be worth making.
First, the + C_in or + not C_in implies two 32 bit adds, one of which gets optimized away during synthesis leaving just the carry in to the remaining add.
Second, you are really only manipulating B and C_in using Binv. Subtraction is the equivalent of adding the two's complement, which for B is the one's complement + X"00000001". Note that Jim inverts C_in with Binv, which allows C_in to be used for daisy chain operations (e.g. a 64 bit add or subtract with a 32 bit ALU).
Both points are illustrated with the following code, which also only uses numeric_std.unsigned and only needs the unsigned numeric_std."+":
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity add_sub is
port (
a: in std_logic_vector (31 downto 0);
b: in std_logic_vector (31 downto 0);
binv: in std_logic;
c_in: in std_logic;
s: out std_logic_vector (31 downto 0);
test: out std_logic_vector (31 downto 0);
c_out: out std_logic
);
end entity;
architecture foo of add_sub is
begin
UNLABELLED:
process (a,b,c_in,binv)
variable x,y,z: std_logic_vector (33 downto 0);
begin
x := a(31) & a & '1'; -- this '1' generates a true carry in to z(1)
-- z(0) is optimized away as unused; its carry is
-- retained as carry in to the next MS bit.
if binv = '0' then
y := b(31) & b & c_in;
elsif binv = '1' then
y := not b(31) & not b & not c_in;
else
y := (others => 'X'); -- 'X' on binv is propagated from b onto y
end if;
z := std_logic_vector( unsigned(x) + unsigned(y)); -- only one add
c_out <= z(33);
s <= z(32 downto 1);
end process;
end architecture;
The above example connects C_in a bit more directly to the adder stage with the LS bits of A and B and gives:
(Synthesis software is generally smart enough to do all this using Jim's form, modified to either add or subtract based on Binv with A and B extended to 33 bits, without any direct bit or bitfield manipulation. Our synthesis tools have had more than 25 years to get it right.)
The waveform was produced with the following test bench:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity tb_add_sub is
end entity;
architecture foo of tb_add_sub is
signal a: std_logic_vector (31 downto 0) := (others =>'0');
signal b: std_logic_vector (31 downto 0) := (others =>'0');
signal binv: std_logic := '0';
signal c_in: std_logic := '0';
signal s: std_logic_vector (31 downto 0);
signal test: std_logic_vector (31 downto 0);
signal c_out: std_logic;
begin
DUT:
entity work.add_sub
port map (
a => a,
b => b,
binv => binv,
c_in => c_in,
s => s,
test => test,
c_out => c_out
);
STIMULUS:
process
begin
wait for 100 ns;
a <= std_logic_vector(to_signed(50,a'length));
b <= std_logic_vector(to_signed(30,b'length));
wait for 100 ns;
binv <= '1';
wait for 100 ns;
binv <= '0';
a <= std_logic_vector(to_signed(30,a'length));
b <= std_logic_vector(to_signed(-50,b'length));
wait for 100 ns;
binv <= '1';
b <= std_logic_vector(to_signed(50,b'length));
wait for 600 ns;
wait;
end process;
end architecture;
Your equation for subtraction is not correct. As @neodelphi suggested, it should be:
A - B = A + (not B) + 1
However, this does not account for carry in and what to do with it. If I remember right, the borrow is subtracted:
A - B - C_in = A + (not B) + 1 - C_in = A + (not B) + (1 - C_in)
Now note that:
(1 - C_in) = not C_in
Now, to convert it to VHDL. If I overlook the fact that you are doing signed math with the package std_logic_unsigned (ahem), you could write (similar to @neodelphi):
S_wider <= (A(31)& A) + (not B(31) & not B) + not C_in ;
Note that with the package std_logic_unsigned, as well as with numeric_std in VHDL-2008, there are no issues with adding a std_ulogic.
My suggestion about types and packages is very simple. If you are doing math, use a math type such as signed (matching your math here) or unsigned (for other cases). I consider these part of the documentation.
Furthermore, using the appropriate type is important, as the math packages allow you to add two array values that are different sizes. If you use the appropriate type, they do the appropriate extension: replicating the sign bit for signed, or '0' fill for unsigned.
Hence, had you used type signed, then you could have used the first argument (A) to size the result and been lazy about B and written:
S_wider <= (A(31)& A) + not B + not C_in ;
BTW, testing for both '0' and '1' does not help the hardware in any way. My recommendation is to either be lazy (and safe) and write:
if Binv = '0' then
S_wider <= ( A(31) & A) + ( B(31) & B) + C_in;
else
S_wider <= (A(31)& A) + (not B(31) & not B) + not C_in;
end if;
Alternately, be paranoid and vigilant and make the output 'X' when the control input is an 'X'. However, be sure to double check your "elsif" expression: get this wrong when it is more complex and it may be challenging to find the bug (meaning you had better have test cases that cover all possible input values of the controls):
if Binv = '0' then
S_wider <= ( A(31) & A) + ( B(31) & B) + C_in;
elsif Binv = '1' then
S_wider <= (A(31)& A) + (not B(31) & not B) + not C_in;
else
S_wider <= (others => 'X') ; -- X in propagates as X out can help debug
end if;
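The suggestion above about using the signed type can be sketched as a complete entity. This is an illustration under assumptions, not the original poster's code: the entity name add_sub_signed is hypothetical, the TEST port is dropped, the ports are declared as signed directly, and the carry is added as an integer so that the sketch does not depend on the VHDL-2008 overload for adding a std_ulogic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity illustrating the "use a math type" advice.
entity add_sub_signed is
    port (
        a:     in  signed (31 downto 0);
        b:     in  signed (31 downto 0);
        binv:  in  std_logic;
        c_in:  in  std_logic;
        s:     out signed (31 downto 0);
        c_out: out std_logic
    );
end entity;

architecture sketch of add_sub_signed is
begin
    process (a, b, binv, c_in)
        variable b_eff: signed (31 downto 0);
        variable carry: natural range 0 to 1;
        variable sum:   signed (32 downto 0);
    begin
        if binv = '0' then
            b_eff := b;                       -- add: B and C_in as given
            if c_in = '1' then carry := 1; else carry := 0; end if;
        else
            b_eff := not b;                   -- subtract: not B, not C_in
            if c_in = '1' then carry := 0; else carry := 1; end if;
        end if;
        -- Only a is widened explicitly; numeric_std sign-extends the
        -- shorter signed operand b_eff to the 33-bit result length.
        sum   := resize(a, sum'length) + b_eff + carry;
        s     <= sum(31 downto 0);
        c_out <= sum(32);                     -- top bit of the widened sum
    end process;
end architecture;
```

With a = 50, b = 30, binv = '1' and c_in = '0' this computes 50 + (not 30) + 1 = 20.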
An AddSub module has only one control input; let's call it add_sub (add when low, subtract when high). This means: if add_sub is zero, perform an add operation; if it's one, perform a subtraction.
There is a fixed relation between C_in and Binv: if you want to add, both Binv and C_in are zero; if you want to subtract, both are one.
The equation for an adder is simply S := A + B + 0; for a subtracter it can be derived by some transformations:
S := A - B -- transform into an add operation
S := A + (- B) -- transform negative number using 2's complement
S := A + ( 2's complement of B) -- transform 2's complement into 1's complement
S := A + ((1's complement of B) + 1) -- transform 1's complement into bit wise not operation
S := A + ((bit wise not of B) + 1)
If you combine both equations you will get:
S := A + (B xor vector(add_sub)) + add_sub
So in VHDL this would be:
S_wider <= unsigned('0' & A) + unsigned('0' & (B xor (B'range => add_sub))) + unsigned((B'range => '0') & add_sub);
S <= S_wider(S'range);
C_out <= S_wider(S_wider'high);
Synthesis is smart enough to recognize a three-input adder whose third input is a switchable constant as an add/sub macro block. If you want to perform signed add/sub, then exchange the conversion functions and sign-extension accordingly.
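The combined equation S := A + (B xor vector(add_sub)) + add_sub can be sketched as a complete unsigned entity. The entity name addsub_xor and the variables mask and cin are hypothetical, and the sum is held in an unsigned variable so that the type conversions are explicit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: one control input, '0' adds and '1' subtracts.
entity addsub_xor is
    port (
        a, b:    in  std_logic_vector (31 downto 0);
        add_sub: in  std_logic;
        s:       out std_logic_vector (31 downto 0);
        c_out:   out std_logic
    );
end entity;

architecture sketch of addsub_xor is
begin
    process (a, b, add_sub)
        variable mask:    std_logic_vector (31 downto 0);
        variable cin:     natural range 0 to 1;
        variable s_wider: unsigned (32 downto 0);
    begin
        mask := (others => add_sub);          -- add_sub replicated per bit
        if add_sub = '1' then cin := 1; else cin := 0; end if;
        -- A + (B xor mask) + add_sub, zero-extended to keep the carry out
        s_wider := resize(unsigned(a), 33)
                 + resize(unsigned(b xor mask), 33)
                 + cin;
        s     <= std_logic_vector(s_wider(31 downto 0));
        c_out <= s_wider(32);
    end process;
end architecture;
```

In this unsigned form, c_out is '1' after a subtraction that did not borrow, for example 50 - 30.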

Resources