How to find minimum of n sfixed array values using VHDL - vhdl

I need to find the minimum of 5 sfixed values. I have an array of 5 values, val3 = [x1 x2 x3 x4 x5],
and I need to fill another array, val33 = [y1 y2 y3 y4 y5], where y1 is the minimum of val3 excluding x1, y2 is the minimum of val3 excluding x2, and so on.
I have tried the following lines:
TYPE T IS ARRAY (NATURAL RANGE <>) of sfixed(6 downto -5);
SIGNAL val3 :T(4 downto 0);
SIGNAL val33 :T(4 downto 0);
SIGNAL V_two :sfixed(6 DOWNTO -5);
for k in 0 to 4 loop
for l in 0 to 4 loop
V_two := val3(k);
val3(k) <= to_sfixed(32,6,-5);
if V_two < val3(l) then val33(k) <= V_two;
elsif V_two >= val3(l) then val33(k) <=val3(l);
end if;
end loop;
end loop;
The problem is that V_two is always equal to 32 instead of the indexed value from val3, so val33 is filled only with 32s.
I don't know whether there is a simpler way to do this, or where I have made the mistake.
Thank you so much for your help
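One possible way to approach this, as a minimal sketch rather than a verified design: signal assignments made inside a process only take effect once the process suspends, and when val33(k) is assigned several times in the inner loop only the last assignment survives. Keeping the running minimum in a variable and skipping index k avoids both effects. The entity name min_except_sketch, the variable names v_min and first, and the test values are assumptions for illustration; ieee.fixed_pkg is assumed to be the VHDL-2008 package (older tool chains may need ieee_proposed.fixed_pkg instead).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.fixed_pkg.all;  -- assumption: VHDL-2008 fixed-point package

entity min_except_sketch is  -- hypothetical name, no ports (simulation only)
end entity;

architecture sketch of min_except_sketch is
  type T is array (natural range <>) of sfixed(6 downto -5);
  -- Illustrative test values, not taken from the question.
  signal val3 : T(4 downto 0) := (
    0 => to_sfixed( 2.0,  6, -5),
    1 => to_sfixed(-1.25, 6, -5),
    2 => to_sfixed( 3.5,  6, -5),
    3 => to_sfixed( 0.5,  6, -5),
    4 => to_sfixed( 7.0,  6, -5));
  signal val33 : T(4 downto 0);
begin
  process (val3)
    variable v_min : sfixed(6 downto -5);  -- running minimum (updates immediately)
    variable first : boolean;              -- true until a candidate has been taken
  begin
    for k in val3'range loop
      first := true;
      for l in val3'range loop
        if l /= k then                     -- leave out the element at index k
          if first or (val3(l) < v_min) then
            v_min := val3(l);
            first := false;
          end if;
        end if;
      end loop;
      val33(k) <= v_min;                   -- exactly one signal assignment per k
    end loop;
  end process;

  check : process
  begin
    wait for 1 ns;
    -- Excluding index 1 (-1.25) the minimum is 0.5; otherwise it is -1.25.
    assert val33(1) = to_sfixed(0.5, 6, -5)
      report "val33(1) mismatch" severity failure;
    assert val33(0) = to_sfixed(-1.25, 6, -5)
      report "val33(0) mismatch" severity failure;
    wait;
  end process;
end architecture;
```

The input array val3 is never overwritten here, so no sentinel value such as 32 is needed.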

Related

If statement in a for loop VHDL

I want to write a for loop over 8 inputs with an if statement. My purpose is to find the minimum of these 8 ports. I know what the error is, but I want to use (i-1) when (i) takes the value 7. Any ideas?
LIBRARY ieee;
USE ieee.std_logic_1164 .all;
USe ieee.numeric_std .all;
---------------------------------------
ENTITY bitmin IS
generic
(
size: integer :=8
);
PORT
(
A0,A1,A2,A3,A4,A5,A6,A7 : IN UNSIGNED (size-1 downto 0);
MinOut:out UNSIGNED (size-1 downto 0)
);
END Entity;
-------------------------------------------------------------------------
ARCHITECTURE compare OF bitmin IS
type a_uns is array (0 to 7) of unsigned(7 downto 0);
signal a_unss:a_uns;
begin
a_unss(0)<=(A0);
a_unss(1)<=(A1);
a_unss(2)<=(A2);
a_unss(3)<=(A3);
a_unss(4)<=(A4);
a_unss(5)<=(A5);
a_unss(6)<=(A6);
a_unss(7)<=(A7);
process(a_unss)
begin
MinOut<="00000000";
for i in 0 to 7 loop
if (a_unss(i)<a_unss(i+1))and (a_unss(i)<a_unss(i+1)) and (a_unss(i)<a_unss(i+1)) and (a_unss(i)<a_unss(i+1))and (a_unss(i)<a_unss(i+1)) and (a_unss(i)<a_unss(i+1)) and (a_unss(i)<a_unss(i+1)) then
MinOut<=a_unss(i);
end if;
end loop;
end process;
END compare;
Error:
Error (10385): VHDL error at bitmin.vhd(48): index value 8 is outside the range (0 to 7) of object "a_unss"
Error (10658): VHDL Operator error at bitmin.vhd(48): failed to evaluate call to operator ""<""
Error (10658): VHDL Operator error at bitmin.vhd(48): failed to evaluate call to operator ""and""
Error (12153): Can't elaborate top-level user hierarchy
Error: Quartus Prime Analysis & Synthesis was unsuccessful. 4 errors, 1 warning
Error: Peak virtual memory: 4826 megabytes
Error: Processing ended: Thu Apr 09 19:39:04 2020
Error: Elapsed time: 00:00:17
Error: Total CPU time (on all processors): 00:00:43
As others have pointed out, the for-loop index goes out of range of the array length. You also need to produce a chain of minimums. And the bit width within the Compare architecture should be dependent upon the generic SIZE.
In Version 1 below, a single long chain is used.
In Version 2 below, two half-length chains are used which gives a shorter overall propagation delay.
In Version 3 below, a tree structure is used which gives the shortest overall propagation delay.
Version 1 - One long chain
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;
entity BitMin is
generic
(
SIZE: integer := 8
);
port
(
a0, a1, a2, a3, a4, a5, a6, a7: in unsigned(SIZE - 1 downto 0);
minout: out unsigned(SIZE - 1 downto 0)
);
end entity;
architecture Compare of BitMin is
subtype TBits is unsigned(SIZE - 1 downto 0); -- Changed TByte to TBits because the bit width is dependent upon the generic SIZE.
type TBitsArray is array(0 to 7) of TBits;
signal inputs: TBitsArray;
signal min_chain: TBitsArray;
function Minimum(a, b: TBits) return TBits is
begin
if a < b then
return a;
end if;
return b;
end function;
begin
inputs <= ( a0, a1, a2, a3, a4, a5, a6, a7 );
-- Version 1 (one long chain)
process(inputs, min_chain)
begin
min_chain(0) <= inputs(0); -- Assume the first element in the array is the minimum.
for i in 1 to 7 loop -- Cycle through the remaining items to find the minimum.
min_chain(i) <= Minimum(min_chain(i - 1), inputs(i));
end loop;
minout <= min_chain(7);
end process;
end Compare;
Version 2 - Two half-length chains
-- Version 2 (two half-length chains: 0..3 and 7..4)
process(inputs, min_chain)
begin
min_chain(0) <= inputs(0); -- Assume the first element in the array is the minimum.
min_chain(7) <= inputs(7); -- Assume the last element in the array is the minimum.
for i in 1 to 3 loop -- Cycle through the remaining items to find the minimum.
min_chain(i) <= Minimum(min_chain(i - 1), inputs(i)); -- Work forwards from element 1.
min_chain(7 - i) <= Minimum(min_chain(7 - i + 1), inputs(7 - i)); -- Work backwards from element 6.
end loop;
minout <= Minimum(min_chain(3), min_chain(4)); -- Find the minimum of the two chains.
end process;
Version 3 - Tree
-- Version 3 (tree structure)
process(inputs)
constant NUM_INPUTS: natural := inputs'length;
constant NUM_STAGES: natural := natural(ceil(log2(real(NUM_INPUTS))));
type TTree is array(0 to NUM_STAGES) of TBitsArray; -- This declares a matrix, but we only use half of it (a triangle shape). The unused part will not be synthesized.
variable min_tree: TTree;
variable height: natural;
variable height_int: natural;
variable height_rem: natural;
variable a, b: TBits;
begin
-- Stage 0 is simply the inputs
min_tree(0) := inputs;
height := NUM_INPUTS;
for i in 1 to NUM_STAGES loop
-- Succeeding stages are half the height of the preceding stage.
height_int := height / 2;
height_rem := height rem 2; -- Remember the odd one out.
-- Process pairs in the preceding stage and assign the result to the succeeding stage.
for j in 0 to height_int - 1 loop
a := min_tree(i - 1)(j);
b := min_tree(i - 1)(j + height_int);
min_tree(i)(j) := Minimum(a, b);
end loop;
-- Copy the odd one out in the preceding stage to the succeeding stage
if height_rem = 1 then
a := min_tree(i - 1)(height - 1);
min_tree(i)(height_int) := a;
end if;
-- Adjust the ever-decreasing height for the succeeding stage.
height := height_int + height_rem;
end loop;
-- Get the value at the point of the triangle which is the minimum of all inputs.
minout <= min_tree(NUM_STAGES)(0);
end process;
Test Bench
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity BitMin_TB is
end entity;
architecture V1 of BitMin_TB is
constant SIZE_TB: natural := 8;
component BitMin is
generic
(
SIZE: integer := 8
);
port
(
a0, a1, a2, a3, a4, a5, a6, a7: in unsigned (SIZE - 1 downto 0);
minout: out unsigned (SIZE - 1 downto 0)
);
end component;
signal a0_tb, a1_tb, a2_tb, a3_tb, a4_tb, a5_tb, a6_tb, a7_tb: unsigned(SIZE_TB - 1 downto 0);
signal minout_tb: unsigned(SIZE_TB - 1 downto 0);
begin
DUT: BitMin
generic map
(
SIZE => SIZE_TB
)
port map
(
a0 => a0_tb,
a1 => a1_tb,
a2 => a2_tb,
a3 => a3_tb,
a4 => a4_tb,
a5 => a5_tb,
a6 => a6_tb,
a7 => a7_tb,
minout => minout_tb
);
process
begin
wait for 10 ns;
a0_tb <= "00000100";
a1_tb <= "00001000";
a2_tb <= "00010000";
a3_tb <= "00100000";
a4_tb <= "01000000";
a5_tb <= "10000000";
a6_tb <= "00000010";
a7_tb <= "00000001";
wait for 10 ns;
--std.env.stop;
wait;
end process;
end architecture;
Synthesis Comparison
All three versions synthesise to the same number of logic elements, but Version 3 is the fastest.
Version 1 RTL - one long chain
Version 2 RTL - two half-length chains
Version 3 RTL - tree
if (a_unss(i)<a_unss(i+1))and (a_unss(i)<a_unss(i+1)) and (a_unss(i)<a_unss(i+1)) and (a_unss(i)<a_unss(i+1))and (a_unss(i)<a_unss(i+1)) and (a_unss(i)<a_unss(i+1)) and (a_unss(i)<a_unss(i+1)) then
The indexing of a_unss(i+1) is causing a problem because you are iterating from 0 to 7. When i reaches 7, i+1 is equal to 8, which is outside the bounds of a_unss. This is what the message : Error (10385): VHDL error at bitmin.vhd(48): index value 8 is outside the range (0 to 7) of object "a_unss" is saying.
EDIT
Suggestion to update the code:
LIBRARY ieee;
USE ieee.std_logic_1164 .all;
USe ieee.numeric_std .all;
---------------------------------------
ENTITY bitmin IS
generic
(
size: integer :=8
);
PORT
(
A0,A1,A2,A3,A4,A5,A6,A7 : IN UNSIGNED (size-1 downto 0);
MinOut:out UNSIGNED (size-1 downto 0)
);
END Entity;
-------------------------------------------------------------------------
ARCHITECTURE compare OF bitmin IS
type a_uns is array (0 to 7) of unsigned(size-1 downto 0); -- width follows the generic
signal a_unss:a_uns;
begin
a_unss(0)<=(A0);
a_unss(1)<=(A1);
a_unss(2)<=(A2);
a_unss(3)<=(A3);
a_unss(4)<=(A4);
a_unss(5)<=(A5);
a_unss(6)<=(A6);
a_unss(7)<=(A7);
process(a_unss)
-- A variable updates immediately, so each iteration sees the running minimum.
variable MinOut_tmp : UNSIGNED (size-1 downto 0);
begin
MinOut_tmp := a_unss(0); -- start from the first element
for i in 1 to 7 loop -- compare against the remaining elements
if (a_unss(i) < MinOut_tmp) then
MinOut_tmp := a_unss(i);
end if;
end loop;
MinOut <= MinOut_tmp; -- drive the output once, after the loop
end process;
END compare;

VHDL Loops - Only last increment is done

I have a problem with the for loops in the code below: in simulation it looks as if only the last iteration of the loop is executed. For example:
On the inputs I give (obviously in 8-bit SIGNED for the w0, w1, w2):
x1 = 1; x2 = 1; w0 = -32; w1 = 63; w2 = 63
and on the output I receive u = 31 instead of u = 94.
So it seems the equation is:
u = (x2 * w2) - w0
Instead of:
u = (x1 * w1) + (x2 * w2) - w0
I know that loops in VHDL work differently than in C, but the usage of variables should do the trick. Unfortunately, I'm missing something. What might it be?
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;
ENTITY NeuronBehavioral IS
GENERIC ( n: INTEGER := 1;
m: INTEGER := 2;
b: INTEGER := 8);
PORT ( x1 : in STD_LOGIC;
x2 : in STD_LOGIC;
w0 : in SIGNED (b-1 downto 0); --11100000 (-32)
w1 : in SIGNED (b-1 downto 0); --00111111 (63)
w2 : in SIGNED (b-1 downto 0); --00111111 (63)
u : out STD_LOGIC_VECTOR (b-1 downto 0));
END NeuronBehavioral;
ARCHITECTURE Behavioral OF NeuronBehavioral IS
TYPE weights IS ARRAY (1 TO n*m) OF SIGNED(b-1 DOWNTO 0);
TYPE inputs IS ARRAY (1 TO m) OF SIGNED(b-1 DOWNTO 0);
TYPE outputs IS ARRAY (1 TO n) OF SIGNED(b-1 DOWNTO 0);
BEGIN
PROCESS (w0, w1, w2, x1, x2)
VARIABLE weight: weights;
VARIABLE input: inputs;
VARIABLE output: outputs;
VARIABLE prod, acc: SIGNED(b-1 DOWNTO 0);
BEGIN
input(1) := "0000000" & x1;
input(2) := "0000000" & x2;
weight(1) := w1;
weight(2) := w2;
L1: FOR i IN 1 TO n LOOP
acc := (OTHERS => '0');
L2: FOR j IN 1 TO m LOOP
prod := input(j)*weight(m*(i-1)+j);
acc := acc + prod;
END LOOP L2;
output(i) := acc + w0;
END LOOP L1;
u <= STD_LOGIC_VECTOR(output(1));
END PROCESS;
END Behavioral;
Testbench:
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;
ENTITY NeuronTB IS
END NeuronTB;
ARCHITECTURE behavior OF NeuronTB IS
-- Component Declaration for the Unit Under Test (UUT)
COMPONENT NeuronBehavioral
PORT(
x1 : IN std_logic;
x2 : IN std_logic;
w0 : IN SIGNED(7 downto 0);
w1 : IN SIGNED(7 downto 0);
w2 : IN SIGNED(7 downto 0);
u : OUT std_logic_vector(7 downto 0)
);
END COMPONENT;
--Inputs
signal x1 : std_logic := '0';
signal x2 : std_logic := '0';
signal w0 : SIGNED(7 downto 0) := (others => '0');
signal w1 : SIGNED(7 downto 0) := (others => '0');
signal w2 : SIGNED(7 downto 0) := (others => '0');
--Outputs
signal u : std_logic_vector(7 downto 0);
BEGIN
-- Instantiate the Unit Under Test (UUT)
uut: NeuronBehavioral PORT MAP (
x1 => x1,
x2 => x2,
w0 => w0,
w1 => w1,
w2 => w2,
u => u
);
-- Stimulus process
stim_proc: process
begin
-- hold reset state for 100 ns.
wait for 100 ns;
x1 <= '1';
x2 <= '1';
w0 <= "11100000";
w1 <= "00111111";
w2 <= "00111111";
wait for 100 ns;
x1 <= '1';
x2 <= '0';
w0 <= "11100000";
w1 <= "00111111";
w2 <= "00111111";
wait for 100 ns;
x1 <= '0';
x2 <= '1';
w0 <= "11100000";
w1 <= "00111111";
w2 <= "00111111";
wait for 100 ns;
x1 <= '0';
x2 <= '0';
w0 <= "11100000";
w1 <= "00111111";
w2 <= "00111111";
wait;
end process;
END;
The question did not originally provide a Minimal, Complete and Verifiable example, so it lacked the means to replicate the error, much less the expected result. That is not a total barrier to finding the actual problems.
There's an out of bounds error for
prod := input(j) * weight( m * (i - 1) + j);
The right hand expression of type signed will have a length of the sum of the lengths of multiplicand (input(j)) and multiplier (weight( m * (i - 1) + j)).
The standard requires a check that the value produced by evaluating the right hand expression of an assignment statement has a matching element for each element of the target (see IEEE Std 1076-2008 14.7.3.4 Signal update, -1993 thru -2002 12.6.2 Propagation of signal values).
(When tools allow suspension of performing this check by command line flag or configuration it's with the expectation that it would have been done at some point and that there's a performance increase in eliminating it.)
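The length rule for "*" can be checked in isolation. The following is a small simulation-only sketch; the entity name prod_len_sketch and the local names a, w and prod are made up for illustration, with 8-bit operands assumed as in the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity prod_len_sketch is  -- hypothetical name, simulation only
end entity;

architecture sketch of prod_len_sketch is
begin
  process
    constant B : natural := 8;
    variable a    : signed(B - 1 downto 0) := to_signed(1, B);   -- like input(j)
    variable w    : signed(B - 1 downto 0) := to_signed(63, B);  -- like weight(...)
    variable prod : signed(2 * B - 1 downto 0);  -- sized to hold the full product
  begin
    prod := a * w;  -- "*" returns a'length + w'length elements
    assert prod'length = a'length + w'length
      report "unexpected product length" severity failure;
    assert to_integer(prod) = 63
      report "unexpected product value" severity failure;
    -- Assigning a * w to an 8 bit variable instead would fail the
    -- matching-element check described above.
    wait;
  end process;
end architecture;
```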
With regard to not needing an MCVE, some simulators allow running a model with top level ports. This problem can be identified by providing default values for all inputs. Depending on the VHDL revision, a report statement with to_string(output(1)) can show the originally cited answer.
port (
x1: in std_logic := '1'; -- default added
x2: in std_logic := '1'; -- default added
w0: in signed (b-1 downto 0) := to_signed(-32,b); --11100000 (-32) -- default added
w1: in signed (b-1 downto 0) := to_signed(63, b); --00111111 (63)
w2: in signed (b-1 downto 0) := to_signed(63, b); --00111111 (63)
u: out std_logic_vector (b-1 downto 0)
);
When run with ghdl the design specification produced a bounds failure in loop L2.
In the unlabeled process changing the declaration of prod:
variable prod: signed(b * 2 - 1 downto 0);
And the assignment to acc:
acc := acc + prod (b - 1 downto 0);
Allowed the calculation to complete, producing
neuronbehavioral.vhdl:58:9:#0ms:(report note): u = 01011110
With an added last statement to the process:
report "u = " & to_string (output(1));
For non VHDL-2008 compliant simulators a to_string function can be added to the declarative region of the process statement:
function to_string (inp: signed) return string is
variable image_str: string (1 to inp'length);
alias input_str: signed (1 to inp'length) is inp;
begin
for i in input_str'range loop
image_str(i) := character'value(std_ulogic'image(input_str(i)));
end loop;
return image_str;
end function;
Note the report value is the 8 bit signed value for 94.
Also the declarations for prod, acc and u should be examined to ensure the design is capable of producing a result within the bounds of input values for w0, w1 and w2.
Not only is VHDL strongly typed, it's particular about mathematical meaning. It's an error if a result is incorrect, hence the product of the "*" operator has a length sufficient to produce a valid mathematical result. This can be seen in the numeric_std package body.
With the above patches to the design specification the testbench produces:
ghdl -r neurontb
neuronbehavioral.vhdl:58:9:#0ms:(report note): u = 00000000
neuronbehavioral.vhdl:58:9:#100ns:(report note): u = 01011110
neuronbehavioral.vhdl:58:9:#200ns:(report note): u = 00011111
neuronbehavioral.vhdl:58:9:#300ns:(report note): u = 00011111
neuronbehavioral.vhdl:58:9:#400ns:(report note): u = 11100000
Because input(j) can only be "00000000" or "00000001" (based on the inputs x1 and x2), there's an alternative to the above changes:
prod := resize(input(j) * weight( m * (i - 1) + j), b);
The multiplier result can be resized to b bits (for signed operands, resize keeps the sign bit together with the least significant bits). The leftmost multiply is either by 0 or by 1.
Because the value of input(j) is either zero or one (as an 8 bit signed value) the first multiply can be eliminated:
architecture foo of neuronbehavioral is
type weights is array (1 to n*m) of signed(b-1 downto 0);
-- type inputs is array (1 to m) of signed(b-1 downto 0); -- CHANGED
type inputs is array (1 to m) of std_logic;
type outputs is array (1 to n) of signed(b-1 downto 0);
begin
process (w0, w1, w2, x1, x2)
variable weight: weights;
variable input: inputs;
variable output: outputs;
-- variable prod: signed(b * 2 - 1 downto 0); -- RESTORED:
variable prod: signed(b - 1 downto 0);
variable acc: signed(b - 1 downto 0);
function to_string (inp: signed) return string is
variable image_str: string (1 to inp'length);
alias input_str: signed (1 to inp'length) is inp;
begin
for i in input_str'range loop
image_str(i) := character'value(std_ulogic'image(input_str(i)));
end loop;
return image_str;
end function;
begin
-- input(1) := "0000000" & x1; -- CHANGED
-- input(2) := "0000000" & x2; -- CHANGED
input := x1 & x2; -- ADDED
weight(1) := w1;
weight(2) := w2;
l1:
for i in 1 to n loop
acc := (others => '0');
l2:
for j in 1 to m loop
if input(j) = '1' then -- ADDED
-- prod := input(j) * weight( m * (i - 1) + j); -- CHANGED
prod := weight(m * (i - 1) + j); -- ADDED
else -- ADDED
prod := (others => '0'); -- ADDED
end if; -- ADDED
-- acc := acc + prod (b - 1 downto 0); -- RESTORED:
acc := acc + prod;
end loop l2;
output(i) := acc + w0;
end loop l1;
u <= std_logic_vector(output(1));
report "u = " & to_string (output(1));
end process;
end architecture foo;
For the second multiplier, used in calculating the index for weight, observe that all the variables are either generic constants or declared implicitly in loop statements. While the latter are dynamically elaborated at execution time in VHDL, their value is considered static during traversal of the sequential statements in each loop statement.
The sequence of statements in a loop statement is unrolled in synthesis. The equivalent in concurrent statements would be through the use of a for generate statement replicating the various statements as concurrent statements. Note this would require signals (shared variables are neither portable nor guaranteed to be supported across disparate vendor tool chains).
A concurrent statement version would look something like:
architecture foo of neuronbehavioral is
type weights is array (1 to n*m) of signed(b - 1 downto 0);
type inputs is array (1 to m) of std_logic;
type outputs is array (1 to n) of signed(b - 1 downto 0);
signal weight: weights;
signal input: inputs;
signal output: outputs;
function to_string (inp: signed) return string is
variable image_str: string (1 to inp'length);
alias input_str: signed (1 to inp'length) is inp;
begin
for i in input_str'range loop
image_str(i) := character'value(std_ulogic'image(input_str(i)));
end loop;
return image_str;
end function;
begin
weight <= w1 & w2;
input <= x1 & x2;
l1:
for i in 1 to n generate
type accums is array (1 to m) of signed (b - 1 downto 0);
signal accum: accums;
function acc (inp: accums) return signed is
variable retval: signed (b - 1 downto 0) := (others => '0');
begin
for i in accums'range loop
retval := retval + inp(i);
end loop;
return retval;
end function;
begin
l2:
for j in 1 to m generate
accum(j) <= weight(m * (i - 1) + j) when input(j) = '1' else
(others => '0');
end generate;
output(i) <= acc(accum) + w0;
end generate;
u <= std_logic_vector(output(1));
MONITOR:
process
begin
wait on x1, x2, w0, w1, w2;
wait for 0 ns;
wait for 0 ns;
wait for 0 ns;
wait for 0 ns;
report "u = " & to_string (output(1));
end process;
end architecture foo;
Where no multiply is used and all the statically indexed elements are accumulated in two places. The wait for 0 ns; statements in the MONITOR process are to overcome delta delays in 0 delay assignment through successive signals. (Somewhere there's something doing discrete events, for x1 and x2 if for no other purpose.)
This gives the same answer as above:
ghdl -r neurontb
neuronbehavioral.vhdl:169:9:#100ns:(report note): u = 01011110
neuronbehavioral.vhdl:169:9:#200ns:(report note): u = 00011111
neuronbehavioral.vhdl:169:9:#300ns:(report note): u = 00011111
neuronbehavioral.vhdl:169:9:#400ns:(report note): u = 11100000
and represents the same hardware.

Declare a variable number of signals with variable bitwidth in VHDL'93

I'm trying to implement a generic adder tree similar to here. For storing the intermediate results, I need to declare a variable number of signals with variable bitwidth. For example:
4 input values with bitwidth = 8:
after first stage: 2 values with bitwidth = 9
after second stage: 1 value with bitwidth = 10
9 input values with bitwidth = 8:
after first stage: 5 values with bitwidth = 9
after second stage: 3 values with bitwidth = 10
after third stage: 2 values with bitwidth = 11
after fourth stage: 1 value with bitwidth = 12
I just found one solution to instantiate an array with length = # input values and bitwidth = bitwidth of the last signal. But I want to have something like the following. A record including the values of each stage concatenated to an std_logic_vector, but it's obviously not working:
lb(INPUT_VALUES) == number of stages
nr_val(i) == number of values at stage -> calculated in a separate function
type adder_stages is record
for i in 1 to lb(INPUT_VALUES) generate
stage(i-1) : std_logic_vector(nr_val(i)*(BITWIDTH+i)-1 downto 0);
end generate;
end record adder_stages;
Is it possible to declare a variable amount of signals with increasing bitwidth and dependent on the number of input values in VHDL '93?
Contrary to NiM's assertion that it's impossible to declare a variable amount of signals with increasing bitwidth and dependent on the number of input values in any version (revision) of VHDL, it is possible in -2008.
The secret is to use component instantiation recursion with an input port whose type is an unbounded array with an element subtype indication provided in the object declaration. The number of inputs and their length can be changed (number of inputs down, element subtype length up) in successive recursion levels. The output port is of a constant width and is driven by the lowest level adder output.
Defining an unbounded array definition with a deferred element subtype indication is not supported in -1993.
This code hasn't been verified other than guaranteeing the lengths and numbers of levels work correctly. It uses unsigned arithmetic because the OP didn't specify otherwise. Resize is used to increase the adder result length.
The report statements were used for debugging and can be removed (amazing how many simple errors you can make in something only mildly convoluted).
library ieee;
use ieee.std_logic_1164.all;
package adder_tree_pkg is
function clog2 (n: positive) return natural;
type input_array is array (natural range <>) of
std_logic_vector; -- -2008 unbounded array definition
function isodd (n: positive) return natural;
end package;
package body adder_tree_pkg is
function clog2 (n: positive) return natural is
variable r: natural := 0;
variable m: natural := n - 1;
begin
while m /= 0 loop
r := r + 1;
m := m / 2;
end loop;
return r;
end function clog2;
function isodd (n: positive) return natural is
begin
if (n/2 * 2 < n) then
return 1;
else
return 0;
end if;
end function;
end package body;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std_unsigned.all;
use work.adder_tree_pkg.all;
entity adder_tree_level is
generic (
constant INPUTS: positive := 9;
constant BITS: positive := 8;
constant LEVEL: positive := clog2(INPUTS);
constant Y_OUT_LEN: positive := LEVEL + BITS
);
port (
clk: in std_logic;
rst_n: in std_logic;
x_in: in input_array (INPUTS - 1 downto 0) (BITS - 1 downto 0);
y_out: out std_logic_vector (Y_OUT_LEN - 1 downto 0)
);
end entity;
architecture foo of adder_tree_level is
constant ODD_NUM_IN: natural := isodd(INPUTS);
constant NXT_INPS: natural := INPUTS/2 + ODD_NUM_IN;
signal x: input_array (INPUTS - 1 downto 0) (BITS - 1 downto 0);
signal nxt_x: input_array (NXT_INPS - 1 downto 0)
(BITS downto 0);
constant NPAIRS: natural := (INPUTS)/2;
begin
INPUT_REGISTER:
process (clk, rst_n)
begin
if rst_n = '0' then
x <= (others =>(others => '0'));
elsif rising_edge (clk) then
x <= x_in;
end if;
end process;
ADDERS:
process (x)
begin
report "LEVEL = " & integer'image(LEVEL);
report "y_out'length = " & integer'image(y_out'length);
report "nxt_x(0)'length = " & integer'image(nxt_x(0)'length);
for i in 0 to NPAIRS - 1 loop -- odd out is x'high ('left)
nxt_x(i) <= resize(x(i * 2), BITS + 1) + x(i * 2 + 1);
report "i * 2 = " & integer'image (i * 2);
report "i * 2 + 1 = " & integer'image (i * 2 + 1);
end loop;
if ODD_NUM_IN = 1 then
report "x'left = " & integer'image(x'left);
nxt_x(nxt_x'HIGH) <= resize(x(x'LEFT), BITS + 1);
end if;
end process;
RECURSE:
if LEVEL > 1 generate
NEXT_LEVEL:
entity work.adder_tree_level
generic map (
INPUTS => NXT_INPS,
BITS => BITS + 1,
LEVEL => LEVEL - 1,
Y_OUT_LEN => Y_OUT_LEN
)
port map (
clk => clk,
rst_n => rst_n,
x_in => nxt_x,
y_out => y_out
);
end generate;
OUTPUT:
if LEVEL = 1 generate
FINAL_OUTPUT:
y_out <= nxt_x(0);
end generate;
end architecture;
This example doesn't meet the criteria for answering Yes to the OP's question (which is a yes/no question) and simply refutes NiM's assertion that you can't do it in any version (revision) of VHDL.
Its ports are inspired by the Pipelined Adder Tree VHDL code found by the image the OP linked.
What you are asking for is not possible in any version of VHDL, v93 or otherwise. You can define a type inside a generate statement, but not use a generate within a type definition.
Your initial solution is the way that I would do it personally - if targeting an FPGA using modern tools the unused MSBs at each stage will be optimised away during synthesis, so the resulting circuit is as you've described with no additional overhead (i.e. the tools are clever enough to know that adding two 8-bit numbers can never occupy more than 9 bits).
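The fixed-width approach described above could look roughly like the following in VHDL-93 style. This is an unverified sketch: the entity name adder_tree_93, the flattened x_in port, the generic STAGES (assumed to be supplied by the user as the ceiling of log2(INPUTS)), and the type names row_t and tree_t are all assumptions for illustration. Every stage is stored at the final width W, and unsigned arithmetic is assumed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder_tree_93 is  -- hypothetical name
  generic (
    INPUTS : positive := 4;
    BITS   : positive := 8;
    STAGES : positive := 2   -- assumption: user supplies ceil(log2(INPUTS))
  );
  port (
    x_in  : in  std_logic_vector(INPUTS * BITS - 1 downto 0);  -- inputs, flattened
    y_out : out std_logic_vector(BITS + STAGES - 1 downto 0)
  );
end entity;

architecture sketch of adder_tree_93 is
  constant W : positive := BITS + STAGES;  -- width of the last stage
  type row_t  is array (0 to INPUTS - 1) of unsigned(W - 1 downto 0);
  type tree_t is array (0 to STAGES) of row_t;
begin
  process (x_in)
    variable tree : tree_t;
    variable n    : natural;  -- number of valid values in the current stage
  begin
    tree := (others => (others => (others => '0')));
    -- Stage 0: zero-extend each input to the final width.
    for i in 0 to INPUTS - 1 loop
      tree(0)(i) := resize(unsigned(x_in((i + 1) * BITS - 1 downto i * BITS)), W);
    end loop;
    n := INPUTS;
    for s in 1 to STAGES loop
      -- Add neighbouring pairs of the preceding stage.
      for j in 0 to n / 2 - 1 loop
        tree(s)(j) := tree(s - 1)(2 * j) + tree(s - 1)(2 * j + 1);
      end loop;
      -- Carry an odd value over unchanged.
      if n mod 2 = 1 then
        tree(s)(n / 2) := tree(s - 1)(n - 1);
      end if;
      n := (n + 1) / 2;
    end loop;
    y_out <= std_logic_vector(tree(STAGES)(0));
  end process;
end architecture;
```

Only part of each row is used after stage 0; whether the unused elements and upper bits are actually trimmed depends on the synthesis tool.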

Vivado synthesis: complex assignment not supported

I implemented a Booth modified multiplier in vhdl. I need to make a synthesis with Vivado but it's not possible because of this error:
"complex assignment not supported".
This is the shifter code that causes the error:
entity shift_register is
generic (
N : integer := 6;
M : integer := 6
);
port (
en_s : in std_logic;
cod_result : in std_logic_vector (N+M-1 downto 0);
position : in integer;
shift_result : out std_logic_vector(N+M-1 downto 0)
);
end shift_register;
architecture shift_arch of shift_register is
begin
process(en_s)
variable shift_aux : std_logic_vector(N+M-1 downto 0);
variable i : integer := 0; -- just for convenience
begin
if(en_s'event and en_s ='1') then
i := position;
shift_aux := (others => '0');
shift_aux(N+M-1 downto i) := cod_result(N+M-1-i downto 0); --ERROR!!
shift_result <= shift_aux ;
end if;
end process;
end shift_arch;
The Booth multiplier works with operands of any size, so I cannot replace this generic code with a specific one.
Please help me! Thanks a lot
There's a way to make your index addressing static for synthesis.
First, based on the loop we can tell position must have a value within the range of shift_aux, otherwise you'd end up with null slices (IEEE Std 1076-2008 8.5 Slice names).
That can be shown in the entity declaration:
library ieee;
use ieee.std_logic_1164.all;
entity shift_register is
generic (
N: integer := 6;
M: integer := 6
);
port (
en_s: in std_logic;
cod_result: in std_logic_vector (N + M - 1 downto 0);
position: in integer range 0 to N + M - 1 ; -- range ADDED
shift_result: out std_logic_vector(N + M - 1 downto 0)
);
end entity shift_register;
What's changed is the addition of a range constraint to the port declaration of position. The idea is to support simulation, where the default value of an integer is integer'left. Simulating your shift_register would fail on the rising edge of en_s if position (the actual driver) did not provide an initial value in the index range of shift_aux.
From a synthesis perspective an unbounded integer requires you take both positive and negative integer values in to account. Your for loop is only using positive integer values.
The same can be done in the declaration of the variable i in the process:
variable i: integer range 0 to N + M - 1 := 0; -- range ADDED
To address the immediate synthesis problem we look at the for loop.
Xilinx support issue AR# 52302 tells us the issue is using dynamic values for indexes.
The solution is to modify what the for loop does:
architecture shift_loop of shift_register is
begin
process (en_s)
variable shift_aux: std_logic_vector(N + M - 1 downto 0);
-- variable i: integer range 0 to N + M - 1 := 0; -- range ADDED
begin
if en_s'event and en_s = '1' then
-- i := position;
shift_aux := (others => '0');
for i in 0 to N + M - 1 loop
-- shift_aux(N + M - 1 downto i) := cod_result(N + M - 1 - i downto 0);
if i = position then
shift_aux(N + M - 1 downto i)
:= cod_result(N + M - 1 - i downto 0);
end if;
end loop;
shift_result <= shift_aux;
end if;
end process;
end architecture shift_loop;
If i becomes a static value when the loop is unrolled in synthesis it can be used in calculation of indexes.
Note this gives us an N + M input multiplexer where each input is selected when i = position.
This construct can actually be collapsed into a barrel shifter by optimization, although you might expect the number of variables involved for large values of N and M might take a prohibitive synthesis effort or simply fail.
When synthesis is successful you'll collapse each output element in the assignment into a separate multiplexer that will match Patrick's barrel shifter.
For sufficiently large values of N and M we can define the depth in number of multiplexer layers in the barrel shifter based on the number of bits in a binary expression of the integer range of distance.
That either requires a declared integer type or subtype for position or finding the log2 value of N + M. We can use the log2 value because it would only be used statically. (XST supports log2(x) where x is a Real for determining static values; the function is found in IEEE package math_real.) This gives us the binary length of position (how many bits are required to describe the shift distance, which is the number of levels of multiplexers).
architecture barrel_shifter of shift_register is
begin
process (en_s)
use ieee.math_real.all; -- log2 [real return real]
use ieee.numeric_std.all; -- to_unsigned, unsigned
constant DISTLEN: natural := integer(ceil(log2(real(N + M)))); -- binary length, rounded up
type muxv is array (0 to DISTLEN - 1) of
unsigned (N + M - 1 downto 0);
variable shft_aux: muxv;
variable distance: unsigned (DISTLEN - 1 downto 0);
begin
if en_s'event and en_s = '1' then
distance := to_unsigned(position, DISTLEN); -- position in binary
shft_aux := (others => (others =>'0'));
for i in 0 to DISTLEN - 1 loop
if i = 0 then
if distance(i) = '1' then
shft_aux(i) := SHIFT_LEFT(unsigned(cod_result), 2 ** i);
else
shft_aux(i) := unsigned(cod_result);
end if;
else
if distance(i) = '1' then
shft_aux(i) := SHIFT_LEFT(shft_aux(i - 1), 2 ** i);
else
shft_aux(i) := shft_aux(i - 1);
end if;
end if;
end loop;
shift_result <= std_logic_vector(shft_aux(DISTLEN - 1));
end if;
end process;
end architecture barrel_shifter;
XST also supports ** if the left operand is 2 and the value of i is treated as a constant in the sequence of statements found in a loop statement.
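The bit-length calculation behind DISTLEN can be tried on its own. In this small sketch the entity name distlen_sketch and the helper function bit_length are hypothetical; only log2 and ceil from ieee.math_real are relied on. It also illustrates a caveat: converting a real to integer rounds to the nearest value, so without ceil a width such as 9 would yield 3 bits, which cannot express a distance of 8.

```vhdl
library ieee;
use ieee.math_real.all;  -- log2, ceil

entity distlen_sketch is  -- hypothetical name, simulation only
end entity;

architecture sketch of distlen_sketch is
  -- Hypothetical helper: bits needed to express distances 0 to n - 1.
  function bit_length (n : positive) return natural is
  begin
    return natural(ceil(log2(real(n))));
  end function;
begin
  -- N + M = 12 (the defaults above): log2 is about 3.58, rounded up to 4.
  assert bit_length(12) = 4
    report "unexpected length for 12" severity failure;
  -- For 9, log2 is about 3.17: rounding up gives 4 bits ...
  assert bit_length(9) = 4
    report "unexpected length for 9" severity failure;
  -- ... whereas a plain conversion rounds to nearest and gives 3.
  assert integer(log2(real(9))) = 3
    report "unexpected rounding for 9" severity failure;
end architecture;
```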
This could be implemented with signals instead of variables or structurally in a generate statement instead of a loop statement inside a process, or even as a subprogram.
The basic idea here with these two architectures derived from yours is to produce something synthesis eligible.
The advantage of the second architecture over the first is in reduction in the amount of synthesis effort during optimization for larger values of N + M.
Neither of these architectures has been verified, because the original lacked a testbench. They both analyze and elaborate.
Writing a simple case testbench:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity shift_register_tb is
end entity;
architecture foo of shift_register_tb is
constant N: integer := 6;
constant M: integer := 6;
signal clk: std_logic := '0';
signal din: std_logic_vector (N + M - 1 downto 0)
:= (0 => '1', others => '0');
signal dout: std_logic_vector (N + M - 1 downto 0);
signal dist: integer := 0;
begin
DUT:
entity work.shift_register
generic map (
N => N,
M => M
)
port map (
en_s => clk,
cod_result => din,
position => dist,
shift_result => dout
);
CLOCK:
process
begin
wait for 10 ns;
clk <= not clk;
if now > (N + M + 2) * 20 ns then
wait;
end if;
end process;
STIMULI:
process
begin
for i in 1 to N + M loop
wait for 20 ns;
dist <= i;
din <= std_logic_vector(SHIFT_LEFT(unsigned(din),1));
end loop;
wait;
end process;
end architecture;
And simulating reveals that the range of position and the number of loop iterations only needs to cover the number of bits in the multiplier and not the multiplicand. We don't need a full barrel shifter.
That can be easily fixed in both shift_register architectures, and it has the side effect of making the shift_loop architecture much more attractive: it would be easier to synthesize based on the multiplier bit length (presumably M) and not the product bit length (N + M).
And that would give you:
library ieee;
use ieee.std_logic_1164.all;
entity shift_register is
generic (
N: integer := 6;
M: integer := 6
);
port (
en_s: in std_logic;
cod_result: in std_logic_vector (N + M - 1 downto 0);
position: in integer range 0 to M - 1 ; -- range ADDED
shift_result: out std_logic_vector(N + M - 1 downto 0)
);
end entity shift_register;
architecture shift_loop of shift_register is
begin
process (en_s)
variable shift_aux: std_logic_vector(N + M - 1 downto 0);
-- variable i: integer range 0 to M - 1 := 0; -- range ADDED
begin
if en_s'event and en_s = '1' then
-- i := position;
shift_aux := (others => '0');
for i in 0 to M - 1 loop
-- shift_aux(N + M - 1 downto i) := cod_result(N + M - 1 - i downto 0);
if i = position then -- This creates an N + M - 1 input MUX
shift_aux(N + M - 1 downto i)
:= cod_result(N + M - 1 - i downto 0);
end if;
end loop; -- The loop is unrolled in synthesis, i is CONSTANT
shift_result <= shift_aux;
end if;
end process;
end architecture shift_loop;
Modifying the testbench:
STIMULI:
process
begin
for i in 1 to M loop -- WAS N + M loop
wait for 20 ns;
dist <= i;
din <= std_logic_vector(SHIFT_LEFT(unsigned(din),1));
end loop;
wait;
end process;
gives a result showing the shifts are over the range of the multiplier value (specified by M).
So the moral here is you don't need a full barrel shifter, only one that works over the multiplier range and not the product range.
The last bit of code should be synthesis eligible.
You are trying to create a range using a run-time varying value, and this is not supported by the synthesis tool. cod_result(N+M-1 downto 0); would be supported, because N, M, and 1 are all known at synthesis time.
If you're trying to implement a multiplier, you will get the best result using x <= a * b, and letting the synthesis tool choose the best way to implement it. If you have operands wider than the multiplier widths in your device, then you need to look at the documentation to determine the best route, which will normally involve pipelining of some sort.
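A minimal sketch of that approach, assuming unsigned operands and reusing the generics N and M from the question. The entity name mult and its port names are made up for illustration, and the output register is an assumption, not something the question requires.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult is
    generic (
        N: positive := 6;   -- multiplicand width (assumed)
        M: positive := 6    -- multiplier width (assumed)
    );
    port (
        clk: in  std_logic;
        a:   in  std_logic_vector (N - 1 downto 0);
        b:   in  std_logic_vector (M - 1 downto 0);
        x:   out std_logic_vector (N + M - 1 downto 0)
    );
end entity mult;

architecture rtl of mult is
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- numeric_std "*" returns a'length + b'length bits, so the
            -- product fits x exactly; the tool chooses the implementation.
            x <= std_logic_vector(unsigned(a) * unsigned(b));
        end if;
    end process;
end architecture rtl;
```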
If you need a run-time variable shift, look for a 'Barrel Shifter'. There are existing answers on these, for example this one.

Multiplication with Fixed point representation in VHDL

For fixed-point arithmetic I represented 0.166 as 0000 0010101010100110 and multiplied it by itself. For this I wrote the VHDL code below. The output is assigned to y, which is a 41-bit signed. For signed multiplication, A(a1,b1)*A(a2,b2)=A(a1+a2+1,b1+b2). However, during simulation it gives this error:
Target Size 41 and source size 40 for array dimension 0 does not match.
code:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity file1 is
Port ( y : out signed(40 downto 0));
end file1;
architecture Behavioral of file1 is
signal a : signed(19 downto 0) := "00000010101010100110";
signal b : signed(19 downto 0) := "00000010101010100110";
begin
y<= (a*b); ----error
end Behavioral;
In numeric_std, multiplying two signed operands returns a vector of a'length + b'length bits, so 20 bits times 20 bits gives 40 bits (39 downto 0), while your port is 41 bits long. Those 40 bits are enough even in the worst case: the largest positive product is (-2^19) * (-2^19) = 2^38, which needs 39 magnitude bits plus 1 sign bit. For comparison, the maximum positive operands give 0x7FFFF * 0x7FFFF = 0x3FFFF00001, which already fits in 38 bits.
So you should either sign-extend the result by one more bit (bit 40 = bit 39) or simply declare a 40-bit output port:
Port ( y : out signed(39 downto 0))
If you really need the redundant 41st bit:
begin
y(39 downto 0) <= a * b;
y(40) <= y(39); -- reading an out port requires VHDL-2008; otherwise use an internal signal
end Behavioral;
Or just use the resize function for signed values (see: How to convert 8 bits to 16 bits in VHDL?).
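A sketch of the resize variant, keeping the entity and signal names from the question. For signed operands, resize from numeric_std sign-extends when widening, so bit 40 ends up as a copy of bit 39.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity file1 is
    port ( y : out signed(40 downto 0) );
end file1;

architecture Behavioral of file1 is
    signal a : signed(19 downto 0) := "00000010101010100110";
    signal b : signed(19 downto 0) := "00000010101010100110";
begin
    -- a * b is 40 bits wide; resize sign-extends it to y'length (41 bits).
    y <= resize(a * b, y'length);
end Behavioral;
```

Using y'length rather than a literal 41 keeps the assignment valid if the port width is changed later.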

Resources