Query on VHDL generics in packages - vhdl

I have written simple VHDL code to add two matrices containing 32-bit floating point numbers. The matrix dimensions are defined in a package. Currently, I specify the matrix dimensions in the VHDL code and use the corresponding type from the package. However, I would like to use a generic in the design to deal with matrices of different dimensions. For this I would somehow have to select the right type defined in the package. How do I go about doing this?
My current VHDL code is as below.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use work.mat_pak.all;
entity newproj is
Port ( clk : in STD_LOGIC;
clr : in STD_LOGIC;
start : in STD_LOGIC;
A_in : in t2;
B_in : in t2;
AplusB : out t2;
parallel_add_done : out STD_LOGIC);
end newproj;
architecture Behavioral of newproj is
COMPONENT add
PORT (
a : IN STD_LOGIC_VECTOR(31 DOWNTO 0);
b : IN STD_LOGIC_VECTOR(31 DOWNTO 0);
clk : IN STD_LOGIC;
sclr : IN STD_LOGIC;
ce : IN STD_LOGIC;
result : OUT STD_LOGIC_VECTOR(31 DOWNTO 0);
rdy: OUT STD_LOGIC
);
END COMPONENT;
signal temp_out: t2 := (others=>(others=>(others=>'0')));
signal add_over: t2bit:=(others=>(others=>'0'));
signal check_all_done,init_val: std_logic:='0';
begin
init_val <= '1';
g0: for k in 0 to 1 generate
g1: for m in 0 to 1 generate
add_instx: add port map(A_in(k)(m), B_in(k)(m), clk, clr, start, temp_out(k)(m), add_over(k)(m));
end generate;
end generate;
g2: for k in 0 to 1 generate
g3: for m in 0 to 1 generate
check_all_done <= add_over(k)(m) and init_val;
end generate;
end generate;
p1_add:process(check_all_done,temp_out)
begin
AplusB <= temp_out;
parallel_add_done <= check_all_done;
end process;
end Behavioral;
My package is as below
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.ALL;
package mat_pak is
subtype small_int is integer range 0 to 2;
type t22 is array (0 to 1) of std_logic_vector(31 downto 0);
type t2 is array (0 to 1) of t22; --2*2 matrix
type t22bit is array (0 to 1) of std_logic;
type t2bit is array (0 to 1) of t22bit; --2*2 matrix bit
type t33 is array (0 to 2) of std_logic_vector(31 downto 0);
type t3 is array (0 to 2) of t33; --3*3 matrix
end mat_pak;
Any suggestions would be welcome. Thank you.

There are some logical issues with your design.
First, there's some maximum number of ports that a sub-hierarchy of a design can tolerate. Your three 2x2 matrix ports of 32-bit elements already come to 384 'bits' of matrix inputs and outputs. Do you really believe this number should be configurable?
At some point it will only fit in the very large FPGA devices, and shortly thereafter not fit there either.
Suppose add takes a variable number of clocks, and parallel_add_done signifies when an aplusb datum, made up of the matrix elements contributed by all instantiated add components, is available. Then the individual rdy signals have to be ANDed together. If the adds all take the same amount of time, you could take rdy from any one of them (if your silicon were not that deterministic it would not be usable; there are registers in add).
The nested generate statements all assign the result of the AND between add_over(k)(m) and init_val (which is a synthesis constant of 1) to check_all_done. The effect is wire-ANDing the add_over(k)(m) bits together, which doesn't work in VHDL (every generate iteration adds another driver, and std_logic resolution is not an AND) and is likely not achievable in synthesis either.
Note I also showed the proper indexing method for the two dimensional arrays.
Using Jonathan's method of sizing matrixes:
library ieee;
use ieee.std_logic_1164.all;
package mat_pak is
type matrix is array (natural range <>, natural range <>)
of std_logic_vector(31 downto 0);
type bmatrix is array (natural range <>, natural range <>)
of std_logic;
end package mat_pak;
library ieee;
use ieee.std_logic_1164.all;
use work.mat_pak.all;
entity newproj is
generic ( size: natural := 2 );
port (
clk: in std_logic;
clr: in std_logic;
start: in std_logic;
a_in: in matrix (0 to size - 1, 0 to size - 1);
b_in: in matrix (0 to size - 1, 0 to size - 1);
aplusb: out matrix (0 to size - 1, 0 to size - 1);
parallel_add_done: out std_logic
);
end entity newproj;
architecture behavioral of newproj is
component add
port (
a: in std_logic_vector(31 downto 0);
b: in std_logic_vector(31 downto 0);
clk: in std_logic;
sclr: in std_logic;
ce: in std_logic;
result: out std_logic_vector(31 downto 0);
rdy: out std_logic
);
end component;
signal temp_out: matrix (0 to size - 1, 0 to size - 1)
:= (others => (others => (others => '0')));
signal add_over: bmatrix (0 to size - 1, 0 to size - 1)
:= (others => (others => '0'));
begin
g0:
for k in 0 to size - 1 generate
g0x:
for m in 0 to size - 1 generate
add_instx: add
port map (
a => a_in(k,m),
b => b_in(k,m),
clk => clk,
sclr => clr,
ce => start,
result => temp_out(k,m),
rdy => add_over(k,m)
);
end generate;
end generate;
aplusb <= temp_out;
p1_add:
process (add_over)
variable check_all_done: std_logic;
begin
check_all_done := '1';
for k in 0 to size - 1 loop
for m in 0 to size -1 loop
check_all_done := check_all_done and add_over(k,m);
end loop;
end loop;
parallel_add_done <= check_all_done;
end process;
end architecture behavioral;
We find that we really want to AND the various rdy outputs (the add_over array) together. In VHDL-2008 this can be done with the unary AND; otherwise you're counting on the synthesis tool to flatten the AND chain built in the loop (and they generally do).
I made the assignment to aplusb a concurrent assignment.
So I dummied up an add entity with an empty architecture, the above then analyzes, elaborates and simulates, which shows that none of the connectivity has length mismatches anywhere.
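The AND over every rdy bit that the p1_add process performs could also be packaged as a reusable function. This is a minimal sketch, assuming it is acceptable to put a helper next to the bmatrix type; the package name mat_reduce_pak and the function name all_set are hypothetical and not part of the original answer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package mat_reduce_pak is
  -- Same shape as bmatrix in mat_pak; repeated here so the sketch stands alone.
  type bmatrix is array (natural range <>, natural range <>) of std_logic;

  -- Hypothetical helper: returns '1' only if every element of m is '1'.
  function all_set (m : bmatrix) return std_logic;
end package mat_reduce_pak;

package body mat_reduce_pak is
  function all_set (m : bmatrix) return std_logic is
    variable acc : std_logic := '1';
  begin
    -- 'range(1) and 'range(2) follow whatever bounds the caller used,
    -- so the function works for any matrix size.
    for k in m'range(1) loop
      for j in m'range(2) loop
        acc := acc and m(k, j);
      end loop;
    end loop;
    return acc;
  end function all_set;
end package body mat_reduce_pak;
```

With such a helper the process would reduce to the concurrent assignment parallel_add_done <= all_set(add_over);, provided add_over is declared with the bmatrix type from the same package as the function.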

I'm not quite sure I understand the question perfectly, but I'll try to answer anyway ;)
You can use unconstrained array like this:
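library ieee;
use ieee.std_logic_1164.all;
package mat_pak is
type matrix is array(natural range <>, natural range <>) of std_logic_vector(31 downto 0);
end package mat_pak;
library ieee;
use ieee.std_logic_1164.all;
use work.mat_pak.all;
entity newproj is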
Generic ( size : natural );
Port ( clk : in STD_LOGIC;
clr : in STD_LOGIC;
start : in STD_LOGIC;
A_in : in matrix(0 to size-1, 0 to size-1);
B_in : in matrix(0 to size-1, 0 to size-1);
AplusB : out matrix(0 to size-1, 0 to size-1);
parallel_add_done : out STD_LOGIC);
end newproj;
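The size is then chosen where the entity is instantiated. The following is a minimal sketch of a 3x3 instance, assuming newproj and mat_pak are compiled into library work as shown; the wrapper name top3x3, its signal names, and the constant operand values are hypothetical placeholders.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use work.mat_pak.all;

entity top3x3 is
  port (
    clk, clr, start : in  std_logic;
    done            : out std_logic
  );
end entity top3x3;

architecture rtl of top3x3 is
  constant SIZE : natural := 3;
  -- The same unconstrained type serves every dimension; only the bounds change.
  signal a, b, s : matrix(0 to SIZE-1, 0 to SIZE-1);
begin
  -- Placeholder operands: 1.0 and 2.0 in IEEE-754 single precision.
  a <= (others => (others => x"3F800000"));
  b <= (others => (others => x"40000000"));

  u_add : entity work.newproj
    generic map (size => SIZE)
    port map (
      clk               => clk,
      clr               => clr,
      start             => start,
      A_in              => a,
      B_in              => b,
      AplusB            => s,
      parallel_add_done => done
    );
end architecture rtl;
```

Changing SIZE is the only edit needed for another dimension, which is the point of replacing the fixed t2 and t3 types.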

Related

VHDL: big slv array slicing indexed by integer (big mux)

I want to slice a std_logic_vector in VHDL obtaining parts of it of fixed dimensions.
The general problem is:
din N*M bits
dout M bits
sel clog2(N) bits
Expected behaviour in an example (pseudocode): input 16 bit, want to slice it in 4 subvectors of 4bit each.
signal in: std_logic_vector(N*M-1 downto 0);
signal sel: integer;
-- with sel = 0
output <= in(M-1:0);
-- with sel = 1
output <= in(2M-1:M);
-- with sel = 2
output <= in(3M-1:2M);
.....
-- with sel = N-1
output <= in(N*M-1:(N-1)M);
I know a couple of ways to do this, but I don't know which one is best practice and gives the best results in synthesis.
the entity
din: in std_logic_vector(15 downto 0);
dout: out std_logic_vector(3 downto 0);
sel: in std_logic_vector(1 downto 0)
CASE STATEMENT
case sel is
when "00" => dout <= din(3 downto 0);
when "01" => dout <= din(7 downto 4);
when "10" => dout <= din(11 downto 8);
when "11" => dout <= din(15 downto 12);
when others => ....
It clearly implements a mux, but it's not generic at all, and if the input gets big it's really hard to write and to cover in code coverage.
INTEGER INDEXING
sel_int <= to_integer(unsigned(sel));
dout <= din(4*(sel_int+1) - 1 downto 4*sel_int);
Extremely easy to write and to maintain, BUT it can have problems when the input width is not a power of 2. For example, if I want to slice a 24-bit vector into chunks of 4, what happens when the integer conversion of sel yields index 7?
A STRANGE TRADEOFF
sel_int <= to_integer(unsigned(sel));
gen_slices: for i in 0 to 3 generate
din_slice(i) <= din(4*(i+1)-1 downto 4*i);
end generate;
dout <= din_slice(sel_int);
I'm looking for a solution that is general enough to be used with various input/output relationships and safe enough to be synthesized consistently every time.
The case statement is the only one with an others branch (which feels really safe); the other solutions rely on the slv-to-integer conversion and indexing, which feels really comfortable but not so reliable.
Which solution would you use?
practical usecase
I have a 250-bit std_logic_vector and I need to select 10 contiguous bits inside of it, starting from a certain point from 0 to 239. How can I do that in a way that is good for synthesis?
There is another option that is accepted by tools that support VHDL-2008 (which includes Vivado and Quartus Prime Pro). You can use an unconstrained array-of-vectors type from a package:
type slv_array_t is array(natural range <>) of std_logic_vector; --vhdl 2008 unconstrained array type
then you can simply select which port you want. And it is as generic as you like.
library ieee;
use ieee.std_logic_1164.all;
use work.my_pkg.all;
entity mux is
generic (
N : natural;
M : natural
);
port (
sel : in natural;
ip : in slv_array_t (N-1 downto 0)(M-1 downto 0);
op : out std_logic_vector (M-1 downto 0)
);
end entity;
architecture rtl of mux is
begin
op <= ip(sel);
end architecture;
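The package my_pkg referenced by the entity above is not shown in the answer. A minimal sketch of what it could contain follows, assuming a VHDL-2008 tool; the conversion function to_slv_array is a hypothetical helper added here to turn a flat N*M-bit vector into the array the mux expects.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package my_pkg is
  -- VHDL-2008: array whose element type is itself unconstrained.
  type slv_array_t is array (natural range <>) of std_logic_vector;

  -- Hypothetical helper: split v into n slices of m bits, slice 0 at the LSB end.
  function to_slv_array (v : std_logic_vector; n, m : natural) return slv_array_t;
end package my_pkg;

package body my_pkg is
  function to_slv_array (v : std_logic_vector; n, m : natural) return slv_array_t is
    -- Normalize the bounds of v so the slicing below does not depend on the caller.
    alias va   : std_logic_vector(v'length-1 downto 0) is v;
    variable r : slv_array_t(n-1 downto 0)(m-1 downto 0);
  begin
    assert v'length = n * m
      report "to_slv_array: vector length must equal n*m"
      severity failure;
    for i in 0 to n-1 loop
      r(i) := va((i+1)*m-1 downto i*m);
    end loop;
    return r;
  end function to_slv_array;
end package body my_pkg;
```

For the 16-bit example in the question, a signal declared as slv_array_t(3 downto 0)(3 downto 0) could be assigned to_slv_array(din, 4, 4) and then connected to ip. The conversion is only wiring, so it should not add logic in synthesis.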
First, you must extend the incoming data so that there are always as many bits as you need to connect all multiplexer inputs (see the code below, process p_extend).
This will not create any logic at synthesis.
Second, you must convert the resulting vector into an array, which you can access later by an index (see the code below, process p_create_array).
Again, this will not create any logic at synthesis.
Finally, you must access this array with the select input signal (see the code below, process p_mux).
library ieee;
use ieee.std_logic_1164.all;
entity mux is
generic (
g_data_width : natural := 250;
g_slice_width : natural := 10;
g_sel_width : natural := 5;
g_start_point : natural := 27
);
port (
d_i : in std_logic_vector(g_data_width-1 downto 0);
sel_i : in std_logic_vector(g_sel_width-1 downto 0);
d_o : out std_logic_vector(g_slice_width-1 downto 0)
);
end entity mux;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
architecture struct of mux is
signal data : std_logic_vector(g_slice_width * 2**g_sel_width-1 downto 0);
type t_std_logic_slice_array is array (natural range <>) of std_logic_vector(g_slice_width-1 downto 0);
signal mux_in : t_std_logic_slice_array (2**g_sel_width-1 downto 0);
begin
p_extend: process(d_i)
begin
for i in 0 to g_slice_width * 2**g_sel_width-1 loop
if i+g_start_point<g_data_width then
data(i) <= d_i(i+g_start_point);
else
data(i) <= '0';
end if;
end loop;
end process;
p_create_array: process (data)
begin
for i in 0 to 2**g_sel_width-1 loop
mux_in(i) <= data((i+1)*g_slice_width-1 downto i*g_slice_width);
end loop;
end process;
p_mux: d_o <= mux_in(to_integer(unsigned(sel_i)));
end architecture;
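For the practical use case in the question (10 contiguous bits starting anywhere from 0 to 239), a different and simpler technique than the array-based mux above is to index bit by bit with a range-constrained integer. This is only a sketch under that assumption; the entity name slice_sel and the names DATA_W, SLICE_W and start are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity slice_sel is
  generic (
    DATA_W  : natural := 250;
    SLICE_W : natural := 10
  );
  port (
    din   : in  std_logic_vector(DATA_W-1 downto 0);
    -- The range constraint documents the legal start points and makes the
    -- simulator report any value that would index outside din.
    start : in  natural range 0 to DATA_W - SLICE_W;
    dout  : out std_logic_vector(SLICE_W-1 downto 0)
  );
end entity slice_sel;

architecture rtl of slice_sel is
begin
  process (din, start)
  begin
    -- Each output bit is a DATA_W-to-1 style selection driven by start.
    for i in 0 to SLICE_W-1 loop
      dout(i) <= din(start + i);
    end loop;
  end process;
end architecture rtl;
```

If the start point arrives as a std_logic_vector, it would be converted with to_integer(unsigned(...)) from ieee.numeric_std before driving start, and values above 240 would have to be excluded by the surrounding logic.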

VHDL No drivers exist on out port

I am doing my first project in VHDL, I try to implement 8-bit barrel shifter using mux.
This is code for one block (8 mux in chain):
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE work.sample_package.all;
-------------------------------------
ENTITY Shifter IS
GENERIC (n : INTEGER );
PORT ( x,y: IN STD_LOGIC_VECTOR (n-1 DOWNTO 0);
redB: IN Integer;
out_m: OUT STD_LOGIC_VECTOR(n-1 downto 0));
END Shifter;
--------------------------------------------------------------
ARCHITECTURE dfl OF Shifter IS
SIGNAL sm : STD_LOGIC;
SIGNAL what_b : STD_LOGIC;
BEGIN
--redB in the number of the red block in the diagram
--The first mux port map is the same for all three blocks
sm <= y(redB);
first : MUX port map(
a => x(0),
b => '0',
s0 => sm,
y => out_m(0)
);
b0: if redB=0 generate --First block - only the first mux has b=0
rest : for i in 1 to n-1 generate
chain : MUX port map(
a => x(i),
b => x(i-1),
s0 => sm,
y => out_m(i)
);
end generate;
end generate;
b1: if redB=1 generate
rest : for i in 1 to n-1 generate
what_b <= '0' when i=1 else --Second block - 2 first mux has b=0
x(i-2);
chain : MUX port map(
a => x(i),
b => what_b,
s0 => sm,
y => out_m(i)
);
end generate;
end generate;
b2: if redB=2 generate
rest : for i in 1 to n-1 generate
what_b <= '0' when i=1 or i=2 or i=3 else --Third block - 4 first mux has b=0
x(i-4);
chain : MUX port map(
a => x(i),
b => what_b,
s0 => sm,
y => out_m(i)
);
end generate;
end generate;
END dfl;
And this is the code for chaining the 3 shifters:
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE work.sample_package.all;
-------------------------------------
ENTITY Barrel IS
GENERIC (n : INTEGER);
PORT ( x,y: IN STD_LOGIC_VECTOR (n-1 DOWNTO 0);
out_shifter0,out_shifter1,out_shifter2: OUT STD_LOGIC_VECTOR(n-1 downto 0));
END Barrel;
--------------------------------------------------------------
ARCHITECTURE dfl OF Barrel IS
SIGNAL temp_out0 : std_logic_vector(n-1 DOWNTO 0);
SIGNAL temp_out1 : std_logic_vector(n-1 DOWNTO 0);
SIGNAL temp_out2 : std_logic_vector(n-1 DOWNTO 0);
BEGIN
y0: Shifter GENERIC MAP(n) port map (x=>x,y=>y,redB=>0,out_m=>temp_out0);
out_shifter0 <= temp_out0;
y1: Shifter GENERIC MAP(n) port map (x=>temp_out0,y=>y,redB=>1,out_m=>temp_out1);
out_shifter1 <= temp_out1;
y2: Shifter GENERIC MAP(n) port map (x=>temp_out1,y=>y,redB=>2,out_m=>temp_out2);
out_shifter2 <= temp_out2;
END dfl;
All the files are compiling, but when I try to run a simulation I get this warning:
# ** Warning: (vsim-8684) No drivers exist on out port /tb/L0/y1/out_m(7 downto 1), and its initial value is not used.
#
# Therefore, simulation behavior may occur that is not in compliance with
#
# the VHDL standard as the initial values come from the base signal /tb/L0/temp_out1(7 downto 1).
I am using ModelSim.
Anyone got any idea of what could be the problem?
Thanks!
You have used a signal (the port redB) in the generate conditions and compared its value to a constant. Generate statements are evaluated during elaboration, before the simulation starts, using the initial value of redB. An integer without an explicit initial value starts at integer'left (about -2^31), and the values you assign from outside are not applied until after the simulation has started, so none of the generate conditions is true and none of the generate blocks exist. Hence there are no drivers for out_m. Instead of using a signal in the generate condition, use a generic, as generic values are fixed and assigned during elaboration.
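A minimal sketch of the shifter stage with redB moved to a generic, so the generate conditions are static at elaboration. It assumes the MUX from sample_package selects b when s0 is '1' and replaces it with conditional assignments so the sketch stands alone; the constant SHIFT and the generate labels are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Shifter is
  generic (
    n    : integer := 8;
    redB : integer := 0   -- stage index, now a generic instead of a port
  );
  port (
    x, y  : in  std_logic_vector(n-1 downto 0);
    out_m : out std_logic_vector(n-1 downto 0)
  );
end entity Shifter;

architecture dfl of Shifter is
  -- Stage redB shifts by 1, 2 or 4 positions for redB = 0, 1, 2.
  constant SHIFT : integer := 2**redB;
begin
  stage : for i in 0 to n-1 generate
    -- The lowest SHIFT bits have nothing to shift in, so they take '0'.
    low : if i < SHIFT generate
      out_m(i) <= '0' when y(redB) = '1' else x(i);
    end generate low;
    -- All other bits take the bit SHIFT positions below them.
    high : if i >= SHIFT generate
      out_m(i) <= x(i - SHIFT) when y(redB) = '1' else x(i);
    end generate high;
  end generate stage;
end architecture dfl;
```

In Barrel each instance would then pass the stage number through the generic map, for example generic map (n => n, redB => 1), and drop redB from the port map. Giving every bit its own assignment also avoids the single what_b signal being driven from every iteration of the for-generate.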

I can't understand why my waveform is coming out this way

I am very new to VHDL coding and I have been trying to debug my code for a 32-bit adder/subtractor. The N-bit adder/subtractor is composed of multiple 1-bit adder/subtractors using a generate statement. I have been testing it with 6-bit inputs in simulation. The waveform is consistently incorrect and I have tried changing just about everything. Maybe it is a problem with the delays and the generate statement not cycling through correctly. (I am just beginning to learn how to code in VHDL.)
My 1-bit adder/subtractor
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity addsub_1bit is
Port ( in_0 : in STD_LOGIC;
in_1 : in STD_LOGIC;
cin : in STD_LOGIC;
AddOrSub : in STD_LOGIC;
sum_sub : out STD_LOGIC;
cout_bout : out STD_LOGIC);
end addsub_1bit;
architecture data_flow_addsub_1bit of addsub_1bit is
begin
sum_sub <= (in_1 and (not in_0) and (not cin)) or ((not in_1) and in_0 and (not cin)) or ((not in_1) and (not in_0) and cin) or (in_1 and in_0 and cin) after 19 ns;
cout_bout <= (in_1 and in_0 and (not AddOrSub)) or ((not in_1)and in_1 and cin) or ((not in_1)and cin and AddOrSub) or (in_0 and cin) or (in_1 and cin and AddOrSub) after 19 ns;
end data_flow_addsub_1bit;
The N-bit adder/subtractor:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
ENTITY adder_sub32 is
GENERIC (BW : INTEGER :=32);
PORT ( a_32 : IN STD_LOGIC_VECTOR (BW -1 downto 0);
b_32 : IN STD_LOGIC_VECTOR (BW -1 downto 0);
cin : IN STD_LOGIC;
sub : IN STD_LOGIC;
sum_32 : out STD_LOGIC_VECTOR (BW -1 downto 0);
cout : INOUT STD_LOGIC ;
ov : OUT STD_LOGIC ); -- ov stands for overflow
END adder_sub32 ;
ARCHITECTURE adder_sub32_arch OF adder_sub32 IS
signal tmp : std_logic_vector (BW downto 0);
BEGIN
tmp(0) <= cin;
gen: for i IN 0 TO BW-1 GENERATE
as1: entity work.addsub_1bit
PORT MAP(
in_0 => a_32(i),
in_1 => b_32(i),
cin => tmp(i),
AddOrSub => sub,
sum_sub => sum_32(i),
cout_bout => tmp(i+1));
end GENERATE;
ov <= tmp(BW) after 95 ns;
END ARCHITECTURE;
My testbench:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY adder_sub32_TB_SHan_53967364 IS
END adder_sub32_TB_SHan_53967364;
ARCHITECTURE behavior OF adder_sub32_TB_SHan_53967364 IS
-- Component Declaration for the Unit Under Test (UUT)
COMPONENT adder_sub32 IS
GENERIC (BW : INTEGER :=32);
PORT ( a_32 : IN STD_LOGIC_VECTOR (BW -1 downto 0);
b_32 : IN STD_LOGIC_VECTOR (BW -1 downto 0);
cin : IN STD_LOGIC ;
sub : IN STD_LOGIC ;
sum_32 : out STD_LOGIC_VECTOR (BW -1 downto 0);
cout : INOUT STD_LOGIC ;
ov : OUT STD_LOGIC ); -- ov stands for overflow
END COMPONENT;
signal a : std_logic_vector(5 downto 0); --:= (others => '0');
signal b : std_logic_vector(5 downto 0); --:= (others => '0');
signal cin : std_logic;
signal sub : std_logic;
signal cout : std_logic;
signal sum_32 : std_logic_vector(5 downto 0);
signal ov : std_logic;
BEGIN
test1: adder_sub32
GENERIC MAP (6)
PORT MAP (a_32 => a,b_32 => b,cin => cin,sub => sub,sum_32 => sum_32,cout => cout,ov => ov);
sub <= '0';
cin <= '0';
a <= "101010";
b <= "110101";
END;
The waveform I got:
The final sum is correct ("101010" + "110101" = "011111") in this case, but not in all cases.
EDIT2: Let's take a closer look at why the carry is not rippling as expected in your addition. Bits 0 (LSB) to 4 of the operands together request that the carry-in is propagated from bit 0 to the carry-in of bit 5. Bit 5 of the operands generates a carry, which is the carry-out of the adder. As the cin of bit 0 is '0', all intermediate carry-ins will be '0' too, but that value should still ripple through the carry chain.
Now let's take a look at the one-bit adder. You are adding two numbers, so AddOrSub is '0'. With this, the equation of cout_bout can be simplified to:
cout_bout <= (in_1 and in_0) or (in_0 and cin);
This equation is definitely wrong, because the carry-in is not propagated when in_1 = '1' and in_0 = '0'. Thus, some of the intermediate carries will be computed as '0' just after 19 ns without waiting for the rippling carry. The corresponding sum bit will be valid after 38 ns, as shown in your waveform. The final value of the sum is not affected because this short-circuited carry happens to be identical to the expected rippling carry. Please keep in mind that all the 1-bit adders (generated by the generate statement) work concurrently.
To fix the equation, I recommend writing a testbench for the 1-bit adder. This testbench would have to check all 16 possible input combinations of in_0, in_1, cin, and AddOrSub.
Another test case would be to add the above two operands with a cin of '1'.
(End of EDIT2.)
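The recommended testbench for the 1-bit adder could be sketched as follows. It walks through all 16 input combinations but only asserts the addition cases (AddOrSub = '0') against the standard full-adder equations; the expected values in subtraction mode depend on the borrow convention, which the question does not state, so they are left out. The testbench name addsub_1bit_tb and the 40 ns settle time are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity addsub_1bit_tb is
end entity addsub_1bit_tb;

architecture sim of addsub_1bit_tb is
  signal in_0, in_1, cin, AddOrSub : std_logic := '0';
  signal sum_sub, cout_bout        : std_logic;
begin
  dut : entity work.addsub_1bit
    port map (
      in_0 => in_0, in_1 => in_1, cin => cin, AddOrSub => AddOrSub,
      sum_sub => sum_sub, cout_bout => cout_bout
    );

  stimulus : process
    variable v        : unsigned(3 downto 0);
    variable exp_sum  : std_logic;
    variable exp_cout : std_logic;
  begin
    for i in 0 to 15 loop
      v := to_unsigned(i, 4);
      in_0     <= v(0);
      in_1     <= v(1);
      cin      <= v(2);
      AddOrSub <= v(3);
      wait for 40 ns;  -- longer than the 19 ns delay inside the design

      if AddOrSub = '0' then
        -- Reference full adder: sum is the XOR of all three inputs,
        -- carry is the majority of the three inputs.
        exp_sum  := in_0 xor in_1 xor cin;
        exp_cout := (in_0 and in_1) or (in_0 and cin) or (in_1 and cin);
        assert sum_sub = exp_sum
          report "sum mismatch for input combination " & integer'image(i)
          severity error;
        assert cout_bout = exp_cout
          report "carry mismatch for input combination " & integer'image(i)
          severity error;
      end if;
    end loop;
    report "addsub_1bit_tb finished";
    wait;
  end process stimulus;
end architecture sim;
```

With the cout_bout equation from the question, this should flag the combinations where in_1 = '1', in_0 = '0' and cin = '1', which is the missing propagate case described above.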
The ov is correct too in this case, but not in all cases.
EDIT: You mixed up the overflow ov with the carry-out cout. The overflow flag indicates an overflow in the signed number space. For the addition, the overflow flag is '1' if and only if:
the addition of two positive numbers results in a negative sum, or
the addition of two negative numbers results in a positive sum.
For subtraction it is the other way round.
Because this is a homework question, I will not solve it completely. But I will give you a test case where your current logic fails: if you add 1 ("000001") and -1 ("111111"), then the sum must be zero, the overflow '0' and the carry-out '1'. (End of EDIT.)
The cout is 'U' because you haven't connected it in adder_sub32. The carry-out is the top-most bit in your carry-chain, and thus:
cout <= tmp(BW);
And you should fix the direction of cout in adder_sub32. The carry-out is just an output of this component. So declare it as out instead of inout.

Or Reduce An Array of Vectors

Needs to be placed on a real board, so will have to synthesize.
Using an old VHDL, libraries included:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;
use ieee.std_logic_misc.all;
Some signals:
type my_array is array (N-1 downto 0) of std_logic_vector(31 downto 0);
signal enable : my_array;
signal ored_enable: std_logic_vector(31 downto 0);
Signals get joined up in a generator:
my_gen: for i in 0 to (N-1) generate
woah: entity work.my_entity
port map(
clk => clk,
enable => enable(i)
);
end generate;
ored_enable <= or_reduce(enable); -- this fails
I'm just trying to create a std_logic_vector which holds the ored signals from the array. Any ideas how I can simply achieve this?
First, I expect your last line to be a typo and read
ored_enable <= or_reduce(enable);
But this wouldn't work since or_reduce is only defined for std_logic_vector, not array of std_logic_vector. You can create your own reduce function:
function or_reduce(a : my_array) return std_logic_vector is
variable ret : std_logic_vector(31 downto 0) := (others => '0');
begin
for i in a'range loop
ret := ret or a(i);
end loop;
return ret;
end function or_reduce;
Just put it in your architecture's declarations and it should work.

Making a 4-bit ALU from several 1-bit ALUs

I'm trying to combine several 1 bit ALUs into a 4 bit ALU. I am confused about how to actually do this in VHDL. Here is the code for the 1bit ALU that I am using:
component alu1 -- define the 1 bit alu component
port(a, b: std_logic_vector(1 downto 0);
m: in std_logic_vector(1 downto 0);
result: out std_logic_vector(1 downto 0));
end alu1;
architecture behv1 of alu1 is
begin
process(a, b, m)
begin
case m is
when "00" =>
result <= a + b;
when "01" =>
result <= a + (not b) + 1;
when "10" =>
result <= a and b;
when "11" =>
result <= a or b;
end case
end process
end behv1
I am assuming I define alu1 as a component of the larger entity alu4, but how can I tie them together?
Interesting you would even ask that question. VHDL synthesizers are quite capable of inferring any adder you like. You can just type what you need:
use ieee.numeric_std.all;
...
signal r : unsigned(3 downto 0);
signal a : unsigned(2 downto 0);
signal b : unsigned(2 downto 0);
signal c : unsigned(2 downto 0);
...
r <= a + b + c;
Then you can slice r to fit your needs:
result <= std_logic_vector(r(2 downto 0));
You can't (easily) string these 1-bit ALUs together into a functional multi-bit version. There is no way to handle the carry in/out needed for your add and subtract modes to work properly (the bitwise and and or modes should work OK, however).
Ignoring the carry issue for the moment, you would typically just set up a for-generate loop and instantiate multiple copies of your bitwise logic, possibly special-casing the first and/or last elements, i.e.:
MyLabel : for bitindex in 0 to 3 generate
begin
alu_x4 : entity work.alu1
port map (
a => input_a(bitindex),
b => input_b(bitindex),
m => mode,
result => result_x4(bitindex) );
end generate;
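If the 1-bit slice is given carry-in and carry-out ports, chaining does become possible. The following is a minimal sketch under that assumption, not the asker's alu1: the names alu_slice and alu4 are hypothetical, each operand is a single bit per slice, and the mode encoding ("00" add, "01" subtract, "10" and, "11" or) is taken from the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity alu_slice is
  port (
    a, b, cin : in  std_logic;
    m         : in  std_logic_vector(1 downto 0);
    result    : out std_logic;
    cout      : out std_logic
  );
end entity alu_slice;

architecture rtl of alu_slice is
  signal bx : std_logic;
begin
  -- Subtraction is a + (not b) + 1, so b is inverted in mode "01".
  bx <= not b when m = "01" else b;

  with m select result <=
    a xor bx xor cin when "00" | "01",
    a and b          when "10",
    a or b           when others;

  -- The carry is only meaningful in the two arithmetic modes.
  cout <= (a and bx) or (a and cin) or (bx and cin);
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

entity alu4 is
  port (
    a, b   : in  std_logic_vector(3 downto 0);
    m      : in  std_logic_vector(1 downto 0);
    result : out std_logic_vector(3 downto 0);
    cout   : out std_logic
  );
end entity alu4;

architecture structural of alu4 is
  signal c : std_logic_vector(4 downto 0);  -- carry chain, c(0) is the carry-in
begin
  -- The "+ 1" of the two's complement enters as carry-in of bit 0.
  c(0) <= '1' when m = "01" else '0';

  slices : for i in 0 to 3 generate
    u_slice : entity work.alu_slice
      port map (
        a => a(i), b => b(i), cin => c(i), m => m,
        result => result(i), cout => c(i+1)
      );
  end generate slices;

  cout <= c(4);
end architecture structural;
```

The for-generate is the same pattern as in the answer above; the only addition is the c vector that carries each slice's cout into the next slice's cin.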

Resources