VHDL: big slv array slicing indexed by integer (big mux)

I want to slice a std_logic_vector in VHDL into parts of a fixed width.
The general problem is:
din N*M bits
dout M bits
sel clog2(N) bits
Expected behaviour in an example (pseudocode): the input is 16 bits and I want to slice it into 4 subvectors of 4 bits each.
signal din : std_logic_vector(N*M-1 downto 0);
signal sel : integer range 0 to N-1;
-- with sel = 0
dout <= din(M-1 downto 0);
-- with sel = 1
dout <= din(2*M-1 downto M);
-- with sel = 2
dout <= din(3*M-1 downto 2*M);
-- .....
-- with sel = N-1
dout <= din(N*M-1 downto (N-1)*M);
I know a couple of ways to do this, but I don't know which one is best practice and gives the best results in synthesis.
the entity
din: in std_logic_vector(15 downto 0);
dout: out std_logic_vector(3 downto 0);
sel: in std_logic_vector(1 downto 0)
CASE STATEMENT
case sel is
when "00" => dout <= din(3 downto 0);
when "01" => dout <= din(7 downto 4);
when "10" => dout <= din(11 downto 8);
when "11" => dout <= din(15 downto 12);
when others => ....
end case;
It clearly implements a mux, but it's not generic at all, and if the input gets big it's really hard to write and to cover with code coverage.
INTEGER INDEXING
sel_int <= to_integer(unsigned(sel));
dout <= din(4*(sel_int+1) - 1 downto 4*sel_int);
Extremely easy to write and to maintain, BUT it can have problems when the number of slices is not a power of 2. For example, if I want to slice a 24-bit vector into chunks of 4, what happens when the integer conversion of sel yields index 7?
A STRANGE TRADEOFF
sel_int <= to_integer(unsigned(sel));
gen_slices : for i in 0 to 3 generate -- 4 slices of 4 bits for the 16-bit din
din_slice(i) <= din(4*(i+1)-1 downto 4*i);
end generate;
dout <= din_slice(sel_int);
I'm searching for a solution that is general enough to be used with various input/output relationships and safe enough to synthesize consistently every time.
The case statement is the only one with an others branch (which feels really safe); the other solutions rely on the slv-to-integer conversion and indexing, which feels comfortable but not so reliable.
Which solution would you use?
Practical use case
I have a 250-bit std_logic_vector and I need to select 10 contiguous bits inside it, starting from any position from 0 to 239. How can I do that in a way that is good for synthesis?

There is another option that is accepted by tools that support VHDL-2008 (which includes Vivado and Quartus Prime Pro). You can use an unconstrained 2D type from a package:
type slv_array_t is array(natural range <>) of std_logic_vector; --vhdl 2008 unconstrained array type
Then you can simply select which element you want, and it is as generic as you like.
library ieee;
use ieee.std_logic_1164.all;
use work.my_pkg.all;
entity mux is
generic (
N : natural;
M : natural
);
port (
sel : in natural range 0 to N-1; -- constrain sel to the valid indices
ip : in slv_array_t (N-1 downto 0)(M-1 downto 0);
op : out std_logic_vector (M-1 downto 0)
);
end entity;
architecture rtl of mux is
begin
op <= ip(sel);
end architecture;
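For tools without VHDL-2008 support, the same idea can be sketched on a flat std_logic_vector. The following is only a minimal sketch, not the answer's method: the entity name slice_mux and the generic SEL_W are assumptions made for illustration. It unrolls a loop over statically bounded slices and gives out-of-range select values a defined result, which addresses the "index 7 of a 24-bit vector" concern from the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name and generics, chosen for illustration only.
entity slice_mux is
  generic (
    N     : positive := 6;  -- number of slices (need not be a power of two)
    M     : positive := 4;  -- bits per slice
    SEL_W : positive := 3   -- width of sel, at least ceil(log2(N))
  );
  port (
    din  : in  std_logic_vector(N*M-1 downto 0);
    sel  : in  std_logic_vector(SEL_W-1 downto 0);
    dout : out std_logic_vector(M-1 downto 0)
  );
end entity slice_mux;

architecture rtl of slice_mux is
begin
  process (din, sel)
    variable idx : natural;
  begin
    idx  := to_integer(unsigned(sel));
    dout <= (others => '0');  -- defined result for select values >= N
    -- The loop unrolls into N comparisons, each with a static slice.
    for i in 0 to N-1 loop
      if idx = i then
        dout <= din(M*(i+1)-1 downto M*i);
      end if;
    end loop;
  end process;
end architecture rtl;
```

With the default generics, a sel value of 6 or 7 returns all zeros instead of indexing past the end of din. Whether zero is the right filler is a design decision; it is an assumption here.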

First you must extend the incoming data so that you always have as many bits as you need to connect all multiplexer inputs (see the code below, process p_extend).
This will not create any logic at synthesis.
Second you must convert the resulting vector into an array, which you can later access by an index (see the code below, process p_create_array).
Again this will not create any logic at synthesis.
Finally you must access this array with the select input signal (see the code below, statement p_mux).
library ieee;
use ieee.std_logic_1164.all;
entity mux is
generic (
g_data_width : natural := 250;
g_slice_width : natural := 10;
g_sel_width : natural := 5;
g_start_point : natural := 27
);
port (
d_i : in std_logic_vector(g_data_width-1 downto 0);
sel_i : in std_logic_vector(g_sel_width-1 downto 0);
d_o : out std_logic_vector(g_slice_width-1 downto 0)
);
end entity mux;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
architecture struct of mux is
signal data : std_logic_vector(g_slice_width * 2**g_sel_width-1 downto 0);
type t_std_logic_slice_array is array (natural range <>) of std_logic_vector(g_slice_width-1 downto 0);
signal mux_in : t_std_logic_slice_array (2**g_sel_width-1 downto 0);
begin
p_extend: process(d_i)
begin
for i in 0 to g_slice_width * 2**g_sel_width-1 loop
if i+g_start_point<g_data_width then
data(i) <= d_i(i+g_start_point);
else
data(i) <= '0';
end if;
end loop;
end process;
p_create_array: process (data)
begin
for i in 0 to 2**g_sel_width-1 loop
mux_in(i) <= data((i+1)*g_slice_width-1 downto i*g_slice_width);
end loop;
end process;
p_mux: d_o <= mux_in(to_integer(unsigned(sel_i)));
end architecture;
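For the use case in the question, where the window may start at any bit position rather than on a slice boundary, one alternative is to shift the whole vector right by the start position and keep the low bits. This is a different technique from the array-based mux above, shown only as a sketch: the entity name window_select and the port name start_i are assumptions, and the resulting barrel shifter can be large for a 250-bit input.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity, for illustration only.
entity window_select is
  generic (
    g_data_width  : positive := 250;
    g_slice_width : positive := 10;
    g_sel_width   : positive := 8   -- wide enough for start positions 0 to 239
  );
  port (
    d_i     : in  std_logic_vector(g_data_width-1 downto 0);
    start_i : in  std_logic_vector(g_sel_width-1 downto 0);
    d_o     : out std_logic_vector(g_slice_width-1 downto 0)
  );
end entity window_select;

architecture rtl of window_select is
begin
  -- shift_right fills with '0' from the left, so a start position beyond
  -- g_data_width - g_slice_width yields zero-padded upper bits, not an error.
  -- resize on unsigned keeps the rightmost g_slice_width bits.
  d_o <= std_logic_vector(
           resize(shift_right(unsigned(d_i), to_integer(unsigned(start_i))),
                  g_slice_width));
end architecture rtl;
```

Both shift_right and resize come from ieee.numeric_std, so no out-of-range index can occur for any value of start_i.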

Related

Accessing array elements using std_logic_vector (VHDL)

Please see the code below:
....
port(
the_input: in std_logic_vector(0 to 3));
...
type dummy_array is array (0 to 2) of std_logic_vector (0 to 7);
signal ins_dummy: dummy_array := ( 8x"1", 8x"2", 8x"3");
...
Now I want to access the elements of this array using bits the_input(0 to 1). How can I do this? As far as I know, an array accepts integers as indices, but this input is a std_logic_vector. I tried many solutions from different forums, but nothing seems to work. For example, when I apply to_integer(unsigned(the_input(0 to 1))), the result is zero.
What is happening? I don't know. Any suggestions?
Using the small testbench below, I was able to access elements of the array using the method you mentioned -> some_array(to_integer(unsigned(some_signal))).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use std.textio.all;
use ieee.std_logic_textio.all;
entity test is
end entity test;
architecture behav of test is
signal the_input : std_logic_vector(0 to 3);
signal test_sig : std_logic_vector(7 downto 0);
type dummy_array is array(0 to 2) of std_logic_vector(7 downto 0);
signal ins_dummy : dummy_array := (x"01", x"02", x"03");
begin
test_sig <= ins_dummy(to_integer(unsigned(the_input)));
process
begin
wait for 1 ns;
the_input <= "0000";
wait for 1 ns;
the_input <= "0001";
wait for 1 ns;
the_input <= "0010";
end process;
end architecture behav;
However, this is a simulation, and a synthesizer may complain because the range of the port the_input is larger than the number of array elements. You might have to add logic to ensure that out-of-bounds array indices cannot be accessed. Hope that helps. Possibly try:
test_sig <= ins_dummy(to_integer(unsigned(the_input))) when unsigned(the_input) < 3 else
(others => '0');

How to write a record to memory and get it back in VHDL?

In VHDL pseudo-code what I would like to achieve is:
type tTest is record
A : std_logic_vector(3 downto 0);
B : std_logic_vector(7 downto 0);
C : std_logic_vector(0 downto 0);
end record tTest;
. . .
signal sTestIn : tTest;
signal sMemWrData : std_logic_vector(fRecordLen(tTest)-1 downto 0);
signal sMemRdData : std_logic_vector(fRecordLen(tTest)-1 downto 0);
signal sTestOut : tTest;
. . .
sMemWrData <= fRecordToVector(sTestIn);
-- At some point sMemRdData gets the data in sMemWrData...
sTestOut <= fVectorToRecord(sMemRdData);
fRecordLen is an imaginary function that returns the aggregate length of the record directly from the type, and fRecordToVector and fVectorToRecord are hopefully self-explanatory. The target is synthesizable code that doesn't produce any extra logic. I post my current solution as an answer to further clarify the operation. However, this is an extremely awkward method, and I don't consider it a feasible solution due to the amount of boilerplate code.
I am aware of the record introspection proposal, but I'm not holding my breath, and even the proposed method seems very cumbersome.
I've given up hope for a fully general solution, so some concessions are more than acceptable. For example, allowing only std_logic_vectors in the record and using several function/procedure calls would be fine. However, it would be great to avoid any boilerplate code that must be adjusted by hand or by an external script on a per-record basis.
Also, if any Verilog/SystemVerilog wrappers exist that can input/output the record directly and achieve the same, pointers are extremely welcome.
One way to translate data from a vector (a linear array) to a record would be through the use of an aggregate.
library ieee;
use ieee.std_logic_1164.all;
package TestPck is
subtype A is std_logic_vector (12 downto 9);
subtype B is std_logic_vector (8 downto 1);
subtype C is std_logic_vector (0 downto 0);
constant ABC_len: natural := A'length + B'length + C'length;
type tTest is record
A: std_logic_vector (A'RANGE);
B: std_logic_vector (B'RANGE);
C: std_logic_vector (C'RANGE);
end record tTest;
type tTests is array (natural range <>) of tTest;
end package TestPck;
library ieee;
use ieee.std_logic_1164.all;
use work.TestPck.all;
entity tb is
end entity tb;
architecture sim of tb is
signal sTestIn: tTest;
signal sMemWrData: std_logic_vector(ABC_len - 1 downto 0);
signal sMemRdData: std_logic_vector(ABC_len - 1 downto 0);
signal sTestOut: tTest;
constant tests: tTests (0 to 1) :=
(0 => (x"E", x"A7", "1"), 1 => (x"7", x"AC", "0"));
begin
sMemWrData <= sTestIn.A & sTestIn.B & sTestIn.C;
sMemRdData <= sMemWrData after 5 ns;
sTestOut <=
tTest'(sMemRdData(A'range), sMemRdData(B'range), SMemRdData(C'range));
process is
begin
wait for 10 ns;
sTestIn <= tests(0);
wait for 10 ns;
sTestIn <= tests(1);
wait for 10 ns;
wait;
end process;
end architecture sim;
The qualified expression defines the aggregate as a value of the tTest record type with positional association, which is assigned to the signal sTestOut of that record type.
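And this gives (waveform image not reproduced): sTestOut takes each test value 5 ns after it is applied to sTestIn.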
So you can use concatenation for assembling a vector value (or an aggregate in -2008) and use an aggregate as a qualified expression to transfer sMemRdData to sTestOut.
If you have no plans to declare an object of an A, B or C subtype you can declare them as integer subtypes:
library ieee;
use ieee.std_logic_1164.all;
package TestPck is
subtype A is natural range 12 downto 9;
subtype B is natural range 8 downto 1;
subtype C is natural range 0 downto 0;
constant ABC_len: natural := A'left + 1;
type tTest is record
A: std_logic_vector (A);
B: std_logic_vector (B);
C: std_logic_vector (C);
end record tTest;
type tTests is array (natural range <>) of tTest;
end package TestPck;
library ieee;
use ieee.std_logic_1164.all;
use work.TestPck.all;
entity tb is
end entity tb;
architecture sim of tb is
signal sTestIn: tTest;
signal sMemWrData: std_logic_vector(ABC_len - 1 downto 0);
signal sMemRdData: std_logic_vector(ABC_len - 1 downto 0);
signal sTestOut: tTest;
constant tests: tTests (0 to 1) :=
(0 => (x"E", x"A7", "1"), 1 => (x"7", x"AC", "0"));
begin
sMemWrData <= sTestIn.A & sTestIn.B & sTestIn.C;
sMemRdData <= sMemWrData after 5 ns;
sTestOut <=
tTest'(sMemRdData(A), sMemRdData(B), SMemRdData(C));
process is
begin
wait for 10 ns;
sTestIn <= tests(0);
wait for 10 ns;
sTestIn <= tests(1);
wait for 10 ns;
wait;
end process;
end architecture sim;
This may be a little easier to read. It'll produce the same waveform above.
This is one way to achieve what is requested. The shortcomings and improvement ideas are in the comments.
library ieee;
use ieee.std_logic_1164.all;
package TestPck is
type tTest is record
A : std_logic_vector(3 downto 0);
B : std_logic_vector(7 downto 0);
C : std_logic_vector(0 downto 0);
end record tTest;
procedure pSliceToFrom (
signal vec_to : out std_logic_vector;
signal vec_from : in std_logic_vector;
position : inout integer
);
end package TestPck;
package body TestPck is
procedure pSliceToFrom (
signal vec_to : out std_logic_vector;
signal vec_from : in std_logic_vector;
position : inout integer
) is
begin
vec_to <= vec_from(position-1 downto position-vec_to'length);
position := position-vec_to'length;
end pSliceToFrom;
end package body TestPck;
library ieee;
use ieee.std_logic_1164.all;
use work.TestPck.all;
entity tb is
end entity tb;
architecture sim of tb is
signal sTestIn : tTest;
-- How to create this constant in the package,
-- i.e. without needing the signal?
constant cTestLength : integer := sTestIn.A'length + sTestIn.B'length + sTestIn.C'length;
signal sMemWrData : std_logic_vector(cTestLength-1 downto 0);
signal sMemRdData : std_logic_vector(cTestLength-1 downto 0);
signal sTestOut : tTest;
begin
-- How to make this without needing to know what
-- is inside tTest?
sMemWrData <= sTestIn.A & sTestIn.B & sTestIn.C;
-- Memory, Fifo, communication link, doesn't matter...
sMemRdData <= sMemWrData after 5 ns;
-- How to get the data back without needing this
-- process (and the procedure)?
slice_data_to_item : process (all) is
variable vPosition : integer := 0;
begin
vPosition := cTestLength;
pSliceToFrom(sTestOut.A, sMemRdData, vPosition);
pSliceToFrom(sTestOut.B, sMemRdData, vPosition);
pSliceToFrom(sTestOut.C, sMemRdData, vPosition);
end process slice_data_to_item;
process is
begin
wait for 10 ns;
sTestIn <= (x"E", x"A7", "1");
wait for 10 ns;
sTestIn <= (x"7", x"AC", "0");
wait;
end process;
end architecture sim;
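The fRecordToVector and fVectorToRecord functions named in the question can be sketched as an ordinary pair of package functions. This does not remove the per-record boilerplate the question asks to avoid; it only gathers it in one place. The package name TestConvPck, the constant cTestLen and the subtype tTestVec are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: the conversion functions are written by hand per record.
package TestConvPck is
  type tTest is record
    A : std_logic_vector(3 downto 0);
    B : std_logic_vector(7 downto 0);
    C : std_logic_vector(0 downto 0);
  end record tTest;

  constant cTestLen : natural := 4 + 8 + 1;  -- sum of the field widths
  subtype tTestVec is std_logic_vector(cTestLen-1 downto 0);

  function fRecordToVector (r : tTest) return tTestVec;
  function fVectorToRecord (v : tTestVec) return tTest;
end package TestConvPck;

package body TestConvPck is
  function fRecordToVector (r : tTest) return tTestVec is
    variable v : tTestVec;
  begin
    v := r.A & r.B & r.C;  -- A ends up in the most significant bits
    return v;
  end function fRecordToVector;

  function fVectorToRecord (v : tTestVec) return tTest is
    variable r : tTest;
  begin
    r.A := v(12 downto 9);
    r.B := v(8 downto 1);
    r.C := v(0 downto 0);
    return r;
  end function fVectorToRecord;
end package body TestConvPck;
```

Both functions are pure rewiring, so they should not add logic in synthesis. The slice bounds in fVectorToRecord must be kept in step with the record by hand, which is exactly the maintenance cost the question describes.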

Query on VHDL generics in packages

I have written a simple VHDL code to add two matrices containing 32 bit floating point numbers. The matrix dimensions have been defined in a package. Currently, I specify the matrix dimensions in the vhdl code and use the corresponding type from the package. However, I would like to use generic in the design to deal with matrices of different dimensions. For this I would have to somehow use the right type defined in the package. How do I go about doing this?
My current VHDL code is as below.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use work.mat_pak.all;
entity newproj is
Port ( clk : in STD_LOGIC;
clr : in STD_LOGIC;
start : in STD_LOGIC;
A_in : in t2;
B_in : in t2;
AplusB : out t2;
parallel_add_done : out STD_LOGIC);
end newproj;
architecture Behavioral of newproj is
COMPONENT add
PORT (
a : IN STD_LOGIC_VECTOR(31 DOWNTO 0);
b : IN STD_LOGIC_VECTOR(31 DOWNTO 0);
clk : IN STD_LOGIC;
sclr : IN STD_LOGIC;
ce : IN STD_LOGIC;
result : OUT STD_LOGIC_VECTOR(31 DOWNTO 0);
rdy: OUT STD_LOGIC
);
END COMPONENT;
signal temp_out: t2 := (others=>(others=>(others=>'0')));
signal add_over: t2bit:=(others=>(others=>'0'));
signal check_all_done,init_val: std_logic:='0';
begin
init_val <= '1';
g0: for k in 0 to 1 generate
g1: for m in 0 to 1 generate
add_instx: add port map(A_in(k)(m), B_in(k)(m), clk, clr, start, temp_out(k)(m), add_over(k)(m));
end generate;
end generate;
g2: for k in 0 to 1 generate
g3: for m in 0 to 1 generate
check_all_done <= add_over(k)(m) and init_val;
end generate;
end generate;
p1_add:process(check_all_done,temp_out)
begin
AplusB <= temp_out;
parallel_add_done <= check_all_done;
end process;
end Behavioral;
My package is as below
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.ALL;
package mat_pak is
subtype small_int is integer range 0 to 2;
type t22 is array (0 to 1) of std_logic_vector(31 downto 0);
type t2 is array (0 to 1) of t22; --2*2 matrix
type t22bit is array (0 to 1) of std_logic;
type t2bit is array (0 to 1) of t22bit; --2*2 matrix bit
type t33 is array (0 to 2) of std_logic_vector(31 downto 0);
type t3 is array (0 to 2) of t33; --3*3 matrix
end mat_pak;
Any suggestions would be welcome. Thank you.
There are some logical issues with your design.
First, there's some maximum number of ports for a sub-hierarchy that a design can tolerate; you have 192 'bits' of matrix inputs and outputs. Do you really believe this number should be configurable?
At some point it will only fit in very large FPGA devices, and shortly thereafter it will not fit there either.
Imagine that the operation in add takes a variable number of clocks and that parallel_add_done signifies when an aplusb datum is available, comprised of the matrix elements contributed by all instantiated add components; then the individual rdy signals have to be ANDed together. If the adds all take the same amount of time, you could take the rdy from any one of them (if your silicon were not that deterministic it would not be usable; there are registers in add).
The nested generate statements all assign the result of the AND between add_over(k,m) and init_val (which is a synthesis constant of 1) to the same signal. The effect is a wired AND of the add_over(k,m) bits (which doesn't work in VHDL and is likely not achievable in synthesis, either).
Note I also showed the proper indexing method for the two dimensional arrays.
Using Jonathan's method of sizing matrixes:
library ieee;
use ieee.std_logic_1164.all;
package mat_pak is
type matrix is array (natural range <>, natural range <>)
of std_logic_vector(31 downto 0);
type bmatrix is array (natural range <>, natural range <>)
of std_logic;
end package mat_pak;
library ieee;
use ieee.std_logic_1164.all;
use work.mat_pak.all;
entity newproj is
generic ( size: natural := 2 );
port (
clk: in std_logic;
clr: in std_logic;
start: in std_logic;
a_in: in matrix (0 to size - 1, 0 to size - 1);
b_in: in matrix (0 to size - 1, 0 to size - 1);
aplusb: out matrix (0 to size - 1, 0 to size - 1);
parallel_add_done: out std_logic
);
end entity newproj;
architecture behavioral of newproj is
component add
port (
a: in std_logic_vector(31 downto 0);
b: in std_logic_vector(31 downto 0);
clk: in std_logic;
sclr: in std_logic;
ce: in std_logic;
result: out std_logic_vector(31 downto 0);
rdy: out std_logic
);
end component;
signal temp_out: matrix (0 to size - 1, 0 to size - 1)
:= (others => (others => (others => '0')));
signal add_over: bmatrix (0 to size - 1, 0 to size - 1)
:= (others => (others => '0'));
begin
g0:
for k in 0 to size - 1 generate
g0x:
for m in 0 to size - 1 generate
add_instx: add
port map (
a => a_in(k,m),
b => b_in(k,m),
clk => clk,
sclr => clr,
ce => start,
result => temp_out(k,m),
rdy => add_over(k,m)
);
end generate;
end generate;
aplusb <= temp_out;
p1_add:
process (add_over)
variable check_all_done: std_logic;
begin
check_all_done := '1';
for k in 0 to size - 1 loop
for m in 0 to size -1 loop
check_all_done := check_all_done and add_over(k,m);
end loop;
end loop;
parallel_add_done <= check_all_done;
end process;
end architecture behavioral;
We find that we really want to AND the various rdy outputs (the add_over array) together. In VHDL-2008 this can be done with the unary AND; otherwise you're counting on the synthesis tool to flatten the chain of ANDs (and they generally do).
I made the assignment to aplusb a concurrent assignment.
I dummied up an add entity with an empty architecture; the above then analyzes, elaborates and simulates, which shows that none of the connectivity has length mismatches anywhere.
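The unary AND mentioned above can be sketched as follows. It requires a VHDL-2008 tool, and as far as I know the reduction operators apply to one-dimensional vectors such as std_logic_vector, so a two-dimensional bmatrix would first have to be copied into a flat vector. The entity name all_ready and the generic count are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; compile in VHDL-2008 mode.
entity all_ready is
  generic ( count : positive := 4 );
  port (
    rdy      : in  std_logic_vector(count - 1 downto 0);
    all_done : out std_logic
  );
end entity all_ready;

architecture rtl of all_ready is
begin
  -- Unary (reduction) AND: '1' only when every element of rdy is '1'.
  all_done <= and rdy;
end architecture rtl;
```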
I'm not sure I understand the question perfectly, but I'll try to answer anyway ;)
You can use unconstrained array like this:
library ieee;
use ieee.std_logic_1164.all;
package mat_pak is
type matrix is array(natural range <>, natural range <>) of std_logic_vector(31 downto 0);
end package mat_pak;
entity newproj is
Generic ( size : natural );
Port ( clk : in STD_LOGIC;
clr : in STD_LOGIC;
start : in STD_LOGIC;
A_in : in matrix(0 to size-1, 0 to size-1);
B_in : in matrix(0 to size-1, 0 to size-1);
AplusB : out matrix(0 to size-1, 0 to size-1);
parallel_add_done : out STD_LOGIC);
end newproj;

Dynamic Array Size in VHDL

I want to use a dynamic array range, so I convert an incoming vector signal to the integer "N". Using the incoming port "size" gives me an error, while a fixed vector produces the expected output.
architecture EXAMPLE of Computation is
signal size :std_logic_vector (7 downto 0);
process (ACLK, SLAVE_ARESETN) is
variable N: integer:=conv_integer ("00000111") ; ---WORKING
--variable N: integer:=conv_integer (size) ; -- Not working
type memory is array (N downto 0 ) of std_logic_vector (31 downto 0 );
variable RAM :memory;
The only reason for this kind of coding is to send as much data as possible to the FPGA, as I need to send data from DDR to a custom IP via DMA in Vivado, possibly more than 100 MB. Kindly guide me if the approach above is the wrong way to implement this.
You can't do that in VHDL. What kind of hardware would be generated by your code? If you don't know, the synthesizer won't either.
The way to do this kind of thing is to set N to the largest value you want to support, and use size in your logic to control your logic appropriately. It's difficult to give more pointers without more information, but as an example, you could use a counter to address your ram, and have it reset when it's greater than size.
Update
Here's a counter example. You have to make sure that size doesn't change while operating or it will fall into an unknown state. A real design should have reset states to ensure correct behaviour.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity example is
port (
clk : in std_logic;
rst : in std_logic;
size : in unsigned(7 downto 0);
wr : in std_logic;
din : in std_logic_vector(31 downto 0)
);
end entity;
architecture rtl of example is
signal counter : unsigned(7 downto 0);
type ram_t is array(0 to 255) of std_logic_vector(31 downto 0);
signal ram : ram_t;
begin
RAM_WR: process(clk)
begin
if rising_edge(clk) then
if rst = '1' then
counter <= (others => '0');
else
if wr = '1' then
ram(to_integer(counter)) <= din;
if counter = size then
counter <= (others => '0');
else
counter <= counter + 1;
end if;
end if;
end if;
end if;
end process RAM_WR;
end architecture rtl;
I believe you can only use a generic as an array constraint in a process. Otherwise, the compiler cannot elaborate the design.
In a function or procedure, you can have truly variable array bounds.
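As a minimal sketch of that last point, a function can size a local variable from the bounds of its argument, so one body serves vectors of any length. The package name vec_util and the function reverse_bits are hypothetical names used only for illustration. For synthesis the bounds still have to be known at elaboration for each call, so this does not give a RAM whose size changes at run time.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package, for illustration only.
package vec_util is
  function reverse_bits (v : std_logic_vector) return std_logic_vector;
end package vec_util;

package body vec_util is
  function reverse_bits (v : std_logic_vector) return std_logic_vector is
    -- The bounds of r are taken from the actual argument at each call.
    variable r : std_logic_vector(v'range);
  begin
    for i in v'range loop
      r(i) := v(v'low + v'high - i);  -- mirror the index within the range
    end loop;
    return r;
  end function reverse_bits;
end package body vec_util;
```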

Making a 4-bit ALU from several 1-bit ALUs

I'm trying to combine several 1 bit ALUs into a 4 bit ALU. I am confused about how to actually do this in VHDL. Here is the code for the 1bit ALU that I am using:
entity alu1 is -- define the 1-bit ALU
port(a, b: in std_logic_vector(1 downto 0);
m: in std_logic_vector(1 downto 0);
result: out std_logic_vector(1 downto 0));
end alu1;
architecture behv1 of alu1 is
begin
process(a, b, m)
begin
case m is
when "00" =>
result <= a + b;
when "01" =>
result <= a + (not b) + 1;
when "10" =>
result <= a and b;
when "11" =>
result <= a or b;
end case;
end process;
end behv1;
I am assuming I define alu1 as a component of the larger entity alu4, but how can I tie them together?
Interesting you would even ask that question. VHDL synthesizers are quite capable of inferring any adder you like. You can just type what you need:
use ieee.numeric_std.all;
...
signal r : unsigned(3 downto 0);
signal a : unsigned(2 downto 0);
signal b : unsigned(2 downto 0);
signal c : unsigned(2 downto 0);
...
r <= resize(a, r'length) + b + c; -- widen one operand so the sum matches the width of r
Then you can slice r to fit your needs:
result <= std_logic_vector(r(2 downto 0));
You can't (easily) string together these 1-bit ALUs into a functional multiple bit version. There is no way to handle the carry in/out needed for your add and subtract modes to work properly (the bitwise and & or should work OK, however).
Ignoring the carry issue for the moment, you would typically just set up a for-generate loop and instantiate multiple copies of your bitwise logic, possibly special-casing the first and/or last elements, i.e.:
MyLabel : for bitindex in 0 to 3 generate
begin
alu_x4 : entity work.alu1
port map (
a => input_a(bitindex),
b => input_b(bitindex),
m => mode,
result => result_x4(bitindex) );
end generate;
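To make the carry point concrete, here is a sketch of a 1-bit slice that does have carry ports, chained four times by a generate loop. It is not the alu1 from the question: the entity alu_slice, its cin/cout ports and the signal names are assumptions for illustration. The mode encoding follows the question ("00" add, "01" subtract, "10" and, "11" or).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 1-bit slice with carry in/out (unlike alu1 in the question).
entity alu_slice is
  port (
    a, b, cin : in  std_logic;
    m         : in  std_logic_vector(1 downto 0);
    result    : out std_logic;
    cout      : out std_logic
  );
end entity alu_slice;

architecture rtl of alu_slice is
  signal b_eff : std_logic;
begin
  -- Subtraction is a + (not b) + 1; the "+ 1" arrives through cin of bit 0.
  b_eff <= not b when m = "01" else b;
  with m select result <=
    a xor b_eff xor cin when "00" | "01",
    a and b             when "10",
    a or b              when others;
  cout <= (a and b_eff) or (cin and (a xor b_eff));
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

entity alu4 is
  port (
    a, b   : in  std_logic_vector(3 downto 0);
    m      : in  std_logic_vector(1 downto 0);
    result : out std_logic_vector(3 downto 0)
  );
end entity alu4;

architecture structural of alu4 is
  signal carry : std_logic_vector(4 downto 0);
begin
  carry(0) <= '1' when m = "01" else '0';  -- carry-in of 1 completes the subtract
  gen_bits : for i in 0 to 3 generate
    slice_i : entity work.alu_slice
      port map (a => a(i), b => b(i), cin => carry(i), m => m,
                result => result(i), cout => carry(i+1));
  end generate gen_bits;
end architecture structural;
```

The final carry(4) is left unconnected here; a real design might expose it as a carry or overflow flag.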

Resources