Memory with non zero low index - any caveats? - vhdl

In VHDL, a common way to declare a memory is:
type mem0_type is array (0 to MEM0_SIZE-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
signal mem0 : mem0_type;
For ease of use later in a memory and register addressing table, I am considering:
type mem0_type is array (MEM0_ADDR to MEM0_ADDR+MEM0_SIZE-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
signal mem0 : mem0_type;
To be able to do something like this:
case switch is
when mem0'range => mem0(switch) <= data;
when mem1'range => mem1(switch) <= data;
when mem2'range => mem2(switch) <= data;
when mem3'range => mem3(switch) <= data;
when REG0_ADDR => reg0 <= data;
when REG1_ADDR => reg1 <= data;
when REG2_ADDR => reg2 <= data;
...
end case;
Instead of something like:
case switch is
when MEM0_ADDR to MEM0_ADDR+MEM0_SIZE-1 => mem0(switch-MEM0_ADDR) <= data;
when MEM1_ADDR to MEM1_ADDR+MEM1_SIZE-1 => mem1(switch-MEM1_ADDR) <= data;
when MEM2_ADDR to MEM2_ADDR+MEM2_SIZE-1 => mem2(switch-MEM2_ADDR) <= data;
when REG0_ADDR => reg0 <= data;
when REG1_ADDR => reg1 <= data;
when REG2_ADDR => reg2 <= data;
...
end case;
Is there any downside to using a non-zero index as the start address if my synthesis tool allows it?
Thank you for taking the time, I'd love to see more HDL activity on stack <3

You can do as suggested, but since synthesis rules are not strictly standardised, I wouldn't bet on every synthesis tool reliably detecting and inferring memory from your construct.
This appears to be a perfect use case for an alternative construct using an alias:
entity mem is
generic
(
MEM_SIZE : natural;
DATA_WIDTH : natural;
START_ADDRESS : unsigned(31 downto 0)
);
port
(
...
);
end entity mem;
architecture alias_architecture of mem is
type mem_type is array (natural range <>) of std_ulogic_vector(DATA_WIDTH - 1 downto 0);
signal mem : mem_type(0 to MEM_SIZE - 1);
alias amem : mem_type(to_integer(START_ADDRESS) to to_integer(START_ADDRESS) + MEM_SIZE - 1) is mem;
begin
...
This way you keep the convenient addressing, and the alias definition is the only place where you need to deal with the offset. It does not guarantee that every tool will work out what you meant, but I'd assume it at least increases the likelihood.
[P.S.: tried that out of curiosity in Quartus II and unfortunately, it appears it can't be bothered to infer RAM blocks through the aliased array. YMMV with other synthesis tools.]
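If the aliased array does not infer RAM in your tool, one possible middle ground is to keep the memory zero-based and move only the address window into a named subtype, so the case statement stays readable. The following is a minimal sketch under assumptions: the entity name, the address values, the 8-bit data width and the missing read port are all illustrative and not taken from the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity addr_decode_sketch is
  port (
    clk  : in std_logic;
    we   : in std_logic;
    addr : in natural range 0 to 1023;
    data : in std_logic_vector(7 downto 0)
  );
end entity addr_decode_sketch;

architecture rtl of addr_decode_sketch is
  -- Hypothetical address map; the values are placeholders.
  constant MEM0_ADDR : natural := 256;
  constant MEM0_SIZE : natural := 64;
  constant REG0_ADDR : natural := 16;

  -- Named window used only for decoding; the memory itself stays zero-based.
  subtype mem0_window is natural range MEM0_ADDR to MEM0_ADDR + MEM0_SIZE - 1;

  type mem0_type is array (0 to MEM0_SIZE - 1) of std_logic_vector(7 downto 0);
  signal mem0 : mem0_type;
  signal reg0 : std_logic_vector(7 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        case addr is
          -- A subtype name is a legal case choice (it denotes a discrete range).
          when mem0_window => mem0(addr - MEM0_ADDR) <= data;
          when REG0_ADDR   => reg0 <= data;
          when others      => null;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

The offset subtraction still exists, but it sits next to the window it belongs to, and the array keeps the zero-based shape that RAM inference templates usually expect. Note that case choices must be locally static, so this relies on the addresses being constants rather than generics.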

Related

VHDL: big slv array slicing indexed by integer (big mux)

I want to slice a std_logic_vector in VHDL obtaining parts of it of fixed dimensions.
The general problem is:
din N*M bits
dout M bits
sel clog2(N) bits
Expected behaviour in an example (pseudocode): the input is 16 bits wide and I want to slice it into 4 subvectors of 4 bits each (N = 4, M = 4).
signal din: std_logic_vector(N*M-1 downto 0);
signal sel: integer;
-- with sel = 0
output <= din(M-1 downto 0);
-- with sel = 1
output <= din(2*M-1 downto M);
-- with sel = 2
output <= din(3*M-1 downto 2*M);
.....
-- with sel = N-1
output <= din(N*M-1 downto (N-1)*M);
I know a couple of ways to do this, but I don't know which one is best practice and gives the best results in synthesis.
The entity:
din: in std_logic_vector(15 downto 0);
dout: out std_logic_vector(3 downto 0);
sel: in std_logic_vector(1 downto 0)
CASE STATEMENT
case sel is
when "00" => dout <= din(3 downto 0);
when "01" => dout <= din(7 downto 4);
when "10" => dout <= din(11 downto 8);
when "11" => dout <= din(15 downto 12);
when others => ....
It clearly implements a mux, but it is not generic at all, and if the input gets big it is really hard to write and to cover in code coverage.
INTEGER INDEXING
sel_int <= to_integer(unsigned(sel));
dout <= din(4*(sel_int+1) - 1 downto 4*sel_int);
Extremely easy to write and to maintain, BUT it can have problems when the number of slices is not a power of 2. For example, if I want to slice a 24-bit vector into chunks of 4, what happens when the integer conversion of sel yields index 7?
A STRANGE TRADEOFF
sel_int <= to_integer(unsigned(sel));
gen_slices : for i in 0 to 3 generate
din_slice(i) <= din(4*(i+1)-1 downto 4*i);
end generate gen_slices;
dout <= din_slice(sel_int);
I'm searching for a solution that is general enough to be used with various input/output relationships and safe enough to be synthesized consistently every time.
The case statement is the only one with an others branch (which feels really safe); the other solutions rely on the slv-to-integer conversion and indexing, which feel really comfortable but not so reliable.
Which solution would you use?
Practical use case
I have a 250-bit std_logic_vector and I need to select 10 contiguous bits inside it, starting from a certain point from 0 to 239. How can I do that in a way that is good for synthesis?
There is another option that is accepted by tools that support VHDL-2008 (which includes Vivado and Quartus Prime Pro). You can use an unconstrained array-of-vectors type from a package:
type slv_array_t is array(natural range <>) of std_logic_vector; --vhdl 2008 unconstrained array type
Then you can simply select the element you want, and it is as generic as you like.
library ieee;
use ieee.std_logic_1164.all;
use work.my_pkg.all;
entity mux is
generic (
N : natural;
M : natural
);
port (
sel : in natural;
ip : in slv_array_t (N-1 downto 0)(M-1 downto 0);
op : out std_logic_vector (M-1 downto 0)
);
end entity;
architecture rtl of mux is
begin
op <= ip(sel);
end architecture;
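To connect this to the flat din vector from the question, the slices first have to be copied into such an array. The sketch below shows one way this might look, assuming a VHDL-2008 tool; the package contents and the entity name flat_mux are illustrative, and only the type name slv_array_t and the package name my_pkg come from the answer above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package my_pkg is
  -- VHDL-2008: array whose element type is itself unconstrained.
  type slv_array_t is array (natural range <>) of std_logic_vector;
end package my_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.my_pkg.all;

entity flat_mux is
  generic (
    N : positive := 4;  -- number of slices
    M : positive := 4   -- bits per slice
  );
  port (
    din  : in  std_logic_vector(N*M-1 downto 0);
    sel  : in  natural range 0 to N-1;
    dout : out std_logic_vector(M-1 downto 0)
  );
end entity flat_mux;

architecture rtl of flat_mux is
  signal slices : slv_array_t(N-1 downto 0)(M-1 downto 0);
begin
  -- Pure rewiring: each array element is one M-bit window of din.
  gen_slices : for i in 0 to N-1 generate
    slices(i) <= din((i+1)*M-1 downto i*M);
  end generate gen_slices;

  -- The actual multiplexer.
  dout <= slices(sel);
end architecture rtl;
```

Constraining sel to 0 to N-1 documents the legal range, but it is only checked in simulation; if N is not a power of two, the hardware behaviour for out-of-range select values is up to the synthesis tool.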
First, you must extend the incoming data so that there are always as many bits as you need to connect all multiplexer inputs (see the code below, process p_extend).
This will not create any logic at synthesis.
Second, you must convert the resulting vector into an array, which you can access later by an index (see the code below, process p_create_array).
Again, this will not create any logic at synthesis.
Finally, you must access this array with the select input signal (see the code below, process p_mux).
library ieee;
use ieee.std_logic_1164.all;
entity mux is
generic (
g_data_width : natural := 250;
g_slice_width : natural := 10;
g_sel_width : natural := 5;
g_start_point : natural := 27
);
port (
d_i : in std_logic_vector(g_data_width-1 downto 0);
sel_i : in std_logic_vector(g_sel_width-1 downto 0);
d_o : out std_logic_vector(g_slice_width-1 downto 0)
);
end entity mux;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
architecture struct of mux is
signal data : std_logic_vector(g_slice_width * 2**g_sel_width-1 downto 0);
type t_std_logic_slice_array is array (natural range <>) of std_logic_vector(g_slice_width-1 downto 0);
signal mux_in : t_std_logic_slice_array (2**g_sel_width-1 downto 0);
begin
p_extend: process(d_i)
begin
for i in 0 to g_slice_width * 2**g_sel_width-1 loop
if i+g_start_point<g_data_width then
data(i) <= d_i(i+g_start_point);
else
data(i) <= '0';
end if;
end loop;
end process;
p_create_array: process (data)
begin
for i in 0 to 2**g_sel_width-1 loop
mux_in(i) <= data((i+1)*g_slice_width-1 downto i*g_slice_width);
end loop;
end process;
p_mux: d_o <= mux_in(to_integer(unsigned(sel_i)));
end architecture;
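For the practical use case in the question, where the window may start at any bit position rather than at a multiple of the slice width, a different formulation is to shift the vector right by the start position and keep the low bits. This is a sketch, not a tuned implementation: the entity and port names are hypothetical, and it describes a wide barrel shifter, so the resource cost should be checked in your tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity window_select is
  generic (
    G_DATA_WIDTH  : positive := 250;
    G_SLICE_WIDTH : positive := 10
  );
  port (
    d_i     : in  std_logic_vector(G_DATA_WIDTH-1 downto 0);
    -- Start bit of the window; the range is only enforced in simulation.
    start_i : in  natural range 0 to G_DATA_WIDTH - G_SLICE_WIDTH;
    d_o     : out std_logic_vector(G_SLICE_WIDTH-1 downto 0)
  );
end entity window_select;

architecture rtl of window_select is
  signal shifted : unsigned(G_DATA_WIDTH-1 downto 0);
begin
  -- shift_right from numeric_std fills the vacated upper bits with '0'.
  shifted <= shift_right(unsigned(d_i), start_i);

  -- The requested window now sits in the least significant bits.
  d_o <= std_logic_vector(shifted(G_SLICE_WIDTH-1 downto 0));
end architecture rtl;
```

Because the slice bounds on the last line are constants, this avoids the variable-bound slice that some tools reject.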

Dynamic Array Size in VHDL

I want to use a dynamic array range, so I am using "N", converted from an incoming vector signal to an integer, as the array bound. Using the incoming port "size" gives me an error, while a fixed vector produces the expected output.
architecture EXAMPLE of Computation is
signal size :std_logic_vector (7 downto 0);
process (ACLK, SLAVE_ARESETN) is
variable N: integer:=conv_integer ("00000111") ; ---WORKING
--variable N: integer:=conv_integer (size) ; -- Not working
type memory is array (N downto 0 ) of std_logic_vector (31 downto 0 );
variable RAM :memory;
The only reason for this style of coding is to send as much data as possible to the FPGA, since I need to send data from DDR to a custom IP via DMA in Vivado, possibly more than 100 MB. Kindly guide me if the approach above is the wrong way to implement this.
You can't do that in VHDL. What kind of hardware would be generated by your code? If you don't know, the synthesizer won't either.
The way to do this kind of thing is to set N to the largest value you want to support, and use size in your logic to control your logic appropriately. It's difficult to give more pointers without more information, but as an example, you could use a counter to address your ram, and have it reset when it's greater than size.
Update
Here's an example using a counter. You have to make sure that size doesn't change while operating or it will fall into an unknown state. A real design should have reset states to ensure correct behaviour.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity example is
port (
clk : in std_logic;
rst : in std_logic;
size : in unsigned(7 downto 0);
wr : in std_logic;
din : in std_logic_vector(31 downto 0)
);
end entity;
architecture rtl of example is
signal counter : unsigned(7 downto 0);
type ram_t is array(0 to 255) of std_logic_vector(31 downto 0);
signal ram : ram_t;
begin
RAM_WR: process(clk)
begin
if rising_edge(clk) then
if rst = '1' then
counter <= (others => '0');
else
if wr = '1' then
ram(to_integer(counter)) <= din;
if counter = size then
counter <= (others => '0');
else
counter <= counter + 1;
end if;
end if;
end if;
end if;
end process RAM_WR;
end architecture rtl;
I believe you can only use a generic (or another constant) as an array constraint in a process. Otherwise, the compiler cannot elaborate the design.
In a function or procedure, you can have truly variable array bounds.
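As an illustration of that last point, the sketch below declares a local array inside a function whose bounds come from the argument, so they can differ from call to call. The package, type and function names are made up for this example. This kind of code is mainly useful in testbenches or when the arguments are constant at elaboration; it does not make a synthesized RAM resizable at run time.

```vhdl
package dyn_bounds_sketch is
  type int_vector_t is array (natural range <>) of integer;

  -- Returns v with its elements in reverse order.
  function reversed (v : int_vector_t) return int_vector_t;
end package dyn_bounds_sketch;

package body dyn_bounds_sketch is
  function reversed (v : int_vector_t) return int_vector_t is
    -- The bounds of r are taken from the actual argument on every call.
    variable r : int_vector_t(v'range);
  begin
    for i in v'range loop
      -- Mirror the index around the middle of the range.
      r(i) := v(v'low + v'high - i);
    end loop;
    return r;
  end function reversed;
end package body dyn_bounds_sketch;
```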

How to correctly store registers in an FPGA

I need to write a VHDL design that initializes a sensor's registers over I2C. My problem is writing an efficient design that doesn't waste all the FPGA space. I need to store 400 registers, each consisting of an 8-bit address and 8 bits of data.
The code I wrote is:
entity i2cReg is
port (
RegSel : in std_logic;
Address : out std_logic_vector (7 downto 0);
Data : out std_logic_vector (7 downto 0);
RegStop : out std_logic;
ModuleEN : in std_logic
);
end i2cReg;
architecture i2cReg_archi of i2cReg is
signal counter :integer := 0;
begin
process(RegSel, ModuleEN)
begin
if ModuleEN = '0' then
Address <= x"10";
Data <= x"10";
RegStop <= '0';
counter <= 0;
elsif rising_edge(RegSel) then
counter <= counter + 1;
case counter is
when 0 =>
Address <= x"10";
Data <= x"10";
when 1 =>
Address <= x"10";
Data <= x"10";
when 2 =>
Address <= x"10";
Data <= x"10";
when 3 =>
Address <= x"10";
Data <= x"10";
when 4 =>
Address <= x"10";
Data <= x"10";
when 5 =>
Address <= x"10";
Data <= x"10";
when 400 =>
RegStop <= '1';
when others =>
end case;
end if;
end process;
end i2cReg_archi;
Is there a way to optimize this code? Or would you advise me to use an external EEPROM?
Yaro - you have not mentioned the FPGA vendor or the device but the answer is: Yes, you can initialize ROM in an FPGA so that the values you need are present after configuration. Both Altera and Xilinx allow you to provide a file with the initial values during synthesis.
Kevin.
Initialized BlockRAM is in general the correct solution if you are on Xilinx or Altera.
But there are exceptions where a logic implementation can also work:
For example, if the content of your 400 registers has repeating patterns or many registers with the same value (like in your example code). In this case, if you implement it as logic, your synthesis tool will optimize it heavily. You may actually end up with a very small amount of logic if the register content is very repeating. It is sometimes also possible to improve the optimization by clever reordering of the registers.
100-200 logic cells is often considered "cheaper" than a BlockRAM. But it depends mostly on which resource is most scarce in your particular application.
Regardless if you go for initialized BlockRAM or logic, I would suggest that you model it as an array of std_logic_vector instead of using case/when.
The "array of std_logic_vector" approach is platform independent, and can be synthesized to either BlockRAM or logic. Your synthesis tool will usually try to select the best implementation automatically, but you can also force it to use either logic or BlockRAM with vendor-specific attributes. (I can't tell you which attributes to use, since I don't know which platform you are using.)
Example:
type REG_TYPE is array (0 to 3) of std_logic_vector(15 downto 0);
constant REGISTERS : REG_TYPE :=
(x"0000",
x"0001",
x"0010",
x"0100");
And in your process, something like:
if rising_edge(RegSel) then
Address <= REGISTERS( counter )(15 downto 8);
Data <= REGISTERS( counter )( 7 downto 0);
end if;
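Put together, the table-based version of the original entity might look like the sketch below. It is an assumption-laden outline: the entity name, NUM_REGS and the table contents are placeholders (the real table would hold 400 entries from the sensor data sheet), and the asynchronous clear of the outputs is kept from the question even though it may steer the tool towards logic rather than BlockRAM.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity i2c_reg_rom is
  port (
    RegSel   : in  std_logic;
    ModuleEN : in  std_logic;
    Address  : out std_logic_vector(7 downto 0);
    Data     : out std_logic_vector(7 downto 0);
    RegStop  : out std_logic
  );
end entity i2c_reg_rom;

architecture rtl of i2c_reg_rom is
  constant NUM_REGS : natural := 4;  -- placeholder; 400 in the real design

  -- Upper byte = register address, lower byte = register data.
  type reg_type is array (0 to NUM_REGS - 1) of std_logic_vector(15 downto 0);
  constant REGISTERS : reg_type :=
    (x"1010", x"1120", x"1230", x"1340");  -- placeholder values

  signal counter : natural range 0 to NUM_REGS := 0;
begin
  process (RegSel, ModuleEN)
  begin
    if ModuleEN = '0' then
      counter <= 0;
      RegStop <= '0';
      Address <= (others => '0');
      Data    <= (others => '0');
    elsif rising_edge(RegSel) then
      if counter < NUM_REGS then
        -- Present the next table entry and advance.
        Address <= REGISTERS(counter)(15 downto 8);
        Data    <= REGISTERS(counter)(7 downto 0);
        counter <= counter + 1;
      else
        -- All entries have been presented.
        RegStop <= '1';
      end if;
    end if;
  end process;
end architecture rtl;
```

Bounding the counter with a range also keeps it from being synthesized as a full 32-bit integer, which the unconstrained counter in the question would otherwise imply.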

Shift Right And Shift Left (SLL/SRL)

I'm developing an ALU for the MIPS architecture and I'm trying to implement shift left and shift right so that the ALU can shift by any number of bits.
The idea I had is to convert the shift value to an integer and select the part of the input that ends up in the result (the integer is stored in X), but Quartus doesn't accept a variable value as a slice bound, only constants.
What could I do to make this work?
(The cases are on the lines "WHEN "1000" =>..." and "WHEN "1001" =>...")
Thanks.
PROCESS ( ALU_ctl, Ainput, Binput, X )
BEGIN
-- Select ALU operation
--ALU_output_mux <= X"00000000"; --padrao
CASE ALU_ctl IS
WHEN "1000" => ALU_output_mux(31 DOWNTO X) <= (Ainput( 31-X DOWNTO 0 ));
WHEN "1001" => ALU_output_mux(31-X DOWNTO 0) <= (Ainput( 31 DOWNTO X ));
WHEN OTHERS => ALU_output_mux <= X"00000000";
END CASE;
END PROCESS;
If Quartus doesn't like it you have two choices:
Write it some way that Quartus does like - you're trying to infer a barrel shifter, so you could write one out longhand and then instantiate that. Potentially expensive in time
Get a different synthesizer that will accept it. Potentially expensive in money.
I have had issues with this in Quartus as well, although your code also has some implicit latches (you are not assigning all bits of the output in your two shift cases).
The work-around I use is to define an intermediate array with all the possible results, then select one of those results using your selector. In your case, something like the following:
subtype DWORD_T is std_logic_vector( 31 downto 0);
type DWORD_A is array (natural range <>) of DWORD_T;
signal shift_L : DWORD_A(31 downto 0);
signal shift_R : DWORD_A(31 downto 0);
signal zero : DWORD_T;
...
zero <= (others=>'0');
process (Ainput)
begin
for index in Ainput'range loop
shift_L(index) <= Ainput(31 - index downto 0) & zero(index - 1 downto 0);
shift_R(index) <= zero(index - 1 downto 0) & Ainput(31 downto index);
end loop;
end process;
ALU_output_mux <= shift_L(to_integer(X)) when ALU_ctl="1000" else
shift_R(to_integer(X)) when ALU_ctl="1001" else
(others=>'0');
You could work around this by using a generate statement or a for loop to create each shift/rotate level, or you can use the standard numeric_std functions (shift_left, shift_right, rotate_left, rotate_right) for shifting and rotating.
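A minimal sketch of the numeric_std approach is shown here. The entity name and the 5-bit shamt port are assumptions made for the example (the question stores the shift amount in an integer X); the ALU_ctl encodings are the ones from the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shifter_sketch is
  port (
    ALU_ctl        : in  std_logic_vector(3 downto 0);
    Ainput         : in  std_logic_vector(31 downto 0);
    shamt          : in  std_logic_vector(4 downto 0);  -- hypothetical shift amount port
    ALU_output_mux : out std_logic_vector(31 downto 0)
  );
end entity shifter_sketch;

architecture rtl of shifter_sketch is
begin
  process (ALU_ctl, Ainput, shamt)
    variable n : natural range 0 to 31;
  begin
    n := to_integer(unsigned(shamt));
    case ALU_ctl is
      -- Logical shifts: vacated bits are filled with '0'.
      when "1000" => ALU_output_mux <= std_logic_vector(shift_left(unsigned(Ainput), n));
      when "1001" => ALU_output_mux <= std_logic_vector(shift_right(unsigned(Ainput), n));
      when others => ALU_output_mux <= (others => '0');
    end case;
  end process;
end architecture rtl;
```

Every branch assigns all 32 output bits, so no latches are implied, and no slice has a variable bound.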

Making a 4-bit ALU from several 1-bit ALUs

I'm trying to combine several 1 bit ALUs into a 4 bit ALU. I am confused about how to actually do this in VHDL. Here is the code for the 1bit ALU that I am using:
entity alu1 is -- define the 1 bit alu
port(a, b: in std_logic_vector(1 downto 0);
m: in std_logic_vector(1 downto 0);
result: out std_logic_vector(1 downto 0));
end alu1;
architecture behv1 of alu1 is
begin
process(a, b, m)
begin
case m is
when "00" =>
result <= a + b;
when "01" =>
result <= a + (not b) + 1;
when "10" =>
result <= a and b;
when "11" =>
result <= a or b;
end case;
end process;
end behv1;
I am assuming I define alu1 as a component of the larger entity alu4, but how can I tie them together?
Interesting you would even ask that question. VHDL synthesizers are quite capable of inferring any adder you like. You can just type what you need:
use ieee.numeric_std.all;
...
signal r : unsigned(3 downto 0);
signal a : unsigned(2 downto 0);
signal b : unsigned(2 downto 0);
signal c : unsigned(2 downto 0);
...
r <= resize(a, r'length) + b + c; -- widen first: numeric_std "+" returns the width of its longest operand
Then you can slice r to fit your needs:
result <= std_logic_vector(r(2 downto 0));
You can't (easily) string together these 1-bit ALUs into a functional multiple-bit version. There is no way to handle the carry in/out needed for your add and subtract modes to work properly (the bitwise "and" and "or" modes should work OK, however).
Ignoring the carry issue for the moment, you would typically just set up a for generate loop and instantiate multiple copies of your bitwise logic, possibly special-casing the first and/or last elements, i.e.:
MyLabel : for bitindex in 0 to 3 generate
begin
alu_x4 : entity work.alu1
port map (
a => input_a(bitindex),
b => input_b(bitindex),
m => mode,
result => result_x4(bitindex) );
end generate;
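If the carry does need to be handled, one way to sketch it is to give each 1-bit slice explicit carry-in and carry-out ports and chain them through the generate loop. The code below is an illustration under assumptions, not the asker's ALU: the slice entity alu1_carry and its single-bit ports are hypothetical, while the mode encoding ("00" add, "01" subtract, "10" and, "11" or) follows the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity alu1_carry is
  port (
    a, b, cin : in  std_logic;
    m         : in  std_logic_vector(1 downto 0);
    result    : out std_logic;
    cout      : out std_logic
  );
end entity alu1_carry;

architecture rtl of alu1_carry is
  signal bx : std_logic;
begin
  -- Subtraction is a + (not b) + 1, so invert b in subtract mode.
  bx <= not b when m = "01" else b;

  with m select result <=
    a xor bx xor cin when "00" | "01",  -- full-adder sum
    a and b          when "10",
    a or b           when others;

  -- Full-adder carry; only meaningful in the add and subtract modes.
  cout <= (a and bx) or (cin and (a xor bx));
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

entity alu4 is
  port (
    a, b   : in  std_logic_vector(3 downto 0);
    m      : in  std_logic_vector(1 downto 0);
    result : out std_logic_vector(3 downto 0)
  );
end entity alu4;

architecture structural of alu4 is
  signal carry : std_logic_vector(4 downto 0);
begin
  -- The "+ 1" of the two's complement subtraction enters as the first carry.
  carry(0) <= '1' when m = "01" else '0';

  gen_bits : for i in 0 to 3 generate
    u_bit : entity work.alu1_carry
      port map (
        a      => a(i),
        b      => b(i),
        cin    => carry(i),
        m      => m,
        result => result(i),
        cout   => carry(i + 1)
      );
  end generate gen_bits;
end architecture structural;
```

The final carry(4) is left unconnected here; it could be brought out as an overflow or carry flag if the design needs one.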
