Accessing 2 elements of the same array in VHDL

I am trying to read 2 values from 2 different addresses in my array in VHDL, but they always come back wrong (most of the time, zero). When I tested it with only 1 address and 1 data output, it returned the correct value.
architecture Behavioral of registerFile is
type reg_type is array (31 downto 0) of std_logic_vector (31 downto 0);
signal REG : reg_type := (x"00000031", x"00000030", x"00000029", x"00000028", x"00000027", x"00000026", x"00000025", x"00000024", x"00000023", x"00000022", x"00000021", x"00000020",x"00000019",x"00000018", x"00000017", x"00000016", x"00000015", x"00000014", x"00000013", x"00000012", x"00000011", x"00000010", x"00000009", x"00000008", x"00000007",x"00000006", x"00000005", x"00000004", x"00000003", x"00000004", x"00000001", x"00000000");
begin
process (clk)
begin
if clk'event and clk='1' then
if ENABLE = '1' then
if readReg = '1' then -- read from register
DATAone <= REG(conv_integer(ADDRone));
DATAtwo <= REG(conv_integer(ADDRtwo));
REG(conv_integer(ADDRone)) <= DATAone;
REG(conv_integer(ADDRtwo)) <= DATAtwo;
end if;
end if;
end if;
end process;
end Behavioral;
Would appreciate some help, I tried googling but it's all either multidimensional arrays or only accessing 1 element at a time.

I'm not sure that this is synthesizable in most fabric. You could create two copies of the reg array and index into each of them.
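A minimal sketch of the "two copies" idea, assuming a single write port and two registered read ports. The entity name, port names, and widths are assumptions chosen for illustration, not taken from the question:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names and widths are assumptions for illustration.
entity regfile_2r1w is
  port (
    clk     : in  std_logic;
    we      : in  std_logic;
    waddr   : in  std_logic_vector(4 downto 0);
    wdata   : in  std_logic_vector(31 downto 0);
    raddr_a : in  std_logic_vector(4 downto 0);
    raddr_b : in  std_logic_vector(4 downto 0);
    rdata_a : out std_logic_vector(31 downto 0);
    rdata_b : out std_logic_vector(31 downto 0)
  );
end entity;

architecture rtl of regfile_2r1w is
  type reg_type is array (0 to 31) of std_logic_vector(31 downto 0);
  -- Two copies holding identical contents; each one serves one read port.
  signal reg_a, reg_b : reg_type := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        -- Every write goes to both copies so they never diverge.
        reg_a(to_integer(unsigned(waddr))) <= wdata;
        reg_b(to_integer(unsigned(waddr))) <= wdata;
      end if;
      -- Registered reads: each port indexes its own copy.
      rdata_a <= reg_a(to_integer(unsigned(raddr_a)));
      rdata_b <= reg_b(to_integer(unsigned(raddr_b)));
    end if;
  end process;
end architecture;
```

Unlike the code in the question, the read data is never written back into the array, so a read cannot overwrite the stored value with stale output data.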

It seems like you are trying to implement a quad-port memory. Anyway, even if your register file is not exactly a 4-port memory, it probably can be implemented around one.
Altera has an example of such a memory in their Advanced Synthesis Cookbook. The picture below shows the relevant part:
If you use the Altera example files, they will instantiate Altera primitives and use FPGA block RAM for storage. If you are concerned about portability, or you just want to look at some VHDL code that does what you want, check the example below. It implements roughly the same circuit shown in the figure, and it will most likely be synthesized as distributed memory in the FPGA.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
-- Quad-port RAM with 2 read ports 2 write ports. The design uses 2 memory blocks
-- (MAIN_MEMORY and SHADOW_MEMORY) to allow for simultaneous writes. Port A writes to
-- main memory, Port B writes to shadow memory. On a read from either port, data is
-- read from the memory block that was most recently written at the given position.
entity quad_port_ram is
generic (
ADDRESS_WIDTH: natural := 5;
DATA_WIDTH: natural := 32
);
port (
clock: in std_logic;
read_addr_a: in natural range 0 to 2**ADDRESS_WIDTH-1;
read_data_a: out std_logic_vector(DATA_WIDTH-1 downto 0);
write_addr_a: in natural range 0 to 2**ADDRESS_WIDTH-1;
write_data_a: in std_logic_vector(DATA_WIDTH-1 downto 0);
write_enable_a: in std_logic;
read_addr_b: in natural range 0 to 2**ADDRESS_WIDTH-1;
read_data_b: out std_logic_vector(DATA_WIDTH-1 downto 0);
write_addr_b: in natural range 0 to 2**ADDRESS_WIDTH-1;
write_data_b: in std_logic_vector(DATA_WIDTH-1 downto 0);
write_enable_b: in std_logic
);
end;
architecture rtl of quad_port_ram is
type memory_type is (MAIN_MEMORY, SHADOW_MEMORY);
type memory_type_array is array (natural range <>) of memory_type;
-- Keep track of which memory has the most recently written data for each address
signal most_recent_port_for_address: memory_type_array(0 to 2**ADDRESS_WIDTH-1);
type memory_array is array (0 to 2**ADDRESS_WIDTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
type dual_memory_array is array (memory_type) of memory_array;
-- Store the actual memory bits. Access like this:
-- memory_data(memory_type)(address)(bit_position)
signal memory_data: dual_memory_array;
-- Auxiliary signals to decide where to read the data from (main or shadow)
signal most_recent_port_for_addr_a, most_recent_port_for_addr_b: memory_type;
begin
process (clock) begin
if rising_edge(clock) then
if write_enable_a then
memory_data(MAIN_MEMORY)(write_addr_a) <= write_data_a;
most_recent_port_for_address(write_addr_a) <= MAIN_MEMORY;
end if;
if write_enable_b then
if (write_enable_a = '0') or (write_addr_a /= write_addr_b) then
memory_data(SHADOW_MEMORY)(write_addr_b) <= write_data_b;
most_recent_port_for_address(write_addr_b) <= SHADOW_MEMORY;
end if;
end if;
end if;
end process;
most_recent_port_for_addr_a <= most_recent_port_for_address(read_addr_a);
most_recent_port_for_addr_b <= most_recent_port_for_address(read_addr_b);
read_data_a <= memory_data(most_recent_port_for_addr_a)(read_addr_a);
read_data_b <= memory_data(most_recent_port_for_addr_b)(read_addr_b);
end;


VHDL: big slv array slicing indexed by integer (big mux)

I want to slice a std_logic_vector in VHDL obtaining parts of it of fixed dimensions.
The general problem is:
din N*M bits
dout M bits
sel clog2(N) bits
Expected behaviour in an example (pseudocode): input 16 bit, want to slice it in 4 subvectors of 4bit each.
signal in: std_logic_vector(N*M-1 downto 0);
signal sel: integer;
-- with sel = 0
output <= in(M-1:0);
-- with sel = 1
output <= in(2M-1:M);
-- with sel = 2
output <= in(3M-1:2M);
-- with sel = N-1
output <= in(N*M-1:(N-1)M);
I know a couple of ways to do this, but I don't know which one is best practice and gives the best results in synthesis.
the entity
din: in std_logic_vector(15 downto 0);
dout: out std_logic_vector(3 downto 0);
sel: in std_logic_vector(1 downto 0)
case sel is
when "00" => dout <= din(3 downto 0);
when "01" => dout <= din(7 downto 4);
when "10" => dout <= din(11 downto 8);
when "11" => dout <= din(15 downto 12);
when others => ....
It clearly implements a mux, but it's not generic at all, and if the input gets big it's really hard to write and to cover in code coverage.
sel_int <= to_integer(unsigned(sel));
dout <= din(4*(sel_int+1) - 1 downto 4*sel_int);
Extremely easy to write and to maintain, BUT it can have problems when the number of slices is not a power of 2. For example, if I want to slice a 24-bit vector into chunks of 4, what happens when the integer conversion of sel yields the index 7?
sel_int <= to_integer(unsigned(sel));
g_slices: for i in 0 to 4 generate
din_slice(i) <= din(4*(i+1)-1 downto 4*i);
end generate;
dout <= din_slice(sel_int);
I'm looking for a solution that is general enough to be used with various input/output relationships and safe enough to synthesize consistently every time.
The case statement is the only one with an others branch (which feels really safe); the other solutions rely on the slv-to-integer conversion and indexing, which feels convenient but not so reliable.
Which solution would you use?
Practical use case
I have a 250-bit std_logic_vector and I need to select 10 contiguous bits inside it, starting from a given position between 0 and 239. How can I do that in a way that is good for synthesis?
There is another option that is accepted by tools that support VHDL 2008 (which includes Vivado and Quartus Prime Pro). You can use an unconstrained 2D type from a package:
type slv_array_t is array(natural range <>) of std_logic_vector; --vhdl 2008 unconstrained array type
then you can simply select which port you want. And it is as generic as you like.
library ieee;
use ieee.std_logic_1164.all;
use work.my_pkg.all;
entity mux is
generic (
N : natural;
M : natural
);
port (
sel : in natural;
ip : in slv_array_t (N-1 downto 0)(M-1 downto 0);
op : out std_logic_vector (M-1 downto 0)
);
end entity;
architecture rtl of mux is
begin
op <= ip(sel);
end architecture;
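To connect this to the flat din vector from the question, the vector first has to be split into the array. A minimal VHDL-2008 sketch follows; the package contents are assumed from the type declaration above, and the entity name flat_mux, its ports, and the generic defaults are assumptions for illustration:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Assumed contents of my_pkg, based on the type shown above (VHDL-2008).
package my_pkg is
  type slv_array_t is array (natural range <>) of std_logic_vector;
end package;

library ieee;
use ieee.std_logic_1164.all;
use work.my_pkg.all;

-- Hypothetical wrapper: splits a flat N*M-bit vector into N slices of M bits.
entity flat_mux is
  generic (
    N : natural := 4;  -- number of slices
    M : natural := 4   -- bits per slice
  );
  port (
    din  : in  std_logic_vector(N*M-1 downto 0);
    sel  : in  natural range 0 to N-1;  -- constrained so it cannot exceed N-1
    dout : out std_logic_vector(M-1 downto 0)
  );
end entity;

architecture rtl of flat_mux is
  signal slices : slv_array_t(N-1 downto 0)(M-1 downto 0);
begin
  -- Pure wiring: slice i takes bits (i+1)*M-1 downto i*M of din.
  g_split : for i in 0 to N-1 generate
    slices(i) <= din((i+1)*M-1 downto i*M);
  end generate;

  dout <= slices(sel);
end architecture;
```

Constraining sel to 0 to N-1 makes an out-of-range index a simulation error rather than a silent wrong read; what synthesis does with unreachable select values is tool dependent.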
First, you must extend the incoming data so that there are always enough bits to connect all multiplexer inputs (see the code below, process p_extend).
This will not create any logic at synthesis.
Second, you must convert the resulting vector into an array, which you can later access by an index (see the code below, process p_create_array).
Again, this will not create any logic at synthesis.
Finally, you must access this array with the select input signal (see the code below, process p_mux).
library ieee;
use ieee.std_logic_1164.all;
entity mux is
generic (
g_data_width : natural := 250;
g_slice_width : natural := 10;
g_sel_width : natural := 5;
g_start_point : natural := 27
);
port (
d_i : in std_logic_vector(g_data_width-1 downto 0);
sel_i : in std_logic_vector(g_sel_width-1 downto 0);
d_o : out std_logic_vector(g_slice_width-1 downto 0)
);
end entity mux;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
architecture struct of mux is
signal data : std_logic_vector(g_slice_width * 2**g_sel_width-1 downto 0);
type t_std_logic_slice_array is array (natural range <>) of std_logic_vector(g_slice_width-1 downto 0);
signal mux_in : t_std_logic_slice_array (2**g_sel_width-1 downto 0);
begin
p_extend: process(d_i)
begin
for i in 0 to g_slice_width * 2**g_sel_width-1 loop
if i+g_start_point<g_data_width then
data(i) <= d_i(i+g_start_point);
else
data(i) <= '0';
end if;
end loop;
end process;
p_create_array: process (data)
begin
for i in 0 to 2**g_sel_width-1 loop
mux_in(i) <= data((i+1)*g_slice_width-1 downto i*g_slice_width);
end loop;
end process;
p_mux: d_o <= mux_in(to_integer(unsigned(sel_i)));
end architecture;
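For the practical use case in the question (10 contiguous bits starting at an arbitrary position in a 250-bit vector), a different technique from the array mux above is a shift followed by a truncation. This is only a sketch: the entity name slice_select and the port start_i are assumptions, and it relies on shift_right and resize from ieee.numeric_std.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: selects g_slice_width bits starting at bit start_i.
entity slice_select is
  generic (
    g_data_width  : natural := 250;
    g_slice_width : natural := 10
  );
  port (
    d_i     : in  std_logic_vector(g_data_width-1 downto 0);
    -- Largest start position that still leaves a full slice in range.
    start_i : in  natural range 0 to g_data_width-g_slice_width;
    d_o     : out std_logic_vector(g_slice_width-1 downto 0)
  );
end entity;

architecture rtl of slice_select is
begin
  -- Shift the wanted bits down to position 0, then keep the low
  -- g_slice_width bits (resize on unsigned drops the leftmost bits).
  d_o <= std_logic_vector(
           resize(shift_right(unsigned(d_i), start_i), g_slice_width));
end architecture;
```

This describes a barrel shifter, so the result can be large for wide vectors; whether it is smaller or faster than the array mux depends on the tool and the target device.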

How to write std_logic_vector assignment with input-dependent range in VHDL?

I am trying to copy part of a std_logic_vector into another, at a position (index) that depends on an input. This can be synthesized in Vivado, but I want to use another tool (SymbiYosys, for formal verification). SymbiYosys can use Verific as a frontend to process VHDL, but Verific does not accept this. Here is a small piece of code which reproduces the problem. Verific complains that the "left range bound is not constant". So, is there a workaround to make Verific accept such variable-range assignments?
I already found this post VHDL: slice a various part of an array which proposes to use a loop and to assign values bit per bit, but I would rather not change my code now that it works with Vivado. Also I think such a loop would impair code readability, and perhaps implementation efficiency. Therefore, I am looking for a different method (maybe a way to turn this error into a warning, or a less drastic code modification).
library IEEE;
use IEEE.STD_LOGIC_1164.all;
entity test is
port (
clk : in std_logic;
prefix : in std_logic_vector( 8*8 -1 downto 0);
msgIn : in std_logic_vector(128*8 -1 downto 0);
msgLength : in integer range 1 to 128;
test_out : out std_logic_vector((128+8)*8 -1 downto 0)
);
end test;
architecture behav of test is
begin
process (clk)
begin
if rising_edge(clk) then
test_out <= (others => '0');
test_out((msgLength+8)*8 -1 downto msgLength*8) <= prefix;
test_out( msgLength *8 -1 downto 0) <= msgIn(msgLength*8 -1 downto 0);
end if;
end process;
end behav;
A bit of shifting should do it (if your tools support the srl and sll operators). First left-align your message (left shift), left-pad it with your prefix and, finally, right-shift it:
process (clk)
variable tmp1: std_logic_vector(128*8 -1 downto 0);
variable tmp2: std_logic_vector((128+8)*8 -1 downto 0);
begin
if rising_edge(clk) then
tmp1 := msgIn sll (8 * (128 - msgLength)); -- left-align
tmp2 := prefix & tmp1; -- left-pad
test_out <= tmp2 srl (8 * (128 - msgLength)); -- right-shift
end if;
end process;
In case your tools do not support the srl and sll operators on std_logic_vector, try to work with bit_vector, instead. srl and sll have been introduced in the standard in 1993. Example:
process (clk)
variable tmp1: bit_vector(128*8 -1 downto 0);
variable tmp2: bit_vector((128+8)*8 -1 downto 0);
begin
if rising_edge(clk) then
tmp1 := to_bitvector(msgIn) sll (8 * (128 - msgLength));
tmp2 := to_bitvector(prefix) & tmp1;
test_out <= to_stdlogicvector(tmp2 srl (8 * (128 - msgLength)));
end if;
end process;
The synthesis result may be huge and slow because this 1088 bits barrel shifter with 128 possible different shifts is a kind of monster.
If you have time (I mean several clock cycles) to do it, there are probably much smaller and more efficient solutions.
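For comparison, the loop-based workaround mentioned in the question can be sketched with constant loop bounds, so that every slice bound becomes a constant once the loop is unrolled. This is a variant that loops over possible lengths rather than over single bits. The entity name test_loop is an assumption, and whether Verific accepts this form has not been checked here:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical variant of the question's entity, using a constant-bound loop.
entity test_loop is
  port (
    clk       : in  std_logic;
    prefix    : in  std_logic_vector(8*8 - 1 downto 0);
    msgIn     : in  std_logic_vector(128*8 - 1 downto 0);
    msgLength : in  integer range 1 to 128;
    test_out  : out std_logic_vector((128+8)*8 - 1 downto 0)
  );
end entity;

architecture behav of test_loop is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      test_out <= (others => '0');
      -- The loop bounds are constants; only the comparison depends on
      -- msgLength, so each iteration uses fixed slice bounds.
      for len in 1 to 128 loop
        if len = msgLength then
          test_out((len+8)*8 - 1 downto len*8) <= prefix;
          test_out(len*8 - 1 downto 0)         <= msgIn(len*8 - 1 downto 0);
        end if;
      end loop;
    end if;
  end process;
end architecture;
```

Exactly one iteration matches for any legal msgLength, so the later assignments override the default zeros only in the selected ranges. The hardware cost is comparable to the shifter: it is still a wide multiplexer over 128 cases.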

Read, then write RAM VHDL

In VHDL all statements execute in parallel, since the code describes hardware.
I want to create a RAM that first reads a given register from a RAM block to the output and only afterwards writes the input to the same register. My code looks like this:
architecture Behavioral of RAM is
type ram_t is array (0 to numOfRegs-1) of std_logic_vector (rLength-1 downto 0);
signal ram_s: ram_t;
signal loc : integer;
begin
process (clk)
begin
if(rising_edge(clk)) then
if(we='1') then
dataout <= ram_s(loc); -- reads the 'old' data to the output
ram_s(loc) <= datain; -- writes the 'new' data to the RAM
loc <= conv_integer(addr);
end if;
end if;
end process;
end Behavioral;
There is a similar case presented elsewhere,
so I'd like to ask: does my code work as is, or does it need tweaking, such as adding a delay of half a clock cycle? If so, how do I implement it?
I'm very new to VHDL; thanks for your patience and help.
I've added a testbench simulation below. As can be seen, dataout isn't working at all.
Your question doesn't present a Minimal, Complete, and Verifiable example, so your results can't be replicated.
One consequence is that answers can be ambiguous, should there be one or more causes of the problem in portions of your code not shown.
Brian's comment that you aren't reading data when we is invalid is pertinent, and that would be responsible for the 'U's in the clock cycle left of the yellow marker in your waveform.
There's also the issue with loc being a signal. Signals are scheduled for update, and no update occurs while any process that is scheduled to resume in the current simulation cycle has not been resumed and suspended.
This means the integer version of your address is delayed and won't be seen in the process until the next rising edge.
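The difference between signal and variable updates can be seen in a small simulation-only sketch. The entity name sig_vs_var is made up for illustration; the assertions describe what the VHDL simulation cycle guarantees:

```vhdl
-- Simulation-only demonstration; not meant for synthesis.
entity sig_vs_var is
end entity;

architecture sim of sig_vs_var is
  signal s : integer := 0;
begin
  process
    variable v : integer := 0;
  begin
    s <= 5;  -- scheduled; takes effect only after this process suspends
    v := 5;  -- takes effect immediately

    -- Still inside the same process run: s holds its old value, v the new one.
    assert s = 0 report "signal changed before the process suspended" severity failure;
    assert v = 5 report "variable was not updated immediately" severity failure;

    wait for 0 ns;  -- suspend for one delta cycle so the signal update happens

    assert s = 5 report "signal was not updated after the delta cycle" severity failure;
    wait;  -- stop here
  end process;
end architecture;
```

In the RAM process the same rule applies: with loc as a signal, ram_s(loc) uses the address converted on the previous clock edge.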
Making loc a variable (as an alternative to pipelining datain) and moving the dataout assignment are accomplished by the following changes to your RAM process:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all; -- standard package
entity ram is
generic (
ADDRLENGTH: natural := 8;
RLENGTH: natural := 16;
NUMOFREGS: natural := 256
);
port (
clk: in std_logic;
we: in std_logic;
addr: in std_logic_vector (ADDRLENGTH - 1 downto 0);
datain: in std_logic_vector (RLENGTH - 1 downto 0);
dataout: out std_logic_vector (RLENGTH - 1 downto 0)
);
end entity;
architecture behavioral of ram is
type ram_t is array (0 to NUMOFREGS - 1) of
std_logic_vector (RLENGTH - 1 downto 0);
signal ram_s: ram_t;
-- signal loc: integer; -- USE VARIABLE in process instead
begin
process (clk)
variable loc: integer; -- MAKE loc variable so it's immediately available
begin
if rising_edge(clk) then
loc := to_integer(unsigned(addr)); -- MOVED so READ works
if we = '1' then
-- dataout <= ram_s(loc); -- reads the 'old' data to the output
ram_s(loc) <= datain; -- writes the 'new' data to the ram
-- loc <= conv_integer(addr);
end if;
dataout <= ram_s(loc); -- MOVED reads the 'old' data to the output
end if;
end process;
end architecture behavioral;
There's also the liberty of filling in the entity declaration and converting from conv_integer using Synopsys's package std_logic_arith to to_integer in the IEEE's numeric_std package. With a -2008 compliant tool chain you could instead use IEEE's package numeric_std_unsigned and do away with the type conversion to unsigned.
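A sketch of that -2008 variant, assuming the tool chain provides ieee.numeric_std_unsigned. The entity name ram_2008 is made up to avoid clashing with the entity above; the ports and generics mirror it:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std_unsigned.all;  -- VHDL-2008: arithmetic directly on std_logic_vector

-- Hypothetical name; same interface as the ram entity above.
entity ram_2008 is
  generic (
    ADDRLENGTH : natural := 8;
    RLENGTH    : natural := 16;
    NUMOFREGS  : natural := 256
  );
  port (
    clk     : in  std_logic;
    we      : in  std_logic;
    addr    : in  std_logic_vector(ADDRLENGTH - 1 downto 0);
    datain  : in  std_logic_vector(RLENGTH - 1 downto 0);
    dataout : out std_logic_vector(RLENGTH - 1 downto 0)
  );
end entity;

architecture behavioral of ram_2008 is
  type ram_t is array (0 to NUMOFREGS - 1) of
    std_logic_vector(RLENGTH - 1 downto 0);
  signal ram_s : ram_t;
begin
  process (clk)
    variable loc : natural range 0 to NUMOFREGS - 1;
  begin
    if rising_edge(clk) then
      loc := to_integer(addr);  -- no conversion to unsigned needed
      if we = '1' then
        ram_s(loc) <= datain;   -- scheduled write of the new data
      end if;
      dataout <= ram_s(loc);    -- still sees the old contents (read-before-write)
    end if;
  end process;
end architecture;
```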
Because the ram_test testbench was also not supplied a testbench was written to replicate your waveform display image:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity ram_tb is
end entity;
architecture foo of ram_tb is
constant ADDRLENGTH: natural := 8;
constant RLENGTH: natural := 16;
constant NUMOFREGS: natural := 256;
signal clk: std_logic := '0';
signal we: std_logic := '1';
signal addr: std_logic_vector (ADDRLENGTH - 1 downto 0);
signal datain: std_logic_vector (RLENGTH - 1 downto 0);
signal dataout: std_logic_vector (RLENGTH - 1 downto 0);
begin
DUT:
entity work.ram
generic map (
ADDRLENGTH => ADDRLENGTH,
RLENGTH => RLENGTH,
NUMOFREGS => NUMOFREGS
)
port map (
clk => clk,
we => we,
addr => addr,
datain => datain,
dataout => dataout
);
process
begin
if now = 500 ps then
wait for 200 ps;
else
wait for 100 ps;
end if;
clk <= not clk;
if now >= 1100 ps then
wait;
end if;
end process;
process
begin
for i in 0 to 2 loop
addr <= std_logic_vector(to_unsigned (i, ADDRLENGTH));
case i is
when 0 =>
datain <= x"00FF";
when 1 =>
datain <= x"FF00";
when 2 =>
datain <= x"FFFF";
end case;
wait until falling_edge(clk);
if i = 1 then
we <= '0';
end if;
end loop;
for i in 1 to 2 loop
addr <= std_logic_vector(to_unsigned (i, ADDRLENGTH));
case i is
when 1 =>
datain <= x"FF00";
when 2 =>
datain <= x"FFFF";
end case;
wait until falling_edge(clk);
end loop;
wait;
end process;
end architecture;
And this produced:
Where the one written address that is subsequently read shows the correct data.
The simulator used does not present non-signals in a waveform dump (bounds in declarations are required to be static) and rst is not found in the portion of your design specification provided.
As noted previously there is no guarantee there isn't another issue with portions of your design specification or testbench not provided in your question.
The testbench shown is by no means comprehensive.

Dynamic Array Size in VHDL

I want to use a dynamic array range, so I use "N", obtained by converting an incoming vector signal to an integer. Using the incoming port "Size" gives me an error, while a fixed vector produces the expected output.
architecture EXAMPLE of Computation is
signal size :std_logic_vector (7 downto 0);
process (ACLK, SLAVE_ARESETN) is
variable N: integer:=conv_integer ("00000111") ; ---WORKING
--variable N: integer:=conv_integer (size) ; -- Not working
type memory is array (N downto 0 ) of std_logic_vector (31 downto 0 );
variable RAM :memory;
The only reason for coding it this way is to send as much data as possible to the FPGA, since I need to send data from DDR to a custom IP via DMA in Vivado, possibly more than 100 MB. Kindly guide me if the approach above is the wrong way to implement this.
You can't do that in VHDL. What kind of hardware would be generated by your code? If you don't know, the synthesizer won't either.
The way to do this kind of thing is to set N to the largest value you want to support, and use size in your logic to control your logic appropriately. It's difficult to give more pointers without more information, but as an example, you could use a counter to address your ram, and have it reset when it's greater than size.
Here's an example with a counter. You have to make sure that size doesn't change while operating, or it will fall into an unknown state. A real design should have reset states to ensure correct behaviour.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity example is
port (
clk : std_logic;
rst : in std_logic;
size : in unsigned(7 downto 0);
wr : in std_logic;
din : in std_logic_vector(31 downto 0)
);
end entity;
architecture rtl of example is
signal counter : unsigned(7 downto 0);
type ram_t is array(0 to 255) of std_logic_vector(31 downto 0);
signal ram : ram_t;
begin
RAM_WR: process(clk)
begin
if rising_edge(clk) then
if rst = '1' then
counter <= (others => '0');
else
if wr = '1' then
ram(to_integer(counter)) <= din;
if counter = size then
counter <= (others => '0');
else
counter <= counter + 1;
end if;
end if;
end if;
end if;
end if;
end process RAM_WR;
end architecture rtl;
I believe you can only use a generic as an array constraint in a process. Otherwise, the compiler cannot elaborate the design.
In a function or procedure, you can have truly variable array bounds.
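A minimal sketch of bounds that come from a subprogram argument. The package name dyn_bounds_pkg and the function ones are made up for illustration:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package for illustration.
package dyn_bounds_pkg is
  -- Returns a vector of n '1' bits; n is not fixed when the package is analyzed.
  function ones (n : positive) return std_logic_vector;
end package;

package body dyn_bounds_pkg is
  function ones (n : positive) return std_logic_vector is
    -- The bounds of r are computed from the argument at each call.
    variable r : std_logic_vector(n - 1 downto 0) := (others => '1');
  begin
    return r;
  end function;
end package body;
```

In simulation n can be any value computed at run time. For synthesis, tools generally still need n to resolve to a constant at each call site, so this does not give you hardware whose storage size changes while it runs.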

How to correctly store registers in an FPGA

I need to write a VHDL design that initializes a sensor's registers over I2C. My problem is writing an efficient design that doesn't waste all the FPGA space. I need to store 400 registers, each composed of an 8-bit address and 8 bits of data.
The code I wrote is:
entity i2cReg is
port (
RegSel : in std_logic;
Address : out std_logic_vector (7 downto 0);
Data : out std_logic_vector (7 downto 0);
RegStop : out std_logic;
ModuleEN : in std_logic
);
end i2cReg;
architecture i2cReg_archi of i2cReg is
signal counter :integer := 0;
begin
process(RegSel, ModuleEN)
begin
if ModuleEN = '0' then
Address <= x"10";
Data <= x"10";
RegStop <= '0';
counter <= 0;
elsif rising_edge(RegSel) then
counter <= counter + 1;
case counter is
when 0 =>
Address <= x"10";
Data <= x"10";
when 1 =>
Address <= x"10";
Data <= x"10";
when 2 =>
Address <= x"10";
Data <= x"10";
when 3 =>
Address <= x"10";
Data <= x"10";
when 4 =>
Address <= x"10";
Data <= x"10";
when 5 =>
Address <= x"10";
Data <= x"10";
when 400 =>
RegStop <= '1';
when others =>
end case;
end if;
end process;
end i2cReg_archi;
Is there a way to optimize this code? Or would you advise me to use an external EEPROM?
Yaro - you have not mentioned the FPGA vendor or the device but the answer is: Yes, you can initialize ROM in an FPGA so that the values you need are present after configuration. Both Altera and Xilinx allow you to provide a file with the initial values during synthesis.
Initialized BlockRAM is in general the correct solution if you are on Xilinx or Altera.
But there are exceptions where a logic implementation can also work:
For example, if the content of your 400 registers has repeating patterns or many registers with the same value (like in your example code). In this case, if you implement it as logic, your synthesis tool will optimize it heavily. You may actually end up with a very small amount of logic if the register content is highly repetitive. It is sometimes also possible to improve the optimization by clever reordering of the registers.
100-200 logic cells is often considered "cheaper" than a BlockRAM. But it depends mostly on which resource is most scarce in your particular application.
Regardless of whether you go for initialized BlockRAM or logic, I would suggest that you model it as an array of std_logic_vector instead of using case/when.
The "array of std_logic_vector" approach is platform independent, and can be synthesized to either BlockRAM or logic. Your synthesis tool will usually try to automatically select the best implementation. But you can also force the synthesis tool to use either logic or BlockRAM by using vendor-specific attributes. (I can't tell you which attributes to use, since I don't know which platform you are using.)
type REG_TYPE is array (0 to 3) of std_logic_vector(15 downto 0);
constant REGISTERS : REG_TYPE :=
And in your process, something like:
if rising_edge(RegSel) then
Address <= REGISTERS( counter )(15 downto 8);
Data <= REGISTERS( counter )( 7 downto 0);
end if;
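Putting the pieces together, a complete version might look like the sketch below. The entity name i2c_rom, the constant NUM_REGS, and the table contents are placeholders made up for illustration (the real design would hold 400 entries with the actual sensor values):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; ports follow the question's i2cReg.
entity i2c_rom is
  port (
    RegSel   : in  std_logic;
    ModuleEN : in  std_logic;
    Address  : out std_logic_vector(7 downto 0);
    Data     : out std_logic_vector(7 downto 0);
    RegStop  : out std_logic
  );
end entity;

architecture rtl of i2c_rom is
  constant NUM_REGS : natural := 4;  -- placeholder; 400 in the real design
  type REG_TYPE is array (0 to NUM_REGS - 1) of std_logic_vector(15 downto 0);
  -- Placeholder contents (address byte & data byte), not real sensor values.
  constant REGISTERS : REG_TYPE := (x"1010", x"1120", x"1230", x"1340");
  signal counter : natural range 0 to NUM_REGS := 0;
begin
  process (RegSel, ModuleEN)
  begin
    if ModuleEN = '0' then
      counter <= 0;
      RegStop <= '0';
      Address <= (others => '0');
      Data    <= (others => '0');
    elsif rising_edge(RegSel) then
      if counter < NUM_REGS then
        -- Upper byte is the register address, lower byte the data.
        Address <= REGISTERS(counter)(15 downto 8);
        Data    <= REGISTERS(counter)(7 downto 0);
        counter <= counter + 1;
      else
        RegStop <= '1';  -- all entries have been sent
      end if;
    end if;
  end process;
end architecture;
```

The bound check on counter keeps the index inside the table. Whether a given tool maps this to BlockRAM or to logic depends on the tool and on details such as the asynchronous reset of the outputs.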
