VHDL RAM module and use of registers

OK, another VHDL question. My code is below. Suppose I want to store my input in a RAM and then, say, add two of the stored values. (Don't focus on the addition itself; it will be replaced later.) This is my code:
library IEEE;
use IEEE.STD_LOGIC_1164.all;
USE ieee.numeric_std.ALL;
use work.my_package.all;
entity landmark_1 is
generic
(data_length :integer := 8;
address_length:integer:=3 );
port ( clk:in std_logic;
vin:in std_logic;
rst:in std_logic;
flag: in std_logic;
din: in signed(data_length -1 downto 0);
done: out std_logic
);
end landmark_1;
architecture TB_ARCHITECTURE of landmark_1 is
component ram IS
generic
(
ADDRESS_WIDTH : integer := 4;
DATA_WIDTH : integer := 8
);
port
(
clock : IN std_logic;
data : IN signed(DATA_WIDTH - 1 DOWNTO 0);
write_address : IN unsigned(ADDRESS_WIDTH - 1 DOWNTO 0);
read_address : IN unsigned(ADDRESS_WIDTH - 1 DOWNTO 0);
we : IN std_logic;
q : OUT signed(DATA_WIDTH - 1 DOWNTO 0)
);
end component;
signal inp1,inp2: matrix1_t(0 to address_length);
signal out_temp: signed(data_length-1 downto 0);
signal k:unsigned(address_length-1 downto 0);
signal i: integer range 0 to 100:=0;
begin
read1:ram generic map( ADDRESS_WIDTH=>address_length, DATA_WIDTH=>data_length) port map (clk,din,k,k,vin,out_temp);
inp1(i)<=out_temp;
process (clk)
begin
if (clk'event and clk='1') then
if (flag='1') then out_temp<=inp1(0)+inp1(1);
end if;
end if;
end process ;
end TB_ARCHITECTURE;
Below are my questions:
1. Why use that RAM instead of just writing inp(i)<=din;? I think it helps the synthesizer understand that this is a RAM, but what else does it gain me? Moreover, do I need the inp1 registers? And if I am going to use them, why use the RAM as an intermediate?
2. If inp1 is unnecessary, how am I going to fetch these two elements in my process? I mean, I need something like ram(address1)+ram(address2), right?
Below is my ram_code:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY ram IS
    GENERIC
    (
        ADDRESS_WIDTH : integer := 4;
        DATA_WIDTH : integer := 8
    );
    PORT
    (
        clock : IN std_logic;
        data : IN signed(DATA_WIDTH - 1 DOWNTO 0);
        write_address : IN unsigned(ADDRESS_WIDTH - 1 DOWNTO 0);
        read_address : IN unsigned(ADDRESS_WIDTH - 1 DOWNTO 0);
        we : IN std_logic;
        q : OUT signed(DATA_WIDTH - 1 DOWNTO 0)
    );
END ram;

ARCHITECTURE rtl OF ram IS
    TYPE RAM IS ARRAY(0 TO 2 ** ADDRESS_WIDTH - 1) OF signed(DATA_WIDTH - 1 DOWNTO 0);
    SIGNAL ram_block : RAM;
BEGIN
    PROCESS (clock)
    BEGIN
        IF (clock'event AND clock = '1') THEN
            IF (we = '1') THEN
                ram_block(to_integer(unsigned(write_address))) <= data;
            END IF;
            q <= ram_block(to_integer(unsigned(read_address)));
        END IF;
    END PROCESS;
END rtl;
3. Can anyone tell me why q (the output) appears one clock later?
EDIT: To sum up, I was told that I should use a RAM, and this is my implementation. The question is what I have gained by replacing my inp1(i)<=din; with the RAM model, and therefore how I can use it. (Before using the RAM I just wrote inp1(i)+inp2(i+1), for example.)
EDIT2: PACKAGE FOR TYPES.
library IEEE;
use IEEE.std_logic_1164.all;
use ieee.numeric_std.all;
package my_package is
type matrix1_t is array(integer range<>) of signed(7 downto 0);
type big_matrix is array(integer range<>) of signed(23 downto 0);
type matrix2d is array (integer range<>) of big_matrix(0 to 3);
end my_package;

Why use that RAM instead of just writing inp(i)<=din;?
In real-world designs you use RAMs to store large amounts of data because they are smaller (physically, on the chip) than arrays of flip-flops. During the synthesis process the RAM is replaced by one from your vendor's library. This RAM looks rather small, but I'm guessing you've been told to use one as an exercise.
Moreover, do I need the inp1 registers? And if I am going to use them, why use the RAM as an intermediate?
I'm not quite sure what inp1 is, as I don't know what a matrix1_t is, but I'm guessing it's a register version of the RAM. In that case it's redundant.
If inp1 is unnecessary, how am I going to fetch these two elements in my process? I mean, I need something like ram(address1)+ram(address2), right?
...and there's the real issue. You need to ask yourself: "If you can't read more than one thing in a cycle, how do you add two numbers?"
Can anyone tell me why q (the output) appears one clock later?
Because that's how RAMs work. You apply an address in one cycle, and the data appears some cycles later (normally one, but not always). In your ram entity, q is assigned inside the clocked process, so it is a register that only updates on the clock edge after the address has been applied.
These are real issues that you'll face in real designs. RAMs are necessary because of their smaller size. You need to know the issues that surround using them and how to work with them.

You aren't setting i to anything other than 0 so you will only ever assign to inp1(0).
Assuming the RAM is there to store many values, and you need to read two of them every clock cycle in order to do your adding, you need 2 RAM blocks (or a single dual-ported one) and then put the two addresses into those RAM blocks. The next cycle the two values you want will appear on the data outputs and you can sum them.
The clock-cycle delay you observe is in the nature of synchronous RAMs, which is what most FPGAs provide for their "large storage". Some FPGAs can also create smaller asynchronous RAMs, where the data appears a short delay after the read address changes, completely asynchronously to the clock.
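To make the "two addresses in, sum out on a later cycle" idea concrete, here is a minimal sketch. The entity name ram_2r1w_sum and its port names are made up for illustration; it assumes one write port and two synchronous read ports on the same storage, and whether a tool maps this onto a dual-ported block, two copies of a RAM, or plain registers depends on the device and the synthesizer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: one write port, two synchronous read ports,
-- and a registered sum of the two words that were read.
entity ram_2r1w_sum is
  generic (
    ADDRESS_WIDTH : integer := 3;
    DATA_WIDTH    : integer := 8
  );
  port (
    clock         : in  std_logic;
    we            : in  std_logic;
    write_address : in  unsigned(ADDRESS_WIDTH - 1 downto 0);
    data          : in  signed(DATA_WIDTH - 1 downto 0);
    read_addr_a   : in  unsigned(ADDRESS_WIDTH - 1 downto 0);
    read_addr_b   : in  unsigned(ADDRESS_WIDTH - 1 downto 0);
    sum           : out signed(DATA_WIDTH downto 0)  -- one extra bit so the add cannot overflow
  );
end entity ram_2r1w_sum;

architecture rtl of ram_2r1w_sum is
  type ram_t is array (0 to 2 ** ADDRESS_WIDTH - 1) of signed(DATA_WIDTH - 1 downto 0);
  signal ram_block : ram_t;
  signal q_a, q_b  : signed(DATA_WIDTH - 1 downto 0);
begin
  process (clock)
  begin
    if rising_edge(clock) then
      if we = '1' then
        ram_block(to_integer(write_address)) <= data;
      end if;
      -- First edge after the addresses are applied: both words are registered.
      q_a <= ram_block(to_integer(read_addr_a));
      q_b <= ram_block(to_integer(read_addr_b));
      -- Following edge: the two registered words are added.
      sum <= resize(q_a, DATA_WIDTH + 1) + resize(q_b, DATA_WIDTH + 1);
    end if;
  end process;
end architecture rtl;
```

With this structure, sum reflects a given pair of addresses two clock edges after they are applied: one edge for the read and one for the add.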

I assume you're trying to represent a RAM with the signal inp1. However, the signal that represents the RAM is ram_block in your ram entity.
You are using the entity in a strange way because you connect the signal k as read and write address. There are two problems with that. Firstly, k doesn't seem to be driven anywhere in your design. Secondly, you probably don't want the two addresses to be the same.
I assume you want to write some values to the RAM and at the same time read two values and add them up. I suggest you use a process that sets a write address and a process that sets a read address. You also need at least one register with the width of the RAM output. The first value you read from the RAM is stored in that register. Then you add the value in that register and the second value read from the RAM and store the result in a different register.
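The read-store-read-add sequence described above could be sketched roughly as follows. This is only an illustration under assumptions: the entity name ram_seq_add, the start/done handshake and the state names are invented here, addr_a and addr_b are assumed to stay stable while the sequence runs, and the RAM is written inline with the same one-cycle read behaviour as the ram entity in the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: adds two RAM words using a single read port.
entity ram_seq_add is
  generic (
    ADDRESS_WIDTH : integer := 3;
    DATA_WIDTH    : integer := 8
  );
  port (
    clock, rst, start : in  std_logic;
    we                : in  std_logic;
    write_address     : in  unsigned(ADDRESS_WIDTH - 1 downto 0);
    data              : in  signed(DATA_WIDTH - 1 downto 0);
    addr_a, addr_b    : in  unsigned(ADDRESS_WIDTH - 1 downto 0);
    result            : out signed(DATA_WIDTH downto 0);
    done              : out std_logic
  );
end entity ram_seq_add;

architecture rtl of ram_seq_add is
  type ram_t is array (0 to 2 ** ADDRESS_WIDTH - 1) of signed(DATA_WIDTH - 1 downto 0);
  type state_t is (IDLE, WAIT_A, GET_A, GET_B);
  signal ram_block    : ram_t;
  signal read_address : unsigned(ADDRESS_WIDTH - 1 downto 0) := (others => '0');
  signal q            : signed(DATA_WIDTH - 1 downto 0);
  signal first_value  : signed(DATA_WIDTH - 1 downto 0);
  signal state        : state_t := IDLE;
begin
  -- Synchronous RAM with one read port: q follows read_address one edge later.
  ram_proc : process (clock)
  begin
    if rising_edge(clock) then
      if we = '1' then
        ram_block(to_integer(write_address)) <= data;
      end if;
      q <= ram_block(to_integer(read_address));
    end if;
  end process ram_proc;

  ctrl : process (clock)
  begin
    if rising_edge(clock) then
      done <= '0';                      -- default: done is a one-cycle pulse
      if rst = '1' then
        state <= IDLE;
      else
        case state is
          when IDLE =>
            if start = '1' then
              read_address <= addr_a;   -- present the first address
              state <= WAIT_A;
            end if;
          when WAIT_A =>
            read_address <= addr_b;     -- RAM is fetching word A; queue address B
            state <= GET_A;
          when GET_A =>
            first_value <= q;           -- q now holds word A; keep it in a register
            state <= GET_B;
          when GET_B =>
            -- q now holds word B; add it to the stored word A
            result <= resize(first_value, DATA_WIDTH + 1) + resize(q, DATA_WIDTH + 1);
            done   <= '1';
            state  <= IDLE;
        end case;
      end if;
    end if;
  end process ctrl;
end architecture rtl;
```

The trade-off against a two-read-port memory is time for area: one read port is enough, but each sum takes several clock cycles.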

Related

Add and assign to a signal in VHDL

I am developing a 10 point moving average filter for an assignment. I am taking small steps so that I can be sure each stage of my code is working. My first step is to take an input which is a standard logic vector (5 bits) and convert it to a signal of type integer for processing before converting back to a standard logic vector for output. My first block of code is:
library IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;
entity AveFilter is
port( CLK : in STD_LOGIC;
RST : in STD_LOGIC;
ADC_In : in STD_LOGIC_VECTOR ( 4 downto 0);
AveOut : out STD_LOGIC_VECTOR ( 4 downto 0)
);
end AveFilter;
architecture Behavioral of AveFilter is
signal adc_sum : integer := 0;
type Circ_Buf is array (0 to 9) of STD_LOGIC_VECTOR (4 downto 0);
signal ave_buf : Circ_Buf;
begin
process (CLK, RST, ADC_In)
variable idx : integer := 5;
begin
ave_buf(0) <= ADC_In;
adc_sum <= to_integer(unsigned(ave_buf(0)));
AveOut <= std_LOGIC_VECTOR(to_unsigned(adc_sum, AveOut'length));
end process;
end architecture;
The above code simply takes the input value and assigns it to the output; I have tested this with ModelSim and it works as expected. I can also assign various hard-coded values to adc_sum and they also appear on the output as expected.
The problem I have is when I modify the code so that the current adc input is added to the previous value of adc_sum and then stored in adc_sum ie by doing this:
adc_sum <= adc_sum + to_integer(unsigned(ave_buf(0)));
When I view AveOut in ModelSim the values are always XXXX. I have looked at some VHDL examples and I believe that I should be able to perform the above operation. Could someone please give me a clue as to what I'm missing here?
Thanks
Andrew
ave_buf is probably undefined at the beginning. Try initializing it. If this works, you should also implement reset on it. Also, you should take action on rising edge of the clock. And ADC_In is unnecessary in the sensitivity list.
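Putting those suggestions together (initial values, a reset, and acting only on the rising clock edge), a sketch might look like the one below. It is not the original poster's final design: the entity name AveFilterSum, the SumOut port and the idx signal are assumptions made for the example, it keeps a running sum of the last ten samples rather than an average, and the division by 10 is left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: running sum of the last ten 5-bit samples.
entity AveFilterSum is
  port (
    CLK    : in  std_logic;
    RST    : in  std_logic;
    ADC_In : in  std_logic_vector(4 downto 0);
    SumOut : out std_logic_vector(8 downto 0)
  );
end entity AveFilterSum;

architecture Behavioral of AveFilterSum is
  type Circ_Buf is array (0 to 9) of unsigned(4 downto 0);
  -- Initial values so simulation does not start from 'U'.
  signal ave_buf : Circ_Buf := (others => (others => '0'));
  -- Ten 5-bit samples sum to at most 10 * 31 = 310, which fits in 9 bits.
  signal adc_sum : integer range 0 to 310 := 0;
  signal idx     : integer range 0 to 9 := 0;
begin
  process (CLK)  -- only the clock is needed in the sensitivity list
  begin
    if rising_edge(CLK) then
      if RST = '1' then
        ave_buf <= (others => (others => '0'));
        adc_sum <= 0;
        idx     <= 0;
      else
        -- Add the newest sample and drop the oldest one, which sits at idx.
        adc_sum <= adc_sum + to_integer(unsigned(ADC_In)) - to_integer(ave_buf(idx));
        ave_buf(idx) <= unsigned(ADC_In);
        if idx = 9 then
          idx <= 0;
        else
          idx <= idx + 1;
        end if;
      end if;
    end if;
  end process;

  SumOut <= std_logic_vector(to_unsigned(adc_sum, SumOut'length));
end architecture Behavioral;
```

Because adc_sum is only updated on a clock edge, the accumulation adds each sample exactly once instead of re-evaluating whenever a signal in the sensitivity list changes.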

Dynamic Array Size in VHDL

I want to use a dynamic array range, so I am using "N", obtained by converting an incoming vector signal to an integer. Using the specific incoming port "Size" gives me an error, while a fixed vector produces perfect output.
architecture EXAMPLE of Computation is
signal size :std_logic_vector (7 downto 0);
process (ACLK, SLAVE_ARESETN) is
variable N: integer:=conv_integer ("00000111") ; ---WORKING
--variable N: integer:=conv_integer (size) ; -- Not working
type memory is array (N downto 0 ) of std_logic_vector (31 downto 0 );
variable RAM :memory;
The only reason for coding it this way is to send as much data as possible to the FPGA, since I need to send data from DDR to a custom IP via DMA in Vivado, possibly more than 100 MB. Kindly guide me if the approach above is the wrong way to implement this.
You can't do that in VHDL. What kind of hardware would be generated by your code? If you don't know, the synthesizer won't either.
The way to do this kind of thing is to set N to the largest value you want to support, and use size in your logic to control your logic appropriately. It's difficult to give more pointers without more information, but as an example, you could use a counter to address your ram, and have it reset when it's greater than size.
Update
Here's a counter example. You have to make sure that size doesn't change while operating or it will fall into an unknown state. A real design should have reset states to ensure correct behaviour.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity example is
    port (
        clk : in std_logic;
        rst : in std_logic;
        size : in unsigned(7 downto 0);
        wr : in std_logic;
        din : in std_logic_vector(31 downto 0)
    );
end entity;

architecture rtl of example is
    signal counter : unsigned(7 downto 0);
    type ram_t is array(0 to 255) of std_logic_vector(31 downto 0);
    signal ram : ram_t;
begin
    RAM_WR: process(clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                counter <= (others => '0');
            else
                if wr = '1' then
                    ram(to_integer(counter)) <= din;
                    if counter = size then
                        counter <= (others => '0');
                    else
                        counter <= counter + 1;
                    end if;
                end if;
            end if;
        end if;
    end process RAM_WR;
end architecture rtl;
I believe you can only use a generic as an array constraint in a process. Otherwise, the compiler cannot elaborate the design.
In a function or procedure, you can have truly variable array bounds.
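As an illustration of array bounds that vary per call inside a subprogram, here is a small sketch. The package name array_sum_pkg, the type word_array and the function sum_words are all hypothetical. Note that for synthesis each individual call still has to be made with an array whose size is known at elaboration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package for illustration only.
package array_sum_pkg is
  -- Unconstrained array: the index range is fixed by whatever is passed in.
  type word_array is array (natural range <>) of unsigned(31 downto 0);
  function sum_words (words : word_array) return unsigned;
end package array_sum_pkg;

package body array_sum_pkg is
  function sum_words (words : word_array) return unsigned is
    -- 32-bit accumulator; the sum wraps around on overflow.
    variable total : unsigned(31 downto 0) := (others => '0');
  begin
    -- words'range takes its bounds from the actual argument of each call.
    for i in words'range loop
      total := total + words(i);
    end loop;
    return total;
  end function sum_words;
end package body array_sum_pkg;
```

The same function can then be called with, say, a word_array(0 to 7) in one place and a word_array(0 to 255) in another, without declaring a separate type for each size.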

Small change in VHDL register file results in huge difference in total logical elements

I am new to VHDL and one of my assignments was to create an 8-bit register file. I noticed that by changing a single line of code, I could significantly increase or decrease the total number of logical elements. I am trying to understand why this causes such a significant change.
When enable is high, the register file stores the value of dataIn in the location of selectWrite. dataOut displays the value stored in the location of selectRead.
If dataOut <= entry(readIndex); is placed inside of process(clock), the total number of logical elements used is:
Total logical elements: 9/33,216 ( < 1% )
Total combinatorial functions 9/33,216 ( < 1% )
Dedicated logic registers 0/33,216 ( 0% )
However, if dataOut <= entry(readIndex); is placed outside of process(clock) thousands more logical elements are used:
Total logical elements: 2,672/33,216 ( 8% )
Total combinatorial functions 1,656/33,216 ( 5% )
Dedicated logic registers 2,048/33,216 ( 6% )
I understand that when placed inside of process(clock), dataOut will only change on the clock edge, and when placed outside of process(clock), dataOut will change unpredictably.
Why does this change cause so many more logic elements to be used?
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity RegisterFile is
port(
clock : in std_logic;
reset : in std_logic;
dataIn : in std_logic_vector(7 downto 0);
enable : in std_logic;
selectRead : in std_logic_vector(7 downto 0);
selectWrite : in std_logic_vector(7 downto 0);
dataOut : out std_logic_vector(7 downto 0)
);
end RegisterFile;
architecture RegisterFileArchitecture of RegisterFile is
type RegisterEntry is array (0 to 255) of std_logic_vector(7 downto 0);
signal entry : RegisterEntry;
signal readIndex : integer;
signal writeIndex : integer;
begin
-- Update read/write indices
readIndex <= to_integer(unsigned(selectRead));
writeIndex <= to_integer(unsigned(selectWrite));
process(clock)
begin
if (rising_edge(clock)) then
-- Update selected data
dataOut <= entry(readIndex);
if (reset = '1') then
entry(writeIndex) <= "00000000";
elsif (enable = '1') then
entry(writeIndex) <= dataIn;
end if;
end if;
end process;
end RegisterFileArchitecture;
You need to study the FPGA architecture you are using. When you have a large memory to implement, you most likely want to use the device's dedicated RAM blocks, called Block RAM in Xilinx. Block RAM has a specific structure, especially with respect to reading and writing on a clock edge versus asynchronously (combinationally). If your code matches it, it will use the Block RAM and very little other logic. If your code does not match a Block RAM, then you will use logic cells instead.
Look at your report further and see what it reports in each case about block RAM usage. Look at Xilinx's documentation about the structure of Block RAM.
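To compare the two read styles side by side, here is a sketch with a generic that switches between them. The entity name RegisterFileRead and the SYNC_READ generic are invented for this example, and the reset branch from the question is left out. Whether either variant actually ends up in block RAM depends on the device and the synthesis tool, so check the resource report for both settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: same storage, two alternative read styles.
entity RegisterFileRead is
  generic (
    SYNC_READ : boolean := true  -- true: registered read, false: combinational read
  );
  port (
    clock       : in  std_logic;
    enable      : in  std_logic;
    dataIn      : in  std_logic_vector(7 downto 0);
    selectRead  : in  std_logic_vector(7 downto 0);
    selectWrite : in  std_logic_vector(7 downto 0);
    dataOut     : out std_logic_vector(7 downto 0)
  );
end entity RegisterFileRead;

architecture rtl of RegisterFileRead is
  type RegisterEntry is array (0 to 255) of std_logic_vector(7 downto 0);
  signal entry : RegisterEntry;
begin
  -- Synchronous write, identical in both variants.
  write_proc : process (clock)
  begin
    if rising_edge(clock) then
      if enable = '1' then
        entry(to_integer(unsigned(selectWrite))) <= dataIn;
      end if;
    end if;
  end process write_proc;

  -- Registered read: dataOut changes only on a clock edge, which is the
  -- pattern a synchronous block RAM can implement directly.
  sync_gen : if SYNC_READ generate
    process (clock)
    begin
      if rising_edge(clock) then
        dataOut <= entry(to_integer(unsigned(selectRead)));
      end if;
    end process;
  end generate sync_gen;

  -- Combinational read: dataOut follows selectRead immediately, which needs
  -- a 256-to-1 multiplexer per bit if the storage is built from registers.
  async_gen : if not SYNC_READ generate
    dataOut <= entry(to_integer(unsigned(selectRead)));
  end generate async_gen;
end architecture rtl;
```

The 2,048 dedicated registers in the second report match 256 entries of 8 bits each, which suggests the storage was built from flip-flops in that case.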

Accessing 2 elements of the same array in VHDL

I am trying to read 2 values from 2 different addresses in my array in VHDL, but somehow they always return a wrong value (most of the time, zero). When I tested it with only 1 address and 1 data output, it returned the correct value.
architecture Behavioral of registerFile is
type reg_type is array (31 downto 0) of std_logic_vector (31 downto 0);
signal REG : reg_type := (x"00000031", x"00000030", x"00000029", x"00000028", x"00000027", x"00000026", x"00000025", x"00000024", x"00000023", x"00000022", x"00000021", x"00000020",x"00000019",x"00000018", x"00000017", x"00000016", x"00000015", x"00000014", x"00000013", x"00000012", x"00000011", x"00000010", x"00000009", x"00000008", x"00000007",x"00000006", x"00000005", x"00000004", x"00000003", x"00000004", x"00000001", x"00000000");
begin
process(clk)
begin
if clk'event and clk='1' then
if ENABLE = '1' then
if readReg = '1' then -- read from register
DATAone <= REG(conv_integer(ADDRone));
DATAtwo <= REG(conv_integer(ADDRtwo));
else
REG(conv_integer(ADDRone)) <= DATAone;
REG(conv_integer(ADDRtwo)) <= DATAtwo;
end if;
end if;
end if;
end process;
end Behavioral;
Would appreciate some help, I tried googling but it's all either multidimensional arrays or only accessing 1 element at a time.
Thanks.
I'm not sure that this is synthesizable in most fabric. You could create two copies of the reg array and index into each of them.
It seems like you are trying to implement a quad-port memory. Anyway, even if your register file is not exactly a 4-port memory, it probably can be implemented around one.
Altera has an example of such a memory in their Advanced Synthesis Cookbook.
[Figure: the relevant part of the Altera Advanced Synthesis Cookbook example]
If you use the Altera example files, they will instantiate Altera primitives and use FPGA block RAM for storage. If you are concerned about portability, or you just want to look at some VHDL code that does what you want, check the example below. It implements roughly the same circuit shown in the figure, and it will most likely be synthesized as distributed memory in the FPGA.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
-- Quad-port RAM with 2 read ports 2 write ports. The design uses 2 memory blocks
-- (MAIN_MEMORY and SHADOW_MEMORY) to allow for simultaneous writes. Port A writes to
-- main memory, Port B writes to shadow memory. On a read from either port, data is
-- read from the memory block that was most recently written at the given position.
entity quad_port_ram is
generic (
ADDRESS_WIDTH: natural := 5;
DATA_WIDTH: natural := 32
);
port (
clock: in std_logic;
read_addr_a: in natural range 0 to 2**ADDRESS_WIDTH-1;
read_data_a: out std_logic_vector(DATA_WIDTH-1 downto 0);
write_addr_a: in natural range 0 to 2**ADDRESS_WIDTH-1;
write_data_a: in std_logic_vector(DATA_WIDTH-1 downto 0);
write_enable_a: in std_logic;
read_addr_b: in natural range 0 to 2**ADDRESS_WIDTH-1;
read_data_b: out std_logic_vector(DATA_WIDTH-1 downto 0);
write_addr_b: in natural range 0 to 2**ADDRESS_WIDTH-1;
write_data_b: in std_logic_vector(DATA_WIDTH-1 downto 0);
write_enable_b: in std_logic
);
end;
architecture rtl of quad_port_ram is
type memory_type is (MAIN_MEMORY, SHADOW_MEMORY);
type memory_type_array is array (natural range <>) of memory_type;
-- Keep track of which memory has the most recently written data for each address
signal most_recent_port_for_address: memory_type_array(0 to 2**ADDRESS_WIDTH-1);
type memory_array is array (0 to 2**ADDRESS_WIDTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
type dual_memory_array is array (memory_type) of memory_array;
-- Store the actual memory bits. Access like this:
-- memory_data(memory_type)(address)(bit_position)
signal memory_data: dual_memory_array;
-- Auxiliary signals to decide where to read the data from (main or shadow)
signal most_recent_port_for_addr_a, most_recent_port_for_addr_b: memory_type;
begin
process (clock) begin
if rising_edge(clock) then
if write_enable_a = '1' then
memory_data(MAIN_MEMORY)(write_addr_a) <= write_data_a;
most_recent_port_for_address(write_addr_a) <= MAIN_MEMORY;
end if;
if write_enable_b = '1' then
if (write_enable_a = '0') or (write_addr_a /= write_addr_b) then
memory_data(SHADOW_MEMORY)(write_addr_b) <= write_data_b;
most_recent_port_for_address(write_addr_b) <= SHADOW_MEMORY;
end if;
end if;
end if;
end process;
most_recent_port_for_addr_a <= most_recent_port_for_address(read_addr_a);
most_recent_port_for_addr_b <= most_recent_port_for_address(read_addr_b);
read_data_a <= memory_data(most_recent_port_for_addr_a)(read_addr_a);
read_data_b <= memory_data(most_recent_port_for_addr_b)(read_addr_b);
end;

Xilinx VHDL Multicycle constraints

I have some code that's running on a Xilinx Spartan 6, and it currently meets timing. However, I'd like to change it so that I use fewer registers.
signal response_ipv4_checksum : std_logic_vector(15 downto 0);
signal response_ipv4_checksum_1 : std_logic_vector(15 downto 0);
signal response_ipv4_checksum_2 : std_logic_vector(15 downto 0);
signal response_ipv4_checksum_3 : std_logic_vector(15 downto 0);
…
process (clk)
begin
if rising_edge(clk) then
response_ipv4_checksum_3 <= utility.ones_complement_sum(x"4622", config.source_ip(31 downto 16));
response_ipv4_checksum_2 <= utility.ones_complement_sum(response_ipv4_checksum_3, config.source_ip(15 downto 8));
response_ipv4_checksum_1 <= utility.ones_complement_sum(response_ipv4_checksum_2, response_group(31 downto 16));
response_ipv4_checksum <= utility.ones_complement_sum(response_ipv4_checksum_1, response_group(15 downto 0));
end if;
end process;
Currently, to meet timing, I need to split up the additions over multiple cycles. However, I have 20 cycles to actually compute this value, during which time the config value can't change.
Is there some attribute I can use (preferred) or line in the constraints (ucf) file that I can use so that I could simply write the same thing, but use no registers?
Just for a bit of extra code, in my UCF, I already have a timespec that looks like this:
NET pin_phy_rxclk TNM_NET = "PIN_PHY_RXCLK";
TIMESPEC "TS_PIN_PHY_RXCLK" = PERIOD "PIN_PHY_RXCLK" 8ns HIGH 50%;
I think you need a FROM:TO constraint:
TIMESPEC TSname=FROM "group1" TO "group2" value;
where value can be based on another timespec, like TS_CLK*4
So you'd adjust your process to only have flipflops on the output signals, create a timegroup with the inputs in it, another with the outputs in it, and use those for group1 and group2 .
So, group 1 would contain all the input nets /path/to/your/instance/config.source_ip and /path/to/your/instance/response_group. It might be easier to create a vector input to the entity and wire up the config/response_group signals outside of it. Then you can just use /path/to/your/instance/name_of_input_signals.
Group 2 would contain /path/to/your/instance/response_ipv4_checksum.
And, as you comment, you can use TS_PIN_PHY_RXCLK*4 (assuming it is a time, not a frequency - otherwise you have to do a /4 I think)
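On the VHDL side, "flip-flops only on the output signals" could look like the sketch below. Several things here are assumptions: the entity name checksum_multicycle is made up, the local ones_complement_sum function is only a stand-in for utility.ones_complement_sum (assumed to be a 16-bit add with end-around carry), and the second operand uses source_ip(15 downto 0) because the stand-in takes 16-bit arguments. It is only safe if the inputs stay stable for the whole multicycle window and the output is not used before the window has elapsed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: the whole checksum chain is combinational and only
-- the final result is registered.
entity checksum_multicycle is
  port (
    clk                    : in  std_logic;
    source_ip              : in  std_logic_vector(31 downto 0);
    response_group         : in  std_logic_vector(31 downto 0);
    response_ipv4_checksum : out std_logic_vector(15 downto 0)
  );
end entity checksum_multicycle;

architecture rtl of checksum_multicycle is
  -- Stand-in for utility.ones_complement_sum (an assumption about what it does):
  -- 16-bit addition with the carry out added back into the result.
  function ones_complement_sum (a, b : std_logic_vector(15 downto 0))
    return std_logic_vector is
    variable wide   : unsigned(16 downto 0);
    variable folded : unsigned(15 downto 0);
  begin
    wide := resize(unsigned(a), 17) + resize(unsigned(b), 17);
    if wide(16) = '1' then
      folded := wide(15 downto 0) + 1;  -- end-around carry
    else
      folded := wide(15 downto 0);
    end if;
    return std_logic_vector(folded);
  end function ones_complement_sum;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Four chained additions with no intermediate registers. This path is
      -- longer than one clock period, so it relies on the FROM:TO constraint.
      response_ipv4_checksum <=
        ones_complement_sum(
          ones_complement_sum(
            ones_complement_sum(
              ones_complement_sum(x"4622", source_ip(31 downto 16)),
              source_ip(15 downto 0)),
            response_group(31 downto 16)),
          response_group(15 downto 0));
    end if;
  end process;
end architecture rtl;
```

The source_ip and response_group nets would then go into group1 and the response_ipv4_checksum register into group2 of the FROM:TO constraint.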
