How to correctly store registers in an FPGA - vhdl

I need to write a VHDL program that initializes a sensor's registers over I2C. My problem is writing an efficient program that doesn't waste FPGA space. I need to store 400 registers, each composed of an 8-bit address and 8-bit data.
The program I wrote is:
entity i2cReg is
port (
RegSel : in std_logic;
Address : out std_logic_vector (7 downto 0); -- 8-bit address, matching the x"10" assignments below
Data : out std_logic_vector (7 downto 0);
RegStop : out std_logic;
ModuleEN : in std_logic
);
end i2cReg;
architecture i2cReg_archi of i2cReg is
signal counter :integer := 0;
begin
process(RegSel, ModuleEN)
begin
if ModuleEN = '0' then
Address <= x"10";
Data <= x"10";
RegStop <= '0';
counter <= 0;
elsif rising_edge(RegSel) then
counter <= counter + 1;
case counter is
when 0 =>
Address <= x"10";
Data <= x"10";
when 1 =>
Address <= x"10";
Data <= x"10";
when 2 =>
Address <= x"10";
Data <= x"10";
when 3 =>
Address <= x"10";
Data <= x"10";
when 4 =>
Address <= x"10";
Data <= x"10";
when 5 =>
Address <= x"10";
Data <= x"10";
when 400 =>
RegStop <= '1';
when others =>
end case;
end if;
end process;
end i2cReg_archi;
Is there a way to optimize this code? Or would you advise me to use an external EEPROM?

Yaro - you have not mentioned the FPGA vendor or the device but the answer is: Yes, you can initialize ROM in an FPGA so that the values you need are present after configuration. Both Altera and Xilinx allow you to provide a file with the initial values during synthesis.
Kevin.

Initialized BlockRAM is in general the correct solution if you are on Xilinx or Altera.
But there are exceptions where a logic implementation can also work:
One example is when the content of your 400 registers has repeating patterns or many registers with the same value (as in your example code). In this case, if you implement it as logic, your synthesis tool will optimize it heavily. You may actually end up with a very small amount of logic if the register content is highly repetitive. It is sometimes also possible to improve the optimization by cleverly reordering the registers.
100-200 logic cells is often considered "cheaper" than a BlockRAM. But it depends mostly on which resource is most scarce in your particular application.
Regardless of whether you go for initialized BlockRAM or logic, I would suggest that you model it as an array of std_logic_vector instead of using case/when.
The "array of std_logic_vector" approach is platform independent, and can be synthesized to either BlockRAM or logic. Your synthesis tool will usually try to select the best implementation automatically, but you can also force it to use either logic or BlockRAM with vendor-specific attributes. (I can't tell you which attributes to use, since I don't know which platform you are using.)
Example:
type REG_TYPE is array (0 to 3) of std_logic_vector(15 downto 0);
constant REGISTERS : REG_TYPE :=
(x"0000",
x"0001",
x"0010",
x"0100");
And in your process, something like:
if rising_edge(RegSel) then
Address <= REGISTERS( counter )(15 downto 8);
Data <= REGISTERS( counter )( 7 downto 0);
end if;
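Putting the two fragments above together, a minimal self-contained sketch might look like the following. The entity name i2c_reg_rom, the constant NUM_REGS and the four table entries are placeholders, not the real sensor values; the port names and the asynchronous ModuleEN reset are carried over from the question. Note that an asynchronous reset on the outputs may steer the tool toward a logic implementation rather than BlockRAM.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity i2c_reg_rom is
  port (
    RegSel   : in  std_logic;
    ModuleEN : in  std_logic;
    Address  : out std_logic_vector(7 downto 0);
    Data     : out std_logic_vector(7 downto 0);
    RegStop  : out std_logic
  );
end entity i2c_reg_rom;

architecture rtl of i2c_reg_rom is
  -- Placeholder size; the question needs 400 entries.
  constant NUM_REGS : natural := 4;

  -- Each entry packs the register address (15 downto 8) and data (7 downto 0).
  type REG_TYPE is array (0 to NUM_REGS - 1) of std_logic_vector(15 downto 0);
  constant REGISTERS : REG_TYPE :=
    (x"0000",
     x"0001",
     x"0010",
     x"0100");

  -- Bounded range so the counter is only as wide as it needs to be.
  signal counter : natural range 0 to NUM_REGS := 0;
begin
  process (RegSel, ModuleEN)
  begin
    if ModuleEN = '0' then
      counter <= 0;
      RegStop <= '0';
      Address <= (others => '0');
      Data    <= (others => '0');
    elsif rising_edge(RegSel) then
      if counter < NUM_REGS then
        -- Present the next address/data pair and advance.
        Address <= REGISTERS(counter)(15 downto 8);
        Data    <= REGISTERS(counter)(7 downto 0);
        counter <= counter + 1;
      else
        -- Table exhausted: signal completion and stop counting.
        RegStop <= '1';
      end if;
    end if;
  end process;
end architecture rtl;
```

Bounding the counter with a range (rather than a plain integer) also keeps the synthesized counter from defaulting to 32 bits.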

Related

Shift register with BlockRAM - XILINX [closed]

I'm very new to VHDL targeting Xilinx devices. Reading the XST manual (page 155), I see an example that implements a shift register in BlockRAM.
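library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;    -- conv_std_logic_vector
use ieee.std_logic_unsigned.all; -- "+" and conv_integer on std_logic_vector
entity srl_512_bram is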
generic (
LENGTH : integer := 512;
ADDRWIDTH : integer := 9;
WIDTH : integer := 8);
port (
CLK : in std_logic;
SHIFT_IN : in std_logic_vector(WIDTH-1 downto 0);
SHIFT_OUT : out std_logic_vector(WIDTH-1 downto 0));
end srl_512_bram;
architecture behavioral of srl_512_bram is
signal CNTR : std_logic_vector(ADDRWIDTH-1 downto 0);
signal SHIFT_TMP : std_logic_vector(WIDTH-1 downto 0);
type ram_type is array (0 to LENGTH-3) of std_logic_vector(WIDTH-1 downto 0);
signal RAM : ram_type := (others => (others => '0'));
begin
counter : process (CLK)
begin
if CLK'event and CLK = '1' then
if CNTR = conv_std_logic_vector(LENGTH-3, ADDRWIDTH) then
CNTR <= (others => '0');
else
CNTR <= CNTR + '1';
end if;
end if;
end process counter;
memory : process (CLK)
begin
if CLK'event and CLK = '1' then
RAM(conv_integer(CNTR)) <= SHIFT_IN;
SHIFT_TMP <= RAM(conv_integer(CNTR));
SHIFT_OUT <= SHIFT_TMP;
end if;
end process memory;
end behavioral;
I have a few questions about it:
how is it obvious that BlockRAM is/will be included in design (ie synthesis) ?
as two processes here work in parallel, which one will start first, knowing that both start on positive CLK edge ?
my perspective is that "memory" process doesn't provide shifting, but rather SHIFT_IN vector insertion at "current" RAM position (the one indexed with CNTR). Where is shifting in this code ?
how is it obvious that BlockRAM is/will be included in design (ie synthesis) ?
AR# 46515 from Xilinx says to try UG627. These references may not be exactly what you're using and possibly a few years dated, but the concepts in them are good. In UG627 have a look near page 170. There's some example VHDL explaining how BRAM is inferred, and it's very similar to what you have here.
as two processes here work in parallel, which one will start first, knowing that both start on positive CLK edge ?
Remember this VHDL turns into dedicated hardware, so both processes have their own circuitry on the FPGA and legitimately happen at the same time. When I was learning VHDL this tripped me up for quite a while - and if I'm honest it still gets me from time to time - so maybe the best approach here is actually to simulate the design and then try to rationalize what the VHDL is doing based on the waveforms.
Also, I don't want to confuse you further, but this may help: you could just as easily rewrite the counter and memory processes as a single process, as follows. In this case the code is evaluated one line at a time (sequentially), as you might expect, but, and this is VERY IMPORTANT, the signal assignments do not take effect until the process suspends. Every read of CNTR, RAM and SHIFT_TMP inside the process therefore sees the value from before the clock edge.
counter_and_memory_combined : process (CLK)
begin
if CLK'event and CLK = '1' then
-- From the counter process
if CNTR = conv_std_logic_vector(LENGTH-3, ADDRWIDTH) then
CNTR <= (others => '0');
else
CNTR <= CNTR + '1';
end if;
-- From the memory process
RAM(conv_integer(CNTR)) <= SHIFT_IN;
SHIFT_TMP <= RAM(conv_integer(CNTR));
SHIFT_OUT <= SHIFT_TMP;
end if;
end process counter_and_memory_combined;
Sometimes it's easier to look at VHDL when you combine like this, though I'm definitely NOT saying this is always the best approach. Sometimes it makes the code more cluttered.
my perspective is that "memory" process doesn't provide shifting, but rather SHIFT_IN vector insertion at "current" RAM position (the one indexed with CNTR). Where is shifting in this code ?
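The code you posted might be clearer if SHIFT_IN were relabeled DATA_IN and SHIFT_OUT relabeled DATA_OUT (and SHIFT_TMP relabeled DATA_TMP). "Shifting" in this case means that the data on SHIFT_IN gets stored into RAM, and the data that was already in RAM at that position gets shifted out to SHIFT_OUT. The RAM is used as a circular buffer: the word read at CNTR was written LENGTH-2 clock cycles earlier, and the two output registers (SHIFT_TMP and SHIFT_OUT) supply the remaining two stages, so the input emerges with the same delay a LENGTH-deep shift register would give.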
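As a sketch of how the same delay line could be written with a plain integer counter and an explicit implementation hint, consider the version below. The entity name delay_line_bram is made up for illustration, and ram_style is a Xilinx-specific synthesis attribute; whether BlockRAM is actually used should still be checked in the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delay_line_bram is
  generic (
    LENGTH : positive := 512;  -- total delay in clock cycles (assumed >= 3)
    WIDTH  : positive := 8);
  port (
    CLK       : in  std_logic;
    SHIFT_IN  : in  std_logic_vector(WIDTH-1 downto 0);
    SHIFT_OUT : out std_logic_vector(WIDTH-1 downto 0));
end entity delay_line_bram;

architecture rtl of delay_line_bram is
  -- LENGTH-2 words of storage; the two output registers add the last two stages.
  type ram_type is array (0 to LENGTH-3) of std_logic_vector(WIDTH-1 downto 0);
  signal RAM       : ram_type := (others => (others => '0'));
  signal SHIFT_TMP : std_logic_vector(WIDTH-1 downto 0) := (others => '0');
  signal CNTR      : natural range 0 to LENGTH-3 := 0;

  -- Xilinx-specific hint: request block RAM rather than distributed RAM.
  attribute ram_style : string;
  attribute ram_style of RAM : signal is "block";
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      -- Circular address counter.
      if CNTR = LENGTH-3 then
        CNTR <= 0;
      else
        CNTR <= CNTR + 1;
      end if;
      -- Read the old word and overwrite it with the new one (read-first).
      RAM(CNTR) <= SHIFT_IN;
      SHIFT_TMP <= RAM(CNTR);
      SHIFT_OUT <= SHIFT_TMP;
    end if;
  end process;
end architecture rtl;
```

The synchronous read of RAM(CNTR) inside the clocked process is the part that lets the tool map the array onto a block RAM; an asynchronous read would typically end up in distributed RAM or registers instead.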

VHDL - converting from level sampling to edge triggered - an intuitive explanation?

I have the following code (a primitive "RS-232 signalling" transmitter)...
LIBRARY ieee;
USE ieee.std_logic_1164.all;
entity SerialTX is
port(
baud_clk : in std_logic;
data : in std_logic_vector(7 downto 0);
send : in std_logic;
serial_out : out std_logic := '0';
busy : out std_logic := '0'
);
end entity;
----------------------------------------
architecture behavioural of SerialTX is
constant IDLE_BITS : std_logic_vector(10 downto 0) := "00000000001";
signal shifter : std_logic_vector(10 downto 0) := IDLE_BITS;
signal shift : std_logic := '0';
signal internal_busy : std_logic := '0';
begin
-------- ALWAYS HAPPENING --------
serial_out <= shifter(0);
busy <= internal_busy;
internal_busy <= '1' when (shifter /= IDLE_BITS) else '0';
----------------------------------
shifting_handler:
process(baud_clk) is
begin
if rising_edge(baud_clk) then
if (send = '1') and (shifter = IDLE_BITS) then
shifter <= "11" & data & '0';
elsif (shifter /= IDLE_BITS) then
shifter <= '0' & shifter(10 downto 1); -- shifter >>= 1;
end if;
end if;
end process;
end architecture behavioural;
... it works well (in simulation) but has a limitation. The send signal (that causes a transmission to begin) has to be a '1' level for longer than at least one full cycle of the baud_clk in order for the transmitter to see it reliably.
I have been trying to find a way to convert this code so that it responds to the rising edge of the send signal instead of testing its level at the rising edge of baud_clk. I want to be able to respond to a send pulse less than 100 ns in duration even when the baud_clk is running at a much slower rate (115200 Hz, for instance).
I've tried (naively) altering the process thus...
shifting_handler:
process(baud_clk) is
begin
if rising_edge(baud_clk) then
if (shifter /= IDLE_BITS) then
shifter <= '0' & shifter(10 downto 1); -- shifter >>= 1;
end if;
elsif rising_edge(send) and (shifter = IDLE_BITS) then
shifter <= "11" & data & '0';
end if;
end process;
Here I was hoping to change the logic to test for a rising edge on send when there isn't a rising edge on baud_clk.
I know that this is not a valid approach to the problem (the synthesizer moans of course) but I was hoping that someone could explain in simple terms why this cannot be done. What would happen if it was possible to use two edge detectors in a process? There is a concept here I cannot grasp and I always seem to end up writing the code in the same way and producing this problem. I'm fighting hard against years of ingrained software programming habits, which doesn't help much!
It sounds like send is asynchronous with respect to baud_clk. You therefore need to perform some form of clock domain crossing (CDC) in order to correctly implement your design, otherwise you will have a design that cannot pass timing and has the potential to not function correctly. CDC is a standard term that you should be able to find more information about in other questions, and elsewhere.
As you have found, you cannot have a design realised in real hardware if it has a process sensitive to edges on two different signals. There's no one 'right' way to do what you want, but here is one example that uses a simple 'toggle' CDC. This is very simple, but note that the design could miss sending a byte if one send request arrives before a previous byte has been transmitted. There will also be some delay introduced between assertion of the send signal, and the transmission starting. It's not clear if these issues matter in your system.
Create another process sensitive to send:
-- The initial state doesn't matter, but we want the design to work in simulation
signal send_toggle : std_logic := '0';
process(send)
begin
if (rising_edge(send)) then
send_toggle <= not send_toggle;
end if;
end process;
Now another process to synchronize this to the baud_clk domain. Use two cascaded registers to produce a design that is largely immune to any metastability (this is another standard term that you can look up) that can result from sampling a signal generated from a different clock domain:
signal send_toggle_r1 : std_logic;
signal send_toggle_r2 : std_logic;
process(baud_clk)
begin
if (rising_edge(baud_clk)) then
send_toggle_r1 <= send_toggle;
send_toggle_r2 <= send_toggle_r1;
end if;
end process;
The above is a very standard circuit block that you can use in many single-bit CDC scenarios.
Your transmit process can then register the send_toggle_r2 signal in order to look for a transition, in order to determine whether it should start sending. This signal is in the correct clock domain:
signal send_toggle_r3 : std_logic;
process(baud_clk) is
begin
if rising_edge(baud_clk) then
send_toggle_r3 <= send_toggle_r2;
if ((send_toggle_r3 /= send_toggle_r2) and (shifter = IDLE_BITS)) then
shifter <= "11" & data & '0';
elsif (shifter /= IDLE_BITS) then
shifter <= '0' & shifter(10 downto 1); -- shifter >>= 1;
end if;
end if;
end process;
Lastly, you will need to implement timing constraints to tell your tool chain not to worry about timing of the send_toggle_r1 register.
You might spot that if you are targeting hardware where the initial states of registers are random, you might get an erroneous byte transmission after the first few baud_clk cycles. To prevent this, you might choose to hold your baud_clk process in reset for some clock cycles after start up, but as I don't know if this is relevant for you, I won't detail this part.
This whole answer addresses your question directly, but my personal approach would be to use whatever higher-rate clock is generating your send signal to drive the entire design. The serial transmission would then in fact use the higher rate clock, with shifting enabled by a CDC > edge detector chain driven from the baud_clk. The bit timing would not be absolutely perfect, but this should not matter for a standard 'UART' scenario.
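A rough sketch of that single-clock alternative is shown below. It assumes a fast clock named clk to which send is already synchronous; the entity name SerialTX_fast and the signals pending, data_reg and baud_tick are invented for this illustration. Unlike the description above, the sketch holds a request until the next baud tick so that the start bit gets a full bit period, at the cost of up to one bit time of latency.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity SerialTX_fast is
  port (
    clk        : in  std_logic;  -- assumed: fast clock that also generates send
    baud_clk   : in  std_logic;  -- slow baud-rate clock, treated as a data signal here
    data       : in  std_logic_vector(7 downto 0);
    send       : in  std_logic;  -- assumed synchronous to clk, at least one clk cycle wide
    serial_out : out std_logic;
    busy       : out std_logic
  );
end entity SerialTX_fast;

architecture rtl of SerialTX_fast is
  constant IDLE_BITS : std_logic_vector(10 downto 0) := "00000000001";
  signal shifter  : std_logic_vector(10 downto 0) := IDLE_BITS;
  signal data_reg : std_logic_vector(7 downto 0) := (others => '0');
  signal pending  : std_logic := '0';
  -- Two-stage synchronizer plus one extra register for edge detection.
  signal baud_r1, baud_r2, baud_r3 : std_logic := '0';
  signal baud_tick : std_logic;
begin
  serial_out <= shifter(0);
  busy       <= '1' when (shifter /= IDLE_BITS) or (pending = '1') else '0';
  -- One clk-wide pulse for every rising edge of baud_clk.
  baud_tick  <= baud_r2 and not baud_r3;

  process (clk)
  begin
    if rising_edge(clk) then
      baud_r1 <= baud_clk;
      baud_r2 <= baud_r1;
      baud_r3 <= baud_r2;

      -- Capture a request only when nothing is queued or in flight.
      if send = '1' and pending = '0' and shifter = IDLE_BITS then
        pending  <= '1';
        data_reg <= data;
      end if;

      if baud_tick = '1' then
        if shifter /= IDLE_BITS then
          shifter <= '0' & shifter(10 downto 1);
        elsif pending = '1' then
          -- Start the frame on a bit boundary.
          shifter <= "11" & data_reg & '0';
          pending <= '0';
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because everything runs in the clk domain, a send pulse of a single clk cycle is enough, and the only clock domain crossing left is the single-bit baud_clk synchronizer.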

Dynamic Array Size in VHDL

I want to use a dynamic array range, so I am using "N", obtained by converting an incoming vector signal to an integer. Using the specific incoming port "Size" gives me an error, while a fixed vector produces perfect output.
architecture EXAMPLE of Computation is
signal size :std_logic_vector (7 downto 0);
process (ACLK, SLAVE_ARESETN) is
variable N: integer:=conv_integer ("00000111") ; ---WORKING
--variable N: integer:=conv_integer (size) ; -- Not working
type memory is array (N downto 0 ) of std_logic_vector (31 downto 0 );
variable RAM :memory;
The only reason for this type of coding is to send as much data as possible to the FPGA, as I need to send data from DDR to a custom IP via DMA in Vivado, possibly more than 100 MB. Kindly guide me if I am going about this the wrong way.
You can't do that in VHDL. What kind of hardware would be generated by your code? If you don't know, the synthesizer won't either.
The way to do this kind of thing is to set N to the largest value you want to support, and use size in your logic to control your logic appropriately. It's difficult to give more pointers without more information, but as an example, you could use a counter to address your ram, and have it reset when it's greater than size.
Update
Here's a counter example. You have to make sure that size doesn't change while operating or it will fall into an unknown state. A real design should have reset states to ensure correct behaviour.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity example is
port (
clk : in std_logic;
rst : in std_logic;
size : in unsigned(7 downto 0);
wr : in std_logic;
din : in std_logic_vector(31 downto 0)
);
end entity;
architecture rtl of example is
signal counter : unsigned(7 downto 0);
type ram_t is array(0 to 255) of std_logic_vector(31 downto 0);
signal ram : ram_t;
begin
RAM_WR: process(clk)
begin
if rising_edge(clk) then
if rst = '1' then
counter <= (others => '0');
else
if wr = '1' then
ram(to_integer(counter)) <= din;
if counter = size then
counter <= (others => '0');
else
counter <= counter + 1;
end if;
end if;
end if;
end if;
end process RAM_WR;
end architecture rtl;
I believe you can only use a generic as an array constraint in a process. Otherwise, the compiler cannot elaborate the design.
In a function or procedure, you can have truly variable array bounds.
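To illustrate the generic-based constraint, here is a minimal sketch in which the array bound inside the process comes from a generic rather than from a signal. The entity name generic_mem, the generic DEPTH and the port names are hypothetical; the point is only that DEPTH is fixed at elaboration time, which is what lets the tool size the memory.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity generic_mem is
  generic (
    DEPTH : positive := 8  -- hypothetical: number of 32-bit words, fixed at elaboration
  );
  port (
    clk  : in  std_logic;
    wr   : in  std_logic;
    addr : in  natural range 0 to DEPTH - 1;
    din  : in  std_logic_vector(31 downto 0);
    dout : out std_logic_vector(31 downto 0)
  );
end entity generic_mem;

architecture rtl of generic_mem is
begin
  process (clk)
    -- The bound is a generic, so it is known before any hardware is built.
    type memory is array (DEPTH - 1 downto 0) of std_logic_vector(31 downto 0);
    variable ram : memory;
  begin
    if rising_edge(clk) then
      if wr = '1' then
        ram(addr) := din;
      end if;
      -- Variable was updated above, so a write is visible on the same edge.
      dout <= ram(addr);
    end if;
  end process;
end architecture rtl;
```

Each instance can be given a different DEPTH through its generic map, but the value cannot change while the design is running.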

VHDL beginner - what's going wrong wrt to timing in this circuit?

I'm very new to VHDL and hardware design and was wondering if someone could tell me if my understanding of the following problem I ran into is right.
I've been working on a simple BCD-to-7 segment display driver for the Nexys4 board - this is my VHDL code (with the headers stripped).
entity BCDTo7SegDriver is
Port ( CLK : in STD_LOGIC;
VAL : in STD_LOGIC_VECTOR (31 downto 0);
ANODE : out STD_LOGIC_VECTOR (7 downto 0);
SEGMENT : out STD_LOGIC_VECTOR (6 downto 0));
function BCD_TO_DEC7(bcd : std_logic_vector(3 downto 0))
return std_logic_vector is
begin
case bcd is
when "0000" => return "1000000";
when "0001" => return "1111001";
when "0010" => return "0100100";
when "0011" => return "0110000";
when others => return "1111111";
end case;
end BCD_TO_DEC7;
end BCDTo7SegDriver;
architecture Behavioral of BCDTo7SegDriver is
signal cur_val : std_logic_vector(31 downto 0);
signal cur_anode : unsigned(7 downto 0) := "11111101";
signal cur_seg : std_logic_vector(6 downto 0) := "0000001";
begin
process (CLK, VAL, cur_anode, cur_seg)
begin
if rising_edge(CLK) then
cur_val <= VAL;
cur_anode <= cur_anode rol 1;
ANODE <= std_logic_vector(cur_anode);
SEGMENT <= cur_seg;
end if;
-- Decode segments
case cur_anode is
when "11111110" => cur_seg <= BCD_TO_DEC7(cur_val(3 downto 0));
when "11111101" => cur_seg <= BCD_TO_DEC7(cur_val(7 downto 4));
when "11111011" => cur_seg <= BCD_TO_DEC7(cur_val(11 downto 8));
when "11110111" => cur_seg <= BCD_TO_DEC7(cur_val(15 downto 12));
when "11101111" => cur_seg <= BCD_TO_DEC7(cur_val(19 downto 16));
when "11011111" => cur_seg <= BCD_TO_DEC7(cur_val(23 downto 20));
when "10111111" => cur_seg <= BCD_TO_DEC7(cur_val(27 downto 24));
when "01111111" => cur_seg <= BCD_TO_DEC7(cur_val(31 downto 28));
when others => cur_seg <= "0011111";
end case;
end process;
end Behavioral;
Now, at first I tried to naively drive this circuit from the board clock defined in the constraints file:
## Clock signal
##Bank = 35, Pin name = IO_L12P_T1_MRCC_35, Sch name = CLK100MHZ
set_property PACKAGE_PIN E3 [get_ports clk]
set_property IOSTANDARD LVCMOS33 [get_ports clk]
create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports clk]
This gave me what looked like almost garbage output on the seven-segment displays - it looked like every decoded digit was being superimposed onto every digit place. Basically if bits 3 downto 0 of the value being decoded were "0001", the display was showing 8 1s in a row instead of 00000001 (but not quite - the other segments were lit but appeared dimmer).
Slowing down the clock to something more reasonable did the trick and the circuit works how I expected it to.
When I look at what elaboration gives me (I'm using Vivado 2014.1), it gives me a circuit with VAL connected to 8 RTL_ROMs in parallel (each one decoding 4 bits of the input). The outputs from these ROMs are fed into an RTL_MUX and the value of cur_anode is being used as the selector. The output of the RTL_MUX feeds the cur_val register; the cur_val and cur_anode registers are then linked to the outputs.
So, with that in mind, which part of the circuit couldn't handle the clock rate? From what I've read I feel like this is related to timing constraints that I may need to add; am I thinking along the right track?
Did your timing report indicate that you had a timing problem? It looks to me like you were just rolling through the segment values extremely fast. No matter how well you design for higher clock speeds, you're rotating cur_anode every clock cycle, and therefore your display will change accordingly. If your clock is too fast, the display will change much faster than a human would be able to read it.
Some other suggestions:
You should split your single process into separate clocked and unclocked processes. It's not that what you're doing won't end up synthesizing (obviously), but it's unconventional, and may lead to unexpected results.
Your initialization on cur_seg won't really do anything, as it's always driven (combinationally) by your process. It's not a problem - just wanted to make sure you were aware.
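One common way to slow the digit scan without creating a second clock is a clock enable. The sketch below is only an illustration: the entity name refresh_tick, the generic DIVIDE and the output TICK are made up, and the default assumes the 100 MHz board clock from the constraints file, giving one tick per millisecond.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity refresh_tick is
  generic (
    DIVIDE : positive := 100_000  -- assumed: 100 MHz / 100000 = 1 kHz tick rate
  );
  port (
    CLK  : in  std_logic;
    TICK : out std_logic  -- high for one CLK cycle every DIVIDE cycles
  );
end entity refresh_tick;

architecture rtl of refresh_tick is
  signal count : natural range 0 to DIVIDE - 1 := 0;
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if count = DIVIDE - 1 then
        count <= 0;
        TICK  <= '1';
      else
        count <= count + 1;
        TICK  <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

In the driver, the rotation of cur_anode would then be wrapped in "if TICK = '1' then ... end if;" inside the clocked process, so each digit stays lit for about 1 ms and the whole display refreshes at roughly 125 Hz.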
Well there are two parts to this.
Your segments appeared so dim because you are basically running them at a 1/8 duty cycle at a faster rate than the segments have time to react (on every clock pulse you change which digit is lit, and then you stop driving it on the next pulse).
By increasing the period, your segments got brighter because they went from a transient current (segments need time to ramp up) to a steady-state current (a longer period lets the current reach the desired level when you drive the segments slower than their inherent driving frequency). Hence the brightness increase.
One other thing about your code. You may be aware of this, but when you latch with your clock there, the signal labeled cur_anode is advanced and actually represents the NEXT anode. You also latch ANODE and SEGMENT to the current anode and segment respectively. Just pointing out that cur_anode may be a misnomer (and is confusing because it's usually the NEXT one).
Keeping in mind Paul Seeb's and fru1bat's answers on clock speed, Paul's comment on NEXT anode, and fru1bat's suggestion on separating clocked and un-clocked processes as well as your noting that you had 8 ROMs, there are alternative architectures.
Your architecture with a ring counter for ANODE and multiple ROMs happens to be optimal for speed, which as both Paul and fru1bat note isn't needed. Instead you can optimize for area.
Because the clock speed is either external or controlled by the addition of an enable supplied periodically it isn't addressed in area optimization:
architecture foo of BCDTo7SegDriver is
signal digit: natural range 0 to 7; -- 3 bit binary counter
signal bcd: std_logic_vector (3 downto 0); -- input to ROM
begin
UNLABELED:
process (CLK)
begin
if rising_edge(CLK) then
if digit = 7 then -- integer/unsigned "+" result range
digit <= 0; -- not tied to digit range in simulation
else
digit <= digit + 1;
end if;
SEGMENT_REG:
SEGMENT <= BCD_TO_DEC7(bcd); -- single ROM look up
ANODE_REG:
for i in ANODE'range loop
if digit = i then
ANODE(i) <= '0';
else
ANODE(i) <= '1';
end if;
end loop;
end if;
end process;
BCD_MUX:
with digit select
bcd <= VAL(3 downto 0) when 0,
VAL(7 downto 4) when 1,
VAL(11 downto 8) when 2,
VAL(15 downto 12) when 3,
VAL(19 downto 16) when 4,
VAL(23 downto 20) when 5,
VAL(27 downto 24) when 6,
VAL(31 downto 28) when 7;
end architecture;
This trades off a 32 bit register (cur_val), an 8 bit ring counter (cur_anode) and seven copies of the ROM implied by function BCD_TO_DEC7 for a three bit binary counter.
In truth the argument over whether or not you should be using separate sequential (clocked) and combinatorial (non-clocked) processes is somewhat reminiscent of Lilliput and Blefuscu going to war over endianness.
Separate processes generally execute a little more efficiently due to not sharing sensitivity lists. You could also note that all concurrent statements have process or block statement equivalents. There's also nothing in this design that can take particular advantage of using variables which can result in more efficient simulation while implying a single process. (Shared variables aren't supported by XST).
I haven't verified this will synthesize but after reading through the 14.1 version of the XST user guide think it should. If not you can convert digit to a std_logic_vector with a length of 3.
The + 1 for digit will get optimized, an incrementer is smaller than a full adder.

Accessing 2 elements of the same array in VHDL

I am trying to assign 2 values from 2 different addresses in my array in VHDL, but somehow they always return to me a wrong value (most of the time, zero). I tested it with only 1 address and 1 data output it returned the correct value.
architecture Behavioral of registerFile is
type reg_type is array (31 downto 0) of std_logic_vector (31 downto 0);
signal REG : reg_type := (x"00000031", x"00000030", x"00000029", x"00000028", x"00000027", x"00000026", x"00000025", x"00000024", x"00000023", x"00000022", x"00000021", x"00000020",x"00000019",x"00000018", x"00000017", x"00000016", x"00000015", x"00000014", x"00000013", x"00000012", x"00000011", x"00000010", x"00000009", x"00000008", x"00000007",x"00000006", x"00000005", x"00000004", x"00000003", x"00000004", x"00000001", x"00000000");
begin
process(clk)
begin
if clk'event and clk='1' then
if ENABLE = '1' then
if readReg = '1' then -- read from register
DATAone <= REG(conv_integer(ADDRone));
DATAtwo <= REG(conv_integer(ADDRtwo));
else
REG(conv_integer(ADDRone)) <= DATAone;
REG(conv_integer(ADDRtwo)) <= DATAtwo;
end if;
end if;
end if;
end process;
end Behavioral;
Would appreciate some help, I tried googling but it's all either multidimensional arrays or only accessing 1 element at a time.
Thanks.
I'm not sure that this is synthesizable in most fabric. You could create two copies of the reg array and index into each of them.
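A minimal sketch of the two-copies idea follows. It assumes a register file with one write port and two read ports, which is simpler than the two simultaneous writes in the question; the entity name regfile_2r1w and all port names are invented for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity regfile_2r1w is
  port (
    clk     : in  std_logic;
    we      : in  std_logic;
    waddr   : in  std_logic_vector(4 downto 0);
    wdata   : in  std_logic_vector(31 downto 0);
    raddr_a : in  std_logic_vector(4 downto 0);
    rdata_a : out std_logic_vector(31 downto 0);
    raddr_b : in  std_logic_vector(4 downto 0);
    rdata_b : out std_logic_vector(31 downto 0)
  );
end entity regfile_2r1w;

architecture rtl of regfile_2r1w is
  type reg_type is array (0 to 31) of std_logic_vector(31 downto 0);
  -- Two identical copies: each one serves a single read port.
  signal reg_a : reg_type := (others => (others => '0'));
  signal reg_b : reg_type := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Every write goes to both copies so they never diverge.
      if we = '1' then
        reg_a(to_integer(unsigned(waddr))) <= wdata;
        reg_b(to_integer(unsigned(waddr))) <= wdata;
      end if;
      -- Registered reads; a read of the address being written returns the old value.
      rdata_a <= reg_a(to_integer(unsigned(raddr_a)));
      rdata_b <= reg_b(to_integer(unsigned(raddr_b)));
    end if;
  end process;
end architecture rtl;
```

Keeping the read data and write data on separate ports also avoids using the same signal as both input and output, which the DATAone and DATAtwo assignments in the question would require.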
It seems like you are trying to implement a quad-port memory. Anyway, even if your register file is not exactly a 4-port memory, it probably can be implemented around one.
Altera has an example of such a memory in their Advanced Synthesis Cookbook. The picture below shows the relevant part:
If you use the Altera example files, the design will instantiate Altera primitives and use FPGA block RAM for storage. If you are concerned about portability, or you just want to look at some VHDL code that does what you want, check the example below. It implements roughly the same circuit shown in the figure, and it will most likely be synthesized as distributed memory in the FPGA.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
-- Quad-port RAM with 2 read ports 2 write ports. The design uses 2 memory blocks
-- (MAIN_MEMORY and SHADOW_MEMORY) to allow for simultaneous writes. Port A writes to
-- main memory, Port B writes to shadow memory. On a read from either port, data is
-- read from the memory block that was most recently written at the given position.
entity quad_port_ram is
generic (
ADDRESS_WIDTH: natural := 5;
DATA_WIDTH: natural := 32
);
port (
clock: in std_logic;
read_addr_a: in natural range 0 to 2**ADDRESS_WIDTH-1;
read_data_a: out std_logic_vector(DATA_WIDTH-1 downto 0);
write_addr_a: in natural range 0 to 2**ADDRESS_WIDTH-1;
write_data_a: in std_logic_vector(DATA_WIDTH-1 downto 0);
write_enable_a: in std_logic;
read_addr_b: in natural range 0 to 2**ADDRESS_WIDTH-1;
read_data_b: out std_logic_vector(DATA_WIDTH-1 downto 0);
write_addr_b: in natural range 0 to 2**ADDRESS_WIDTH-1;
write_data_b: in std_logic_vector(DATA_WIDTH-1 downto 0);
write_enable_b: in std_logic
);
end;
architecture rtl of quad_port_ram is
type memory_type is (MAIN_MEMORY, SHADOW_MEMORY);
type memory_type_array is array (natural range <>) of memory_type;
-- Keep track of which memory has the most recently written data for each address
signal most_recent_port_for_address: memory_type_array(0 to 2**ADDRESS_WIDTH-1);
type memory_array is array (0 to 2**ADDRESS_WIDTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
type dual_memory_array is array (memory_type) of memory_array;
-- Store the actual memory bits. Access like this:
-- memory_data(memory_type)(address)(bit_position)
signal memory_data: dual_memory_array;
-- Auxiliary signals to decide where to read the data from (main or shadow)
signal most_recent_port_for_addr_a, most_recent_port_for_addr_b: memory_type;
begin
process (clock) begin
if rising_edge(clock) then
if write_enable_a = '1' then -- explicit comparison; a bare std_logic condition needs VHDL-2008
memory_data(MAIN_MEMORY)(write_addr_a) <= write_data_a;
most_recent_port_for_address(write_addr_a) <= MAIN_MEMORY;
end if;
if write_enable_b = '1' then
if (write_enable_a = '0') or (write_addr_a /= write_addr_b) then
memory_data(SHADOW_MEMORY)(write_addr_b) <= write_data_b;
most_recent_port_for_address(write_addr_b) <= SHADOW_MEMORY;
end if;
end if;
end if;
end process;
most_recent_port_for_addr_a <= most_recent_port_for_address(read_addr_a);
most_recent_port_for_addr_b <= most_recent_port_for_address(read_addr_b);
read_data_a <= memory_data(most_recent_port_for_addr_a)(read_addr_a);
read_data_b <= memory_data(most_recent_port_for_addr_b)(read_addr_b);
end;

Resources