FPGA Ram design issue - vhdl

type dist_ram is array (0 to 99) of std_logic_vector(7 downto 0);
signal ram : dist_ram := (others => (others => '0'));
-- The attribute specification must come after the declaration of the signal it names
attribute ram_style: string;
attribute ram_style of ram : signal is "distributed";
begin
--Pseudocode
PROCESS(Clk)
BEGIN
if(rising_edge(Clk)) then
-- Each entry is 8 bits wide, so the values are written as hex literals
ram(0) <= x"00";
ram(2) <= x"01";
ram(3) <= x"02";
ram(4) <= x"03";
ram(5) <= x"01";
ram(6) <= x"02";
...
...
ram(99) <= x"03";
end if;
END PROCESS;
In the above scenario the complete RAM gets updated in one clock cycle. If I use a block RAM instead, I would need a minimum of 100 clock cycles to update the entire memory, as opposed to one clock cycle with distributed RAM.
I also understand that it is not advisable to use distributed RAM for large memories, as it will eat up FPGA resources.
So what is the best design for this situation (say, for a few KB of RAM) in order to achieve the best throughput?
Should I use block RAM or distributed RAM, assuming a Xilinx FPGA? Your suggestions are highly appreciated.
Thanks for your replies; let me make it a bit clearer.
My purpose is not RAM initialization. I have a 100 x 20 RAM block (8 bits per entry) which needs to be updated after a certain computation.
After the computation I have to store the results and then use them for the next iteration.
This is an iterative process and I am expected to finish at least 2 iterations within 3000 clock cycles.
If I use block RAM to store these coefficients, then just to read and write them I would need at least (100*20) cycles plus some latency, which will not meet my requirement.
So how should I go about designing this?

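What purpose are you trying to achieve? Do you really need to update the entire RAM in one clock cycle, or can you spread the update over many clock cycles?
If you can spread it over many clocks you could do something like:
-- Assumes: signal index : integer range 0 to 99 := 0; and use ieee.numeric_std.all;
PROCESS(Clk)
BEGIN
if(rising_edge(Clk)) then
-- write one entry per clock cycle
ram(index) <= std_logic_vector(to_unsigned(index, 8));
-- wrap around so the index stays inside the RAM
if index = 99 then
index <= 0;
else
index <= index + 1;
end if;
end if;
END PROCESS;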
You can initialize a block ram as well. The specifics of this depend on your FPGA vendor. So for a Xilinx FPGA look at this article:
http://www.xilinx.com/itp/xilinx10/isehelp/pce_p_initialize_blockram.htm
If you really cannot afford to break it up into multiple clocks and you want to "initialize" it more than once then you will need to use distributed ram the way you did above.

Related

Pipeline muxes in hdl

I am doing some simple tests to evaluate how clock speed increases in a digital circuit when pipelining.
I pipeline a 10-to-1 mux using two 5-to-1 muxes and one 2-to-1 mux. I get some clock speed increase from the FPGA synthesizer (Altera). Then I add one more stage, replacing the 5-to-1 muxes with 2-to-1 and 3-to-1 muxes and appropriate registers. In the latter case the clock speed drops. I don't get why adding registers and pipeline stages would lower the clock speed. Any explanations?
The smallest logic element in most FPGAs is a lookup table (LUT). LUTs come with 3 to 6 inputs, and Altera's ALMs are configurable in many ways. Either way, if a multiplexer is smaller than the equivalent LUT size, there will be no further Fmax improvement.
You could describe all multiplexer sizes as trees of 2:1 multiplexers. Synthesis will optimize the resulting equations and map them to LUT structures and configurations of your FPGA device.
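As a minimal sketch of the "tree of 2:1 multiplexers" idea, here is a 4:1 multiplexer built from three 2:1 stages. The entity name mux4_tree and its port names are made up for illustration; how the synthesizer maps this onto LUTs depends on the device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: 4:1 mux described as a tree of 2:1 muxes
entity mux4_tree is
  port (
    d   : in  std_logic_vector(3 downto 0);
    sel : in  std_logic_vector(1 downto 0);
    y   : out std_logic
  );
end entity;

architecture rtl of mux4_tree is
  signal lo, hi : std_logic;
begin
  -- first level: two 2:1 muxes selected by sel(0)
  lo <= d(1) when sel(0) = '1' else d(0);
  hi <= d(3) when sel(0) = '1' else d(2);
  -- second level: one 2:1 mux selected by sel(1)
  y  <= hi when sel(1) = '1' else lo;
end architecture;
```

Larger multiplexers follow the same pattern, with one more level per additional select bit.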
You can also use a user-defined wrapper around rising_edge to make pipelining configurable:
function registered(signal Clock : std_logic; constant IsRegistered : boolean) return boolean is
begin
if IsRegistered then
return rising_edge(Clock);
end if;
return TRUE;
end function;
(Source: PoC-Library - components package)
This function allows you to selectively enable and disable pipeline stages.
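A sketch of how such a function might be used for one optional pipeline stage. The entity optional_stage, the generic IS_REGISTERED and the ports D and Q are hypothetical names for illustration and are not taken from the PoC-Library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: a stage that is a register or a plain wire,
-- depending on the generic IS_REGISTERED
entity optional_stage is
  generic (
    IS_REGISTERED : boolean := true
  );
  port (
    Clock : in  std_logic;
    D     : in  std_logic_vector(7 downto 0);
    Q     : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of optional_stage is
  function registered(signal Clock : std_logic; constant IsRegistered : boolean) return boolean is
  begin
    if IsRegistered then
      return rising_edge(Clock);
    end if;
    return TRUE;
  end function;
begin
  -- D is in the sensitivity list so that the combinational case
  -- (IS_REGISTERED = false) follows the input immediately
  process(Clock, D)
  begin
    if registered(Clock, IS_REGISTERED) then
      Q <= D;
    end if;
  end process;
end architecture;
```

With IS_REGISTERED set to false the condition is always true and the process describes a wire; with true it describes a rising-edge register.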

How to run simulation for a set amount of clock cycles

To all,
I am new to VHDL. I have a working design however my simulation keeps running forever until I cancel the simulation. In the test bench how do I stop the simulation after x clock cycles? Is this done in the clock process?
clk_process :process
begin
clk <= '0';
wait for clk_period/2;
clk <= '1';
wait for clk_period/2;
end process;
Please and Thank you!
In VHDL-2008 :
if now >= clk_period * x then
std.env.stop;
end if;
In earlier revisions (still works in 2008 though!) :
assert now < clk_period * x report
"Stopping simulation : this is not a failure!" severity failure;
This latter curiosity (abusing an assert) is actually recommended in Janick Bergeron's book "Writing Testbenches"!
The clock process is as good a place as any for them but a separate process (probably sensitive to clock) for simulation control may be marginally cleaner design.
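Putting the pieces together, a minimal self-contained testbench sketch with a separate simulation control process might look like this. It assumes VHDL-2008 (for std.env.stop); the entity name tb_stop and the values chosen for clk_period and x are placeholders.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tb_stop is
end entity;

architecture sim of tb_stop is
  constant clk_period : time    := 10 ns; -- placeholder value
  constant x          : natural := 100;   -- number of clock periods to run (placeholder)
  signal   clk        : std_logic := '0';
begin
  -- free-running clock, as in the question
  clk_process : process
  begin
    clk <= '0';
    wait for clk_period/2;
    clk <= '1';
    wait for clk_period/2;
  end process;

  -- separate process for simulation control, sensitive to the clock
  sim_control : process(clk)
  begin
    if rising_edge(clk) then
      if now >= clk_period * x then
        std.env.stop; -- VHDL-2008 only
      end if;
    end if;
  end process;
end architecture;
```

For pre-2008 tools, the body of the inner if can be swapped for the assert with severity failure shown above.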

access four elements from array at the same time vhdl

How can I access four elements of a 2D array (or array of arrays) in one process at the same time?
In this sample I am trying to access several elements of intg1 at the same time, and the synthesis is taking forever.
type img_whole is array (78 downto 0, 130 downto 0) of std_logic_VECTOR(7 downto 0);
signal img1: img_whole;
signal i1_1: integer range 0 to 79:=0;
signal j1_1:integer range 0 to 131:=0;
type intg is array (78 downto 0, 130 downto 0) of integer range 0 to 1751998;--no double??
signal intg1 : intg;
integral :process (clka,finished,finished1)
variable tempo: integer range 0 to 1751998;
begin
if clka'event and clka = '1' then
if finished="1" and finished1="0" then
if i1_1 < 78 and j1_1 <130 then
j1_1<=j1_1+1;
elsif j1_1=130 and i1_1<78 then
j1_1<=0 ;
i1_1<=i1_1+1;
elsif j1_1<130 and i1_1=78 then
j1_1<=j1_1+1;
elsif j1_1=130 and i1_1=78 then
finished1<="1";
end if;
tempo:= to_integer(unsigned('0' & img1(i1_1,j1_1)));
if i1_1-1>=0 then
tempo:=intg1(i1_1-1,j1_1)+tempo;
end if;
if j1_1-1>=0 then
tempo:=intg1(i1_1,j1_1-1)+tempo;
end if;
if i1_1-1>=0 and j1_1-1>=0 then
tempo:=tempo-intg1(i1_1-1,j1_1-1);
end if;
intg1(i1_1,j1_1)<=tempo;
end if;
end if;
end process;
This code computes an integral image from a 2D array.
There are both functional and synthesis issues in the code.
Functional issues:
finished1 is only driven to '1' in the process, but never to '0', so if the initial value is '0' then the operation in the process can only be done once after power up, since the finished1 value of '1' will then inhibit further updates due to the process enable condition.
i1_1 and j1_1 are signals that are driven at the start of the process and then used later in the process. Since they are signals, the value assigned with <= is not available until the next process evaluation. Is that intentional?
Use a simulator to ensure correct functionality, which can be done before synthesis.
Synthesis issues:
intg1 is a table with 79 * 131 > 10 K entries, each needing ceil(log2(1751999)) = 21 bits, thus a pretty large table. The design requires asynchronous lookup in the table, since there is no extra cycle (clock edge) available between a new value of an index such as i1_1 and the point where the output of the process is generated from the table lookup. An asynchronous lookup in a large table requires a huge mux network, which is probably the reason for the long synthesis time. And this lookup is even done multiple times with different index values.
Minor: finished and finished1 are not needed in the sensitivity list of the process, since this is a process clocked by clka.
The above list of issues may not be complete.
To fix the table lookup problem (first synthesis issue), make a pipelined design with cycles, e.g.:
1. Index values i1_1 etc. are generated.
2. The intg1 table is looked up synchronously.
3. The intermediate tempo is generated, and intg1 is updated.
The current design does steps 2 and 3 in a single cycle, so it is not possible to make a synchronous lookup in the table: there is only one clock edge in the cycle, and it is used for writing back to the intg1 table. By splitting the lookup and the write-back into two cycles, there is one clock edge for reading the table (synchronous read) and one for writing it.
A synchronous read using a clock edge maps much better onto the hardware resources in typical FPGAs, since these contain large synchronous RAMs similar to the intg1 table, so the implementation will be smaller and faster.
The synchronous intg1 lookup is made by simply adding a clocked process where signals are driven directly by the intg1 output based on the required index values. All the required reads must be made; the subsequent process can then determine which of the read values are actually used.
The specific pipeline implementation must be adapted to the design requirements.
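As a rough sketch of what a synchronous lookup looks like, the following shows a generic RAM whose read is registered on the clock edge, which is the pattern synthesis tools commonly map to block RAM. The entity name sync_read_ram, its generics and its ports are hypothetical. It has a single read port, so the several reads the integral image needs per pixel would require additional ports, copies of the table, or extra cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: RAM with synchronous (registered) read
entity sync_read_ram is
  generic (
    DEPTH : positive := 79 * 131; -- one entry per pixel, as in the question
    WIDTH : positive := 21        -- 21 bits can hold values up to 1751998
  );
  port (
    clk   : in  std_logic;
    we    : in  std_logic;
    waddr : in  natural range 0 to DEPTH - 1;
    wdata : in  unsigned(WIDTH - 1 downto 0);
    raddr : in  natural range 0 to DEPTH - 1;
    rdata : out unsigned(WIDTH - 1 downto 0)
  );
end entity;

architecture rtl of sync_read_ram is
  type ram_t is array (0 to DEPTH - 1) of unsigned(WIDTH - 1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(waddr) <= wdata;
      end if;
      -- the read result appears one clock after raddr is presented
      rdata <= ram(raddr);
    end if;
  end process;
end architecture;
```

The 2D index (i, j) would be flattened to a single address such as i * 131 + j in the stage that generates the indices.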

reading the value of input when clk ='1' in the mid way of clk

I know about rising_edge(clk) and clk'event and clk = '1'.
I guess they detect an edge.
But let's say I want to read the input while clk is high, midway through the high phase. I hope I have been able to convey what I mean. How can we do that?
If I am not correct, please explain.
Thanks
In a testbench, or synthesisable into real hardware?
Assuming your clock has period clk_period, declared something like
constant clk_period : time := 100 ns; -- for 10 MHz
clk <= not clk after clk_period/2;
you can write testbench code like
wait until rising_edge(clk);
wait for clk_period/4;
value <= my_input;
However this is not synthesisable. In real hardware you need a different approach. Most FPGAs have clock generation modules (PLLs, DLLs, DCMs) which will allow you to generate phase shifted or inverted clocks, and you can use such a block to accomplish your task. More specific suggestions would depend on the actual FPGA you are using, and whether you have any faster clocks available.
For example, given clk and clk_2x which are phase aligned (so that each clk edge is a clk_2x rising edge) you can use the falling edge of clk_2x while clk is high.
process(clk_2x)
begin -- capture data
if falling_edge(clk_2x) then
if clk = '1' then
temp <= data_in;
end if;
end if;
end process;
process(clk)
begin -- resynch to main clock domain
if rising_edge(clk) then
value <= temp;
end if;
end process;
Alternative approaches can involve clocking the ADC from delayed, inverted or otherwise modified clocks, or using selectable delays in IOBs on the input data so that the (delayed) data is stable during the clock edge.
Without really good simulation models of the external parts, this can be quite tricky to get right in simulation, and it needs thorough testing on actual hardware. I have sometimes used phase-controllable clocks to external parts and mapped out the range of phases that worked before picking one phase or delay value for production.

After pulse Generate clock that last for 10 clock cycles

I am trying to create a clock that only lasts 10 clock cycles, derived from a 100 MHz signal. The clock is enabled by a pulse signal.
- Every time the pulse signal goes to 1, clock2 follows the 100 MHz clock1 for 10 cycles.
I am working in VHDL.
Do you mean that a 100 MHz clock is given as an input?
Assuming it is, a small state machine and a counter would be a good way to approach this. The state machine could have 2 states: idle and count. The next state logic would change the state from idle to count if the pulse signal is high. While in the count state the counter will increment and when the counter reaches ten, the state will move back to idle. The output logic will forward the clock signal to the output when in the count state. When in idle the output will be '0'.
For more information on making a state machine: How to implement state machines in VHDL
For more information on making a counter: Counters in VHDL
clock_input is the 100 MHz clock that is an input and clock_generated is the output.
The output logic will involve a multiplexer:
clock_generated <= clock_input when state=count else '0';
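A minimal sketch of the whole design described above, under a few assumptions that are not in the question: the entity name clock_burst, a pulse input that is synchronous to clock_input and high on at least one falling edge, and a state machine that updates on the falling edge. The falling edge is a choice made here so that the gate opens and closes while the clock is low, which avoids shortened output pulses.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: pass 10 cycles of clock_input after a pulse
entity clock_burst is
  port (
    clock_input     : in  std_logic; -- 100 MHz clock
    pulse           : in  std_logic; -- assumed synchronous to clock_input
    clock_generated : out std_logic
  );
end entity;

architecture rtl of clock_burst is
  type state_t is (idle, count);
  signal state   : state_t := idle;
  signal counter : natural range 0 to 9 := 0;
begin
  -- state and counter change on the falling edge, i.e. while the clock is low
  process(clock_input)
  begin
    if falling_edge(clock_input) then
      case state is
        when idle =>
          counter <= 0;
          if pulse = '1' then
            state <= count;
          end if;
        when count =>
          if counter = 9 then
            state <= idle;  -- ten high phases have been passed through
          else
            counter <= counter + 1;
          end if;
      end case;
    end if;
  end process;

  -- output logic: forward the clock only in the count state
  clock_generated <= clock_input when state = count else '0';
end architecture;
```

Gating a clock in general-purpose logic like this can cause timing problems on real devices; where the downstream logic allows it, a clock enable or the vendor's dedicated clock gating buffer is usually the safer route.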
