How can I access four elements of a 2D array (or array of arrays) in one process at the same time?
In this sample, I am trying to read several elements of intg1 in the same clock cycle, and the synthesis is taking forever.
type img_whole is array (78 downto 0, 130 downto 0) of std_logic_vector(7 downto 0);
signal img1 : img_whole;
signal i1_1 : integer range 0 to 79 := 0;
signal j1_1 : integer range 0 to 131 := 0;
type intg is array (78 downto 0, 130 downto 0) of integer range 0 to 1751998; --no double??
signal intg1 : intg;
integral : process (clka, finished, finished1)
  variable tempo : integer range 0 to 1751998;
begin
  if clka'event and clka = '1' then
    if finished = "1" and finished1 = "0" then
      if i1_1 < 78 and j1_1 < 130 then
        j1_1 <= j1_1 + 1;
      elsif j1_1 = 130 and i1_1 < 78 then
        j1_1 <= 0;
        i1_1 <= i1_1 + 1;
      elsif j1_1 < 130 and i1_1 = 78 then
        j1_1 <= j1_1 + 1;
      elsif j1_1 = 130 and i1_1 = 78 then
        finished1 <= "1";
      end if;
      tempo := to_integer(unsigned('0' & img1(i1_1, j1_1)));
      if i1_1 - 1 >= 0 then
        tempo := intg1(i1_1 - 1, j1_1) + tempo;
      end if;
      if j1_1 - 1 >= 0 then
        tempo := intg1(i1_1, j1_1 - 1) + tempo;
      end if;
      if i1_1 - 1 >= 0 and j1_1 - 1 >= 0 then
        tempo := tempo - intg1(i1_1 - 1, j1_1 - 1);
      end if;
      intg1(i1_1, j1_1) <= tempo;
    end if;
  end if;
end process;
This code computes an integral image from a 2D array.
There are both functional and synthesis issues in the code.
Functional issues:
finished1 is only driven to '1' in the process, but never to '0', so if the initial value is '0' then the operation in the process can only be done once after power up, since the finished1 value of '1' will then inhibit further updates due to the process enable condition.
i1_1 and j1_1 are signals that are assigned at the start of the process and then used later in the same process. Because they are signals, the value assigned with <= is not available until the next process evaluation, so the later reads still see the old index values. Is that intentional?
Use a simulator to ensure correct functionality, which can be done before synthesis.
Synthesis issues:
intg1 is a table with 79 * 131 = 10349 entries (more than 10 K), each of ceil(log2(1751999)) = 21 bits, thus a pretty large table. The design requires asynchronous lookup in the table, since there is no extra cycle (clock edge) available between a new value of an index, e.g. i1_1, and the point where the output of the process is generated based on the table lookup. An asynchronous lookup in a large table requires a huge mux network, which is probably the reason for the long synthesis time. And this lookup is even done multiple times based on different index values.
Minor: finished and finished1 are not needed in the sensitivity list of the process, since this is a process clocked by clka.
The above list of issues may not be complete.
To fix the table lookup problem (first synthesis issue), make a pipelined design with cycles, e.g.:
1. Index values i1_1 etc. are generated.
2. The intg1 table lookup is done synchronously.
3. The intermediate tempo is generated, and intg1 is updated.
The current design does steps 2 and 3 in a single cycle, which makes a synchronous lookup in the table impossible, since there is only one clock edge in the cycle, and this is used for writing back to the intg1 table. By splitting the lookup and write-back operations into two cycles, there is a clock edge both for reading the table (synchronous read) and for writing the table. Such a synchronous read using a clock edge maps much more efficiently onto the hardware resources available in typical FPGAs, since these contain large synchronous RAMs similar to the intg1 table, so the implementation will be smaller and faster.
The synchronous intg1 lookup is made by simply adding a clocked process where signals are driven directly by the intg1 output based on the required index values. All the required reads must be made; the subsequent process can then determine which of the read values are actually used.
The specific pipeline implementation must be adapted to the design requirements.
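As a minimal sketch of the synchronous lookup described above (not the asker's final design), the table can be written as a clocked read and write in the style that synthesis tools commonly recognize as RAM. The entity name intg_sync_read, the port names, and the flattening of the 2D index into i * 131 + j are assumptions made for this illustration. Whether a given tool maps this to block RAM depends on the tool and device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper around the intg1 table with one synchronous
-- write port and one synchronous read port.
entity intg_sync_read is
  port (
    clka    : in  std_logic;
    we      : in  std_logic;                     -- write enable
    wr_i    : in  integer range 0 to 78;         -- write row
    wr_j    : in  integer range 0 to 130;        -- write column
    wr_data : in  integer range 0 to 1751998;
    rd_i    : in  integer range 0 to 78;         -- read row
    rd_j    : in  integer range 0 to 130;        -- read column
    rd_data : out integer range 0 to 1751998     -- valid one clock after rd_i/rd_j
  );
end entity intg_sync_read;

architecture rtl of intg_sync_read is
  constant ROWS : natural := 79;
  constant COLS : natural := 131;
  -- The 2D table is flattened to one dimension (an assumption of this
  -- sketch); the address of element (i, j) is i * COLS + j.
  type intg_t is array (0 to ROWS * COLS - 1) of integer range 0 to 1751998;
  signal intg1 : intg_t := (others => 0);
begin
  process (clka)
  begin
    if rising_edge(clka) then
      if we = '1' then
        intg1(wr_i * COLS + wr_j) <= wr_data;
      end if;
      -- Registered read: the lookup uses this clock edge, so no
      -- asynchronous mux network over the whole table is needed.
      rd_data <= intg1(rd_i * COLS + rd_j);
    end if;
  end process;
end architecture rtl;
```

This sketch has a single read port, while the integral image update needs up to three neighbouring values. One possible arrangement is to read only the element above from the table and keep the left and upper-left values in ordinary registers from previous cycles, but that choice depends on the design requirements.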
Related
I am trying to learn VHDL and struggling with some of its basics. The question is as follows:
A process statement is described as containing code that runs sequentially (one line after the other). Why can't one run concurrent code in a process statement (meaning all lines execute in parallel)? Secondly, if a process statement contains sequential code, how can it model, for example, three flip-flops concurrently, e.g.:
--inside process statement
Q1 <= D1;
Q2 <= Q1;
Q3 <= Q2;
Sequential relates to the order the statements are evaluated, not when the assignment takes effect.
The VHDL Simulation Cycle
Signal assignments don't take effect immediately. They are scheduled for the current or a future time, and all processes sensitive to signal transactions in the current simulation cycle are completed before the assignments take effect. (And in VHDL everything devolves into an equivalent block hierarchy, processes and function calls for simulation.)
When all currently active processes complete, simulation time advances to the next time a signal is active in any signal's projected output waveform (a queue), unless there are events at the current simulation time, in which case the next simulation cycle is called a delta cycle.
Each process that is sensitive to a signal's transactions is executed and any further signal assignments are made to the respective projected output waveform. There is only one 'slot' in the queue for the current simulation time for each signal.
In this way there aren't any processes hitting moving targets. Only one process executes at a time, no signal assignments take effect until all processes have completed execution. This emulates concurrency, mimicking parallel execution when processes containing sequential statements are executed sequentially.
An assignment such as Q1 <= D1; is equivalent to Q1 <= D1 after 0 ns; meaning the current simulation time. If a series of sequential statements in a process contains a subsequent assignment to the same signal at the current simulation time and the assigned value is different, the second assignment will replace the first one in the projected output waveform.
When there are no more events scheduled for signals at the current simulation time, simulation time will advance to the earliest transaction time in any projected output waveform queue.
When there are no further queue events simulation time will advance to Time'HIGH (the highest possible simulation time) and simulation will cease.
Also simulation can be stopped by an implementation controlling how long to allow the simulation to run or by execution of an assertion statement with a SEVERITY LEVEL of FAILURE or an implementation defined severity level threshold for stopping simulation.
I am confused about how values are assigned to signals in VHDL.
I have read that values are assigned to signals at the end of the process.
Does the value get assigned right when the process finishes, or when the process is triggered the next time?
If it is assigned at the end of the process, then consider this scenario: three flip-flops in series (i.e. the output of one flip-flop is the input to the next). If D1 is 1 at time 0, will the output Q3 not be 1 at the same time?
(1) Right when the process finishes. More precisely, right after this and ALL processes running alongside this process have finished, and before any processes are subsequently started. So when any signal assignment happens, no process is running.
(2) Q3 will become the value on D1 three clock cycles earlier. Whether that value was '1' or not I can't tell from your question!
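To make the three flip-flop case concrete, here is a small sketch of the chain as one clocked process. The entity name shift3 and the port names are chosen for this illustration only. Each assignment reads the value its source signal had before the clock edge, so a value on d1 needs three rising edges to reach q3.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical three-stage shift register.
entity shift3 is
  port (
    clk : in  std_logic;
    d1  : in  std_logic;
    q3  : out std_logic
  );
end entity shift3;

architecture rtl of shift3 is
  signal q1, q2 : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q1 <= d1;  -- scheduled only; q1 still holds its old value below
      q2 <= q1;  -- reads the old q1, not the d1 just assigned
      q3 <= q2;  -- reads the old q2
    end if;
  end process;
end architecture rtl;
```

Reordering the three assignments inside the process would not change the result, because none of them takes effect until the process has suspended.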
The signal assignment is done only at the end of the process. After signal assignment, there may exist signal updates and because of the signal updates, the process itself or maybe other processes which are sensitive to some of the updated signals will be triggered. This is the concept of delta-cycle. It happens in a zero simulation time.
signal updates -> process is triggered -> at the end of the process, signals are updated
The first two steps (signal updates, process is triggered) form one delta cycle. The signal updates at the end of the process start the second delta cycle.
When there are no more signal updates, the process finishes and the simulation time increments.
Is it possible to set an inout pin to a specific value after monitoring the value on the same pin? That is, if we have an inout signal and the value on that signal is one, can we set the value of that pin to zero in VHDL after doing a specific operation?
What you are describing doesn't make a lot of sense. Are you sure you are understanding the requirements correctly?
Your load signal sounds like an external control signal that is an input into your module. You should not be trying to change the value of that signal - whoever is controlling your module should do that instead.
As long as the load signal is asserted (1), you should probably be loading your shift register with whatever value is presumably being provided on a different input signal (e.g., parallel_data). When the load signal is deasserted (0) by the external logic, you should probably start shifting out one bit of the loaded data per clock cycle to your output signal (e.g., serial_data).
Note that there is no need for bidirectional signals!
This is all based on what I would consider typical behavior for a shift register, and may or may not match what you are trying to achieve.
This doesn't sound like a good plan, and I'm not entirely sure you want to do it, but I guess if you can set things up such that:
you have a resistor pulling the wire down to ground.
your outside device drives the wire high
the FPGA captures the pin going high and then also drives it high
The outside source goes tristate once it has seen the pin go high
the FPGA can then set the pin tristate when it wants to flag it has finished (or whatever), and the resistor will pull it low again
repeat
I imagine one use for this would be for the outside device to trigger some processing and the FPGA to indicate when it has finished, in which case, the FPGA code could be something like:
pin_name <= '1' when fpga_is_processing = '1' else 'Z';
start_processing <= '1' when pin_name = '1' and pin_name_last = '0' else '0';
pin_name_last <= pin_name when rising_edge(clk);
start_processing will produce a single clock pulse on the rising edge of the pin_name signal. fpga_is_processing would be an output from your processing block, which must "come back" before the external device has stopped driving the pin high.
You may want to "denoise" the edge-detector on the pin_name signal to reduce the chances of external glitches triggering your processing. There are various ways to achieve that also.
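One possible way to do that filtering, as a sketch only: sample the pin into a short shift register and change the filtered value only when several consecutive samples agree. The entity name pin_filter, the port names, and the choice of three agreeing samples are assumptions for this illustration, and the right filter length depends on the clock rate and the kind of glitches expected.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical glitch filter plus rising-edge detector for the pin.
entity pin_filter is
  port (
    clk    : in  std_logic;
    pin_in : in  std_logic;   -- raw value read from the pin
    start  : out std_logic    -- one-clock pulse on a filtered rising edge
  );
end entity pin_filter;

architecture rtl of pin_filter is
  -- samples(0) only synchronizes the asynchronous input to clk;
  -- samples(3 downto 1) are the values the filter looks at.
  signal samples       : std_logic_vector(3 downto 0) := (others => '0');
  signal filtered      : std_logic := '0';
  signal filtered_last : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      samples <= samples(2 downto 0) & pin_in;
      -- Change the filtered value only after three equal samples.
      if samples(3 downto 1) = "111" then
        filtered <= '1';
      elsif samples(3 downto 1) = "000" then
        filtered <= '0';
      end if;
      filtered_last <= filtered;
    end if;
  end process;

  start <= filtered and not filtered_last;
end architecture rtl;
```

The price of the filter is a few clock cycles of extra latency between the pin going high and the start pulse.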
type dist_ram is array (0 to 99) of std_logic_vector(7 downto 0);
signal ram : dist_ram := (others => (others => '0'));
-- The attribute specification must come after the declaration of ram
attribute ram_style : string;
attribute ram_style of ram : signal is "distributed";
begin
--Pseudocode
PROCESS(Clk)
BEGIN
  if(rising_edge(Clk)) then
    ram(0) <= x"00";
    ram(2) <= x"01";
    ram(3) <= x"02";
    ram(4) <= x"03";
    ram(5) <= x"01";
    ram(6) <= x"02";
    ...
    ...
    ram(99) <= x"03";
  end if;
END PROCESS;
In the above scenario the complete RAM gets updated in 1 clock cycle. If I use a block RAM instead, I would need a minimum of 100 clock cycles to update the entire memory, as opposed to 1 clock cycle with distributed RAM.
I also understand that it is not advisable to use distributed RAM for a large memory, as it will eat up the FPGA resources.
So what is the best design for such a situation (say, for a few KB of RAM) in order to achieve the best throughput?
Should I use block RAM or distributed RAM, assuming a Xilinx FPGA? Your suggestions are highly appreciated.
Thanks for your replies; let me make it a bit clearer.
My purpose is not RAM initialization.
I have a 100 x 20 (8-bit) RAM block which needs to be updated after certain computations.
After these computations I have to store the results and then use them again for the next iteration.
This is an iterative process, and I am expected to finish at least 2 iterations within 3000 clk cycles.
If I use block RAM to store these coefficients, then just to read and write I would need at least (100*20) cycles plus some latency, which will not meet my requirement.
So how should I go about the design in this case?
What purpose are you trying to achieve? Do you really need to update the entire RAM on 1 clock cycle or can you break it into many clock cycles?
If you can break it up into many clocks you could do something like:
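PROCESS(Clk)
BEGIN
  if(rising_edge(Clk)) then
    -- index is assumed to be declared as: signal index : integer range 0 to 99 := 0;
    -- to_unsigned requires ieee.numeric_std
    ram(index) <= std_logic_vector(to_unsigned(index, 8));
    if index = 99 then
      index <= 0;  -- wrap around at the end of the RAM
    else
      index <= index + 1;
    end if;
  end if;
END PROCESS;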
You can initialize a block ram as well. The specifics of this depend on your FPGA vendor. So for a Xilinx FPGA look at this article:
http://www.xilinx.com/itp/xilinx10/isehelp/pce_p_initialize_blockram.htm
If you really cannot afford to break it up into multiple clocks and you want to "initialize" it more than once then you will need to use distributed ram the way you did above.
From what I understand, all statements inside a PROCESS are executed sequentially. So what happens to a concurrent signal assignment (<=)? Does it work the same way as sequential assignment (:=) or does it execute after a delta delay?
If it executes after a delta delay, then how can all the statements inside PROCESS be called sequential?
If it executes immediately, then is there any difference between := and <= in a process?
The signal assignment (<=) is performed after all the sequential code in the processes is done executing. This is when all the active processes for that timestep are done.
As an example why this is:
Suppose you have an event that triggers 2 processes. These 2 processes
use the same signal, but one of them changes the value of that
signal. The simulator is only able to perform one process at a
time due to a sequential simulation model (not to confuse with the
concurrent model of vhdl). So if process A is simulated first and A
changes the signal, B would have the wrong signal value. Therefore the
signal can only be changed after all the triggered processes are done.
The variable assignment (:=) executes immediately and can be used, e.g., to temporarily store some data inside a process.
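The two-process situation described above can be tried out with a small simulation-only sketch. The entity name swap_demo and the signal names are made up for this illustration. Both processes are triggered by the same clock edge, one assigns a while the other reads it, and because the updates are deferred the two values are simply exchanged, regardless of which process the simulator happens to run first.

```vhdl
-- Hypothetical self-checking demo; it has no ports and is not meant for synthesis.
entity swap_demo is
end entity swap_demo;

architecture sim of swap_demo is
  signal clk : bit     := '0';
  signal a   : integer := 1;
  signal b   : integer := 2;
begin
  -- A single rising edge at 10 ns.
  clk <= '1' after 10 ns;

  proc_a : process (clk)
  begin
    if clk'event and clk = '1' then
      a <= b;  -- reads the value b had before the edge
    end if;
  end process proc_a;

  proc_b : process (clk)
  begin
    if clk'event and clk = '1' then
      b <= a;  -- reads the value a had before the edge
    end if;
  end process proc_b;

  check : process
  begin
    wait for 20 ns;
    -- Both updates took effect only after both processes had run.
    assert a = 2 and b = 1
      report "signals were not swapped" severity failure;
    wait;
  end process check;
end architecture sim;
```

If a and b were updated immediately, one of the two values would be lost and the assertion would fail.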
Sequential signal assignment (<=), as opposed to sequential variable assignment (:=), sequentially schedules an event one delta delay later for the value of the signal to be updated. You can change the scheduled event by using a sequential signal assignment on the same signal in the same process. Only the last update scheduled on a particular signal will occur. For example:
signal a : std_logic := '1'; --initial value is 1
process(clk)
  variable b : std_logic;
begin
  --note that the variable assignment operator, :=, can only be used to assign the value of variables, never signals
  --Likewise, the signal assignment operator, <=, can only be used to assign the value of signals.
  if (clk'event and clk='1') then
    b := '0'; --b is made '0' right now.
    a <= b; --a will be made the current value of b ('0') at time t+delta
    a <= '0'; --a will be made '0' at time t+delta (overwrites previous event scheduling for a)
    b := '1'; --b is made '1' right now. Any future uses of b will be equivalent to replacing b with '1'
    a <= b; --a will be made the current value of b ('1') at time t+delta
    a <= not(a); --at time t+delta, a will be inverted. None of the previous assignments to a matter, their scheduled events have been overwritten
    --b cannot be used outside of the process; it keeps its last value ('1') until the process runs again and reassigns it
  end if;
end process;
It is also important to note that while sequential processes operate sequentially from a logical perspective in the VHDL, when synthesized, they are really turned into complex concurrent statements connecting flip-flops. The entire process runs concurrently as a unit between every clock cycle (processes that don't operate on a clock become pure combinational logic). Signals assigned in a clocked process are the values that are actually stored in the flip-flops. Variables are usually just aliases that make processes easier to read, and they are absorbed into combinational logic after synthesis. The exception is a variable that is read before it is written in a clocked process, which also has to be stored in a flip-flop.