Efficient synthesizable way to read inputs of variable length in one simulation - vhdl

I am new to vhdl and I have a question that I could not find an answer to.
I am trying to implement some algorithm that operates on a vector 1024 bits long. So I have to read data from a file and insert them in this temp_vector(1023 downto 0).
The file contains many inputs and for each one the algorithm operates on the output of the previous one. The problem is that the length of this input data is not constant, but each input length varies from 0 to 16000.
In each cycle I can only read 64 bits from my testbench and insert them. At specific moments I also have to append sequences of bits of variable length to the already inserted data. I use the integer bit_position to keep track of the last position of temp_vector I inserted data in, and the integer data_in_length to tell me the actual length of the data that I read in each cycle.
After temp_vector is full or I finished inserting all the input, I stop reading and the algorithm operates on temp_vector and then the reading from the file resumes.
I get correct results in simulations. The problem I have is with FPGA and ASIC synthesis. It has to do with the varying length of the input data.
I came up with two approaches in order to read the file and insert the data to temp_vector.
First
temp_vector_proc : process(clk, rst_n)
begin
if rst_n = '0' then
state <= RESET;
elsif clk'event and clk = '1' then
case state is
when RESET =>
state <= TAKING_DATA;
enable_calculations <= '0';
bit_position <= 0;
temp_vector <= (others => '0');
when TAKING_DATA =>
if take_data = '1' then
for i in 0 to 63 loop
temp_vector(i+bit_position) <= data_in_keyak(i);
end loop;
bit_position <= bit_position +data_in_length;
end if;
when ...
.....
end case;
end if; -- end rst_n = '0'
end process temp_vector_proc;
And second
when TAKING_DATA =>
if take_data = '1' then
temp_vector(63+ bit_position downto bit_position) <= data_in(63 downto 0);
bit_position <= bit_position +data_in_length;
end if;
when ...
.....
end case;
The second approach is working in FPGA but it is taking a lot of time and consuming too many Logic Elements and for design compiler it is returning an error that the range has to be constant.
The first one is also working in FPGA and returning no error in ASIC ( Design compiler), but for the ASIC it runs forever (I let it run overnight). It was stuck at beginning Pass 1 Mapping of the entity that contains the first code.
I hope I explained my problem enough and I would really appreciate some thoughts on how to implement it in a more efficient way.
I do not think I can go with generics because the file is read and operated on in one go, so the lengths are changing during the simulation. I also though about shifting but since the shifting value would be changing each time, i guess that it would still consume a lot of time/area.
Finally could my whole approach be wrong? I mean is it possible that for FPGA and especially for ASIC I need to be working on specific input sizes? That would mean that I should try to write and synthesize code that does not work for all of my file, but for some of its inputs with some specified size only.
Thanks a lot in advance for your time.

Related

Execute two chunks of code in sequence but the code itself in parallel

I have a piece of code that needs to be executed after another. For example, I have an addition slv_reg2 <= slv_reg0 + slv_reg1; and then I need the result subtracted from a number.
architecture IMP of user_logic is
signal slv_reg0 : std_logic_vector(0 to C_SLV_DWIDTH-1); --32 bits wide
signal slv_reg1 : std_logic_vector(0 to C_SLV_DWIDTH-1);
signal slv_reg2 : std_logic_vector(0 to C_SLV_DWIDTH-1);
signal slv_reg3 : std_logic_vector(0 to C_SLV_DWIDTH-1);
signal flag : bit := '0';
begin
slv_reg2 <= slv_reg0 + slv_reg1;
flag <= '1';
process (flag)
begin
IF (flag = '1') THEN
slv_reg3 <= slv_reg0 - slv_reg2;
END IF;
end process;
end IMP;
I haven't tested the code above but I would like some feedback if my thought is correct. What is contained in the process doens't need to run in sequence, how can I make this part also run in parallel?
In summary, I have two chunks of code that need to be executed in sequence but the code itselft should run in parallel.
--UPDATE--
begin
slv_regX <= (slv_reg0 - slv_reg1) + (slv_reg2 - slv_reg3) +(slv_reg4 - slv_reg5); --...etc
process(clk) -- Process for horizontal counter
begin
if(rising_edge(clk)) then
if(flag = 1) then
flag <= 0;
end if;
end if;
end process;
It's concurrent, not parallel. Just use intermediate signals
I am afraid your thought process about VHDL is not entirely correct. You acknowledge the first assignment happening 'in parallel', but you must realise that the process is also running concurrently with the rest of the statements. Only what happens inside a process is evaluated sequentially - every time any of the signals in the sensitivity list change.
I need to make a lot of assumptions about what you need, but perhaps you're just after this style:
c <= a + b;
e <= d - c;
The signal e will contain d-(a+b) at all times. If you don't immediately understand why - this is where the difference between concurrent and parallel comes in. It is being continuously evaluated - just like if you'd hook the circuit up with copper wires.
If what you actually need is for something to happen based on a clock - you're really not there yet with your example and I recommend looking up examples and tutorials first. People will be happy to help once you provide VHDL code that is more complete.
Issues in your example
Assuming you are intending to write synthesisable VHDL, there several problems with your code:
Your process sensitivity list is not correct - you would either make it sensitive to only a clock, or to all the input signals to the process.
You are both initialising your 'flag' to '1' and assigning it a '0' in concurrent code, that doesn't make sense.
Signals ..reg0 and ..reg1 are not assigned. If they are inputs, don't declare them as signals.
You have named your signals as if they are anonymous numbered 'registers', but they are not registers, just signals: wires! (if ended up with this style because you are comparing to Verilog - let me argue that the Verilog 'reg' regularly don't make sense as they often end up not being registers.)
Try it
I haven't tested the code above but
You really should. Find code examples, tutorials, go and play!

VHDL FSM set unit input and use output in same state

I am implementing a Mealy-type FSM in vhdl. I currently am using double process, although i've just read a single-process might be neater. Consider that a parameter of your answer.
The short version of the question is: May I have a state, inside of which the input of another component is changed, and, aftwerwards, in the same state, use an output of said component? Will that be safe or will it be a rat race, and I should make another state using the component's output?
Long version: I have a memory module. This is a fifo memory, and activating its reset signal takes a variable named queue_pointer to its first element. After writing to the memory, the pointer increases and, should it get out of range, it is (then also) reset to the first element, and an output signal done is activated. By the way, i call this component FIMEM.
My FSM first writes the whole FIMEM, then moves on to other matters. The last write will be done from the state:
when SRAM_read =>
READ_ACK <= '1';
FIMEM_enable <= '1';
FIMEM_write_readNEG <= '0';
if(FIMEM_done = '1') then --is that too fast? if so, we're gonna have to add another state
FIMEM_reset <= '1'; --this is even faster, will need check
data_pipe_to_FOMEM := DELAYS_FIMEM_TO_FOMEM;
next_state <= processing_phase1;
else
SRAM_address := SRAM_address + 1;
next_state <= SRAM_wait_read;
end if;
At this state, having enable and write active means data will be written on the FIMEM. If that was the last data space on the memory, FIMEM_done will activate, and the nice chunk of code within the if will take care of the future. But, will there be time enough? If not, and next state goes to SRAM_wait_read, and FIMEM_done gets activated then, there will be problems. The fact that FIMEM is totally synchronous (while this part of my code is in the asynchronous process) messes even more?
here's my memory code, just in case:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity memory is
generic (size: positive := 20);
Port ( clk,
reset,
enable,
write_readNEG: in std_logic;
done: out std_logic;
data_in: in STD_LOGIC_VECTOR(7 downto 0);
data_out: out STD_LOGIC_VECTOR(7 downto 0) );
end memory;
architecture Behavioral of memory is
subtype word is STD_LOGIC_VECTOR(7 downto 0);
type fifo_memory_t is array (0 to size-1) of word;
signal fifo_memory : fifo_memory_t :=((others=> (others=>'0')));
--Functionality instructions:
--Resetting sets the queue pointer to the first element, and done to 0
--Each cycle with enable active, a datum from the pointer position is
--written/read according to write_readNEG, and the pointer is incremented.
--If the operation was at the last element, the pointer returns to the first place
--and done is set to 1. When done is 1, enable is ignored.
Begin
process(clk,reset)
variable done_buf : std_logic;
variable queue_pointer: natural range 0 to size-1;
begin
if(reset = '1') then
queue_pointer := 0;
done_buf := '0';
elsif(rising_edge(clk)) then
if(done_buf = '0' and enable = '1') then
case write_readNEG is
when '0' =>
data_out <= fifo_memory(queue_pointer);
when '1' =>
fifo_memory(queue_pointer) <= data_in;
when others => null;
end case;
if(queue_pointer = size-1) then
done_buf := '1';
queue_pointer := 0;--check
else
queue_pointer := queue_pointer + 1;
end if;
end if; --enable x not done if
end if; --reset/rising edge end if
done <= done_buf;
end process;
End Behavioral;
More details inspired by the first comment:
The memory can read the data at the same cycle enable is activated, as seen here:
Notice how the "1", the value when enable is turned active, is actually written into the memory.
Unfortunately, the piece of code is in the asynchronous process! Although I'm VERY strongly thinking of moving to a single-process description.
In contrast to all the circuits I've designed until now, it is very hard for me to test it via simulation. This is a project in my university, where we download our vhdl programs to a xilinx spartan 3 FPGA. This time, we have been given a unit which transfers data between Matlab and the FPGA's SRAM (the functionality of which, I have no idea). Thus, I have to use this unit to transfer the data between the SRAM and my memory module. This means, in order to simulate, my testbench file will have to simulate the given unit! And this is hard..suppose I must try it, though...
first of all, whether to use a single process or a dual process type of FSM notation is a matter of preference (or company coding style rules). I find single process notation easier to write/read/manage.
your enable signal will have an effect on your memory code only after the next rising clock edge. the done signal related to the actual memory state will therefore be available one clock cycle after updating enable. I guess (and hope! but it's not visible in your posted code), your current_state<=next_state part of the FSM is synchronous! therefore your state machine will be in the SRAM_wait_read state by the time when done is updated!
btw: use a simulator! it will help to check the functionality!
thanks for adding the simulation view! strangely your done signal updates on neg. clock edge... in my simulation it updates on pos. edge; as it should, by the way!
to make the situation more clear I suggest, you move the done <= done_buf; line inside the "rising_edge-if" (this should be done anyhow when using synchronous processes!).

How are loops within a process synthesized in VHDL?

I'm working on the implementation of a FIR filter in VHDL and need some advice regarding when to use and not to use process statements. Part of the code is presented below. Specifically, I'm wodering how the resetCoeffs loop will synthesize. Will it be a sequential reset, and so be very inefficeint in both speed and area I assume, or will it be done in parallel? If the former is the case, how would I go about writing it so that is can be a parallel reset.
process (clk) is begin
if rising_edge(clk) then
if rst = '1' then
-- Reset pointer
ptr <= (others => '0');
-- Reset coefficients
resetCoeffs: for i in 0 to ORDER - 1 loop
coeffs(i) <= (others => '0');
end loop;
else
-- Increment pointer
ptr <= ptr + 1;
-- Fetch input value
vals( to_integer(ptr) ) <= ival;
-- Write coefficient
if coeff_wen = '1' then
coeffs( to_integer(ptr) ) <= ival;
end if;
end if;
end if;
end process;
It's going to be parallel. It basically has to be since the whole loop (everything in the process, really) has to operate in a single clock cycle in hardware.
I'm curious, though. Since you've made the effort to write out the whole process, why not just synthesize it with your favorite synthesizer? That's what I did, to check my answer.
Once a process begin execution (because of a change in one of the signals in its sensitivity list) it must run until it suspends, either by reaching the end, or reaching a wait statement.
Because of this defined behaviour, loops will "execute" all their code until they exit before time is allowed to move on in the simulator. The synthesiser will therefore create the same behaviour by unrolling all the loop and making sure it "all happens at once" (or at least within the clock tick for a clocked process like yours).

Counter with push button switch design using VHDL and Xilinx

I'm very new to VHDL and XILINX ISE. I use the version 13.2 for Xilinx ISE.
I want to design a very simple counter with the following inputs:
Direction
Count
The count input will be assigned to a button and I want the counter to count up or down according to direction input when the button is pressed. I have written a sample VHDL before this one. It had a clock input and It was counting according to the clock input. Now I want it to count when I press the button instead of counting synchronously.
Here's my VHDL code (please tell me if my code have a logical or any other flaw):
entity counter is
Port ( COUNT_EN : in STD_LOGIC;
DIRECTION : in STD_LOGIC;
COUNT_OUT : out STD_LOGIC_VECTOR (3 downto 0));
end counter;
architecture Behavioral of counter is
signal count_int : std_logic_vector(3 downto 0) := "0000";
begin
process
begin
if COUNT_EN='1' then
if DIRECTION='1' then
count_int <= count_int + 1;
else
count_int <= count_int - 1;
end if;
end if;
end process;
COUNT_OUT <= count_int;
end Behavioral;
I use Spartan xc3s500e and I placed the inputs accordingly. Below is my .ucf file:
#Created by Constraints Editor (xc3s500e-fg320-5) - 2013/03/18
NET "COUNT_EN" LOC = K17;
NET "COUNT_OUT[0]" LOC = F12;
NET "COUNT_OUT[1]" LOC = E12;
NET "COUNT_OUT[2]" LOC = E11;
NET "COUNT_OUT[3]" LOC = F11;
NET "DIRECTION" LOC = L13;
#Created by Constraints Editor (xc3s500e-fg320-5) - 2013/03/18
NET "COUNT_EN" CLOCK_DEDICATED_ROUTE = FALSE;
I needed to change the last line because I was getting the error:
This will not allow the use of the fast path between the IO and the Clock...
After having this error gone, I programmed the device. But the output (leds) acted crazy. They sometimes stood still for a few seconds, sometimes just flashed very fast. I could not figure out where my mistake is. I would appreciate any help, some beginner tutorials are greatly appreciated (the links i found directed me to xilinx's documentations and they seemed quite complicated for a beginner).
From your description I understand that you are not looking for an Asynchronous Counter.
What you need is counter that counts on trigger from PushButton Switch. The below RTL should work:
If any difficulty in HDL coding let me know.
You don't have a clock. Once the COUNT_EN and DIRECTION conditions are satisfied, the count_int variable is going to increase as fast as it possibly can... in fact the timing of when individual bits change will probably make the entire thing completely unstable and incorrect.
You should always use a clock... just to allow the FPGA to get the timing right.
In this case, put the clock back... then add a new signal COUNT_EN_LAST. Save the old COUNT_EN each pass through the clocked process. Only increment when COUNT_EN = '1' and COUNT_EN_LAST = '0'.
In fact, you'll next find that you need to "debounce" the input. Physical buttons/switches "bounce" and give you multiple off-on events per single button press. For that, you'd simply make COUNT_EN_LAST a vector (say 5 long), shift new values into it each time ("COUNT_EN_LAST <= COUNT_EN_LAST(3 downto 0) & COUNT_EN;"), and only increment when COUNT_EN_LAST = "01111", or right before they're all 1's. The length of the vector you need will change depending on how fast your clock is and how long the switch can bounce before settling down to the new state.

Struggling with waiting for transfer completion with VHDL

I'm needing to do continual SPI communication to read values from a dual channel ADC I have, and have written a kinda state-machine to do so. However, it doesn't seem to be getting into the state that reads the second channel and I can't figure out why. here's the VHDL...
SPI_read: process (mclk)
--command bits: Start.Single.Ch.MSBF....
constant query_x: unsigned(ADC_datawidth-1 downto 0) := "11010000000000000"; -- Query ADC Ch0 ( inclinometer x-axis)
constant query_y: unsigned(ADC_datawidth-1 downto 0) := "11110000000000000"; -- Query ADC Ch1 ( inclinometer y-axis)
begin
if rising_edge(mclk) then
-- when SPI is not busy, change state and latch Rx data from last communication
if (SPI_busy = '0') then
case SPI_action is
when SETUP =>
SPI_pol <= '0'; -- Clk low when not active
SPI_pha <= 1; -- First edge is half an SCLK period after CS activated
SPI_action <= READ_X;
when READ_X =>
SPI_Tx_buf <= query_x; -- Load in command
y_data <= "00000" & SPI_Rx_buf(11 downto 1);
SPI_send <= '1';
SPI_action <= READ_Y;
when READ_Y =>
SPI_Tx_buf <= query_y; -- Load in command
x_data <= "00000" & SPI_Rx_buf(11 downto 1);
SPI_send <= '1';
SPI_action <= READ_X;
end case;
else
SPI_send <= '0'; -- Deassert send pin
end if;
end if;
end process SPI_read;
The command is sent to the Tx buffer, and the value from the last received data is written to a signal which is output to some seven segment displays. A pulse from SPI_send is required to start the transfer, and when started, SPI_busy is set high until the transfer is completed.
Right now it'll only send the query_x over SPI, and I can know this since I can see it on the scope. Interestingly, however, It's outputting the same value to both displays which leads me to think that it's still getting into it's READ_Y state, but not changing the Tx Data it's outputting.
I've been staring at this code for hours now, and I can't figure it out. Sometimes a fresh pair of eyes makes life easier, so if you spot anything please let me know.
Also, I'm very open to suggestions of better ways to deal with this, I'm just learning VHDL so I'm not even sure I'm doing things the right way mostly!
Summarizing my comments so far into an answer.
Simulate this process/module with your SPI master component.
Your approach is generally correct but I would suggest you re-architect your state machine slightly to put explicit wait states in between each spi transaction (wait_x, wait_y) and maybe have a more robust handshake between the modules, ie stay in read_x until busy goes high, then stay in wait_x until busy goes low.
It looks like send is getting asserted for two cycles and you are transition through both read_x and read_y each cycle.
Drawn from this program
http://wavedrom.googlecode.com/svn/trunk/editor.html
with this source:
{ "signal" : [
{ "name": "clk", "wave": "P........" },
{ "name": "busy", "wave": "0..1.|0..1"},
{ "name": "SPI_Action", "wave": "====.|.==.", "data": ["SETUP", "READ_X", "READ_Y", "READ_X", "READ_Y", "READ_X", "READ_Y", ] },
{ "name": "SPI_send", "wave": "0.1.0|.1.0", "data": ["0", "1", "Load", "Start","WaitA"] },
{ "name": "SPI_Tx_buf", "wave": "x.===|..==", "data": ["query_x","query_y","query_x","query_y","query_x"],},
]}
You are thinking along the right basic lines, but there are various things not-quite-right with your state machine - as the other answers say - and these are easy to discover in simulation.
For example
when READ_X =>
SPI_Tx_buf <= query_x; -- Load in command
y_data <= "00000" & SPI_Rx_buf(11 downto 1);
SPI_send <= '1';
SPI_action <= READ_Y;
Now if SPI_Busy stays low for a second cycle, this will clearly not stay in state READ_X but transition directly to READ_Y which is probably not what you want.
A more normal state machine would treat the states as outermost, and interpret the input signals differently for each state :
case SPI_Action is
when READ_X =>
if SPI_Busy = '0' then -- Wait here if busy
SPI_Tx_buf <= query_x; -- Load in command
y_data <= "00000" & SPI_Rx_buf(11 downto 1);
SPI_send <= '1';
SPI_action <= READING_X;
end if;
when READING_X =>
if SPI_Busy = '1' then -- command accepted
SPI_action <= READ_Y; -- ready to send next one
end if;
when READ_Y =>
if SPI_Busy = '0' then -- Wait here if busy
You can see that this version treats the Busy signal as a handshake, and only progresses when Busy changes state. I am certain you can make your approach (IF outermost) work if you want, but you will have tofigure out how to apply the handshaking principle yourself.
There is also no "Idle" state where neither X or Y is being read; this SM will read X and Y alternately as fast as it can. Commonly you would read them both, then return to Idle until some other Start signal commanded you to leave Idle and perform a fresh set of reads.
You can possibly also make the state machine more robust with a "when others" clause. It's not a must, if you guarantee to cover all your defined states, but it can make maintenance easier. On the other hand, without such a clause, the compilers will let you know of any uncovered states, guarding against mistakes.
There is a myth that a "when others" clause is essential and synthesis tools generate safer but less optimal state machines from a "when others" clause. However there are synthesis attributes or command line options to control how synth tools generate state machines.
I have to agree with the issues raised by David and add a few notes.
There are a few things that are not very good with your code, first of all, you must have a "when others =>" in the list of your states.
None of your signals seems to have a default value except the SPI_send. It also not a good idea to put the state machine inside the if statement. And if you have to do it, then you must set all the signals in both cases or else you end up with latches instead of flip-flops. One easy way to do this is to set all the signals at the begging of your code to their default value and then change them when needed.
This way the synthesis tool knows how to handle them and you get correct.
Well, this is from a VHDL-for-synthesis document from Siemence:
If a signal or a variable is not assigned a value in all possible
*branches of an if statement, a latch is inferred*. If the intention is
not to infer a latch, then the signal or variable must be assigned a
value explicitly in all branches of the statement.
You can find the guide in PDF format at: http://web.ewu.edu/groups/technology/Claudio/ee360/Lectures/vhdl-for-synthesis.pdf

Resources