Continuous assignment seemingly not working - vhdl

I'm working on a FIR filter, specifically the delay line. x_delayed is initialized to all zeros.
type slv32_array is array(natural range <>) of std_logic_vector(31 downto 0);
...
signal x_delayed : slv32_array(0 to NTAPS-1) := (others => (others => '0'));
This does not work:
x_delayed(0) <= x; -- Continuous assignment
DELAYS : process(samp_clk)
begin
if rising_edge(samp_clk) then
for i in 1 to NTAPS-1 loop
x_delayed(i) <= x_delayed(i-1);
end loop;
end if; -- rising_edge(samp_clk)
end process;
But this does:
DELAYS : process(samp_clk)
begin
if rising_edge(samp_clk) then
x_delayed(0) <= x; -- Registering input
for i in 1 to NTAPS-1 loop
x_delayed(i) <= x_delayed(i-1);
end loop;
end if; -- rising_edge(samp_clk)
end process;
The problem with this "solution" is that the first element in x_delayed is delayed by one sample, which it should not be. (The rest of the code expects x_delayed(0) to be the current sample).
I'm using Xilinx ISE 13.2, simulating with ISim, but this was also confirmed simulating with ModelSim.
What gives?
Edit:
The problem was essentially that, even though x_delayed(0) didn't appear to be driven inside the process, it was.
After implementing Brian Drummond's idea it works perfectly:
x_delayed(0) <= x;
-- Synchronous delay cycles.
DELAYS : process(samp_clk)
begin
-- Disable the clocked driver, allowing the continuous driver above to function correctly.
-- https://stackoverflow.com/questions/18247955/#comment26779546_18248941
x_delayed(0) <= (others => 'Z');
if rising_edge(samp_clk) then
for i in 1 to NTAPS-1 loop
x_delayed(i) <= x_delayed(i-1);
end loop;
end if; -- rising_edge(samp_clk)
end process;
Edit 2:
I took OllieB's suggestion for getting rid of the for loop. I had to change it, since my x_delayed is indexed from (0 to NTAPS-1), but we end up with this nice looking little process:
x_delayed(0) <= x;
DELAYS : process(samp_clk)
begin
x_delayed(0) <= (others => 'Z');
if rising_edge(samp_clk) then
x_delayed(1 to x_delayed'high) <= x_delayed(0 to x_delayed'high-1);
end if; -- rising_edge(samp_clk)
end process;
Edit 3:
Following OllieB's next suggestion, it turns out the x_delayed(0) <= (others => 'Z') was unnecessary, following his previous change. The following works just fine:
x_delayed(0) <= x;
DELAYS : process(samp_clk)
begin
if rising_edge(samp_clk) then
x_delayed(1 to x_delayed'high) <= x_delayed(0 to x_delayed'high-1);
end if;
end process;

In the first case, x_delayed(0) actually has two drivers: one outside the
process, from x_delayed(0) <= x, and an implicit one inside the DELAYS
process.
The driver inside the process is a consequence of a VHDL standard concept
called "longest static prefix", described in VHDL-2002 standard (IEEE Std
1076-2002) section "6.1 Names", and the loop construction with a loop variable
i, whereby the longest static prefix for x_delayed(i) is x_delayed.
The VHDL standard then further describes drivers for processes in section
"12.6.1 Drivers", which says "... There is a single driver for a given scalar
signal S in a process statement, provided that there is at least one signal
assignment statement in that process statement and that the longest static
prefix of the target signal of that signal assignment statement denotes S ...".
So as a (probably surprising) consequence, x_delayed(0) has a driver in
the DELAYS process, which drives all std_logic elements to 'U' since unassigned,
whereby the std_logic resolution function causes the resulting value to be 'U',
no matter what value is driven by the external x_delayed(0) <= x.
But in the case of your code, there seems to be more to it, since there actually are some "0" values in the simulation output for x_delayed(0), from what I can see in the figures. However, it is hard to dig further into this when I do not have the entire code.
One way to see that the loop is the reason, is to manually roll out the loop by
replacing the for ... loop with:
x_delayed(1) <= x_delayed(1-1);
x_delayed(2) <= x_delayed(2-1);
...
x_delayed(NTAPS-1) <= x_delayed(NTAPS-2);
This is of course not a usable solution for configurable modules with NTAPS as
a generic, but it may be interesting to see that the operation then is as
intuitively expected.
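For a generic NTAPS, one way to get the same effect as the manual unrolling is a for ... generate statement, since the generate parameter is static and each generated process then only has a driver for its own x_delayed(i). The following is a minimal sketch under that assumption; the entity name delay_line, the NTAPS default, and the output port y are hypothetical and only there to make the example self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delay_line is
  generic (
    NTAPS : positive := 4  -- assumed default, for illustration only
  );
  port (
    samp_clk : in  std_logic;
    x        : in  std_logic_vector(31 downto 0);
    y        : out std_logic_vector(31 downto 0)  -- hypothetical: oldest sample
  );
end entity delay_line;

architecture rtl of delay_line is
  type slv32_array is array (natural range <>) of std_logic_vector(31 downto 0);
  signal x_delayed : slv32_array(0 to NTAPS-1) := (others => (others => '0'));
begin
  -- Element 0 is driven only here, so it follows the current sample.
  x_delayed(0) <= x;

  -- One clocked process per tap. Inside the generate, i is static,
  -- so each process drives only x_delayed(i).
  GEN_DELAYS : for i in 1 to NTAPS-1 generate
    DELAY_I : process (samp_clk)
    begin
      if rising_edge(samp_clk) then
        x_delayed(i) <= x_delayed(i-1);
      end if;
    end process DELAY_I;
  end generate GEN_DELAYS;

  y <= x_delayed(NTAPS-1);
end architecture rtl;
```

With NTAPS = 1 the generate range is empty and y simply follows x.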
EDIT: Multiple solutions are listed in the "edit" sections of the question above, based on comments. A solution using a variable, which allows for complex expressions if required, is shown below. If no complex expression is required, then as per OllieB's suggestion the assignment can be reduced to x_delayed(1 to x_delayed'high) <= x_delayed(0 to x_delayed'high-1):
x_delayed(0) <= x;
DELAYS : process(samp_clk)
variable x_delayed_v : slv32_array(1 to NTAPS-1);
begin
if rising_edge(samp_clk) then
for i in 1 to NTAPS-1 loop
x_delayed_v(i) := x_delayed(i-1); -- More complex operations are also possible
end loop;
x_delayed(1 to x_delayed'high) <= x_delayed_v;
end if; -- rising_edge(samp_clk)
end process;

During elaboration, drivers are created for all elements of x_delayed, regardless of the range of the loop iterator. Hence, x_delayed(0) has two drivers associated with it. std_logic and std_logic_vector are resolved types, i.e., when multiple drivers are associated with a signal of these types, the resolution function determines the value of the signal by looking it up in a table in the std_logic_1164 package. Please refer to VHDL Coding Styles and Methodologies for more details.
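The resolution behaviour can be checked directly with the resolved function declared in std_logic_1164. This is a small simulation-only sketch; the entity name resolution_demo is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity resolution_demo is
end entity resolution_demo;

architecture sim of resolution_demo is
begin
  CHECK : process
  begin
    -- An unassigned driver ('U') wins over any other driver value.
    assert resolved(std_ulogic_vector'("U1")) = 'U'
      report "expected 'U' for drivers 'U' and '1'" severity failure;
    -- Two conflicting strong drivers resolve to 'X'.
    assert resolved(std_ulogic_vector'("01")) = 'X'
      report "expected 'X' for drivers '0' and '1'" severity failure;
    -- A high-impedance driver lets the other driver through.
    assert resolved(std_ulogic_vector'("Z1")) = '1'
      report "expected '1' for drivers 'Z' and '1'" severity failure;
    report "resolution checks passed";
    wait;
  end process CHECK;
end architecture sim;
```

The first assertion corresponds to the failing case in the question (an implicit 'U' driver plus the continuous assignment), and the third to the workaround of driving 'Z' from the process.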

The reason you have a problem is that the logic thinks you have two things assigning to the same signal simultaneously: both the continuous assignment and the register assignment loop.
Keep the register implementation.
Edit:
If you have ModelSim, you can use the 'trace x' option and see where it comes from.
Other simulators might also have this feature, but for ModelSim I'm certain it works.

In your non-working example
x_delayed(0) <= x;
is equivalent to
process(x)
begin
x_delayed(0) <= x;
end process;
So the process assigns x_delayed(0) only when x changes. Because this is a signal assignment, x_delayed(0) does not change immediately; it changes after a delta cycle. Therefore, when process DELAYS runs, the assignment to x_delayed(0) has not happened yet!
Use a variable for x_delayed in your process, if you can.
x_delayed(0) := x;

Related

VHDL record assignment through for loop

I have a for loop in process, which works fine with std_logic arrays, but not with record arrays. I use Xilinx ISE along with ISIM and the code is vhdl-93. The target will be a Spartan 3.
Here is the record definition:
TYPE spi_rx_t IS RECORD
CS : std_logic;
MOSI : std_logic;
CLK : std_logic;
END RECORD;
constant SYNC_LATCHES : integer := 2;
Here is the array definition and declaration:
type spi_rx_array_t is array (0 to SYNC_LATCHES) of spi_rx_t;
signal spi_in_array : spi_rx_array_t;
Below is the process:
spi_in_array(0).MOSI <= SPI_MOSI;
spi_in_array(0).CLK <= SPI_CLK;
spi_in_array(0).CS <= SPI_CS;
sync_p: process (clk_100)
begin
if rising_edge(clk_100) then
-- for I in 1 to SYNC_LATCHES loop
-- spi_in_array(I) <= spi_in_array(I - 1);
-- end loop;
spi_in_array(1) <= spi_in_array(0);
spi_in_array(2) <= spi_in_array(1);
end if;
end process;
The 2 lines below the commented code work exactly as expected (allowing me to synchronize external signals to clk_100), but I'd rather implement them as a for loop (such as the commented one).
However, these commented lines do not produce the same result in my ISIM test bench (spi_in_array stays in an unknown state when using the for loop). Why?
Please kindly help me with this.
As commented by Morten Zilmer, this is due to the VHDL concept "longest static prefix". This SO answer is similar to my issue.
In my case, the simplest way to resolve the issue was to move the assignment of the first element of the array into the same process as the for loop. I also had to decrease SYNC_LATCHES constant from 2 to 1, because spi_in_array(0) is now latched with clk_100.
sync_p: process (clk_100)
begin
if rising_edge(clk_100) then
spi_in_array(0).MOSI <= SPI_MOSI;
spi_in_array(0).CLK <= SPI_CLK;
spi_in_array(0).CS <= SPI_CS;
for I in 1 to SYNC_LATCHES-1 loop
spi_in_array(I) <= spi_in_array(I - 1);
end loop;
end if;
end process;
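If the first element has to stay unregistered, another option is to replace the loop with a slice assignment whose bounds are static, as was done for the delay line in the first question. This is a minimal sketch under that assumption; the entity name spi_sync and the three output ports are hypothetical and only make the example self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity spi_sync is
  port (
    clk_100   : in  std_logic;
    SPI_MOSI  : in  std_logic;
    SPI_CLK   : in  std_logic;
    SPI_CS    : in  std_logic;
    mosi_sync : out std_logic;  -- hypothetical synchronized outputs
    clk_sync  : out std_logic;
    cs_sync   : out std_logic
  );
end entity spi_sync;

architecture rtl of spi_sync is
  type spi_rx_t is record
    CS   : std_logic;
    MOSI : std_logic;
    CLK  : std_logic;
  end record;
  constant SYNC_LATCHES : integer := 2;
  type spi_rx_array_t is array (0 to SYNC_LATCHES) of spi_rx_t;
  signal spi_in_array : spi_rx_array_t;
begin
  -- Element 0 is driven only by these concurrent assignments.
  spi_in_array(0).MOSI <= SPI_MOSI;
  spi_in_array(0).CLK  <= SPI_CLK;
  spi_in_array(0).CS   <= SPI_CS;

  sync_p : process (clk_100)
  begin
    if rising_edge(clk_100) then
      -- Static slice bounds: the process drives elements 1 to SYNC_LATCHES only.
      spi_in_array(1 to SYNC_LATCHES) <= spi_in_array(0 to SYNC_LATCHES-1);
    end if;
  end process sync_p;

  mosi_sync <= spi_in_array(SYNC_LATCHES).MOSI;
  clk_sync  <= spi_in_array(SYNC_LATCHES).CLK;
  cs_sync   <= spi_in_array(SYNC_LATCHES).CS;
end architecture rtl;
```

This keeps SYNC_LATCHES at 2, because spi_in_array(0) is not registered.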

VHDL: Assigning one std_logic_vector to another makes '1' turn to 'X'

I have a baffling problem.. As part of a buffering process I am assigning one std_logic_vector to another, by simply doing:
dataRegister <= dataRegisterBuf;
The process is synced to a clock. See here for the full process:
--! This process buffers the data register synced to sclk when state is state_bufferingToSclk and sets registerReady when done
SclkDomainBuffering: process(sclk)
variable step: natural := 0;
begin
if (rising_edge(sclk)) then
if (state = state_bufferingToSclk) then
if (step = 0) then
dataRegister <= dataRegisterBuf;
step := 1;
elsif (step = 1) then
registerReady <= '1';
step := 2;
end if;
else
step := 0;
registerReady <= '0';
end if;
end if;
end process SclkDomainBuffering;
The problem is, when simulating this in Modelsim, dataRegister does not take the value of dataRegisterBuf, instead every '1' in the vector becomes 'X'. So for example if dataRegisterBuf is "00010", dataRegister becomes "000X0". I can't for the life of me figure out why. Here is a simulation showing it happening: http://i.imgur.com/znFgqKl.png
I have stepped through the entire code and I can't see anything out of the ordinary. At the time it happens, line 84 in the code above does indeed execute, and that is the only statement that is executed that has anything to do with the two registers in question as far as I can tell.
Here's a Minimal Complete and Verifiable example created from your question and comments:
library ieee;
use ieee.std_logic_1164.all;
entity baffling_problem is
end entity;
architecture foo of baffling_problem is
type state_type is (state_bufferingToClk, state_bufferingToSclk);
signal state: state_type; -- defaults to 'LEFT, state_bufferingToClk
signal dataRegisterBuf: std_logic_vector (31 downto 0) :=
(1 | 2 => '1', others => '0');
signal dataRegister: std_logic_vector (31 downto 0) := (others => '0');
signal registerReady: std_logic;
signal sclk: std_logic := '1';
begin
SclkDomainBuffering: process(sclk)
variable step: natural := 0;
begin
if (rising_edge(sclk)) then
if (state = state_bufferingToSclk) then
if (step = 0) then
dataRegister <= dataRegisterBuf;
step := 1;
elsif (step = 1) then
registerReady <= '1';
step := 2;
end if;
else
step := 0;
registerReady <= '0';
end if;
end if;
end process SclkDomainBuffering;
SOMEOTHERPROCESS:
process (state)
begin
if state = state_type'LEFT then -- other than state_bufferingToSclk
dataRegister <= (others => '0');
end if;
end process;
STIMULI:
process
begin
wait for 20 ns;
sclk <= '0';
wait for 5 ns;
sclk <= '1';
wait for 0 ns; -- state transitions in distinct delta cycle
state <= state_bufferingToSclk;
wait for 20 ns;
sclk <= '0';
wait for 5 ns;
sclk <= '1';
wait for 20 ns;
wait;
end process;
end architecture;
And this gives the behavior you describe:
See IEEE Std 1076-2008 14.7.3 Propagation of signal values, 14.7.3.1 General:
As simulation time advances, the transactions in the projected output waveform of a given driver (see 14.7.2) will each, in succession, become the value of the driver. When a driver acquires a new value in this way or as a result of a force or deposit scheduled for the driver, regardless of whether the new value is different from the previous value, that driver is said to be active during that simulation cycle. For the purposes of defining driver activity, a driver acquiring a value from a null transaction is assumed to have acquired a new value. A signal is said to be active during a given simulation cycle if
— One of its sources is active.
— One of its subelements is active.
— The signal is named in the formal part of an association element in a port association list and the corresponding actual is active.
— The signal is a subelement of a resolved signal and the resolved signal is active.
— A force, a deposit, or a release is scheduled for the signal.
— The signal is a subelement of another signal for which a force or a deposit is scheduled.
So the signals dataRegister(1) and dataRegister(2) are active because one of their sources is active.
An explanation of why their values are the resolved value of their drivers is found in 14.7.3.2 Driving values; none of the signals comprising dataRegister are basic signals, see paragraph 3 f).
And why you see the value of dataRegister as "00000000000000000000000000000XX0" is described in 14.7.3.3 Effective values.
The VHDL language describes how an elaborated design model is simulated as well as describing the syntax and semantics. An elaborated design model consists of processes described in a hierarchy interconnected by signals, and signals have history not just value. Signal updates are scheduled in projected output waveforms (see 10.5 Signal assignment statement).
A lot of users just starting out in VHDL apply what they know of the behavior of other languages to VHDL; an example is the superfluous (but not forbidden) parentheses surrounding a condition in an if statement. Knowledge of other languages doesn't address signal behaviors (determined by the architecture of simulation models driven by simulation cycles).
One of the things you'll note is that processes (11.3) suspend and resume based on explicit or implicit wait statements (10.2).
All concurrent statements are elaborated into processes, or processes and block statements (11. Concurrent statements).
Subprogram calls are either expressions (functions, 9.3.4) or statements (procedures, 10.7).
No signal value is updated while any process that is scheduled to be active is still executing; updates come from projected output waveform transactions matching the current simulation time (14.7.4 Model execution, 14.7.3.4 Signal update).
Signals driven in multiple processes represent multiple collections of hardware. The problem shows up because you've used resolved data types, if you had used unresolved data types you would have gotten an elaboration error instead (6.4.2.3 Signal declarations, paragraph 8). Resolved signals are allowed to have multiple drivers.
The resolution table for std_logic elements is found in the package body for package std_logic_1164 (see footnote 15, Annex A Description of accompanying files, for access to the source of VHDL packages included with the standard). The resolution table will resolve a '0' and a '1' to an 'X'.
And if all this sounds complex you can learn simple rules of thumb to prevent problems.
In this case a rule of thumb would be to always drive a signal from a single process.
As people in the comments said, the problem was that another process was driving the same data register. I did not understand that even though that other process only changed the value of the register in a different state, it would still drive the signal during every other state. I fixed the problem by moving everything related to that register into a single process.
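A single-process version of the example above could look like the following sketch. It assumes the clearing of dataRegister may happen synchronously on sclk, which the original thread does not confirm; the entity name single_driver_fix and the stimulus timing are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity single_driver_fix is
end entity single_driver_fix;

architecture sim of single_driver_fix is
  type state_type is (state_bufferingToClk, state_bufferingToSclk);
  signal state           : state_type := state_bufferingToClk;
  signal dataRegisterBuf : std_logic_vector(31 downto 0) :=
    (1 | 2 => '1', others => '0');
  signal dataRegister    : std_logic_vector(31 downto 0) := (others => '0');
  signal registerReady   : std_logic := '0';
  signal sclk            : std_logic := '0';
begin
  -- The only process that assigns dataRegister, so it has exactly one driver.
  SclkDomainBuffering : process (sclk)
    variable step : natural := 0;
  begin
    if rising_edge(sclk) then
      if state = state_bufferingToSclk then
        if step = 0 then
          dataRegister <= dataRegisterBuf;
          step := 1;
        elsif step = 1 then
          registerReady <= '1';
          step := 2;
        end if;
      else
        step := 0;
        registerReady <= '0';
        -- Clearing moved here from the second process (assumed acceptable).
        dataRegister <= (others => '0');
      end if;
    end if;
  end process SclkDomainBuffering;

  STIMULI : process
  begin
    wait for 10 ns; sclk <= '1';  -- first edge: register is cleared
    wait for 10 ns; sclk <= '0';
    state <= state_bufferingToSclk;
    wait for 10 ns; sclk <= '1';  -- second edge: buffer is copied
    wait for 10 ns; sclk <= '0';
    assert dataRegister = dataRegisterBuf
      report "dataRegister does not match dataRegisterBuf" severity failure;
    report "dataRegister copied without 'X' values";
    wait;
  end process STIMULI;
end architecture sim;
```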

VHDL: button debounce inside a Mealy State Machine

Hi I'm trying to implement a mealy machine using VHDL, but I'll need to debounce the button press. My problem is I'm not sure where should I implement the debouncing. My current work is like this:
process(clk)
begin
if(clk' event and clk = '1') then
if rst = '1' then
curr_state <= state0;
else
curr_state <= next_state;
end if;
end if;
end process;
process(curr_state, op1,op0,rst) --here op1,op0 and rst are all physical buttons and I need to debounce op1 and op0
begin
if rst = '1' then
...some implementation
else
...implement the debounce logic first
...process some input
case curr_state is
when state0=>...implementation
...similar stuff
end case;
end if;
end process;
I'm not sure whether I'm doing in the right way or not. In the second process, should I put the rst processing like this, or should I put it inside when state0 block? Also, as the processing of debounce requires counting, do I put it outside the case block like this? Thank you!
I would use a completely separate block of code to debounce any button signals, allowing your state machine process to focus on just the state machine, without having to worry about anything else.
You could use a process like this to debounce the input. You could of course exchange variables for signals in this example (with associated assignment operator replacements).
process (clk)
constant DEBOUNCE_CLK_PERIODS : integer := 256; -- Or whatever provides enough debouncing
variable next_button_state : std_logic := '0'; -- Or whatever your 'unpressed' state is
variable debounce_count : integer range 0 to DEBOUNCE_CLK_PERIODS-1 := 0;
begin
if (rising_edge(clk)) then
if (bouncy_button_in /= next_button_state) then
next_button_state := bouncy_button_in;
debounce_count := 0;
else
if (debounce_count /= DEBOUNCE_CLK_PERIODS-1) then
debounce_count := debounce_count + 1;
else
debounced_button_out <= next_button_state;
end if;
end if;
end if;
end process;
Another option would be to sample the bouncy_button_in at a slow rate:
process (clk)
constant DEBOUNCE_CLK_DIVIDER : integer := 256;
variable debounce_count : integer range 0 to DEBOUNCE_CLK_DIVIDER-1 := 0;
begin
if (rising_edge(clk)) then
if (debounce_count /= DEBOUNCE_CLK_DIVIDER-1) then
debounce_count := debounce_count + 1;
else
debounce_count := 0;
debounced_button_out <= bouncy_button_in;
end if;
end if;
end process;
The advantage of the first method is that it will reject glitches in the input. In either case, you would use the debounced_button_out (or whatever you want to call it, perhaps rst) in your state machine, whose code then contains only the core state machine functionality.
If you wanted even more debouncing, you could use another counter to create an enable signal for the processes above, to effectively divide down the clock rate. This could be better than setting the division constant to a very high number, because you may not be able to meet timing if the counter gets beyond a certain size.
You could even create a debounce entity in a separate file, which could be instantiated for each button. It could have a generic for the constant in the above process.
There's also hardware debouncing, but I suppose that's outside the scope of this question.
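The separate debounce entity mentioned above could be sketched as follows, wrapping the first process with its constant turned into a generic. The entity name debounce is hypothetical, and the sketch assumes bouncy_button_in has already been synchronized to clk.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity debounce is
  generic (
    DEBOUNCE_CLK_PERIODS : positive := 256  -- or whatever provides enough debouncing
  );
  port (
    clk                  : in  std_logic;
    bouncy_button_in     : in  std_logic;  -- assumed already synchronized to clk
    debounced_button_out : out std_logic
  );
end entity debounce;

architecture rtl of debounce is
begin
  process (clk)
    variable next_button_state : std_logic := '0';  -- 'unpressed' level
    variable debounce_count    : integer range 0 to DEBOUNCE_CLK_PERIODS-1 := 0;
  begin
    if rising_edge(clk) then
      if bouncy_button_in /= next_button_state then
        -- Input changed: remember the new level and restart the count.
        next_button_state := bouncy_button_in;
        debounce_count    := 0;
      elsif debounce_count /= DEBOUNCE_CLK_PERIODS-1 then
        debounce_count := debounce_count + 1;
      else
        -- Input has been stable long enough: pass it on.
        debounced_button_out <= next_button_state;
      end if;
    end if;
  end process;
end architecture rtl;
```

Each button would then get its own instance, for example through a direct entity instantiation of work.debounce with op0 on bouncy_button_in, and the state machine would read the debounced outputs only.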
In the second process, should I put the rst processing like this, or
should I put it inside when state0 block?
Only put it in the State0 block
Also, as the processing of
debounce requires counting, do I put it outside the case block like
this?
Counting needs to be done in a clocked process. Since you are doing a two-process state machine, you cannot do it in the case block. I typically put these sorts of resources in a separate clocked process anyway.
For states, you need: IS_0, TO_1, IS_1, TO_0.
The TO_1 and TO_0 are your transition states. I transition from TO_1 to IS_1 when I see a 1 for 16 ms. I transition from TO_0 to IS_0 when I see a 0 for 16 ms. Run your counter when you are in the TO_1 or TO_0 state. Clear your counter when you are in the IS_1 or IS_0 state.
This should get you started.
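The four states described above could be sketched as below. For brevity this uses a single clocked process rather than the two-process style, and it counts clock cycles instead of milliseconds; the entity name debounce_fsm, the port names, and the HOLD_CYCLES generic (to be set to the number of cycles in roughly 16 ms) are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity debounce_fsm is
  generic (
    HOLD_CYCLES : positive := 16  -- hypothetical: clock cycles the input must be stable
  );
  port (
    clk        : in  std_logic;
    rst        : in  std_logic;
    button_in  : in  std_logic;  -- assumed already synchronized to clk
    button_out : out std_logic
  );
end entity debounce_fsm;

architecture rtl of debounce_fsm is
  type state_t is (IS_0, TO_1, IS_1, TO_0);
  signal state : state_t := IS_0;
  signal count : integer range 0 to HOLD_CYCLES-1 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= IS_0;
        count <= 0;
      else
        case state is
          when IS_0 =>
            count <= 0;  -- clear the counter in the stable states
            if button_in = '1' then state <= TO_1; end if;
          when TO_1 =>
            if button_in = '0' then
              state <= IS_0;  -- bounce: go back
            elsif count = HOLD_CYCLES-1 then
              state <= IS_1;  -- stable long enough
            else
              count <= count + 1;
            end if;
          when IS_1 =>
            count <= 0;
            if button_in = '0' then state <= TO_0; end if;
          when TO_0 =>
            if button_in = '1' then
              state <= IS_1;
            elsif count = HOLD_CYCLES-1 then
              state <= IS_0;
            else
              count <= count + 1;
            end if;
        end case;
      end if;
    end if;
  end process;

  -- The output stays high until a low level has been confirmed.
  button_out <= '1' when state = IS_1 or state = TO_0 else '0';
end architecture rtl;
```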

Procedure call in loop with non-static signal name

In some testbench code I use a procedure to do something with a signal. I then use this procedure multiple times in sequence on different signals. This works fine as long as I explicitly define the signal; as soon as I index signals in a loop it fails with
(vcom-1450) Actual (indexed name) for formal "s" is not a static signal name.
Why is this not possible and how can I work around it?
Probably I could move this to a for ... generate, but then I want do_something to be called in a nicely defined sequence.
library ieee;
use ieee.std_logic_1164.all;
entity test is
end test;
architecture tb of test is
signal foo : std_logic_vector(1 downto 0);
begin
dummy: process is
procedure do_something (
signal s : out std_logic
) is begin
s <= '1';
report "tic";
wait for 1 ns;
-- actually we would do something more interesting here
s <= '0';
report "toc";
end procedure;
begin
-- This works well, but requires manual loop-unrolling
do_something(foo(0));
do_something(foo(1));
-- This should do the same
for i in foo'range loop
-- This is the offending line:
do_something(foo(i));
end loop;
wait; -- for ever
end process dummy;
end architecture tb;
I'm using ModelSim 10.4 PE.
Interestingly, if foo is a variable local to the process, (and s is adjusted to suit) ghdl compiles this. Which highlights the problem in the original version. The "for" loop is required to drive the whole of foo all the time because you can't make signal drivers appear or disappear at will - it can't be ambivalent about which bits it's driving, (and as you can see, the procedure tries to drive different bits at different times).
So if you can readjust your application to allow variable update semantics, and make foo a variable local to the process, that will work. (You would have to copy its value to a signal before every "wait" if you wanted to see the effect!)
Alternatively, pass the entire foo signal and the index to the subprogram, so that the latter always drives all of foo as follows...
(I've also added the missing bits and fixed the spurious concurrent "wait" : in future, PLEASE check your code example actually compiles before posting!)
library ieee;
use ieee.std_logic_1164.all;
entity test is
end test;
architecture tb of test is
signal foo : std_logic_vector(1 downto 0);
begin
dummy: process is
procedure do_something (
signal s : out std_logic_vector(1 downto 0);
constant i : in natural
) is begin
s <= (others => '0');
s(i) <= '1';
report "tic";
wait for 1 ns;
-- actually we would do something more interesting here
s(i) <= '0';
report "toc";
end procedure;
begin
-- This works well, but requires manual loop-unrolling
do_something(foo,0);
do_something(foo,1);
-- This should do the same
for i in foo'range loop
-- This is the offending line:
do_something(foo,i);
end loop;
wait; -- for ever
end process dummy;
end architecture tb;
I share your feelings about this being a silly limitation of the language. Minus the wait and report statements your example certainly has a valid hardware implementation, let alone well defined simulation behavior.
I think this situation can be avoided in most cases. For example, in your simple example you could just copy the contents of the procedure into the process body, or pass the whole vector as Brian proposed. If you really need to do it, this is one workaround:
architecture tb of test is
signal foo : std_logic_vector(1 downto 0);
signal t : std_logic;
signal p : integer := 0;
begin
foo(p) <= t;
dummy: process is
procedure do_something (
signal s : out std_logic
) is begin
s <= '1';
wait for 1 ns;
s <= '0';
end procedure;
begin
for i in foo'range loop
p <= i;
do_something(t);
wait for 0 ns;
end loop;
wait;
end process dummy;
end architecture tb;
This only works in simulation and will result in one delta cycle delay per iteration, compared to unrolling the loop which finishes in zero time when the procedure contains no wait statements.

Breaking out of a procedure in VHDL

I am trying to figure out a way to break out of a procedure if some external event occurs. Let's say I have a procedure like this:
procedure p_RECEIVE_DATA (
o_data : out std_logic) is
begin
wait until rising_edge(i2c_clock);
o_data := i2c_data;
wait until falling_edge(i2c_clock);
end procedure P_RECEIVE_DATA;
Now what I want is if an external signal, let's call it r_STOP gets asserted at any time, I want this procedure to exit immediately. Is there a nice way to do this? I was thinking that if this Verilog I could use fork/join_any to accomplish this, but there is no equivalent to fork and join in VHDL. Does anyone have any suggestions?
First of all, the code you have here might be just fine for a test or simulation. If this is what it's for, then great. However, keep in mind that code written as you have above is not synthesizable. You can compile and run it in a simulation setup, but you almost certainly won't be able to turn this into a hardware design for an FPGA, ASIC, or any other type of physical device. (In general, procedures can be used in synthesis only when they are called in a process and have no wait statements (or, less commonly, and only in some tools, when all of the wait statements are exactly the same).)
So to answer exactly what you've asked, the way to break out of a procedure is to call return when the condition you are interested in is met. For example if you wanted a global "r_stop" signal as you suggested make this procedure exit early no matter what whenever it changed to a '1', then you'd look for that explicitly:
procedure p_RECEIVE_DATA (
o_data : out std_logic) is
begin
wait until rising_edge(i2c_clock) or r_stop = '1';
if r_stop = '1' then return; end if;
o_data := i2c_data;
wait until falling_edge(i2c_clock) or r_stop = '1';
if r_stop = '1' then return; end if;
end procedure P_RECEIVE_DATA;
Again, if this is not testbench code, but is meant to be synthesizable, you need to take a different approach and model your logic as an explicit finite state machine.
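As a rough illustration of the state machine approach, the procedure could be turned into something like the sketch below. It is not the only way to structure this. The entity name receive_bit, the system clock clk, and the start and done ports are hypothetical, and i2c_clock and i2c_data are assumed to be already synchronized to clk.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity receive_bit is
  port (
    clk       : in  std_logic;  -- hypothetical system clock, much faster than i2c_clock
    start     : in  std_logic;  -- hypothetical: begins one receive operation
    r_stop    : in  std_logic;
    i2c_clock : in  std_logic;  -- assumed already synchronized to clk
    i2c_data  : in  std_logic;  -- assumed already synchronized to clk
    o_data    : out std_logic;
    done      : out std_logic   -- hypothetical: one-cycle pulse on completion
  );
end entity receive_bit;

architecture rtl of receive_bit is
  type state_t is (IDLE, WAIT_RISE, WAIT_FALL);
  signal state       : state_t := IDLE;
  signal i2c_clock_d : std_logic := '0';  -- previous sample, for edge detection
begin
  process (clk)
  begin
    if rising_edge(clk) then
      i2c_clock_d <= i2c_clock;
      done        <= '0';
      if r_stop = '1' then
        state <= IDLE;  -- abort from any state, like the early return
      else
        case state is
          when IDLE =>
            if start = '1' then state <= WAIT_RISE; end if;
          when WAIT_RISE =>
            if i2c_clock = '1' and i2c_clock_d = '0' then
              o_data <= i2c_data;
              state  <= WAIT_FALL;
            end if;
          when WAIT_FALL =>
            if i2c_clock = '0' and i2c_clock_d = '1' then
              done  <= '1';
              state <= IDLE;
            end if;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```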
I'm not sure there's a nice solution to this in VHDL. It's a bit like making an inferred state machine with a reset signal, which is also not pleasant.
You can do:
wait until rising_edge(clk); exit when reset = '1';
within a loop.
I guess you could do:
wait until rising_edge(clk); if stop = '1' then return; end if;
but again, not pleasant!
Here my 2 cents.
You can do this in your main thread/process by calling trigger control signals for those other threads/processes.
Depending on the state of those control signals you can wait for any thread(join_any), all threads(join) or just do not wait (join_none).
Driving signals from multiple processes is a bad idea unless you really know what you're doing (multiple driver problem). Therefore the activation and deactivation signal for each thread should be different, since they are controlled from different processes/drivers. That is the reason why I have written 2 control signals, 1. started and 2. finished, for each thread.
It is very important that a signal/interface is only written in one process in your code.
The code uses waits, so the same problems stated by wjl are applicable for synthesis.
If you want it synthesizable then put thread_0_active in the sensitivity list and do if rising_edge(thread_0_active) then inside the process. This will encapsulate the code, which will be executed only in case of the rising condition. Of course the code inside this process should be synthesizable and contain no waits.
A state machine can only run and be in one state at a time.
I think you are interested in a direct equivalent to the SystemVerilog fork behavior. I have tried to make the example as close as possible.
The code below:
signal thread_0_started : std_logic:='0';
signal thread_0_finished : std_logic:='0';
signal thread_1_started : std_logic:='0';
signal thread_1_finished : std_logic:='0';
--......
p_main_father : process
variable father_v : std_logic_vector( 32-1 downto 0 );
begin
--Do other things in your main thread using variables
--
--Now fork a thread activating it.
thread_0_started <= '1';
thread_1_started <= '1';
--signals are activated and now we need to wait when they are finished
--here both threads are started at the same time concurrently.
----------------------
--a fork .. join_any would be to do an "or" until the logic
wait until (thread_0_finished = '1') or (thread_1_finished = '1');
-------------------
--a fork .. join would be to do an "and" with the logic
--wait until (thread_0_finished = '1') and (thread_1_finished = '1');
--a fork .. join_none would be to just NOT do "wait until" and ignore that the threads were activated!!!
thread_0_started <= '0';--restore started state in case this main thread want to start them again.
thread_1_started <= '0';
end process;
child_thread_0 : process
variable prdata_v : std_logic_vector( 32-1 downto 0 );
begin
if thread_0_started = '0' then
thread_0_finished <= '0';--important disable finished if not started
end if;
wait until rising_edge(thread_0_started);--trigger event
--Do your things inside thread_0 that consume time, e.g. calling other procedures
thread_0_finished <= '1';
end process;
child_thread_1 : process
variable prdata_v : std_logic_vector( 32-1 downto 0 );
begin
if thread_1_started = '0' then
thread_1_finished <= '0';--important disable finished if not started
end if;
wait until rising_edge(thread_1_started);
--Do your things inside thread_1 that consume time, e.g. calling other procedures
thread_1_finished <= '1';
end process;
